Theses on the topic "Floating point"
Create a spot-on citation in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 theses for your research on the topic "Floating point."
Next to every source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Explore theses on a wide variety of disciplines and organize your bibliography correctly.
Skogstrøm, Kristian. "Implementation of Floating-point Coprocessor". Thesis, Norwegian University of Science and Technology, Department of Computer and Information Science, 2005. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9202.
This thesis presents the architecture and implementation of a high-performance floating-point coprocessor for Atmel's new microcontroller. The coprocessor architecture is based on a fused multiply-add pipeline developed in the specialization project, TDT4720. This pipeline has been optimized significantly and extended to support negation of all operands and single-precision input and output. New hardware has been designed for the decode/fetch unit, the register file, the compare/convert pipeline and the approximation tables. Division and square root are performed in software using Newton-Raphson iteration. The Verilog RTL implementation has been synthesized at 167 MHz using a 0.18 um standard cell library. The total area of the final implementation is 107 225 gates. The coprocessor has also been synthesized with the CPU. Test programs have been run to verify that the coprocessor works correctly. A complete verification of the floating-point coprocessor, however, has not been performed due to time limitations.
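The software Newton-Raphson scheme mentioned in this abstract can be sketched in a few lines. The seed polynomial and iteration count below are illustrative assumptions standing in for the coprocessor's approximation tables, not Skogstrøm's actual implementation:

```python
import math

def reciprocal(d):
    """Newton-Raphson reciprocal of d > 0: iterate x <- x * (2 - d * x),
    which doubles the number of correct bits per step (quadratic convergence)."""
    m, e = math.frexp(d)                  # write d = m * 2**e with 0.5 <= m < 1
    x = 48.0 / 17.0 - 32.0 / 17.0 * m     # linear seed, in place of a lookup table
    for _ in range(4):                    # 4 iterations exceed double precision
        x = x * (2.0 - m * x)
    return math.ldexp(x, -e)              # undo the scaling: 1/d = (1/m) * 2**-e

def divide(a, b):
    # Division implemented as multiplication by a computed reciprocal.
    return a * reciprocal(b)
```

Square root can be handled the same way, iterating on the reciprocal square root and multiplying back.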
Zhang, Yiwei. "Biophysically accurate floating point neuroprocessors". Thesis, University of Bristol, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.544427.
Baidas, Zaher Abdulkarim. "High-level floating-point synthesis". Thesis, University of Southampton, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.325049.
Duracz, Jan Andrzej. "Verification of floating point programs". Thesis, Aston University, 2010. http://publications.aston.ac.uk/15778/.
Ross, Johan and Hans Engström. "Voice Codec for Floating Point Processor". Thesis, Linköping University, Department of Electrical Engineering, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-15763.
As part of an ongoing project at the department of electrical engineering, ISY, at Linköping University, a voice decoder using floating point formats has been the focus of this master thesis. Previous work has been done developing an mp3-decoder using the floating point formats. All is expected to be implemented on a single DSP. The ever-present desire to make things smaller, more efficient and less power consuming is the main reason for this master thesis regarding the use of a floating point format instead of the traditional integer format in a GSM codec. The idea with the low-precision floating point format is to be able to reduce the size of the memory. This in turn reduces the total chip area needed and also decreases the power consumption. One main question is whether this can be done with the floating point format without losing too much sound quality of the speech. When using the integer format, one can represent every value in the range, depending on how many bits are being used. When using a floating point format you can represent larger values using fewer bits compared to the integer format, but you lose representation of some values and have to round the values off. From the tests that have been made with the decoder during this thesis, it has been found that the audible difference between the two formats is very small and can hardly be heard, if at all. The rounding seems to have very little effect on the quality of the sound, and the implementation of the codec has succeeded in reproducing similar sound quality to the GSM standard decoder.
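The trade-off this abstract describes, wide dynamic range but rounded-off values, can be illustrated with a toy quantizer. This is a sketch for illustration only, not the thesis's actual storage format:

```python
import math

def round_to_mantissa(x, mant_bits):
    """Round x to the nearest value representable with `mant_bits` bits of
    mantissa, keeping an unbounded exponent. Shows how a short floating-point
    format keeps a wide range while forcing nearby values to coincide."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                  # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** mant_bits
    return math.ldexp(round(m * scale) / scale, e)

# A 16-bit integer represents every value in [-32768, 32767] with spacing 1,
# while an 8-bit-mantissa float spans a far larger range, but near 32767 its
# representable values are 128 apart, so inputs must be rounded off:
# round_to_mantissa(32767.0, 8) -> 32768.0
```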
Englund, Madeleine. "Hybrid Floating-point Units in FPGAs". Thesis, Linköpings universitet, Datorteknik, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-86587.
Xiao, Yancheng. "Two floating point LLL reduction algorithms". Thesis, McGill University, 2013. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=114503.
The Lenstra-Lenstra-Lovász (LLL) reduction is the most popular lattice reduction and a powerful tool for solving many complex problems in mathematics and computer science. The blocking technique reformulates the LLL algorithms in terms of matrix-matrix operations to enable efficient data reuse in block LLL algorithms. In this thesis, we use the blocking technique to develop two floating-point block LLL reduction algorithms, the left-to-right block LLL (LRBLLL) reduction algorithm and the alternating-partition block LLL (APBLLL) reduction algorithm, and give complexity analyses of these two algorithms. We compare the two block LLL reduction algorithms with the original LLL reduction algorithm (in floating-point arithmetic) and the partial LLL (PLLL) reduction algorithm from the literature in terms of CPU run time, flops and relative backward errors. Simulation results show that the CPU run times of the two block LLL reduction algorithms are shorter than that of the partial LLL reduction algorithm and much shorter than that of the original LLL reduction, even though the two block algorithms cost more flops than the partial LLL reduction algorithm in some cases. The drawback of the two block algorithms is that they may sometimes not be as numerically stable as the original and partial LLL reduction algorithms. The parallelization of APBLLL is discussed.
Kupriianova, Olga. "Towards a modern floating-point environment". Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066584/document.
This work investigates two ways of enlarging the current floating-point environment. The first is to support several implementation versions of each mathematical function (elementary, such as exp or log, and special, such as erf or Γ); the second is to provide IEEE 754 operations that mix the inputs and the output of different radices. As the number of possible implementations of each mathematical function is large, this work focuses on code generation. Our code generator supports a huge variety of functions: it generates parametrized implementations for user-specified functions, so it may be considered a black-box function generator. This work contains a novel algorithm for domain splitting and an approach to replace branching on reconstruction by a polynomial. The new domain splitting algorithm produces fewer subdomains, and the polynomial degrees on adjacent subdomains do not change much. To produce vectorizable implementations, if-else statements on the reconstruction step have to be avoided. Since the 2008 revision of the IEEE 754 Standard it has been possible to mix numbers of different precisions in one operation; however, there is no mechanism that allows users to mix numbers of different radices in one operation. This research starts an examination of mixed-radix arithmetic with a worst-cases search for FMA. A novel algorithm to convert a decimal character sequence of arbitrary length to a binary floating-point number is presented. It is independent of the currently-set rounding mode and produces correctly-rounded results.
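A slow but exact reference for the conversion problem in the final sentences can be written with rational arithmetic. This is an illustrative sketch of the specification, not the thesis's algorithm:

```python
from fractions import Fraction

def decimal_string_to_double(s):
    """Convert a decimal character sequence of arbitrary length to the
    nearest binary64 float. Fraction(s) holds the exact decimal value;
    CPython's big-integer true division then rounds exactly once, to
    nearest-even, regardless of any dynamically selected rounding mode."""
    q = Fraction(s)
    return q.numerator / q.denominator
```

On a 40-digit input this agrees with the built-in parser, which is itself correctly rounded: `decimal_string_to_double("0." + "3" * 40) == float("0." + "3" * 40)`.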
Aamodt, Tor. "Floating-point to fixed-point compilation and embedded architectural support". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ58787.pdf.
Shen, Shumin. "A floating-point analog-to-digital converter". Thesis, University of Ottawa (Canada), 2004. http://hdl.handle.net/10393/26772.
Panisset, Jean François. "A double precision floating point convolution processor /". Thesis, McGill University, 1994. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=68047.
Since convolution is basically a two-dimensional multiply and accumulate operation, it is computationally intensive. General-purpose computer architectures are often ill-suited to perform two-dimensional convolutions, since they lack the required processing speed or memory bandwidth. This motivated the project to design and build a specialized device which can compute the convolution operation efficiently for such applications.
This thesis addresses the design and implementation of a specialized processor which can perform two-dimensional convolution using double-precision floating-point operands. The selected architecture is based on the concept of the systolic array. These architectures are reviewed particularly for the constraints which impact their logical and physical design, as well as for the numerous applications for which they have been proposed in the literature or have been implemented. After outlining the overall system architecture of the convolution processor, the thesis focuses on the details of the implementation of the bus interface and Direct Memory Access controller. Finally, the performance of the proposed design is evaluated and compared against alternative software implementations of the convolution algorithm on representative architectures. (Abstract shortened by UMI.)
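The two-dimensional multiply-and-accumulate pattern described above, which the systolic array parallelizes in hardware, looks like this in scalar form (an illustrative sketch with "valid" borders, not the processor's exact dataflow):

```python
def conv2d(image, kernel):
    """Direct 2D convolution over the valid region: every output element
    is a sum of elementwise products of the (flipped) kernel with a
    window of the image -- the multiply-accumulate core the text refers to."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * (iw - kw + 1) for _ in range(ih - kh + 1)]
    for y in range(ih - kh + 1):
        for x in range(iw - kw + 1):
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    # Kernel indices are flipped, per the convolution definition.
                    acc += image[y + j][x + i] * kernel[kh - 1 - j][kw - 1 - i]
            out[y][x] = acc
    return out
```

For an H×W image and a K×K kernel this costs on the order of H·W·K² multiply-adds, which is why dedicated systolic hardware pays off.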
Zhang, Michael Ruogu 1977. "Software floating-point computation on parallel machines". Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80133.
Includes bibliographical references (p. 71).
by Michael Ruogu Zhang.
M.Eng.
Havermark, Joel. "Bit-Vector Approximations of Floating-Point Arithmetic". Thesis, Uppsala universitet, Institutionen för informationsteknologi, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-372077.
Kolumban, Gaspar. "Low Cost Floating-Point Extensions to a Fixed-Point SIMD Datapath". Thesis, Linköpings universitet, Datorteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-101586.
Debski, Michal. "Self-calibrating floating-point analog-to-digital converter". Thesis, University of Ottawa (Canada), 2005. http://hdl.handle.net/10393/26884.
Pillai, Rajan V. K. "On low power floating point data path architectures". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0021/NQ47712.pdf.
Shah, Syed Yawar Ali. "On synthesis and optimization of floating point units". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2001. http://www.collectionscanada.ca/obj/s4/f2/dsk3/ftp04/MQ59309.pdf.
Drolet, Jean. "The design of a floating-point convolution system /". Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=56813.
This thesis presents the design of a specialized convolution processor that operates on double precision floating-point data. This convolver is based on an array of systolic cells and may be configured to process both images and unidimensional signals. Support circuitry handles data format conversion as well as data sequencing for the systolic array. In addition, the processor communicates with the memory of a host computer via a DMA (direct memory access) interface to the VMEbus. In this thesis, the design of these auxiliary subsystems is emphasized and their implementation in application specific integrated circuits (ASIC) is presented.
Jain, Sheetal A. 1980. "Low-power single-precision IEEE Floating-point unit". Thesis, Massachusetts Institute of Technology, 2003. http://hdl.handle.net/1721.1/87426.
Hellman, Noah. "Mitchell-Based Approximate Operations on Floating-Point Numbers". Thesis, Linköpings universitet, Datorteknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-178882.
Mishra, Biswajit. "Investigation into a Floating Point Geometric Algebra Processor". Thesis, University of Southampton, 2007. https://eprints.soton.ac.uk/266009/.
Raina, Saurabh-Kumar. "FLIP, a floating-point library for integer processors". Lyon, École normale supérieure (sciences), 2006. http://www.theses.fr/2006ENSL0369.
Collingbourne, Peter Cyrus. "Symbolic crosschecking of data-parallel floating point code". Thesis, Imperial College London, 2013. http://hdl.handle.net/10044/1/10936.
Brown, Ashley W. "Profile-directed specialisation of custom floating-point hardware". Thesis, Imperial College London, 2010. http://hdl.handle.net/10044/1/5604.
DeLorimier, Michael DeHon André. "Floating-point sparse matrix-vector multiply for FPGAs /". Diss., Pasadena, Calif. : California Institute of Technology, 2005. http://resolver.caltech.edu/CaltechETD:etd-05132005-144347.
McCleeary, Ryan. "Lazy exact real arithmetic using floating point operations". Diss., University of Iowa, 2019. https://ir.uiowa.edu/etd/6991.
Texto completoDe, Blasio Simone y Karpers Fredrik Ekstedt. "Comparing the precision in matrix multiplication between Posits and IEEE 754 floating-points : Assessing precision improvement with emerging floating-point formats". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-280036.
Texto completoIEEE 754 flyttal är den nuvarande standarden för att representera reella tal i datorer, men det finns framväxande alternativa format. Ett av dessa nya format är Posit. Huvudkarakteristiken för Posit är att formatet möjliggör för högre precision än IEEE 754 flyttal med samma bitstorlek för värden av magnitud nära 1, men lägre precision för värden av mycket mindre eller större magnitud Denna studie jämförde precisionen mellan flyttal av formaten IEEE 754 och Posit när det gäller matrismultiplikation. Olika storlekar av matriser jämfördes, samt olika intervall av värden som matriselementen genererades i. Resultaten visade att Posits presterade bättre än IEEE 754 flyttal när det gäller precision när värdena är i ett intervall lika med eller större än [0:01; 0:01), eller lika med eller mindre än [100; 100). Matrisstorlek hade inte en anmärkningsvärd effekt på detta förutom när formatet Quire användes för att eliminera avrundningsfel. I nästan alla andra intervall presterade IEEE 754 flyttal bättre än Posits. Även om de flesta av våra resultat gynnade IEEE 754-flyttal, har Posits en precisions fördel om man kan vara säker på att värdena ligger inom det ideella intervallet. Posits kan alltså ha en roll att spela i framtiden för representation av flyttal.
Robe, Edward D. "SIMULINK modules that emulate digital controllers realized with fixed-point or floating-point arithmetic". Ohio : Ohio University, 1994. http://www.ohiolink.edu/etd/view.cgi?ohiou1180120138.
Catanzaro, Bryan C. "Higher radix floating-point representations for FPGA-based arithmetic /". Diss., CLICK HERE for online access, 2005. http://contentdm.lib.byu.edu/ETD/image/etd808.pdf.
Dahlberg, Anders. "Evaluation of a Floating Point Acoustic Echo Canceller Implementation". Thesis, Linköping University, Department of Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-8938.
This master thesis consists of the implementation and evaluation of an AEC, Acoustic Echo Canceller, algorithm in a floating-point architecture. The most important question this thesis will try to answer is to determine the benefits or drawbacks of using a floating-point architecture, relative to a fixed-point architecture, to do AEC. In a telephony system there are two common forms of echo, line echo and acoustic echo. Acoustic echo is introduced by sound emanating from a loudspeaker, e.g. in a handsfree or speakerphone, being picked up by a microphone and then sent back to the source. The problem with this feedback is that the far-end speaker will hear one, or multiple, time-delayed version(s) of her own speech. This time-delayed version of speech is usually perceived as both confusing and annoying unless removed by the use of AEC. In this master thesis the performance of a floating-point version of a normalized least-mean-square AEC algorithm was evaluated in an environment designed and implemented to approximate live telephony calls. An instruction-set simulator and assembler available at the initiation of this master thesis were extended to enable zero-overhead loops, modular addressing, post-increment of registers and register-write forwarding. With these improvements a bit-true assembly version was implemented, capable of real-time AEC requiring 15 million instructions per second. A solution using as few as eight mantissa bits, in an external format used when storing data in memory, was found to have an insignificant effect on the selected AEC implementation's performance. Due to the relatively low memory requirement of the selected AEC algorithm, the use of a small external format has a minor effect on the required memory size.
In total this indicates that the possible reduction of the memory requirement, and the related energy consumption, does not justify the added complexity and energy consumption of using a floating-point architecture for the selected algorithm. Use of a floating-point format can still be advantageous in speech-related signal processing when the time delay introduced by a subband, or similar frequency-domain, solution is unacceptable. Speech algorithms that have high memory use and small introduced-delay requirements are good candidates for a floating-point digital signal processor architecture.
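The normalized least-mean-square core of such an echo canceller fits in a few lines. This is a generic textbook sketch (the step size, filter length and simulation below are illustrative assumptions), not Dahlberg's bit-true implementation:

```python
def nlms_step(w, x_buf, d, mu=0.5, eps=1e-8):
    """One NLMS update: predict the echo as the dot product of the filter w
    with recent loudspeaker samples x_buf, subtract it from the microphone
    sample d, then step the filter along the input, normalized by the input
    power so the adaptation speed is insensitive to signal level."""
    y = sum(wi * xi for wi, xi in zip(w, x_buf))      # estimated echo
    e = d - y                                         # echo-cancelled output
    norm = eps + sum(xi * xi for xi in x_buf)
    w = [wi + mu * e * xi / norm for wi, xi in zip(w, x_buf)]
    return w, e
```

Run against a known two-tap echo path, the filter converges to that path and the residual `e` tends to zero.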
Costello, Joseph Patrick. "Behavioural synthesis of low-power floating point CORDIC processors". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0032/MQ65854.pdf.
Lyu, Chung-nan. "Pipelined floating point divider with built-in testing circuits". Ohio : Ohio University, 1988. http://www.ohiolink.edu/etd/view.cgi?ohiou1182864748.
Côté, Jean-François 1966. "The design of a testable floating point convolution processor /". Thesis, McGill University, 1990. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=60002.
Hok, Ho Chun. "Customisable and reconfigurable platform for optimising floating point computations". Thesis, Imperial College London, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.509798.
Liew, Daniel Simon. "Symbolic execution of verification languages and floating-point code". Thesis, Imperial College London, 2017. http://hdl.handle.net/10044/1/59705.
Wittman, Susan Jean. "Servo compensation using a floating point digital signal processor". Thesis, Massachusetts Institute of Technology, 1989. http://hdl.handle.net/1721.1/39018.
Lyu, Chuang-nan. "Pipelined floating point divider with built-in testing circuits". Ohio University / OhioLINK, 1988. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1182864748.
Ratan, Amrita. "Hardware Modules for Safe Integer and Floating-Point Arithmetic". University of Cincinnati / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1383812316.
Lugo, Martinez Jose E. "Strategies for sharing a floating point unit between SPEs". Diss., [La Jolla] : University of California, San Diego, 2010. http://wwwlib.umi.com/cr/ucsd/fullcit?p1470744.
Texto completoTitle from first page of PDF file (viewed February 17, 2010). Available via ProQuest Digital Dissertations. Includes bibliographical references (p. 55-57).
Costello, Joseph Patrick. "Behavioural synthesis of low-power floating point CORDIC processors". Ottawa : National Library of Canada = Bibliothèque nationale du Canada, 2002. http://www.nlc-bnc.ca/obj/s4/f2/dsk1/tape4/PQDD%5F0032/MQ65854.pdf.
Catanzaro, Bryan Christopher. "Higher Radix Floating-Point Representations for FPGA-Based Arithmetic". BYU ScholarsArchive, 2005. https://scholarsarchive.byu.edu/etd/311.
Coors, Martin. "A floating-point to fixed-point design flow for high performance digital signal processors /". Aachen : Shaker, 2005. http://bvbr.bib-bvb.de:8991/F?func=service&doc_library=BVB01&doc_number=013834304&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA.
Stenersen, Espen. "Vectorized 128-bit Input FP16/FP32/FP64 Floating-Point Multiplier". Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2008. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-8876.
3D graphic accelerators are often limited by their floating-point performance. A Graphics Processing Unit (GPU) has several specialized floating-point units to achieve high throughput and performance. The floating-point units account for a large part of the total area and power consumption, and hence architectural choices are important to evaluate when implementing the design. GPUs are specially tuned for performing a set of operations on large sets of data. The task of a 3D graphics solution is to render an image or a scene. The scene contains geometric primitives as well as descriptions of the light, the way each object reflects light, and the viewer's position and orientation. This thesis evaluates four different pipelined, vectorized floating-point multipliers supporting 16-bit, 32-bit and 64-bit floating-point numbers. The architectures are compared concerning area usage, power consumption and performance. Two of the architectures are implemented at Register Transfer Level (RTL), tested and synthesized, to see if assumptions made in the estimation methodologies are accurate enough to select the best architecture to implement, given a set of architectures and constraints. The first architecture trades area for lower power consumption, with a throughput of 38.4 Gbit/s at 300 MHz clock frequency, and the second architecture trades power for smaller area, with equal throughput. The two architectures are synthesized at 200 MHz, 300 MHz and 400 MHz clock frequency, in a 65 nm low-power standard cell library and a 90 nm general purpose library, and for different input data format distributions, to compare area and power results at different clock frequencies, input data distributions and target technologies. Architecture one has lower power consumption than architecture two at all clock frequencies and input data format distributions. At 300 MHz, architecture one has a total power consumption of 1.9210 mW at 65 nm, and 15.4090 mW at 90 nm.
Architecture two has a total power consumption of 7.3569 mW at 65 nm, and 17.4640 mW at 90 nm. Architecture two requires less area than architecture one at all clock frequencies. At 300 MHz, architecture one has a total area of 59816.4414 um^2 at 65 nm, and 116362.0625 um^2 at 90 nm. Architecture two has a total area of 50843.0 um^2 at 65 nm, and 95242.0469 um^2 at 90 nm.
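As a quick sanity check on the quoted figures, the 38.4 Gbit/s throughput is exactly what a fully pipelined 128-bit-per-cycle datapath delivers at 300 MHz:

```python
bits_per_cycle = 128   # vectorized 128-bit input, one issue per cycle
clock_hz = 300e6       # 300 MHz clock frequency
throughput_gbit_s = bits_per_cycle * clock_hz / 1e9
# 128 * 300e6 / 1e9 = 38.4 Gbit/s, matching the abstract
```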
Lu, Chung-Kuei. "A design of floating point FFT using Genesil Silicon Compiler". Thesis, Monterey, California. Naval Postgraduate School, 1991. http://hdl.handle.net/10945/30956.
Texto completoDutta, Sumit Ph D. Massachusetts Institute of Technology. "Floating-point unit (FPU) designs with nano-electromechanical (NEM) relays". Thesis, Massachusetts Institute of Technology, 2013. http://hdl.handle.net/1721.1/84724.
Texto completoThis electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (pages 71-74).
Nano-electromechanical (NEM) relays are an alternative to CMOS transistors as the fabric of digital circuits. Circuits with NEM relays offer energy-efficiency benefits over CMOS since they have zero leakage power and are strategically designed to maintain throughput that is competitive with CMOS despite their slow actuation times. The floating-point unit (FPU) is the most complex arithmetic unit in a computational system. This thesis investigates if the energy-efficiency promise of NEM relays demonstrated before on smaller circuit blocks holds for complex computational structures such as the FPU. The energy, performance, and area trade-offs of FPU designs with NEM relays are examined and compared with that of state-of-the-art CMOS designs in an equivalent scaled process. Circuits that are critical path bottlenecks, including primarily the leading zero detector (LZD) and leading zero anticipator (LZA) blocks, are carefully identified and optimized for low latency and device count. We manage to drop the NEM relay FPU latency from 71 mechanical delays in a CMOS-style implementation to 16 mechanical delays in a NEM relay pass-logic style implementation. The FPU designed with NEM relays features 15x lower energy per operation compared to CMOS.
by Sumit Dutta.
S.M.
Peterson, Scott Thomas. "Experimental response and analysis of the Evergreen Point Floating Bridge". Connect to this title online, 2002. http://www.dissertations.wsu.edu/dissertations/Fall2002/s%5Fpeterson%5F102102.pdf.
Texto completoPlet, Antoine. "Contribution to error analysis of algorithms in floating-point arithmetic". Thesis, Lyon, 2017. http://www.theses.fr/2017LYSEN038/document.
Floating-point arithmetic is an approximation of real arithmetic in which each operation may introduce a rounding error. The IEEE 754 standard requires elementary operations to be as accurate as possible. However, through a computation, rounding errors may accumulate and lead to totally wrong results. It happens, for example, with an expression as simple as ab + cd, for which the naive algorithm sometimes returns a result with a relative error larger than 1. Thus, it is important to analyze algorithms in floating-point arithmetic to understand as thoroughly as possible the generated error. In this thesis, we are interested in the analysis of small building blocks of numerical computing, for which we look for sharp bounds on the relative error. For this kind of building block, in radix β and precision p, we often successfully prove error bounds of the form α·u + o(u²), where α > 0 and u = ½·β^(1−p) is the unit roundoff. To characterize the sharpness of such a bound, one can provide numerical examples for the standard precisions that are close to the bound, or examples that are parametrized by the precision and generate an error of the same form α·u + o(u²), thus proving the asymptotic optimality of the bound. However, the paper-and-pencil checking of such parametrized examples is a tedious and error-prone task. We worked on the formalization of a symbolic floating-point arithmetic, over numbers that are parametrized by the precision, and implemented it as a library in the Maple computer algebra system. We also worked on the error analysis of the basic operations for complex numbers in floating-point arithmetic. We proved a very sharp error bound for an algorithm for the inversion of a complex number in floating-point arithmetic. This result suggests that computing a complex division according to x/y = (1/y)·x may be preferred to the more classical formula x/y = (x·ȳ)/|y|². Indeed, for any complex multiplication algorithm, the error bound is smaller with the algorithms described by the "inverse and multiply" approach. This is joint work with my PhD advisors, with the collaboration of Claude-Pierre Jeannerod (CR Inria in AriC, at LIP).
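The claim that the naive evaluation of ab + cd can lose all accuracy is easy to reproduce in binary64. The operand choice below is an illustrative worst case of our own, not one taken from the thesis:

```python
from fractions import Fraction

u = 2.0 ** -52                       # ulp of 1.0 in binary64
a, b, c, d = 1.0 + u, 1.0 - u, -1.0, 1.0

computed = a * b + c * d             # naive evaluation: rounded product, then sum
exact = Fraction(a) * Fraction(b) + Fraction(c) * Fraction(d)

# a*b = 1 - 2**-104 rounds to exactly 1.0, so the naive sum cancels to 0.0,
# while the exact value is -2**-104: every significant digit is lost
# (relative error 1), hence the interest in compensated schemes for ab + cd.
```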
El, Moussawi Ali Hassan. "SIMD-aware word length optimization for floating-point to fixed-point conversion targeting embedded processors". Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S150/document.
In order to cut down their cost and/or their power consumption, many embedded processors do not provide hardware support for floating-point arithmetic. However, applications in many domains, such as signal processing, are generally specified using floating-point arithmetic for the sake of simplicity. Porting these applications to such embedded processors requires a software emulation of floating-point arithmetic, which can greatly degrade performance. To avoid this, the application is converted to use fixed-point arithmetic instead. Floating-point to fixed-point conversion involves a subtle tradeoff between performance and precision; it enables the use of narrower data word lengths at the cost of degrading the computation accuracy. Besides, most embedded processors provide support for SIMD (Single Instruction Multiple Data) as a means to improve performance. In fact, this allows the execution of one operation on multiple data in parallel, thus ultimately reducing the execution time. However, the application should usually be transformed in order to take advantage of the SIMD instruction set. This transformation, known as Simdization, is affected by the data word lengths; narrower word lengths enable a higher SIMD parallelism rate. Hence the tradeoff between precision and Simdization. Much existing work has aimed at providing or improving methodologies for automatic floating-point to fixed-point conversion on the one hand, and Simdization on the other. In the state of the art, both transformations are considered separately even though they are strongly related. In this context, we study the interactions between these transformations in order to better exploit the performance/accuracy tradeoff. First, we propose an improved SLP (Superword Level Parallelism) extraction algorithm (a Simdization technique). Then, we propose a new methodology to jointly perform floating-point to fixed-point conversion and SLP extraction. Finally, we implement this work as a fully automated source-to-source compiler flow. Experimental results, targeting four different embedded processors, show the validity of our approach in efficiently exploiting the performance/accuracy tradeoff compared to a typical approach, which considers both transformations independently.
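The conversion this abstract describes ultimately maps each floating-point variable to an integer with an implied binary point. A minimal Q-format sketch (the word lengths here are chosen for illustration, not taken from the thesis) shows the mechanics:

```python
def to_fixed(x, frac_bits):
    """Quantize a float to a signed integer with `frac_bits` fractional
    bits (Q-format). This is where the conversion's precision loss comes
    from: x is rounded to a multiple of 2**-frac_bits."""
    return round(x * (1 << frac_bits))

def fixed_mul(a, b, frac_bits):
    """Fixed-point multiply: exact integer product, then rescale. The
    shift floors the result here; production flows may round instead."""
    return (a * b) >> frac_bits

# 0.75 * 0.5 in Q15 arithmetic:
a = to_fixed(0.75, 15)       # 24576
b = to_fixed(0.5, 15)        # 16384
p = fixed_mul(a, b, 15)      # 12288, i.e. 0.375 in Q15
```

Choosing fewer fractional bits shrinks each operand, letting more of them fit into one SIMD register, which is exactly the precision/Simdization tension the thesis studies.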
Coors, Martin [Verfasser]. "A Floating-Point to Fixed-Point Design Flow for High Performance Digital Signal Processors / Martin Coors". Aachen : Shaker, 2005. http://d-nb.info/1181610834/34.