Tesis sobre el tema "Embedded Processors"

Siga este enlace para ver otros tipos de publicaciones sobre el tema: Embedded Processors.

Crea una cita precisa en los estilos APA, MLA, Chicago, Harvard y otros

Elija tipo de fuente:

Consulte los 50 mejores tesis para su investigación sobre el tema "Embedded Processors".

Junto a cada fuente en la lista de referencias hay un botón "Agregar a la bibliografía". Pulsa este botón, y generaremos automáticamente la referencia bibliográfica para la obra elegida en el estilo de cita que necesites: APA, MLA, Harvard, Vancouver, Chicago, etc.

También puede descargar el texto completo de la publicación académica en formato pdf y leer en línea su resumen siempre que esté disponible en los metadatos.

Explore tesis sobre una amplia variedad de disciplinas y organice su bibliografía correctamente.

1

Gong, Shaojie y Zhongping Deng. "Benchmarks for Embedded Multi-processors". Thesis, Halmstad University, School of Information Science, Computer and Electrical Engineering (IDE), 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-660.

Texto completo
Resumen

During the recent years, computer performance has increased dramatically. To measure

the performance of computers, benchmarks are ideal tools. Benchmarks exist in many

areas and point to different applications. For instance, in a normal PC, benchmarks can be

used to test the performance of the whole system which includes the CPU, graphic card,

memory system, etc. For multiprocessor systems, there also exist open source benchmark

programs. In our project, we gathered information about some open benchmark programs

and investigated their applicability for evaluating embedded multiprocessor systems

intended for radar signal processing. During our investigation, parallel cluster systems

and embedded multiprocessor systems were studied. Two benchmark programs, HPL and

NAS Parallel Benchmark were identified as particularly relevant for the application field.

The benchmark testing was done on a parallel cluster system which has an architecture

that is similar to the architecture of embedded multiprocessor systems, used for radar

signal processing.

Los estilos APA, Harvard, Vancouver, ISO, etc.
2

Dasarathan, Dinesh. "Benchmark Characterization of Embedded Processors". NCSU, 2005. http://www.lib.ncsu.edu/theses/available/etd-05152005-170108/.

Texto completo
Resumen
The design of a processor is an iterative process, with many cycles of simulation, performance analysis and subsequent changes. The inputs to these cycles of simulations are generally a selected subset of standard benchmarks. To aid in reducing the number of cycles involved in design, one can characterize these selected benchmarks and use those characteristics to hit at a good initial design that will converge faster. Methods and systems to characterize benchmarks for normal processors are designed and implemented. This thesis extends these approaches and defines an abstract system to characterize benchmarks for embedded processors, taking into consideration the architectural requirements, power constraints and code compressibility. To demonstrate this method, around 25 benchmarks are characterized (10 from SPEC, and 15 from standard embedded benchmark suites - Mediabench and Netbench), and compared. Moreover, the similarities between these benchmarks are also analyzed and presented.
Los estilos APA, Harvard, Vancouver, ISO, etc.
3

Ryu, Soojung. "Storage Management for Embedded SIMD Processors". Diss., Georgia Institute of Technology, 2003. http://hdl.handle.net/1853/5122.

Texto completo
Resumen
SIMD parallelism offers a high performance and efficient execution approach for today's broad range of portable multimedia consumer products. However, new methods are needed to meet the complex demands of high performance, embedded systems. This research explores new storage management techniques for this focused but critical application. These techniques include memory design exploration based on the application retargeting technique, storage-based systolic instruction broadcast, and systolic virtual memory to improve both the performance and efficiency of embedded SIMD systems. For an efficient storage usage by memory design space exploration in embedded SIMD systems, an analysis method for assessing storage needs and costs of a given application automatically retargeted across a spectrum of storage configuration designs was developed. Using this technique, a SIMD processing element achieves optimal area and energy efficiency with a register file containing between 8 and 12 words for given workload. This configuration is between 15% and 25% more area and energy efficient than other memory configurations being considered. Systolic instruction broadcast is a high performance and area efficient instruction broadcasting scheme with short-wire interconnects by eliminating of wire latency bottleneck found in global instruction broadcast. Three implementation methods are defined and evaluated - software method, 2-write port register file method, and bypass method. In our evaluations, due to the system's short clock cycle time and scheduler, a speedup in system performance of up to 7.5 can be achieved by the year 2010. In addition, speedup of area efficiency also can be achieved up to 7.2 for a given workload. The ability of minimizing off-chip memory access latency while maximizing access frequency by scheduling techniques along with data prefetch techniques in systolic virtual memory mechanism was evaluated using our SIMD-systolic architecture simulator. Results show that, systolic virtual off-chip memory with shared address space can achieve over 50% higher area efficiency than that of an on-chip only system for a matrix multiplication application.
Los estilos APA, Harvard, Vancouver, ISO, etc.
4

Johnson, N. E. "Code size optimization for embedded processors". Thesis, University of Cambridge, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.605626.

Texto completo
Resumen
This thesis studies the problem of reducing code size produced by an optimizing compiler. We develop the Value State Dependence Graph (VSDG) as a powerful intermediate form. Nodes represent computation, and edges represent value (data) and state (control) dependencies between nodes. The edges specify a partial ordering of the nodes—sufficient ordering to maintain the I/O semantics of the source program, while allowing optimizers greater freedom to move nodes within the program to achieve better (smaller) code. Optimizations, both classical and new, transform the graph through graph rewriting rules prior to code generation. Additional (se-mantically inessential) state edges are added to transform the VSDG into a Control Flow Graph, from which target code is generated. We show how procedural abstraction can be advantageously applied to the VSDG. Graph patterns are extracted from a program's VSDG. We then select repeated patterns giving the greatest size reduction, generate new functions from these patterns, and replace all occurrences of the patterns in the original VSDG with calls to these abstracted functions. Several embedded processors have load- and store-multiple instructions, representing several loads (or stores) as one instruction. We present a method, benefiting from the VSDG form, for using these instructions to reduce code size by provisionally combining loads and stores before code generation. The final contribution of this thesis is a combined register allocation and code motion (RACM) algorithm. We show that our RACM algorithm formulates these two previously antagonistic phases as one combined pass over the VSDG, transforming the graph (moving or cloning nodes, or spilling edges) to fit within the physical resources of the target processor. We have implemented our ideas within a prototype C compiler and suite of VSDG optimizers, generating code for the Thumb 32-bit processor. Our results show improvements for each optimization and that we can achieve code sizes comparable to, and in some cases better than, that produced by commercial compilers with significant investments in optimization technology.
Los estilos APA, Harvard, Vancouver, ISO, etc.
5

Mucci, Claudio <1977&gt. "Software tools or embedded reconfigurable processors". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2007. http://amsdottorato.unibo.it/402/1/claudiomucci_phdthesis.pdf.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
6

Mucci, Claudio <1977&gt. "Software tools or embedded reconfigurable processors". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2007. http://amsdottorato.unibo.it/402/.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
7

Hoffmann, Andreas Leupers Rainer Meyr Heinrich. "Architecture exploration for embedded processors with LISA /". Boston [u.a.] : Kluwer Acad. Publ, 2002. http://www.loc.gov/catdir/toc/fy037/2002043258.html.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
8

Hadjiyiannis, George Ioannou. "An architecture synthesis system for embedded processors". Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/86440.

Texto completo
Resumen
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000.
Includes bibliographical references (leaves 261-264).
by George Ioannou Hadjiyiannis.
Ph.D.
Los estilos APA, Harvard, Vancouver, ISO, etc.
9

Mistry, Jatin N. "Leakage power minimisation techniques for embedded processors". Thesis, University of Southampton, 2013. https://eprints.soton.ac.uk/348805/.

Texto completo
Resumen
Leakage power is a growing concern in modern technology nodes. In some current and emerging applications, speed performance is uncritical but many of these applications rely on untethered power making energy a primary constraint. Leakage power minimisation is therefore key to maximising energy efficiency for these applications. This thesis proposes two new leakage power minimisation techniques to improve the energy efficiency of embedded processors. The first technique, called sub-clock power gating,can be used to reduce leakage power during the active mode. The technique capitalises on the observation that there can be large combinational idle time within the clock period in low performance applications and therefore power gates it. Sub-clock power gating is the first study into the application of power gating within the clock period, and simulation results on post layout netlists using a 90nm technology library show 3.5x, 2x and 1.3x improvement in energy efficiency for three test cases: 16-bit multiplier, ARM Cortex-M0 and Event Processor at a given performance point. To reduce the energy cost associated with moving between the sleep and active mode of operation, a second technique called symmetric virtual rail clamping is proposed. Rather than shutting down completely during sleep mode, the proposed technique uses a pair of NMOS and PMOS transistors at the head and foot of the power gated logic to lower the supply voltage by 2Vth. This reduces the energy needed to recharge the supply rails and eliminates signal glitching energy cost during wake-up. Experimental results from a 65nm test chip shows application of symmetric virtual rail clamping in sub-clock power gating improves energy efficiency, extending its applicable clock frequency range by 400x. The physical layout of power gating requires dedicated techniques and this thesis proposes dRail, a new physical layout technique for power gating. Unlike the traditional voltage area approach, dRail allows both power gated and non-power gated cells to be placed together in the physical layout to reduce area and routing overheads. Results from a post layout netlist of an ARM Cortex-M0 with sub-clock power gating shows standard cell area and signal routing are improved by 3% and 19% respectively. Sub-clock power gating, symmetric virtual rail clamping and dRail are incorporated into power gating design flows and are compatible with commercial EDA tools and gate libraries.
Los estilos APA, Harvard, Vancouver, ISO, etc.
10

Zushi, Junpei, Gang Zeng, Hiroyuki Tomiyama, Hiroaki Takada y Koji Inoue. "Improved Policies for Drowsy Caches in Embedded Processors". IEEE, 2008. http://hdl.handle.net/2237/12081.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
11

Hanono, Silvina Zimi. "AVIV : a retargetable code generator for embedded processors". Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80080.

Texto completo
Resumen
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.
Vita.
Includes bibliographical references (p. 225-231).
by Silvina Zimi Hanono.
Ph.D.
Los estilos APA, Harvard, Vancouver, ISO, etc.
12

BenDor, Jonathan y J. D. Baker. "Processing Real-Time Telemetry with Multiple Embedded Processors". International Foundation for Telemetering, 1994. http://hdl.handle.net/10150/611671.

Texto completo
Resumen
International Telemetering Conference Proceedings / October 17-20, 1994 / Town & Country Hotel and Conference Center, San Diego, California
This paper describes a system in which multiple embedded processors are used for real-time processing of telemetry streams from satellites and radars. Embedded EPC-5 modules are plugged into VME slots in a Loral System 550. Telemetry streams are acquired and decommutated by the System 550, and selected parameters are packetized and appended to a mailbox which resides in VME memory. A Windows-based program continuously fetches packets from the mailbox, processes the data, writes to log files, displays processing results on screen, and sends messages via a modem connected to a serial port.
Los estilos APA, Harvard, Vancouver, ISO, etc.
13

Kufel, Jedrzej. "Techniques and validation for protection of embedded processors". Thesis, University of Southampton, 2015. https://eprints.soton.ac.uk/381185/.

Texto completo
Resumen
Advances in technology scaling and miniaturization of on-chip structures have caused an increasing complexity of modern devices. Due to immense time-to-market pressures, the reusability of intellectual property (IP) sub-systems has become a necessity. With the resulting high risks involved with such a methodology, securing IP has become a major concern. Despite a number of proposed IP protection (IPP) techniques being available, securing an IP in the register transfer level (RTL) is not a trivial task, with many of the techniques presenting a number of shortfalls or design limitations. The most prominent and the least invasive solution is the integration of a digital watermark into an existing IP. In this thesis new techniques are proposed to address the implementation difficulties in constrained embedded IP processor cores. This thesis establishes the parameters of sequences used for digital watermarking and the tradeoffs between the hardware implementation cost, detection performance and robustness against IP tampering. A new parametric approach is proposed which can be implemented with any watermarking sequence. MATLAB simulations and experimental results of two fabricated silicon ASICs with a watermark circuit embedded in an ARMR Cortex R-M0 IP core and an ARMR Cortex R-A5 IP core demonstrate the tradeoffs between various sequences based on the final design application. The thesis further focuses on minimization of hardware costs of a watermark circuit implementation. A new clock-modulation based technique is proposed and reuses the existing circuit of an IP core to generate a watermark signature. Power estimation and experimental results demonstrate a significant area and power overhead reduction, when compared with the existing techniques. To further minimize the costs of a watermark implementation, a new technique is proposed which allows a non-deterministic and sporadic generation of a watermark signature. The watermark was embedded in an ARMR Cortex R-A5 IP core and was fabricated in silicon. Experimental silicon results have validated the proposed technique and have demonstrated the negligible hardware implementation costs of an embedded watermark.
Los estilos APA, Harvard, Vancouver, ISO, etc.
14

Ragel, Roshan Gabriel Computer Science &amp Engineering Faculty of Engineering UNSW. "Architectural support for security and reliability in embedded processors". Awarded by:University of New South Wales. Computer Science and Engineering, 2006. http://handle.unsw.edu.au/1959.4/28797.

Texto completo
Resumen
Security and reliability in processor based systems are concerns requiring adroit solutions. Security is often compromised by code injection attacks, jeopardizing even ???trusted software???. Reliability is of concern, where unintended code is executed in modern processors with ever smaller feature sizes and low voltage swings causing bit flips. Countermeasures by software-only approaches increase code size and therefore significantly reduce performance. Hardware assisted approaches use additional hardware monitors and thus incur considerably high hardware cost and have scalability problems. Considering reliability and security issues during the design of an embedded system has its advantages as this overcomes the limitations of existing solutions. The research work presented in this thesis combines two elements: one, defining a hardware software design framework for reliability and security monitoring at the granularity of micro-instructions, and two, applying this framework for real world problems. At a given time, a processor executes only a few instructions and large part of the processor is idle. Utilizing these idling hardware components by sharing them with the monitoring hardware, to perform security and reliability monitoring reduces the impact of the monitors on hardware cost. Using micro-instruction routines within the machine instructions, allows us to share most of the monitoring hardware. Therefore, our technique requires little hardware overhead in comparison to having additional hardware blocks outside the processor. This reduction in overhead is due to maximal sharing of hardware resources of the processor. Our framework is superior to software-only techniques as the monitoring routines are formed with micro-instructions and therefore reduces code size and execution time overheads, since they occur in parallel with machine instructions. This dissertation makes four significant contributions to the field of security and reliability on embedded processor research and they are: (i) proposed a security and reliability framework for embedded processors that could be included into its design phase; (ii) shown that inline (machine instruction level) monitoring will detect common security attacks (four inline monitors against common attacks cost 9.21% area and 0.67% performance, as opposed to previous work where an external monitor with two monitoring modules costs 15% area overhead); (iii) illustrated that basic block check-summing for code integrity is much simpler and efficient than currently proposed integrity violation detectors which address code injection attacks (this costs 5.03% area increase and 3.67% performance penalty with a single level control flow checking, as opposed to previous work where the area overhead is 5.59%, which needed three control flow levels of integrity checking); and (iv) shown that hardware assisted control flow checking implemented during the design of a processor is much cheaper and effective than software only approaches (this approach costs 0.24-1.47% performance and 3.59% area overheads, as opposed to previous work that costs 53.5-99.5% performance).
Los estilos APA, Harvard, Vancouver, ISO, etc.
15

Xu, Xianhong. "Code memory compression technologies for embedded arm/thumb processors". Thesis, University of Bath, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.442019.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
16

Liao, Stan Yi-Huang 1972. "Code generation and optimization for embedded digital signal processors". Thesis, Massachusetts Institute of Technology, 1996. http://hdl.handle.net/1721.1/11048.

Texto completo
Resumen
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1996.
Vita.
Includes bibliographical references (p. [203]-211).
by Stan Yi-Huang Liao.
Ph.D.
Los estilos APA, Harvard, Vancouver, ISO, etc.
17

Chakraborty, Samarjit. "System-level timing analysis and scheduling for embedded packet processors /". Zürich : Institut für Technische Informatik und Kommunikationsnetze TIK, ETH Zürich, 2003. http://e-collection.ethbib.ethz.ch/show?type=diss&nr=15093.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
18

Talavera, Velilla Guillermo. "Scratchpad-oriented address generation for low-power embedded VLIW processors". Doctoral thesis, Universitat Autònoma de Barcelona, 2009. http://hdl.handle.net/10803/5780.

Texto completo
Resumen
Actualmente, los sistemas encastados están creciendo a un ritmo impresionante y proporcionan cada vez aplicaciones más sofisticadas. Un conjunto de creciente importancia son los sistemas multimedia portátiles de tiempo real y los sistemas de comunicación de procesado digital de señal: teléfonos móviles, PDAs, cámaras digitales, consolas portátiles de juegos, terminales multimedia, netbooks, etc. Estos sistemas requieren computación específica de alto rendimiento, generalmente con restricciones de tiempo real y calidad de servicio (Quality of Service - QoS), que han de ejecutarse con un nivel bajo de consumo para extender la vida de la batería y evitar el calentamiento del dispositivo. También se requiere una arquitectura flexible para satisfacer las restricciones del "time-to-market". En consecuencia, los sistemas encastados necesitan una solución programable, de bajo consumo y alta capacidad de computación para satisfacer todos los requerimientos.
Las arquitecturas de tipo Very Long Instruction Word parecen una buena solución ya que proporcionan el suficiente rendimiento a bajo consumo con la programabilidad requerida. Estas arquitecturas se asientan sobre el esfuerzo del compilador para extraer el paralelismo disponible a nivel datos y de instrucciones para mantener las unidades computacionales ocupadas todo el rato. Con la densidad de los transistores doblando cada 18 meses, están emergiendo arquitecturas cada vez más complejas con un alto número de recursos computacionales ejecutándose en paralelo. Con esta, cada vez mayor, computación paralela, el acceso a los datos se está convirtiendo en el mayor impedimento que limita la posible extracción del paralelismo. Para aliviar este problema, en las actuales arquitecturas, una unidad especial trabaja en paralelo con los principales elementos computacionales para asegurar una eficiente transmisión de datos: la Unidad Generadora de Direcciones (Address Generator Unit), que puede implementarse de diferentes formas.
El propósito de esta tesis es probar que optimizar el proceso de la generación de direcciones es una manera eficiente de solucionar el proceso de acceder a los datos al mismo tiempo que disminuye el tiempo de ejecución y el consumo de energía.
Esta tesis evalúa la efectividad de los diferentes dispositivos que actualmente se usan en los sistemas encastados, argumenta el uso de procesadores de tipo "very long instruction word" y presenta la infraestructura de compilador y exploración arquitectural usada en los experimentos.
Esta tesis también presenta una clasificación sistemática de los generadores de direcciones, un repaso de las diferentes técnicas de optimización actuales acorde con esta clasificación y una metodología, usando técnicas ya publicadas, sistemática y óptima que reduce gradualmente la energía necesitada. También se introduce el entorno de trabajo que permite una exploración arquitectural sistemática y los métodos usados para obtener una unidad de generación de direcciones.
Los resultados de este unidad de generación de direcciones reconfigurable se muestran en diferentes aplicaciones de referencia (benchmarks) y la metodología sistemática se muestra en una aplicación completa real.
Nowadays Embedded Systems are growing at an impressive rate and provide more and more sophisticated applications. An increasingly important set of embedded systems are real-time portable multimedia and digital signal processing communication systems: cellular phones, PDAs, digital cameras, handheld gaming consoles, multimedia terminals, netbooks, etc. These systems require high performance specific computations, usually with real-time and Quality of Service (QoS) constraints, which should run at a low energy level to extend battery life and avoid heating. A flexible system architecture is also required to successfully meet short time-to-market restrictions. Hence, embedded systems need a programmable, low power and high performance solution in order to deal with these requirements.
Very Long Instruction Word architectures seem a good solution for providing enough computational performance at low-power with the required programmability to speed the time-to-market. Those architectures rely on compiler effort to exploit the available instruction and data parallelism to keep the data path busy all the time. With the density of transistors doubling each 18 months, more and more complex architectures with a high number of computational resources running in parallel are emerging. With this increasing parallel computation, the access to data is becoming the main bottleneck that limits the available parallelism. To alleviate this problem, in current embedded architectures, a special unit works in parallel with the main computing elements to ensure efficient feed and storage of the data: the Address Generator Unit, which comes in many flavors.
The purpose of this dissertation is to prove that optimizing the process of address generation is an effective way of solving the problem of accessing data while decreasing execution time and energy consumption.
As a first step, this thesis evaluates the effectiveness of different state-of-the-art devices commonly used in the embedded domain, argues for the use of very long instruction word processors and presents the compiler and architecture framework used for our experiments.
This thesis also presents a systematic classification of address generators, a review of literature according to the classification of the different optimizations on the address generation process and a step-wise methodology that gradually reduces energy reusing techniques that already have been published. The systematic architecture exploration framework and methods used to obtain a reconfigurable address generation unit are also introduced.
Results of the reconfigurable address generator unit are shown on several benchmarks and applications, and the complete step-wise methodology is demonstrated on a real-life example.
Los estilos APA, Harvard, Vancouver, ISO, etc.
19

Caulfield, Ian Michael. "Complexity-effective superscalar embedded processors using instruction-level distributed processing". Thesis, University of Cambridge, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.613309.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
20

Franke, Bjorn. "Compilation techniques for high-performance embedded systems with multiple processors". Thesis, University of Edinburgh, 2004. http://hdl.handle.net/1842/568.

Texto completo
Resumen
Despite the progress made in developing more advanced compilers for embedded systems, programming of embedded high-performance computing systems based on Digital Signal Processors (DSPs) is still a highly skilled manual task. This is true for single-processor systems, and even more for embedded systems based on multiple DSPs. Compilers often fail to optimise existing DSP codes written in C due to the employed programming style. Parallelisation is hampered by the complex multiple address space memory architecture, which can be found in most commercial multi-DSP configurations. This thesis develops an integrated optimisation and parallelisation strategy that can deal with low-level C codes and produces optimised parallel code for a homogeneous multi-DSP architecture with distributed physical memory and multiple logical address spaces. In a first step, low-level programming idioms are identified and recovered. This enables the application of high-level code and data transformations well-known in the field of scientific computing. Iterative feedback-driven search for “good” transformation sequences is being investigated. A novel approach to parallelisation based on a unified data and loop transformation framework is presented and evaluated. Performance optimisation is achieved through exploitation of data locality on the one hand, and utilisation of DSP-specific architectural features such as Direct Memory Access (DMA) transfers on the other hand. The proposed methodology is evaluated against two benchmark suites (DSPstone & UTDSP) and four different high-performance DSPs, one of which is part of a commercial four processor multi-DSP board also used for evaluation. Experiments confirm the effectiveness of the program recovery techniques as enablers of high-level transformations and automatic parallelisation. Source-to-source transformations of DSP codes yield an average speedup of 2.21 across four different DSP architectures. The parallelisation scheme is – in conjunction with a set of locality optimisations – able to produce linear and even super-linear speedups on a number of relevant DSP kernels and applications.
Los estilos APA, Harvard, Vancouver, ISO, etc.
21

Liu, ke. "A Simulation Based Approach to EstimateEnergy Consumption for Embedded Processors". Thesis, Högskolan i Halmstad, Centrum för forskning om inbyggda system (CERES), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-29913.

Texto completo
Resumen
Embedded systems have entered a new era in which system designers have to consider more and more strict energy consumption constraints. This thesis reviewsprevious studies of the processor energy consumption estimation. Particularly, wefocus on instruction-level energy consumption for embedded processors and explorethe energy consumption model for manycore architecture in real traffic pattern.The purpose of this thesis project is to estimate energy consumption and constructan energy model using an instruction-set simulator for embedded processors. OpenVirtual Platforms (OVP) and Epiphany Single Core Simulator (ESCS) are used toobtain an instruction sequence for a given software. Then, the functionality of energyconsumption estimation is integrated into OVP.Our energy consumption estimation approach categorizes instructions in fourgroups and uses base energy cost of each category to calculate the total energyconsumption for an application that runs on Epiphany core and ARM Cortex M0.The energy consumption estimation for Epiphany core has been successfully testedby BEEBS benchmarks. Details of the derivation process, results and analysis ofenergy consumption estimation methods are provided.
Los estilos APA, Harvard, Vancouver, ISO, etc.
22

Lins, Filipe Maciel. "The effects of the compiler optimizations in embedded processors reliability". reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2017. http://hdl.handle.net/10183/169248.

Texto completo
Resumen
O recente avanço tecnológico dos processadores embarcados aumentou a complexidade dos compiladores e o uso de recursos heterogêneos, como Arranjo de Portas Programáveis em Campo (Field Programmable Gate Array - FPGA) e Unidade de Processamento Gráfico (Graphics Processing Unit - GPU), integrado aos processadores. Além disso, aumentou-se o uso de componentes de prateleira (Commercial off-the-shelf - COTS) em aplicações críticas, ao invés de chips tolerantes a radiação, pois os COTS podem ser mais baratos, flexíveis, terem uma rápida colocação no mercado e um menor consumo de energia. No entanto, mesmo com essas vantagens, os COTS são suscetíveis a falha sendo necessário garantir uma alta confiabilidade nos sistemas utilizados. Assim como, no caso de aplicações em tempo real, também se precisa respeitar os requisitos determinísticos. Como caso de estudo, este trabalho utiliza a Zynq que é um dispositivo COTS do tipo Sistema em Chip Totalmente Programável (All Programmable System on Chip - APSoC) no qual possui um processador ARM Cortex-A9 embarcado. Nesta pesquisa, investigou-se o impacto das falhas que afetam o arquivo de registradores na confiabilidade dos processadores embarcados. Para tanto, experimentos de injeção de falhas e de radiação de íons pesados foram realizados. Além do mais, avaliou-se como os diferentes níveis de otimização do compilador modificam o uso e a probabilidade de falha do arquivo de registradores do processador. Selecionou-se seis benchmarks representativos, cada um compilado com três níveis diferentes de otimização. Realizamos campanhas exaustivas de injeção de falhas para medir o Fator de Vulnerabilidade Arquitetural (Architectural Vulnerability Factor - AVF) de cada código e configuração, identificando os registradores que são mais propensos a gerar uma corrupção de dados silenciosos (Silent Data Corruption - SDC) ou uma interrupção funcional de evento único (Single Event Functional Interruption - SEFI). Também foram correlacionadas as variações de confiabilidade observadas com a utilização do arquivo de registradores. Finalmente, irradiamos com íons pesados dois dos benchmarks selecionados compilados com dois níveis de otimização. Os resultados mostram que mesmo com o melhor desempenho, o menor uso do arquivo de registradores ou o menor AVF não é garantido que as aplicações irão alcançar a maior Carga de Trabalho Média Entre Falhas (Mean Workload Between Failure - MWBF). Por exemplo, os resultados mostram que o melhor desempenho da aplicação Multiplicação de Matrizes (Matrix Multiplication - MxM) é alcançado no nível de otimização mais alta. No entanto, nos resultados dos experimentos de injeção de falhas, a maior confiabilidade é alcançada no menor nível de otimização que possuem os menores AVFs e o menor uso do arquivo de registradores. Os resultados também mostram que o impacto das otimizações está fortemente relacionado com o algoritmo executado e como o compilador faz esta otimização.
The recent advances in the embedded processors increase the compilers complexity, and the usage of heterogeneous resources such as Field Programmable Gate Array (FPGA) and Graphics Processing Unit (GPU) integrated with the processors. Additionally, the increase in the usage of Commercial off-the-shelf (COTS) instead of radiation hardened chips in safety critical applications occurs because the COTS can be more flexible, inexpensive, have a fast time-to market and a lower power consumption. However, even with these advantages, it is still necessary to guarantee a high reliability in a system that uses a COTS for safety critical applications because they are susceptible to failures. Additionally, in the case of real time applications, the time requirements also need to be respected. As a case of study, this work uses the Zynq which is a COTS device classified as an All Programmable System-on-Chip (APSOC) and has an ARM Cortex-A9 as the embedded processor. In this research, the impact of faults that affect the register file in the embedded processors reliability was investigated. For that, fault-injection and heavy-ion radiation experiments were performed. Moreover, an evaluation of how the different levels of compiler optimization modify the usage and the failure probability of a processor register file. A set of six representative benchmarks, each one compiled with three different levels of compiler optimization. Exhaustive fault injection campaigns were performed to measure the registers Architectural Vulnerability Factor (AVF) of each code and configuration, identifying the registers that are more likely to generate Silent Data Corruption (SDC) or Single Event Functional Interruption (SEFI). Moreover, the observed reliability variations with register file utilization were correlated. Finally, two of the selected benchmarks, each one compiled with two different levels of optimization were irradiated in the heavy ions experiments. The results show that the best performance, the minor register file usage, or the lowest AVF does not always bring the highest Mean Workload Between Failures (MWBF). As an example, in the Matrix Multiplication (MxM) application, the best performance is achieved in the highest compiler optimization. However, in the fault injection, the higher reliability is obtained in the lower compiler optimization which has, the lower AVFs and the lower register file usage. Results also show that the impact of optimizations is strongly related to the executed algorithm and how the compiler optimizes them.
Los estilos APA, Harvard, Vancouver, ISO, etc.
23

Cooke, Alan. "The Killer App – Combining Embedded Processors, FPGAs and Smart Software". International Foundation for Telemetering, 2016. http://hdl.handle.net/10150/624252.

Texto completo
Resumen
In this paper, the benefits and advantages of combining advanced embedded processing capabilities with an FPGA based approach within a Data Acquisition Unit (DAU) are discussed. The paper begins with a discussion of some of the services and functionality that such a system enables. Basic features such as system discovery, verification, configuration and upgrade are discussed in addition to other value added services such as continuous built in test (CBIT) and embedded real-time parameter quick-look. Finally, the paper discusses some advanced services that could be deployed to these systems such as emerging communication protocols, multimedia connectivity and discovery, and advanced Machine Learning based systems diagnostics.
Los estilos APA, Harvard, Vancouver, ISO, etc.
24

Akturan, Cagdas. "Performance enhancing software loop transformations for embedded VLIW/EPIC processors". Access restricted to users with UT Austin EID Full text (PDF) from UMI/Dissertation Abstracts International, 2001. http://wwwlib.umi.com/cr/utexas/fullcit?p3035929.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
25

Gayen, Neela. "Automatic parallelization of stream programs for resource efficient embedded processors". Thesis, Queensland University of Technology, 2021. https://eprints.qut.edu.au/213058/1/Neela_Gayen_Thesis.pdf.

Texto completo
Resumen
This thesis considers how to exploit the specific characteristics of data streaming functions and multi-core processors to increase throughput through appropriate software process mappings. The hypothesis is that large numbers of low-power processors can achieve high throughput for streaming applications if a good mapping is provided. The innovation is to use compilation principles to guide the mapping, rather than heuristics. Three increasingly complex approaches are developed that focus on computational bottlenecks, then adds communication overheads, and lastly adds the costs of splitting and merging operations. Using this approach demonstrates that the successively more complex models can achieve correspondingly greater throughput.
Los estilos APA, Harvard, Vancouver, ISO, etc.
26

Franz, Jonathan D. Duren Russell Walker. "An evaluation of CoWare Inc.'s Processor Designer tool suite for the design of embedded processors". Waco, Tex. : Baylor University, 2008. http://hdl.handle.net/2104/5254.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
27

Lau, ChokSheak. "An optimization framework for embedded processors with auto-modify addressing modes". Thesis, Available online, Georgia Institute of Technology, 2004:, 2004. http://etd.gatech.edu/theses/available/etd-11152004-212501/unrestricted/Lau%5FChokSheak%5F200412%5Fmast.pdf.

Texto completo
Resumen
Thesis (M.S.)--Computing, Georgia Institute of Technology, 2005.
Pande, Santosh, Committee Chair ; Lee, Hsien-Hsin Sean, Committee Member ; Uh, Gang-Ryung, Committee Member. Includes bibliographical references.
Los estilos APA, Harvard, Vancouver, ISO, etc.
28

Ganesan, Sharan Kumaar. "Design and Implementation of Digital Spiking Neurons for Ultra-low-Power In-cluster processors". Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198115.

Texto completo
Resumen
Neuromorphic computing is a recent and growing field of research. Its conceptual attractiveness is due to the potential it has in deep learning applications such as sensor networks, low-power computer vision, robotics and other fields. Inspired by the functioning of brain, different neural network models have been devised, each with their own special focus on certain applications. Using such computing models are already helping us in different cases such as image, character and voice recognition, data analysis, stock market prediction, etc. Among the multitude of artificial neural models available, spiking neurons are more deeply inspired by biological neural networks. Leaky, Integrate and Fire (LIF) neuron model is one such model that can reproduce a good number of functions, be simple and also extensible in structure. Current deep learning applications are tied to servers and datacenters for their power and resource hungry existence. This work aims at building a low power neuron core taking advantage of LIF neuron, that could possible result in independent battery powered devices. A hardware design of LIF neuron based scalable neural core is explored, constructed and analysis for power consumption is made.
Los estilos APA, Harvard, Vancouver, ISO, etc.
29

Mourad, Azzam. "A Selective Dynamic Compiler for Embedded Java Virtual Machine Targeting ARM Processors". Thesis, Université Laval, 2005. http://www.theses.ulaval.ca/2005/22534/22534.pdf.

Texto completo
Resumen
Ce travail présente une nouvelle technique de compilation dynamique sélective pour les systèmes embarqués avec processeurs ARM. Ce compilateur a été intégré dans la plateforme J2ME/CLDC (Java 2 Micro Edition for Connected Limited Device Con- figuration). L’objectif principal de notre travail est d’obtenir une machine virtuelle accélérée, légère et compacte prête pour l’exécution sur les systèmes embarqués. Cela est atteint par l’implémentation d’un compilateur dynamique sélectif pour l’architecture ARM dans la Kilo machine virtuelle de Sun (KVM). Ce compilateur est appelé Armed E-Bunny. Premièrement, on présente la plateforme Java, le Java 2 Micro Edition(J2ME) pour les systèmes embarqués et les composants de la machine virtuelle Java. Ensuite, on discute les différentes techniques d’accélération pour la machine virtuelle Java et on détaille le principe de la compilation dynamique. Enfin, on illustre l’architecture, le design (la conception), l’implémentation et les résultats expérimentaux de notre compilateur dynamique sélective Armed E-Bunny. La version modifiée de KVM a été portée sur un ordinateur de poche (PDA) et a été testée en utilisant un benchmark standard de J2ME. Les résultats expérimentaux de la performance montrent une accélération de 360 % par rapport à la dernière version de la KVM de Sun avec un espace mémoire additionnel qui n’excède pas 119 kilobytes.
This work presents a new selective dynamic compilation technique targeting ARM 16/32-bit embedded system processors. This compiler is built inside the J2ME/CLDC (Java 2 Micro Edition for Connected Limited Device Configuration) platform. The primary objective of our work is to come up with an efficient, lightweight and low-footprint accelerated Java virtual machine ready to be executed on embedded machines. This is achieved by implementing a selective ARM dynamic compiler called Armed E-Bunny into Sun’s Kilobyte Virtual Machine (KVM). We first present the Java platform, Java 2 Micro Edition (J2ME) for embedded systems and Java virtual machine components. Then, we discuss the different acceleration techniques for Java virtual machine and we detail the principle of dynamic compilation. After that we illustrate the architecture, design, implementation and experimental results of our selective dynamic compiler Armed E-Bunny. The modified KVM is ported on a handheld PDA and is tested using standard J2ME benchmarks. The experimental results on its performance demonstrate that a speedup of 360% over the last version of Sun’s KVM is accomplished with a footprint overhead that does not exceed 119 kilobytes.
Inscrit au Tableau d'honneur de la Faculté des études supérieures
Los estilos APA, Harvard, Vancouver, ISO, etc.
30

BHALGAT, ASHISH ZUMBARLAL. "INSTRUCTION SCHEDULING TO HIDE LOAN/STORE LATENCY IN IRREGULAR ARCHITECTURE EMBEDDED PROCESSORS". University of Cincinnati / OhioLINK, 2001. http://rave.ohiolink.edu/etdc/view?acc_num=ucin982261963.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
31

Arunachalam, Srinath. "An online wear state monitoring methodology for off-the-shelf embedded processors". DigitalCommons@USU, 2015. https://digitalcommons.usu.edu/etd/4552.

Texto completo
Resumen
The continued scaling of transistors has led to an exponential increase in on-chip power density, which has resulted in increasing temperature. In turn, the increase in temperature directly leads to the increase in the rate of wear of a processor. Negative-bias temperature instability (NBTI) is one of the most dominant integrated circuit (IC) failure mechanisms [13, 5] that strongly depends on temperature. NBTI manifests in the form of increased circuit delays which can lead to timing failures and processor crashes. The ability to monitor the wear progression of a processor due to NBTI is valuable when designing real-time embedded systems. While NBTI can be detected using wear state sensors, not all chips are equipped with these sensors because detecting wear due to NBTI requires modifications to the chip design and incurs area and power overhead. NBTI sensor data may also not be exposed to users in software. In addition, wear sensors cannot take into account variations in wear due to the differences in the wear sensor devices and the other functional devices and their operating conditions. In this paper, we propose a lightweight, online methodology to monitor the wear process due to NBTI for off-the-shelf embedded processors. Our proposed method requires neither data on the threshold voltage and critical paths nor additional hardware. Our methodology can also be extended to predict the wear progression due to some other dominant IC failure mechanisms. Experiments on embedded processors provide insights on NBTI wear progression over time. This knowledge can be used to design real-time embedded systems that explicitly consider runtime wear progression to increase predictability and maintain lifetime reliability requirements.
Los estilos APA, Harvard, Vancouver, ISO, etc.
32

Bechara, Charly. "Study and design of a manycore architecture with multithreaded processors for dynamic embedded applications". Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00713536.

Texto completo
Resumen
Embedded systems are getting more complex and require more intensive processing capabilities. They must be able to adapt to the rapid evolution of the high-end embedded applications that are characterized by their high computation-intensive workloads (order of TOPS: Tera Operations Per Second), and their high level of parallelism. Moreover, since the dynamism of the applications is becoming more significant, powerful computing solutions should be designed accordingly. By exploiting efficiently the dynamism, the load will be balanced between the computing resources, which will improve greatly the overall performance. To tackle the challenges of these future high-end massively-parallel dynamic embedded applications, we have designed the AHDAM architecture, which stands for "Asymmetric Homogeneous with Dynamic Allocator Manycore architecture". Its architecture permits to process applications with large data sets by efficiently hiding the processors' stall time using multithreaded processors. Besides, it exploits the parallelism of the applications at multiple levels so that they would be accelerated efficiently on dedicated resources, hence improving efficiently the overall performance. AHDAM architecture tackles the dynamism of these applications by dynamically balancing the load between its computing resources using a central controller to increase their utilization rate.The AHDAM architecture has been evaluated using a relevant embedded application from the telecommunication domain called "spectrum radio-sensing". With 136 cores running at 500 MHz, AHDAM architecture reaches a peak performance of 196 GOPS and meets the computation requirements of the application.
Los estilos APA, Harvard, Vancouver, ISO, etc.
33

Oliveira, Ádria Barros de. "Applying dual core lockstep in embedded processors to mitigate radiation induced soft errors". reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2017. http://hdl.handle.net/10183/173785.

Texto completo
Resumen
Os processadores embarcados operando em sistemas de segurança ou de missão crítica não podem falhar. Qualquer falha neste tipo de aplicação pode levar a consequências inaceitáveis, como risco de vida ou danos à propriedade ou ao meio ambiente. Os sistemas embarcados que operam em aplicações aeroespaciais são sucetíveis à falhas transientes induzidas por radiação. Entretanto, os efeitos de radiação também podem ser observados ao nível do solo. Falhas transientes afetam os processadores modificando os valores armazenados em elementos de memória, tais como registradores e memória de dados. Essas falhas podem levar o processador a executar incorretamente a aplicação, provocando erros na saída ou travamentos no sistema. Os avanços recentes em processadores embarcados concistem na integração de processadores hard-core e FPGAs. Tais dispositivos, comumente chamados de Sistemas-em-Chip Totalmente Programáveis (APSoCs), também são sucetíveis aos efeitos de radiação. Com objetivo de minimizar esse problema de tolerância a falhas, este trabalho apresenta um Dual-Core LockStep (DCLS) como uma técnica de tolerância para mitigar falhas induzidas por radiação que afetam processadores embarcados em APSoCs. Lockstep é um método baseado em redundância usado para detectar e corrigir falhas transientes. O DCLS proposto é implementado em um processador ARM Cortex-A9 hard-core embarcado no APSoC Zynq-7000. A eficiência da abordagem implementada foi validada tanto em aplicações executando em bare-metal como no sistema operacional FreeRTOS. Experimentos com íons pesados e emulação de falhas por injeção foram executados para analisar a sucetibilidade do sistema a inversão de bits. Os resultados obtidos mostram que a abordagem é capaz de diminuir a seção de choque do sistema com uma alta taxa de proteção. O sistema DCLS mitigou com sucesso até 78% das falhas injetadas. Otimizações de software também foram avaliadas para uma melhor compreenção dos trade-offs entre desempenho e confiabilidade. Através da análise de diferentes partições de software, observou-se que o tempo de execução de um bloco da aplicação deve ser muito maior que o tempo de verificação para que se obtenha menor impacto em desempenho. A avaliação de otimizações de compilador demonstrou que utilizar o nível O3 aumenta a vulnerabilidade da aplicação à falhas transientes. Como o O3 requer o uso de mais registradores que os otros níveis de otimização, o sistema se torna mais sucetível à falhas. Por outro lado, os resultados dos experimentos de radiação apontam que a aplicação compilada com nível O3 obtém maior Carga de Trabalho Média Entre Falhas (MWBF). Como a aplicação executa mais rápido, mais dados são computados corretamente antes da ocorrência de um erro.
The embedded processors operating in safety- or mission-critical systems are not allowed to fail. Any failure in such applications could lead to unacceptable consequences as life risk or significant damage to property or environment. Concerning faults originated by the radiation-induced soft errors, the embedded systems operating in aerospace applications are particularly susceptible. However, the radiation effects can also be observed at ground level. Soft errors affect processors by modifying values stored in memory elements, such as registers and data memory. These faults may lead the processor to execute an application incorrectly, generating output errors or leading hangs and crashes in the system. The recent advances in embedded systems concern the integration of hard-core processors and FPGAs. Such devices, called All Programmable System-on-Chip (APSoC), are also susceptible to radiation effects. Aiming to address this fault tolerance problem this work presents a Dual-Core LockStep (DCLS) as a fault tolerance technique to mitigate radiation-induced faults affecting processors embedded into APSoCs. Lockstep is a method based on redundancy used to detect and correct soft errors. The proposed DCLS is implemented in a hard-core ARM Cortex-A9 embedded into a Zynq-7000 APSoC. The approach efficiency was validated not only on applications running in baremetal but also on top of FreeRTOS systems. Heavy ions experiments and fault injection emulation were performed to analyze the system susceptibility to bit-flips. The obtained results show that the approach is able to decrease the system cross section with a high rate of protection. The DCLS system successfully mitigated up to 78% of the injected faults. Software optimizations were also evaluated to understand the trade-offs between performance and reliability better. By the analysis of different software partitions, it was observed that the execution time of an application block must to be much longer than the verification time to achieve fewer performance penalties. The compiler optimizations assessment demonstrate that using O3 level increases the application vulnerability to soft errors. Because O3 handles more registers than other optimizations, the system is more susceptible to faults. On the other hand, results from radiation experiments show that O3 level provides a higher Mean Workload Between Failures (MWBF). As the application runs faster, more data are correctly computed before an error occurrence.
Los estilos APA, Harvard, Vancouver, ISO, etc.
34

Chen, Zhimin. "SCA-Resistant and High-Performance Embedded Cryptography Using Instruction Set Extensions and Multi-Core Processors". Diss., Virginia Tech, 2011. http://hdl.handle.net/10919/51256.

Texto completo
Resumen
Nowadays, we use embedded electronic devices in almost every aspect of our daily lives. They represent our electronic identity; they store private information; they monitor health status; they do confidential communications, and so on. All these applications rely on cryptography and, therefore, present us a research objective: how to implement cryptography on embedded systems in a trustworthy and efficient manner. Implementing embedded cryptography faces two challenges - constrained resources and physical attacks. Due to low cost constraints and power budget constraints, embedded devices are not able to use high-end processors. They cannot run at extremely high frequencies either. Since most embedded devices are portable and deployed in the field, attackers are able to get physical access and to mount attacks as they want. For example, the power dissipation, electromagnetic radiation, and execution time of embedded cryptography enable Side-Channel Attacks (SCAs), which can break cryptographic implementations in a very short time with a quite low cost. In this dissertation, we propose solutions to efficient implementation of SCA-resistant and high-performance cryptographic software on embedded systems. These solutions make use of two state-of-the-art architectures of embedded processors: instruction set extensions and multi-core architectures. We show that, with proper processor micro-architecture design and suitable software programming, we are able to deliver SCA-resistant software which performs well in security, performance, and cost. In comparison, related solutions have either high hardware cost or poor performance or low attack resistance. Therefore, our solutions are more practical and see a promising future in commercial products. Another contribution of our research is the proper partitioning of the Montgomery multiplication over multi-core processors. Our solution is scalable over multiple cores, achieving almost linear speedup with a high tolerance to inter-core communication delays. We expect our contributions to serve as solid building blocks that support secure and high-performance embedded systems.
Ph. D.
Los estilos APA, Harvard, Vancouver, ISO, etc.
35

Lide, David A. y Stephen Talabac. "The Use of Digital Signal Processors in Front-End Weather Satellite Telemetry Processing". International Foundation for Telemetering, 1994. http://hdl.handle.net/10150/608545.

Texto completo
Resumen
International Telemetering Conference Proceedings / October 17-20, 1994 / Town & Country Hotel and Conference Center, San Diego, California
This paper discusses the use of DSP technology in the embedded real time ingest and pre-processing of weather satellite data. Specifically, case studies are presented in the use of Texas Instrument TMS 320 processors as front-end handlers of GOES MODE AAA and GOES GVAR data formats.
Los estilos APA, Harvard, Vancouver, ISO, etc.
36

Azambuja, José Rodrigo Furlanetto de. "Designing and evaluating hybrid techniques to detect transient faults in processors embedded in FPGAs". reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2013. http://hdl.handle.net/10183/102687.

Texto completo
Resumen
Der aktuelle Stand der Technologie bringt schnellere und kleinere Bausteine für die Herstellung von integrierten Schaltungen mit sich, die während sie effizienter sind auch anfälliger für Strahlung werden. Kleinere Abmessungen der Transistoren, höhere Integrationsdichte, geringere Versorgungsspannungen und höhere Betriebsfrequenzen sind einige der Charakteristika, die energiegeladene Partikel zu einer Herausforderung machen, wenn man integrierte Schaltungen in rauen Umgebungen einsetzt. Diese Art der Partikel hat einen sehr großen Einfluss auf Prozessoren, die in einer solchen Umgebung eingesetzt werden. Sowohl die Ausführung des Programms, welche durch fehlerhafte Sprünge in der Programmsequenz beeinflusst wird, als auch Daten, die in speichernden Elementen wie Programmspeicher, Datenspeicher oder in Registern abgelegt sind, werden verfälscht. Um solche Prozessorsysteme abzusichern, wird in der Literatur Fehlertoleranz empfohlen, welche die Systemperformanz verringert, einen größeren Flächenverbrauch mit sich bringt und das System dennoch nicht komplett schützen kann. Diese Fehlertoleranz kann sowohl durch software- als auch durch hardwarebasierte Ansätze umgesetzt werden. In diesem Zusammenhang schlagen wir eine Kombination aus Hardware- und Software- Lösung vor, welche die Systemperformanz nur sehr wenig beeinflusst und den zusätzlichen Speicheraufwand minimiert. Diese Hybrid-Technologie zielt darauf ab, alle Fehler in einem System zu finden. Fünf solcher Techniken werden beschrieben und erklärt, zwei der vorgestellten Techniken sind bekannte Software-Lösungen, die anderen drei sind neue Hybrid-Lösungen, um alle transienten Effekte von Strahlung in Prozessoren erkennen zu können. Diese unterschiedlichen Ansätze werden anhand ihrer Ausführungszeit, Programm-, Datenspeicher, Flächenvergrößerung und Taktfrequenz analysiert und ausgewertet. Um die Effizienz und die Machbarkeit des vorgeschlagenen Ansatzes verifizieren zu können, werden Fehlerinjektionstests sowohl durch Simulation als auch durch Bestrahlungsexperimente in unterschiedlichen Positionen mit einer Cobalt-60 Quelle durchgeführt. Die Ergebnisse des vorgeschlagenen Ansatzes verbessern den Stand der Technik durch die Bereitstellung einer höheren Fehlererkennungsrate bei sehr geringer negativer Beeinflussung der Performanz und des Speicherverbrauchs.
Os recentes avanços tecnológicos proporcionaram dispositivos menores e mais rápidos para a fabricação de circuitos que, apesar de mais eficientes, se tornaram mais sensíveis aos efeitos de radiação. Menores dimensões de transistores, mais densidade de integração, tensões de alimentação mais baixas e frequências de operação mais altas são algumas das características que tornaram partículas energizadas um problema, quando lidando com sistemas integrados em ambientes severos. Estes tipos de partículas tem uma grande influencia em processadores funcionando em tais ambientes, afetando tanto o fluxo de execução do programa ao causar desvios incorretos, bem como os dados armazenados em elementos de memória, como memórias de dados e programas e registradores. A fim de proteger sistemas processados, técnicas de tolerância a falhas foram propostas na literatura usando propostas baseadas em hardware, software, que diminuem o desempenho do sistema, aumentam a sua área e não são capazes de proteger totalmente o sistema destes efeitos. Neste contexto, propomos a combinação de técnicas baseadas em hardware e software para criar técnicas híbridas orientadas a detectar todas as falhas que afetam o sistema, com baixa degradação de desempenho e aumento de memória. Cinco técnicas são apresentadas e descritas em detalhes, das quais duas são conhecidas técnicas baseadas puramente em software e três são técnicas híbridas novas, para detectar todos os tipos de efeitos transientes causados pela radiação em processadores. As técnicas são avaliadas de acordo com o aumento no tempo de execução, no uso das memórias de dados e programa e de área, e degradação da frequência de operação. Para verificar a eficiência e aplicabilidade das técnicas propostas, campanhas de injeção de falhas são realizadas ao se simular a injeção de falhas e realizar experimentos de irradiação em diferentes localidades com nêutron e fontes de Cobalto-60. Os resultados mostraram que as técnicas propostas aprimoraram o estado da arte ao fornecer altas taxas de detecção de falhas com baixas penalidades em degradação de desempenho e aumento de memória.
Recent technology advances have provided faster and smaller devices for manufacturing circuits that while more efficient have become more sensitive to the effects of radiation. Smaller transistor dimensions, higher density integration, lower voltage supplies and higher operating frequencies are some of the characteristics that make energized particles an issue when dealing with integrated circuits in harsh environments. These types of particles have a major influence in processors working in such environments, affecting both the program’s execution flow by causing incorrect jumps in the program, and the data stored in memory elements, such as data and program memories, and registers. In order to protect processor systems, fault tolerance techniques have been proposed in literature using hardware-based and software-based approaches, which decrease the system’s performance, increase its area, and are not able to fully protect the system against such effects. In this context, we proposed a combination of hardware- and software-based techniques to create hybrid techniques aimed at detecting all the faults affecting the system, at low performance degradation and memory overhead. Five techniques are presented and described in detail, from which two are known software-based only techniques and three are new hybrid techniques, to detect all kinds of transient effects caused by radiation in processors. The techniques are evaluated according to execution time, program and data memories, and area overhead and operating frequency degradation. To verify the effectiveness and the feasibility of the proposed techniques, fault injection campaigns are performed by injecting faults by simulation and performing irradiation experiments in different locations with neutrons and a Cobalt-60 sources. Results have shown that the proposed techniques improve the state-of-the-art by providing high fault detection rates at low penalties on performance degradation and memory overhead.
Los estilos APA, Harvard, Vancouver, ISO, etc.
37

Musasa, Mutombo Mike. "Evaluation of embedded processors for next generation asic : Evaluation of open source Risc-V processors and tools ability to perform packet processing operations compared to Arm Cortex M7 processors". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-299656.

Texto completo
Resumen
Nowadays, network processors are an integral part of information technology. With the deployment of 5G network ramping up around the world, numerous new devices are going to take advantage of their processing power and programming flexibility. Contemporary information technology providers of today such as Ericsson, spend a great amount of financial resources on licensing deals to use processors with proprietary instruction set architecture designs from companies like Arm holdings. There is a new non-proprietary instruction set architecture technology being developed known as Risc-V. There are many open source processors based on Risc-V architecture, but it is still unclear how well an open-source Risc-V processor performs network packet processing tasks compared to an Arm-based processor. The main purpose of this thesis is to design a test model simulating and evaluating how well an open-source Risc-V processor performs packet processing compared to an Arm Cortex M7 processor. This was done by designing a C code simulating some key packet processing functions processing 50 randomly generated 72 bytes data packets. The following functions were tested: framing, parsing, pattern matching, and classification. The code was ported and executed in both an Arm Cortex M7 processor and an emulated open source Risc-V processor. A working packet processing test code was built, evaluated on an Arm Cortex M7 processor. Three different open-source Risc-V processors were tested, Arianne, SweRV core, and Rocket-chip. The execution time of both cases was analyzed and compared. The execution time of the test code on Arm was 67, 5 ns. Based on the results, it can be argued that open source Risc-V processor tools are not fully reliable yet and ready to be used for packet processing applications. Further evaluation should be performed on this topic, with a more in-depth look at the SweRV core processor, at physical open-source Risc-V hardware instead of emulators.
Nätverksprocessorer är en viktig byggsten av informationsteknik idag. I takt med att 5G nätverk byggs ut runt om i världen, många fler enheter kommer att kunna ta del av deras kraftfulla prestanda och programerings flexibilitet. Informationsteknik företag som Ericsson, spenderarmycket ekonomiska resurser på licenser för att kunna använda proprietära instruktionsuppsättnings arkitektur teknik baserade processorer från ARM holdings. Det är väldigt kostam att fortsätta köpa licenser då dessa arkitekturer är en byggsten till designen av många processorer och andra komponenter. Idag finns det en lovande ny processor instruktionsuppsättnings arkitektur teknik som inte är licensierad så kallad Risc-V. Tack vare Risc-V har många propietära och öppen källkod processor utvecklats idag. Det finns dock väldigt lite information kring hur bra de presterar i nätverksapplikationer är känt idag. Kan en öppen-källkod Risc-V processor utföra nätverks databehandling funktioner lika bra som en proprietär Arm Cortex M7 processor? Huvudsyftet med detta arbete är att bygga en test model som undersöker hur väl en öppen-källkod Risc-V baserad processor utför databehandlings operationer av nätverk datapacket jämfört med en Arm Cortex M7 processor. Detta har utförts genom att ta fram en C programmeringskod som simulerar en mottagning och behandling av 72 bytes datapaket. De följande funktionerna testades, inramning, parsning, mönster matchning och klassificering. Koden kompilerades och testades i både en Arm Cortex M7 processor och 3 olika emulerade öppen källkod Risc-V processorer, Arianne, SweRV core och Rocket-chip. Efter att ha testat några öppen källkod Risc-V processorer och använt test koden i en ArmCortex M7 processor, kan det hävdas att öppen-källkod Risc-V processor verktygen inte är tillräckligt pålitliga än. Denna rapport tyder på att öppen-källkod Risc-V emulatorer och verktygen behöver utvecklas mer för att användas i nätverks applikationer. Det finns ett behov av ytterligare undersökning inom detta ämne i framtiden. Exempelvis, en djupare undersökning av SweRV core processor, eller en öppen-källkod Risc-V byggd hårdvara krävs.
Los estilos APA, Harvard, Vancouver, ISO, etc.
38

Webb, Robert L. "ASYNCHRONOUS MIPS PROCESSORS: EDUCATIONAL SIMULATIONS". DigitalCommons@CalPoly, 2010. https://digitalcommons.calpoly.edu/theses/381.

Texto completo
Resumen
The system clock has been omnipresent in most mainstream chip designs. While simplifying many design problems the clock has caused the problems of clock skew, high power consumption, electromagnetic interference, and worst-case performance. In recent years, as the timing constraints of synchronous designs have been squeezed ever tighter, the efficiencies of asynchronous designs have become more attractive. By removing the clock, these issues can be mitigated. How- ever, asynchronous designs are generally more complex and difficult to debug. In this paper I discuss the advantages of asynchronous processors and the specifics of some asynchronous designs, outline the roadblocks to asynchronous processor design, and propose a series of asynchronous designs to be used by students in tandem with traditional synchronous designs when taking an undergraduate computer architecture course.
Los estilos APA, Harvard, Vancouver, ISO, etc.
39

El, Moussawi Ali Hassan. "SIMD-aware word length optimization for floating-point to fixed-point conversion targeting embedded processors". Thesis, Rennes 1, 2016. http://www.theses.fr/2016REN1S150/document.

Texto completo
Resumen
Afin de limiter leur coût et/ou leur consommation électrique, certains processeurs embarqués sacrifient le support matériel de l'arithmétique à virgule flottante. Pourtant, pour des raisons de simplicité, les applications sont généralement spécifiées en utilisant l'arithmétique à virgule flottante. Porter ces applications sur des processeurs embarqués de ce genre nécessite une émulation logicielle de l'arithmétique à virgule flottante, qui peut sévèrement dégrader la performance. Pour éviter cela, l'application est converti pour utiliser l'arithmétique à virgule fixe, qui a l'avantage d'être plus efficace à implémenter sur des unités de calcul entier. La conversion de virgule flottante en virgule fixe est une procédure délicate qui implique des compromis subtils entre performance et précision de calcul. Elle permet, entre autre, de réduire la taille des données pour le coût de dégrader la précision de calcul. Par ailleurs, la plupart de ces processeurs fournissent un support pour le calcul vectoriel de type SIMD (Single Instruction Multiple Data) afin d'améliorer la performance. En effet, cela permet l'exécution d'une opération sur plusieurs données en parallèle, réduisant ainsi le temps d'exécution. Cependant, il est généralement nécessaire de transformer l'application pour exploiter les unités de calcul vectoriel. Cette transformation de vectorisation est sensible à la taille des données ; plus leurs tailles diminuent, plus le taux de vectorisation augmente. Il apparaît donc un compromis entre vectorisation et précision de calcul. Plusieurs travaux ont proposé des méthodologies permettant, d'une part la conversion automatique de virgule flottante en virgule fixe, et d'autre part la vectorisation automatique. Dans l'état de l'art, ces deux transformations sont considérées indépendamment, pourtant elles sont fortement liées. Dans ce contexte, nous étudions la relation entre ces deux transformations, dans le but d'exploiter efficacement le compromis entre performance et précision de calcul. Ainsi, nous proposons d'abord un algorithme amélioré pour l'extraction de parallélisme SLP (Superword Level Parallelism ; une technique de vectorisation). Puis, nous proposons une nouvelle méthodologie permettant l'application conjointe de la conversion de virgule flottante en virgule fixe et de l'exploitation du SLP. Enfin, nous implémentons cette approche sous forme d'un flot de compilation source-à-source complètement automatisé, afin de valider ces travaux. Les résultats montrent l'efficacité de cette approche, dans l'exploitation du compromis entre performance et précision, vis-à-vis d'une approche classique considérant ces deux transformations indépendamment
In order to cut-down their cost and/or their power consumption, many embedded processors do not provide hardware support for floating-point arithmetic. However, applications in many domains, such as signal processing, are generally specified using floating-point arithmetic for the sake of simplicity. Porting these applications on such embedded processors requires a software emulation of floating-point arithmetic, which can greatly degrade performance. To avoid this, the application is converted to use fixed-point arithmetic instead. Floating-point to fixed-point conversion involves a subtle tradeoff between performance and precision ; it enables the use of narrower data word lengths at the cost of degrading the computation accuracy. Besides, most embedded processors provide support for SIMD (Single Instruction Multiple Data) as a mean to improve performance. In fact, this allows the execution of one operation on multiple data in parallel, thus ultimately reducing the execution time. However, the application should usually be transformed in order to take advantage of the SIMD instruction set. This transformation, known as Simdization, is affected by the data word lengths ; narrower word lengths enable a higher SIMD parallelism rate. Hence the tradeoff between precision and Simdization. Many existing work aimed at provide/improving methodologies for automatic floating-point to fixed-point conversion on the one side, and Simdization on the other. In the state-of-the-art, both transformations are considered separately even though they are strongly related. In this context, we study the interactions between these transformations in order to better exploit the performance/accuracy tradeoff. First, we propose an improved SLP (Superword Level Parallelism) extraction (an Simdization technique) algorithm. Then, we propose a new methodology to jointly perform floating-point to fixed-point conversion and SLP extraction. Finally, we implement this work as a fully automated source-to-source compiler flow. Experimental results, targeting four different embedded processors, show the validity of our approach in efficiently exploiting the performance/accuracy tradeoff compared to a typical approach, which considers both transformations independently
Los estilos APA, Harvard, Vancouver, ISO, etc.
40

Raisi, Mehrdad. "Adaptive applications of OPTO-VLSI processors in WDM networks". Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2004. https://ro.ecu.edu.au/theses/840.

Texto completo
Resumen
Communication is an inseparable part of human life and its nature continues to evolve and improve. The advent of laser was a herald to the new possibilities in the communication world. In recent years technologies such as Wavelength Division Multiplexing (WDM) and Erbium Doped Fiber Amplifiers (EDFA) have afforded significant boost to the practice of optical communication. At the heart of this brave new world is the need to dynamically/ adaptively steer/route beams of light carrying very large amounts of data. In recent years many techniques have been proposed for this purpose by various researchers. In this study we have elected to utilise the beam-steering capabilities of Opto-VLSI processors to investigate band-pass filtering and channel equalisation as two possible and practical applications in WDM networks.
Los estilos APA, Harvard, Vancouver, ISO, etc.
41

De, Guzman Ethan Paul Palisoc. "Energy Efficient Computing using Scalable General Purpose Analog Processors". DigitalCommons@CalPoly, 2021. https://digitalcommons.calpoly.edu/theses/2305.

Texto completo
Resumen
Due to fundamental physical limitations, conventional digital circuits have not been able to scale at the pace expected from Moore’s law. In addition, computationally intensive applications such as neural networks and computer vision demand large amounts of energy from digital circuits. As a result, energy efficient alternatives are needed in order to provide continued performance scaling. Analog circuits have many well known benefits: the ability to store more information onto a single wire and efficiently perform mathematical operations such as addition, subtraction, and differential equation solving. However, analog computing also comes with drawbacks such as its sensitivity to process variation and noise, limited scalability, programming difficulty, and poor compatibility with digital circuits and design tools. We propose to leverage the strengths of analog circuits and avoid its weaknesses by using digital circuits and time-encoded computation. Time-encoded circuits also operate on continuous data but are implemented using digital circuits. We propose a novel scalable general purpose analog processor using time-encoded circuits that is well suited for emerging applications that require high numeric precision. The processor’s datapath, including time-domain register file and function units are described. We evaluate our proposed approach using an implementation that is simulated with a 0.18µm TSMC process and demonstrate that this approach improves the performance of a scientific benchmark by 4x compared against conventional analog implementations and improves energy consumption by 146x compared against digital implementations.
Los estilos APA, Harvard, Vancouver, ISO, etc.
42

Kim, Jongmyon. "Architectural Enhancements for Color Image and Video Processing on Embedded Systems". Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/6948.

Texto completo
Resumen
As emerging portable multimedia applications demand more and more computational throughput with limited energy consumption, the need for high-efficiency, high-throughput embedded processing is becoming an important challenge in computer architecture. In this regard, this dissertation addresses application-, architecture-, and technology-level issues in existing processing systems to provide efficient processing of multimedia in many, or ideally all, of its form. In particular, this dissertation explores color imaging in multimedia while focusing on two architectural enhancements for memory- and performance-hungry embedded applications: (1) a pixel-truncation technique and (2) a color-aware instruction set (CAX) for embedded multimedia systems. The pixel-truncation technique differs from previous techniques (e.g., 4:2:2 and 4:2:0 subsampling) used in image and video compression applications (e.g., JPEG and MPEG) in that it reduces the information content in individual pixel word sizes rather than in each dimension. Thus, this technique drastically reduces the bandwidth and memory required to transport and store color images without perceivable distortion in color. At the same time, it maintains the pixel storage format of color image processing in which each pixel computation is performed simultaneously on 3-D YCbCr components, which are widely used in the image and video processing community. CAX supports parallel operations on two-packed 16-bit (6:5:5) YCbCr data in a 32-bit datapath processor, providing greater concurrency and efficiency for processing color image sequences. This dissertation presents the impact of CAX on processing performance and on both area and energy efficiency for color imaging applications in three major processor architectures: dynamically scheduled (superscalar), statically scheduled (very long instruction word, VLIW), and embedded single instruction multiple data (SIMD) array processors. Unlike typical multimedia extensions, CAX obtains substantial performance and code density improvements through direct support for color data processing rather than depending solely on generic subword parallelism. In addition, the ability to reduce data format size reduces system cost. The reduction in data bandwidth also simplifies system design. In summary, CAX, coupled with the pixel-truncation technique, provides an efficient mechanism that meets the computational requirements and cost goals for future embedded multimedia products.
Los estilos APA, Harvard, Vancouver, ISO, etc.
43

DUTT, Nikil D., Hiroaki TAKADA y Hiroyuki TOMIYAMA. "Memory Data Organization for Low-Energy Address Buses". Institute of Electronics, Information and Communication Engineers, 2004. http://hdl.handle.net/2237/15042.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
44

Revy, Guillaume. "Implementation of binary floating-point arithmetic on embedded integer processors - Polynomial evaluation-based algorithms and certified code generation". Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2009. http://tel.archives-ouvertes.fr/tel-00469661.

Texto completo
Resumen
Aujourd'hui encore, certains systèmes embarqués n'intègrent pas leur propre unité flottante, pour des contraintes de surface, de coût et de consommation d'énergie. Cependant, ce type d'architecture est largement utilisé dans des domaines d'application extrêmement exigeants en calculs flottants (le multimédia, l'audio et la vidéo ou les télécommunications). Pour compenser le fait que l'arithmétique flottante ne soit pas implantée en matériel, elle doit être émulée efficacement à travers une implantation logicielle. Cette thèse traite de la conception et de l'implantation d'un support logiciel efficace pour l'arithmétique virgule flottante IEEE 754 aux processeurs entiers embarqués. Plus spécialement, elle propose de nouveaux algorithmes et outils pour la génération efficace de programmes à la fois rapides et certifiés, permettant notamment d'obtenir des codes C de très faibles latences pour l'évaluation polynomiale en arithmétique virgule fixe. Comparés aux implantations complètement écrites à la main, ces outils permettent de réduire de manière significative le temps de développement d'opérateurs flottants. La première partie de la thèse traite de la conception d'algorithmes optimisés pour certains opérateurs flottants en base 2, et donne des détails sur leur implantation logicielle pour le format virgule flottante binary32 et pour certains processeurs VLIW entiers embarqués comme ceux de la famille ST200 de STMicroelectronics. En particulier, nous proposons ici une approche uniforme pour l'implantation correctement arrondie des racines et de leur inverse, ainsi qu'une extension à la division. Notre approche, qui repose sur l'évaluation d'un seul polynôme bivarié, permet d'exprimer un plus haut degré de parallélisme d'instruction (ILP) que les méthodes précédentes, et s'avère particulièrement efficace en pratique. Ces travaux nous ont permis de fournir une version complètement remaniée de la bibliothèque FLIP, entraînant des gains significatifs par rapport à la version précédente. La deuxième partie de la thèse présente une méthodologie pour générer automatiquement et efficacement des codes C rapides et certifiés pour l'évaluation de polynômes bivariés en arithmétique virgule fixe. En particulier, elle consiste en un ensemble d'heuristiques pour calculer des schémas d'évaluation très parallèles et de faible latence, ainsi qu'un ensemble de techniques pour vérifier si ces schémas restent efficaces sur une architecture cible réelle et suffisamment précis pour garantir l'arrondi correct de l'implantation des opérateurs sous-jacente. Cette approche a été implantée dans l'environnement logiciel CGPE (Code Generation for Polynomial Evaluation). Nous avons ainsi utilisé notre outil pour générer et certifier rapidement des parties significatives des codes de la bibliothèque FLIP.
Los estilos APA, Harvard, Vancouver, ISO, etc.
45

TAVARES, Eduardo Antônio Guimarães. "A time Petri net based approach for software synthesis in Hard Real-Time embedded systems with multiple processors". Universidade Federal de Pernambuco, 2006. https://repositorio.ufpe.br/handle/123456789/2589.

Texto completo
Resumen
Made available in DSpace on 2014-06-12T15:59:31Z (GMT). No. of bitstreams: 2 arquivo5135_1.pdf: 1049051 bytes, checksum: e5be25e2aa87cb17b0788411f129a4a8 (MD5) license.txt: 1748 bytes, checksum: 8a4605be74aa9ea9d79846c1fba20a33 (MD5) Previous issue date: 2006
Atualmente, sistemas embarcados são ubíquos. Em outras palavras, eles estão em todos os lugares. Desde utilitários domésticos (ex: fornos microondas, refrigeradores, videocassetes, máquinas de fax, máquinas de lavar roupa, alarmes) até equipamentos militares (ex: mísseis guiados, satélites espiões, sondas espaciais, aeronaves), nós podemos encontrar um sistema embarcado. Desnecessário afirmar que a vida humana tem se tornado mais e mais dependente desses sistemas. Alguns sistemas embarcados são classificados como sistemas de tempo real, onde o comportamento correto depende não somente da integridade dos resultados, mas também nos tempos em que tais resultados são produzidos. Em sistemas embarcados de tempo real críticos, se as restrições temporais não forem satisfeitas, as conseqüências podem ser desastrosas, incluindo grandes danos aos equipamentos ou mesmo perdas de vidas humanas. Devido a tarefas que possuem alta taxa de utilização de processador, alguns sistemas embarcados (ex: dispositivos médicos) precisam ser compostos de mais de um processador para obter performance aceitável e, no caso de sistemas embarcados de tempo real críticos, para satisfazer as restrições temporais críticas. Entretanto, questões adicionais precisam ser consideradas para lidar com um ambiente multiprocessado, tal como comunicação entre processadores e sincronização. Nessa dissertação, um método de síntese de software baseado no formalismo matemático redes de Petri com tempo é apresentado para lidar com sistemas embarcardos de tempo real críticos com múltiplos processadores. A abordagem inicia a partir de uma especificação (usualmente composta de tarefas concorrentes e comunicantes) e automaticamente gera o código fonte de um programa considerando: (i) as funcionalidades e restrições; e (ii) o suporte operacional para execução das tarefas em um ambiente multiprocessado. Síntese de software é uma alternativa para sistemas operacionais especializados para dar suporte a execução de um programa. Sistemas operacionais são usualmente genéricos e podem introduzir atrasos no tempo de execução, e ao mesmo tempo produzir alto consumo de memória. Por outro lado, a síntese de software é uma alternativa de projeto, dado que este método automaticamente gera o código fonte do programa, satisfazendo a funcionalidade, as restrições especificadas, o suporte para execução, e a minimização dos atrasos e uso de memória
Los estilos APA, Harvard, Vancouver, ISO, etc.
46

Chielle, Eduardo. "Selective Software-Implemented Hardware Fault Tolerance Techniques to Detect Soft Errors in Processors with Reduced Overhead". Doctoral thesis, Universidad de Alicante, 2016. http://hdl.handle.net/10045/62467.

Texto completo
Resumen
Software-based fault tolerance techniques are a low-cost way to protect processors against soft errors. However, they introduce significant overheads to the execution time and code size, which consequently increases the energy consumption. System operating with time or energy restrictions may not be able to use these techniques. For this reason, this work proposes new software-based fault tolerance techniques with lower overheads and similar fault coverage to state-of-the-art software techniques. Thus, they can meet the system constraints. In addition, the shorter execution time reduces the exposure time to radiation. Consequently, the reliability is higher for the same fault coverage. Techniques can work with error correction or error detection. Once detection is less costly than correction, this work focuses on software-based detection techniques. Firstly, a set of data-flow techniques called VAR is proposed. The techniques are based on general building rules to allow an exhaustive assessment, in terms of reliability and overheads, of different technique variations. The rules define how the technique duplicates the code and insert checkers. Each technique uses a different set of rules. Then, a control-flow technique called SETA (Software-only Error-detection Technique using Assertions) is introduced. Comparing SETA with a state-of-the-art technique, SETA is 11.0% faster and occupies 10.3% fewer memory positions. The most promising data-flow techniques are combined with the control-flow technique in order to protect both dataflow and control-flow of the target application. To go even further with the reduction of the overheads, methods to selective apply the proposed software techniques have been developed. For the data-flow techniques, instead of protecting all registers, only a set of selected registers is protected. The set is selected based on a metric that analyzes the code and rank the registers by their criticality. For the control-flow technique, two approaches are taken: (1) removing checkers from basic blocks: all the basic blocks are protected by SETA, but only selected basic blocks have checkers inserted, and (2) selectively protecting basic blocks: only a set of basic blocks is protected. The techniques and their selective versions are evaluated in terms of execution time, code size, fault coverage, and Mean Work To Failure (MWTF), which is a metric to measure the trade-off between fault coverage and execution time. Results show that was possible to reduce the overheads without affecting the fault coverage, and for a small reduction in the fault coverage it was possible to significantly reduce the overheads. Lastly, since the evaluation of all the possible combinations for selective hardening of every application takes too much time, this work uses a method to extrapolate the results obtained by simulation in order to find the parameters for the selective combination of data and control-flow techniques that are probably the best candidates to improve the trade-off between reliability and overheads.
Los estilos APA, Harvard, Vancouver, ISO, etc.
47

Guan, Nan. "New Techniques for Building Timing-Predictable Embedded Systems". Doctoral thesis, Uppsala universitet, Avdelningen för datorteknik, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-209623.

Texto completo
Resumen
Embedded systems are becoming ubiquitous in our daily life. Due to close interaction with physical world, embedded systems are typically subject to timing constraints. At design time, it must be ensured that the run-time behaviors of such systems satisfy the pre-specified timing constraints under any circumstance. In this thesis, we develop techniques to address the timing analysis problems brought by the increasing complexity of underlying hardware and software on different levels of abstraction in embedded systems design. On the program level, we develop quantitative analysis techniques to predict the cache hit/miss behaviors for tight WCET estimation, and study two commonly used replacement policies, MRU and FIFO, which cannot be analyzed adequately using the state-of-the-art qualitative cache analysis method. Our quantitative approach greatly improves the precision of WCET estimation and discloses interesting predictability properties of these replacement policies, which are concealed in the qualitative analysis framework. On the component level, we address the challenges raised by multi-core computing. Several fundamental problems in multiprocessor scheduling are investigated. In global scheduling, we propose an analysis method to rule out a great part of impossible system behaviors for better analysis precision, and establish conditions to guarantee the bounded responsiveness of computing tasks. In partitioned scheduling, we close a long standing open problem to generalize the famous Liu and Layland's utilization bound in uniprocessor real-time scheduling to multiprocessor systems. We also propose to use cache partitioning for multi-core systems to avoid contentions on shared caches, and solve the underlying schedulability analysis problem. On the system level, we present techniques to improve the Real-Time Calculus (RTC) analysis framework in both efficiency and precision. First, we have developed Finitary Real-Time Calculus to solve the scalability problem of the original RTC due to period explosion. The key idea is to only maintain and operate on a limited prefix of each curve that is relevant to the final results during the whole analysis procedure. We further improve the analysis precision of EDF components in RTC, by precisely bounding the response time of each computation request.
Los estilos APA, Harvard, Vancouver, ISO, etc.
48

Colombet, Quentin. "Decoupled (SSA-based) register allocators : from theory to practice, coping with just-in-time compilation and embedded processors constraints". Phd thesis, Ecole normale supérieure de lyon - ENS LYON, 2012. http://tel.archives-ouvertes.fr/tel-00764405.

Texto completo
Resumen
My thesis deals with register allocation. During this phase, the compiler has to assign variables of the source program, in an arbitrary big number, to actual registers of the processor, in a limited number k. Recent works, for instance the thesis of F. Bouchez and S. Hack, have shown that it is possible to split in two different decoupled step this phase: the spill - store the variables into memory to release registers - followed by the registers assignment. These works demonstrate the feasibility of this decoupling relying on a theoretic framework and some assumptions. In particular, it is sufficient to ensure after the spill step that the number of variables simultaneously live is below k.My thesis follows these works by showing how to apply this kind of approach when real-world constraints come in play: instructions encoding, ABI (application binary interface), register aliasing. Different approaches are proposed. They allow either to ignore these problems or to directly tackle them into the theoretic framework. The hypothesis of the models and the proposed solutions are evaluated and validated using a thorough experimental study with the compiler of STMicroelectronics. Finally, all these works have been done with the constraints of modern compilers in mind, the JIT (just-in-time) compilation, where the compilation time et the memory footprint of the compiler are key factors. We strive to offer solutions that cope with these criteria or improve the result until a given budget is reached. We, in particular, used the SSA (static single assignment) form to define algorithm like tree scan that generalizes linear scan based approaches proposed for JIT compilation.
Los estilos APA, Harvard, Vancouver, ISO, etc.
49

Schölzel, Mario [Verfasser] y Heinrich Theodor [Akademischer Betreuer] Vierhaus. "Self-testing and self-repairing embedded processors: techniques for statically scheduled superscalar architectures / Mario Schölzel ; Betreuer: Heinrich Theodor Vierhaus". Cottbus : BTU Cottbus - Senftenberg, 2014. http://d-nb.info/1114664901/34.

Texto completo
Los estilos APA, Harvard, Vancouver, ISO, etc.
50

Burgio, Paolo <1981&gt. "Use of shared memory in the context of embedded multi-core processors: exploration of the technology and its limits". Doctoral thesis, Alma Mater Studiorum - Università di Bologna, 2013. http://amsdottorato.unibo.it/6187/1/Burgio_Paolo_Tesi.pdf.

Texto completo
Resumen
Modern embedded systems embrace many-core shared-memory designs. Due to constrained power and area budgets, most of them feature software-managed scratchpad memories instead of data caches to increase the data locality. It is therefore programmers’ responsibility to explicitly manage the memory transfers, and this make programming these platform cumbersome. Moreover, complex modern applications must be adequately parallelized before they can the parallel potential of the platform into actual performance. To support this, programming languages were proposed, which work at a high level of abstraction, and rely on a runtime whose cost hinders performance, especially in embedded systems, where resources and power budget are constrained. This dissertation explores the applicability of the shared-memory paradigm on modern many-core systems, focusing on the ease-of-programming. It focuses on OpenMP, the de-facto standard for shared memory programming. In a first part, the cost of algorithms for synchronization and data partitioning are analyzed, and they are adapted to modern embedded many-cores. Then, the original design of an OpenMP runtime library is presented, which supports complex forms of parallelism such as multi-level and irregular parallelism. In the second part of the thesis, the focus is on heterogeneous systems, where hardware accelerators are coupled to (many-)cores to implement key functional kernels with orders-of-magnitude of speedup and energy efficiency compared to the “pure software” version. However, three main issues rise, namely i) platform design complexity, ii) architectural scalability and iii) programmability. To tackle them, a template for a generic hardware processing unit (HWPU) is proposed, which share the memory banks with cores, and the template for a scalable architecture is shown, which integrates them through the shared-memory system. Then, a full software stack and toolchain are developed to support platform design and to let programmers exploiting the accelerators of the platform. The OpenMP frontend is extended to interact with it.
I sistemi integrati moderni sono architetture many-core, in cui spesso lo spazio di memoria è condiviso fra i processori. Per ridurre i consumi, molte di queste architetture sostituiscono le cache dati con memorie scratchpad gestite in software, per massimizzarne la località alle CPU e aumentare le performance. Questo significa che i dati devono essere spostati manualmente da parte del programmatore. Inoltre, tradurre in perfomance l’enorme parallelismo potenziale delle piattaforme many-core non è semplice. Per supportare la programmazione, diversi programming model sono stati proposti, e siccome lavorano ad un alto livello di astrazione, sfruttano delle librerie di runtime che forniscono servizi di base quali sincronizzazione, allocazione della memoria, threading. Queste librerie hanno un costo, che nei sistemi integrati è troppo elevato e ostacola il raggiungimento delle piene performance. Questa tesi analizza come un programming model ad alto livello di astrazione – OpenMP – possa essere efficientemente supportato, se il suo stack software viene adattato per sfruttare al meglio la piattaforma sottostante. In una prima parte, studio diversi meccanismi di sincronizzazione e comunicazione fra thread paralleli, portati sulle piattaforme many-core. In seguito, li utilizzo per scrivere un runtime di supporto a OpenMP che sia il più possibile efficente e “leggero” e che supporti paradigmi di parallelismo multi-livello e irregolare, spesso presenti nelle applicazioni moderne. Una seconda parte della tesi esplora le architetture eterogenee, ossia con acceleratori hardware. Queste architetture soffrono di problematiche sia i) per il processo di design della piattaforma, che ii) di scalabilità della piattaforma stessa (aumento del numero degli acceleratori e dei processori), che iii) di programmabilità. La tesi propone delle soluzioni a tutti e tre i problemi. Il linguaggio di programmazione usato è OpenMP, sia per la sua grande espressività a livello semantico, sia perché è lo standard de-facto per programmare sistemi a memoria condivisa.
Los estilos APA, Harvard, Vancouver, ISO, etc.
Ofrecemos descuentos en todos los planes premium para autores cuyas obras están incluidas en selecciones literarias temáticas. ¡Contáctenos para obtener un código promocional único!

Pasar a la bibliografía