Dissertations on the topic "Processor Architectures"
Sherwood, Timothy. "Application-tuned processor architectures /". Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2003. http://wwwlib.umi.com/cr/ucsd/fullcit?p3090450.
Killeen, Timothy F. "Improving processor utilization in multiple context processor architectures". Ohio : Ohio University, 1997. http://www.ohiolink.edu/etd/view.cgi?ohiou1174618393.
Tune, Eric. "Critical-path aware processor architectures /". Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2004. http://wwwlib.umi.com/cr/ucsd/fullcit?p3153686.
Commissariat, Hormazd P. "Performance Modeling of Single Processor and Multi-Processor Computer Architectures". Thesis, Virginia Tech, 1995. http://hdl.handle.net/10919/31377.
Master of Science
Al-Khayatt, Samir S. "Functional partitioning of multi-processor architectures". Thesis, Loughborough University, 1990. https://dspace.lboro.ac.uk/2134/32337.
Seng, John. "Optimizing processor architectures for power-efficiency /". Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2003. http://wwwlib.umi.com/cr/ucsd/fullcit?p3091334.
Shnidman, Nathan R. (Nathan Robert). "Multipass communication systems for tiled processor architectures". Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/36137.
Includes bibliographical references (p. 191-202).
Multipass communication systems utilize multiple sets of parallel baseband receiver functions to balance communication data rates and available computation capabilities. This is achieved by spatially pipelining baseband functions across parallel resources to perform multiple processing passes on the same set of received values, thus allowing the system to simultaneously convey multiple sequences of data using a single wireless link. The use of multiple passes mitigates the effects of data rate on receiver processing bottlenecks, making the use of general-purpose processing elements for high data rate communication functions viable. The flexibility of general-purpose processing, in turn, allows the receiver composition to trade-off resource usage and required processing rate. For instance, a communication system could be distributed across 2 passes using 2x the overall area, but reducing the data rate for each pass and the resultant overall required processing rate, and hence clock speed, by 1/2. Lowering the clock speed can also be leveraged to reduce power through voltage scaling and/or the use of higher Vt devices. The characteristics of general-purpose parallel processors for communications processing are explored, as well as the applicability of specific parallel designs to communications processing.
In particular, an in-depth look is taken at the Raw processor's tiled architecture as a general-purpose parallel processor particularly well suited to portable communications processing. An example of a multipass system, based on the 802.11a baseband, implemented on the Raw processor along with the accompanying hardware implementation is presented both as a proof of concept and as a means to explore some of the advantages and trade-offs of such a system. A bit-error rate study is presented which shows this multipass system to be within a small fraction of a dB of the performance of an equivalent data rate single-pass system, thus demonstrating the viability of the multipass algorithm. In addition, the capability of tiled processors to maximize processing capabilities at the system block level, as well as the system architectural level, is shown. Parallel implementations of two processing-intensive functions, the FFT and the Viterbi decoder, are shown. A parallelized assembly language FFT utilizing 16 tiles is shown to have a 1,000x improvement, and a parallelized 48-tile assembly language Viterbi decoder is shown to have a 10,000x improvement, over corresponding serial C implementations.
by Nathan Robert Shnidman.
Ph.D.
Trilla, Rodríguez David. "Non-functional considerations of time-randomized processor architectures". Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/670903.
Critical Real-Time Embedded Systems (CRTES) are the subset of embedded systems with timing requirements whose malfunction can endanger human lives or valuable material. To obtain evidence of their correct operation, CRTES are designed, implemented and deployed in compliance with reliability standards and certification regulations. To achieve this, CRTES must follow strict Validation and Verification (V&V) processes for their functional and non-functional properties. One of the most important non-functional properties is timing, whose verification relies on deriving the worst-case execution times of tasks and generating a schedule for them that ensures the correct timing behavior of the system. However, the use of more complex hardware and software to meet the growing performance demands of CRTES substantially increases V&V costs. For timing V&V, statistical methods such as Measurement-Based Probabilistic Timing Analysis (MBPTA) help reduce the V&V cost of complex CRTES hardware and software. To that end, time randomization is applied at the hardware level. In this sense, Time-Randomized Processors (TRPs) contain V&V costs by breaking systematic pathological behaviors and enabling the use of MBPTA techniques. In this context, this thesis shows that hardware and software designs that incorporate randomization not only successfully solve part of the timing analysis problem, but are also useful for analyzing other key non-functional metrics of CRTES such as durability, security and energy.
In terms of durability, this thesis shows that TRPs are naturally resilient to hardware aging effects and to power-supply instability, and we strengthen those properties by proposing improvements to their design. In addition, TRPs mitigate security threats and intrusions by breaking the deterministic association between memory mapping and access time, and we develop a corresponding methodology for secure operation in automobiles. Finally, on the energy side, we introduce a taxonomy to guide future challenges in deriving worst-case energy consumption estimates, and we take the first steps toward using an MBPTA-like methodology for energy estimates under the effects of process variation. Also on the energy side, this thesis shows how TRPs naturally break and expose pathological energy consumption patterns and help quantify and validate instantaneous peaks in power demand. In summary, this thesis paves the way for the use of TRPs in CRTES to tackle their emerging challenges in durability, security and energy consumption.
Rebello, Vinod. "On the distribution of control in asynchronous processor architectures". Thesis, University of Edinburgh, 1997. http://hdl.handle.net/1842/507.
Petters, Stefan M. E. "Worst case execution time estimation for advanced processor architectures". [S.l. : s.n.], 2002. http://deposit.ddb.de/cgi-bin/dokserv?idn=965404110.
Kwak, Jae-hyuck. "High speed CORDIC processor designs : algorithms, architectures, and applications /". Digital version accessible at:, 2000. http://wwwlib.umi.com/cr/utexas/main.
Orlando, Gerardo. "Efficient elliptic curve processor architectures for field programmable logic". Link to electronic thesis, 2002. http://www.wpi.edu/Pubs/ETD/Available/etd-0327102-103635.
Whitham, Jack. "Real-time processor architectures for worst case execution time reduction". Thesis, University of York, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.479513.
Lee, Walter (Walter Cheng-Wan). "Software orchestration of instruction level parallelism on tiled processor architectures". Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/33862.
Includes bibliographical references (p. 135-138).
Projections from silicon technology are that, while the transistor budget will continue to blossom according to Moore's law, latency from global wires will severely limit the ability to scale centralized structures at high frequencies. A tiled processor architecture (TPA) eliminates long wires from its design by distributing its resources over a pipelined interconnect. By exposing the spatial distribution of these resources to the compiler, a TPA allows the compiler to optimize for locality, thus minimizing the distance that data needs to travel to reach the consuming computation. This thesis examines the compiler problem of exploiting instruction level parallelism (ILP) on a TPA. It describes Rawcc, an ILP compiler for Raw, a fully distributed TPA. The thesis examines the implication of the resource distribution on the exploitation of ILP for each of the following resources: instructions, registers, control, data memory, and wires. It designs novel solutions for each one, and it describes the solutions within the integrated framework of a working compiler. Performance is evaluated on a cycle-accurate Raw simulator as well as on a 16-tile Raw chip. Results show that Rawcc can attain modest speedups for fine-grained applications, as well as speedups that scale up to 64 tiles for applications with such parallelism.
by Walter Lee.
Ph.D.
Gelhaar, B., K. Alvermann and F. Dzaak. "A MULTICHANNEL DATA ACQUISITION SYSTEM BASED ON PARALLEL PROCESSOR ARCHITECTURES". International Foundation for Telemetering, 1992. http://hdl.handle.net/10150/608884.
For research purposes on helicopter rotor acoustics a large data acquisition system called TEDAS (Transputer based Expandable Data Acquisition System) has been developed. The key features of this system are: unlimited expandability and sum data rate, local storage of data during operation, very simple analog anti-aliasing filtering due to extensive digital filtering, and integrated computational power which scales with the number of channels. The sample rate is up to 50 kHz/channel, the resolution is 16 bit, and 360 channels are realized now. TEDAS consists of blocks with 8 A/D converters which are controlled by one T800 transputer. The size of the local memory is 4 Mbyte. Any number of blocks (IDAM = Intelligent Data Acquisition Module) can be combined into a complete system. Data preprocessing is done in parallel inside the IDAMs. Because analog anti-aliasing filtering becomes a dominant cost factor for 16-bit systems, delta-sigma ADCs with oversampling and internal digital filtering are used. This produces an exact linear phase and a stop band rejection of -90 dB.
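The figures quoted in this abstract imply an aggregate raw data rate that is easy to check. The short Python sketch below works it through, assuming each 16-bit sample is stored as exactly 2 bytes with no framing overhead (an assumption; the abstract does not say). The function name is hypothetical.

```python
def raw_data_rate_bytes_per_s(channels, sample_rate_hz, bits_per_sample):
    """Raw acquisition rate in bytes/s, assuming no framing overhead."""
    return channels * sample_rate_hz * bits_per_sample // 8

# Figures from the abstract: 8 A/D converters per block, 360 channels,
# up to 50 kHz per channel, 16-bit resolution.
per_block = raw_data_rate_bytes_per_s(8, 50_000, 16)   # one IDAM block
total = raw_data_rate_bytes_per_s(360, 50_000, 16)     # all 360 channels
blocks = 360 // 8                                       # blocks needed

print(per_block, total, blocks)
```

Under these assumptions each block produces 800,000 bytes/s at the maximum sample rate, and the 360-channel system produces 36,000,000 bytes/s across 45 blocks.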
Hanen, Claire. "Problemes d'ordonnancement des architectures pipelines : modelisation, optimisation, algorithmes". Paris 6, 1987. http://www.theses.fr/1987PA066424.
Ma, Nicholas. "Modeling and evaluation of multi-core multithreading processor architectures in SystemC". Thesis, Kingston, Ont. : [s.n.], 2007. http://hdl.handle.net/1974/510.
Ceder, Frederick. "Efficient Implementation of 3D Finite Difference Schemes on Recent Processor Architectures". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170082.
Efficient implementation of 3D finite difference schemes on modern processor architectures. Abstract: This thesis discusses the implementation of a program that numerically solves problems modeled on Burgers' equation. The program is built from scratch, uses finite difference methods, and applies the FTCS (Forward in Time, Central in Space) method. The implementation is parallelized and optimized on the Intel Xeon Phi 7120P coprocessor and the Intel Xeon E5-2699v3 processor to examine performance differences between the two models. We optimized the program with respect to data access and memory layout to obtain good cache utilization. Loop blocking strategies are also used to split the working set into smaller parts that stay within the L2 cache. To fully exploit vectorization, compiler directives describing the memory access are used, which helps the compiler determine which data accesses are aligned. We also implemented prefetching strategies and streaming stores on the Xeon Phi and discuss their value. Parallelization was done with OpenMP and MPI. The parallelization for the Xeon Phi is based on OpenMP only and was executed directly on the chip. This gave a raw performance of nearly 100 GFLOP/s and reached a speedup of 50 with an efficiency of 83%. An OpenMP implementation on the E5-2699v3 (Haswell) processor achieved up to 292 GFLOP/s and reached a speedup of 31 with an efficiency of 85%. In comparison, a hybrid implementation achieved 267 GFLOP/s and reached a speedup of 28 with an efficiency of 87%. A pure MPI implementation on PDC's Beskow supercomputer with 16 nodes gave a total performance of 1450 GFLOP/s, and for a larger problem size a total of 2325 GFLOP/s, with speedup and efficiency of 170 and 33%, and 290 and 56%, respectively.
An analysis based on the roofline model showed that the computations were memory bound by the L2 cache bandwidth, which indicates good L2 cache utilization on both the Haswell and Xeon Phi architectures. The Xeon Phi's performance can probably be improved by also using MPI. Bearing in mind the technical advances in compute cores in recent years, both architectures perform well. The computational kernel of the implementation can probably be adapted to a more compiler-friendly variant, which could lead to more compiler optimizations on each platform. The experiments on the Cray system Beskow showed an increase in efficiency from 33.3% to 56% for larger problem sizes, which is a sign of good weak scaling. This suggests that efficiency can be maintained if the problem size grows with the number of compute nodes. Frederick Ceder
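As an illustration of the FTCS method named in this abstract, here is a minimal Python sketch of one time step. It is not the thesis code: it assumes the 1D viscous form of Burgers' equation, u_t + u u_x = nu u_xx, on a periodic grid, whereas the thesis works in 3D and its exact discretization is not given here. The function name and parameters are hypothetical.

```python
import math

def ftcs_burgers_step(u, dt, dx, nu):
    """Advance u by one FTCS step on a periodic 1D grid.

    Forward difference in time, central differences in space:
    u_new[i] = u[i] + dt * (nu * u_xx - u[i] * u_x)
    """
    n = len(u)
    new_u = [0.0] * n
    for i in range(n):
        left = u[i - 1]            # index -1 wraps to the last cell
        right = u[(i + 1) % n]     # wrap around at the right edge
        advection = u[i] * (right - left) / (2.0 * dx)
        diffusion = nu * (right - 2.0 * u[i] + left) / (dx * dx)
        new_u[i] = u[i] + dt * (diffusion - advection)
    return new_u

# Example: one step on a sine profile sampled at 16 points.
u0 = [math.sin(2.0 * math.pi * i / 16) for i in range(16)]
u1 = ftcs_burgers_step(u0, dt=0.001, dx=0.1, nu=0.05)
```

Each output value depends only on its two immediate neighbours from the previous step, the locality that makes stencil codes of this kind candidates for the blocking and vectorization the abstract describes. An explicit scheme like this is stable only for sufficiently small `dt`; the values above are arbitrary illustration choices.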
Hasan, Mehedi. "Coherent Optical & Electro-Optical Signal Processor Circuit Architectures for Photonic Integration". Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/41580.
Patel, Dipesh Ishwerbhai. "Architectural considerations for a control system processor". Thesis, Loughborough University, 1996. https://dspace.lboro.ac.uk/2134/11075.
Chen, Hua. "FPGA Based Multi-core Architectures for Deep Learning Networks". University of Dayton / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1449417091.
Grudnitsky, Artjom [Verfasser], and J. [Akademischer Betreuer] Henkel. "A Reconfigurable Processor for Heterogeneous Multi-Core Architectures / Artjom Grudnitsky ; Betreuer: J. Henkel". Karlsruhe : KIT-Bibliothek, 2015. http://d-nb.info/1120498201/34.
Pang, Yihan. "Leveraging Processor-diversity For Improved Performance In Heterogeneous-ISA Systems". Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/95299.
Master of Science
The author of this thesis has a family full of non-engineers. To persuade family members that the work of this thesis is meaningful, aka the author is not procrastinating in school, the author decided to draw an analogy between processors and cars. Suppose that in an alternative universe, cars (systems) can be powered by engines (processors) that use two different fuel sources (ISAs): gasoline or electric (single-ISA) processors but not both (heterogeneous-ISA). Car manufacturers (chip designers) can build engines with different design choices (processors with varying design options): engines combined with turbochargers for gasoline-powered cars, high-performance batteries combined with energy-efficient batteries for electric-powered cars (added extended instruction sets, CPU designs that target vastly different use cases, etc.). However, each design choice is limited to improving performance for engines based on one specific fuel source. For example, having battery alternatives has no performance impact on gasoline-powered engines. As time passes, car manufacturers have exhausted their options for making drastic improvements to their existing engine designs (limited performance gains in recent chips). To tackle this problem, in this thesis, the author first examined the usage of cars: driving on the road (running applications). The author's study found that no single engine is suitable for all routes (no single processor is good for all workloads), and cars powered by engines based on different fuel sources showed a significant diversity in performance (application performance varies drastically between systems with processors built on different ISAs). Gasoline-powered cars perform well on high-speed roads, whereas electric-powered cars perform well on low-speed roads.
Unfortunately, in real life, a person's commute (a workload of applications) consists of a mixture of high-speed roads and low-speed roads, and one cannot know the exact percentage of each kind of path they travel (exact application composition in a workload) beforehand. Therefore it is challenging for a person to make the correct car selection for the incoming commute (choose the right system for a workload). This thesis tries to solve this commuting problem by building a car that has multiple engines fitted to suit different road needs (systems with processors that have vastly different use cases). This thesis looks at a particular dimension of combining various fuel-powered engines in the same car (a system with heterogeneous-ISA processors). The author believes that adding diversity in fuel-powered engine selections provides an exciting dimension in car design choices (adding ISA-heterogeneity in processors provides a unique dimension in system design). Thus, this thesis focuses on estimating a theoretical multi fuel-powered car's performance by combining two different fuel-powered cars into a single mega-car using some framework (Popcorn Linux). This framework allows this mega-car to be driven by a combined fuel source with fuel intake freely transferring between fuel sources (cross-ISA migration and execution) based on road conditions (application encountered). Based on the evaluation of this new prototype, the author finds that in a real-life scenario (workload with mixed application combination), cars with multiple fuel-source based engines have better performance than two single fuel-source based cars (systems with heterogeneous-ISA processors perform better than systems with homogeneous-ISA processors).
The author hopes that this study can help build the foundation for the development of hybrid cars (systems with heterogeneous ISAs in the same processor) in the future, as well as for the consideration, for now, of modifying an existing car into a mega-car with multiple engines suited to different road needs for improved commute performance. Ultimately, this thesis is not about cars. The author hopes that by explaining the research done in this thesis through cars, general audiences can understand what this work is trying to investigate and what solution it provides. In this work, we investigate the potential of a system with heterogeneous-ISA processors. This thesis prototypes one such system and finds, over a series of experimental evaluations, that heterogeneous-ISA systems perform better than traditional homogeneous-ISA systems.
Canal, Corretger Ramon. "Power- and Performance - Aware Architectures". Doctoral thesis, Universitat Politècnica de Catalunya, 2004. http://hdl.handle.net/10803/5984.
In recent years portability has become important. Historically, portable applications were characterized by low throughput requirements such as for a wristwatch. This is no longer true.
Among the new portable applications are hand-held multimedia terminals with video display and capture, audio reproduction and capture, voice recognition, and handwriting recognition capabilities. These capabilities call for a tremendous amount of computational capacity. This computational capacity has to be realized with very low power requirements in order for the battery to have a satisfactory life span. This thesis is an attempt to provide microarchitecture and compiler techniques for low-power chips with high-computational capacity.
The first part of this work presents some schemes for reducing the complexity of the issue logic. The issue logic has become one of the main sources of energy consumption in recent years. The inherent associative look-up and the size of the structures (crucial for exploiting ILP), have led the issue logic to a significant energy budget. The techniques presented in this work eliminate or reduce the associative logic by determining producer-consumer relationships between the instructions or by scheduling the instructions according to the latency of the operations.
An important effort has been deployed to reduce the energy requirements and the power dissipation through novel mechanisms based on value compression. As a result, the second part of this thesis introduces several ultra-low power and high-end processor designs. First, the design space for ultra-low power processors is explored. Several designs are developed (at the architectural level) from scratch that exploit value compression at all levels of the data-path.
Second, value compression for high-performance processors is proposed and evaluated. At the end of this thesis, two compile-time techniques are presented that show how the compiler can help in reducing the energy consumption. By means of a static analysis of the program code or through profiling, the compiler is able to know the size of the operands involved in the computation. Through these analyses, the compiler is able to use narrower operations (i.e. a 64-bit addition can be converted to an 8-bit addition due to the information of the size of the operands).
Overall, this thesis comprises the detailed study of one of the most power-hungry units in a processor (the issue logic) and the use of value compression (through hardware and software) as a means to reduce the energy consumption in all the stages of the pipeline.
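The operand-narrowing idea in this abstract can be made concrete with a small Python sketch. It is not the thesis's analysis: it assumes unsigned operands whose upper bounds are already known (for example from a static range analysis or a profile), and both the function name and the list of available widths are hypothetical.

```python
def narrowest_add_width(a_max, b_max, widths=(8, 16, 32, 64)):
    """Smallest width (in bits) whose unsigned range holds a + b.

    a_max and b_max are upper bounds on two non-negative operands.
    """
    needed_bits = max(1, (a_max + b_max).bit_length())
    for width in widths:
        if needed_bits <= width:
            return width
    raise ValueError("sum does not fit in the widest available operation")

# Operands known to stay below 101 can be added with an 8-bit operation,
# because their sum (at most 200) fits in 8 unsigned bits.
print(narrowest_add_width(100, 100))
```

A real compiler pass would also have to handle signed values, overflow semantics, and the cost of any width conversions, none of which this sketch covers.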
Omundsen, Daniel (Daniel Simon) Carleton University Dissertation Engineering Electrical. "A pipelined, multi-processor architecture for a connectionless server for broadband ISDN". Ottawa, 1992.
Selva, Manuel. "Performance monitoring of throughput constrained dataflow programs executed on shared-memory multi-core architectures". Thesis, Lyon, INSA, 2015. http://www.theses.fr/2015ISAL0055/document.
Because of physical limits, hardware designers have switched to parallel systems to exploit the still growing number of transistors per square millimeter of silicon. These parallel systems are made of several independent computing units. To benefit from these computing units, software must be changed. Existing sequential applications have to be split into independent tasks to be executed in parallel on the different computing units. To that end, many concurrent programming models have been proposed and are in use today. We focus in this thesis on the dataflow concurrent programming model. This work is about performance evaluation of dataflow programs on multicore architectures. We propose to extend dataflow programming models with the notion of throughput constraints and to take this information into account in the compilation tool chain to detect throughput bottlenecks at runtime. The profiling results gathered during the execution are used both for off-line analyses and to adapt the application during its execution. In the former case, the developer uses this information to know which part of the dataflow program should be optimized and to efficiently distribute the program on the computing units. In the latter case, the profiling information is used by runtime adaptation mechanisms to distribute the work differently on the computing units. We give a particular focus to profiling the usage of the memory subsystem. The data exchange information provided by the programming model makes it possible to use the memory subsystem of multicore architectures efficiently. Nevertheless, the complexity of modern memory systems does not allow the impact of memory accesses on the global performance of the application to be evaluated statically. We propose to set up memory profiling dedicated to dataflow applications based on hardware profiling mechanisms.
Shivashankar, Nithin. "Design and Analysis of Modular Architectures for an RNS to Mixed Radix Conversion Multi-processor". University of Cincinnati / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1396531505.
Fronte, Daniele. "Design and development of a reconfigurable cryptographic co-processor". Phd thesis, Université de Provence - Aix-Marseille I, 2008. http://tel.archives-ouvertes.fr/tel-00364723.
The architecture of Celator is based on a systolic array of 4x4 Processing Elements, called the PE array, driven by a Controller built from a Finite State Machine (FSM) and a local memory.
This thesis presents the architecture of Celator, as well as the basic operations required for it to execute AES, DES and SHA. The performance of Celator is also presented and compared with that of other secure circuits.
Vangal, Sriram R. "Performance and Energy Efficient Building Blocks for Network-on-Chip Architectures". Licentiate thesis, Linköping : Linköpings universitet, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-7845.
Kennedy, Matthew D. "Power-Efficient Nanophotonic Architectures for Intra- and Inter-Chip Communication". Ohio University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1458232838.
Bhide, Kanchan P. "DESIGN ENHANCEMENT AND INTEGRATION OF A PROCESSOR-MEMORY INTERCONNECT NETWORK INTO A SINGLE-CHIP MULTIPROCESSOR ARCHITECTURE". UKnowledge, 2004. http://uknowledge.uky.edu/gradschool_theses/253.
Hebert, Nicolas. "Stratégie de fiabilisation au niveau système des architectures MPSoC". Thesis, Montpellier 2, 2011. http://www.theses.fr/2011MON20069/document.
This thesis is placed in a context where, for each technology node, integrated circuits are designed at an earlier stage in the qualification process and where CMOS technology appears to be closer to the physical limitations of silicon. Despite technological countermeasures, we face an increase in the failure rate, which creates conditions in favor of the return of fault-tolerant techniques for non-critical integrated circuits. Nowadays, we have reached such an integration density that we can consider reconfigurable processor arrays as future SoC architectures. Indeed, these homogeneous architectures suggest possible platform reconfigurations that would ensure quality of service and consequently a minimum reliability in the presence of defects. Thus, new protection solutions must be proposed to ensure smooth circuit operation not only for sub-critical functionalities but at the system architecture level itself. Based on these prerogatives, we present an innovative dynamic and distributed protection method, named D-Scale. This method consists in detecting, isolating and recovering the system in the presence of errors that lead to a "crash" of the platform. The crash error detection is based on specific heartbeat messages exchanged between PEs. The recovery phase is based on an autonomous mechanism which reconfigures the platform. A hardware/software implementation was proposed and evaluated. The protection cost is reduced in order to be integrated within future multi-processor SoC architectures. Finally, a fault effect analysis tool is studied in order to validate the robustness of the fault-tolerant method.
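Heartbeat-based crash detection of the kind mentioned in this abstract can be sketched in a few lines of Python. This is a generic timeout monitor, not the D-Scale protocol, whose message format and distributed recovery mechanism are not described here. The single shared table of timestamps is a simplifying assumption, and all names are hypothetical.

```python
def find_crashed_pes(last_heartbeat, now, timeout):
    """Return the IDs of PEs whose last heartbeat is older than `timeout`.

    last_heartbeat maps a PE ID to the time its latest heartbeat arrived.
    """
    return sorted(pe for pe, seen in last_heartbeat.items()
                  if now - seen > timeout)

def isolate(active_pes, crashed):
    """Reconfigure by dropping crashed PEs from the active set."""
    crashed_set = set(crashed)
    return [pe for pe in active_pes if pe not in crashed_set]

# PE 1 last reported at t=4.0; at t=10.0 with a 2.0 timeout it is
# considered crashed, while PEs 0 and 2 are still alive.
heartbeats = {0: 9.5, 1: 4.0, 2: 9.9}
crashed = find_crashed_pes(heartbeats, now=10.0, timeout=2.0)
remaining = isolate([0, 1, 2], crashed)
print(crashed, remaining)
```

In a distributed variant each PE would keep such a table only for its neighbours; the timeout then trades detection latency against false positives caused by delayed messages.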
Quinto, Michele Arcangelo. "Méthode de reconstruction adaptive en tomographie par rayons X : optimisation sur architectures parallèles de type GPU". Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENT109/document.
Tomography reconstruction from projection data is an inverse problem widely used in the medical imaging field. With a sufficiently large number of projections over the required angle, the FBP (filtered backprojection) algorithms allow fast and accurate reconstructions. However, in the cases of limited views (low-dose imaging) and/or limited angle (specific constraints of the setup), the data available for inversion are not complete, the problem becomes more ill-conditioned, and the results show significant artifacts. In these situations, an alternative approach to reconstruction, based on a discrete model of the problem, consists in using an iterative algorithm or a statistical model of the problem to compute an estimate of the unknown object. These methods are classically based on a volume discretization into a set of voxels and provide 3D maps of densities. Computation time and memory storage are their main disadvantages. Moreover, whatever the application, the volumes are segmented for a quantitative analysis. Numerous methods of segmentation with different interpretations of the contours and various minimized energy functionals are offered, and the results can depend on their use. This thesis presents a novel approach to tomographic reconstruction performed simultaneously with segmentation of the different materials of the object. The process of reconstruction is no longer based on a regular grid of pixels (resp. voxels) but on a mesh composed of non-regular triangles (resp. tetrahedra) adapted to the shape of the studied object. After an initialization step, the method runs in three main steps: reconstruction, segmentation and adaptation of the mesh, which alternate iteratively until convergence. Iterative reconstruction algorithms used in a conventional way have been adapted and optimized to be performed on irregular grids of triangular or tetrahedral elements.
For segmentation, two methods, one based on a parametric approach (snake) and the other on a geometric approach (level set), have been implemented to consider mono- and multi-material objects. The adaptation of the mesh to the content of the estimated image is based on the previously segmented contours, which makes the mesh progressively coarser from the edges to the limits of the domain of reconstruction. At the end of the process, the result is a classical tomographic image in gray levels, but whose representation by a mesh adapted to its content provides a corresponding segmentation. The results show that the method provides reliable reconstruction and drastically decreases the memory storage. In this context, the projection operators have been implemented on a parallel architecture, the GPU. A first 2D version shows the feasibility of the full process, and an optimized version of the 3D operators provides more efficient computations.
Daneshbeh, Amir. "Bit Serial Systolic Architectures for Multiplicative Inversion and Division over GF(2m)". Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/776.
Beier, Felix [Verfasser], Kai-Uwe [Akademischer Betreuer] Sattler, Wolfgang [Gutachter] Lehner and Götz [Gutachter] Graefe. "Generalized database index structures on massively parallel processor architectures / Felix Beier ; Gutachter: Wolfgang Lehner, Götz Graefe ; Betreuer: Kai-Uwe Sattler". Ilmenau : TU Ilmenau, 2019. http://d-nb.info/1194062261/34.
Laurino, Luiz Sequeira. "Reuso especulativo de traços com instruções de acesso à memória". reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2007. http://hdl.handle.net/10183/14741.
Even with growing efforts to detect and handle redundant instructions, true dependencies remain one of the bottlenecks of computation. Value reuse and value prediction techniques have been studied as a way to address this issue. Following this approach, RST (Reuse through Speculation on Traces) combines both mechanisms and has achieved good performance improvements for superscalar processors. However, the original RST mechanism does not consider load/store instructions as reuse candidates. Our work therefore presents a new value reuse and value prediction technique named RSTm (Reuse through Speculation on Traces with Memory), which extends RST and adds memory-access instructions to the reuse domain of the architecture. Among all the solutions studied, we chose to use a dedicated table (Memo_Table_L) to handle the load/store instructions. This solution guarantees low hardware overhead, does not limit the number of memory-access instructions that can be stored for each trace, and stores both the address and its value. In our experiments, performed with SPEC2000 integer and floating-point benchmarks, RSTm achieves average performance improvements (harmonic means) of 2.97% over the original RST and 17.42% over the baseline architecture. These improvements have several causes: RSTm builds larger traces (7.75 per trace on average, versus 3.17 for the original RST), and although its reuse rate of around 10.88% is lower than RST's 15.23%, the higher latency of the instructions in RSTm traces compensates for the smaller reuse rate.
Skaf, Ali. "Conception de processeurs arithmétiques redondants et en-ligne : algorithmes, architectures et implantations VLSI". Grenoble INPG, 1995. http://www.theses.fr/1995INPG0108.
Dupros, Fabrice. "Contribution à la modélisation numérique de la propagation des ondes sismiques sur architectures multicœurs et hiérarchiques". Thesis, Bordeaux 1, 2010. http://www.theses.fr/2010BOR14147/document.
One major goal of strong motion seismology is the estimation of damage in future earthquake scenarios. Simulation of large-scale seismic wave propagation is of great importance for efficient strong motion analysis and risk mitigation. Being particularly CPU-intensive, this three-dimensional problem relies on high-performance computing technologies to make realistic simulation feasible on a regional scale at relatively high frequencies. Several evolutions at the chip level have an important impact on the performance of classical implementations of seismic applications. The trend in parallel computing is to increase the number of cores available at the shared-memory level, with a possibly non-uniform cost of memory accesses. The increasing number of cores per processor and the effort made to overcome the limitations of classical symmetric multiprocessor (SMP) systems make a growing number of NUMA (Non-Uniform Memory Access) architectures available as computing nodes. We therefore need to consider new approaches better suited to such parallel systems. This PhD work addresses both the algorithmic issues and the integration of efficient programming models for multicore architectures. The proposed contributions are validated with two large-scale examples. The first case is the modeling of the 2007 Niigata-Chuetsu, Japan earthquake based on the finite difference numerical method. The second example considers a potential seismic event in the Nice sedimentary basin on the French Riviera. The finite element method is used and the nonlinear soil behavior is taken into account.
Grad, Mariusz [Verfasser], and Marco [Akademischer Betreuer] Platzner. "Just-in-time processor customization on the feasibility and limitations of FPGA-based dynamically reconfigurable instruction set architectures / Mariusz Grad. Betreuer: Marco Platzner". Paderborn : Universitätsbibliothek, 2011. http://d-nb.info/1036423565/34.
Delespierre, Tiba. "Etude de cas sur architectures à mémoires distribuées : une maquette systolique programmable et l'hypercube d'Intel". Paris 9, 1987. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1987PA090073.
Matherat, Philippe. "Contribution à l'augmentation de puissance des architectures de visus graphiques". Phd thesis, Université Pierre et Marie Curie - Paris VI, 1988. http://tel.archives-ouvertes.fr/tel-00172858.
Bringer, Yves. "Performances de nouvelles architectures machines pour la mise en oeuvre d'algorithmes de traitement et d'analyse d'image". Saint-Etienne, 1993. http://www.theses.fr/1993STET4024.
Senni, Sophiane. "Exploration of non-volatile magnetic memory for processor architecture". Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS264/document.
With the downscaling of complementary metal-oxide semiconductor (CMOS) technology, designing dense and energy-efficient systems-on-chip (SoC) is becoming a real challenge. Concerning density, reducing the CMOS transistor size runs into manufacturing constraints while the cost increases exponentially. Regarding energy, a significant increase in power density and dissipation obstructs further improvement in performance. This issue is mainly due to the growth of the leakage current of CMOS transistors, which leads to an increase in static energy consumption. In current SoCs, more and more area is occupied by embedded volatile memories, such as static random access memory (SRAM) and dynamic random access memory (DRAM). As a result, a significant proportion of total power is spent in memory systems. In the past two decades, alternative memory technologies have emerged with attractive characteristics to mitigate the aforementioned issues. Among these technologies, magnetic random access memory (MRAM) is a promising candidate, as it combines high density and very low static power consumption with performance competitive with SRAM and DRAM. Moreover, MRAM is non-volatile. This capability, if present in embedded memories, has the potential to add new features to SoCs that enhance energy efficiency and reliability. In this thesis, an area, performance and energy exploration of embedding MRAM technology in the memory hierarchy of a processor architecture is carried out. A first fine-grain exploration was made at cache level for multi-core architectures. A second study evaluated the possibility of designing a non-volatile processor integrating MRAM at register level. Within the context of the internet of things, new features and the benefits brought by non-volatility were investigated.
Sato, Toshinori. "History Directed Processor Architecture". Kyoto University, 1999. http://hdl.handle.net/2433/182380.
Carli, Roberto. "Flexible MIPS soft processor architecture". Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/45809.
Includes bibliographical references (p. 48-49).
The flexible MIPS soft processor architecture borrows selected technologies from high-performance computing to deliver a modular, highly customizable CPU targeted towards FPGA implementations for embedded systems; the objective is to provide a more flexible architectural alternative to coprocessor-based solutions. The processor performs out-of-order execution on parallel functional units, delivers in-order instruction commit, and is compatible with the MIPS-1 Instruction Set Architecture. Amongst many available options, the user can introduce custom instructions and matching functional units; modify existing units; change the pipelining depth within functional units to any fixed or variable value; customize instruction definitions in terms of operands, control signals and register file interaction; and insert multiple redundant functional units for improved performance. The flexibility provided by the architecture allows the user to expand the processor functionality to implement instructions of coprocessor-level complexity through additional functional units. The processor design was implemented and simulated on two FPGA platforms, tested on multiple applications, and compared to three commercially available soft processor solutions in terms of features, area, clock frequency and benchmark performance.
by Robert Carli.
M.Eng.
Costa, Celsio Maciel da. "Environnement d'éxécution parallèle : conception et architecture". Grenoble 1, 1993. http://tel.archives-ouvertes.fr/tel-00005132.
We propose an implementation of this model as the systematic realization of a client/server architecture. This implementation was carried out on a Supernode machine. Its basis is a Parallel Micro-Kernel, whose main component is a minimal remote procedure call mechanism.
Martin, Rovira Julia, and Fructoso Melero Francisco Manuel. "Micro-Network Processor : A Processor Architecture for Implementing NoC Routers". Thesis, Jönköping University, JTH, Computer and Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-941.
Routers are probably the most important component of a NoC, as the performance of the whole network is driven by the routers' performance. The area cost of the whole network is also minimised if the router design is kept small. This master thesis proposes a new application-specific processor architecture for implementing NoC routers, called µNP (Micro-Network Processor). The aim is to offer a trade-off between the high performance of routers implemented in hardware and the high level of flexibility that could be achieved by loading packet-routing software onto a GPP. Therefore, a study including the design of a hardware-based router and a GPP-based router has been conducted. In this project the first version of the µNP has been designed, and a complete instruction set, along with some sample programs, is also proposed. The results show that, in the best case for all implementation options, µNP was 7.5 times slower than the hardware-based router. It was also more than 100 times faster than the GPP-based router, while keeping almost the same degree of flexibility for routing purposes within a NoC.
Chow, K. W. "Multi-processor architecture for machine vision". Thesis, Cardiff University, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.358531.
Campanella, William C. "The nature of the problem statement in architectural programming : a critical analysis of three programming processes". Thesis, Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/23156.
Yang, Chih-Chyau, and 楊智喬. "The Study on Media Processor Architectures". Thesis, 1999. http://ndltd.ncl.edu.tw/handle/73880867274699646454.
National Chiao Tung University
Department of Electronics Engineering
87
With the rapid evolution of multimedia technology, multimedia applications have come to have a great influence on our daily life. A media processor is a good solution for these applications. In this thesis, a media processor is presented for multimedia data processing. It contains an ARM-like main processor and a SIMD Vector Coprocessor (AM-SVC). The ARM-like main processor, which acts as a system controller, is instruction-compatible with ARM7. The SIMD Vector Coprocessor consists of one multiplier unit, one arithmetic unit, two load/store units, and two scalar units. By combining a DMA-like controller, a splittable design of the multiplier and arithmetic units, separate load/store units, scalar units and a concurrent control unit, our media processor is able to relieve the main processor's burden, exploit high data parallelism, and achieve out-of-order execution with in-order completion. A multithreaded architecture is also adopted in the coprocessor to exploit thread-level parallelism. All the modules in the design except the memory and register files are coded in synthesizable RTL Verilog HDL. Simulation results for AM-SVC are also given in this thesis.