Dissertations / Theses on the topic 'Processor Architectures'

To see the other types of publications on this topic, follow the link: Processor Architectures.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Processor Architectures.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Sherwood, Timothy. "Application-tuned processor architectures /." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2003. http://wwwlib.umi.com/cr/ucsd/fullcit?p3090450.

Full text
2

Killeen, Timothy F. "Improving processor utilization in multiple context processor architectures." Ohio : Ohio University, 1997. http://www.ohiolink.edu/etd/view.cgi?ohiou1174618393.

Full text
3

Tune, Eric. "Critical-path aware processor architectures /." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2004. http://wwwlib.umi.com/cr/ucsd/fullcit?p3153686.

Full text
4

Commissariat, Hormazd P. "Performance Modeling of Single Processor and Multi-Processor Computer Architectures." Thesis, Virginia Tech, 1995. http://hdl.handle.net/10919/31377.

Full text
Abstract:
Determining the optimum computer architecture configuration for a specific application or a generic algorithm is a difficult task. The complexity of today's computer architectures and systems makes it difficult and expensive to implement and test fully functional prototypes of computer architectures. High-level VHDL performance modeling of architectures is an efficient way to rapidly prototype and evaluate computer architectures. Once the architecture configuration is fixed, one would like to know the tolerance and expected performance of individual/critical components, and also the best way to map the software tasks onto the processor(s). Trade-offs and engineering compromises can be analyzed, and the effects of certain component failures and communication bottlenecks can be studied. A part of the research work done for the RASSP (Rapid Prototyping of Application Specific Signal Processors) project, funded by Department of Defense contracts, is documented in this thesis. The architectures modeled include a single-processor, single-global-bus system; a four-processor, single-global-bus system; a four-processor, multiple-local-bus, single-global-bus system; and finally, a four-processor, multiple-local-bus system interconnected by a crossbar switch. The hardware models used are mostly legacy models inherited from an earlier project; they were upgraded, modified and customized to suit the current research needs and requirements. The software tasks that run on the processors are pieces of the signal and image processing algorithm for Synthetic Aperture Radar (SAR). Communication between components/devices is achieved in the form of tokens, which are record structures. The output is a trace file which tracks the passage of the tokens through the various components of the architecture. The trace file is post-processed to obtain activity plots and latency plots for individual components of the architecture.
Master of Science
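
The token-based performance modelling described above lends itself to a compact discrete-event sketch. The following Python illustration (hypothetical component names and service times, not the thesis's VHDL models) traces tokens through a processor, a global bus and a memory, producing the kind of trace that can be post-processed into latency plots:

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Event:
    time: float
    token_id: int = field(compare=False)
    component: str = field(compare=False)

def simulate(arrivals, route, service_time):
    """Trace tokens through a chain of components; each component
    serializes tokens (a crude stand-in for bus contention)."""
    trace = []                               # (time, token, component)
    free_at = {c: 0.0 for c in route}
    events = [Event(t0, i, route[0]) for i, t0 in enumerate(arrivals)]
    heapq.heapify(events)
    while events:
        ev = heapq.heappop(events)
        start = max(ev.time, free_at[ev.component])
        done = start + service_time[ev.component]
        free_at[ev.component] = done
        trace.append((done, ev.token_id, ev.component))
        nxt = route.index(ev.component) + 1
        if nxt < len(route):
            heapq.heappush(events, Event(done, ev.token_id, route[nxt]))
    return trace

# Four tokens injected at t=0..3 through processor -> global bus -> memory.
for record in simulate([0, 1, 2, 3], ["proc", "gbus", "mem"],
                       {"proc": 2.0, "gbus": 0.5, "mem": 1.0}):
    print(record)
```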
5

Al-Khayatt, Samir S. "Functional partitioning of multi-processor architectures." Thesis, Loughborough University, 1990. https://dspace.lboro.ac.uk/2134/32337.

Full text
Abstract:
Many real-time computations such as process control and robotic applications may be naturally distributed in a functional manner. One way of ensuring good performance, reliability and security of operation is to map or distribute such tasks onto a distributed, multi-processor system. The time-critical task is thus functionally partitioned into a set of cooperating sub-tasks. These sub-tasks run concurrently and asynchronously on different nodes (stations) of the system. The software design and support of such a functional distribution of sub-tasks (processes) depends on the degree of interaction of these processes among the different nodes.
6

Seng, John. "Optimizing processor architectures for power-efficiency /." Diss., Connect to a 24 p. preview or request complete full text in PDF format. Access restricted to UC campuses, 2003. http://wwwlib.umi.com/cr/ucsd/fullcit?p3091334.

Full text
7

Shnidman, Nathan R. (Nathan Robert). "Multipass communication systems for tiled processor architectures." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/36137.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
Includes bibliographical references (p. 191-202).
Multipass communication systems utilize multiple sets of parallel baseband receiver functions to balance communication data rates against available computation capabilities. This is achieved by spatially pipelining baseband functions across parallel resources to perform multiple processing passes on the same set of received values, thus allowing the system to simultaneously convey multiple sequences of data over a single wireless link. The use of multiple passes mitigates the effect of data rate on receiver processing bottlenecks, making the use of general-purpose processing elements for high-data-rate communication functions viable. The flexibility of general-purpose processing, in turn, allows the receiver composition to trade off resource usage against required processing rate. For instance, a communication system could be distributed across 2 passes using 2x the overall area, while reducing the data rate for each pass, and hence the overall required processing rate and clock speed, by 1/2. Lowering the clock speed can also be leveraged to reduce power through voltage scaling and/or the use of higher-Vt devices. The characteristics of general-purpose parallel processors for communications processing are explored, as well as the applicability of specific parallel designs to communications processing.

In particular, an in-depth look is taken at the Raw processor's tiled architecture as a general-purpose parallel processor particularly well suited to portable communications processing. An example of a multipass system, based on the 802.11a baseband and implemented on the Raw processor along with the accompanying hardware, is presented both as a proof of concept and as a means to explore some of the advantages and trade-offs of such a system. A bit-error-rate study is presented which shows this multipass system to be within a small fraction of a dB of the performance of an equivalent-data-rate single-pass system, demonstrating the viability of the multipass algorithm. In addition, the capability of tiled processors to maximize processing capabilities at the system block level, as well as at the system architecture level, is shown. Parallel implementations of two processing-intensive functions, the FFT and the Viterbi decoder, are presented: a parallelized assembly-language FFT utilizing 16 tiles is shown to have a 1,000x improvement, and a parallelized 48-tile assembly-language Viterbi decoder a 10,000x improvement, over corresponding serial C implementations.
by Nathan Robert Shnidman.
Ph.D.
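
A back-of-the-envelope check of the multipass trade-off quoted above, with illustrative numbers rather than figures from the thesis:

```python
# Multipass trade-off sketch: P passes each see 1/P of the link data
# rate, so the per-pass processing rate (and hence clock) can drop by
# ~1/P at the cost of ~P times the area. Numbers are illustrative only.
link_rate_mbps = 54.0            # e.g. an 802.11a-class link
for passes in (1, 2, 4):
    per_pass_rate = link_rate_mbps / passes
    rel_area = passes            # P parallel sets of receiver functions
    rel_clock = 1.0 / passes     # required processing rate per pass
    print(f"{passes} pass(es): {per_pass_rate:5.1f} Mb/s per pass, "
          f"area x{rel_area}, clock x{rel_clock:.2f}")
```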
8

Trilla, Rodríguez David. "Non-functional considerations of time-randomized processor architectures." Doctoral thesis, Universitat Politècnica de Catalunya, 2020. http://hdl.handle.net/10803/670903.

Full text
Abstract:
Critical Real-Time Embedded Systems (CRTES) are the subset of embedded systems with timing constraints whose violation can endanger human lives or expensive equipment. To provide evidence of correctness, CRTES are designed, implemented and deployed in adherence to safety standards and certification regulations. To that end, CRTES follow strict Validation & Verification (V&V) procedures for their functional and non-functional properties. One of the most important non-functional properties is timing, which builds on computing the worst-case execution time of tasks and a schedule of tasks such that the overall timing behavior of the system is correct. However, the use of more complex hardware and software to satisfy the unprecedented performance requirements of CRTES heavily increases the cost of V&V. For timing V&V, statistical techniques like Measurement-Based Probabilistic Timing Analysis (MBPTA) help to address the complexity of hardware and software in CRTES. To that end, they benefit from randomization of temporal behavior at the hardware level. In this line, Time-Randomized Processors (TRP) contain timing V&V costs by breaking systematic pathological behaviors and enabling MBPTA applicability. In the context of TRP, this thesis shows that hardware and software designs incorporating randomization can not only successfully tackle the existing timing-analysis problem, but also provide helpful properties for other emerging non-functional metrics that are key in CRTES, such as reliability, security and energy. For reliability, we show that TRP are naturally resilient against hardware aging effects and voltage noise, and we add to such resilience by improving their design. Also, TRP hinder security threats and intrusions by breaking and mangling the deterministic association between memory mapping and access time, and we develop a framework for secure automotive operation. Finally, for energy, we introduce a taxonomy to guide the future challenges of worst-case energy estimation and take the first steps towards the use of an MBPTA-like methodology to address worst-case energy estimation in the presence of process variation. Moreover, this thesis also shows that, together with the application of an MBPTA-like methodology, TRP naturally expose and break pathological energy-consumption patterns and help in validating and accounting for instantaneous peak power demands. In summary, this thesis pioneers several aspects of the use of TRP to address the emerging challenges that CRTES face in the reliability, security and energy domains.
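
To give a flavour of the MBPTA methodology referred to above, here is a deliberately crude sketch: execution times measured on a time-randomized platform are treated as samples, and a high empirical quantile is reported as a probabilistic WCET estimate. Real MBPTA additionally applies extreme value theory and statistical tests; the distribution below is invented for illustration:

```python
import random

def pwcet_estimate(samples, exceedance=1e-3):
    """Crude probabilistic WCET sketch: report the empirical
    (1 - exceedance) quantile of measured execution times."""
    s = sorted(samples)
    idx = min(len(s) - 1, int((1.0 - exceedance) * len(s)))
    return s[idx]

# Fake measurements from a time-randomized platform: a base cost plus
# random-placement cache noise (illustrative distribution only).
random.seed(1)
measurements = [1000 + sum(random.choice((0, 12)) for _ in range(40))
                for _ in range(10000)]
print("pWCET @ 10^-3 exceedance:", pwcet_estimate(measurements))
```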
9

Rebello, Vinod. "On the distribution of control in asynchronous processor architectures." Thesis, University of Edinburgh, 1997. http://hdl.handle.net/1842/507.

Full text
Abstract:
The effective performance of computer systems is to a large measure determined by the synergy between the processor architecture, the instruction set and the compiler. In the past, the sequencing of information within processor architectures has normally been synchronous: controlled centrally by a clock. However, this global signal could possibly limit the future gains in performance that can potentially be achieved through improvements in implementation technology. This thesis investigates the effects of relaxing this strict synchrony by distributing control within processor architectures through the use of a novel asynchronous design model known as a micronet. The impact of asynchronous control on the performance of a RISC-style processor is explored at different levels. Firstly, improvements in the performance of individual instructions by exploiting actual run-time behaviours are demonstrated. Secondly, it is shown that micronets are able to exploit further (both spatial and temporal) instruction-level parallelism (ILP) efficiently through the distribution of control to datapath resources. Finally, exposing fine-grain concurrency within a datapath can only be of benefit to a computer system if it can easily be exploited by the compiler. Although compilers for micronet-based asynchronous processors may be considered to be more complex than their synchronous counterparts, it is shown that the variable execution time of an instruction does not adversely affect the compiler's ability to schedule code efficiently. In conclusion, the modelling of a processor's datapath as a micronet permits the exploitation of both fine-grain ILP and actual run-time delays, thus leading to the efficient utilisation of functional units and in turn resulting in an improvement in overall system performance.
10

Petters, Stefan M. E. "Worst case execution time estimation for advanced processor architectures." [S.l. : s.n.], 2002. http://deposit.ddb.de/cgi-bin/dokserv?idn=965404110.

Full text
11

Kwak, Jae-hyuck. "High speed CORDIC processor designs : algorithms, architectures, and applications /." Digital version accessible at:, 2000. http://wwwlib.umi.com/cr/utexas/main.

Full text
12

Orlando, Gerardo. "Efficient elliptic curve processor architectures for field programmable logic." Link to electronic thesis, 2002. http://www.wpi.edu/Pubs/ETD/Available/etd-0327102-103635.

Full text
13

Whitham, Jack. "Real-time processor architectures for worst case execution time reduction." Thesis, University of York, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.479513.

Full text
14

Lee, Walter (Walter Cheng-Wan). "Software orchestration of instruction level parallelism on tiled processor architectures." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/33862.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005.
Includes bibliographical references (p. 135-138).
The projection from silicon technology is that, while the transistor budget will continue to blossom according to Moore's law, latency from global wires will severely limit the ability to scale centralized structures at high frequencies. A tiled processor architecture (TPA) eliminates long wires from its design by distributing its resources over a pipelined interconnect. By exposing the spatial distribution of these resources to the compiler, a TPA allows the compiler to optimize for locality, thus minimizing the distance that data needs to travel to reach the consuming computation. This thesis examines the compiler problem of exploiting instruction level parallelism (ILP) on a TPA. It describes Rawcc, an ILP compiler for Raw, a fully distributed TPA. The thesis examines the implications of the resource distribution on the exploitation of ILP for each of the following resources: instructions, registers, control, data memory, and wires. It designs novel solutions for each one, and it describes the solutions within the integrated framework of a working compiler. Performance is evaluated on a cycle-accurate Raw simulator as well as on a 16-tile Raw chip. Results show that Rawcc can attain modest speedups for fine-grained applications, as well as speedups that scale up to 64 tiles for applications with such parallelism.
by Walter Lee.
Ph.D.
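
The locality problem sketched in this abstract (assigning instructions to tiles so that operands travel short distances) can be illustrated with a toy greedy partitioner. This is a hypothetical sketch of the idea, not Rawcc's actual algorithm:

```python
# Toy spatial instruction placement: put each instruction on the tile
# that minimizes the Manhattan distance to its operands' tiles,
# breaking ties by tile load.
def place(dag, grid_w, grid_h):
    coords = {t: (t % grid_w, t // grid_w) for t in range(grid_w * grid_h)}
    placement, load = {}, {t: 0 for t in coords}

    def comm_cost(tile, deps):
        x, y = coords[tile]
        return sum(abs(x - coords[placement[d]][0]) +
                   abs(y - coords[placement[d]][1]) for d in deps)

    for instr, deps in dag:          # dag given in topological order
        tile = min(coords, key=lambda t: (comm_cost(t, deps), load[t]))
        placement[instr] = tile
        load[tile] += 1
    return placement

# c = a + b should land near both of its operands.
dag = [("a", []), ("b", []), ("c", ["a", "b"]), ("d", ["c"])]
print(place(dag, 4, 4))
```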
15

Gelhaar, B., K. Alvermann, and F. Dzaak. "A MULTICHANNEL DATA ACQUISITION SYSTEM BASED ON PARALLEL PROCESSOR ARCHITECTURES." International Foundation for Telemetering, 1992. http://hdl.handle.net/10150/608884.

Full text
Abstract:
International Telemetering Conference Proceedings / October 26-29, 1992 / Town and Country Hotel and Convention Center, San Diego, California
For research on helicopter rotor acoustics, a large data acquisition system called TEDAS (Transputer based Expandable Data Acquisition System) has been developed. The key features of this system are: unlimited expandability and aggregate data rate, local storage of data during operation, very simple analog anti-aliasing filtering thanks to extensive digital filtering, and integrated computational power which scales with the number of channels. The sample rate is up to 50 kHz per channel, the resolution is 16 bits, and 360 channels are currently realized. TEDAS consists of blocks of 8 A/D converters controlled by one T800 transputer; the size of the local memory is 4 Mbyte. Any number of blocks (IDAM = Intelligent Data Acquisition Module) can be combined into a complete system, and data preprocessing is done in parallel inside the IDAMs. Because, for 16-bit systems, analog anti-aliasing filtering becomes a dominant cost factor, delta-sigma ADCs with oversampling and internal digital filtering are used. This yields an exactly linear phase and a stop-band rejection of -90 dB.
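
A quick sanity check of the aggregate data rate implied by the figures above (plain arithmetic; the per-IDAM grouping is as described):

```python
# Aggregate data-rate arithmetic for the TEDAS figures quoted above.
channels     = 360
sample_rate  = 50_000        # Hz per channel (maximum)
sample_bytes = 2             # 16-bit resolution

total_rate = channels * sample_rate * sample_bytes
print(f"aggregate: {total_rate / 1e6:.1f} MB/s")          # 36.0 MB/s

# Each IDAM block serves 8 channels with 4 MB of local memory:
per_idam = 8 * sample_rate * sample_bytes
print(f"per IDAM: {per_idam / 1e6:.1f} MB/s -> 4 MB local "
      f"memory fills in {4e6 / per_idam:.1f} s")
```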
16

Hanen, Claire. "Problemes d'ordonnancement des architectures pipelines : modelisation, optimisation, algorithmes." Paris 6, 1987. http://www.theses.fr/1987PA066424.

Full text
Abstract:
Modeling and solving the problem of maximizing the throughput of microprogrammable pipelines computing vector loops. The difficulty lies in taking into account all the constraints induced by the nature of the computation and by the architecture of the pipeline. It is shown that, under certain hypotheses, the solution of a repetitive scheduling problem can be used to construct an optimal microprogram. Two algorithms are proposed to solve this problem.
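
The throughput of such a repetitive (cyclic) schedule is governed by its initiation interval. As a hedged illustration (a standard textbook bound, not the thesis's own algorithms), the classic lower bound combines resource and recurrence constraints:

```python
import math

def min_initiation_interval(uses, units, recurrences):
    """Classic lower bound on the initiation interval II of a
    repetitive schedule: the resource bound max_r ceil(uses_r/units_r)
    and the recurrence bound max over cycles of ceil(latency/distance)."""
    res_ii = max(math.ceil(uses[r] / units[r]) for r in uses)
    rec_ii = max(math.ceil(lat / dist) for lat, dist in recurrences)
    return max(res_ii, rec_ii)

# 6 adds on 2 adders, 4 multiplies on 1 multiplier, and a dependence
# cycle of total latency 5 spanning 2 iterations (illustrative values).
print(min_initiation_interval({"add": 6, "mul": 4},
                              {"add": 2, "mul": 1},
                              [(5, 2)]))   # max(3, 4, 3) = 4
```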
17

Ma, Nicholas. "Modeling and evaluation of multi-core multithreading processor architectures in SystemC." Thesis, Kingston, Ont. : [s.n.], 2007. http://hdl.handle.net/1974/510.

Full text
18

Ceder, Frederick. "Efficient Implementation of 3D Finite Difference Schemes on Recent Processor Architectures." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-170082.

Full text
Abstract:
In this thesis a solver is introduced that solves a problem set modelled by the Burgers equation using the finite difference method: forward in time and central in space (FTCS). The solver is parallelized and optimized for the Intel Xeon Phi 7120P as well as Intel Xeon E5-2699v3 processors in order to investigate performance differences between the two architectures. Optimized data access and layout have been implemented to ensure good cache utilization. Loop-tiling strategies are used to adjust data access with respect to the L2 cache size. Compiler hints describing aligned memory access are used to support vectorization on both processors. Additionally, prefetching strategies and streaming stores have been evaluated for the Intel Xeon Phi. Parallelization was done using OpenMP and MPI. The parallelization for native execution on the Xeon Phi is based on OpenMP and yielded a raw performance of nearly 100 GFLOP/s, reaching a speedup of almost 50 at 83% parallel efficiency. An OpenMP implementation on the E5-2699v3 (Haswell) processors produced up to 292 GFLOP/s, reaching a speedup of almost 31 at 85% parallel efficiency. For comparison, a mixed implementation that interleaves communication with computation reached 267 GFLOP/s at a speedup of 28 with 87% parallel efficiency. Running a pure MPI implementation on PDC's Beskow supercomputer with 16 nodes yielded a total performance of 1450 GFLOP/s, and for a larger problem set it yielded a total of 2325 GFLOP/s, reaching speedups and parallel efficiencies of 170 at 33.3% and 290 at 56%, respectively. An analysis based on the roofline performance model shows that the computations were memory-bound to the L2 cache bandwidth, suggesting good L2 cache utilization for both the Haswell and Xeon Phi architectures. Xeon Phi performance can probably be improved by also using MPI. Keeping in mind the technological progress of the computational cores in the Haswell processor, both processors perform well. Rewriting the stencil computations in a more compiler-friendly form might improve performance further, as the compiler could then optimize more for the target platform. The experiments on the Cray system Beskow showed an increase in efficiency from 33.3% to 56% for the larger problem, illustrating good weak scaling. This suggests that problem sizes should increase accordingly with larger numbers of nodes in order to achieve high efficiency.
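
The FTCS scheme named in this abstract is compact enough to sketch. Below is a minimal 1D version of the update for the viscous Burgers equation (the thesis treats the 3D case with tiling, vectorization and MPI; this numpy sketch shows only the numerical core, with illustrative parameters):

```python
import numpy as np

def ftcs_burgers_step(u, dt, dx, nu):
    """One FTCS step for the 1D viscous Burgers equation
    u_t + u*u_x = nu*u_xx with periodic boundaries: forward
    difference in time, central differences in space."""
    up, um = np.roll(u, -1), np.roll(u, 1)
    return u + dt * (-u * (up - um) / (2 * dx)
                     + nu * (up - 2 * u + um) / dx**2)

x = np.linspace(0, 2 * np.pi, 256, endpoint=False)
u = np.sin(x)
dx, nu = x[1] - x[0], 0.05
dt = 0.2 * dx**2 / nu            # small step for FTCS stability
for _ in range(100):
    u = ftcs_burgers_step(u, dt, dx, nu)
print("max |u| after 100 steps:", float(np.abs(u).max()))
```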
19

Hasan, Mehedi. "Coherent Optical & Electro-Optical Signal Processor Circuit Architectures for Photonic Integration." Thesis, Université d'Ottawa / University of Ottawa, 2020. http://hdl.handle.net/10393/41580.

Full text
Abstract:
The capacity of optical communications networks continues to grow unabated. Applications such as streaming video, social networking and cloud computing are driving exponential growth of the traffic carried over the world's ICT networks, which has been sustained thus far through the proliferation of datacenters and the efficient, effective use of existing optical fibre. Meeting increasing capacity demands requires increasingly sophisticated modulation formats and spectral management to make effective use of the spectrum provided by an optical fibre. Moreover, the technology developed for optical communications is finding broader application in other sectors such as data centres, 5G and 6G wireless, lidar and radar. Ultimately, some essential signal processing functions must occur at speeds beyond purely electronic means, even when accounting for anticipated technological development. The option is to perform signal processing in the optical domain. Optical signal processors are fundamentally analog and linear in nature. To provide high performance, an analogue processor must be well controlled, in a way analogous to the numerous and sophisticated controllers employed by the process industry. Consequently, a further extension of control to deeper levels within the physical layer, reaching the optical layer, will be necessary. For example, current reconfigurable optical add-drop multiplexers are coloured and directional, and the wavelength-division-multiplexing channel grid, transponder modulation formats and routing are all fixed. Through optimization of the interface between the physical components, sensors and processors, elastic optical network technology can be achieved by employing colour-less, direction-less, contention-less, grid-less, filter-less, gap-less reconfigurable optical add-drop multiplexers; flexible channel centre frequencies and widths; flexible sub-carriers in super-channels; flexible modulation formats and forward-error-control coding transponders; and impairment-aware wavelength routing and spectral assignment. The aim of this thesis is to advance the state of the art in photonic circuits and subsystems by proposing new architectures, studying the feasibility of photonic integration, and providing proof-of-concept implementations using available resources. The goal is to introduce new architectural concepts that make effective use of physical components and/or optical processors, with reduced energy consumption and footprint, and offer speed beyond all-electronic implementations. The thesis presents four case studies, each based on one or more published papers and supplementary material, that advance this goal. The first study presents a coherent electro-optic circuit architecture that generates N spatially distinct, phase-correlated, harmonically related carriers using a generalized Mach-Zehnder interferometer whose N×1 combiner is replaced by an N×N optical Discrete Fourier Transform. The architecture subsumes all Mach-Zehnder-interferometer-based architectures in the prior art given an appropriate selection of output port(s) and dimension N, although the principal application envisaged is phase-correlated subcarrier generation for next-generation optical transmission systems. The theoretical prediction is then verified experimentally using a laboratory-available photonic integrated circuit fabricated for other applications.

Later on, a novel extension of the circuit architecture is introduced by replacing the optical Discrete Fourier Transform network with the combination of a properly chosen phase shifter and a single MMI coupler. The second study proposes two novel architectures for an on-chip ultra-high-resolution panoramic spectrometer and presents their design, analysis, integration feasibility, and verification by simulation. The target application is to monitor the power of wavelength-division-multiplexed signals, in both fixed and flex grids, over the entire C-band with minimum scan time and better than 1 GHz frequency accuracy. The two architectures combine, in synchrony, a scanning comb-filter stage and a channelized coarse filter. The fine filtering is obtained using a ring resonator, while the coarse filtering is obtained using an arrayed waveguide grating with an appropriate configuration. The fully coherent first architecture is optimised for compactness but relies on a repeatable fabrication process to match the optical path lengths between a Mach-Zehnder interferometer and a multiple-input arrayed waveguide grating. The second architecture is less compact than the first but is robust to fabrication tolerances, as it does not require the path-length matching. The third study proposes a new circuit architecture for single-sideband modulation or frequency conversion which employs a cascaded Mach-Zehnder modulator architecture, departing from the orthodox dual-parallel solution. The theoretical analysis shows that the circuit has a 3-dB optical and a 3-dB electrical advantage over the orthodox solution. The 3-dB electrical advantage increases the linear operating range of the Mach-Zehnder modulator before RF amplifier saturation. An experimental verification of the proposed architecture is provided using an available photonic integrated circuit. The proposed circuit can also perform complex modulation; an alternative implementation based on polarization modulators is also described. The fourth study presents the theoretical modelling of the photonic generation of a broadband radio-frequency phase shifter. The proposed phase shifter can generate any phase without bound: the complex transmission of the phase shifter follows a trajectory that rotates on a unit circle and may encircle the origin any number of times in either direction, which has great utility in the tuning of RF-photonic systems. The proposed concept is then verified experimentally using off-the-shelf low-frequency electronic components.
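
The first study's combiner can be illustrated numerically: an N×N DFT combiner maps a linear phase ramp across the interferometer arms to a single output port. The following numpy sketch (an idealized, lossless model, not the fabricated circuit) shows the port-selection behaviour:

```python
import numpy as np

# Generalized MZI sketch: N arms carrying a linear phase ramp k feed
# an NxN discrete Fourier transform combiner. A ramp of k*2*pi/N
# routes the field to output port k, illustrating the separation of
# phase-correlated carriers described above.
N = 4
n = np.arange(N)
dft = np.exp(-2j * np.pi * np.outer(n, n) / N) / np.sqrt(N)

for k in range(N):
    arms = np.exp(2j * np.pi * k * n / N) / np.sqrt(N)   # arm fields
    power = np.abs(dft @ arms) ** 2
    print(f"ramp k={k}: port powers {np.round(power, 3)}")
```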
20

Patel, Dipesh Ishwerbhai. "Architectural considerations for a control system processor." Thesis, Loughborough University, 1996. https://dspace.lboro.ac.uk/2134/11075.

Full text
Abstract:
Modern design methodologies for control systems create controllers with dynamics of a similar order to the physical system being controlled. When these are implemented digitally as Infinite Impulse Response (IIR) filters, the processing requirements are extensive, in particular when high sample rates are necessary to minimise the detrimental effects of sample delay. The aim of the research was to apply signal processing techniques to facilitate the implementation of control algorithms in digital form, with the principal objective of maximising computational efficiency, either to achieve the highest possible sample rates using a given processor, or to minimise the processor complexity for a given requirement. One approach is to design a fixed-point processor whose architecture is optimised to meet the computational requirements of signal processing for control, thereby maximising what can be achieved with a single processor. Hence the aim of the research was to head towards a processor architecture optimised for Control System Processing. The design of this processor is based on a unified structural form, and it is shown that controllers, represented either in state-space form or as transfer functions, can be implemented using this unified structure. The structure is based on the σ-operator, which has been shown to be robust to changes in coefficients and hence to require shorter coefficient wordlengths to achieve performance comparable to traditional z-operator-based structures. Additionally, the σ-operator structures are shown to have lower wordlength requirements for the internal variables. Also presented is a possible architecture for a Control System Processor; a model of the processor is developed in VHDL and simulated on a test bench, also designed in VHDL. The results of implementing a phase-advance controller on the processor are then compared with those obtained from a MATLAB simulation.
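
The σ-operator structures mentioned above are related to the delta-operator family. As a hedged illustration (not the thesis's processor datapath) of why such operators tolerate shorter coefficient wordlengths at high sample rates, here is a first-order low-pass filter in shift (z) form and in delta form; the two agree in exact arithmetic:

```python
# At high sample rates the shift-form pole (1 - a) crowds z = 1, so
# small coefficient errors move it a lot; the delta form factors the
# sampling period T out of the coefficient, easing that sensitivity.
def lowpass_shift(u, tau, T):
    a = T / tau                      # pole at z = 1 - a, close to 1
    y, out = 0.0, []
    for uk in u:
        y = (1.0 - a) * y + a * uk   # y[k+1] = (1-a)*y[k] + a*u[k]
        out.append(y)
    return out

def lowpass_delta(u, tau, T):
    y, out = 0.0, []
    for uk in u:
        dy = (uk - y) / tau          # delta-operator increment
        y = y + T * dy               # y[k+1] = y[k] + T*delta
        out.append(y)
    return out

step = [1.0] * 10
print(lowpass_shift(step, tau=0.05, T=0.001)[:3])
print(lowpass_delta(step, tau=0.05, T=0.001)[:3])   # identical values
```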
21

Chen, Hua. "FPGA Based Multi-core Architectures for Deep Learning Networks." University of Dayton / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1449417091.

Full text
22

Grudnitsky, Artjom [Verfasser], and J. [Akademischer Betreuer] Henkel. "A Reconfigurable Processor for Heterogeneous Multi-Core Architectures / Artjom Grudnitsky ; Betreuer: J. Henkel." Karlsruhe : KIT-Bibliothek, 2015. http://d-nb.info/1120498201/34.

Full text
23

Pang, Yihan. "Leveraging Processor-diversity For Improved Performance In Heterogeneous-ISA Systems." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/95299.

Full text
Abstract:
The purpose of this thesis is to investigate the effectiveness of executing High Performance Computing (HPC) workloads on multiprocessors with heterogeneous Instruction Set Architecture (ISA) cores. ISA heterogeneity in processor designs provides a unique dimension for researchers to explore performance benefits through diversity in design choices. Additionally, each application has a natural preference for one processor in a selected group of processors (we define this term as processor-preference), and processor-preference is highly affected by processor design choices. Thus, a system with heterogeneous-ISA cores offers an intriguing design perspective: packing heterogeneous-ISA cores in the same processor or system so that they compensate for one another in dynamic workload scenarios. This thesis considers dynamically migrating applications with different processor-preferences across ISA-different cores to exploit the potential of this idea. With SIMD instructions getting more attention from chip designers, this thesis also presents the necessary modifications to a general compiler/run-time infrastructure to transform the dynamic program state of SIMD regions at run-time from one ISA format to another for cross-ISA migration and execution. Lastly, this thesis presents a processor-preference-aware scheduling policy that makes dynamic cross-ISA migration decisions that improve overall system throughput compared to homogeneous-ISA systems. This thesis prototypes a heterogeneous-ISA system using an Intel Xeon Gold 5118 x86-64 server and a Cavium ThunderX ARMv8 server, and evaluates the effectiveness of our infrastructure and scheduling policy. Our results reveal that heterogeneous-ISA systems that are processor-preference-aware and capable of cross-ISA execution migration can yield throughput gains of up to 36% compared to traditional homogeneous-ISA systems.
Master of Science
The author of this thesis has a family full of non-engineers. To persuade family members that the work of this thesis is meaningful (that is, that the author is not procrastinating in school), the author decided to draw an analogy between processors and cars. Suppose that, in an alternative universe, cars (systems) can be powered by engines (processors) that use one of two different fuel sources (ISAs), gasoline or electric (single-ISA), but not both (heterogeneous-ISA). Car manufacturers (chip designers) can build engines with different design choices (processors with varying design options): engines combined with turbochargers for gasoline-powered cars, or high-performance batteries combined with energy-efficient batteries for electric-powered cars (extended instruction sets, CPU designs that target vastly different use cases, etc.). However, each design choice is limited to improving performance for a specific type of fuel-source-based engine; for example, having battery alternatives has no performance impact on gasoline-powered engines. As time passes, car manufacturers exhaust their options for drastic improvements to existing engine designs (limited performance gains in recent chips). To tackle this problem, in this thesis the author first examined how cars are used: driving on the road (running applications). The author's study found that no single engine is suitable for all routes (no single processor is good for all workloads), and that cars powered by engines with different fuel sources show significant diversity in performance (application performance varies drastically between systems with processors built on different ISAs). Gasoline-powered cars perform well on high-speed roads, whereas electric-powered cars perform well on low-speed roads. Unfortunately, in real life a person's commute (a workload of applications) consists of a mixture of high-speed and low-speed roads, and one cannot know beforehand the exact percentage of each kind of road travelled (the exact application composition of a workload). It is therefore challenging to select the right car for the incoming commute (choose the right system for a workload). This thesis tries to solve the commuting problem by building a car with multiple engines fitted to different road needs (systems with processors that have vastly different use cases). It looks at a particular dimension of this idea: combining engines powered by different fuels in the same car (a system with heterogeneous-ISA processors). The author believes that diversity in engine selection provides an exciting dimension in car design choices (adding ISA heterogeneity to processors provides a unique dimension in system design). Thus, this thesis estimates the performance of a theoretical multi-fuel car by combining two cars with different fuel sources into a single mega-car using a software framework (Popcorn Linux). This framework allows the mega-car to be driven by a combined fuel source, with fuel intake transferring freely between fuel sources (cross-ISA migration and execution) based on road conditions (the applications encountered). Based on the evaluation of this new prototype, the author finds that in a real-life scenario (a workload with a mixed application combination), cars with engines based on multiple fuel sources perform better than two cars each based on a single fuel source (systems with heterogeneous-ISA processors perform better than systems with homogeneous-ISA processors).

The author hopes that this study can help build the foundation for the development of hybrid cars (systems with heterogeneous ISAs in the same processor) in the future, as well as for modifying existing cars into mega-cars with multiple engines suited to different road needs for improved commute performance today. Ultimately, this thesis is not about cars: the author hopes that by explaining the research through cars, a general audience can understand what this work investigates and what solution it provides. In this work, we investigate the potential of a system with heterogeneous-ISA processors. This thesis prototypes one such system and finds, through a series of experimental evaluations, that heterogeneous-ISA systems have performance benefits over traditional homogeneous-ISA systems.
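
A toy version of the processor-preference-aware placement described above; the policy, benchmark names and speedup numbers here are hypothetical, not the thesis's scheduler or measurements:

```python
# Toy preference-aware placement for a heterogeneous-ISA pair.
# speedup[a] = throughput of application a on the ARM node relative
# to the x86 node (made-up numbers). Greedily place each job on the
# node where it runs relatively fastest, subject to slot limits.
def schedule(jobs, speedup, slots):
    placement = {}
    # Most "opinionated" jobs first: strongest preference either way.
    for job in sorted(jobs, key=lambda j: abs(speedup[j] - 1.0),
                      reverse=True):
        prefer = "arm" if speedup[job] > 1.0 else "x86"
        other = "x86" if prefer == "arm" else "arm"
        node = prefer if slots[prefer] > 0 else other
        slots[node] -= 1
        placement[job] = node
    return placement

speedup = {"appA": 1.4, "appB": 0.6, "appC": 1.1, "appD": 0.5}
print(schedule(list(speedup), speedup, {"x86": 2, "arm": 2}))
```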
24

Canal, Corretger Ramon. "Power- and Performance - Aware Architectures." Doctoral thesis, Universitat Politècnica de Catalunya, 2004. http://hdl.handle.net/10803/5984.

Full text
Abstract:
The scaling of silicon technology has been ongoing for over forty years. We are on the way to commercializing devices having a minimum feature size of one-tenth of a micron. The push for miniaturization comes from the demand for higher functionality and higher performance at a lower cost. As a result, successively higher levels of integration have been driving up the power consumption of chips. Today, heat removal and power distribution are at the forefront of the problems faced by chip designers.
In recent years portability has become important. Historically, portable applications were characterized by low throughput requirements such as for a wristwatch. This is no longer true.
Among the new portable applications are hand-held multimedia terminals with video display and capture, audio reproduction and capture, voice recognition, and handwriting recognition capabilities. These capabilities call for a tremendous amount of computational capacity. This computational capacity has to be realized with very low power requirements in order for the battery to have a satisfactory life span. This thesis is an attempt to provide microarchitecture and compiler techniques for low-power chips with high-computational capacity.
The first part of this work presents some schemes for reducing the complexity of the issue logic, which has become one of the main sources of energy consumption in recent years. The inherent associative look-up and the size of the structures (crucial for exploiting ILP) have given the issue logic a significant energy budget. The techniques presented in this work eliminate or reduce the associative logic by determining producer-consumer relationships between instructions or by scheduling instructions according to the latency of the operations.
An important effort has been deployed to reduce the energy requirements and the power dissipation through novel mechanisms based on value compression. As a result, the second part of this thesis introduces several ultra-low power and high-end processor designs. First, the design space for ultra-low power processors is explored. Several designs are developed (at the architectural level) from scratch that exploit value compression at all levels of the data-path.
Second, value compression for high-performance processors is proposed and evaluated. At the end of this thesis, two compile-time techniques are presented that show how the compiler can help in reducing energy consumption. By means of a static analysis of the program code, or through profiling, the compiler is able to determine the size of the operands involved in the computation. Through these analyses, the compiler is able to use narrower operations (i.e. a 64-bit addition can be converted to an 8-bit addition given information about the size of the operands).
Overall, this thesis comprises a detailed study of one of the most power-hungry units in a processor (the issue logic) and the use of value compression (through hardware and software) as a means to reduce energy consumption in all stages of the pipeline.
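
The operand-narrowing idea in the final paragraph can be sketched as a tiny value-range analysis; this is a hypothetical illustration of the principle, not the thesis's compiler passes:

```python
# Value-range sketch of compiler-directed operand narrowing: if both
# operands of an add provably fit in w bits, a w-bit add suffices.
WIDTHS = (8, 16, 32, 64)

def bits_needed(lo, hi):
    return max(v.bit_length() + 1 for v in (lo, hi))   # +1 for sign

def add_width(range_a, range_b):
    lo = range_a[0] + range_b[0]
    hi = range_a[1] + range_b[1]
    need = bits_needed(lo, hi)
    return next(w for w in WIDTHS if w >= need)

# Static analysis or profiling says both operands stay in [0, 100]:
print(add_width((0, 100), (0, 100)), "bit add suffices")   # 16
```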
25

Omundsen, Daniel (Daniel Simon) Carleton University Dissertation Engineering Electrical. "A pipelined, multi-processor architecture for a connectionless server for broadband ISDN." Ottawa, 1992.

Find full text
26

Selva, Manuel. "Performance monitoring of throughput constrained dataflow programs executed on shared-memory multi-core architectures." Thesis, Lyon, INSA, 2015. http://www.theses.fr/2015ISAL0055/document.

Full text
Abstract:
Because of physical limits, hardware designers have switched to parallel systems to exploit the still-growing number of transistors per square millimeter of silicon. These parallel systems are made of several independent computing units. To benefit from these computing units, software must be changed: existing sequential applications have to be split into independent tasks to be executed in parallel on the different computing units. To that end, many concurrent programming models have been proposed and are in use today. In this thesis we focus on the dataflow concurrent programming model. This work is about the performance evaluation of dataflow programs on multi-core architectures. We propose to extend dataflow programming models with the notion of throughput constraints and to take this information into account in the compilation tool chain in order to detect throughput bottlenecks at runtime. The profiling results gathered during execution are used both for off-line analyses and to adapt the application during its execution. In the former case, the developer uses this information to know which parts of the dataflow program should be optimized and to distribute the program efficiently over the computing units. In the latter case, the profiling information is used by runtime adaptation mechanisms to distribute the work differently over the computing units. We give particular attention to profiling the application's usage of the memory subsystem. The data-exchange information provided by the programming model makes it possible to use the memory subsystem of multi-core architectures efficiently. Nevertheless, the complexity of modern memory systems does not allow the impact of memory accesses on the global performance of the application to be evaluated statically. We therefore propose memory profiling dedicated to dataflow applications, based on hardware profiling mechanisms.
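
The throughput-bottleneck detection described above reduces, in its simplest form, to comparing each actor's measured service rate against the declared constraint; the following sketch uses hypothetical actor names and numbers, not output from the thesis's profiler:

```python
# Sketch of throughput-bottleneck detection for a dataflow pipeline.
# Each actor's measured mean service time bounds the achievable
# throughput; compare against the declared constraint.
def find_bottleneck(service_time_us, required_items_per_s):
    rates = {a: 1e6 / t for a, t in service_time_us.items()}
    bottleneck = min(rates, key=rates.get)
    achievable = rates[bottleneck]
    ok = achievable >= required_items_per_s
    return bottleneck, achievable, ok

actors = {"parse": 12.0, "decode": 85.0, "filter": 40.0, "sink": 9.0}
b, rate, ok = find_bottleneck(actors, required_items_per_s=15_000)
print(f"bottleneck: {b} at {rate:,.0f} items/s "
      f"({'meets' if ok else 'misses'} the 15,000/s constraint)")
```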
27

Shivashankar, Nithin. "Design and Analysis of Modular Architectures for an RNS to Mixed Radix Conversion Multi-processor." University of Cincinnati / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1396531505.

Full text
28

Fronte, Daniele. "Design and development of a reconfigurable cryptographic co-processor." PhD thesis, Université de Provence - Aix-Marseille I, 2008. http://tel.archives-ouvertes.fr/tel-00364723.

Full text
Abstract:
Today's high-technology circuits demand ever more services and security, and the corresponding market is oriented towards reconfigurability. In this thesis I propose a new multi-algorithm cryptographic co-processor, called Celator. Celator is able to encrypt and decrypt blocks of data using symmetric-key cryptographic algorithms such as the Advanced Encryption Standard (AES) or the Data Encryption Standard (DES). In addition, Celator can hash data using the Secure Hash Algorithm (SHA). These algorithms are implemented in hardware or in software in secure products. Celator belongs to the class of flexible hardware implementations and allows its user, under certain conditions, to execute standard or proprietary cryptographic algorithms.

Celator's architecture is based on a 4x4 systolic array of Processing Elements (the PE array), driven by a controller built around a Finite State Machine (FSM) and a local memory.

This thesis presents Celator's architecture, along with the basic operations needed to execute AES, DES and SHA. Celator's performance is also presented and compared with that of other secure circuits.
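
As a flavour of how a 4x4 PE array maps onto block ciphers: AES holds its 16-byte state as a 4x4 byte matrix, one byte per PE in this sketch, and the AddRoundKey step is a per-PE XOR. This is an illustration of the mapping only, not Celator's datapath:

```python
# AES keeps a 16-byte state as a 4x4 byte matrix, which maps one byte
# per PE onto a 4x4 systolic array. Only the AddRoundKey step (a
# per-PE XOR with the round key) is shown here.
def add_round_key(state, round_key):
    return [[state[r][c] ^ round_key[r][c] for c in range(4)]
            for r in range(4)]

state = [[4 * r + c for c in range(4)] for r in range(4)]
key   = [[0xA5] * 4 for _ in range(4)]
for row in add_round_key(state, key):
    print([hex(b) for b in row])
```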
29

Vangal, Sriram R. "Performance and Energy Efficient Building Blocks for Network-on-Chip Architectures." Licentiate thesis, Linköping : Linköpings universitet, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-7845.

Full text
30

Kennedy, Matthew D. "Power-Efficient Nanophotonic Architectures for Intra- and Inter-Chip Communication." Ohio University / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1458232838.

Full text
31

Bhide, Kanchan P. "DESIGN ENHANCEMENT AND INTEGRATION OF A PROCESSOR-MEMORY INTERCONNECT NETWORK INTO A SINGLE-CHIP MULTIPROCESSOR ARCHITECTURE." UKnowledge, 2004. http://uknowledge.uky.edu/gradschool_theses/253.

Full text
Abstract:
This thesis involves the modeling, design, Hardware Description Language (HDL) design capture, synthesis, implementation, and HDL virtual-prototype simulation validation of an interconnect network for a Hybrid Data/Command Driven Computer Architecture (HDCA) system. The HDCA is a single-chip shared-memory multiprocessor architecture. Various candidate processor-memory interconnect topologies that may meet the requirements of the HDCA system are studied and evaluated with respect to their use within the HDCA system. It is determined that the crossbar network topology best meets the HDCA system requirements, and it is therefore used as the processor-memory interconnect network of the HDCA system. The design capture, synthesis, implementation and HDL simulation are done in VHDL using the Xilinx ISE 6.2.3i and ModelSim 5.7g CAD software. The design is validated by individually testing it against some possible test cases; it is then integrated into the HDCA system and validated against two different applications. The inclusion of the crossbar switch in the HDCA architecture involved major modifications to the HDCA system and some minor changes in the design of the switch. Virtual-prototype testing of the HDCA executing applications over the crossbar interconnect revealed proper functioning of the interconnect and the HDCA. Inclusion of the interconnect now allows the HDCA to implement dynamic node-level reconfigurability and multiple forking functionality.
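
A crossbar allows any processor to reach any memory port concurrently so long as no two requests target the same port; the following is a minimal arbitration sketch (a hypothetical fixed-priority policy, not the thesis's VHDL design):

```python
# Minimal crossbar arbitration sketch: in each cycle, grant at most
# one processor per memory port; conflicting requesters stall and
# retry next cycle.
def arbitrate(requests):
    """requests: {processor: memory_port}. Returns (grants, stalls)."""
    grants, stalls, taken = {}, [], set()
    for proc in sorted(requests):            # fixed-priority arbiter
        port = requests[proc]
        if port in taken:
            stalls.append(proc)
        else:
            grants[proc] = port
            taken.add(port)
    return grants, stalls

print(arbitrate({"P0": 2, "P1": 0, "P2": 2, "P3": 1}))
# P0 and P2 conflict on port 2 -> P2 stalls; P0, P1, P3 proceed.
```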
32

Hebert, Nicolas. "Stratégie de fiabilisation au niveau système des architectures MPSoC." Thesis, Montpellier 2, 2011. http://www.theses.fr/2011MON20069/document.

Full text
Abstract:
This thesis is placed in a context where, for each technology node, integrated circuits are design at an earlier stage in the qualification process and where the CMOS technology appears to be closer to the silicon physical limitations. Despite technological countermeasure, we face an increase in the failure rate which creates conditions in favor of the return of fault-tolerant techniques for non-critical integrated circuits.Nowadays, we have reached such an integration density that we can consider the reconfigurable processor array as future SoC architectures. Indeed, these homogenous architectures suggest possible platform reconfigurations that would ensure quality of service and consequently a minimum reliability in presence of defects. Thus, new protection solutions must be proposed to ensure circuit smooth operations not only for sub-critical functionalities but at the system architecture level itself.Based on these prerogatives, we present an innovative dynamical and distributed protection method, named D-Scale. This method consists in detecting, isolating and recovering the systems in the presence of error which lead to a "crash" of the platform. The crash error detection is based on heartbeat specific messages exchanged between PEs. The recovery phase is based on an autonomous mechanism which reconfigures the platform.A hardware/software implementation was proposed and evaluated. The protection cost is reduced in order to be integrated within future multi-processor SoC architectures. Finally, a fault effect analysis tool is studied in order to validate the fault-tolerant method robustness
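As an illustration of the detection idea, here is a minimal C sketch of heartbeat-based crash detection between processing elements; the timeout value, PE count and function names are illustrative assumptions, not the actual D-Scale protocol.

    #include <stdio.h>

    #define NPE     4
    #define TIMEOUT 3          /* cycles without a heartbeat => PE declared dead */

    static int last_beat[NPE]; /* cycle at which each PE was last heard from */

    void on_heartbeat(int pe, int now) { last_beat[pe] = now; }

    void check_crashes(int now) {
        for (int pe = 0; pe < NPE; pe++)
            if (now - last_beat[pe] > TIMEOUT)
                printf("PE %d presumed crashed: isolate and remap its tasks\n", pe);
    }

    int main(void) {
        for (int now = 0; now < 10; now++) {
            for (int pe = 0; pe < NPE; pe++)
                if (pe != 2) on_heartbeat(pe, now);  /* PE 2 stops responding */
            check_crashes(now);
        }
        return 0;
    }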
APA, Harvard, Vancouver, ISO, and other styles
33

Quinto, Michele Arcangelo. "Méthode de reconstruction adaptive en tomographie par rayons X : optimisation sur architectures parallèles de type GPU." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENT109/document.

Full text
Abstract:
Tomographic reconstruction from projection data is an inverse problem widely used in medical imaging and, more modestly, in non-destructive testing. With a sufficiently large number of projections over the required angular range, analytical (filtered backprojection) algorithms allow fast and accurate reconstructions. However, in the case of few views (low-dose imaging) and/or limited angle (specific constraints of the setup), the data available for inversion are incomplete, the ill-conditioning of the problem worsens, and the results show significant artifacts. In these situations, an alternative approach is to discretize the reconstruction problem and to use iterative algorithms or a statistical formulation of the problem to compute an estimate of the unknown object. These methods are classically based on a discretization of the volume into a set of voxels and provide 3D density maps of the studied object. Computation time and memory requirements are their main weaknesses. Moreover, whatever the application, the volumes are then segmented for quantitative analysis; given the wide range of existing segmentation tools, based on different interpretations of contours and different minimized energy functionals, the choices are many and the results depend on them. This thesis presents a novel approach that performs reconstruction simultaneously with segmentation of the different materials composing the volume. The reconstruction process is no longer based on a regular grid of pixels (resp. voxels) but on a mesh of non-regular triangles (resp. tetrahedra) that adapts to the shape of the object. After an initialization step, the method alternates iteratively between three main steps, reconstruction, segmentation and mesh adaptation, until convergence. Iterative reconstruction algorithms commonly used with a conventional image representation were adapted and optimized to run on irregular grids of triangular or tetrahedral elements. For the segmentation step, two methods, one based on a parametric approach (snake) and the other on a geometric approach (level set), were implemented in order to handle objects of different natures (mono- and multi-material). The adaptation of the mesh to the content of the estimated image is driven by the previously segmented contours, refining the mesh around the details of the object and coarsening it in areas containing little information. At the end of the process, the result is a classical grayscale tomographic reconstruction, but its representation by a content-adapted mesh directly provides an associated segmentation. The results show that the adaptive part of the method represents objects efficiently and drastically reduces the memory needed for storage. In this context, a 2D version of the reconstruction operators on a GPU-type parallel architecture demonstrates the feasibility of the whole process, and an optimized version of the 3D operators allows even more efficient computations.
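The iterative reconstruction step can be illustrated with the classical ART (Kaczmarz) update, the kind of row-action scheme such methods build on; the C sketch below applies it to a toy dense system. The system, sizes and relaxation factor are illustrative assumptions, and the thesis's mesh-based operators are not reproduced here.

    #include <stdio.h>

    #define M 3   /* projections (rows) */
    #define N 3   /* image unknowns (pixels or mesh elements) */

    /* One ART sweep: project x onto the hyperplane of each row in turn. */
    void art_sweep(const double A[M][N], const double b[M], double x[N],
                   double lambda) {
        for (int i = 0; i < M; i++) {
            double dot = 0.0, norm2 = 0.0;
            for (int j = 0; j < N; j++) {
                dot   += A[i][j] * x[j];
                norm2 += A[i][j] * A[i][j];
            }
            if (norm2 == 0.0) continue;
            double c = lambda * (b[i] - dot) / norm2;
            for (int j = 0; j < N; j++) x[j] += c * A[i][j];
        }
    }

    int main(void) {
        double A[M][N] = {{1,1,0},{0,1,1},{1,0,1}};
        double b[M]    = {3, 5, 4};                 /* consistent with x = (1,2,3) */
        double x[N]    = {0, 0, 0};
        for (int k = 0; k < 50; k++) art_sweep(A, b, x, 1.0);
        printf("x = %.3f %.3f %.3f\n", x[0], x[1], x[2]);
        return 0;
    }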
APA, Harvard, Vancouver, ISO, and other styles
34

Daneshbeh, Amir. "Bit Serial Systolic Architectures for Multiplicative Inversion and Division over GF(2^m)." Thesis, University of Waterloo, 2005. http://hdl.handle.net/10012/776.

Full text
Abstract:
Systolic architectures are capable of achieving high throughput by maximizing pipelining and by eliminating global data interconnects. Recursive algorithms with regular data flows are suitable for systolization. The computation of multiplicative inversion using algorithms based on the EEA (Extended Euclidean Algorithm) is particularly suitable for systolization. Implementations based on the EEA present a high degree of parallelism and pipelinability at the bit level, which can easily be optimized to achieve local data flow and to eliminate the global interconnects that represent the most important bottleneck in today's sub-micron design processes. The net result is a high clock rate and high performance based on efficient systolic architectures. This thesis examines high-performance but also scalable implementations of multiplicative inversion, or field division, over Galois fields GF(2^m) in the specific case of cryptographic applications, where the field dimension m may be very large (greater than 400) and either m or the defining irreducible polynomial may vary. For this purpose, many inversion schemes with different basis representations are studied and, most importantly, variants of the EEA and binary (Stein's) GCD computation implementations are reviewed. A set of common as well as contrasting characteristics of these variants is discussed. As a result, a generalized and optimized variant of the EEA is proposed which can compute division, and multiplicative inversion as its subset, with the divisor in either polynomial or triangular basis representation. Further results regarding Hankel matrix formation for double-basis inversion are provided. The validity of using the same architecture to compute field division with polynomial or triangular basis representation is proved. Next, a scalable unidirectional bit-serial systolic array implementation of this proposed variant of the EEA is presented. Its complexity measures are defined and compared against the best-known architectures. It is shown that, under the requirements specified above, the proposed architecture may achieve higher clock-rate performance with respect to other designs while being more flexible and reliable and having a minimum number of inter-cell interconnects. The main contribution at the system architecture level is the substitution of all counter or adder/subtractor elements with a simpler structure that is distributed and free of carry-propagation delays. Furthermore, a novel restoring mechanism for the result sequences of the EEA is proposed, using a double delay element implementation. Finally, using this systolic architecture, a CMD (Combined Multiplier Divider) datapath is designed and used as the core of a novel systolic elliptic curve processor. This EC processor uses affine coordinates to compute scalar point multiplication, which results in a very small control unit, negligible with respect to the datapath for all practical values of m. The throughput of this EC processor based on the bit-serial systolic architecture is comparable to that of previously reported designs many times larger than itself.
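For intuition about the algorithm being systolized, here is a word-level C model of EEA-based inversion over a small binary field; the thesis's bit-serial array computes the same recurrence one bit per cell, and the GF(2^8) field choice below is purely illustrative.

    #include <stdint.h>
    #include <stdio.h>

    /* Inversion over GF(2^8), polynomial basis, modulus x^8+x^4+x^3+x+1 (0x11B).
       Polynomials are bit vectors; XOR is addition, shifts multiply by x^j. */
    static int deg(uint32_t a) {           /* polynomial degree; -1 for a == 0 */
        int d = -1;
        while (a) { d++; a >>= 1; }
        return d;
    }

    static uint32_t gf_inv(uint32_t a, uint32_t f) {
        uint32_t u = a, v = f, g1 = 1, g2 = 0;
        while (u != 1 && v != 1) {
            int j = deg(u) - deg(v);
            if (j < 0) {                   /* keep deg(u) >= deg(v) */
                uint32_t t;
                t = u;  u  = v;  v  = t;
                t = g1; g1 = g2; g2 = t;
                j = -j;
            }
            u  ^= v  << j;                 /* cancel u's leading term */
            g1 ^= g2 << j;                 /* mirror the step on the cofactor */
        }
        return (u == 1) ? g1 : g2;         /* = a^{-1} mod f */
    }

    int main(void) {
        printf("inv(0x02) = 0x%02X\n", gf_inv(0x02, 0x11B));  /* expect 0x8D */
        return 0;
    }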
APA, Harvard, Vancouver, ISO, and other styles
35

Beier, Felix [Verfasser], Kai-Uwe [Akademischer Betreuer] Sattler, Wolfgang [Gutachter] Lehner, and Götz [Gutachter] Graefe. "Generalized database index structures on massively parallel processor architectures / Felix Beier ; Gutachter: Wolfgang Lehner, Götz Graefe ; Betreuer: Kai-Uwe Sattler." Ilmenau : TU Ilmenau, 2019. http://d-nb.info/1194062261/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Laurino, Luiz Sequeira. "Reuso especulativo de traços com instruções de acesso à memória." reponame:Biblioteca Digital de Teses e Dissertações da UFRGS, 2007. http://hdl.handle.net/10183/14741.

Full text
Abstract:
Even with the growing effort to detect and handle redundant instructions, true dependencies still cause a large delay in program execution. Mechanisms using value reuse and value prediction techniques have been studied constantly as an alternative to these problems. In this context, the RST (Reuse through Speculation on Traces) architecture stands out, combining both techniques and achieving a significant performance improvement for superscalar processors. The original RST architecture, however, does not consider memory-access instructions as reuse candidates. This work therefore introduces a new value reuse and value prediction mechanism called RSTm (Reuse through Speculation on Traces with Memory), which extends the original mechanism by adding memory-access instructions to the architecture's reuse domain. Among the solutions analyzed, a dedicated table (Memo_Table_L) was chosen for storing load/store instructions. This solution guarantees low hardware overhead, does not limit the number of memory-access instructions per trace, and stores both the address and its value. Experiments with SPEC2000 integer and floating-point benchmarks show performance improvements (harmonic means) of 2.97% for RSTm over the original mechanism and of 17.42% over the baseline architecture. The gain results from a combination of several factors: larger traces (on average 7.75 instructions per trace, versus 3.17 for the original RST), although with a reuse rate of about 10.88% (lower than RST's 15.23%); however, the latency of the instructions in RSTm traces is higher and compensates for the lower reuse rate.
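A minimal C sketch of the kind of lookup a trace-memoization table performs is given below; the entry layout, table size and field names (including the Memo_Table_L-style load fields) are illustrative assumptions rather than the thesis's exact design.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    typedef struct {
        int      valid;
        uint32_t pc;          /* starting PC of the trace              */
        uint32_t in_regs[2];  /* input register context of the trace   */
        uint32_t mem_addr;    /* address touched by the load in trace  */
        uint32_t mem_value;   /* value previously loaded from it       */
        uint32_t out_reg;     /* architected result of the trace       */
    } memo_entry_t;

    #define TBL 256
    static memo_entry_t table[TBL];

    /* Returns 1 and fills *result if the trace can be reused (skipping
       re-execution); speculative parts are validated later against memory. */
    int trace_lookup(uint32_t pc, const uint32_t in_regs[2], uint32_t *result) {
        memo_entry_t *e = &table[pc % TBL];
        if (e->valid && e->pc == pc &&
            memcmp(e->in_regs, in_regs, sizeof e->in_regs) == 0) {
            *result = e->out_reg;   /* reuse: commit outputs without executing */
            return 1;
        }
        return 0;
    }

    int main(void) {
        uint32_t in[2] = {7, 9}, r;
        table[0x40 % TBL] = (memo_entry_t){1, 0x40, {7, 9}, 0x1000, 42, 16};
        if (trace_lookup(0x40, in, &r)) printf("trace reused, result=%u\n", r);
        return 0;
    }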
APA, Harvard, Vancouver, ISO, and other styles
37

Skaf, Ali. "Conception de processeurs arithmétiques redondants et en-ligne : algorithmes, architectures et implantations VLSI." Grenoble INPG, 1995. http://www.theses.fr/1995INPG0108.

Full text
Abstract:
The use of classical number notation systems in silicon implementations of computational algorithms runs up against the unavoidable problem of carry propagation. Indeed, almost all methods aiming to speed up carry propagation, and thus computation, do not question the classical representation based on the well-known (bit, binary value) pair. Redundant arithmetic, by contrast, is based precisely on the idea of writing numbers differently in order to avoid carry propagation altogether. In this thesis, we are interested in the development and implementation of redundant algorithms. Such VLSI realizations demonstrate the possibilities and, above all, the efficiency of redundant notations. In particular, we study the design of VLSI circuits implementing an algorithm that computes real functions via their truncated series expansions. We also present an implementation of an on-line algorithm, derived from CORDIC, that computes the complex exponential and logarithm. The resulting circuit, SAGA, can be considered the first on-line arithmetic coprocessor.
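The core idea of redundant notation can be shown in a few lines of C: a carry-save accumulator keeps a number as a (sum, carry) pair so that each new addition is a constant-time step with no carry propagation, and a single conventional addition resolves the final result. The sketch is a generic illustration, unrelated to the SAGA circuit itself.

    #include <stdint.h>
    #include <stdio.h>

    /* Carry-save (redundant) accumulator: value = sum + carry. */
    typedef struct { uint32_t sum, carry; } csa_t;

    static csa_t csa_add(csa_t acc, uint32_t x) {
        uint32_t s = acc.sum ^ acc.carry ^ x;                 /* bitwise sum   */
        uint32_t c = ((acc.sum & acc.carry) |
                      (acc.sum & x) | (acc.carry & x)) << 1;  /* saved carries */
        return (csa_t){ s, c };
    }

    int main(void) {
        csa_t acc = {0, 0};
        uint32_t vals[] = {1000, 2345, 99999, 7};
        for (int i = 0; i < 4; i++) acc = csa_add(acc, vals[i]);
        /* one final carry-propagating add resolves the redundant form */
        printf("%u\n", acc.sum + acc.carry);  /* prints 103351 */
        return 0;
    }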
APA, Harvard, Vancouver, ISO, and other styles
38

Dupros, Fabrice. "Contribution à la modélisation numérique de la propagation des ondes sismiques sur architectures multicœurs et hiérarchiques." Thesis, Bordeaux 1, 2010. http://www.theses.fr/2010BOR14147/document.

Full text
Abstract:
In terms of seismic risk prevention, quantitative prediction of the propagation and amplification of seismic waves in complex geological structures is becoming essential. In this field, numerical simulation is paramount, and efficient exploitation of high-performance computing makes it possible to carry out the large-scale models required for strong-motion analysis and risk mitigation. Several recent evolutions in parallel machine architecture require adapting the classical algorithms used for seismic modeling. The growth in processor power now comes mainly from an increasing number of computing cores, and multicore chips are the basis of most multiprocessor architectures. This change also brings greater complexity in the physical organization of memory, which is generally structured as a deep NUMA (Non-Uniform Memory Access) hierarchy. The contributions of this thesis concern both the algorithmic and numerical levels, and also address the interaction with runtime systems optimized for multicore architectures. The chosen solutions are validated at large scale on two examples of seismic modeling. The first case is the 2007 Niigata-Chuetsu, Japan earthquake (event of July 16, 2007), modeled with the finite difference method. The second example uses the finite element method: a hypothetical earthquake in the Nice sedimentary basin, on the French Riviera, is modeled taking into account the nonlinear behavior of the soil.
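The finite-difference kernels at the heart of such simulations are stencil updates; the following C/OpenMP sketch shows one explicit time step of the 2-D scalar wave equation parallelized across cores. Grid sizes and coefficients are illustrative assumptions; on NUMA nodes, initializing the arrays with the same loop structure ("first touch") keeps each thread's rows in local memory, which is one of the placement concerns this work addresses.

    #include <stdio.h>
    /* compile: cc -O2 -fopenmp wave2d.c */
    #define NX 256
    #define NY 256

    static float a[NX][NY], b[NX][NY], c[NX][NY];   /* t-1, t, t+1 planes */

    static void wave_step(float (*up)[NY], float (*uc)[NY], float (*un)[NY],
                          float r2) {               /* r2 = (c*dt/h)^2, CFL-stable */
        #pragma omp parallel for collapse(2) schedule(static)
        for (int i = 1; i < NX - 1; i++)
            for (int j = 1; j < NY - 1; j++)
                un[i][j] = 2.0f * uc[i][j] - up[i][j]
                         + r2 * (uc[i+1][j] + uc[i-1][j]
                               + uc[i][j+1] + uc[i][j-1] - 4.0f * uc[i][j]);
    }

    int main(void) {
        float (*up)[NY] = a, (*uc)[NY] = b, (*un)[NY] = c, (*t)[NY];
        uc[NX/2][NY/2] = 1.0f;                      /* point source at t = 0 */
        for (int step = 0; step < 100; step++) {
            wave_step(up, uc, un, 0.25f);
            t = up; up = uc; uc = un; un = t;       /* rotate time planes */
        }
        printf("u(center) after 100 steps: %g\n", uc[NX/2][NY/2]);
        return 0;
    }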
APA, Harvard, Vancouver, ISO, and other styles
39

Grad, Mariusz [Verfasser], and Marco [Akademischer Betreuer] Platzner. "Just-in-time processor customization on the feasibility and limitations of FPGA-based dynamically reconfigurable instruction set architectures / Mariusz Grad. Betreuer: Marco Platzner." Paderborn : Universitätsbibliothek, 2011. http://d-nb.info/1036423565/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Delespierre, Tiba. "Etude de cas sur architectures à mémoires distribuées : une maquette systolique programmable et l'hypercube d'Intel." Paris 9, 1987. https://portail.bu.dauphine.fr/fileviewer/index.php?doc=1987PA090073.

Full text
Abstract:
A presentation of two types of distributed-memory parallel computers: the Systolimag machine, a universal programmable systolic array, and the Intel hypercube. The programs implemented on the Systolimag machine and on the Intel hypercube are studied.
APA, Harvard, Vancouver, ISO, and other styles
41

Matherat, Philippe. "Contribution à l'augmentation de puissance des architectures de visus graphiques." Phd thesis, Université Pierre et Marie Curie - Paris VI, 1988. http://tel.archives-ouvertes.fr/tel-00172858.

Full text
Abstract:
The motivation for this work is the design of circuits for rapidly displaying images on a computer screen. Ten years ago, we proposed an LSI circuit handling the management of a frame buffer and the fast drawing of line segments and characters, with a "graphics terminal" in mind. We then sought to increase the performance of this architecture and to adapt it to the "workstation" environment. We are now convinced that the solution does not lie in specialized circuits, but in the definition of very powerful general-purpose computing operators. To explain this path, we describe a series of experiments we carried out, preceded by a history of display architectures.
APA, Harvard, Vancouver, ISO, and other styles
42

Bringer, Yves. "Performances de nouvelles architectures machines pour la mise en oeuvre d'algorithmes de traitement et d'analyse d'image." Saint-Etienne, 1993. http://www.theses.fr/1993STET4024.

Full text
Abstract:
An electronic board was built at the Institut de chimie et physique industrielles de Lyon using four programmable dataflow processors, thus combining computing power with flexibility of use. To validate this architecture for image processing and analysis, a twofold approach was taken: implementation of algorithms that are both computationally expensive and scientifically original (Danielson's algorithm, deblurring, 3D reconstruction), and deployment at an industrial site, taking timing constraints into account and integrating the board into a complete inspection chain.
APA, Harvard, Vancouver, ISO, and other styles
43

Senni, Sophiane. "Exploration of non-volatile magnetic memory for processor architecture." Thesis, Montpellier, 2015. http://www.theses.fr/2015MONTS264/document.

Full text
Abstract:
With the downscaling of CMOS (complementary metal-oxide semiconductor) technology, designing dense and energy-efficient systems-on-chip (SoC) is becoming a real challenge. Concerning density, reducing the CMOS transistor size runs up against manufacturing constraints while the cost increases exponentially. Regarding energy, a significant increase in power density and dissipation obstructs further improvement in performance. This issue is mainly due to the growth of the leakage current of CMOS transistors, which leads to an increase in static energy consumption. In current SoCs, more and more area is occupied by embedded volatile memories such as static random access memory (SRAM) and dynamic random access memory (DRAM), and as a result a significant proportion of the total power is spent in the memory systems. In the past two decades, alternative memory technologies have emerged with attractive characteristics to mitigate these issues. Among them, magnetic random access memory (MRAM) is a promising candidate, as it combines high density with very low static power consumption while offering performance competitive with SRAM and DRAM. Moreover, MRAM is non-volatile. This capability, if present in embedded memories, has the potential to add new features to SoCs that enhance energy efficiency and reliability. This thesis investigates the area, performance and energy implications of embedding the MRAM technology in the memory hierarchy of a processor architecture. A first fine-grained exploration was made at the cache level for multi-core architectures. A second study evaluated the possibility of designing a non-volatile processor integrating MRAM at the register level. Within the context of the internet of things, new features and the benefits brought by non-volatility were investigated.
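The SRAM-versus-MRAM trade-off can be sketched with a back-of-the-envelope energy model in C: MRAM nearly eliminates the leakage term but pays more per write, so the winner depends on the access profile. All numbers below are made-up illustrative assumptions, not figures from the thesis.

    #include <stdio.h>

    typedef struct {
        const char *name;
        double leak_mw;       /* static leakage power, mW (assumed) */
        double e_read_nj;     /* energy per read, nJ (assumed)      */
        double e_write_nj;    /* energy per write, nJ (assumed)     */
    } mem_t;

    /* total energy in mJ: static leakage over time + dynamic access energy */
    double total_energy_mj(mem_t m, double secs, double reads, double writes) {
        return (m.leak_mw * 1e-3 * secs
              + (reads * m.e_read_nj + writes * m.e_write_nj) * 1e-9) * 1e3;
    }

    int main(void) {
        mem_t sram = {"SRAM", 50.0, 0.1, 0.1};   /* leaky, cheap accesses   */
        mem_t mram = {"MRAM",  0.5, 0.2, 1.0};   /* near-zero leakage, but  */
                                                 /* costlier writes         */
        double secs = 1.0, reads = 1e8, writes = 2e7;
        printf("%s: %.2f mJ\n", sram.name, total_energy_mj(sram, secs, reads, writes));
        printf("%s: %.2f mJ\n", mram.name, total_energy_mj(mram, secs, reads, writes));
        return 0;
    }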
APA, Harvard, Vancouver, ISO, and other styles
44

Sato, Toshinori. "History Directed Processor Architecture." Kyoto University, 1999. http://hdl.handle.net/2433/182380.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Carli, Roberto. "Flexible MIPS soft processor architecture." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/45809.

Full text
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
Includes bibliographical references (p. 48-49).
The flexible MIPS soft processor architecture borrows selected technologies from high-performance computing to deliver a modular, highly customizable CPU targeted towards FPGA implementations for embedded systems; the objective is to provide a more flexible architectural alternative to coprocessor-based solutions. The processor performs out-of-order execution on parallel functional units, delivers in-order instruction commit, and is compatible with the MIPS-1 Instruction Set Architecture. Amongst many available options, the user can introduce custom instructions and matching functional units; modify existing units; change the pipelining depth within functional units to any fixed or variable value; customize instruction definitions in terms of operands, control signals and register file interaction; and insert multiple redundant functional units for improved performance. The flexibility provided by the architecture allows the user to expand the processor functionality to implement instructions of coprocessor-level complexity through additional functional units. The processor design was implemented and simulated on two FPGA platforms, tested on multiple applications, and compared to three commercially available soft processor solutions in terms of features, area, clock frequency and benchmark performance.
by Roberto Carli.
M.Eng.
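The out-of-order-execution/in-order-commit policy mentioned above is classically enforced with a reorder buffer; here is a minimal C sketch of one, with illustrative sizes and fields (this is the generic textbook structure, not necessarily the thesis's actual design).

    #include <stdio.h>

    #define ROB_SIZE 8

    typedef struct { int valid, done; unsigned pc; } rob_entry_t;

    static rob_entry_t rob[ROB_SIZE];
    static int head = 0, tail = 0;         /* head: oldest; tail: next free */

    int rob_issue(unsigned pc) {           /* allocate entry at dispatch */
        if (rob[tail].valid) return -1;    /* ROB full */
        rob[tail] = (rob_entry_t){1, 0, pc};
        int idx = tail;
        tail = (tail + 1) % ROB_SIZE;
        return idx;
    }

    void rob_complete(int idx) { rob[idx].done = 1; }  /* any order */

    void rob_retire(void) {                /* commit only from the head */
        while (rob[head].valid && rob[head].done) {
            printf("retire pc=0x%X\n", rob[head].pc);
            rob[head].valid = 0;
            head = (head + 1) % ROB_SIZE;
        }
    }

    int main(void) {
        int i0 = rob_issue(0x100), i1 = rob_issue(0x104), i2 = rob_issue(0x108);
        rob_complete(i2);                  /* youngest completes first... */
        rob_retire();                      /* ...but nothing retires yet  */
        rob_complete(i0); rob_complete(i1);
        rob_retire();                      /* now 0x100, 0x104, 0x108 in order */
        return 0;
    }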
APA, Harvard, Vancouver, ISO, and other styles
46

Costa, Celsio Maciel da. "Environnement d'éxécution parallèle : conception et architecture." Grenoble 1, 1993. http://tel.archives-ouvertes.fr/tel-00005132.

Full text
Abstract:
"la 4e de couverture indique :L'objectif de cette thèse est l'étude d'un environnement d'exécution pour machines parallèles sans mémoire commune. Elle comprend la définition d'un modèle de programme parallèle, base sur l'échange de message offrant une forme restreinte de mémoire partagée. La communication est indirecte, via des portes ; les processus utilisent les barrières pour la synchronisation. Les entités du système, processus, portes et barrières, sont créées dynamiquement, et placées sur un processeur quelconque du réseau de processeurs de façon explicite
Nous proposons une implantation de ce modèle comme la mise en œuvre systématique d'une architecture client/ serveur. Cette implantation a été effectuée sur une machine Supernode. La base est un Micro Noyau Parallèle, ou le composant principal est un mécanisme d'appel de procédure à distance minimal"
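As a sketch of the communication model, the C program below implements a one-slot "port": senders and receivers never name each other, only the port, which is the indirect style described above. The API and slot depth are illustrative assumptions, not the thesis's micro-kernel interface.

    #include <pthread.h>
    #include <stdio.h>

    /* A port as a one-slot mailbox guarded by a mutex and condition variable. */
    typedef struct {
        pthread_mutex_t mtx;
        pthread_cond_t  cv;
        int full, msg;
    } port_t;

    void port_send(port_t *p, int msg) {
        pthread_mutex_lock(&p->mtx);
        while (p->full) pthread_cond_wait(&p->cv, &p->mtx);
        p->msg = msg; p->full = 1;
        pthread_cond_broadcast(&p->cv);
        pthread_mutex_unlock(&p->mtx);
    }

    int port_recv(port_t *p) {
        pthread_mutex_lock(&p->mtx);
        while (!p->full) pthread_cond_wait(&p->cv, &p->mtx);
        int msg = p->msg; p->full = 0;
        pthread_cond_broadcast(&p->cv);
        pthread_mutex_unlock(&p->mtx);
        return msg;
    }

    static port_t port = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 };

    void *worker(void *arg) { (void)arg; port_send(&port, 42); return NULL; }

    int main(void) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        printf("received %d via port\n", port_recv(&port));
        pthread_join(t, NULL);
        return 0;
    }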
APA, Harvard, Vancouver, ISO, and other styles
47

Martin, Rovira Julia, and Fructoso Melero Francisco Manuel. "Micro-Network Processor : A Processor Architecture for Implementing NoC Routers." Thesis, Jönköping University, JTH, Computer and Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-941.

Full text
Abstract:

Routers are probably the most important component of a NoC, as the performance of the whole network is driven by the routers' performance. The cost of the whole network in terms of area will also be minimised if the router design is kept small. A new application-specific processor architecture for implementing NoC routers, called µNP (Micro-Network Processor), is proposed in this master thesis. The aim is to offer a solution that trades off between the high performance of routers implemented in hardware and the high level of flexibility that could be achieved by running routing software on a general-purpose processor (GPP). Therefore, a study including the design of a hardware-based router and a GPP-based router has been conducted. In this project the first version of the µNP has been designed, and a complete instruction set, along with some sample programs, is also proposed. The results show that, in the best case for all implementation options, the µNP was 7.5 times slower than the hardware-based router. It also behaved more than 100 times faster than the GPP-based router, while keeping almost the same degree of flexibility for routing purposes within the NoC.
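For reference, the per-hop computation a mesh router (whether hard-wired or a µNP routing program) performs can be as simple as dimension-ordered routing; here is a minimal C sketch with illustrative port names and mesh layout.

    #include <stdio.h>

    /* Dimension-ordered (XY) routing decision for a 2-D mesh NoC. */
    typedef enum { LOCAL, EAST, WEST, NORTH, SOUTH } port_t;

    port_t xy_route(int cur_x, int cur_y, int dst_x, int dst_y) {
        if (dst_x > cur_x) return EAST;    /* first correct the X offset */
        if (dst_x < cur_x) return WEST;
        if (dst_y > cur_y) return NORTH;   /* then the Y offset */
        if (dst_y < cur_y) return SOUTH;
        return LOCAL;                      /* packet has arrived */
    }

    int main(void) {
        static const char *names[] = {"LOCAL","EAST","WEST","NORTH","SOUTH"};
        printf("route (0,0)->(2,1): %s first\n", names[xy_route(0, 0, 2, 1)]);
        return 0;
    }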

APA, Harvard, Vancouver, ISO, and other styles
48

Chow, K. W. "Multi-processor architecture for machine vision." Thesis, Cardiff University, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.358531.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Campanella, William C. "The nature of the problem statement in architectural programming : a critical analysis of three programming processes." Thesis, Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/23156.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Yang, Chih-Chyau, and 楊智喬. "The Study on Media Processor Architectures." Thesis, 1999. http://ndltd.ncl.edu.tw/handle/73880867274699646454.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Electronics Engineering
Academic year 87 (1998/99)
With the rapid evolution of multimedia technology, multimedia applications have come to exert a great influence on our daily life. A media processor is a good solution for these multimedia applications. In this thesis, a media processor is presented for multimedia data processing. It contains an ARM-like main processor and a SIMD Vector Coprocessor (AM-SVC). The ARM-like main processor, which acts as a system controller, is instruction-compatible with the ARM7. The SIMD Vector Coprocessor consists of one multiplier unit, one arithmetic unit, two load/store units, and two scalar units. Combining a DMA-like controller, a splittable design of the multiplier and arithmetic units, separate load/store units, scalar units, and a concurrent control unit, our media processor is able to relieve the main processor's burden, exploit high data parallelism, and achieve out-of-order execution with in-order completion. A multithreaded architecture is also adopted in the coprocessor to exploit thread-level parallelism. All modules in the design except the memory and register files are coded in synthesizable RTL Verilog HDL. Simulation results for the AM-SVC are also given in this thesis.
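The "splittable" arithmetic units mentioned above can be illustrated with the SWAR trick in C: one 32-bit adder serves as four independent 8-bit lanes by masking off inter-lane carries. The function below is a generic illustration, not the AM-SVC's actual datapath.

    #include <stdint.h>
    #include <stdio.h>

    /* Four parallel 8-bit adds inside one 32-bit word: add the low 7 bits
       of each byte (no cross-byte carry possible), then XOR in the MSBs. */
    static uint32_t add8x4(uint32_t a, uint32_t b) {
        uint32_t sum = (a & 0x7F7F7F7Fu) + (b & 0x7F7F7F7Fu);
        return sum ^ ((a ^ b) & 0x80808080u);
    }

    int main(void) {
        uint32_t a = 0x01FF10F0u, b = 0x01017020u;
        printf("%08X\n", add8x4(a, b));  /* 0x02008010: lanes wrap independently */
        return 0;
    }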
APA, Harvard, Vancouver, ISO, and other styles