Dissertations / Theses on the topic 'Hardware/algorithm co-design'

Consult the top 21 dissertations / theses for your research on the topic 'Hardware/algorithm co-design.'


1

Zhang, Zhengdong. "Efficient computing for autonomous navigation using algorithm-and-hardware co-design." Ph.D. thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122691.

Full text
Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 211-221).
Autonomous navigation algorithms are the backbone of many robotic systems, such as self-driving cars and drones. However, state-of-the-art autonomous navigation algorithms are computationally expensive, requiring powerful CPUs and GPUs to enable them to run in real time. As a result, it is prohibitive to deploy them on miniature robots with limited computational resources onboard. To tackle this challenge, this thesis presents an algorithm-and-hardware co-design approach to design energy-efficient algorithms that are optimized for dedicated hardware architectures at the same time. It covers the design for three essential modules of an autonomous navigation system: perception, localization, and exploration.
Compared with previous research that considers either algorithmic improvements or hardware architecture optimizations alone, our approach leads to algorithms that not only have lower time and space complexity but also map efficiently to specialized hardware architectures, resulting in significantly improved energy efficiency and throughput. First, this thesis studies how to design an energy-efficient visual perception system using the deformable part models (DPM) based object detection algorithm. It describes an algorithm that enforces sparsity in the data stored on chip, which reduces the memory requirement by 34% and lowers the classification cost by 43%. Together with other hardware optimizations, this technique leads to an object detection chip that runs at 30 fps on 1920x1080 videos while consuming only 58.6 mW of power.
Second, this thesis describes a systematic way to explore algorithm-hardware design choices to build a low-power chip that performs visual inertial odometry (VIO) to localize a vehicle. Each of the components in a VIO pipeline has multiple algorithmic choices with different time and space complexity. However, some algorithms of lower time complexity can be more expensive when implemented on-chip. This thesis examines each of the design choices from both the algorithm's and the hardware's point of view and presents a design that consumes 24 mW of power while running at up to 90 fps and achieving near state-of-the-art localization accuracy. Third, this thesis presents an efficient information-theoretic mapping system for exploration. It features a novel algorithm called Fast computation of Shannon Mutual Information (FSMI) that computes the Shannon mutual information (MI) between perspective range measurements and the environment.
The FSMI algorithm features an analytic solution that avoids the expensive numerical integration required by previous state-of-the-art algorithms, enabling FSMI to run three orders of magnitude faster in practice. We also present an extension of the FSMI algorithm to 3D mapping; the algorithm leverages the compression of large 3D maps using run-length encoding (RLE) and achieves an 8x acceleration in a real-world exploration task. In addition, this thesis presents a hardware architecture designed for the FSMI algorithm. The design includes a novel memory-banking method that increases the memory bandwidth so that multiple FSMI cores can run in parallel while maintaining high utilization. A novel arbiter is proposed to resolve memory read conflicts between multiple cores within one clock cycle. The final design on an FPGA achieves more than 100x higher throughput than a CPU while consuming less than 1/10 of the power.
by Zhengdong Zhang.
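The 3D mapping extension above compresses the occupancy map with run-length encoding (RLE). As a rough illustration only (the thesis's actual encoding over ray traversals may differ in its details), a minimal RLE sketch:

```python
def rle_encode(cells):
    """Run-length encode a sequence of occupancy values into (value, count) pairs."""
    runs = []
    for v in cells:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1        # extend the current run
        else:
            runs.append([v, 1])     # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (value, count) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

# A toy voxel column: long homogeneous stretches compress well.
column = ['free'] * 5 + ['occ'] * 2 + ['unknown'] * 9
runs = rle_encode(column)
assert rle_decode(runs) == column
print(runs)  # [('free', 5), ('occ', 2), ('unknown', 9)]
```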
APA, Harvard, Vancouver, ISO, and other styles
2

Tzou, Nicholas. "Low-cost sub-Nyquist sampling hardware and algorithm co-design for wideband and high-speed signal characterization and measurement." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/51876.

Full text
Abstract:
Cost reduction has been and will continue to be a primary driving force in the evolution of hardware design and associated technologies. The objective of this research is to design low-cost signal acquisition systems for characterizing wideband and high-speed signals. As the bandwidth and speed of such signals increase, the cost of testing also increases significantly; therefore, innovative hardware and algorithm co-design is needed to alleviate this problem. In Chapter 2, a low-cost multi-rate system is proposed for characterizing the spectra of wideband signals. The design is low-cost in terms of actual component cost, system complexity, and the effort required for calibration. The associated algorithms are designed so that the hardware can be implemented with low complexity yet be robust enough to deal with various hardware variations. A hardware prototype was built not only to verify the proposed hardware scheme and algorithms but also to serve as a concrete example showing that characterizing signals at sub-Nyquist sampling rates is feasible. Chapter 3 introduces a low-cost time-domain waveform reconstruction technique, which requires no mutual synchronization mechanisms. This brings down cost significantly and enables systems capable of capturing signals of tens of gigahertz (GHz) at significantly lower cost than the high-end oscilloscopes on the market today. For the first time, band-interleaving and incoherent undersampling techniques are combined to form a low-cost solution for waveform reconstruction. This is enabled by co-designing the hardware and the back-end signal processing algorithms to compensate for the lack of coherent Nyquist-rate sampling hardware. A hardware prototype was built to support this work. Chapter 4 describes a novel test methodology that significantly reduces the time required for crosstalk jitter characterization in parallel channels.
This is done by using bit patterns with coprime periods as channel stimuli and using signal processing algorithms to separate multiple crosstalk coupling effects. The proposed test methodology can be applied in conjunction with current test methodologies without redesigning the test setup. More importantly, the mathematical analysis shows that only such test stimuli give unbiased characterization results, which is critical in high-precision test setups. Hardware measurement results and analysis are provided to support this methodology. The thesis starts with an overview of the background and a literature review. The three major works mentioned above are addressed in three separate chapters; each documents the hardware designs, signal processing algorithms, and associated mathematical analyses, and for verification the hardware measurement setups and results are discussed at the end of each chapter. The last chapter presents conclusions and directions for future work.
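As a toy illustration of why coprime-period stimuli permit separation (the patterns, periods, and averaging scheme below are invented for the example, not taken from the dissertation): when periods p and q are coprime, averaging the aggregate signal over one residue class modulo p sweeps every residue modulo q exactly once, so a zero-mean period-q interferer cancels exactly.

```python
from math import gcd

p, q = 3, 5                       # coprime stimulus periods
assert gcd(p, q) == 1

a = [2.0, -1.0, 4.0]              # victim-channel pattern, period p
b = [1.0, -2.0, 0.5, 1.5, -1.0]   # aggressor pattern, period q, zero-mean
assert abs(sum(b)) < 1e-12

N = p * q                         # one joint period
x = [a[n % p] + b[n % q] for n in range(N)]

# Synchronous averaging over residues mod p: within each residue class of p,
# n % q takes every value 0..q-1 exactly once (Chinese remainder theorem),
# so the zero-mean aggressor averages out exactly.
recovered = [sum(x[n] for n in range(N) if n % p == k) / q for k in range(p)]
print(recovered)  # recovers the period-p pattern a
```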
APA, Harvard, Vancouver, ISO, and other styles
3

Narasimhan, Seetharam. "Ultralow-Power and Robust Implantable Neural Interfaces: An Algorithm-Architecture-Circuit Co-Design Approach." Case Western Reserve University School of Graduate Studies / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=case1333743306.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Trindade, Alessandro Bezerra. "Aplicando verificação de modelos baseada nas teorias do módulo da satisfabilidade para o particionamento de hardware/software em sistemas embarcados." Universidade Federal do Amazonas, 2015. http://tede.ufam.edu.br/handle/tede/4091.

Full text
Abstract:
When performing hardware/software co-design for embedded systems, the problem arises of deciding which functions of the system should be implemented in hardware (HW) and which in software (SW). This problem is known as HW/SW partitioning, and in the last ten years a significant research effort has been carried out in this area. In this work, we present two new approaches to solving the HW/SW partitioning problem using SMT-based verification techniques, comparing the results with the traditional technique of Integer Linear Programming (ILP) and a modern optimization method based on Genetic Algorithms (GA). The goal is to show with experimental results that model checking techniques can be effective, in particular cases, at finding the optimal solution of the HW/SW partitioning problem using a state-of-the-art model checker based on Satisfiability Modulo Theories (SMT) solvers, when compared to the traditional techniques.
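The partitioning problem itself can be stated as a small combinatorial optimization. The sketch below brute-forces a toy instance (the function names, costs, and single time budget are invented for illustration; the dissertation instead encodes such constraints symbolically for an SMT-based model checker, an ILP solver, or a GA):

```python
from itertools import product

# Hypothetical per-function costs: (hw_area, sw_time). All numbers are made up.
funcs = {"fft": (40, 90), "fir": (25, 60), "crc": (10, 15), "ui": (5, 5)}
SW_TIME_BUDGET = 70   # total software execution time must not exceed this

best = None
names = list(funcs)
for bits in product([0, 1], repeat=len(funcs)):   # 1 = map function to hardware
    area = sum(funcs[n][0] for n, b in zip(names, bits) if b)
    time = sum(funcs[n][1] for n, b in zip(names, bits) if not b)
    if time <= SW_TIME_BUDGET and (best is None or area < best[0]):
        best = (area, {n: ("HW" if b else "SW") for n, b in zip(names, bits)})

print(best)  # minimum hardware area meeting the software time budget
```

An exhaustive search like this is only viable for a handful of functions; solver-based formulations scale to realistic partitioning instances.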
APA, Harvard, Vancouver, ISO, and other styles
5

Bahri, Imen. "Contribution des systèmes sur puce basés sur FPGA pour les applications embarquées d’entraînement électrique." Thesis, Cergy-Pontoise, 2011. http://www.theses.fr/2011CERG0529/document.

Full text
Abstract:
Designing embedded control systems becomes increasingly complex due to growing algorithm complexity, rising industrial requirements, and the nature of the application domains. One way to handle this complexity is to design the corresponding controllers on powerful, open digital platforms. More specifically, this PhD deals with the use of FPGA System-on-Chip (SoC) platforms for the implementation of complex AC drive controllers for avionic applications. The latter are characterized by stringent technical constraints such as environmental conditions (pressure, high temperature) and high performance requirements (high integration, flexibility, and efficiency). During this thesis, the author contributed to the design and test of a digital controller for a high-temperature synchronous drive that must operate at 200°C ambient. It consists of a Flux Oriented Controller (FOC) for a Permanent Magnet Synchronous Machine (PMSM) associated with a resolver sensor. A design and validation method was proposed and tested using an FPGA ProAsicPlus board from Actel/Microsemi. The impact of temperature on the operating frequency was also analyzed. A state of the art of FPGA SoC technology is also presented, including a detailed description of recent digital platforms and the constraints of embedded applications, establishing the interest of a SoC-based approach for AC drive applications. Additionally, to take full advantage of a SoC-based approach, an appropriate HW-SW co-design methodology for electrical AC drives is proposed. This method covers the whole development flow of the control application, from the specifications to the final experimental validation. One of the most important steps of this method is HW-SW partitioning: the goal is to find an optimal combination between the modules to be implemented in software and those to be implemented in hardware.
This multi-objective optimization problem was solved with the Non-Dominated Sorting Genetic Algorithm (NSGA-II), from which a Pareto front of optimal solutions can be deduced. The proposed co-design methodology is illustrated with a sensorless speed controller using the Extended Kalman Filter (EKF); this benchmark corresponds to a major trend in embedded control of AC drives. Besides, the SoC-based architecture of the embedded controller is managed using an efficient Real-Time Operating System (RTOS). To accelerate the services of this operating system, a Real-Time Unit (RTU) was developed in VHDL and associated with the RTOS. It is a hardware operating system that moves the scheduling and communication processes from the software RTOS to hardware, yielding a significant acceleration. Experimental tests based on a digital current controller were carried out using a laboratory set-up. The obtained results prove the interest of the proposed approach.
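The core of the NSGA-II step mentioned above is non-dominated (Pareto) sorting. A minimal sketch of extracting the Pareto front from candidate partitions, with invented (area, time) pairs as the two objectives to minimize:

```python
def pareto_front(points):
    """Return the non-dominated points when minimizing both objectives:
    a point is kept unless some other point is <= in both coordinates."""
    return [p for p in points
            if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)]

# (hardware_area, execution_time) for candidate HW/SW partitions (toy values)
candidates = [(10, 90), (20, 70), (30, 75), (40, 40), (50, 45), (60, 30)]
print(pareto_front(candidates))  # trade-off curve between area and time
```

NSGA-II additionally ranks the dominated points into successive fronts and uses crowding distance to keep the population diverse; this sketch shows only the first front.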
APA, Harvard, Vancouver, ISO, and other styles
6

Zhang, Yuanzhi. "Algorithms and Hardware Co-Design of HEVC Intra Encoders." OpenSIUC, 2019. https://opensiuc.lib.siu.edu/dissertations/1769.

Full text
Abstract:
Digital video has become extremely important, and its importance has greatly increased over the last two decades. Due to the rapid development of information and communication technologies, the demand for Ultra-High Definition (UHD) video applications is growing. However, the most prevalent video compression standard, H.264/AVC, released in 2003, is inefficient for UHD videos. The desire for compression efficiency superior to H.264/AVC led to the standardization of High Efficiency Video Coding (HEVC). Compared with the H.264/AVC standard, HEVC offers double the compression ratio at the same level of video quality, or a substantial improvement in video quality at the same bitrate. Yet, although HEVC/H.265 possesses superior compression efficiency, its complexity is several times that of H.264/AVC, impeding high-throughput implementation. Currently, most research has focused merely on algorithm-level adaptations of the HEVC/H.265 standard to reduce computational intensity without considering hardware feasibility; moreover, the exploration of efficient hardware architectures is not exhaustive, and only a few works have studied efficient hardware architectures for the HEVC/H.265 standard. In this dissertation, we investigate efficient algorithm adaptations and hardware architecture design of HEVC intra encoders. We also explore a deep learning approach to mode prediction. From the algorithm point of view, we propose three efficient hardware-oriented algorithm adaptations: mode reduction, fast coding unit (CU) cost estimation, and group-based CABAC (context-adaptive binary arithmetic coding) rate estimation. Mode reduction aims to reduce the mode candidates of each prediction unit (PU) in the rate-distortion optimization (RDO) process, which is both computation-intensive and time-consuming.
Fast CU cost estimation is applied to reduce the complexity of the rate-distortion (RD) calculation for each CU. Group-based CABAC rate estimation is proposed to parallelize syntax-element processing and greatly improve rate-estimation throughput. From the hardware design perspective, a fully parallel hardware architecture of an HEVC intra encoder is developed to sustain UHD video compression at 4K@30fps. The fully parallel architecture introduces four prediction engines (PEs), each of which independently performs the full cycle of mode prediction, transform, quantization, inverse quantization, inverse transform, reconstruction, and rate-distortion estimation. PU blocks of different sizes are processed by different prediction engines simultaneously. Also, an efficient hardware implementation of a group-based CABAC rate estimator is incorporated into the proposed HEVC intra encoder for accurate and high-throughput rate estimation. To take advantage of deep learning, we also propose a fully-connected-layer-based neural network (FCLNN) mode preselection scheme to reduce the number of RDO modes for luma prediction blocks. All angular prediction modes are classified into 7 prediction groups, each containing 3 to 5 prediction modes that exhibit a similar prediction angle. A rough angle detection algorithm determines the prediction direction of the current block, and then a small-scale FCLNN refines the mode prediction.
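As a toy illustration of mode grouping and rough angle detection (the group boundaries, gradient estimate, and block size below are assumptions made for the sketch, not the thesis's actual scheme):

```python
import math

def mode_groups():
    """Split HEVC's 33 angular intra modes (numbered 2..34) into 7
    contiguous groups; the sizes/boundaries are assumed for this toy."""
    modes, groups, sizes = list(range(2, 35)), [], [5, 5, 5, 5, 5, 4, 4]
    i = 0
    for s in sizes:
        groups.append(modes[i:i + s])
        i += s
    return groups

def rough_group(block):
    """Pick one of the 7 groups from the dominant gradient direction of a
    tiny 2x2 luma block (a crude stand-in for rough angle detection)."""
    gx = sum(row[1] - row[0] for row in block)           # horizontal gradient
    gy = sum(b - a for a, b in zip(block[0], block[1]))  # vertical gradient
    angle = math.atan2(gy, gx) % math.pi                 # direction in [0, pi)
    return int(angle / math.pi * 7) % 7                  # quantize into 7 bins

groups = mode_groups()
print(len(groups), [len(g) for g in groups])
```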
APA, Harvard, Vancouver, ISO, and other styles
7

Marques, Vítor Manuel dos Santos. "Performance of hardware and software sorting algorithms implemented in a SOC." Master's thesis, Universidade de Aveiro, 2017. http://hdl.handle.net/10773/23467.

Full text
Abstract:
Master's degree in Computer and Telematics Engineering
Field Programmable Gate Arrays (FPGAs) were invented by Xilinx in 1985. Their reconfigurable nature allows them to be used in multiple areas of information technology. This project studies this technology as an alternative to traditional data processing methods, namely sorting. The proposed solution is based on the principle of reusing resources to counter this technology's known resource limitations.
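One classic resource-reuse pattern for hardware sorting is odd-even transposition sort: a single rank of compare-exchange units can be instantiated once and iterated n times, rather than unrolling a full sorting network. A behavioral Python sketch (the thesis's actual sorter design may differ):

```python
def odd_even_transposition_sort(values):
    """Sort by repeating one compare-exchange stage n times. On an FPGA the
    inner loop maps to a single rank of parallel comparators that is reused
    each pass, trading latency for area."""
    a = list(values)
    n = len(a)
    for phase in range(n):
        start = phase % 2               # alternate even/odd comparator pairs
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a

print(odd_even_transposition_sort([7, 3, 9, 1, 4]))  # [1, 3, 4, 7, 9]
```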
APA, Harvard, Vancouver, ISO, and other styles
8

Jiang, Zhewei. "Algorithm and Hardware Co-Design for Local/Edge Computing." Thesis, 2020. https://doi.org/10.7916/d8-nxwg-f771.

Full text
Abstract:
Advances in VLSI manufacturing and design technology over the decades have created many computing paradigms for disparate computing needs. With concerns about the transmission cost, security, and latency of centralized computing, edge/local computing is increasingly prevalent in fast-growing sectors like the Internet of Things (IoT) and in sectors that require energy/connectivity-autonomous systems, such as biomedical and industrial applications. Energy and power efficiency are the main design constraints in local and edge computing. While a wide range of low-power design techniques exists, they are often underutilized in custom circuit designs because the algorithms are developed independently of the hardware. Such a compartmentalized design approach fails to take advantage of the many compatible algorithmic and hardware techniques that can improve the efficiency of the entire system. Algorithm-hardware co-design explores the design space with whole-stack awareness. The main goal of the algorithm-hardware co-design methodology is the enablement and improvement of small-form-factor edge and local VLSI systems operating under strict area and energy-efficiency constraints. This thesis presents selected works of application-specific digital and mixed-signal integrated circuit design, ranging from implantable biomedical devices to edge machine learning acceleration.
APA, Harvard, Vancouver, ISO, and other styles
9

"Algorithm and Hardware Co-design for Learning On-a-chip." Doctoral diss., 2017. http://hdl.handle.net/2286/R.I.45949.

Full text
Abstract:
Machine learning technology has made incredible achievements in recent years. It has rivalled or exceeded human performance in many intellectual tasks, including image recognition, face detection, and the game of Go. Many machine learning algorithms require huge amounts of computation, such as the multiplication of large matrices. As silicon technology has scaled to the sub-14nm regime, simply scaling down the device can no longer provide enough speed-up; new device technologies and system architectures are needed to improve computing capacity. Designing specific hardware for machine learning is in high demand, and effort must be put into the joint design and optimization of both hardware and algorithms. For machine learning acceleration, traditional SRAM- and DRAM-based systems suffer from low capacity, high latency, and high standby power. Instead, emerging memories, such as Phase Change Random Access Memory (PRAM), Spin-Transfer Torque Magnetic Random Access Memory (STT-MRAM), and Resistive Random Access Memory (RRAM), are promising candidates providing low standby power, high data density, fast access, and excellent scalability. This dissertation proposes a hierarchical memory modeling framework and models PRAM and STT-MRAM at four different levels of abstraction. With the proposed models, various simulations are conducted to investigate performance, optimization, variability, reliability, and scalability. Emerging memory devices such as RRAM can work as a 2-D crosspoint array to speed up the multiplication and accumulation in machine learning algorithms. This dissertation proposes a new parallel programming scheme to achieve in-memory learning with an RRAM crosspoint array. The programming circuitry is designed and simulated in TSMC 65nm technology, showing a 900X speedup for the dictionary learning task compared to CPU performance.
From the algorithm perspective, inspired by the high accuracy and low power of the brain, this dissertation proposes a bio-plausible feedforward inhibition spiking neural network with a Spike-Rate-Dependent Plasticity (SRDP) learning rule. It achieves more than 95% accuracy on the MNIST dataset, which is comparable to the sparse coding algorithm but requires far fewer computations. The role of inhibition in this network is systematically studied and shown to improve hardware efficiency in learning.
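The multiply-accumulate a crosspoint array performs is just Ohm's law plus Kirchhoff's current law: each column current is the dot product of the row voltages with that column's conductances. A behavioral sketch with invented conductance and voltage values:

```python
def crosspoint_mvm(G, v):
    """Column currents of a crosspoint array: I[j] = sum_i G[i][j] * V[i]
    (Ohm's law per cell, Kirchhoff's current law summing down each column)."""
    rows, cols = len(G), len(G[0])
    return [sum(G[i][j] * v[i] for i in range(rows)) for j in range(cols)]

G = [[1.0, 0.5],   # cell conductances (toy values)
     [2.0, 1.5]]
v = [0.3, 0.1]     # word-line read voltages (toy values)
currents = crosspoint_mvm(G, v)
print(currents)
```

The physics evaluates the whole matrix-vector product in one read step, which is why such arrays accelerate the inner loop of learning algorithms.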
Doctoral dissertation, Electrical Engineering, 2017.
APA, Harvard, Vancouver, ISO, and other styles
10

Lin, Yin-Hsin (林殷旭). "Hardware-Software Co-design of an Automatic White Balance Algorithm." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/b4636z.

Full text
Abstract:
Master's thesis
National Taipei University of Technology
Institute of Computer and Communications
Academic year 94 (2005-2006)
As electronic techniques improve rapidly, cameras and video camcorders used for image capture have become digital. The colors of a photograph can look very different depending on the illumination of the light source when the picture is taken. Human eyes automatically adjust to colors as the illumination of the light source varies; however, the most frequently used image sensor, the charge-coupled device (CCD), cannot correct color the way human eyes do. This thesis presents a hardware-software co-design method based on Lam's automatic white balance algorithm, which combines the gray world assumption and the perfect reflector assumption. The execution of Lam's algorithm was divided into three stages, and a hardware-software co-design and analysis was carried out for each stage. Three factors, processing time and the slices and DSP48s of hardware resources, were used to formulate an objective function, which was employed to evaluate system performance and hardware resource cost. Experimental results show that suitable hardware-software partitions were achieved. An embedded processor, Xilinx's MicroBlaze, together with a floating-point unit was used for the software part of the algorithm, while the hardware part was implemented using an IP-based method. Such a system-on-a-programmable-chip architecture reduces the memory and CPU load on the PC and offers easy modification and function expansion.
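The gray world assumption, one half of Lam's combined method, scales each color channel so the channel means coincide. A minimal sketch (the pixel values and the clipping policy are illustrative assumptions, and the perfect-reflector half is omitted):

```python
def gray_world(pixels):
    """Gray-world white balance: scale each RGB channel so all channel
    means equal the overall gray level. `pixels` is a list of (r, g, b)."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3                       # target level for every channel
    gains = [gray / m if m else 1.0 for m in means]
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in pixels]

# A reddish-cast toy image: the red mean is twice the green/blue means.
img = [(200, 100, 100), (100, 50, 50)]
balanced = gray_world(img)
print(balanced)  # each pixel becomes gray, since the cast was uniform
```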
APA, Harvard, Vancouver, ISO, and other styles
11

Chundi, Pavan Kumar. "Algorithm Hardware Co-Design of Neural Networks for Always-On Devices." Thesis, 2021. https://doi.org/10.7916/d8-xb06-4658.

Full text
Abstract:
Deep learning has become the algorithm of choice in many applications like face recognition, object detection, speech recognition, etc. because of superior accuracy. Large models with several parameters were developed to obtain higher accuracy, which eventually gave diminishing returns at very large training and deployment cost. Consequently, greater attention is now being paid to the efficiency of neural networks. Low power consumption is particularly important in the case of always-on applications. Some examples of these applications are the datacenters, cellular base stations, battery-powered devices like implantable devices, wearables, cell phones and UAVs. Improvement in the efficiency of these devices by reducing the power consumed will bring down the energy cost or extend the battery life or decrease the form factor of these devices, thereby improving the acceptability and adoption of the device. Neural networks are a significant component of the total workload in the case of IoT devices with smart functions and datacenters. Base stations can also employ neural networks to improve the rate of convergence in channel estimation. Efficient execution of the neural networks on always-on devices, therefore, helps in lowering the overall power dissipation. Algorithm only solutions target CPU or GPU as a platform and tend to focus on the number of computing operations. Hardware only solutions tend to focus on programmability, low voltage operation, standby power reduction and on-chip data movement. Such solutions fail to take advantage of the joint optimization of both algorithm and hardware for the target application. This thesis contributes to improving the efficiency of neural networks on always-on devices through both algorithmic and hardware interventions. It presents works of algorithm-hardware co-design which can obtain better power reduction in the case of a smart IoT device, a datacenter and a small cell base station. 
It achieves power reduction through a combination of appropriate neural network algorithm and architecture, simpler operations and a reduction in the number of off-chip memory accesses.
APA, Harvard, Vancouver, ISO, and other styles
12

Jr-Shiang Peng and 彭志祥. "Hardware and Software Co-design of Silicon Intellectual Property Module Based on Sequential Minimal Optimization algorithm for Speaker Recognition." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/72913970118404970293.

Full text
Abstract:
Master's thesis
National Cheng Kung University
Department of Electrical Engineering (Master's and Doctoral Program)
98
This thesis proposes a hardware/software co-design IP for an embedded text-independent speaker recognition system, aiming to make daily life more convenient through portable speech applications. In the hardware part, the Sequential Minimal Optimization (SMO) algorithm is adopted to accelerate SVM training for creating speaker models. In the software part, we modify our lab's previous fixed-point arithmetic design for both the Linear Prediction Cepstral Coefficients (LPCC) and the one-vs-one highest-voting analysis algorithm. Two schemes, a heuristic selection method and an efficient cache utilization method, are proposed to implement the SMO algorithm in hardware and decrease the training time. Moreover, a specific design is proposed to efficiently utilize the bus bandwidth, reducing the data transfer time between software and hardware by about 5%. Finally, our simulation/emulation results show that 90% of the training time is eliminated while the recognition accuracy reaches 92.7%.
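The pairwise multiplier update at the core of SMO can be sketched in software as follows. This is a simplified floating-point variant with a deterministic second-index heuristic; the function name and all parameter values are illustrative, not the thesis's fixed-point hardware design:

```python
import numpy as np

def smo_train(X, y, C=1.0, tol=1e-4, max_passes=20):
    """Simplified SMO: analytically optimize pairs of Lagrange multipliers."""
    n = len(y)
    alpha, b = np.zeros(n), 0.0
    K = X @ X.T                      # linear kernel, precomputed

    def f(i):
        return (alpha * y) @ K[:, i] + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            Ei = f(i) - y[i]
            # KKT violation check for the first multiplier
            if (y[i] * Ei < -tol and alpha[i] < C) or (y[i] * Ei > tol and alpha[i] > 0):
                j = (i + 1) % n       # second-choice heuristic (deterministic here)
                Ej = f(j) - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                if L == H:
                    continue
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if eta >= 0:
                    continue
                # Clipped analytic update of alpha[j], then alpha[i] to keep the constraint
                alpha[j] = np.clip(aj - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-5:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                b1 = b - Ei - y[i] * (alpha[i] - ai) * K[i, i] - y[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai) * K[i, j] - y[j] * (alpha[j] - aj) * K[j, j]
                b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b
```

The two hardware schemes mentioned in the abstract (heuristic selection and cache utilization) target exactly the inner loop above: the choice of the pair (i, j) and the reuse of kernel rows K[:, i].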
APA, Harvard, Vancouver, ISO, and other styles
13

Hsiao, Chin-Mu, and 蕭金木. "Hardware/Software Co-design of AES Algorithms Using Custom Instructions." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/35389142457501490628.

Full text
Abstract:
Master's thesis
Fu Jen Catholic University
Department of Electronic Engineering
96
The Advanced Encryption Standard (AES) is the encryption standard appointed by NIST. To shorten the encryption/decryption time for large amounts of data, it is necessary to implement the algorithm in hardware; on the other hand, the requirement for low cost can be met by using software only. How to strike a balance between the cost and efficiency of software and hardware implementations is a question worth discussing. In this thesis, we implemented the AES encryption algorithm in hardware combined with software using the custom instruction mechanism provided by the Altera Nios II platform. We completed a parameterized synthesizable design: given a parameter setting, our system generates the hardware design and the necessary software/hardware interface automatically. We explored various combinations of hardware and software for realizing the AES algorithm and discussed the best solutions for different needs.
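One bit-level primitive that custom-instruction partitions of AES commonly accelerate is the GF(2^8) arithmetic inside MixColumns. A minimal software sketch of it, assuming the standard AES reduction polynomial (values follow FIPS-197; this illustrates the operation, not the thesis's actual hardware/software split):

```python
def xtime(b):
    """Multiply a byte by x (i.e., by 0x02) in GF(2^8), AES polynomial 0x11B."""
    b <<= 1
    return (b ^ 0x1B) & 0xFF if b & 0x100 else b

def gf_mul(a, b):
    """GF(2^8) multiplication by shift-and-add, built from repeated xtime."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r
```

In a Nios II style custom instruction, an operation like `gf_mul` (or a whole MixColumns column) becomes a single-cycle datapath, while the surrounding key schedule and control flow stay in software.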
APA, Harvard, Vancouver, ISO, and other styles
14

Weng, Chih-hsien, and 翁智賢. "Hardware/Software Co-design and Implementation of Algorithmic Processors for Image Processing." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/96720386726092132758.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
96
This thesis is related to the hardware/software co-design and verification of algorithmic processors for image processing. The research work includes four parts. The first part is about software design of image processing algorithms such as center and size finding, translation, scaling, rotation, and projection. The second part is to design and implement hardware processors for the algorithms mentioned above. The third part is to write the related drivers to integrate the algorithmic processors and the verification system together. The fourth part is about the verification and performance testing of the related algorithmic processors. On the whole, the goal of this thesis is to design and develop various algorithmic processors for image processing. Meanwhile, a hardware/software co-design method is presented to improve the efficiency of both the design and verification flows.
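Of the operations listed, center and size finding is the simplest to illustrate: it reduces to a centroid and bounding-box computation over the foreground of a binary image. A hypothetical sketch (not the thesis's processor design):

```python
import numpy as np

def center_and_size(img):
    """Centroid (cy, cx) and bounding-box size (h, w) of a binary image's foreground."""
    ys, xs = np.nonzero(img)                     # coordinates of foreground pixels
    center = (ys.mean(), xs.mean())              # centroid
    size = (ys.max() - ys.min() + 1,             # bounding-box height
            xs.max() - xs.min() + 1)             # bounding-box width
    return center, size
```

A hardware version would typically accumulate the coordinate sums and min/max registers in a single raster scan rather than materializing the coordinate lists.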
APA, Harvard, Vancouver, ISO, and other styles
15

Hsu, Chih-hao, and 許志豪. "Hardware/Software Co-design and Implementation of an Algorithmic Processor for Image Binarization." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/94852945422796097338.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
97
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for image binarization. The research work includes four parts. The first part is about software design of various binarization algorithms for digital images. After analyzing the advantages and disadvantages of these algorithms, the modified Sauvola algorithm is chosen for hardware implementation. The second part is to design and implement a hardware processor for the modified Sauvola algorithm. In order to enhance the data transfer performance, a 2-D DMA controller has also been designed. Finally, the algorithmic processor and the 2-D DMA controller are integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor. Then the function of the algorithmic processor is verified using an RPC-based verification system. The fourth part is about the verification and evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to do research on the development of a binarization algorithm for digital images. Then the related algorithmic processor is developed and implemented on the FPGA development board. After being verified using various digital images, the algorithm developed in this thesis has shown very good performance for image binarization. Meanwhile, it also shows that the hardware/software co-design method presented can improve the efficiency of both the design and verification flows.
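The abstract does not describe the modification, but the standard Sauvola rule thresholds each pixel at T = m · (1 + k · (s/R − 1)), where m and s are the local window mean and standard deviation, k is a sensitivity parameter, and R is the dynamic range of the standard deviation. A naive software sketch, with illustrative parameter values (a hardware processor would compute the window statistics incrementally, e.g. with integral images):

```python
import numpy as np

def sauvola_binarize(img, w=15, k=0.2, R=128):
    """Naive windowed Sauvola binarization: 255 = background, 0 = foreground."""
    img = np.asarray(img, dtype=float)
    H, W = img.shape
    out = np.zeros((H, W), dtype=np.uint8)
    r = w // 2
    for y in range(H):
        for x in range(W):
            win = img[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
            m, s = win.mean(), win.std()
            T = m * (1 + k * (s / R - 1))        # Sauvola threshold
            out[y, x] = 255 if img[y, x] > T else 0
    return out
```

The per-pixel window statistics make this O(H·W·w²) in software, which is exactly why dedicated hardware with streaming window buffers pays off.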
APA, Harvard, Vancouver, ISO, and other styles
16

Huang, Uao-Shine, and 黃耀陞. "Hardware/Software Co-design and Implementation of an Algorithmic Processor for Document Image Rotation." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/39920568792275923898.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
98
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for binary document image rotation. The research work includes four parts: The first part is about software design of the rotation algorithm for binary document images. After analyzing the advantages and disadvantages of existing algorithms and considering the limited resources of the embedded hardware, a window-based rotation algorithm which uses inverse mapping and linear interpolation has been developed. The second part is to design and implement an algorithmic processor for the window-based rotation algorithm mentioned above. It stores full binary document images in DDR SDRAM; the processor therefore consists of a reference-region fetch unit, a rotation-interpolation unit, a destination-data store unit, and a DDR SDRAM controller. Finally, the above hardware modules are integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor. Then the function of the algorithmic processor is verified using an RPC-based verification system. The fourth part is about the verification and evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to do research on the development of a rotation algorithm for binary document images. Then the related algorithmic processor is developed and implemented on the FPGA development board. After being verified using various images and rotation angles, the algorithm developed in this thesis has shown very good performance for binary document image rotation. Meanwhile, it also shows that the hardware/software co-design method presented can improve the efficiency of both the design and verification flows.
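Inverse mapping with interpolation, as described, computes each destination pixel by rotating its coordinates back into the source image and interpolating the four surrounding samples. A minimal whole-image sketch of that idea (the thesis's design works window by window out of DDR SDRAM; this is only an illustration of the mapping):

```python
import numpy as np

def rotate_binary(img, deg):
    """Rotate a binary (0/255) image about its center via inverse mapping
    with bilinear interpolation, then re-threshold to keep the output binary."""
    H, W = img.shape
    out = np.zeros_like(img)
    t = np.deg2rad(deg)
    cy, cx = (H - 1) / 2, (W - 1) / 2
    for y in range(H):
        for x in range(W):
            # Inverse map: destination (x, y) -> source (xs, ys)
            xs = np.cos(t) * (x - cx) + np.sin(t) * (y - cy) + cx
            ys = -np.sin(t) * (x - cx) + np.cos(t) * (y - cy) + cy
            x0, y0 = int(np.floor(xs)), int(np.floor(ys))
            if 0 <= x0 < W - 1 and 0 <= y0 < H - 1:
                fx, fy = xs - x0, ys - y0
                # Bilinear interpolation of the four neighbours
                v = (img[y0, x0] * (1 - fx) * (1 - fy)
                     + img[y0, x0 + 1] * fx * (1 - fy)
                     + img[y0 + 1, x0] * (1 - fx) * fy
                     + img[y0 + 1, x0 + 1] * fx * fy)
                out[y, x] = 255 if v >= 128 else 0
    return out
```

Because the mapping is inverse (destination-driven), every output pixel is written exactly once, which is what makes the reference-region fetch unit's access pattern predictable.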
APA, Harvard, Vancouver, ISO, and other styles
17

Lin, Yi-hsien, and 林奕諴. "Hardware/Software Co-design and Implementation of an Algorithmic Processor for Document Skew Detection." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/36422924001553221768.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
98
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for skew detection. The research work includes four parts. The first part is about software design of various skew detection algorithms for binary document images. After analyzing the advantages and disadvantages of these algorithms, the MICC-Projection algorithm is developed to improve the correctness of skew detection. The second part is to design and implement an algorithmic processor for the MICC-Projection algorithm, which consists of MICC and projection sub-processors. The processor is integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor. Then the function of the algorithmic processor is verified using an RPC-based verification system. The fourth part is about the verification and evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to do research on the development of a skew detection algorithm for binary document images. Then the related algorithmic processor is developed and implemented on the FPGA development board. After being verified using various binary document images, the algorithm developed in this thesis has shown very good performance for skew detection. Meanwhile, it also shows that the hardware/software co-design method presented can improve the efficiency of both the design and verification flows.
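The MICC-Projection algorithm itself is not spelled out in the abstract, but projection-based skew detection generally scores each trial angle by how sharply peaked the resulting projection profile is: at the true skew angle, text lines collapse into a few dense bins. A hedged sketch of that general idea (function names and the variance criterion are illustrative):

```python
import numpy as np

def projection_score(img, deg):
    """Variance of the horizontal projection profile taken at a trial angle."""
    ys, xs = np.nonzero(img)
    # Shear each foreground pixel's row index by the trial angle, then histogram
    bins = np.round(ys - xs * np.tan(np.deg2rad(deg))).astype(int)
    profile = np.bincount(bins - bins.min())
    return profile.var()          # peaked profile -> high variance

def detect_skew(img, angles):
    """Pick the trial angle whose projection profile is most sharply peaked."""
    return max(angles, key=lambda a: projection_score(img, a))
```

In hardware, the per-angle histograms map naturally onto a bank of accumulators fed by one raster scan of the image.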
APA, Harvard, Vancouver, ISO, and other styles
18

Huang, Yin-hsiu, and 黃寅修. "Hardware/Software Co-design and Implementation of Algorithmic Processors for Boundary and Corner Detection." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/41126442183628616181.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
96
This thesis is related to the hardware/software co-design and verification of algorithmic processors for digital image processing. The research work includes three parts. The first part is about using a Linux personal computer system to design and verify the software for the boundary and corner detection algorithms. Here, boundary detection means marking the boundary points in a binary digital image, and corner detection means separating boundary points into several classes of features (i.e., concave, convex, and straight-line points) using operations such as path finding, computing the cosine value of a corner, and corner classification. The second part is about the design of the hardware and the software/hardware interface for the boundary and corner detection algorithmic processors. In this work, the processor hardware is implemented on an Altera FPGA development board, and the software/hardware interface is designed according to the Nios II CPU bus standard. The third part is to use a well-developed RPC-based embedded system for the verification and performance testing of the related algorithmic processors. On the whole, the goal of this thesis is to design and develop prototypes for the boundary and corner detection algorithmic processors. Meanwhile, a hardware/software co-design method is presented to improve the efficiency of both the design and verification flows.
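The two core operations described above can be illustrated as follows: a boundary test (a foreground pixel with at least one background 4-neighbour) and the cosine of the angle a boundary point forms with boundary points a few steps away along the path. This is a sketch of those definitions, not the processors' implementation:

```python
import numpy as np

def is_boundary(img, y, x):
    """A foreground pixel is a boundary point if any 4-neighbour is background
    (pixels on the image border count as having background neighbours)."""
    H, W = img.shape
    if not img[y, x]:
        return False
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        ny, nx = y + dy, x + dx
        if ny < 0 or ny >= H or nx < 0 or nx >= W or not img[ny, nx]:
            return True
    return False

def corner_cosine(p_prev, p, p_next):
    """Cosine of the angle at p formed by the vectors to p_prev and p_next.
    Near -1: straight-line point; near 0 or above: sharp corner."""
    a = np.asarray(p_prev, float) - np.asarray(p, float)
    b = np.asarray(p_next, float) - np.asarray(p, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Classifying a point as concave or convex additionally needs the sign of the turn (e.g. a cross product), which the cosine alone does not give.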
APA, Harvard, Vancouver, ISO, and other styles
19

Huang, Jiang-Shiuan, and 黃健軒. "Hardware/Software Co-design and Implementation of a Two-stage Algorithmic Processor for Hough-Transform-based Line Detection." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/chw2e4.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
99
This thesis is related to the hardware/software co-design and verification of an algorithmic processor for an HT-based (Hough-Transform-based) two-stage line detection algorithm. The related research work includes four parts: The first part is about software design of the HT-based line detection algorithm for binary images. After analyzing the properties of the HT-based algorithm and considering the limited hardware resources in the embedded system, a two-stage HT-based algorithm for line detection has been developed. The second part is to design and implement a two-stage algorithmic processor for HT-based line detection. SDRAM is used to store the whole binary images. The processor therefore consists of a source data fetching sub-processor, a Hough transform sub-processor, and a local max finding sub-processor. Finally, the above hardware modules are integrated into an SOPC-based system and implemented on an Altera FPGA development board. The third part is to write the related drivers for the algorithmic processor. Then the function of the algorithmic processor is verified using an RPC-based verification system. The fourth part is about the verification and the evaluation of the run-time performance of the algorithmic processor. On the whole, the goal of this thesis is to do research on the development of an HT-based two-stage line detection algorithm and its hardware processor. Then the related algorithmic processor is developed and implemented on the FPGA development board. After being verified using various images, the algorithm developed in this thesis has shown very good performance. Meanwhile, it also shows that the hardware/software co-design method presented can improve the efficiency of both the design and verification flows.
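The voting stage handled by the Hough transform sub-processor can be sketched in software: every foreground pixel votes, for each trial angle θ, into a (ρ, θ) accumulator where ρ = x·cos θ + y·sin θ. Accumulator layout and resolution here are illustrative, not the processor's:

```python
import numpy as np

def hough_accumulate(img, n_theta=180):
    """Vote each foreground pixel of a binary image into a (rho, theta) accumulator.
    Rows index rho (offset by `diag` so negative rho fits); columns index theta in degrees."""
    H, W = img.shape
    diag = int(np.ceil(np.hypot(H, W)))           # bound on |rho|
    thetas = np.deg2rad(np.arange(n_theta))
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    for y, x in zip(*np.nonzero(img)):
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, np.arange(n_theta)] += 1  # one vote per trial angle
    return acc, diag
```

The "two-stage" structure in the abstract plausibly refers to splitting this voting pass from the subsequent local-maximum search over the accumulator, which is what the local max finding sub-processor performs.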
APA, Harvard, Vancouver, ISO, and other styles
20

Hu, Hong-Min, and 胡閎閔. "Hardware/Software Co-design and Implementation of a Temporal-Median-Filter-based Algorithmic Processing System for Background Subtraction." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/26408138785003942689.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
103
This thesis is relevant to the hardware/software co-design and implementation of a temporal-median-filter-based algorithmic processing system for background subtraction. The research work consists of the following four parts. The first part is related to the software design of the temporal-median-filter-based background subtraction algorithm. Through its image-based output results, this algorithm has demonstrated its superiority in various applications. The second part is to design and implement a temporal-median-filter-based algorithmic processor for background subtraction. This algorithmic processor comprises three sub-processors, which handle image information access, median finding, and background subtraction. Finally, all the parts mentioned above are integrated together and implemented on an Altera FPGA development board. The third part is related to the design and implementation of an algorithmic processing system which comprises SDRAM (for storing multiple complete images), the algorithmic processor described above, the Nios II CPU, and the related firmware. The functionality of this system is verified using the Nios II IDE. The fourth part is to analyze and evaluate the software, firmware, and hardware performance of the whole algorithmic processing system. On the whole, the goals of this thesis are to do research on a temporal-median-filter-based background subtraction algorithm and to design an algorithmic processing system for it on an Altera FPGA development board. After being verified with various kinds of digital images, the algorithmic processing system developed in this thesis has shown excellent computing performance, and the related hardware/software co-design method can also be used to improve the efficiency of the design and verification process for other algorithmic processing systems.
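Temporal-median background subtraction, as described, builds the background as a per-pixel median over a buffer of recent frames and thresholds the absolute difference against the current frame. A minimal sketch, with illustrative buffer length and threshold:

```python
import numpy as np

def subtract_background(frames, current, thresh=30):
    """Per-pixel temporal median over a frame buffer, then absolute-difference
    thresholding. Returns a 0/255 foreground mask."""
    bg = np.median(np.stack(frames).astype(np.int16), axis=0)   # robust to outlier frames
    mask = np.abs(current.astype(np.int16) - bg) > thresh
    return mask.astype(np.uint8) * 255
```

The median makes the background estimate robust to transient foreground objects, which is why the hardware spends a dedicated median-finding sub-processor on it.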
APA, Harvard, Vancouver, ISO, and other styles
21

Hsu, Bo-Hsiang, and 許博翔. "Hardware/Software Co-design and Implementation of a Multi-pixel-based Pipelined Algorithmic Processor for Single-pass-based Connected Component Labeling." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/uhe6rj.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electronic Engineering
100
This thesis is relevant to the hardware/software co-design and verification of an algorithmic processor for single-pass-based connected component labeling. The research work consists of the following four parts. The first part focuses on the software design of the connected component labeling algorithms. After analyzing the characteristics of the computing results and considering the limited physical resources of embedded systems, single-pass-based connected component labeling algorithms have been developed. The second part focuses on the hardware design for the single-pass-based connected component labeling algorithms. A DDR SDRAM is used to store the whole binary input image and the coordinate information of the bounding boxes of the labeled components. The algorithmic processor comprises four sub-processors: a table initializer, a labeler, a connected component combinator, and a connected component information retriever. Finally, these hardware designs are integrated together and implemented on an Altera FPGA development board. The third part focuses on writing the relevant drivers to construct a verification system for the algorithmic processor. Remote procedure calls are used to control this system and verify the functionality of the processor. The fourth part focuses on the verification and performance evaluation of the whole hardware and software of the algorithmic processor. Overall, the goal of this thesis is to do research on single-pass-based connected component labeling algorithms and to design and implement algorithmic processors for them on an Altera FPGA development board. After verifying the algorithmic processors with various types of digital images, it has been shown that the algorithmic processors developed in this thesis have excellent computing performance. Meanwhile, this approach to hardware/software co-design can also improve the efficiency of both the design and verification flows for algorithmic processors.
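Single-pass labeling with a merge table and per-component bounding boxes, as described, can be sketched as follows (4-connectivity; a software illustration of the labeler/combinator roles, not the sub-processor design):

```python
import numpy as np

def label_components(img):
    """Single raster pass: assign provisional labels, merge conflicts with a
    union-find table, and maintain per-component bounding boxes
    [min_y, min_x, max_y, max_x] that are merged on union."""
    H, W = img.shape
    labels = np.zeros((H, W), dtype=np.int32)
    parent, bbox = {}, {}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]       # path halving
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra
            b1, b2 = bbox[ra], bbox[rb]
            bbox[ra] = [min(b1[0], b2[0]), min(b1[1], b2[1]),
                        max(b1[2], b2[2]), max(b1[3], b2[3])]

    nxt = 1
    for y in range(H):
        for x in range(W):
            if not img[y, x]:
                continue
            left = labels[y, x - 1] if x > 0 else 0
            up = labels[y - 1, x] if y > 0 else 0
            if not left and not up:             # new provisional label
                parent[nxt], bbox[nxt] = nxt, [y, x, y, x]
                labels[y, x] = nxt
                nxt += 1
            else:
                if left and up:                 # two labels meet: merge them
                    union(left, up)
                labels[y, x] = find(left or up)
            r = find(labels[y, x])              # grow the component's bounding box
            bb = bbox[r]
            bbox[r] = [min(bb[0], y), min(bb[1], x), max(bb[2], y), max(bb[3], x)]
    roots = sorted({find(l) for l in parent})
    return labels, [bbox[r] for r in roots]
```

Because bounding boxes are merged at union time, the final boxes fall out of the single pass with no second relabeling sweep over the image, which is the key to the single-pass hardware structure.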
APA, Harvard, Vancouver, ISO, and other styles