Energy efficiency has become the limiting factor for current and future computing performance, affecting computing systems of all kinds, from mobile devices to datacenters. Meanwhile, modern applications continue to grow more complex and computationally expensive, and to rely on ever larger amounts of data. This presents a considerable challenge: how can we continue to improve our computational capabilities in spite of these limitations?

A key technique for improving energy efficiency and reaching high performance is hardware specialization. Recently, there has been much interest in using field-programmable gate arrays (FPGAs) as accelerators in general-purpose computing environments. Their fine-grained parallel structure lets them exploit the benefits of hardware-level customization while remaining reprogrammable.

However, the biggest obstacle limiting the growth of FPGAs is the difficulty of implementing algorithms in hardware and integrating this hardware into real-world computer systems. My research aims to address these difficulties by combining digital hardware design with compilers, tools, and domain-specific languages. More specifically, my work explores how computer-based tools can make digital hardware more efficient, how we can reduce the effort needed to design, optimize, and verify digital systems, and how these technologies can be exploited to address key challenges in modern computing.

Below you will find high-level descriptions of my current research and information about a few selected papers. For a full list of papers, please see my Publications page.

Accelerating Deep Learning and Computer Vision with FPGAs

Deep learning and convolutional neural networks (CNNs) have revolutionized machine learning, leading to recent advances in several areas such as natural language processing and computer vision, and attracting widespread interest from industry and academia. However, these advances come at a steep computational cost. The goal of this project is to enable the implementation of large-scale deep learning applications on a scalable parallel “cloud” of FPGAs by automating the translation of straightforward algorithmic specifications of deep learning problems into optimized hardware, parallelized across many interconnected FPGAs.

This work is funded by the National Science Foundation’s Exploiting Parallelism and Scalability (XPS) program through award 1533739.

Selected papers:

Domain-Specific Languages and Tools for Automatic Hardware Generation

To reduce the difficulty of implementing FPGA and ASIC accelerators, researchers have proposed a number of different types of automated systems. Some take the form of parameterized IP (intellectual property) cores: expert-designed implementations of a given problem that offer a small amount of flexibility through parameters. At the other end of the spectrum are tools such as “high-level synthesis” (HLS) that aim to convert C or C++ code directly into hardware. In practice, typical parameterized IPs are too restrictive, forcing designers into a “one-size-fits-all” approach; meanwhile, HLS is too open-ended: because it tries to work well for all problems, it is often difficult to obtain good solutions from it.

My work aims to address these problems through domain-specific hardware generation tools. These tools target a specific domain of problems (e.g., linear DSP transforms), providing enough flexibility to work well for a variety of problems within the domain while being targeted enough to produce very good results with little effort from the end user. One example is my work on the Spiral hardware generation framework, a domain-specific hardware generation tool for linear signal processing transforms such as the fast Fourier transform. This system uses a mathematical domain-specific language (DSL) to optimize hardware implementations of transform algorithms; its results are competitive with (and often more efficient than) hand-designed systems.
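
To give a flavor of what a mathematical DSL for transforms looks like, here is an illustrative sketch using the SPL-style matrix notation found in the published Spiral literature. It is not an excerpt from the framework. The Cooley-Tukey FFT for a DFT of size N = mn can be written as a factorization of the DFT matrix. In it, I_n is the n × n identity matrix, ⊗ is the Kronecker (tensor) product, T^{mn}_n is a diagonal matrix of "twiddle factors" (powers of ω_{mn}), and L^{mn}_m is a stride permutation that reorders the input:

```latex
\mathrm{DFT}_{mn} = (\mathrm{DFT}_m \otimes I_n)\, T^{mn}_n\, (I_m \otimes \mathrm{DFT}_n)\, L^{mn}_m,
\qquad
\mathrm{DFT}_N = \left[\omega_N^{k\ell}\right]_{0 \le k,\ell < N},
\quad \omega_N = e^{-2\pi i/N}
```

Read as hardware, a factor such as I_m ⊗ DFT_n applies the same size-n DFT to m independent blocks of data. It could be built as m parallel DFT_n units or as a single unit reused m times. Choosing among options like these, and among different ways of recursively factoring the transform, is one way a generator can span a range of cost/performance tradeoffs.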

My ongoing work aims to create a flexible framework for building domain-specific hardware generators, to improve their usability, and to use the results to study new application domains.

Selected papers:

See also the Spiral DFT/FFT hardware generator, which produces high-quality designs over a very wide tradeoff space, allowing users to choose the designs that best match their implementation-specific goals by balancing cost (power, energy, area) against performance (throughput, latency). The system produces cores that compare well with existing designs in the literature and in IP libraries, and it enables design points with better performance/cost than are otherwise available.

Hardware Accelerators for Datacenters and Networks

Datacenters (large-scale computing centers made up of large numbers of servers) have become ubiquitous in modern computing, but they are severely power constrained. Although typical datacenter applications are not traditional targets for hardware acceleration, these strict power limits have made FPGA acceleration an attractive option. However, such applications can be considerably challenging to accelerate with FPGAs. The goal of this work is to study how FPGAs can improve the efficiency and speed of large-scale datacenters and their applications.

This work is supported by the Semiconductor Research Corporation.

Low-Cost and Low-Power Hardware for Collaborative Spectrum Sensing

With the explosive growth in wireless communications, the RF spectrum is now more than ever an important but limited resource. However, monitoring the use of the RF spectrum in space and time, whether to patrol for unauthorized access or to exploit underutilized bandwidth, can be extremely difficult. The goal of this collaborative project is to bring “spectrum sensing to the masses” by studying how to design efficient low-power spectrum sensing hardware that can be distributed across a region of interest, and by pairing it with intelligent centralized algorithms that aggregate and interpolate the sensed data. Specifically, we are studying how automatic generation techniques can help create efficient hardware for sensing and detecting usage of the frequency spectrum. The aim is to produce a domain-specific hardware generation framework that will allow users to quickly create hardware designs adapted to different tradeoff scenarios.

This work is supported by the National Science Foundation EARS (“Enhancing Access to the Radio Spectrum”) program, under award 1642965.

Generating Domain-Specific Hardware Accelerators for Edge Computing

In the near future, edge devices (including smartphones, connected vehicles, road-side units with sensors and radios, and Internet-of-Things (IoT) devices) will be densely distributed and pervasively embedded in the world around us. Edge devices will sense and control our physical environments, processing sensor data to allow objects and computers to understand our surroundings. This new edge sensing/computing paradigm uses pervasively embedded sensing and computing resources, thus distributing data processing, decision-making, and intelligence throughout the environment. Edge computing devices pose significant challenges, often requiring high computational capability across many different types of algorithms (e.g., signal processing, feature detection, machine learning) within limited power budgets. For these reasons, FPGAs are an attractive platform for edge computing (especially for research in this area), but the difficulty of working with hardware dissuades many researchers. The goal of this recent project is to create a hardware/software platform for FPGA-based edge sensing and computing devices, with an accompanying domain-specific hardware generator that will allow practitioners to easily prototype edge computing systems with FPGAs.

This work is supported by the National Science Foundation under award 1730291.