Scalable DRAM Architecture
DRAM (Dynamic Random Access Memory) is the most widely used memory technology for the main memory of computer systems.
However, DRAM scaling faces challenging problems such as power consumption, latency, and reliability.
We study and propose new DRAM architectures to address these problems.
Near Memory Processing Architecture
For many data-intensive applications such as machine learning and image processing, the memory bottleneck, rather than a lack of processing capability, is the main cause of system performance degradation.
To overcome this bottleneck, NMP (Near Memory Processing), which executes some operations close to DRAM instead of on the CPU, is attracting a great deal of attention.
Our goal is to research and design a novel NMP architecture that improves the performance of memory-bound applications.
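To see why moving computation next to DRAM pays off for memory-bound kernels, the toy model below (our own illustration with assumed sizes, not a specific NMP design) compares the off-chip traffic of a large vector reduction executed on the CPU versus near memory.

```python
# Toy model of off-chip traffic for a memory-bound reduction (illustrative only).
# The sizes below are assumptions chosen for the example.

N = 1 << 28                 # number of 4-byte elements (~1 GiB of data)
ELEM_BYTES = 4

# Host execution: every element must be read across the memory channel into the CPU.
host_traffic = N * ELEM_BYTES

# Near-memory execution: the reduction runs beside DRAM; only the final scalar
# result plus a small command packet (assumed 64 bytes) crosses the channel.
nmp_traffic = ELEM_BYTES + 64

print(f"host traffic : {host_traffic / 2**20:.1f} MiB")
print(f"NMP traffic  : {nmp_traffic} bytes")
print(f"traffic reduction: {host_traffic / nmp_traffic:.0f}x")
```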
Hardware Security: RowHammer
As DRAM process technology has scaled down to approximately 10 nm, high-density cells have become increasingly susceptible to electromagnetic interference between neighboring cells.
These issues can be exploited by a circuit-level error mechanism called RowHammer: repeatedly activating one row can flip bits in adjacent rows, creating system security vulnerabilities.
We study and propose robust hardware-based protection methods against RowHammer attacks.
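As a minimal illustration of the kind of hardware mechanism involved (not our proposed method), the sketch below models a counter-based mitigation that tracks per-row activations and issues targeted refreshes to the neighboring victim rows once an assumed threshold is exceeded.

```python
from collections import defaultdict

# Minimal sketch of counter-based RowHammer mitigation (illustrative only;
# the threshold value and refresh action are assumptions, not our design).

HAMMER_THRESHOLD = 4096   # activations allowed per row within one refresh window

class ActivationTracker:
    def __init__(self, threshold=HAMMER_THRESHOLD):
        self.threshold = threshold
        self.counters = defaultdict(int)   # per-row activation counters

    def on_activate(self, row):
        """Called on every ACT command; returns victim rows to refresh preventively."""
        self.counters[row] += 1
        if self.counters[row] >= self.threshold:
            self.counters[row] = 0
            # Physically adjacent rows are the potential victims.
            return [row - 1, row + 1]
        return []

    def on_refresh_window_end(self):
        """Counters are cleared at the end of each refresh window (e.g., 64 ms)."""
        self.counters.clear()

tracker = ActivationTracker(threshold=3)
for _ in range(3):
    victims = tracker.on_activate(row=42)
print("rows to refresh:", victims)   # -> [41, 43]
```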
Accelerator
Accelerator Design for Neural Networks
General-purpose accelerators such as GPUs are inefficient for accelerating neural networks in terms of power consumption and chip area, especially on edge devices.
Therefore, it is important to design architectures specialized for neural networks to achieve efficiency.
We are interested in neural network accelerator architectures that account for hardware implementation cost, external memory access overhead, PE (processing element) utilization, and data reuse.
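As a simple illustration of the trade-off analysis involved, the sketch below estimates the arithmetic intensity of a single convolution layer; the layer shape, byte widths, and the perfect on-chip reuse assumption are all hypothetical choices for the example.

```python
# Back-of-the-envelope sketch of data reuse for one convolution layer
# (layer shape and byte widths are assumptions for illustration).

def conv_layer_stats(H, W, C_in, C_out, K, bytes_per_elem=1):
    macs = H * W * C_out * C_in * K * K           # multiply-accumulate operations
    ifmap = H * W * C_in * bytes_per_elem         # input feature map size
    ofmap = H * W * C_out * bytes_per_elem        # output feature map size
    weights = C_out * C_in * K * K * bytes_per_elem
    # Lower bound on external traffic: each tensor crosses the DRAM interface
    # exactly once (perfect on-chip reuse). Real accelerators fall between this
    # and re-fetching operands many times when on-chip buffers are too small.
    min_traffic = ifmap + ofmap + weights
    return macs, min_traffic

macs, traffic = conv_layer_stats(H=56, W=56, C_in=64, C_out=64, K=3)
print(f"MACs: {macs / 1e6:.1f} M, minimum DRAM traffic: {traffic / 1e3:.1f} KB")
print(f"arithmetic intensity (MACs per byte): {macs / traffic:.1f}")
```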
Deep Learning Algorithm
Video Object Detection
Recent advances in deep convolutional neural networks have driven significant progress in still-image object detection. However, common single-image object detectors fail to achieve sufficiently high accuracy on videos, mainly due to severe deterioration effects such as motion blur, partial occlusion, camera defocus, and pose variation. Our research interest is to remedy this problem by designing efficient neural networks that exploit spatiotemporal information obtained through multi-head attention and external memory networks.
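As a rough sketch of the feature-aggregation idea (the shapes and the single-head setup are assumptions for illustration, not our actual detector), the snippet below enhances the current frame's proposal features by attending over features cached from neighboring frames.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention: query (N, d), keys/values (M, d)."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)                     # (N, M) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # softmax over frames' features
    return weights @ values                                  # (N, d) aggregated features

rng = np.random.default_rng(0)
cur_frame = rng.standard_normal((100, 256))   # 100 proposals from the current frame
memory = rng.standard_normal((500, 256))      # features cached from neighboring frames

# Residual aggregation: current features enriched with temporal context.
enhanced = cur_frame + attention(cur_frame, memory, memory)
print(enhanced.shape)   # (100, 256)
```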
Model Compression
In deep learning, model compression is a key technique for real-world applications and model deployment. Its goals are to minimize model size and to reduce inference latency, and they can be achieved by various methods such as pruning, knowledge distillation, factorization, and quantization.
Among these techniques, our research interest lies in pruning and quantization.
Pruning eliminates redundant weight parameters to make the network smaller and lower its computation cost; it can be applied element-wise or in a structured manner (row-, column-, or channel-wise).
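For illustration, the sketch below shows element-wise magnitude pruning and a simple structured (channel-wise) variant; the sparsity levels are assumptions chosen for the example.

```python
import numpy as np

def magnitude_prune(weight, sparsity=0.8):
    """Element-wise pruning: zero out the smallest-magnitude entries."""
    threshold = np.quantile(np.abs(weight), sparsity)
    mask = np.abs(weight) >= threshold
    return weight * mask, mask

def channel_prune(weight, sparsity=0.5):
    """Structured pruning: drop whole output channels (rows) with small L2 norm."""
    norms = np.linalg.norm(weight, axis=1)
    keep = norms >= np.quantile(norms, sparsity)
    return weight[keep], keep

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256))
w_sparse, mask = magnitude_prune(w, sparsity=0.8)
w_small, keep = channel_prune(w, sparsity=0.5)
print(f"element-wise: {mask.mean():.0%} weights remain")
print(f"channel-wise: {w_small.shape[0]} of {w.shape[0]} channels remain")
```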
Quantization reduces the number of bits required to represent the model's weights and activations. This shrinks the model size and, with dedicated frameworks and hardware, can also decrease inference latency.
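The sketch below illustrates simple symmetric per-tensor 8-bit quantization; the scheme and shapes are assumptions chosen for illustration rather than any specific deployment framework.

```python
import numpy as np

def quantize_int8(x):
    """Map float values to int8 with a single per-tensor scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float32)
q, scale = quantize_int8(w)
print(f"size: {w.nbytes} -> {q.nbytes} bytes, "
      f"max abs error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```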
Searching for an Optimal Data Augmentation Strategy
To make data augmentation more effective, novel techniques have been proposed, such as erasing parts of an image, combining two or more images, and semantic data augmentation. As a result, researchers have more options to choose from. We therefore analyze these novel data augmentation techniques and study how to find optimal strategies based on the characteristics of the data.
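As one concrete example of combining two images, the sketch below implements Mixup-style blending; the Beta parameter and shapes are assumptions chosen for illustration.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Blend two images and their one-hot labels with a Beta-sampled coefficient."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)             # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2          # pixel-wise blend of the two images
    y = lam * y1 + (1.0 - lam) * y2          # corresponding blend of the labels
    return x, y

rng = np.random.default_rng(0)
img_a, img_b = rng.random((32, 32, 3)), rng.random((32, 32, 3))
lbl_a = np.eye(10)[3]    # one-hot label, class 3
lbl_b = np.eye(10)[7]    # one-hot label, class 7
mixed_img, mixed_lbl = mixup(img_a, lbl_a, img_b, lbl_b, rng=rng)
print(mixed_lbl.round(2))
```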