# **Embedded Systems and High Performance Computing (EPiC) Lab**

**Prof. Sudeep Pasricha (Director), Monfort and Rockwell-Anderson Professor, Dept. of Electrical and Computer Engineering | Dept. of Computer Science, Colorado State University, Fort Collins, CO 80523 | Email: sudeep@colostate.edu**



- **Many embedded systems use cloud computing**
- **Mobile Computing**
- **Energy demands** and **capabilities** of "smart" mobile devices are increasing rapidly with growing mobile app complexity
	- But **battery technology** is lagging behind and is expected to remain a limiting factor for future growth of mobile devices such as smartphones
	- How to intelligently **manage energy** and improve **battery lifetime** for mobile devices?

# **CAD Tools for Multicore Chip Design**





### **AURA middleware for CPU/backlight energy optimization**

- Predicts idle periods (perceptual, cognitive, motor) during user-app interactions
- Bayesian classification of mobile apps at runtime based on user-device interactions
- Markov Decision Process (MDP) based algorithms to control:
	- dynamic voltage/freq. scaling (DVFS) for CPU energy saving during idle periods
	- backlight level (and energy) reduction based on theory of human change blindness
- Power model based on real measurements of various Android OS-based smartphones
- Avg. energy savings of **29%** vs. default Android scheme; **5x** over prior work; no QoS impact
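
The MDP-based control idea above can be illustrated with a toy example. This is a minimal sketch, not the AURA implementation: the two states, two frequency levels, transition probabilities, and reward values are all hypothetical numbers chosen for illustration. It runs standard value iteration to pick a CPU frequency level per user-interaction state.

```python
# Toy MDP for DVFS control. Every number below is an illustrative assumption,
# not a measurement or a parameter taken from AURA.
STATES = ("idle", "active")
ACTIONS = ("low", "high")  # hypothetical CPU frequency levels

# P[s][s2]: probability that the user-interaction state moves from s to s2.
P = {
    "idle": {"idle": 0.8, "active": 0.2},
    "active": {"idle": 0.3, "active": 0.7},
}

# R[s][a]: negative energy cost, plus a QoS penalty for running slow while active.
R = {
    "idle": {"low": -1.0, "high": -3.0},
    "active": {"low": -6.0, "high": -3.0},
}


def value_iteration(gamma=0.9, tol=1e-6):
    """Return (state values, greedy policy) for the toy MDP above."""
    values = {s: 0.0 for s in STATES}
    while True:
        # Q-value of each action: immediate reward plus discounted future value.
        q = {
            s: {
                a: R[s][a] + gamma * sum(P[s][s2] * values[s2] for s2 in STATES)
                for a in ACTIONS
            }
            for s in STATES
        }
        new_values = {s: max(q[s].values()) for s in STATES}
        if max(abs(new_values[s] - values[s]) for s in STATES) < tol:
            policy = {s: max(q[s], key=q[s].get) for s in STATES}
            return new_values, policy
        values = new_values


if __name__ == "__main__":
    _, policy = value_iteration()
    print(policy)
```

With these assumed rewards the policy lowers the frequency during idle periods and raises it while the user is active, which is the qualitative behavior the bullets describe.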

- Indoor location sensing is difficult due to lack of GPS signals in indoor environments
- Current techniques are energy-hungry, lack accuracy, and are heavily infrastructure-dependent
- Can use Wi-Fi/UWB/cellular fingerprinting and inertial sensors to predict location indoors
- Use machine learning techniques (LearnLoc) together with fingerprinting and inertial sensing (dead reckoning) to improve prediction accuracy and save energy
- **Critical for search-and-rescue during emergency scenarios (e.g., cave-ins in mining)**
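
As a rough illustration of fingerprinting-based localization, the sketch below matches a measured Wi-Fi signal-strength vector against a small database using k-nearest neighbor. It is not the LearnLoc implementation: the function name `knn_locate`, the RSSI values, and the coordinates are hypothetical, and inertial dead reckoning is left out.

```python
import math

# Hypothetical fingerprint database: (RSSI in dBm from three access points, (x, y) in meters).
FINGERPRINTS = [
    ((-40, -70, -80), (0.0, 0.0)),
    ((-70, -40, -80), (10.0, 0.0)),
    ((-80, -70, -40), (5.0, 8.0)),
    ((-55, -55, -75), (5.0, 0.0)),
]


def knn_locate(fingerprints, query, k=3):
    """Estimate a position as the mean location of the k closest fingerprints."""
    # Rank stored fingerprints by Euclidean distance in signal space.
    ranked = sorted(fingerprints, key=lambda fp: math.dist(fp[0], query))
    nearest = ranked[:k]
    x = sum(loc[0] for _, loc in nearest) / len(nearest)
    y = sum(loc[1] for _, loc in nearest) / len(nearest)
    return (x, y)


if __name__ == "__main__":
    print(knn_locate(FINGERPRINTS, (-42, -68, -79), k=1))
    print(knn_locate(FINGERPRINTS, (-42, -68, -79), k=2))
```

A larger `k` smooths out noisy readings at the cost of blurring the estimate between neighboring reference points.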

### **Context-aware cloud offloading, wireless data transfers, and outdoor location sensing**

- **Novel CAD tools for emerging 2D/3D multicore chip design**
	- Design-time algorithms for core/memory/network selection and configuration
	- Run-time algorithms to map computation, communication, and data on the chip die
	- Co-optimize: performance, energy/power, soft/hard fault resilience, security, yield, cost, ...

- Reduce energy for data transmission and outdoor location sensing on mobile devices
- Use software-based machine learning techniques to learn usage of data and location interfaces to determine optimal interface selection, ON/OFF schedule, configuration
	- Linear discriminant analysis, linear logistic regression, non-linear logistic regression with neural networks, k-nearest neighbor, support vector machines
- Use similar techniques to determine when it is beneficial to compute in cloud vs. device
- Up to **85%** energy savings vs. default Android scheme; **24%** savings over prior work

### **Energy-efficient and accurate indoor location sensing**

- **Major challenge** in designing datacenters that support cloud computing as well as supercomputers that solve large scientific problems: **need for energy-efficient operation**
	- Energy costs today ~ \$1M/year/petaflop
		- Cannot sustain such costs at exascale!
- **How can we reduce energy costs?**
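
To see why these costs cannot be sustained, take the quoted figure and assume, purely for illustration, that energy cost scales linearly with performance at today's efficiency. Since one exaflop is 1000 petaflops:

$$
\frac{\$1\,\text{M}}{\text{year} \cdot \text{petaflop}} \times 1000\ \frac{\text{petaflop}}{\text{exaflop}} \approx \$1\,\text{B per year per exaflop}
$$

The linear-scaling assumption is a simplification; it only shows the order of magnitude of the problem.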

# **Network-on-Chip (NoC) Architectures**

# **High Performance Computing**

- **Nearly all modern innovations depend on continued advances in multicore system-on-chip computing performance**
	- **Major impact on innovation across application domains: automotive, defense, medical, multimedia, telecommunications, aerospace, mobile/cloud computing**



- **Improvements in memory density, bandwidth, and form factor are critical for next generation multicore computing chips**
	- To support increasing data demands from rising chip core counts, growing graphics capabilities, the Big Data revolution, and higher network/Ethernet speeds
	- **Challenges:** 
		- How to scale memory component density?

- **But multicore system-on-chip design in advanced semiconductor fabrication technologies today faces several challenges**
- **High power/energy dissipation that increases costs and limits achievable performance**
- Process, voltage, and thermal variations that cause **uncertainty** and **high time-to-market**
- Increasing susceptibility to transient and permanent faults that **reduces design reliability**



**Need new computer-aided design (CAD) tools to perform multi-objective chip design exploration and optimization**

**Design of on-chip communication fabric is a very critical factor influencing multicore chip performance, power, and reliability** 

- **NoCs have replaced on-chip buses, but face challenges**
- High packet transfer latency with increasing core counts
- High susceptibility to transient (soft) and permanent (aging/hard) faults
- Need to balance multiple goals while satisfying design constraints
- **Fault-tolerant NoC protocols and adaptation**
	- **Reliable NoC packet routing algorithms**
		- OE+IOE: hybrid multiple turn-model routing algorithm for 2D NoCs
		- 4NP-First: hybrid turn-model routing algorithm for 3D NoCs
	- **Fault vulnerability aware NoC optimizations**
		- Proposed network vulnerability factor (NVF) metric to characterize vulnerability of network interfaces and routers to faults









[Figure labels: Scheduler, AWDTM, Thread Migration, Thermal Sensors]

### **Energy Efficient and Stochastically Robust Resource Allocation**

- **Workload and system uncertainty modeling**
	- Model uncertainty in execution time, network transfers, data access, and GPU offloading
- Quantify task and machine heterogeneities in real world HPC systems
- **Analysis of thermal and energy dynamics**
	- Real time thermal analysis
	- Adapting thermal setpoints
	- Characterize cooling energy & costs
- **Smart resource allocation algorithms**
- Dynamically adapt to unpredictable performance jitters (delays)
- Prevent security breaches with lightweight key management protocols
- Adapt to heterogeneous network types: CAN, FlexRay, TTEthernet, wireless, …













- **NoC architecture optimization**
	- **Roce-Bush NoC router for 3D NoCs**
	- Routing algorithm-aware decomposition of NoC router
	- **Memory- and application-aware NoC prioritization**
		- **Heterogeneous NoC scheduling with anti-starvation support**
- **Photonic NoC architectures**
	- **Speed of light latency, low power, high throughput**
		- Scale much better than traditional electrical wires!
	- **Challenges:**
		- Photonic crosstalk
		- Process variations
		- Thermal variations
		- Topology design
		- Protocol design
		- Fabrication choices
		- Off-chip interfacing
	- **Solutions:**
		- New 2D/3D photonic NoC architectures: free-space, mono-layer, multi-layer
		- New cross-layer (device, circuit, system) techniques to overcome crosstalk and variation uncertainty
		- New photonic arbitration, encoding, flow-control protocols


**New off-chip CPU-memory photonic architectures** 





- Based on uncertainty models and thermal/energy analysis
- Validation on diverse scientific applications and real-world tera- and peta-scale systems at NCAR, DOE/ORNL, and CSU



- Enable autonomous vehicles: design robust vehicle/pedestrian/traffic-light/sign/lane detection algorithms
- Implement vision algorithms on low-power ADAS boards, smartphones, tablets
- Utilize stereo vision cameras, and other data from: LIDAR, RADAR, vehicle-to-vehicle, vehicle-to-infrastructure

**Mission:** Algorithms and architectures for energy-efficient, fault-tolerant, and secure design of **embedded systems** (cyber-physical systems), **mobile computing** (smartphones, wearables, internet-of-things), and **high performance computing** (datacenters, supercomputers)

# **Memory Architectures**




**New 3D DRAM architectures**

**Decomposed (folded) bank architecture**

Split bank and rank across layers (3D-ProWiz)

Improved performance over state-of-the-art DRAMs

- How to increase bandwidth and reduce latency?
- How to best manage power dissipation/energy?

HMC, HBM, DDR/GDDR, DiRAM, …



### **DRAM and cache/scratchpad memory optimizations**

- **DRAM refresh overhead reduction (new massed/crammed refresh techniques)**
- **Scratchpad data placement (static and adaptive packing strategies)**
- **Hybrid SRAM/NVRAM cache architecture design; policy configuration**



# **Energy Harvesting IoT Platforms**

## **Automotive Embedded Systems**

# **Embedded System Applications and Prototypes**

- **Solar energy harvesting** can power many IoT and embedded systems
- How to **schedule software** applications on multicore platforms under the varying and stringent energy harvesting conditions that often exist at runtime?
- How to cope with **thermal spikes** and **faults** arising at runtime?

### **Run-time harvesting-aware scheduling framework**

- Dynamically enables/disables cores, scales voltage/frequency to manage energy
- Proactively throttles cores to manage temperature
- Distributes and dynamically reassigns software to maximize core utilization
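
The core enable/disable and voltage/frequency decisions above can be sketched as a small search over configurations. This is an illustrative sketch under assumed numbers, not the lab's framework: the power model (static power plus a cubic dynamic term), the constants, and the function names `power_w` and `pick_config` are hypothetical, and throughput is approximated as cores times frequency.

```python
from itertools import product

# Hypothetical platform parameters, chosen only for illustration.
FREQS_GHZ = (0.5, 1.0, 1.5)  # available frequency levels
MAX_CORES = 4
P_STATIC_W = 0.1             # static power per active core
K_DYN = 0.2                  # dynamic power coefficient (W per GHz^3)


def power_w(cores, freq_ghz):
    """Assumed power model: per-core static power plus a cubic dynamic term."""
    return cores * (P_STATIC_W + K_DYN * freq_ghz ** 3)


def pick_config(budget_w):
    """Return the (cores, freq) pair with the highest cores*freq within the budget.

    Returns (0, 0.0), meaning all cores disabled, if nothing fits the budget.
    """
    best, best_throughput = (0, 0.0), 0.0
    for cores, freq in product(range(1, MAX_CORES + 1), FREQS_GHZ):
        throughput = cores * freq
        if power_w(cores, freq) <= budget_w and throughput > best_throughput:
            best, best_throughput = (cores, freq), throughput
    return best


if __name__ == "__main__":
    for budget in (0.1, 1.0):
        print(budget, pick_config(budget))
```

A run-time scheduler would re-run a decision like this whenever the harvested power budget changes; thermal throttling and task reassignment are omitted here.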








- **Another major challenge:** ensuring **fault-resilient operation**
	- Exascale HPC systems will experience a fault every few minutes!
- How to quickly and effectively recover from frequent faults?
- **Reliability exploration/management for extreme-scale HPC**
	- Analysis of checkpointing and redundancy-based techniques
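
One way to see how fault frequency drives checkpointing overhead is Young's classic first-order approximation for the checkpoint interval, sqrt(2 × checkpoint cost × mean time between failures). The poster does not say which models the lab's analysis uses, so this is a generic sketch; the 10-second checkpoint cost and 5-minute failure interval are assumed values.

```python
import math


def young_interval(checkpoint_cost_s, mtbf_s):
    """Young's approximation of the compute time between checkpoints, in seconds.

    Reasonable only when the checkpoint cost is much smaller than the MTBF.
    """
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)


if __name__ == "__main__":
    # Assumed values: 10 s to write a checkpoint, one fault every 5 minutes.
    interval = young_interval(checkpoint_cost_s=10.0, mtbf_s=300.0)
    print(f"checkpoint roughly every {interval:.1f} s")
```

With a fault every few minutes, the interval shrinks to roughly a minute, so a noticeable share of machine time goes to writing checkpoints rather than computing.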





**CSU buildings chosen for indoor localization analysis**












# **Graduate Students**

 **Current: Sai Chittamuru, Ishan Thakkar, Daniel Dauwe, Yaswanth Raparti, Vipin Kukkala, Saideep Tiku, Shoumik Maiti, Greg Kittilson, Ninad Hogade, Yahav Biran, Chris Langlois, Varun Kilenje, Swapnil Bhosale, Ayush Kumar, Jordan Tunnell, Zemin Tao, Rohit Kudre, Rohan Jhaveri**

 **Alumni: Shirish Bahirat, Yong Zou, Yi Xiang, Nishit Kapadia, Mark Oxley, Brad Donohoo, Pramit Rajakrishna, Miguel Salas, Tejasi Pimpalkhute, Viney Ugave, Eric Jonardi, Yuhang Li, Srinivas Desai, Haneet Mahajan, Aditya Khune, Onkar Gulvani, Jiabao Jin, Manoj Kumar, Sai Kiran, Nanda Kumar, Taylo Santiago, Surya Vamsi Vemparala, Jingjie Zhu** 

#### **Jitter and security-aware automotive network design**

### **Vehicles are controlled by distributed, real-time embedded systems**

- Hundreds of embedded controllers/devices and millions of lines of code
- Connected by multiple, diverse network protocols
- **Challenge:**
	- Meet real-time computation and communication requirements
	- Prevent security breaches (tampering, snooping, …)
	- Support advanced driver-assistance systems (ADAS)

### **Advanced driver assistance system (ADAS) algorithms and prototyping**