Below is a list of my current research projects. If you want to learn more about these projects, please click on the corresponding link below or feel free to contact me.

  • OmpCloud (The Cloud as an Offloading Device) The large number of cores has turned the cloud into a powerful computing system. On the other hand, programming these cores is far from trivial. OmpCloud enables programmers to easily offload parallel computation to the cloud as if it were a device directly connected to the local computer. OmpCloud automatically moves data to cloud storage, converts the annotated code into a set of map-reduce operations, and parallelizes them across cloud cores. It is currently compatible with Amazon AWS and is being adapted to Microsoft Azure.
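A minimal sketch of the programming model described above, assuming standard OpenMP 4.X target directives; the device number and function name are illustrative, not OmpCloud's actual configuration. The map clauses mark the data that would be shipped to cloud storage before the loop is executed as map-reduce operations; without an offloading runtime the same code simply runs on the host.

```c
#include <stddef.h>

/* Illustrative OmpCloud-style offload: the cloud appears as an
 * ordinary OpenMP device. device(0) is a placeholder. */
double dot_product(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    /* map clauses describe which data must reach cloud storage
     * before the reduction runs across cloud cores */
    #pragma omp target device(0) map(to: a[0:n], b[0:n]) map(tofrom: sum)
    #pragma omp parallel for reduction(+: sum)
    for (size_t i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

If the pragmas are ignored (no OpenMP support), the loop still computes the same result sequentially, which is what makes the annotation approach attractive.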
  • AClang (Compiling for Parallelism) is an LLVM-based compiler/runtime capable of automatically converting OpenMP 4.X annotated code into kernels for acceleration devices. In this project we are currently studying techniques for: (a) extracting parallelism from loops and mapping it to GPUs in mobile devices; and (b) performing data coherence analysis and optimization to keep CPU and GPU data coherent.
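The kind of input loop involved can be sketched as follows; this is plain OpenMP 4.X, not AClang-specific syntax, and the function name is illustrative. The target data region delimits the span over which the CPU and GPU copies of `v` must be kept coherent, which is exactly where the coherence analysis in item (b) applies.

```c
/* An OpenMP 4.X annotated loop of the kind a compiler can lower
 * to a GPU kernel. The target data region bounds the lifetime of
 * the device copy of v. */
void scale_vector(float *v, int n, float factor) {
    #pragma omp target data map(tofrom: v[0:n])
    {
        #pragma omp target
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            v[i] *= factor;   /* candidate GPU kernel body */
    }
}
```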
  • STREAM (ARM Profiling for Performance) The goal of this project is to design a dynamic profiling mechanism for ARM architectures. Assume, for example, that the user wants to measure the execution time or energy consumed by a function. To do that, STREAM is activated as the application enters that particular function; it also enables the user to dynamically change the profiled function without stopping the application.
  • HardCloud (Cloud Hardware Acceleration) The computing industry has recently proposed the use of FPGAs as a way to improve energy efficiency in modern cloud clusters. As a follow-up to a previous research project (ChameLeon), we are interested in investigating novel mechanisms to offload computation to the FPGAs available in the Intel HARP2 and Microsoft Catapult architecture clusters.
  • MTSP (Multicore Task Scheduling Platform) is a small, lightweight research runtime aimed at extracting task parallelism by means of OpenMP 4.X annotations. Guided by an extensive profiling study, it has been designed for flexibility and ease of use. Its open API enables runtime designers to quickly prototype new task-stealing policies and workload-balancing heuristics.
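The OpenMP 4.X task annotations such a runtime consumes look like the sketch below; the function and values are illustrative. The depend clauses give the runtime a task graph from which independent tasks can be stolen and balanced; without an OpenMP runtime the pragmas are ignored and the statements simply run in order, producing the same result.

```c
/* Two independent tasks plus a synchronization point: the depend
 * clauses expose a task graph to the runtime's scheduler. */
int pipeline(int x) {
    int a = 0, b = 0;
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task depend(out: a) shared(a) firstprivate(x)
        a = x + 1;                 /* first producer task */
        #pragma omp task depend(out: b) shared(b) firstprivate(x)
        b = x * 2;                 /* independent task, stealable */
        #pragma omp taskwait       /* wait for both tasks */
    }
    return a + b;
}
```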
  • NUMA (Workload Balancing in NUMA Architectures) Most modern many-core machines are NUMA architectures, i.e. distributed-memory architectures in which the cores are divided into nodes, each containing a local memory. As a result, application data is spread across different memory nodes, and a thread sometimes accesses data from a local (fast) memory and sometimes from a remote (slow) memory. The goal of this project is to assign the threads and data pages of a parallel application so as to improve its workload balance.
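One well-known placement lever in this space is first touch, sketched below under the assumption of a Linux-style first-touch page policy; the function name is illustrative and this is not the project's own mechanism. A page is typically allocated on the NUMA node of the thread that first writes it, so initializing data with the same thread layout as the later compute loop keeps most accesses local.

```c
#include <stdlib.h>

/* First-touch initialization: each thread faults in the pages it
 * will later compute on, placing them in its node's local memory
 * (assuming a first-touch page placement policy). */
double *alloc_local(size_t n) {
    double *v = malloc(n * sizeof *v);
    if (!v)
        return NULL;
    /* static schedule here should match the compute loop's schedule,
     * so pages land near the threads that will reuse them */
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++)
        v[i] = 0.0;
    return v;
}
```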
  • PhTM (Phase Transactional Memory) Transactional Memory (TM) is a programming paradigm in which program regions, called transactions, are speculatively run in parallel. In the past we have studied new scheduling algorithms that can be used to improve transaction parallelism. We are currently researching novel combinations of Software TM and Hardware TM to improve program performance.
  • PUFs (Physical Unclonable Functions) are digital circuits that use the statistical variations of the VLSI manufacturing process (e.g. RC delay) to safely store keys and improve the security of cryptographic mechanisms. We are researching new PUF architectures that could be used in the design of low-cost authentication protocols for future IoT devices (e.g. RFIDs).
  • DISCLAIMER: This is a personal page, not an official UNICAMP page. Its contents are the sole responsibility of Guido Araujo.