We developed a PE compiler that automatically maps a physical model onto a network of processing elements (PEs) and provides a design space to explore.
The following figure compares the PE networks against other common approaches on five different physiological models. The results are the simulation runtimes of an Euler solver with a 0.01 ms step. The PE networks are fully synthesized and implemented on a Xilinx Virtex-6 240T-2 FPGA. Note that 1000 ms is the real-time constraint.
PC: C code on a 3.06 GHz Intel Core i7-950 quad-core processor with 16 GB of DDR3 RAM, compiled with Microsoft VS2010 with the -O3 flag. PC(1) is single-threaded; PC(4) is multi-threaded, estimated as PC(1) / 4.
GPU: CUDA C code on a 763 MHz NVIDIA GTX 460 Fermi GPU with 336 CUDA cores, compiled using nvcc with the -O3 flag.
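
To make the Euler solver and the real-time constraint concrete, here is a minimal C sketch of a fixed-step forward Euler loop. It is an illustration only, not the benchmark code: the single-state model dV/dt = -V / tau and the names leak_deriv, euler_step_count, and euler_integrate are hypothetical and do not come from the five physiological models. Assuming each benchmark simulates 1000 ms of model time, a 0.01 ms step means 100,000 Euler steps, so meeting the real-time constraint would require an average of at most 10 microseconds of wall-clock time per step.

```c
/* Derivative callback: returns dV/dt for state v (hypothetical interface). */
typedef double (*deriv_fn)(double v, const void *params);

/* Hypothetical single-state model, dV/dt = -V / tau, with tau in ms.
 * It stands in for a real physiological model, which would have many states. */
static double leak_deriv(double v, const void *params)
{
    const double tau_ms = *(const double *)params;
    return -v / tau_ms;
}

/* Number of fixed steps needed to cover sim_ms of model time with step dt_ms,
 * rounded to the nearest integer to absorb floating-point division error. */
static long euler_step_count(double sim_ms, double dt_ms)
{
    return (long)(sim_ms / dt_ms + 0.5);
}

/* Forward Euler: v[n+1] = v[n] + dt * f(v[n]), repeated for the given steps. */
static double euler_integrate(deriv_fn f, const void *params,
                              double v0, double dt_ms, long steps)
{
    double v = v0;
    for (long i = 0; i < steps; ++i)
        v += dt_ms * f(v, params);
    return v;
}
```

For example, euler_step_count(1000.0, 0.01) gives the number of steps behind each runtime, and euler_integrate(leak_deriv, &tau_ms, 1.0, 0.01, steps) advances the toy model by that many steps. A PC(1)-style measurement would time this loop for the full model on one thread.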