DUE Monday, May 18
The objective of this assignment is to implement an optimized reduction kernel and analyze basic architectural performance properties.
In the first step, you will implement an optimized reduction kernel with optimized thread indexing. Recall that we discussed two types of reduction in class: naive and optimized. The naive reduction kernel suffers from significant warp divergence because of its thread indexing. The optimized version avoids much of this divergence. The goal is to have the thread indexing behave as shown in slide 33 of the Reduction slides.
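The indexing idea above can be sketched in CUDA roughly as follows. This is a minimal sketch, not the starter code, and it rests on several assumptions. The kernel name reduction, its (out, in, size) signature, BLOCK_SIZE = 512, the host helper reduce_sum, and the choice to have each thread load two elements are all hypothetical. It also assumes slide 33 shows the scheme in which the active threads stay contiguous (threads 0 to stride-1) while the stride halves at each step.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical block size; the starter code may define its own constant.
// Must be a power of two and must equal blockDim.x at launch.
#define BLOCK_SIZE 512

// Each block reduces up to 2 * BLOCK_SIZE consecutive input elements and
// writes one partial sum to out[blockIdx.x]. Signature is an assumption.
__global__ void reduction(float* out, const float* in, unsigned size) {
    __shared__ float partial[2 * BLOCK_SIZE];

    unsigned t = threadIdx.x;
    unsigned start = 2 * blockIdx.x * blockDim.x;

    // Each thread loads two elements, padding with 0 past the end of the input.
    partial[t] = (start + t < size) ? in[start + t] : 0.0f;
    partial[blockDim.x + t] =
        (start + blockDim.x + t < size) ? in[start + blockDim.x + t] : 0.0f;

    // Contiguous active threads: only t < stride does work, so whole warps
    // are either fully active or fully idle until stride drops below 32.
    for (unsigned stride = blockDim.x; stride >= 1; stride >>= 1) {
        __syncthreads();  // wait for the loads / previous step to finish
        if (t < stride) partial[t] += partial[t + stride];
    }

    if (t == 0) out[blockIdx.x] = partial[0];
}

// Abort with a message if a CUDA runtime call failed.
static void check(cudaError_t err) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

// Hypothetical host helper: launches the kernel, then adds the per-block
// partial sums on the CPU.
float reduce_sum(const float* h_in, unsigned n) {
    assert(n > 0);
    unsigned elems_per_block = 2 * BLOCK_SIZE;
    unsigned num_blocks = (n + elems_per_block - 1) / elems_per_block;

    float *d_in = nullptr, *d_out = nullptr;
    check(cudaMalloc(&d_in, n * sizeof(float)));
    check(cudaMalloc(&d_out, num_blocks * sizeof(float)));
    check(cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice));

    reduction<<<num_blocks, BLOCK_SIZE>>>(d_out, d_in, n);
    check(cudaGetLastError());

    float* h_out = static_cast<float*>(std::malloc(num_blocks * sizeof(float)));
    // cudaMemcpy blocks until the kernel has finished.
    check(cudaMemcpy(h_out, d_out, num_blocks * sizeof(float),
                     cudaMemcpyDeviceToHost));

    float total = 0.0f;
    for (unsigned i = 0; i < num_blocks; ++i) total += h_out[i];

    std::free(h_out);
    check(cudaFree(d_in));
    check(cudaFree(d_out));
    return total;
}
```

In the naive version the active threads are those whose index is a multiple of 2 * stride, so they are scattered across every warp. Here they stay packed at the low indices, and divergence only appears in the last few steps, once fewer than 32 threads remain active. Calling reduce_sum on an array of 1,000,000 ones should return 1000000.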
For this lab, we will be using GitHub Classroom.
Please join the classroom by clicking the following link:
https://classroom.github.com/a/GSLjLGtI
Once you join the classroom, a private GitHub repository containing the starter code will be created for you automatically.
Then simply git clone that repository to copy the starter code to Bender.
The starter code contains the following files: kernel.cu, main.cu, Makefile, support.cu, and support.h.
Your changes go in kernel.cu. There should be no changes necessary for main.cu. Assume we run reduction with an input size of 1,000,000. Note that some of these questions are conceptual and can be answered without the programming assignment.
Submit your completed kernel.cu along with a report (report.pdf, report.txt, report.docx, etc.).
Please include your name in the report.