Exercises
Implement your own custom random walk CUDA kernel that performs similarly to the original fused implementation, and see how close you can get to its performance. Hint: try a different random number generator from cuRAND.
Take some of your existing code and port it to the GPU using the built-in functions, then try writing a custom kernel for the more complicated operations.
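A possible starting point for the first exercise is the minimal CUDA C++ sketch below. It makes several assumptions that the original fused implementation may not share: the walk is 1D with unit steps (+1 or -1 with equal probability), each thread simulates one walker, and only the final position is kept. It uses the cuRAND device API with the Philox generator. The `RngState` typedef and the `run_random_walk` helper are hypothetical names introduced here so that the generator can be swapped and the kernel timed.

```cuda
#include <cmath>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Assumption: Philox is used because per-thread curand_init is cheap for it.
// Swap this typedef to try another cuRAND generator, e.g. curandState (XORWOW)
// or curandStateMRG32k3a.
typedef curandStatePhilox4_32_10_t RngState;

// One thread simulates one independent 1D walk of n_steps unit steps
// (+1 or -1) and stores only its final position.
__global__ void random_walk_kernel(float *final_pos, int n_walkers, int n_steps,
                                   unsigned long long seed) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_walkers) return;

    RngState state;
    // Same seed, distinct subsequence per walker -> independent random streams.
    curand_init(seed, (unsigned long long)i, 0ULL, &state);

    float x = 0.0f;
    for (int s = 0; s < n_steps; ++s) {
        // curand_uniform returns a float in (0, 1]; map it to a -1/+1 step.
        x += (curand_uniform(&state) <= 0.5f) ? -1.0f : 1.0f;
    }
    final_pos[i] = x;
}

// Hypothetical host helper: launches the kernel, times it with CUDA events,
// and computes the sample mean and variance of the final positions.
// Returns 0 on success, -1 on any CUDA error.
int run_random_walk(int n_walkers, int n_steps, unsigned long long seed,
                    double *mean, double *var, float *ms) {
    float *d_pos = nullptr;
    if (cudaMalloc(&d_pos, n_walkers * sizeof(float)) != cudaSuccess) return -1;

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    const int threads = 256;
    const int blocks = (n_walkers + threads - 1) / threads;
    cudaEventRecord(start);
    random_walk_kernel<<<blocks, threads>>>(d_pos, n_walkers, n_steps, seed);
    cudaEventRecord(stop);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaEventSynchronize(stop);
    if (err == cudaSuccess) err = cudaEventElapsedTime(ms, start, stop);

    std::vector<float> h_pos(n_walkers);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_pos.data(), d_pos, n_walkers * sizeof(float),
                         cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_pos);
    if (err != cudaSuccess) return -1;

    // Accumulate in double on the host to avoid float rounding in the sums.
    double sum = 0.0, sum_sq = 0.0;
    for (float x : h_pos) { sum += x; sum_sq += (double)x * x; }
    *mean = sum / n_walkers;
    *var = sum_sq / n_walkers - (*mean) * (*mean);
    return 0;
}

int main() {
    double mean = 0.0, var = 0.0;
    float ms = 0.0f;
    if (run_random_walk(1 << 20, 1000, 1234ULL, &mean, &var, &ms) != 0) {
        std::fprintf(stderr, "CUDA error\n");
        return 1;
    }
    // For +/-1 steps the expected mean is 0 and the expected variance is n_steps.
    std::printf("kernel: %.3f ms, mean = %.3f, variance = %.1f\n", ms, mean, var);
    return 0;
}
```

This compiles with something like `nvcc random_walk.cu -o random_walk`; the cuRAND device API is provided by `curand_kernel.h`, so no extra library link is needed. To follow the hint, change the `RngState` typedef to `curandState` (XORWOW) or `curandStateMRG32k3a` and compare timings; since the measured time includes `curand_init`, it can be worth timing initialization separately for short walks. A mean near 0 and a variance near `n_steps` is a quick correctness check before comparing speed against the fused version.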