Here is a list of resources for CUDA programming, particularly in C.
Basic
Perhaps the best beginner's guide is the series written by Mark Harris, which currently spans 10 articles. It starts from a simple HelloWorld-type example, then goes progressively deeper into important topics such as data transfer optimization and shared memory. The final 3 articles focus on optimizing real-life applications such as matrix transpose and the finite-difference method.
- An Easy Introduction to CUDA C and C++
- How to Implement Performance Metrics in CUDA C/C++
- How to Query Device Properties and Handle Errors in CUDA C/C++
- How to Optimize Data Transfers in CUDA C/C++
- How to Overlap Data Transfers in CUDA C/C++
- An Even Easier Introduction to CUDA
- Unified Memory for CUDA Beginners
- An Efficient Matrix Transpose in CUDA C/C++
- Finite Difference Methods in CUDA C/C++, Part 1
- Finite Difference Methods in CUDA C/C++, Part 2
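To give a flavor of what the introductory articles cover, here is a minimal SAXPY sketch (y = a*x + y) with basic error handling. It is an illustration written for this list, not code copied from the articles; the `CUDA_CHECK` macro and the `max_saxpy_error` function are hypothetical names of my own, and the block size of 256 is an arbitrary choice.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of CUDA): abort on any runtime API error.
#define CUDA_CHECK(call)                                              \
  do {                                                                \
    cudaError_t err_ = (call);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
                   cudaGetErrorString(err_), __FILE__, __LINE__);     \
      std::exit(EXIT_FAILURE);                                        \
    }                                                                 \
  } while (0)

// Each thread updates one element: y[i] = a * x[i] + y[i].
__global__ void saxpy(int n, float a, const float *x, float *y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i] + y[i];
}

// Runs SAXPY on n elements (x = 1, y = 2, a = 2) and returns the largest
// absolute deviation from the expected result 4.
float max_saxpy_error(int n) {
  std::vector<float> x(n, 1.0f), y(n, 2.0f);
  const size_t bytes = static_cast<size_t>(n) * sizeof(float);
  float *d_x = nullptr, *d_y = nullptr;

  CUDA_CHECK(cudaMalloc(&d_x, bytes));
  CUDA_CHECK(cudaMalloc(&d_y, bytes));
  CUDA_CHECK(cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice));
  CUDA_CHECK(cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice));

  const int threads = 256;                       // arbitrary block size
  const int blocks = (n + threads - 1) / threads;
  saxpy<<<blocks, threads>>>(n, 2.0f, d_x, d_y);
  CUDA_CHECK(cudaGetLastError());                // catches launch errors

  // This copy waits for the kernel to finish before reading d_y.
  CUDA_CHECK(cudaMemcpy(y.data(), d_y, bytes, cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(d_x));
  CUDA_CHECK(cudaFree(d_y));

  float max_err = 0.0f;
  for (int i = 0; i < n; ++i)
    max_err = std::fmax(max_err, std::fabs(y[i] - 4.0f));
  return max_err;
}

int main() {
  std::printf("Max error: %f\n", max_saxpy_error(1 << 20));
  return 0;
}
```

Assuming the file is saved as saxpy.cu (a name chosen here), it should build with `nvcc -o saxpy saxpy.cu`.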
Intermediate
A very important document on the internals of Nvidia chips as well as the CUDA programming model is the CUDA C Programming Guide.
In version 9, the document has around 90 pages of main content, with the remaining 210 pages or so being appendices. I found it very helpful to read through the main content and look up the appendices from time to time.
The next useful document is the CUDA C Best Practices Guide, which contains a lot of performance tuning tips.
If you want to profile a CUDA application, use nvprof and the Visual Profiler; you can find their manuals here. Two other very good links to read are here and this one by Mark Harris.
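As a rough sketch of how a program can cooperate with nvprof, the example below marks a region of interest with `cudaProfilerStart()` and `cudaProfilerStop()` from `cuda_profiler_api.h`, so that a warm-up launch stays out of the profile. The kernel `scale`, the function `run_profiled`, and the file name in the comments are hypothetical and chosen only for illustration. Note that newer toolkits replace nvprof with Nsight Systems and Nsight Compute, so this applies to the toolkit generation discussed here.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

// Hypothetical kernel: multiply every element by a constant.
__global__ void scale(int n, float a, float *data) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= a;
}

// Launches the kernel twice; only the second launch lies inside the
// region delimited for the profiler. Returns true if no CUDA error occurred.
bool run_profiled(int n) {
  float *d = nullptr;
  if (cudaMalloc(&d, n * sizeof(float)) != cudaSuccess) return false;
  cudaMemset(d, 0, n * sizeof(float));

  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;

  scale<<<blocks, threads>>>(n, 2.0f, d);   // warm-up, outside the region
  cudaDeviceSynchronize();

  cudaProfilerStart();                      // begin region of interest
  scale<<<blocks, threads>>>(n, 2.0f, d);
  cudaDeviceSynchronize();
  cudaProfilerStop();                       // end region of interest

  cudaError_t err = cudaGetLastError();
  cudaFree(d);
  return err == cudaSuccess;
}

int main() {
  // Assumed build and run commands (file name is hypothetical):
  //   nvcc -o scale scale.cu
  //   nvprof --profile-from-start off ./scale
  std::printf("%s\n", run_profiled(1 << 20) ? "ok" : "failed");
  return 0;
}
```

Without `--profile-from-start off`, nvprof collects data from the start of the program, so the warm-up launch is recorded as well.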
If you want a very good textbook, consider “Professional CUDA C Programming”, which I think is the best book on the topic. You will learn what the authors call “profile-based programming”, which is perhaps the best way to proceed in CUDA programming.
Others
cuBLAS: indispensable for linear algebra. The original Nvidia documentation is good, but you may also find this little gem, “cuBLAS by example”, useful.
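For a first taste of the library, here is a small sketch that multiplies two 2x2 matrices with `cublasSgemm`. It is not taken from the Nvidia documentation or from “cuBLAS by example”; the function name `gemm2x2` and the test matrices are my own choices. The main thing to keep in mind is that cuBLAS assumes column-major storage, as in Fortran.

```cuda
#include <cstdio>
#include <cublas_v2.h>
#include <cuda_runtime.h>

// Computes C = A * B for 2x2 matrices stored in column-major order.
// Hypothetical helper; returns true if every CUDA and cuBLAS call succeeded.
bool gemm2x2(const float *A, const float *B, float *C) {
  const int n = 2;
  const size_t bytes = n * n * sizeof(float);
  const float alpha = 1.0f, beta = 0.0f;    // C = alpha * A * B + beta * C
  float *d_A = nullptr, *d_B = nullptr, *d_C = nullptr;
  cublasHandle_t handle = nullptr;

  bool ok = cublasCreate(&handle) == CUBLAS_STATUS_SUCCESS;
  ok = ok && cudaMalloc(&d_A, bytes) == cudaSuccess;
  ok = ok && cudaMalloc(&d_B, bytes) == cudaSuccess;
  ok = ok && cudaMalloc(&d_C, bytes) == cudaSuccess;
  ok = ok && cudaMemcpy(d_A, A, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
  ok = ok && cudaMemcpy(d_B, B, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

  // No transposes; all leading dimensions equal n for square matrices.
  ok = ok && cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                         &alpha, d_A, n, d_B, n, &beta, d_C, n)
                 == CUBLAS_STATUS_SUCCESS;
  ok = ok && cudaMemcpy(C, d_C, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

  cudaFree(d_A);                            // cudaFree(nullptr) is a no-op
  cudaFree(d_B);
  cudaFree(d_C);
  if (handle) cublasDestroy(handle);
  return ok;
}

int main() {
  // Column-major: A has columns (1,2) and (3,4); B swaps the two columns.
  const float A[4] = {1.0f, 2.0f, 3.0f, 4.0f};
  const float B[4] = {0.0f, 1.0f, 1.0f, 0.0f};
  float C[4] = {0.0f, 0.0f, 0.0f, 0.0f};

  if (!gemm2x2(A, B, C)) {
    std::printf("cuBLAS example failed\n");
    return 1;
  }
  std::printf("C (column-major): %g %g %g %g\n", C[0], C[1], C[2], C[3]);
  return 0;
}
```

The program needs to be linked against cuBLAS, for example with `nvcc -o gemm gemm.cu -lcublas` (file name assumed). Because B here is a permutation matrix, the result is A with its two columns exchanged, which makes the column-major convention easy to check by eye.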