Here is a list of resources for CUDA programming, in particular CUDA C.
Perhaps the best guide for beginners is the series written by Mark Harris, which currently consists of 10 articles. It starts from a simple HelloWorld-type example but goes progressively deeper into important topics such as data transfer optimization and shared memory. The final 3 articles focus on optimizing real-life applications such as matrix transpose and the finite-difference method.
- An Easy Introduction to CUDA C and C++
- How to Implement Performance Metrics in CUDA C/C++
- How to Query Device Properties and Handle Errors in CUDA C/C++
- How to Optimize Data Transfers in CUDA C/C++
- How to Overlap Data Transfers in CUDA C/C++
- An Even Easier Introduction to CUDA
- Unified Memory for CUDA Beginners
- An Efficient Matrix Transpose in CUDA C/C++
- Finite Difference Methods in CUDA C/C++, Part 1
- Finite Difference Methods in CUDA C/C++, Part 2
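To give a rough taste of what the first few articles cover (launching a kernel, checking errors, timing with CUDA events, and unified memory), here is a minimal SAXPY sketch. It is my own illustration, not code taken from the articles; the names `CUDA_CHECK` and `saxpy_max_error` are hypothetical, and it assumes an installed CUDA toolkit and a GPU that supports `cudaMallocManaged`.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
  do {                                                              \
    cudaError_t err_ = (call);                                      \
    if (err_ != cudaSuccess) {                                      \
      fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",               \
              cudaGetErrorString(err_), __FILE__, __LINE__);        \
      exit(EXIT_FAILURE);                                           \
    }                                                               \
  } while (0)

// y[i] = a * x[i] + y[i], one thread per element.
__global__ void saxpy(int n, float a, const float *x, float *y) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) y[i] = a * x[i] + y[i];  // guard: the last block may overshoot n
}

// Hypothetical helper: runs SAXPY on n elements held in unified memory,
// prints the kernel time, and returns the maximum absolute error.
float saxpy_max_error(int n) {
  float *x, *y;
  CUDA_CHECK(cudaMallocManaged(&x, n * sizeof(float)));
  CUDA_CHECK(cudaMallocManaged(&y, n * sizeof(float)));
  for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

  // Events recorded around the launch measure elapsed GPU time.
  cudaEvent_t start, stop;
  CUDA_CHECK(cudaEventCreate(&start));
  CUDA_CHECK(cudaEventCreate(&stop));

  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;  // round up
  CUDA_CHECK(cudaEventRecord(start));
  saxpy<<<blocks, threads>>>(n, 2.0f, x, y);
  CUDA_CHECK(cudaGetLastError());       // catches launch-configuration errors
  CUDA_CHECK(cudaEventRecord(stop));
  CUDA_CHECK(cudaDeviceSynchronize());  // waits for the kernel; surfaces async errors

  float ms = 0.0f;
  CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));
  printf("saxpy on %d elements: %.3f ms\n", n, ms);

  // Every element should be exactly 2.0f * 1.0f + 2.0f = 4.0f.
  float max_err = 0.0f;
  for (int i = 0; i < n; ++i) max_err = fmaxf(max_err, fabsf(y[i] - 4.0f));

  CUDA_CHECK(cudaEventDestroy(start));
  CUDA_CHECK(cudaEventDestroy(stop));
  CUDA_CHECK(cudaFree(x));
  CUDA_CHECK(cudaFree(y));
  return max_err;
}

int main(void) {
  float err = saxpy_max_error(1 << 20);
  printf("max error: %f\n", err);
  return err == 0.0f ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Assuming the file is saved as `saxpy.cu`, it can be built with `nvcc saxpy.cu -o saxpy`. Synchronizing before the host reads `y` matters because kernel launches are asynchronous; the articles on performance metrics and error handling discuss this in much more depth.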
A very important document on the internals of Nvidia chips and the CUDA programming model is the CUDA C Programming Guide.
In version 9, the document has around 90 pages of main content, and the remaining 210 or so pages are appendices. I found it very helpful to read through the main content and consult the appendices as needed.
Another useful document is the CUDA C Best Practices Guide, which contains many performance-tuning tips.