HLSL Test Suite
If we're going to have an HLSLCompiler, we need a test suite for it.
It will eventually be big and gnarly, like the gcc test suite, but for now we'd be happy with a really simple test that just verifies that the primitive functions in HLSL behave as expected. Running a computation on the GPU and reading the results back sounds a lot like what's called GPGPU.
Eventually, we'll want to write it in C, but initially C++ (compiled by MSVC) would be fine as a prototype. See http://kegel.com/wine/cl-howto-win7sdk.html for how to download and install Visual C++ and the DirectX SDK for use on Wine.
We should look at existing test suites like the Piglit OpenGL test suite ( http://people.freedesktop.org/~nh/piglit/ ) and re-use any good ideas they have.
GPGPU tutorials
http://www.mathematik.uni-dortmund.de/~goeddeke/gpgpu/tutorial.html is a great tutorial on how to do all this in OpenGL, and it points to the demo at http://wwwcg.in.tum.de/Research/Publications/LinAlg as an example of how to do it with D3D. Sure enough, that source code uses GetSurfaceLevel() and GetRenderTarget()...
http://jorik.sourceforge.net/ looks like a much simpler example.
Notes
http://www.ibiblio.org/harrism/phpBB2/viewtopic.php?p=3589&sid=0b45e9d019ee8abdda3516a5aae3b4f7 mentions that one way to get data out of the GPU is:
- Get a surface A from a RenderTarget texture (GetSurfaceLevel()).
- Create an auxiliary SYSTEMMEM surface B (RAM surface).
- Copy data from the RT surface to the RAM surface (GetRenderTargetData()).
- Lock the RAM surface and read the data.
- ...
- Release/dispose of all resources.
A Brook developer mentions that this is how they do it.
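The readback steps above could be sketched in C++ roughly as follows. This is only an illustration, not code from the thread or from Brook: the helper names read_back_r32f and copy_rows are made up for this page, and it assumes a single-level D3DFMT_R32F render-target texture whose size the caller already knows. The Direct3D part is guarded so that the pitch-aware row copy can also be compiled and tested on other platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copy `height` rows of `width` floats out of a locked
// surface whose rows start `pitch` bytes apart. The pitch may be larger than
// width * sizeof(float), so each row is copied separately.
std::vector<float> copy_rows(const void *bits, std::size_t pitch,
                             std::size_t width, std::size_t height)
{
    assert(pitch >= width * sizeof(float));
    std::vector<float> out(width * height);
    const unsigned char *src = static_cast<const unsigned char *>(bits);
    for (std::size_t y = 0; y < height; ++y)
        std::memcpy(out.data() + y * width, src + y * pitch,
                    width * sizeof(float));
    return out;
}

#ifdef _WIN32
#include <d3d9.h>

// Hypothetical helper: read back a D3DFMT_R32F render-target texture.
// Returns an empty vector if any Direct3D call fails.
std::vector<float> read_back_r32f(IDirect3DDevice9 *device,
                                  IDirect3DTexture9 *rt_texture,
                                  UINT width, UINT height)
{
    std::vector<float> result;
    IDirect3DSurface9 *rt_surface = NULL;   // surface A (render target)
    IDirect3DSurface9 *sys_surface = NULL;  // surface B (system memory)

    if (SUCCEEDED(rt_texture->GetSurfaceLevel(0, &rt_surface)) &&
        SUCCEEDED(device->CreateOffscreenPlainSurface(
            width, height, D3DFMT_R32F, D3DPOOL_SYSTEMMEM,
            &sys_surface, NULL)) &&
        SUCCEEDED(device->GetRenderTargetData(rt_surface, sys_surface)))
    {
        D3DLOCKED_RECT locked;
        if (SUCCEEDED(sys_surface->LockRect(&locked, NULL, D3DLOCK_READONLY)))
        {
            result = copy_rows(locked.pBits,
                               static_cast<std::size_t>(locked.Pitch),
                               width, height);
            sys_surface->UnlockRect();
        }
    }

    // Release everything we acquired, whether or not the readback worked.
    if (sys_surface) sys_surface->Release();
    if (rt_surface) rt_surface->Release();
    return result;
}
#endif
```

Copying row by row matters because the locked surface's Pitch is not guaranteed to equal the row width in bytes.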
RenderTargets are mentioned in many places, e.g.
http://www.shaderx2.com/shaderx.PDF is an early paper describing how to get data into and out of the GPU to do matrix math.
Searching for "D3DFMT_R32F gpgpu" dredged up the informative threads
http://www.eggheadcafe.com/forumarchives/win32programmerdirectxgraphics/Dec2005/post24849349.asp
http://www.bokebb.com/dev/english/1999/posts/199917661.shtml
GPGPU Sanity Check
Here's how an initial shader sanity test might look for, say, HLSL's cos() function:
Write a C++ function cosine_cpu(float *p, int n) that takes an array of floats and replaces each one with its cosine using the plain old C++ function cos(). Then write a second C++ function cosine_gpu(float *p, int n) that does the same thing, but instead of calling cos() itself, it sends the array to a shader written in Direct3D 9.0c's HLSL and run on the GPU.
Then write a program that compares the output of those two functions to see how closely the GPU's cos() function's behavior matches that of the main CPU's cos() function. Declare failure if the mean square error is "too high".
We'd want to do this with several formats, including 32-bit floats, i.e. D3DFMT_R32F.
And it's not clear that comparing against the host's cos() function is the right thing to do.
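The CPU half of this sanity check, plus the comparison, might look something like the sketch below. cosine_cpu follows the signature given above; mean_square_error, cosine_fn, and cosine_matches_cpu are names invented here, and the max_mse threshold is a placeholder, since what counts as "too high" is still undecided. The real cosine_gpu would be passed in as the candidate once it exists.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Replace each element with its cosine, computed on the CPU.
void cosine_cpu(float *p, int n)
{
    for (int i = 0; i < n; ++i)
        p[i] = std::cos(p[i]);
}

// Mean square error between two arrays of n floats (n must be positive).
double mean_square_error(const float *a, const float *b, int n)
{
    assert(n > 0);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) {
        double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum += d * d;
    }
    return sum / n;
}

// Signature shared by cosine_cpu and a future cosine_gpu.
typedef void (*cosine_fn)(float *p, int n);

// Hypothetical harness: run `candidate` and cosine_cpu on copies of the same
// inputs and report whether the mean square error is within max_mse.
bool cosine_matches_cpu(cosine_fn candidate,
                        const std::vector<float> &inputs,
                        double max_mse)
{
    std::vector<float> expected(inputs);
    std::vector<float> actual(inputs);
    int n = static_cast<int>(inputs.size());
    cosine_cpu(expected.data(), n);
    candidate(actual.data(), n);
    return mean_square_error(expected.data(), actual.data(), n) <= max_mse;
}
```

A test program would call cosine_matches_cpu(cosine_gpu, inputs, max_mse) for a range of inputs and declare failure when it returns false. The reference is deliberately just another function pointer, so it can be swapped out if the host's cos() turns out to be the wrong thing to compare against.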
