Copied the content here in case the link becomes invalid someday.
So I would imagine the API call I make most often is SetVertexShaderConstantF. This handy dandy page, at the bottom, says something about it: it's not really all that expensive. But I call it a lot, so it adds up! And I've been thinking about optimizing it.
My question is this: would it be faster to change things around so that I set a bunch of constant registers in one go, or would it be about the same as keeping it as is, setting them one at a time?
Example: going from this:
pd3dDevice->SetVertexShaderConstantF( 0, (float*)&( value1 ), 1 );
pd3dDevice->SetVertexShaderConstantF( 1, (float*)&( value2 ), 1 );
pd3dDevice->SetVertexShaderConstantF( 2, (float*)&( matrix1 ), 4 );
pd3dDevice->SetVertexShaderConstantF( 6, (float*)&( value3 ), 1 );
to this:
vec4 values[7]; // 7 registers: value1, value2, matrix1 (4 registers), value3
// copy value1/value2/matrix1/value3 into values
pd3dDevice->SetVertexShaderConstantF( 0, (float*)values, 7 );
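A minimal sketch of the packed version, filling in the "copy into values" step. It assumes hypothetical `vec4`/`mat4` types and a hypothetical `MockDevice` whose `SetVertexShaderConstantF` takes the same arguments as the real one (start register, float pointer, number of float4 registers) but only records what it receives, so it runs without Direct3D. The register layout (c0, c1, c2..c5, c6) follows the snippet above.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for the poster's types; a real engine would use its
// own math types (e.g. D3DXVECTOR4 / D3DXMATRIX).
struct vec4 { float x, y, z, w; };
struct mat4 { float m[16]; };  // occupies 4 float4 constant registers

// Hypothetical mock with the same parameter order as the real device call.
// It copies the data into a flat register array and counts calls.
struct MockDevice {
    int calls = 0;
    float regs[256 * 4] = {};
    void SetVertexShaderConstantF(unsigned start, const float* data, unsigned count) {
        assert(start + count <= 256);
        ++calls;
        std::memcpy(&regs[start * 4], data, count * 4 * sizeof(float));
    }
};

// Packs value1, value2, matrix1 and value3 into c0..c6 and uploads them with
// one call instead of four.
void UploadBatched(MockDevice& dev, const vec4& value1, const vec4& value2,
                   const mat4& matrix1, const vec4& value3) {
    float values[7 * 4];
    std::memcpy(&values[0 * 4], &value1, sizeof(vec4));    // c0
    std::memcpy(&values[1 * 4], &value2, sizeof(vec4));    // c1
    std::memcpy(&values[2 * 4], matrix1.m, sizeof(mat4));  // c2..c5
    std::memcpy(&values[6 * 4], &value3, sizeof(vec4));    // c6
    dev.SetVertexShaderConstantF(0, values, 7);
}
```

This only collapses into a single call because the registers are contiguous; if the shader's constants had gaps, you would either reorder them in the shader or upload the unused registers in between as well.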
Any experience or thoughts? I have no idea whether it would even be an optimization, and it's too much work to implement if it isn't going to help.
Crossing the API boundary less often is usually a good thing: you should get better performance, but the question remains how much better.
A fundamental rule of optimization, and one I see far too many people overlook, is that you absolutely MUST have performance data to start with. Before you even consider optimization strategies, you need to spend a lot of time working out where your time is really being spent: optimizing the worst offender will bring far more benefit than tweaking parts that are simple to change but already cheap.
All I'm trying to say is that calling SetVertexShaderConstantF a lot doesn't necessarily make it a real performance bottleneck in your code. You need to measure it first, which also has the advantage of telling you whether your optimizations really did improve performance...
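A minimal CPU-side timing helper, as one way to get before/after numbers. It is a generic sketch, not something from the thread. Keep in mind that Direct3D queues work for the GPU, so CPU timing measures submission cost rather than when the GPU finishes; averaging whole frames over many iterations is more trustworthy than timing individual calls.

```cpp
#include <cassert>
#include <chrono>

// Runs `work` `iterations` times and returns the average wall-clock time per
// run, in milliseconds.
template <typename F>
double AverageMs(F&& work, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        work();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}
```

For example, wrapping a hypothetical `RenderFrame()` as `AverageMs([&] { RenderFrame(); }, 500)` before and after the change gives two numbers to compare, which also answers whether the optimization paid off.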
Get used to using PIX for Windows. It's not great for benchmarking (last time I used it, the timing was a bit broken in this respect, and observing interferes with the measurements), but the call-stream capture can be invaluable.
Use D3DPERF_BeginEvent() and D3DPERF_EndEvent() (or the helper I prepared for you earlier) to make the call stream easier to interpret.
Diffing two captured call streams (before and after a change) can give you a good idea of whether your new algorithms really are reducing the number of API calls, for example.
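A minimal RAII sketch of pairing the begin/end events, so every Begin gets its End even on an early return. This is a hypothetical illustration of the idea, not the helper mentioned above. So that it compiles without d3d9.h, `StubBeginEvent`/`StubEndEvent` are hypothetical stand-ins; in a real build you would call `D3DPERF_BeginEvent(D3DCOLOR, LPCWSTR)` and `D3DPERF_EndEvent()` instead.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for D3DPERF_BeginEvent / D3DPERF_EndEvent so the
// sketch compiles without d3d9.h. They only track nesting depth.
static int g_depth = 0;     // current event nesting depth
static int g_maxDepth = 0;  // deepest nesting seen (sketch bookkeeping only)
int StubBeginEvent(std::uint32_t /*color*/, const wchar_t* /*name*/) {
    ++g_depth;
    if (g_depth > g_maxDepth) g_maxDepth = g_depth;
    return g_depth - 1;
}
int StubEndEvent() {
    assert(g_depth > 0 && "EndEvent without matching BeginEvent");
    return --g_depth;
}

// Opens a named event on construction and closes it when the scope ends.
class ScopedPerfEvent {
public:
    ScopedPerfEvent(std::uint32_t color, const wchar_t* name) { StubBeginEvent(color, name); }
    ~ScopedPerfEvent() { StubEndEvent(); }
    ScopedPerfEvent(const ScopedPerfEvent&) = delete;
    ScopedPerfEvent& operator=(const ScopedPerfEvent&) = delete;
};

// Nested scopes group the call stream into labeled regions in a capture.
void RenderScene() {
    ScopedPerfEvent frame(0xFFFF0000u, L"RenderScene");
    {
        ScopedPerfEvent constants(0xFF00FF00u, L"SetShaderConstants");
        // ... SetVertexShaderConstantF / SetPixelShaderConstantF calls go here ...
    }
    // ... draw calls go here ...
}
```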
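A small sketch of comparing two captures by per-call counts. PIX's own export format is not covered here: the sketch assumes each captured frame has already been reduced to a plain list of API call names (a hypothetical preprocessing step).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Counts how many times each API call name appears in one captured frame.
std::map<std::string, int> CountCalls(const std::vector<std::string>& stream) {
    std::map<std::string, int> counts;
    for (const auto& call : stream) ++counts[call];
    return counts;
}

// Per-call change from `before` to `after` (negative means fewer calls after
// the change). Calls whose count did not change are left out.
std::map<std::string, int> DiffCalls(const std::vector<std::string>& before,
                                     const std::vector<std::string>& after) {
    std::map<std::string, int> delta;
    for (const auto& entry : CountCalls(before)) delta[entry.first] -= entry.second;
    for (const auto& entry : CountCalls(after)) delta[entry.first] += entry.second;
    for (auto it = delta.begin(); it != delta.end();) {
        if (it->second == 0) it = delta.erase(it); else ++it;
    }
    return delta;
}
```

For the example in the question, the diff should show SetVertexShaderConstantF dropping by three per object while draw calls stay unchanged.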
Jollyjeffers' advice is good... you should test.
Back in DX8 it was well known that setting shader constants was one of the slower operations and that batching was a huge win. With newer cards and DX9, I have no idea whether those performance concerns still exist. Your best bet is to test it, or to rely on known data.
The SDK has a page (click Index, then "Accurately Profiling Direct3D API Calls") on how to profile D3D. The end of the article lists typical timings for many D3D functions, and it appears that setting pixel shader constants can be extremely slow, while setting vertex shader constants is only somewhat slow. In other words, it looks like batching is still a good idea.
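One general way to batch constants without hand-packing every call site (a common technique, not necessarily what was done in this thread) is to keep a CPU-side shadow copy of the registers, track the range touched since the last upload, and send that whole range in one call right before each draw. `ConstantBatcher` and its `UploadFn` callback are hypothetical; in real code the callback would forward to `SetPixelShaderConstantF` or `SetVertexShaderConstantF`. The 256-register size is an assumption; the actual limit depends on shader model and device caps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <functional>
#include <utility>

// Hypothetical helper: shadows float4 constant registers on the CPU, tracks the
// range touched since the last flush, and uploads that range in one call.
class ConstantBatcher {
public:
    // Stand-in for the real upload, e.g. a lambda that forwards to
    // pd3dDevice->SetPixelShaderConstantF(start, data, count).
    using UploadFn = std::function<void(unsigned start, const float* data, unsigned count)>;

    explicit ConstantBatcher(UploadFn upload) : upload_(std::move(upload)) {}

    // Records new register values; nothing reaches the device yet.
    void Set(unsigned start, const float* data, unsigned count) {
        if (count == 0) return;
        assert(start + count <= kMaxRegs);
        std::memcpy(&shadow_[start * 4], data, count * 4 * sizeof(float));
        dirtyBegin_ = std::min(dirtyBegin_, start);
        dirtyEnd_ = std::max(dirtyEnd_, start + count);  // exclusive end
    }

    // Call once right before each draw: one upload covering every changed register.
    void Flush() {
        if (dirtyBegin_ >= dirtyEnd_) return;  // nothing changed since last flush
        upload_(dirtyBegin_, &shadow_[dirtyBegin_ * 4], dirtyEnd_ - dirtyBegin_);
        dirtyBegin_ = kMaxRegs;
        dirtyEnd_ = 0;
    }

private:
    static constexpr unsigned kMaxRegs = 256;  // assumed register count
    float shadow_[kMaxRegs * 4] = {};
    unsigned dirtyBegin_ = kMaxRegs;
    unsigned dirtyEnd_ = 0;
    UploadFn upload_;
};
```

The trade-off is that unchanged registers lying between touched ones get re-sent too (touching only c0 and c6 uploads c0..c6), which is usually cheaper than extra calls but is worth measuring.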
Well, I implemented it in my SM2.0 render pipeline, which is the main one. It took quite a while, but it looks like, at least in this particular scene, I'm seeing a 45% increase in performance.
I'm sure most of it is the batching of the pixel shader constants, since yes, they are quite a bit more expensive.
I also put in the ScopeProfiler thingy. It's quite useful, and will help a lot :D