Rendertarget changes in XNA Game Studio 4.0

We made several changes to the rendertarget API in Game Studio 4.0, all with the goal of increasing usability and reducing error.

The most common cause of confusion is probably the RenderTargetUsage.DiscardContents behavior, but this is one thing we did not change. PreserveContents mode is just too slow on Xbox, and even slower on phone hardware, which typically uses some variant of tiled or binned rendering, and thus has the same preference for discard behavior as Xbox but with less memory bandwidth for the extra buffer copies if you do request preserve mode.

Making our API simple is well and good, but not if that is going to cost enormous amounts of performance! So discard mode rendertarget semantics are here to stay. Learn em, love em, live with em :-)

Here are the things we did change:

Has-a versus Is-a

I often see people attempt something like:

    RenderTarget2D rt = new RenderTarget2D(...);
    List<Texture2D> textures = new List<Texture2D>();

    // Prerender animation frames
    for (int i = 0; i < 100; i++)
    {
        GraphicsDevice.SetRenderTarget(0, rt);
        DrawCharacterAnimationFrame(i);
        GraphicsDevice.SetRenderTarget(0, null);

        textures.Add(rt.GetTexture());
    }

This doesn’t work, because GetTexture returns an alias for the same surface memory as the rendertarget itself, rather than a separate copy of the data, so each drawing operation will replace the contents of all previously created textures. But these semantics are not all obvious from the API! GetTexture returns a reference to shared data, but the API makes it look like this could return a copy.

This is the classic has-a versus is-a distinction. Rendertargets are a special kind of texture, but our API made it look like they just had associated textures, or perhaps could be converted into textures.

We fixed this by removing the GetTexture method, and instead having RenderTarget2D inherit directly from Texture2D (and RenderTargetCube from TextureCube). It is harder to get these semantics wrong with the 4.0 API:

    List<Texture2D> textures = new List<Texture2D>();

    for (int i = 0; i < 100; i++)
    {
        RenderTarget2D rt = new RenderTarget2D(...);

        GraphicsDevice.SetRenderTarget(rt);
        DrawCharacterAnimationFrame(i);
        GraphicsDevice.SetRenderTarget(null);

        textures.Add(rt);
    }

Atomicity

How do you un-set a rendertarget? In previous versions of Game Studio we would often write:

    GraphicsDevice.SetRenderTarget(0, null);

That mostly works, but after using multiple rendertargets we must use this more complex version:

    for (int i = 0; i < HoweverManyRenderTargetsIJustUsed; i++)
    {
        GraphicsDevice.SetRenderTarget(i, null);
    }

Ugly, not to mention error prone if the un-set code does not loop enough times.

In Game Studio 4.0, we made SetRenderTarget an atomic method, so it always sets all the possible rendertargets at the same time. This call will always un-set all rendertargets, no matter how many were previously bound:

    GraphicsDevice.SetRenderTarget(null);

To set a single rendertarget, you no longer need to specify an index:

    GraphicsDevice.SetRenderTarget(renderTarget);

If multiple rendertargets were previously bound, this will change the first one to the specified value, then un-set the others.

To set multiple rendertargets (which is a HiDef feature, so not supported in the CTP), specify them all at the same time:

    GraphicsDevice.SetRenderTargets(diffuseRt, normalRt, depthRt);

That is a shortcut for this more flexible but verbose equivalent:

    RenderTargetBinding[] bindings =
    {
        new RenderTargetBinding(diffuseRt),
        new RenderTargetBinding(normalRt),
        new RenderTargetBinding(depthRt),
    };

    GraphicsDevice.SetRenderTargets(bindings);

Making the set call atomic has two main benefits:

It reduces the chance of accidentally forgetting to unset multiple rendertargets
It makes our validation code more efficient, as we now have a single place to validate MRT rules such as all surfaces being the same size and bit depth. Previously, SetRenderTarget had no way to know when it had the final state, or whether other calls were about to change rendertargets on different indices, so it had to just set a dirty flag which clued the next draw operation to validate and commit the new surfaces. This added a small but measurable overhead to all draw operations (even when MRT was not in use), which is no longer necessary now these operations are atomic.

Declarative depth

Our bloom sample contains a subtle bug in this line:

    renderTarget1 = new RenderTarget2D(GraphicsDevice, width, height, 1, format);

The problem is that when we later draw to this rendertarget, we do not explicitly un-set the depth buffer. Even though we are not using depth while rendering the bloom postprocess, the default depth buffer is still bound to the device, so must be compatible with the rendertarget we are using.

If you change the bloom sample by turning on multisampling, the default depth buffer will be multisampled, but the bloom rendertarget will not, so the two are no longer compatible and rendering will fail.

We could fix this by changing the bloom rendertarget to use the same multisample format as the backbuffer, or we could explicitly un-set the depth buffer before drawing bloom:

    DepthStencilBuffer previousDepth = GraphicsDevice.DepthStencilBuffer;
    GraphicsDevice.DepthStencilBuffer = null;

    DrawBloom();

    GraphicsDevice.DepthStencilBuffer = previousDepth;

This is ugly and far from obvious. We forgot to put this code in our sample, and I see other people making the same mistake all the time!

The more we thought about this, we realized some things:

Any time you change rendertarget without also changing depth buffer, that is almost certainly a bug waiting to bite.
Any time you un-set from a rendertarget to the backbuffer without also resetting the depth buffer, that’s another bug.
Many rendertargets are used in ways that do not actually require a depth buffer. The correct thing to do here is set the depth buffer to null. If you forget to do that, things will often still work, but can fail in subtle and confusing ways.
For rendertargets that do require a depth buffer, it can be a pain making sure your depth buffer has the same size and multisample format as the rendertarget. The XNA framework had much code dedicated to validating these rules, and I often see people getting this wrong.
Our DepthStencilBuffer class had no interesting methods or properties. In fact, the only thing you could do with it was to set it onto the graphics device, which always happened at the same time as setting a rendertarget.

We decided the DepthStencilBuffer class was so useless, we should get rid of it entirely! Instead, the depth format is now specified as part of each rendertarget. If I call:

    new RenderTarget2D(device, width, height);

I get a rendertarget with no associated depth buffer. If I want to use a depth buffer while drawing into my rendertarget, I use this constructor overload:

    new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.Depth24Stencil8);

Note: I could specify DepthFormat.None to use the full overload but get no depth buffer.

Note: when using MRT, the depth format is controlled by the first rendertarget.

With this design, many previously common errors become impossible:

The depth buffer cannot fail to match the rendertarget size and multisample format, so we don’t even need to bother validating this (which speeds up all rendering).
When you are doing 2D work like the bloom sample, and not thinking about depth buffers at all, the device is automatically set to use null depth.
Whenever you un-set a rendertarget, the default depth buffer is automatically restored. No need to explicitly save and restore means no chance of getting that wrong!

Several of you expressed concern that this design could lead to wasted memory, as you can no longer share a single depth buffer between many rendertargets.

Not at all! The key shift here is from an imperative API, where you explicitly create depth buffer objects, manage their lifespan, and tell us which one to use at what times, to a declarative API, where you tell us what depth format you want to use, and we figure out how best to make that happen.

The two important pieces of information you need to provide are:

Do I want a depth buffer when using this rendertarget? If so, what format?
Do I want to be able to go back to this buffer later and continue drawing over it (RenderTargetUsage.PreserveContents) or am I only interested in the final texture image? (RenderTargetUsage.DiscardContents)

Armed with this data, we can choose the appropriate implementation strategy for different situations:

On Xbox, there isn’t really such a thing as a depth buffer in the first place: it’s actually all just a small piece of shared EDRAM, plus backing store if you request PreserveContents mode. So this design takes less memory than what we had before, as we no longer need to jump through hoops to give the illusion that these are real objects with actual memory of their own.
On Windows, if you request PreserveContents mode, we allocate a separate depth buffer per rendertarget.
On Windows, if you use the default DiscardContents mode, we can be smarter, and can do things like automatically sharing a single native depth buffer between many rendertargets (as long as they all have the same size and multisample format).

Honesty compels me to admit that we haven’t actually implemented this sharing optimization yet. It’s currently on the schedule for 4.0 RTM, but things can always change, so please don’t beat me up too hard if we for some reason fail to get that part done in time :-)

Blog index - Back to my homepage

Rendertarget changes in XNA Game Studio 4.0

Originally posted to Shawn Hargreaves Blog on MSDN, Friday, March 26, 2010

Has-a versus Is-a

Atomicity

Declarative depth