Rendertarget changes in XNA Game Studio 4.0

Originally posted to Shawn Hargreaves Blog on MSDN, Friday, March 26, 2010

We made several changes to the rendertarget API in Game Studio 4.0, all with the goal of increasing usability and reducing errors.

The most common cause of confusion is probably the RenderTargetUsage.DiscardContents behavior, but this is one thing we did not change. PreserveContents mode is just too slow on Xbox, and even slower on phone hardware. Phones typically use some variant of tiled or binned rendering, so they share the Xbox preference for discard behavior, but they have even less memory bandwidth to spend on the extra buffer copies that preserve mode requires.

Making our API simple is well and good, but not if that is going to cost enormous amounts of performance! So discard mode rendertarget semantics are here to stay. Learn em, love em, live with em :-)
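To make the discard semantics concrete, here is a minimal sketch (sceneRt and DrawScene are my placeholder names, not from the framework): once a rendertarget created with the default RenderTargetUsage.DiscardContents is set on the device, its previous contents are undefined, so you must clear or completely redraw it every time:

    // Assumes sceneRt was created with the default
    // RenderTargetUsage.DiscardContents behavior.
    GraphicsDevice.SetRenderTarget(sceneRt);

    // Previous contents are now undefined: clear (or redraw
    // every pixel) before relying on anything in the target.
    GraphicsDevice.Clear(Color.Black);
    DrawScene();

    GraphicsDevice.SetRenderTarget(null);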

Here are the things we did change:

Has-a versus Is-a

I often see people attempt something like:

    RenderTarget2D rt = new RenderTarget2D(...);
    List<Texture2D> textures = new List<Texture2D>();

    // Prerender animation frames
    for (int i = 0; i < 100; i++)
    {
        GraphicsDevice.SetRenderTarget(0, rt);
        DrawCharacterAnimationFrame(i);
        GraphicsDevice.SetRenderTarget(0, null);

        textures.Add(rt.GetTexture());   // BUG: every element aliases the same surface memory
    }

This doesn’t work, because GetTexture returns an alias for the same surface memory as the rendertarget itself, rather than a separate copy of the data, so each drawing operation replaces the contents of all previously created textures. But these semantics are not at all obvious from the API! GetTexture returns a reference to shared data, yet the name makes it look like it could return a copy.

This is the classic has-a versus is-a distinction. Rendertargets are a special kind of texture, but our API made it look like they just had associated textures, or perhaps could be converted into textures.

We fixed this by removing the GetTexture method, and instead having RenderTarget2D inherit directly from Texture2D (and RenderTargetCube from TextureCube). It is harder to get these semantics wrong with the 4.0 API:

    List<Texture2D> textures = new List<Texture2D>();

    for (int i = 0; i < 100; i++)
    {
        // Each frame gets its own rendertarget (which is-a Texture2D)
        RenderTarget2D rt = new RenderTarget2D(...);

        GraphicsDevice.SetRenderTarget(rt);
        DrawCharacterAnimationFrame(i);
        GraphicsDevice.SetRenderTarget(null);

        textures.Add(rt);
    }
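Because the rendertarget now is-a texture, the prerendered frames can later be drawn like any other Texture2D, with no conversion step. A quick sketch (spriteBatch, currentFrame, and position are assumed to exist):

    // Rendertargets are textures, so SpriteBatch accepts them directly.
    spriteBatch.Begin();
    spriteBatch.Draw(textures[currentFrame], position, Color.White);
    spriteBatch.End();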


Atomicity

How do you un-set a rendertarget? In previous versions of Game Studio we would often write:

    GraphicsDevice.SetRenderTarget(0, null);

That mostly worked, but after using multiple rendertargets we had to fall back on this more complex version:

    for (int i = 0; i < HoweverManyRenderTargetsIJustUsed; i++)
    {
        GraphicsDevice.SetRenderTarget(i, null);
    }

Ugly, not to mention error prone if the un-set code does not loop enough times.

In Game Studio 4.0, we made SetRenderTarget an atomic method, so it always sets all the possible rendertargets at the same time. This call will always un-set all rendertargets, no matter how many were previously bound:

    GraphicsDevice.SetRenderTarget(null);

To set a single rendertarget, you no longer need to specify an index:

    GraphicsDevice.SetRenderTarget(renderTarget);

If multiple rendertargets were previously bound, this will change the first one to the specified value, then un-set the others.

To set multiple rendertargets (which is a HiDef feature, so not supported in the CTP), specify them all at the same time:

    GraphicsDevice.SetRenderTargets(diffuseRt, normalRt, depthRt);

That is a shortcut for this more flexible but verbose equivalent:

    RenderTargetBinding[] bindings =
    {
        new RenderTargetBinding(diffuseRt),
        new RenderTargetBinding(normalRt),
        new RenderTargetBinding(depthRt),
    };

    GraphicsDevice.SetRenderTargets(bindings);
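The binding form also covers cases the simple overload cannot express. For example (my sketch, with environmentCube as an assumed RenderTargetCube), a RenderTargetBinding can wrap a single face of a cube map rendertarget:

    // Render into one face of a cube map; the binding records
    // which face, something a bare rendertarget reference cannot.
    RenderTargetBinding[] cubeBinding =
    {
        new RenderTargetBinding(environmentCube, CubeMapFace.PositiveX),
    };

    GraphicsDevice.SetRenderTargets(cubeBinding);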

Making the set call atomic removes a whole class of errors: you can never forget to un-set a previously bound rendertarget, and the device can never be left in a half-updated state.

Declarative depth

Our bloom sample contains a subtle bug in this line:

    renderTarget1 = new RenderTarget2D(GraphicsDevice, width, height, 1, format);

The problem is that when we later draw to this rendertarget, we do not explicitly un-set the depth buffer. Even though we are not using depth while rendering the bloom postprocess, the default depth buffer is still bound to the device, so it must be compatible with the rendertarget we are using.

If you change the bloom sample by turning on multisampling, the default depth buffer will be multisampled, but the bloom rendertarget will not, so the two are no longer compatible and rendering will fail.

We could fix this by changing the bloom rendertarget to use the same multisample format as the backbuffer, or we could explicitly un-set the depth buffer before drawing bloom:

    DepthStencilBuffer previousDepth = GraphicsDevice.DepthStencilBuffer;
    GraphicsDevice.DepthStencilBuffer = null;

    DrawBloom();

    GraphicsDevice.DepthStencilBuffer = previousDepth;

This is ugly and far from obvious. We forgot to put this code in our sample, and I see other people making the same mistake all the time!

The more we thought about this, the more we realized that a depth buffer is only ever useful alongside a rendertarget of matching size and multisample format, so a standalone DepthStencilBuffer object just makes you manage by hand a pairing the framework could work out for itself.

We decided the DepthStencilBuffer class was so useless, we should get rid of it entirely! Instead, the depth format is now specified as part of each rendertarget. If I call:

    new RenderTarget2D(device, width, height);

I get a rendertarget with no associated depth buffer. If I want to use a depth buffer while drawing into my rendertarget, I use this constructor overload:

    new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.Depth24Stencil8);

Note: I could use the full overload but pass DepthFormat.None, which also gets me no depth buffer.
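In other words:

    new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.None);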

Note: when using MRT, the depth format is controlled by the first rendertarget.
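For instance, only the first target in an MRT group needs to declare a depth format; a sketch using the G-buffer targets from earlier (the surface formats here are illustrative, not prescriptive):

    // With MRT, the first rendertarget's DepthFormat wins,
    // so the others can simply say DepthFormat.None.
    var diffuseRt = new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.Depth24Stencil8);
    var normalRt = new RenderTarget2D(device, width, height, false, SurfaceFormat.Color, DepthFormat.None);
    var depthRt = new RenderTarget2D(device, width, height, false, SurfaceFormat.Single, DepthFormat.None);

    GraphicsDevice.SetRenderTargets(diffuseRt, normalRt, depthRt);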

With this design, many previously common errors become impossible: there is no way to bind a depth buffer whose size or multisample format does not match the current rendertarget, and there is no longer a separate depth buffer to forget to un-set.

Several of you expressed concern that this design could lead to wasted memory, as you can no longer share a single depth buffer between many rendertargets.

Not at all! The key shift here is from an imperative API, where you explicitly create depth buffer objects, manage their lifespan, and tell us which one to use at what times, to a declarative API, where you tell us what depth format you want to use, and we figure out how best to make that happen.

The two important pieces of information you need to provide are the depth format you want, and the size and multisample settings of the rendertarget it will be used with (which the constructor already requires).

Armed with this data, we can choose the appropriate implementation strategy for each situation: for instance, giving a rendertarget its own dedicated depth buffer, or transparently sharing a single depth buffer between rendertargets that declare identical size, multisample, and depth settings.

Honesty compels me to admit that we haven’t actually implemented this sharing optimization yet. It’s currently on the schedule for 4.0 RTM, but things can always change, so please don’t beat me up too hard if we for some reason fail to get that part done in time :-)
