XBOX-HQ.COM - ATI Multi-Threading Patent For Xenon?

BCossette Writes:

In our previous Differing Philosophies Emerge Between ATI and NVIDIA report we looked at comments from NVIDIA’s Chief Scientist, David Kirk, in which he expressed his dislike of a unified shader pipeline at the hardware level, citing the differing demands and workloads of Pixel Shaders and Vertex Shaders. David Kirk specifically singled out texturing as an example of where the pipelines differ, and in replying to these comments ATI’s Eric Demers agreed that the pipelines have different demands but suggested that "if one were able to figure out a way to unify the shaders such that nothing extra is required to fulfil all the requirements of the different shaders, while being able to share all their commonality, that would be a great solution." It has been widely expected for some time that ATI’s hardware would move to a unified shading architecture at the hardware level, not least because ATI itself has given indications in that direction, and a recent patent appears to be further evidence of this.

Filed on 29 September 2003, ATI’s patent "Multi-thread graphic processing system" was granted on 31 March 2005. Although it deals primarily with a multi-threaded graphics command queuing system, it gives further evidence of a unified shader pipeline and also offers some potential insight into Eric Demers’ comments. The patent describes one embodiment with two command thread queues, or "Reservation Stations", one for pixel commands and another for vertex commands. Both are linked to an arbiter that can distribute instructions from either reservation station to either an Arithmetic Logic Unit (ALU) or a texture unit, thus distributing the individual vertex and pixel shader commands between the available processing unit types. The results from the ALU or texture unit can then be passed back to either the pixel or the vertex reservation station, depending on which operations are required next. Effectively, the command queues hold multiple threads of work until suitable processing elements become available to execute them.
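The dispatch scheme described above can be sketched as a toy cycle-by-cycle model. Everything here is an assumption for illustration rather than the patent’s actual design: the class names `Thread` and `Arbiter` are hypothetical, each thread is reduced to a list of `(station, unit)` steps, there is exactly one ALU and one texture unit, every step completes in one cycle, and the vertex station is given fixed priority.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Thread:
    """Hypothetical thread: an ordered list of (station, unit) steps."""
    name: str
    ops: list

    def next_op(self):
        return self.ops[0] if self.ops else None


class Arbiter:
    """Toy arbiter feeding one ALU and one texture unit from a vertex and a
    pixel reservation station (all names and policies are assumptions)."""

    def __init__(self):
        self.stations = {"vertex": deque(), "pixel": deque()}
        self.log = []  # entries are (cycle, unit, station, thread name)

    def submit(self, thread):
        # Queue the thread at the station its next step belongs to.
        station, _unit = thread.next_op()
        self.stations[station].append(thread)

    def _pick(self, unit):
        # Oldest thread whose next step needs `unit`; vertex station first.
        for station in ("vertex", "pixel"):
            queue = self.stations[station]
            for i, thread in enumerate(queue):
                if thread.next_op()[1] == unit:
                    del queue[i]
                    return station, thread
        return None

    def cycle(self, t):
        issued = []
        for unit in ("alu", "tex"):
            picked = self._pick(unit)
            if picked:
                station, thread = picked
                thread.ops.pop(0)  # assumed: the step completes this cycle
                self.log.append((t, unit, station, thread.name))
                issued.append(thread)
        # Route each result back to whichever station its next step needs.
        for thread in issued:
            if thread.next_op():
                self.submit(thread)

    def run(self, max_cycles=100):
        t = 0
        while any(self.stations.values()) and t < max_cycles:
            self.cycle(t)
            t += 1
        return t


arb = Arbiter()
arb.submit(Thread("v0", [("vertex", "alu"), ("vertex", "tex"), ("pixel", "alu")]))
arb.submit(Thread("p0", [("pixel", "tex"), ("pixel", "alu")]))
cycles = arb.run()
for entry in arb.log:
    print(entry)
print("cycles:", cycles)
```

In the log, thread v0 starts in the vertex station and finishes in the pixel station once its vertex steps are done, which mirrors the result routing described in the patent summary; with one unit of each type, an ALU step and a texture step can issue in the same cycle from different stations.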

On the face of it, splitting the texture units from the ALUs appears to be what ATI’s Eric Demers was hinting at in his reply to David Kirk, since there is effectively a dedicated pool of texture processors and a separate pool of ALUs for the actual "pixel shading" and "vertex shading" math operations. Such an arrangement would appear to answer many of the issues surrounding texture read latency: the arbiter should be able to interleave pending texture operations with pixel and vertex shader instructions from other threads (ones that do not depend on those texture reads) until the texture results are returned to the command queue reservation stations, keeping the math ALUs busy as much of the time as possible rather than stalling on texture latency. A unified pipeline should also be able to balance the workload according to the number of pixel and vertex operations in the command queues, so that stalls do not occur because one type of work is waiting on the other; however, it is not clear that the patent outlines this particular aspect, unless it is assumed that executable vertex instructions from the current thread get priority over pixel instructions.
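To illustrate why interleaving helps with texture latency, here is a toy simulation, again built on assumptions of our own rather than anything in the patent: a single ALU that executes one instruction per cycle, a pipelined texture unit that accepts one fetch per cycle and returns it `tex_latency` cycles later, and threads that each perform `rounds` of "fetch a texel, then run `alu_per_round` ALU instructions that depend on it". The function `simulate` and all of its parameters are hypothetical. With `interleave=False` only the oldest unfinished thread may issue, mimicking a pipeline that stalls on every texture read; with `interleave=True` any ready thread may issue.

```python
def simulate(num_threads, rounds, alu_per_round, tex_latency, interleave):
    """Return (total_cycles, alu_busy_cycles) for a toy latency-hiding model."""
    threads = [{"state": "need_tex", "rounds_left": rounds,
                "alu_left": 0, "ready_at": 0} for _ in range(num_threads)]
    t = busy = 0
    while any(th["state"] != "done" for th in threads):
        # Texture results that have arrived wake their threads up.
        for th in threads:
            if th["state"] == "waiting" and th["ready_at"] <= t:
                th["state"], th["alu_left"] = "alu", alu_per_round
        live = [th for th in threads if th["state"] != "done"]
        # Without interleaving, only the oldest unfinished thread may issue.
        candidates = live if interleave else live[:1]
        # Texture unit: accepts one new fetch per cycle (pipelined).
        for th in candidates:
            if th["state"] == "need_tex":
                th["state"], th["ready_at"] = "waiting", t + tex_latency
                break
        # ALU: executes one instruction per cycle from a ready thread.
        for th in candidates:
            if th["state"] == "alu":
                busy += 1
                th["alu_left"] -= 1
                if th["alu_left"] == 0:
                    th["rounds_left"] -= 1
                    th["state"] = "need_tex" if th["rounds_left"] else "done"
                break
        t += 1
    return t, busy


if __name__ == "__main__":
    for mode in (False, True):
        cycles, busy = simulate(num_threads=4, rounds=2, alu_per_round=2,
                                tex_latency=6, interleave=mode)
        print(f"interleave={mode}: {cycles} cycles, ALU busy {busy / cycles:.0%}")
```

With four threads, two rounds each, two ALU instructions per fetch and a six-cycle fetch latency, the stalling run takes 64 cycles and the interleaved run 22 cycles for the same 16 cycles of ALU work, so ALU utilisation rises from 25% to roughly 73% in this toy setting.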

The patent itself is concerned with the queuing and execution of commands and only illustrates a single ALU and a single texture unit; however, we can extrapolate this to multiple ALUs and texture units, and graphics performance can then scale easily by giving the arbiter control over increasing numbers of texture units and ALUs.
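The extrapolation to multiple units can be sketched with a hypothetical `dispatch` function that hands queued operations to pools of `n_alu` ALUs and `n_tex` texture units each cycle. This is an idealised model under stated assumptions: operations are treated as independent of one another, each completes in a single cycle, and any queued operation may issue as soon as a unit of the right type is free.

```python
from collections import deque


def dispatch(ops, n_alu, n_tex):
    """Issue independent ops ("alu" or "tex") to pools of n_alu ALUs and
    n_tex texture units; return the per-cycle issue schedule."""
    if n_alu < 1 or n_tex < 1:
        raise ValueError("need at least one unit of each type")
    pending = deque(ops)
    schedule = []
    while pending:
        free = {"alu": n_alu, "tex": n_tex}  # units available this cycle
        issued, held = [], deque()
        while pending:
            op = pending.popleft()
            if free[op] > 0:
                free[op] -= 1
                issued.append(op)
            else:
                held.append(op)  # keep original order for the next cycle
        pending = held
        schedule.append(issued)
    return schedule


if __name__ == "__main__":
    ops = ["alu"] * 8 + ["tex"] * 4
    for n_alu, n_tex in [(1, 1), (2, 1), (4, 2)]:
        print(f"{n_alu} ALUs, {n_tex} texture units:",
              len(dispatch(ops, n_alu, n_tex)), "cycles")
```

In this idealised case the cycle count is simply the larger of ceil(ALU ops / `n_alu`) and ceil(texture ops / `n_tex`), so doubling both pools halves the time; real shader code has dependencies between instructions that make scaling less clean.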

With the next generation DirectX incarnation, WGF2.0, specifying a unified shader language for both vertex and pixel shaders within the API, this type of command queuing system may be what ATI is looking at for its WGF2.0 capable graphics processors. However, purportedly leaked specifications for Microsoft’s next generation console, codenamed "Xenon", which has ATI-produced graphics at its core, have made suggestions such as "The shader core has 48 Arithmetic Logic Units (ALUs) that can execute 64 simultaneous threads on groups of 64 vertices or pixels. ALUs are automatically and dynamically assigned to either pixel or vertex processing depending on load." This sounds like one of the elements this command queuing system is designed to address.
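Assigning ALUs "depending on load" could, for example, be approximated by splitting the ALU pool between vertex and pixel work in proportion to how much of each is queued. This is purely an illustrative policy of our own, not something stated in the patent or in the leaked specification: the function `assign_alus` is hypothetical, load is measured simply as the number of queued instructions, and the figure of 48 ALUs comes from the unconfirmed Xenon specification quoted above.

```python
def assign_alus(total_alus, vertex_load, pixel_load):
    """Hypothetical policy: split total_alus (assumed >= 2) between vertex and
    pixel work in proportion to queued load, rounding half up."""
    total_load = vertex_load + pixel_load
    if total_load == 0:
        return 0, 0  # nothing queued, so nothing to assign
    # Integer arithmetic avoids floating-point rounding surprises.
    vertex = (total_alus * vertex_load + total_load // 2) // total_load
    # Never starve a non-empty queue completely.
    if vertex_load > 0 and vertex == 0:
        vertex = 1
    if pixel_load > 0 and vertex == total_alus:
        vertex = total_alus - 1
    return vertex, total_alus - vertex


if __name__ == "__main__":
    # Hypothetical queue depths as work shifts from geometry-heavy to pixel-heavy.
    for vertex_load, pixel_load in [(3000, 1000), (1000, 1000), (1000, 3000), (1, 999)]:
        print(vertex_load, pixel_load, assign_alus(48, vertex_load, pixel_load))
```

Calling the function again whenever queue depths change lets the same pool follow the workload as it shifts between geometry-heavy and pixel-heavy phases, which is the behaviour the quoted specification describes, whatever mechanism the hardware actually uses.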

News-Source: http://www.Beyond3D.com