
<blockquote data-quote="Core" data-source="post: 6785218" data-attributes="member: 263471">Geometry

NVIDIA’s goal is for GF100 to enable film-like geometric realism for game characters and objects. Geometric realism is central to the GF100 architectural enhancements for graphics. In addition, PhysX simulations are faster and developers can use GPU computing features in games more easily and effectively.

While programmable shading has allowed PC games to mimic cinema in per-pixel effects, geometric realism is far behind. The most advanced modern PC games use one to two million polygons per frame, whereas a typical frame in a computer-generated film uses hundreds of millions of polygons. While the number of pixel shaders has grown from one to many hundreds, the triangle setup engine has remained a single unit. For example, the GeForce GTX 285 has more than 150 times the shading horsepower of the old GeForce FX, but less than 3 times the geometry processing rate. This means that pixels are shaded well but their geometric detail is weak.

Take a look at NVIDIA’s example from Far Cry 2. The holster has a heavily segmented strap. The corrugated roof is just a flat surface with a striped texture instead of curving properly. We also note that this character wears a hat to avoid the complexity of rendering hair.

<a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/GamesLackGeometry.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/GamesLackGeometry_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a>

On the other hand, the exquisitely detailed characters in CG films are made possible by tessellation and displacement mapping. Tessellation refines large triangles into collections of smaller triangles, while displacement mapping changes their relative position.
To achieve these same goals, GF100’s entire graphics pipeline is designed to deliver higher performance in tessellation and geometry throughput.

GF100 replaces the traditional geometry processing architecture at the front end of the graphics pipeline with an entirely new distributed geometry processing architecture that is implemented using multiple “PolyMorph Engines”. Each of these engines includes a tessellation unit, an attribute setup unit, and other geometry processing units. Each SM has its own dedicated PolyMorph Engine, as shown by the three grouped diagrams that we showed you earlier (above).

Newly generated primitives are converted to pixels by four Raster Engines that operate in parallel, compared to a single Raster Engine in GT200 and in earlier GPUs. On-chip L1 and L2 caches now enable high-bandwidth transfer of primitive attributes between the SM and the tessellation unit as well as between different SMs. Tessellation and all its supporting stages are performed in parallel on GF100 with improved geometry throughput. GF100’s ability to perform parallel geometry processing is possibly the single most important GF100 architectural improvement. The ability to deliver setup rates exceeding one primitive per clock while maintaining correct rendering order is a significant technical achievement.

Major compute features improved on GF100 that will be useful in games include faster context switching between graphics and PhysX, concurrent compute kernel execution, and an enhanced caching architecture that is good for irregular algorithms such as ray tracing and AI. In addition, improved atomic operations performance allows threads to safely cooperate through work queues, accelerating novel rendering algorithms. For example, fast atomic operations allow transparent objects to be rendered without presorting (order-independent transparency), enabling developers to create levels with complex glass environments.
GF100’s GigaThread engine reduces context switch time, making it possible to execute multiple compute and physics kernels for each frame.

–~~~~~~~~~~~~–
Tessellation and Displacement Mapping

It takes DX11 to take advantage of geometry. DX9 and DX10 are unable to create generalized geometry on the GPU. Therefore we will see tessellation and displacement mapping used together to create more realism in games. The ability to control the geometric level of detail (LOD) is very important. Because it is on-demand and the data is all kept on-chip, precious memory bandwidth is preserved. Also, because one model may produce many LODs, the same game assets may be used on a variety of platforms, which makes game developers very happy. A character’s detail can also be easily adjusted to how it appears in the scene: if it is small, it gets little geometry; if it is close to the screen, it is rendered with greater detail.

As an additional benefit, developers may be able to use the same models on many generations of games and future GPUs, where performance increases will allow for even greater detail than was possible when the game was first released. Complexity can even be adjusted dynamically to target a given frame rate!

Here in NVIDIA’s slide from the Unigine engine demo, we see tessellation compared, on and off. There is no comparison; tessellation adds to realism.

<a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Tesselation.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Tesselation_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a>

Take a look at the third image that we presented earlier in this article. The use of tessellation fundamentally changes the GPU’s graphics workload balance.
With tessellation, the triangle density of a given frame can increase by multiple orders of magnitude, which strains serial resources such as the setup and rasterization units. To facilitate high triangle rates, NVIDIA designed a scalable geometry engine called the PolyMorph Engine. Each of GF100’s 16 PolyMorph Engines has its own dedicated vertex fetch unit and a tessellator, which expands geometry performance.

In conjunction with the PolyMorph Engine, NVIDIA designed four parallel Raster Engines, which allow up to four triangles to be set up per clock. Results calculated in each of the PolyMorph Engine’s five stages are passed to an SM. The SM executes the game’s shader, returning the results to the next stage in the PolyMorph Engine. After all stages are complete, the results are forwarded to one of the four Raster Engines.

The Rasterizer takes the edge equations for each primitive and computes pixel coverage. If antialiasing is enabled, coverage is computed for each multisample and coverage sample. Each Rasterizer outputs eight pixels per clock, for a total of 32 rasterized pixels per clock across the chip. Pixels produced by the rasterizer are sent to the Z-cull unit.

By having a dedicated tessellator for each SM, and a Raster Engine for each GPC, GF100 delivers up to 8 times the geometry performance of GT200. NVIDIA also compares the geometry performance of GF100 to HD 5870 and finds Fermi is significantly faster. Here is a performance comparison between GF100 and HD 5870 using a 60 second run with the Unigine engine:

<a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/60SecUnigine.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/60SecUnigine_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a>

–~~~~~~~~~~~~–
Anti-Aliasing

Image Quality

To improve anti-aliasing image quality, the GF100 introduces a new anti-aliasing mode: 32xCSAA.
NVIDIA’s previous strongest edge AA mode was 16xQ, but this is now bested by 32xAA. Here’s the sample pattern for it, courtesy of NVIDIA:

<img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/32.png" alt="" class="fr-fic fr-dii fr-draggable " style="" />

32xAA = 8xMSAA + 24xCSAA. Thus 32xCSAA is a natural extension of 16xQ, and offers even stronger edge (polygon) anti-aliasing by providing a total of 32 unique samples.

But that’s not all that has improved. The GF100 has a new ability to use coverage samples to affect the quality of alpha textures, as implemented through transparency anti-aliasing. With previous NVIDIA hardware such as the GT200, coverage samples had no effect on transparency anti-aliasing quality, as the result was derived solely from the base multi-sampling pattern in effect.

Image quality has also improved in the specific case of transparency multi-sampling. Any titles using the older alpha test method to render transparent textures have their shader code automatically converted to use the alpha-to-coverage technique, which should greatly improve image quality, especially in heavily aliased areas.

The upshot of this is higher quality edges and higher quality alpha textures.

Anti-Aliasing Performance

Beyond the image quality improvements, anti-aliasing performance has also increased. When it comes to AA, the most obvious area to target is the ROPs, and that’s exactly what NVIDIA has done.

The GF100 has 48 ROPs, up from 32 ROPs on the GTX 285, which is especially helpful for portions of the scene that cannot be compressed. Each ROP is also faster and more efficient than on previous generations, so it can do more work per cycle.
This includes improvements made to the compression technology.

<img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/25a.png" alt="" class="fr-fic fr-dii fr-draggable " style="" />

Aside from better AA performance in general, NVIDIA’s old Achilles heel of 8xMSAA performance should also be addressed by the improvements. Historically, NVIDIA architectures have exhibited a much higher relative performance hit when going from 4xMSAA to 8xMSAA than competing ATi architectures.

Also, by moving to 384-bit GDDR5, NVIDIA should have access to plenty of memory bandwidth to keep all of those ROPs fed with data.

–~~~~~~~~~~~~–
Texture Filtering

As with anti-aliasing, there have been improvements made to texturing too. Interestingly, the GF100 has only 64 TMUs, far fewer than the 80 TMUs on the GTX 285, but NVIDIA claims overall performance should still be higher because of per-unit performance and efficiency improvements.

Texture caching has been substantially improved, with the L1 cache being redesigned for greater efficiency. Also, the presence of a unified L2 cache means the texture cache is three times larger than on the GT200.

Layout changes and internal improvements to the texture units also combine with a higher TMU clock speed. On the GT200 the TMUs ran at the GPU’s core clock; on the GF100 they run at a higher clock, which allows them to perform more work in the same amount of time. NVIDIA’s numbers show 40% to 70% higher texturing performance than the GT200, despite having far fewer TMUs.

<img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/25b.png" alt="" class="fr-fic fr-dii fr-draggable " style="" />

The GF100’s texture units also offer hardware-accelerated jittered sampling. This essentially means the hardware has the ability to offer a form of stochastic filtering by varying the texture sampling on a per-pixel basis.
This is done by implementing DirectX 11’s Gather4 in hardware, and it provides the ability for up to four texels to be fetched from a 128×128 pixel grid with a single instruction.

<a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/29.png" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/29-150x150.png" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a>

This not only improves performance with things like ambient occlusion, but it can also improve image quality by removing banding through random sampling. It also allows game developers to implement customized texture filtering more efficiently. NVIDIA states that the GF100’s hardware implementation of this technique offers up to twice the performance of the GT200.

–~~~~~~~~~~~~–
Compute Architecture

The compute engine is designed to handle the GPGPU side of things and encapsulates features such as CUDA, OpenCL, DirectCompute, and PhysX. Many of these have been around since the G80 days, but the GF100 delivers a number of improvements to make such general-purpose computing run better.

The GF100 is designed to handle a wider range of algorithms well, to encourage greater use of the GPU for parallel problems. One key area of improvement comes from its better cache system, which allows threads that access the same memory locations to run faster.

Another key improvement allows the GF100 to execute multiple task kernels at once, and the context switching between such tasks is much faster than on previous GPUs. This differs from the GT200, which could only run one task kernel at a time and had very slow context switching.

And lastly, high-level features such as debugging and a C++ programming environment to access GPGPU features are made possible with NVIDIA’s Nexus plug-in for Visual Studio. Such features simplify programming GPGPU tasks, as they help developers work at a higher level than was previously possible.
Ray Tracing

<a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/RT.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/RT_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a>

The GF100 will not be able to do complex ray tracing (RT) in real time in PC games as in the above image. However, NVIDIA believes that RT is the future of graphics and expects some implementation of it in conjunction with rasterization fairly soon as developers begin to take advantage of GF100’s new programming capabilities.

–~~~~~~~~~~~~–
Conclusion

It’s clear that NVIDIA has invested a lot of resources and design effort into trying to make the GF100 the fastest single GPU to date. In addition to a very clear focus on improving GPGPU performance and usability, numerous enhancements to image quality and performance for gaming purposes have also been made.

It’ll be very interesting to see how the card performs in actual gaming situations, and more importantly, how it compares to ATi’s current single GPU flagship, the Radeon 5870.

We are looking forward to bringing our readers the latest news about the Fermi GF100 and we will be testing its performance and image quality in gaming. There is much more to be revealed about NVIDIA’s new GPU. Stay tuned. The graphics wars are heating up and it is getting very interesting again.

<a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Turbulence.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Turbulence_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a>

Article written by Mark Poppin and BFG10K, AlienBabelTech Senior Editors.
Reference by apoppin on Jan. 17, 2010, under ABTnews, Articles, Technology</blockquote>
