A Good News .!!!
<blockquote data-quote="Core" data-source="post: 6785218" data-attributes="member: 263471"><p><strong><span style="color: #99ccff"><u>Geometry</u></span></strong></p><p> NVIDIA’s goal is for GF100 to enable film-like geometric realism for game characters and objects. Geometric realism is central to the GF100 architectural enhancements for graphics. In addition, PhysX simulations are faster and developers can utilize GPU computing features in games more easily and effectively.</p><p> While programmable shading has allowed PC games to mimic the cinema in per-pixel effects, geometric realism is way behind. The most advanced modern PC games will use one to two million polygons per frame whereas a typical frame in a computer generated film uses hundreds of millions of polygons. While the number of pixel shaders has grown from one to many hundreds, the triangle setup engine has remained a singular unit. For example, the GeForce GTX 285 has more than 150 times the shading horsepower of the old GeForce FX, but less than 3 times the geometry processing rate. This means that pixels are shaded well but their geometric detail is weak.</p><p> Take a look at NVIDIA’s example from <em>Far Cry 2</em>. The holster has a heavily segmented strap. The corrugated roof is just a flat surface with a striped texture instead of curving properly. We also note that this character wears a hat to avoid the complexity of rendering hair.</p><p style="text-align: center"><a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/GamesLackGeometry.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/GamesLackGeometry_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a></p><p><span style="color: #99ccff"><em>On the other hand, the exquisitely detailed characters in CG films are made possible by tessellation and displacement mapping. 
Tessellation refines large triangles into collections of smaller triangles, while displacement mapping changes their relative position. To achieve these same goals, GF100’s entire graphics pipeline is designed to deliver higher performance in tessellation and geometry throughput.</em></span></p><p> GF100 replaces the traditional geometry processing architecture at the front end of the graphics pipeline with an entirely new distributed geometry processing architecture that is implemented using multiple “PolyMorph Engines”. Each of these engines includes a tessellation unit, an attribute setup unit, and other geometry processing units. Each SM has its own dedicated PolyMorph Engine, as shown by the three grouped diagrams that we showed you earlier (above).</p><p> Newly generated primitives are converted to pixels by four Raster Engines that operate in parallel, compared to a single Raster Engine in GT200 and earlier GPUs. On-chip L1 and L2 caches now enable high-bandwidth transfer of primitive attributes between the SM and the tessellation unit as well as between different SMs. Tessellation and all its supporting stages are performed in parallel on GF100, which improves geometry throughput. GF100’s ability to perform parallel geometry processing is possibly its single most important architectural improvement. Delivering setup rates exceeding one primitive per clock while maintaining correct rendering order is a significant technical achievement.</p><p> Major compute features improved on GF100 that will be useful in games include faster context switching between graphics and PhysX, concurrent compute kernel execution, and an enhanced caching architecture that suits irregular algorithms such as ray tracing and AI. In addition, improved atomic operations performance allows threads to cooperate safely through work queues, accelerating novel rendering algorithms. 
For example, fast atomic operations allow transparent objects to be rendered without presorting (order-independent transparency), enabling developers to create levels with complex glass environments. GF100’s GigaThread engine reduces context switch time, making it possible to execute multiple compute and physics kernels for each frame.</p><p> </p><p style="text-align: center">–~~~~~~~~~~~~–</p><p><strong><span style="color: #99ccff"><u>Tessellation and Displacement Mapping</u></span></strong></p><p> It takes DX11 to take full advantage of this geometry hardware, because DX9 and DX10 are unable to create generalized geometry on the GPU. We will therefore see tessellation and displacement mapping used together to create more realism in games. The ability to control the geometric level of detail (LOD) is very important. Because tessellation is performed on demand and the data is all kept on-chip, precious memory bandwidth is preserved. Also, because one model may produce many LODs, the same game assets may be used on a variety of platforms, which makes game developers very happy. A character’s detail can also be easily adjusted according to how it appears in the scene: if it is small on screen it gets little geometry, and if it is close to the viewer it is rendered with greater detail.</p><p> As an additional benefit, developers may be able to use the same models across many generations of games and future GPUs, where performance increases will allow even greater detail than was possible when the game was first released. Complexity can even be adjusted dynamically to target a given frame rate!</p><p> Here in NVIDIA’s slide from the Unigine engine demo, we see tessellation compared, on and off. 
There is no comparison; tessellation adds to realism.</p><p style="text-align: center"><a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Tesselation.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Tesselation_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a></p><p>Take a look at the third image that we presented earlier in this article. The use of tessellation fundamentally changes the GPU’s graphics workload balance. With tessellation, the triangle density of a given frame can increase by multiple orders of magnitude, which strains serial resources such as the setup and rasterization units. To facilitate high triangle rates, NVIDIA designed a scalable geometry engine called the PolyMorph Engine. Each of GF100’s 16 PolyMorph Engines has its own dedicated vertex fetch unit and tessellator, which greatly expands geometry performance.</p><p> In conjunction with the PolyMorph Engine, NVIDIA designed four parallel Raster Engines, which allow up to four triangles to be set up per clock. Each PolyMorph Engine works in five stages, and the results calculated in each stage are passed to an SM. The SM executes the game’s shader, returning the results to the next stage in the PolyMorph Engine. After all stages are complete, the results are forwarded to one of the four Raster Engines.</p><p> The Rasterizer takes the edge equations for each primitive and computes pixel coverage. If antialiasing is enabled, coverage is computed for each multisample and coverage sample. Each Rasterizer outputs eight pixels per clock, for a total of 32 rasterized pixels per clock across the chip. Pixels produced by the rasterizer are sent to the Z-cull unit. By having a dedicated tessellator for each SM and a Raster Engine for each GPC, GF100 delivers up to 8 times the geometry performance of GT200. 
NVIDIA also compares the geometry performance of GF100 to HD 5870 and finds Fermi is significantly faster.</p><p> Here is a performance comparison between GF100 and HD 5870 using a 60 second run with the Unigine engine:</p><p style="text-align: center"><a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/60SecUnigine.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/60SecUnigine_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a></p><p> </p><p style="text-align: center">–~~~~~~~~~~~~–</p><p><strong><span style="color: #99ccff"><u>Anti-Aliasing Image Quality</u></span></strong></p><p> To improve anti-aliasing image quality, the GF100 introduces a new anti-aliasing mode: 32xCSAA. nVidia’s previous strongest edge AA mode was 16xQ, but this is now bested by the new 32x mode. Here’s the sample pattern for it, courtesy of nVidia:</p><p style="text-align: center"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/32.png" alt="" class="fr-fic fr-dii fr-draggable " style="" /></p><p><span style="color: #99ccff">32x</span>AA = <span style="color: #99ccff">8x</span>MSAA + <span style="color: #99ccff">24x</span>CSAA.</p><p> Thus 32xCSAA is a natural extension of 16xQ, and offers even stronger edge (polygon) anti-aliasing by providing a total of 32 unique samples.</p><p> But that’s not all that has improved. The GF100 has a new ability to use coverage samples to affect the quality of alpha textures, as implemented through transparency anti-aliasing. With previous nVidia hardware such as the GT200, coverage samples had no effect on transparency anti-aliasing quality, as the result was derived solely from the base multi-sampling pattern in effect.</p><p> Image quality has also improved in the specific case of transparency multi-sampling. 
Any titles using the older alpha test method to render transparent textures have their shader code automatically converted to use the alpha-to-coverage technique, which should greatly improve image quality, especially in heavily aliased areas.</p><p> The upshot of this is higher quality edges, and higher quality alpha textures.</p><p> </p><p> <strong><span style="color: #99ccff"><u>Anti-Aliasing Performance</u></span></strong></p><p> In addition to improving image quality, anti-aliasing performance has also increased. When it comes to AA, the most obvious area to target is the ROPs, and that’s exactly what nVidia has done. The GF100 has <span style="color: #99ccff">48</span> ROPs, up from <span style="color: #99ccff">32</span> ROPs on the GTX285, which is especially helpful for portions of the scene that cannot be compressed.</p><p> Each ROP is also faster and more efficient than on previous generations, so it can do more work per cycle. This includes improvements made to the compression technology.</p><p style="text-align: center"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/25a.png" alt="" class="fr-fic fr-dii fr-draggable " style="" /></p><p>Aside from better AA performance in general, nVidia’s old Achilles heel with 8xMSAA performance should also be addressed by the improvements. Historically, nVidia architectures have exhibited a much higher relative performance hit when going from 4xMSAA to 8xMSAA than competing ATi architectures.</p><p> Also, by moving to 384-bit GDDR5, nVidia should have access to plenty of memory bandwidth to keep all of those ROPs fed with data.</p><p> </p><p style="text-align: center">–~~~~~~~~~~~~–</p><p><span style="color: #99ccff"><strong><u>Texture Filtering</u></strong></span></p><p> As with anti-aliasing, there have been improvements made to texturing too. 
Interestingly, the GF100 only has <span style="color: #99ccff">64</span> TMUs, which is far fewer than the <span style="color: #99ccff">80</span> TMUs on the GTX285, but nVidia claims overall performance should still be higher because of improvements to per-unit performance and efficiency.</p><p> Texture caching has been substantially improved, with the L1 cache being redesigned for greater efficiency. Also, the presence of a unified L2 cache means the texture cache size is three times larger than on the GT200.</p><p> Layout changes and internal improvements to the texture units also combine with a higher TMU clock speed. On the GT200 the TMUs ran at the GPU’s core clock; on the GF100 they run at a higher clock, which allows them to perform more work in the same amount of time. nVidia’s numbers show 40% to 70% higher texturing performance than the GT200, despite having far fewer TMUs.</p><p style="text-align: center"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/25b.png" alt="" class="fr-fic fr-dii fr-draggable " style="" /></p><p>The GF100’s texture units also offer hardware-accelerated jittered sampling. This essentially means the hardware can offer a form of stochastic filtering by varying the texture sampling on a per-pixel basis. This is done by implementing DirectX 11’s Gather4 in hardware, which allows up to four texels to be fetched from a 128×128 pixel grid with a single instruction.</p><p style="text-align: center"><a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/29.png" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/29-150x150.png" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a></p><p>This not only improves performance with things like ambient occlusion, but it can also improve image quality by removing banding through random sampling. It also allows game developers to implement customized texture filtering more efficiently. 
nVidia states that the GF100’s hardware implementation of this technique offers up to twice the performance of the GT200.</p><p> </p><p style="text-align: center">–~~~~~~~~~~~~–</p><p><span style="color: #99ccff"><strong><u>Compute Architecture</u></strong></span></p><p> The compute engine is designed to handle the GPGPU side of things and encapsulates features such as CUDA, OpenCL, DirectCompute, and PhysX. Many of these have been around since the G80 days, but the GF100 delivers a number of improvements to make such general purpose computing run better.</p><p> The GF100 is designed to handle a wider range of algorithms well, to encourage more use of the GPU for parallel problems. One key area of improvement comes from its better cache system, which allows threads that access the same memory locations to run faster.</p><p> Another key improvement allows the GF100 to execute multiple task kernels at once, and the context switching between such tasks is much faster than on previous GPUs. This differs from the GT200, which could only run one task kernel at a time and had very slow context switching.</p><p> And lastly, high-level features such as debugging and a C++ programming environment to access GPGPU features are made possible with nVidia’s <span style="color: #99ccff">Nexus</span> plug-in for Visual Studio. Such features simplify programming GPGPU tasks, as they help developers work at a higher level than was previously possible.</p><p> </p><p> <span style="color: #99ccff"><strong><u>Ray Tracing</u></strong></span></p><p style="text-align: center"><a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/RT.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/RT_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a></p><p>The GF100 will not be able to do complex ray tracing (RT) in real time in PC games as in the above image. 
However, NVIDIA believes that RT is the future of graphics and they expect some implementation of it in conjunction with rasterization fairly soon as developers begin to take advantage of GF100’s new programming capabilities.</p><p> </p><p style="text-align: center">–~~~~~~~~~~~~–</p><p><span style="color: #99ccff"><strong><u>Conclusion</u></strong></span></p><p> It’s clear that nVidia has invested a lot of resources and design effort into trying to make the GF100 the fastest single GPU to date. In addition to a very clear focus on improving GPGPU performance and usability, numerous enhancements to image quality and performance for gaming purposes have also been made.</p><p> It’ll be very interesting to see how the card performs in actual gaming situations, and more importantly, how it compares to ATi’s current single GPU flagship, the Radeon 5870.</p><p> We are looking forward to bringing our readers the latest news about the Fermi GF100 and we will be testing its performance and image quality in gaming. There is much more to be revealed about NVIDIA’s new GPU. Stay tuned. The graphics wars are heating up and it is getting very interesting again.</p><p style="text-align: center"><a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Turbulence.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Turbulence_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a></p><p></p><p> Article written by <span style="color: #00ff00"><strong>Mark Poppin</strong></span> and <span style="color: #00ff00"><strong>BFG10K</strong></span>, AlienBabelTech Senior Editors.</p><p></p><p></p><p></p><p></p><p>Reference </p><p></p><p> by apoppin on Jan.17, 2010, under ABTnews, Articles, Technology</p></blockquote><p></p>
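For anyone who wants to sanity-check the headline numbers in the quoted article, here is a rough Python sketch that only redoes the arithmetic. The input figures are the ones quoted above; the class and function names are made up for illustration, and the "per-TMU" range at the end is simple division on nVidia's claimed 40% to 70% figure, not a measured result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotedFigures:
    """Numbers taken from the quoted article; names are hypothetical."""
    raster_engines: int = 4
    pixels_per_engine_per_clock: int = 8
    msaa_samples: int = 8          # the 8xMSAA part of the 32x mode
    coverage_samples: int = 24     # the 24xCSAA part of the 32x mode
    gf100_rops: int = 48
    gtx285_rops: int = 32
    gf100_tmus: int = 64
    gtx285_tmus: int = 80


def rasterized_pixels_per_clock(f: QuotedFigures) -> int:
    # Four Raster Engines, eight pixels each per clock.
    return f.raster_engines * f.pixels_per_engine_per_clock


def total_aa_samples(f: QuotedFigures) -> int:
    # 32xAA = 8xMSAA + 24xCSAA
    return f.msaa_samples + f.coverage_samples


def rop_ratio(f: QuotedFigures) -> float:
    return f.gf100_rops / f.gtx285_rops


def implied_per_tmu_speedup(f: QuotedFigures, overall_gain: float) -> float:
    """If overall texturing is `overall_gain` times GT200 with fewer TMUs,
    each TMU would have to deliver this multiple of a GT200 TMU."""
    return overall_gain / (f.gf100_tmus / f.gtx285_tmus)


if __name__ == "__main__":
    f = QuotedFigures()
    print("pixels/clock:", rasterized_pixels_per_clock(f))
    print("AA samples:", total_aa_samples(f))
    print("ROP ratio:", rop_ratio(f))
    for gain in (1.4, 1.7):
        print(f"per-TMU multiple at {gain}x overall:",
              round(implied_per_tmu_speedup(f, gain), 3))
```

So the 64 TMUs would each need to do very roughly 1.75 to 2.1 times the work of a GT200 TMU for the claimed texturing gain to hold, which is at least consistent with the article's mention of a higher TMU clock plus cache improvements.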
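The tessellation and displacement mapping idea from the quoted article can also be shown in a few lines. The sketch below is a plain CPU illustration in Python, not how GF100 or DX11 actually implement it: `tess_factor` is a hypothetical distance-based LOD rule, `tessellate` uniformly splits one triangle into n × n smaller ones, and `displace` pushes every vertex along a normal by a height function. All names, the near/far distances, and the maximum factor are assumptions chosen for the example.

```python
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def tess_factor(distance: float, near: float = 1.0, far: float = 50.0,
                max_factor: int = 16) -> int:
    """Hypothetical LOD rule: max_factor at `near`, 1 at `far` or beyond."""
    t = (far - distance) / (far - near)
    t = max(0.0, min(1.0, t))                  # clamp to [0, 1]
    return max(1, round(1 + t * (max_factor - 1)))


def tessellate(tri: Triangle, n: int) -> List[Triangle]:
    """Uniformly split one triangle into n * n smaller triangles."""
    a, b, c = tri

    def point(i: int, j: int) -> Vec3:
        # Grid point a + (i/n)(b - a) + (j/n)(c - a)
        u, v = i / n, j / n
        return (a[0] + u * (b[0] - a[0]) + v * (c[0] - a[0]),
                a[1] + u * (b[1] - a[1]) + v * (c[1] - a[1]),
                a[2] + u * (b[2] - a[2]) + v * (c[2] - a[2]))

    out: List[Triangle] = []
    for i in range(n):
        for j in range(n - i):
            out.append((point(i, j), point(i + 1, j), point(i, j + 1)))
            if i + j < n - 1:                  # inverted triangle filling the gap
                out.append((point(i + 1, j), point(i + 1, j + 1), point(i, j + 1)))
    return out


def displace(tris: List[Triangle], normal: Vec3,
             height: Callable[[Vec3], float]) -> List[Triangle]:
    """Move each vertex along `normal` by the value of `height` at that vertex."""
    def move(p: Vec3) -> Vec3:
        h = height(p)
        return (p[0] + h * normal[0], p[1] + h * normal[1], p[2] + h * normal[2])

    return [(move(p), move(q), move(r)) for p, q, r in tris]


if __name__ == "__main__":
    flat = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
    for dist in (1.0, 25.0, 100.0):
        n = tess_factor(dist)
        mesh = displace(tessellate(flat, n), (0.0, 0.0, 1.0),
                        lambda p: 0.1 * p[0] * p[1])
        print(f"distance {dist}: factor {n}, {len(mesh)} triangles")
```

The point is just that one coarse model can yield many detail levels on demand: a far-away object stays at a single triangle, while the same input up close becomes hundreds of small displaced triangles.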