A Good News .!!!
<blockquote data-quote="Core" data-source="post: 6785212" data-attributes="member: 263471"><p><strong>NVIDIA IS GOING TO RELEASE A DIRECTX 11 GRAPHICS CARD SERIES!!! <img src="/styles/default/xenforo/smilies/default/happy.gif" class="smilie" loading="lazy" alt=":)" title="Happy :)" data-shortname=":)" /></strong></p><p><strong><a href="http://alienbabeltech.com/main/?p=14600" target="_blank">NVIDIA’s DirectX 11 Architecture: GF100 (Fermi) In Detail</a></strong></p><p>Article written by <span style="color: #00ff00"><strong>Mark Poppin</strong></span> and <span style="color: #00ff00"><strong>BFG10K</strong></span>, AlienBabelTech Senior Editors.</p><p> <strong><span style="color: #99ccff"><u>Introduction</u></span></strong></p><p> At their GPU Technology Conference (GTC) last September 30th, NVIDIA announced their next-generation graphics architecture, codenamed Fermi. We reported on it for you <a href="http://alienbabeltech.com/main/?p=11661" target="_blank">here</a>, <a href="http://alienbabeltech.com/main/?p=11825" target="_blank">here</a> and <a href="http://alienbabeltech.com/main/?p=11911" target="_blank">here</a> in a three-part series. At the GTC, graphics performance was not the focus of Tesla Fermi. Rather, the conference emphasized NVIDIA’s new architecture as a revolutionary <em>G</em>eneral <em>P</em>urpose <em>P</em>rocessor that exploits the GPU’s fast parallel processing far more fully than their current architecture does. NVIDIA’s goal is to dominate the professional market with their Tesla GPUs. 
Now that Fermi GF100 GPUs for NVIDIA’s new video cards are finally in mass production, we will be looking at how NVIDIA intends to dominate gaming.</p><p> <img src="http://alienbabeltech.com/main/wp-content/uploads/2009/10/FermProdMockup_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" />Fermi Production Mock-up</p><p> <img src="http://alienbabeltech.com/main/wp-content/uploads/2009/10/RawGPU_ob_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" />Fermi GPU</p><p> To summarize the new architecture, Fermi boasts a brand new shader core in which each compute cluster comprises a single streaming multiprocessor (SM). Each stream processor has a fully-pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). Each SM can dual-issue two independent instructions per clock to two different warps. Each instruction is run by a 16-way SIMD block that handles single-precision fused multiply-add (FMA) instructions. The Fermi memory hierarchy is also new, sporting a unified L2 cache that serves all of the SMs without partitions. In addition, a new unified memory space allows each SM to communicate not only with its own local registers and shared memory, but also with the L2 cache and beyond.</p><p> The GF100 features a 768 KB unified level-two cache as well as a rather complex cache hierarchy. In addition, many other areas of GPU-compute performance are improved over GT200, NVIDIA’s current Tesla architecture GPU. The GF100 hardware can sustain peak Single Precision (SP) and Double Precision (DP) FMA instruction throughput. Atomic instruction throughput is improved over the current generation, and Fermi is backed by ECC, which is absolutely necessary for GPU computing. This all comes together to support a new type of multi-threading technology which improves the efficiency of the 512 cores working together. The entire Fermi family is compatible with the DirectX 11, OpenGL 3.x and OpenCL 1.x application programming interfaces (APIs). 
The new chips are finally in mass production using 40 nm process technology at TSMC.</p><p> Let’s go ahead and see what is new and improved with GF100.</p><p style="text-align: center">–~~~~~~~~~~~~–</p><p><strong><span style="color: #99ccff"><u>GF100 Architecture</u></span></strong></p><p> Let’s look at the diagrams:</p><p> <a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Architecture_1.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Architecture_1_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a> <a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/raster2.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/raster2_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a> <a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/dist_parallel.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/dist_parallel_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a></p><p> <em><span style="color: #99ccff">The first diagram, from NVIDIA’s slides, is the GF100 block diagram, showing the Host Interface, the GigaThread Engine, four GPCs, six Memory Controllers, six ROP partitions, and a 768 KB L2 cache. Each GPC contains four PolyMorph engines. The ROP partitions are immediately adjacent to the L2 cache. The second image illustrates how GF100’s graphics architecture is built from a number of hardware blocks called Graphics Processing Clusters (GPCs). A GPC contains a Raster Engine and up to four SMs. The third image illustrates how it all works together.</span></em></p><p> First, CPU commands are read by the GPU via the Host Interface. In turn, the GigaThread Engine fetches data from the system memory and copies it to the framebuffer. 
GF100 implements six 64-bit GDDR5 memory controllers, for a 384-bit total, which facilitates high-bandwidth access to the framebuffer. The GigaThread Engine then creates and dispatches thread blocks to various SMs. Individual SMs in turn schedule warps (groups of 32 threads) to CUDA cores and to the other execution units. The GigaThread Engine also redistributes work to the SMs when work expansion occurs in the graphics pipeline.</p><p> In the first image, the rectangular structures are SMs, or as NVIDIA calls them, streaming multiprocessors, of which Fermi has sixteen. NVIDIA calls the green squares inside each SM “CUDA cores”. These CUDA cores make up the chip’s most fundamental execution resource, which helps to determine the chip’s total processing power and ultimately its performance. The GT200 has 240 and Fermi has 512.</p><p> The memory interfaces are 64-bit. With six of them, Fermi’s total path to memory is 384 bits wide. This is in contrast to the wider 512-bit pathway on the GT200. However, Fermi compensates by delivering almost twice the bandwidth per pin due to its support for GDDR5 memory; GT200 used GDDR3 memory.</p><p> To summarize, Fermi GF100 has:</p><ul> <li data-xf-list-type="ul"><span style="color: #99ccff">512 CUDA cores</span></li> <li data-xf-list-type="ul"><span style="color: #99ccff">16 Geometry Units</span></li> <li data-xf-list-type="ul"><span style="color: #99ccff">4 raster units</span></li> <li data-xf-list-type="ul"><span style="color: #99ccff">64 texture units</span></li> <li data-xf-list-type="ul"><span style="color: #99ccff">48 ROP units</span></li> <li data-xf-list-type="ul"><span style="color: #99ccff">384-bit GDDR5</span></li> </ul><p>NVIDIA’s current generation product, the GT200 – of which GTX 285 is the single GPU flagship – was able to improve on the original G80 design as represented by the 8800 GTX. 
By refining G80’s architecture, NVIDIA made it more programmable by adding double precision (DP) support and atomic operations. GT200 managed all of this while still holding on to the performance crown for a single GPU until nearly five months ago, when AMD/ATI’s Radeon 5870 launched. Their competitor has the first DX11 chip, built with incremental changes over its last generation, resulting in significant performance improvements over the HD 4800 series.</p><p> So now NVIDIA has announced their Fermi GF100 next generation DX11 architecture, which aims for even greater performance and is also more programmable and software friendly. There is no “GT300”. Until now, NVIDIA has chosen to primarily discuss the Fermi Tesla GPU computing architecture and not to disclose microarchitecture or especially game-related performance details of GF100.</p><p> The biggest changes in GF100 architecture show us that the geometry pipeline has been significantly revamped, with improved performance in geometry shading, stream out, and culling. Fillrate has also been improved, which enables multiple displays to be driven simultaneously by GF100 SLI, much like AMD’s Eyefinity, but now additionally in 3D and at 120 Hz.</p><p> From studying the second image, we can see that the GPC is GF100’s dominant high-level hardware block. It features two key innovations: a scalable Raster Engine for triangle setup, rasterization, and z-cull, and a scalable PolyMorph Engine for vertex attribute fetch and tessellation. The Raster Engine resides in the GPC, whereas the PolyMorph Engine resides in the SM. On earlier NVIDIA GPUs, SMs and Texture Units were grouped together in hardware blocks called Texture Processing Clusters (TPCs). On GF100, each SM has four dedicated Texture Units.</p><p> As we look deeper, we can see that Fermi’s tessellation engine is impressive. It is not something just “tacked on” to GT200. 
NVIDIA saw early on that making only incremental changes to GT200, such as simply adding tessellation, would lead to intolerable geometry bottlenecks. They tell us that this is what took them so long: they had to design a new, better-balanced chip architecture that could also have better sequential rendering semantics built into its engine.</p><p style="text-align: center">–~~~~~~~~~~~~–</p></blockquote><p></p>
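<p>The unit counts in the quoted summary list can be cross-checked from the per-block figures the article gives. The sketch below is only an illustration, not anything from NVIDIA: the class and function names are made up here, the "32 CUDA cores per SM" and "8 ROPs per partition" values are derived by dividing the article's totals (512 / 16 and 48 / 6) rather than stated in it, and the per-pin data rates in the bandwidth example are placeholder numbers, not real card specifications.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GF100Layout:
    """Per-block counts quoted in (or derived from) the article above."""

    gpcs: int = 4                    # article: four GPCs
    sms_per_gpc: int = 4             # article: up to four SMs per GPC
    cuda_cores_per_sm: int = 32      # derived: 512 cores / 16 SMs
    texture_units_per_sm: int = 4    # article: four texture units per SM
    memory_controllers: int = 6      # article: six memory controllers
    controller_width_bits: int = 64  # article: each controller is 64-bit
    rop_partitions: int = 6          # article: six ROP partitions
    rops_per_partition: int = 8      # derived: 48 ROPs / 6 partitions

    @property
    def sms(self) -> int:
        return self.gpcs * self.sms_per_gpc

    @property
    def cuda_cores(self) -> int:
        return self.sms * self.cuda_cores_per_sm

    @property
    def geometry_units(self) -> int:
        # One PolyMorph Engine per SM (the article places it in the SM).
        return self.sms

    @property
    def raster_units(self) -> int:
        # One Raster Engine per GPC.
        return self.gpcs

    @property
    def texture_units(self) -> int:
        return self.sms * self.texture_units_per_sm

    @property
    def rops(self) -> int:
        return self.rop_partitions * self.rops_per_partition

    @property
    def bus_width_bits(self) -> int:
        return self.memory_controllers * self.controller_width_bits


def peak_bandwidth_gb_s(bus_width_bits: int, gbit_per_s_per_pin: float) -> float:
    """Theoretical peak bandwidth in GB/s: pins * per-pin rate / 8 bits per byte."""
    return bus_width_bits * gbit_per_s_per_pin / 8


gf100 = GF100Layout()
print(gf100.sms, gf100.cuda_cores, gf100.texture_units, gf100.rops, gf100.bus_width_bits)

# Placeholder per-pin rates (assumptions, NOT real specs): they only show how a
# narrower bus with roughly double the per-pin rate can still come out ahead.
HYPOTHETICAL_GDDR3_RATE = 2.0  # Gbit/s per pin
HYPOTHETICAL_GDDR5_RATE = 4.0  # Gbit/s per pin
wide_slow = peak_bandwidth_gb_s(512, HYPOTHETICAL_GDDR3_RATE)
narrow_fast = peak_bandwidth_gb_s(gf100.bus_width_bits, HYPOTHETICAL_GDDR5_RATE)
print(wide_slow, narrow_fast)
```

<p>With these placeholder rates, the 384-bit bus ends up ahead of the 512-bit one, which is the trade-off the article describes when it says GDDR5 compensates for the narrower path to memory.</p>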