
<blockquote data-quote="Core" data-source="post: 6785212" data-attributes="member: 263471">NVIDIA is going to release a DirectX 11 graphics card series! <img src="/styles/default/xenforo/smilies/default/happy.gif"  class="smilie" loading="lazy" alt=":)" title="Happy    :)" data-shortname=":)" /> <a href="http://alienbabeltech.com/main/?p=14600" target="_blank">NVIDIA’s DirectX 11 Architecture: GF100 (Fermi) In Detail</a> Article written by Mark Poppin and BFG10K, AlienBabelTech Senior Editors. Introduction At their GPU Technology Conference (GTC) last September 30th, NVIDIA announced their next-generation graphics architecture, codenamed Fermi. We reported on it for you <a href="http://alienbabeltech.com/main/?p=11661" target="_blank">here</a>, <a href="http://alienbabeltech.com/main/?p=11825" target="_blank">here</a> and <a href="http://alienbabeltech.com/main/?p=11911" target="_blank">here</a> in a three-part series. At the GTC, graphics performance was not the focus of Tesla Fermi. Rather, the conference emphasized NVIDIA’s new architecture as a revolutionary general-purpose processor that exploits the GPU’s fast parallel processing far better than their current architecture does. NVIDIA’s goal is to dominate the professional market with their Tesla GPUs. Now that Fermi GF100 GPUs for NVIDIA’s new video cards are finally in mass production, we will be looking at how NVIDIA intends to dominate gaming. <img src="http://alienbabeltech.com/main/wp-content/uploads/2009/10/FermProdMockup_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" />Fermi Production Mock-up <img src="http://alienbabeltech.com/main/wp-content/uploads/2009/10/RawGPU_ob_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" />Fermi GPU To summarize the new architecture, Fermi boasts a brand new shader core whose compute clusters comprise a single streaming multiprocessor (SM). 
Each stream processor has a fully pipelined integer arithmetic logic unit (ALU) and floating point unit (FPU). Each SM can issue two independent instructions per clock, drawn from two different warps. Each instruction is run by a 16-way SIMD block that handles single-precision fused multiply-add (FMA) instructions. The Fermi memory hierarchy is also new, sporting a unified L2 cache that serves all of the SMs without partitions. In addition, a new unified memory space allows each SM to communicate not only with its own local registers and shared memory, but also with the L2 cache and beyond. The GF100 features a 768 KB unified level-two cache as well as a rather complex cache hierarchy. In addition, many other areas of GPU-compute performance are improved over NVIDIA’s current Tesla architecture GPU, GT200. The GF100 hardware can sustain peak Single Precision (SP) and Double Precision (DP) FMA instruction throughput. Atomic instruction throughput is improved over the current generation, and Fermi is backed by ECC, which is absolutely necessary for GPU computing. This all comes together to support a new type of multi-threading technology which improves the efficiency of the 512 cores working together. The entire Fermi family is compatible with the DirectX 11, OpenGL 3.x and OpenCL 1.x application programming interfaces (APIs). The new chips are finally in mass production using 40nm process technology at TSMC. Let’s go ahead and see what is new and improved with GF100. 
–~~~~~~~~~~~~–GF100 Architecture Let’s look at the diagrams: <a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Architecture_1.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/Architecture_1_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a> <a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/raster2.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/raster2_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a> <a href="http://alienbabeltech.com/main/wp-content/uploads/2010/01/dist_parallel.jpg" target="_blank"><img src="http://alienbabeltech.com/main/wp-content/uploads/2010/01/dist_parallel_thumb.jpg" alt="" class="fr-fic fr-dii fr-draggable " style="" /></a> The first diagram, from NVIDIA’s slides, shows the GF100 block diagram illustrating the Host Interface, the GigaThread Engine, four GPCs, six Memory Controllers, six ROP partitions, and a 768 KB L2 cache. Each GPC contains four PolyMorph engines. The ROP partitions are immediately adjacent to the L2 cache. The second image illustrates how GF100’s graphics architecture is built from a number of hardware blocks called Graphics Processing Clusters (GPCs). A GPC contains a Raster Engine and up to four SMs. The third image illustrates how it all works together. First, CPU commands are read by the GPU via the Host Interface. In turn, the GigaThread Engine fetches data from system memory and copies it to the framebuffer. GF100 implements six 64-bit GDDR5 memory controllers for a 384-bit total, which facilitates high-bandwidth access to the framebuffer. The GigaThread Engine then creates and dispatches thread blocks to various SMs. Individual SMs in turn schedule warps (groups of 32 threads) to CUDA cores and to the other execution units. The GigaThread Engine also redistributes work to the SMs when work expansion occurs in the graphics pipeline. 
In the first image, the rectangular structures are SMs, or as NVIDIA calls them, streaming multiprocessors, of which Fermi has sixteen. NVIDIA calls the green squares inside each SM “CUDA cores”. These CUDA cores are the chip’s most fundamental execution resource, which helps to determine the chip’s total processing power and ultimately its performance. The GT200 has 240 and Fermi has 512. The memory interfaces are 64-bit. This means that Fermi’s total path to memory is 384 bits wide. This is in contrast to the wider 512-bit pathway on the GT200. However, Fermi compensates by delivering almost twice the bandwidth per pin due to its support for GDDR5 memory; GT200 used GDDR3 memory. To summarize, Fermi GF100 has: <ul> <li data-xf-list-type="ul">512 CUDA cores</li> <li data-xf-list-type="ul">16 Geometry Units</li> <li data-xf-list-type="ul">4 raster units</li> <li data-xf-list-type="ul">64 texture units</li> <li data-xf-list-type="ul">48 ROP units</li> <li data-xf-list-type="ul">384-bit GDDR5</li> </ul>NVIDIA’s current generation product, the GT200, of which GTX 285 is the single-GPU flagship, was able to improve on the original G80 design as represented by the 8800 GTX. By refining G80’s architecture, NVIDIA made it more programmable by adding double precision (DP) support and atomic operations. GT200 managed all of this while still holding the single-GPU performance crown until nearly five months ago, when AMD/ATI’s Radeon 5870 launched. Their competitor has the first DX11 chip, built with incremental changes over its last generation that resulted in significant performance improvements over the HD 4800 series. So now NVIDIA has announced their Fermi GF100 next-generation DX11 architecture, which aims for even greater performance and is also more programmable and software friendly. There is no “GT300”. 
Until now, NVIDIA has chosen to discuss primarily the Fermi Tesla GPU computing architecture and not to disclose the microarchitecture or, especially, game-related performance details of GF100. The biggest changes in the GF100 architecture show us that the geometry pipeline has been significantly revamped, with improved performance in geometry shading, stream out, and culling. Fillrate has also been improved, which enables multiple displays to be driven simultaneously by GF100 SLI, much like AMD’s Eyefinity, but now additionally in 3D and at 120 Hz. From studying the second image, we can see that the GPC is GF100’s dominant high-level hardware block. It features two key innovations: a scalable Raster Engine for triangle setup, rasterization, and z-cull, and a scalable PolyMorph Engine for vertex attribute fetch and tessellation. The Raster Engine resides in the GPC, whereas the PolyMorph Engine resides in the SM. On earlier NVIDIA GPUs, SMs and Texture Units were grouped together in hardware blocks called Texture Processing Clusters (TPCs). On GF100, each SM has four dedicated Texture Units. As we look deeper, we can see that Fermi’s tessellation engine is impressive. It is not something just “tacked on” to GT200. NVIDIA saw early on that if they only made incremental changes to GT200, they would run into severe bottlenecks. Simply adding tessellation to GT200 would lead to intolerable geometry bottlenecks. They tell us that this is what took them so long: they had to design a better balanced new chip architecture that could also have better sequential rendering semantics built into its engine. –~~~~~~~~~~~~–</blockquote>
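The unit counts in the quoted summary all follow from a handful of per-block figures the article gives (four GPCs, up to four SMs per GPC, four texture units per SM, six 64-bit memory controllers). Here is a rough Python sketch of that arithmetic. The constant and function names are my own, one PolyMorph Engine per SM is inferred from "four per GPC" with four SMs per GPC, and the per-SM core count and per-partition ROP count are derived by division rather than quoted from the article.

```python
# Figures quoted in the article (names are illustrative, not NVIDIA's).
GPCS = 4                      # Graphics Processing Clusters
SMS_PER_GPC = 4               # "a Raster Engine and up to four SMs"
RASTER_ENGINES_PER_GPC = 1
POLYMORPH_ENGINES_PER_SM = 1  # inferred: four per GPC, four SMs per GPC
TEXTURE_UNITS_PER_SM = 4
MEMORY_CONTROLLERS = 6
BITS_PER_CONTROLLER = 64
ROP_PARTITIONS = 6
TOTAL_ROPS = 48
TOTAL_CUDA_CORES = 512
WARP_SIZE = 32                # "warps (groups of 32 threads)"


def gf100_totals():
    """Derive chip-wide totals from the per-block figures above."""
    sms = GPCS * SMS_PER_GPC
    return {
        "sms": sms,
        "cuda_cores_per_sm": TOTAL_CUDA_CORES // sms,         # derived
        "geometry_units": sms * POLYMORPH_ENGINES_PER_SM,
        "raster_units": GPCS * RASTER_ENGINES_PER_GPC,
        "texture_units": sms * TEXTURE_UNITS_PER_SM,
        "rops_per_partition": TOTAL_ROPS // ROP_PARTITIONS,   # derived
        "memory_bus_bits": MEMORY_CONTROLLERS * BITS_PER_CONTROLLER,
    }


def warps_per_block(threads_per_block):
    """Number of 32-thread warps needed to cover one thread block."""
    return -(-threads_per_block // WARP_SIZE)  # ceiling division


if __name__ == "__main__":
    for name, value in gf100_totals().items():
        print(f"{name}: {value}")
    # 256 is just an example block size, not a figure from the article.
    print("warps in a 256-thread block:", warps_per_block(256))
```

The totals agree with the summary list in the quote (16 geometry units, 4 raster units, 64 texture units, 384-bit bus), which is a handy sanity check that the block diagram and the bullet list describe the same chip.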

