A long-time implementer of OpenGL (Mark Kilgard, NVIDIA) and the system's original architect (Kurt Akeley, Microsoft) explain OpenGL's design and evolution. OpenGL's state machine is now a complex data-flow with multiple programmable stages. OpenGL practitioners can expect candid design explanations, advice for programming modern GPUs, and insight into OpenGL's future.
These slides were presented at SIGGRAPH Asia 2008 for the "Modern OpenGL: Its Design and Evolution" course.
Course abstract: OpenGL was conceived in 1991 to provide an industry standard for programming the hardware graphics pipeline. The original design has evolved considerably over the last 17 years. Whereas capabilities mandated by OpenGL such as texture mapping and a stencil buffer were present only on the world's most expensive graphics hardware back in 1991, these features are now completely pervasive in PCs and even available in several hand-held devices. Over that time, OpenGL's original fixed-function state machine has evolved into a complex data-flow including several application-programmable stages. And the performance of OpenGL has increased 100x to over 1,000x in many important raw graphics operations.
In this course, a long-time implementer of OpenGL and the system's original architect explain OpenGL's design and evolution.
You will learn how the modern (post-2006) graphics hardware pipeline is exposed through OpenGL. You will hear Kurt Akeley's personal retrospective on OpenGL's development. You will learn nine ways to write better OpenGL programs. You will learn how modern OpenGL implementations operate. Finally we discuss OpenGL's future evolution.
Whether you program with OpenGL or program with another API such as Direct3D, this course will give you new insights into graphics hardware architecture, programmable shading, and how to best take advantage of modern GPUs.
The document discusses approaches for reducing driver overhead in OpenGL applications. It introduces several OpenGL APIs that can be used to achieve this, including persistent mapped buffers for dynamic geometry, multi-draw indirect for batching draw calls, and packing 2D textures into arrays. Speakers then provide details on implementing these techniques and the performance improvements they provide, such as reducing overhead by 5-10x and allowing an order of magnitude more unique objects per frame. Bindless textures and sparse textures are also covered as advanced methods for further optimizing texture handling and memory usage.
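The multi-draw indirect technique mentioned above works by packing many draw commands into one GPU-visible buffer, so a single API call replaces thousands. As a sketch (not the presenters' code), the layout below follows OpenGL's documented `DrawElementsIndirectCommand` structure; the mesh values are made up for illustration:

```python
import struct

# Layout of OpenGL's DrawElementsIndirectCommand (used by glMultiDrawElementsIndirect):
# uint count, uint instanceCount, uint firstIndex, int baseVertex, uint baseInstance
CMD = struct.Struct("<IIIiI")

def pack_draws(draws):
    """Pack (count, instance_count, first_index, base_vertex, base_instance)
    tuples into one bytes blob suitable for uploading to an indirect draw buffer."""
    return b"".join(CMD.pack(*d) for d in draws)

# Two hypothetical sub-meshes batched into a single multi-draw call.
blob = pack_draws([
    (36, 1, 0, 0, 0),    # first mesh: 36 indices starting at the buffer's front
    (90, 1, 36, 24, 1),  # second mesh: index/vertex offsets follow the first
])
assert len(blob) == 2 * CMD.size  # two commands, 20 bytes each
```

In a real application this blob would be uploaded to a `GL_DRAW_INDIRECT_BUFFER` and issued with one `glMultiDrawElementsIndirect` call, which is where the order-of-magnitude reduction in per-draw CPU cost comes from.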
Killzone Shadow Fall: Threading the Entity Update on PS4 (jrouwe)
This document discusses Guerrilla Games' approach to multi-threading entity updates in Killzone Shadow Fall on the PlayStation 4. It covers defining entities and their dependencies, a job-based system where one entity update is one job, scheduling algorithms to balance jobs across frames, and performance results showing a 4x speedup versus the PlayStation 3 version. Debug tools like yEd are used to visualize complex dependency graphs. The system allows most of the game to be programmed sequentially while exploiting multi-core performance on PS4.
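The core of a one-entity-one-job system is ordering jobs so each runs only after its dependencies. The sketch below (an illustrative topological sort, not Guerrilla's scheduler, which dispatches ready jobs to worker threads) shows the idea with hypothetical entity names:

```python
from collections import deque

def schedule_jobs(deps):
    """Order entity-update jobs so every job runs after its dependencies.
    deps maps job -> set of jobs it depends on (must form a DAG).
    Returns a valid serial run order; a real engine would instead hand
    each job to a worker thread as soon as its dependency count hits zero."""
    indegree = {j: len(d) for j, d in deps.items()}
    dependents = {j: [] for j in deps}
    for job, d in deps.items():
        for parent in d:
            dependents[parent].append(job)
    ready = deque(j for j, n in indegree.items() if n == 0)
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for child in dependents[job]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order
```

This is also the structure a tool like yEd visualizes: entities are nodes and dependency edges determine which updates may run concurrently.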
OpenGL NVIDIA Command-List: Approaching Zero Driver Overhead (Tristan Lorach)
This presentation introduces a new NVIDIA extension called Command-list.
The purpose of this presentation is to explain the basic concepts of how to use it and to show its benefits.
The sample I used for the talk is here: https://github.com/nvpro-samples/gl_commandlist_bk3d_models
To try it, use the pre-release driver 347.09:
http://www.nvidia.com/download/driverResults.aspx/80913/en-us
This document summarizes Mark Kilgard's presentation on NVIDIA's OpenGL support in 2017. It discusses key points including the announcement of OpenGL 4.6 with SPIR-V support, NVIDIA's OpenGL driver updates, and recent advancements in OpenGL such as new extensions in 2014-2016 and the introduction of OpenGL 4.6 which bundles several new extensions. It also provides an overview of NVIDIA's leverage of the OpenGL codebase and shading compiler across multiple APIs.
Optimizing the Graphics Pipeline with Compute, GDC 2016 (Graham Wihlidal)
With further advancement in the current console cycle, new tricks are being learned to squeeze the maximum performance out of the hardware. This talk will present how the compute power of the console and PC GPUs can be used to improve the triangle throughput beyond the limits of the fixed function hardware. The discussed method shows a way to perform efficient "just-in-time" optimization of geometry, and opens the way for per-primitive filtering kernels and procedural geometry processing.
Takeaway:
Attendees will learn how to preprocess geometry on-the-fly per frame to improve rendering performance and efficiency.
Intended Audience:
This presentation is targeting seasoned graphics developers. Experience with DirectX 12 and GCN is recommended, but not required.
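One of the per-primitive filtering kernels this kind of compute preprocessing enables is backface culling before the fixed-function hardware ever sees the triangles. As a minimal sketch (screen-space winding test only; the talk's GPU version also filters degenerate, zero-coverage, and occluded triangles):

```python
def filter_backfaces(positions, indices):
    """Cull back-facing triangles from an index list before drawing.
    positions: screen-space (x, y) per vertex; indices: flat triangle index list.
    Keeps triangles with counter-clockwise winding (positive signed area)."""
    kept = []
    for t in range(0, len(indices), 3):
        i0, i1, i2 = indices[t:t + 3]
        (x0, y0), (x1, y1), (x2, y2) = positions[i0], positions[i1], positions[i2]
        # Twice the signed area of the triangle; sign gives the winding.
        area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
        if area2 > 0:  # front-facing (CCW); also drops zero-area triangles
            kept.extend((i0, i1, i2))
    return kept
```

On the GPU this runs as a compute shader writing a compacted index buffer, which the subsequent draw consumes, raising effective triangle throughput past the fixed-function culling rate.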
presented at SIGGRAPH 2014 in Vancouver during NVIDIA's "Best of GTC" sponsored sessions
http://www.nvidia.com/object/siggraph2014-best-gtc.html
Watch the replay that includes a demo of GPU-accelerated Illustrator and several OpenGL 4 demos running on NVIDIA's Tegra Shield tablet.
http://www.ustream.tv/recorded/51255959
Find out more about the OpenGL examples for GameWorks:
https://developer.nvidia.com/gameworks-opengl-samples
Siggraph2016 - The Devil is in the Details: idTech 666 (Tiago Sousa)
A behind-the-scenes look into the latest renderer technology powering the critically acclaimed DOOM. The lecture covers how the technology was designed to balance visual quality against performance. Numerous topics are covered, among them details of the lighting solution, techniques for decoupling the frequency of shading costs, and GCN-specific approaches.
With the highest-quality video options, Battlefield 3 renders its Screen-Space Ambient Occlusion (SSAO) using the Horizon-Based Ambient Occlusion (HBAO) algorithm. For performance reasons, the HBAO is rendered in half resolution using half-resolution input depths. The HBAO is then blurred in full resolution using a depth-aware blur. The main issue with such low-resolution SSAO rendering is that it produces objectionable flickering for thin objects (such as alpha-tested foliage) when the camera and/or the geometry are moving. After a brief recap of the original HBAO pipeline, this talk describes a novel temporal filtering algorithm that fixed the HBAO flickering problem in Battlefield 3 with a 1-2% performance hit in 1920x1200 on PC (DX10 or DX11). The talk includes algorithm and implementation details on the temporal filtering part, as well as generic optimizations for SSAO blur pixel shaders. This is a joint work between Louis Bavoil (NVIDIA) and Johan Andersson (DICE).
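The essence of such a temporal filter is an exponential blend with the previous frame's result, with the history rejected wherever depth indicates a disocclusion. The sketch below is an illustrative simplification (flat lists stand in for full-screen buffers, and camera reprojection is omitted), not the exact Bavoil/Andersson algorithm:

```python
def temporal_filter(ao, history, depth, history_depth, depth_eps=0.02, blend=0.9):
    """Per-pixel temporal filtering sketch for flickery half-res SSAO.
    Blend the new AO value with the history buffer, but drop the history where
    depth changed too much relative to the current depth (disocclusion), so
    moving geometry does not ghost. All arguments are per-pixel lists."""
    out = []
    for a, h, d, hd in zip(ao, history, depth, history_depth):
        if abs(d - hd) > depth_eps * d:
            out.append(a)                      # stale history: use the new value
        else:
            out.append(blend * h + (1.0 - blend) * a)  # stable: exponential average
    return out
```

The high `blend` weight is what suppresses frame-to-frame flicker on thin geometry such as alpha-tested foliage; the depth test keeps that stability from turning into visible trails.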
NVIDIA OpenGL and Vulkan Support for 2017 (Mark Kilgard)
Learn how NVIDIA continues improving both Vulkan and OpenGL for cross-platform graphics and compute development. This high-level talk is intended for anyone wanting to understand the state of Vulkan and OpenGL in 2017 on NVIDIA GPUs. For OpenGL, the latest standard update maintains the compatibility and feature-richness you expect. For Vulkan, NVIDIA has enabled the latest NVIDIA GPU hardware features and now provides explicit support for multiple GPUs. And for either API, NVIDIA's SDKs and Nsight tools help you develop and debug your application faster.
NVIDIA booth theater presentation at SIGGRAPH in Los Angeles, August 1, 2017.
http://www.nvidia.com/object/siggraph2017-schedule.html?id=sig1732
Get your SIGGRAPH driver release with OpenGL 4.6 and the latest Vulkan functionality from
https://developer.nvidia.com/opengl-driver
Unreal Summit 2016 Seoul: Lighting the Planetary World of Project A1 (Ki Hyunwoo)
The document summarizes a presentation about lighting techniques for a spherical planet in the game Project A1. It discusses using deferred cubic irradiance caching for global illumination that varies based on 12 time spans. Reflection probes are relit based on time of day instead of pre-capturing. Directional lighting and shadows change according to longitude. Sky lighting and bent normals are stored in cubemaps.
Ever wondered how to use modern OpenGL in a way that radically reduces driver overhead? Then this talk is for you.
John McDonald and Cass Everitt gave this talk at Steam Dev Days in Seattle on Jan 16, 2014.
OpenGL 4.4 provides new features for accelerating scenes with many objects, which are typically found in professional visualization markets. This talk will provide details on the usage of the features and their effect on real-life models. Furthermore we will showcase how more work for rendering a scene can be off-loaded to the GPU, such as efficient occlusion culling or matrix calculations.
Video presentation here: http://on-demand.gputechconf.com/gtc/2014/video/S4379-opengl-44-scene-rendering-techniques.mp4
The document provides tips for optimizing Unity games to improve CPU and memory performance. It discusses optimizing transforms by collecting updates and using SetPositionAndRotation, optimizing animators by reordering data and removing extra transforms, optimizing physics by reducing collider complexity and raycast distance, and optimizing memory by reducing texture sizes, removing duplicate assets, and marking textures as non-readable.
Talk by Yuriy O’Donnell at GDC 2017.
This talk describes how Frostbite handles rendering architecture challenges that come with having to support a wide variety of games on a single engine. Yuriy describes their new rendering abstraction design, which is based on a graph of all render passes and resources. This approach allows implementation of rendering features in a decoupled and modular way, while still maintaining efficiency.
A graph of all rendering operations for the entire frame is a useful abstraction. The industry can move away from “immediate mode” DX11 style APIs to a higher level system that allows simpler code and efficient GPU utilization. Attendees will learn how it worked out for Frostbite.
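The render-graph idea above can be condensed to a few lines: passes declare what they read and write, and the compiler walks backward from the frame's final output, culling passes nothing depends on and ordering the rest. A toy sketch (illustrative pass and resource names, not Frostbite's API):

```python
def compile_graph(passes, final_output):
    """Tiny render-graph compiler sketch. passes: (name, reads, writes) tuples.
    Walks back from final_output, visiting only passes whose results are
    actually consumed (dead passes are culled), and returns them in
    dependency order."""
    writer = {}
    for name, reads, writes in passes:
        for resource in writes:
            writer[resource] = (name, reads)
    order, visited = [], set()

    def visit(resource):
        if resource not in writer:
            return  # external/imported resource, nothing to schedule
        name, reads = writer[resource]
        if name in visited:
            return
        visited.add(name)
        for r in reads:      # schedule producers first
            visit(r)
        order.append(name)

    visit(final_output)
    return order

passes = [
    ("gbuffer",  [],                    ["gbuf"]),
    ("shadows",  [],                    ["shadowmap"]),
    ("lighting", ["gbuf", "shadowmap"], ["hdr"]),
    ("debug",    ["gbuf"],              ["overlay"]),   # unused output: culled
    ("post",     ["hdr"],               ["backbuffer"]),
]
```

Because the full frame is known before execution, the same graph can also drive transient resource aliasing and barrier placement, which is where the efficiency claims come from.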
Parallel Graphics in Frostbite - Current & Future (SIGGRAPH 2009) (repii)
1. The document discusses parallel graphics techniques used in the Frostbite game engine, both currently and potentially in the future. It describes using job-based parallelism to utilize multiple CPU cores and the PS3 SPUs.
2. One technique is parallel command buffer recording to dispatch draw calls to multiple command buffers and scale linearly with core count. Another is software occlusion culling using the SPUs/CPU to rasterize a coarse z-buffer.
3. Potential future techniques discussed include deferred shading using compute shaders, with the compute shader culling lights and accumulating lighting per screen-space tile.
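The light-culling step of that tiled deferred approach amounts to binning each light into the screen tiles its projected bounds touch, so each tile shades only nearby lights. A CPU-side sketch of the binning (the talk's version runs in a compute shader with depth-based tile frustums, which this omits):

```python
def cull_lights_per_tile(lights, width, height, tile=16):
    """Bin point lights into tile x tile pixel screen tiles by their projected
    bounding circle. lights: (center_x, center_y, radius) in pixels.
    Returns {(tile_x, tile_y): [light indices]} for non-empty tiles only."""
    tiles = {}
    for i, (cx, cy, r) in enumerate(lights):
        x0 = max(0, int((cx - r) // tile))
        x1 = min((width - 1) // tile, int((cx + r) // tile))
        y0 = max(0, int((cy - r) // tile))
        y1 = min((height - 1) // tile, int((cy + r) // tile))
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tiles.setdefault((tx, ty), []).append(i)
    return tiles
```

Accumulating lighting per tile from these short lists is what lets hundreds of lights be applied without one full-screen pass per light.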
A description of the next-gen rendering technique called the Triangle Visibility Buffer. It supports up to 10x-20x more geometry than deferred rendering, at much higher resolution. It also generally aligns better with the memory access patterns of modern GPUs than deferred lighting approaches such as clustered deferred lighting.
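The key trick is that the geometry pass stores only a small ID per pixel instead of fat G-buffer attributes; shading later fetches what it needs per triangle. A toy resolve pass (made-up material table, with real implementations fetching vertex data and recomputing barycentrics per pixel):

```python
def shade_from_visibility_buffer(vis, materials, tri_material):
    """Visibility-buffer resolve sketch: vis is a flat per-pixel list of
    triangle IDs (None for background). 'Shading' here is just a lookup of
    the triangle's material color; tri_material maps triangle -> material."""
    clear_color = (0, 0, 0)
    return [materials[tri_material[t]] if t is not None else clear_color
            for t in vis]
```

Because the per-pixel storage is tiny, bandwidth stays low even at high resolution, which is where the geometry and resolution headroom comes from.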
In the session from Game Developers Conference 2011, we'll take a complete look at the terrain system in Frostbite 2 as it was applied in Battlefield 3. The session is partitioned into three parts. We begin with the scalability aspects and discuss how consistent use of hierarchies allowed us to combine high resolutions with high view distances. We then turn towards workflow aspects and describe how we achieved full in-game realtime editing. A fair amount of time is spent describing how issues were addressed.
Finally, we look at the runtime side. We describe usage of CPU, GPU and memory resources and how it was kept to a minimum. We discuss how the GPU is offloaded by caching intermediate results in a procedural virtual texture and how prioritization was done to allow for work throttling without sacrificing quality. We also go into depth about the flexible streaming system that works with both FPS and driving games.
This document provides recommendations for optimizing DirectX 11 performance. It separates the graphics pipeline process into offline and runtime stages. For the offline stage, it recommends creating resources like buffers, textures and shaders on multiple threads. For the runtime stage, it suggests culling unused objects, minimizing state changes, and pushing commands to the driver quickly. It also provides tips for updating dynamic resources efficiently and grouping related constants together. The goal is to keep the CPU and GPU pipelines fully utilized for maximum performance.
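Minimizing state changes usually means sorting draw calls by a state key so identical state runs back-to-back. A sketch of the idea with a hypothetical (shader, texture) key, plus a counter to show the effect:

```python
def sort_and_count(draws):
    """Sort draw calls by a state key so the driver sees fewer state changes.
    draws: (shader, texture, mesh) tuples; (shader, texture) is the state key.
    Returns (sorted draws, state changes before, state changes after)."""
    def count_changes(seq):
        n, prev = 0, None
        for d in seq:
            key = d[:2]
            if key != prev:
                n += 1
                prev = key
        return n

    ordered = sorted(draws, key=lambda d: d[:2])
    return ordered, count_changes(draws), count_changes(ordered)
```

Real engines fold more state into the key (blend mode, depth state, constant buffers) and may weight it by switch cost, but the sort-then-submit structure is the same.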
In this AMD technology presentation from the 2014 Game Developers Conference in San Francisco (March 17-21), Bill explains some of the ways the vertex shader can be used to improve performance by taking a fast path through the vertex shader rather than generating vertices with other parts of the pipeline. Check out more technical presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Rendering Technologies from Crysis 3 (GDC 2013) (Tiago Sousa)
This talk covers changes in CryENGINE 3 technology during 2012, with DX11 related topics such as moving to deferred rendering while maintaining backward compatibility on a multiplatform engine, massive vegetation rendering, MSAA support and how to deal with its common visual artifacts, among other topics.
Scene Graphs & Component Based Game Engines (Bryan Duggan)
A presentation I made at the Fermented Poly meetup in Dublin about Scene Graphs & Component Based Game Engines. Lots of examples from my own game engine BGE - where almost everything is a component. Get the code and the course notes here: https://github.com/skooter500/BGE
The goal of this session is to demonstrate techniques that improve GPU scalability when rendering complex scenes. This is achieved through a modular design that separates the scene graph representation from the rendering backend. We will explain how the modules in this pipeline are designed and give insights into implementation details, which leverage the GPU's compute capabilities for scene graph processing. Our modules cover topics such as shader generation for improved parameter management, synchronizing updates between scenegraph and rendering backend, as well as efficient data structures inside the renderer.
Video here: http://on-demand.gputechconf.com/gtc/2013/video/S3032-Advanced-Scenegraph-Rendering-Pipeline.mp4
Mrinmoy Dalal presents an introduction to computer graphics and OpenGL. OpenGL is a cross-platform API for rendering 2D and 3D graphics. It interfaces with the graphics hardware of various platforms and has good support for modern graphics hardware. OpenGL uses a graphics pipeline that processes vertex and fragment data to render 3D scenes to the framebuffer.
Windows IOCP vs Linux EPOLL Performance Comparison (Seungmo Koo)
1. The document compares the performance of IOCP and EPOLL for network I/O handling on Windows and Linux servers.
2. Testing showed that throughput was similar between IOCP and EPOLL, but IOCP had lower overall CPU usage without RSS/multi-queue enabled.
3. With RSS/multi-queue enabled on the NIC, CPU usage was nearly identical between IOCP and EPOLL.
The document summarizes a lecture on texture mapping in computer graphics. It discusses topics like texture mapping fundamentals, texture coordinates, texture filtering including mipmapping and anisotropic filtering, wrap modes, cube maps, and texture formats. It also provides examples of texture mapping in games and an overview of the texture sampling process in the graphics pipeline.
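Mipmap level selection, one of the filtering topics above, follows a standard rule: measure the texture-space footprint of one pixel and take its log2. A sketch following the usual GL-style formulation (per-pixel derivatives of normalized texture coordinates are assumed given; hardware computes them from 2x2 pixel quads):

```python
import math

def mip_level(du_dx, dv_dx, du_dy, dv_dy, tex_w, tex_h, num_levels):
    """Select a mipmap level from screen-space texture-coordinate derivatives.
    Converts the normalized (u, v) derivatives to texels, takes the larger of
    the x and y footprints, and uses its log2, clamped to the available levels."""
    px = math.hypot(du_dx * tex_w, dv_dx * tex_h)  # footprint along screen x
    py = math.hypot(du_dy * tex_w, dv_dy * tex_h)  # footprint along screen y
    rho = max(px, py)
    lam = math.log2(rho) if rho > 0 else 0.0
    return min(max(lam, 0.0), num_levels - 1)
```

Anisotropic filtering refines this by taking several samples along the longer footprint axis instead of simply using the larger of the two, which is why it sharpens surfaces viewed at grazing angles.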
This document contains C++ code for drawing lines to create letters and names using OpenGL graphics functions. It defines a function called "garis" that clears the screen, sets line width, and draws multiple lines with different vertices to render the letters "D I T A" in 3D space. The main function initializes the GLUT window library, sets display and color parameters, and calls the garis function to render the name within a main loop.
Anthony de Mello (1931-1987) was a Jesuit priest who, through his many books and spiritual conferences, is known throughout the world.
In this bestseller, Awareness, he blends Christian spirituality with Buddhist wisdom and psychological insight, arriving at a beautiful synthesis.
In short chapters he explains that it is time, instead of leading an ever busier and more hurried life, to become aware of the silence within us. This happens, he argues, only when we become aware of our most repressed and darkest thoughts. We must recognize and accept them, but not let them control us. This makes room for awareness (silence), through which we can change.
When we come to see that this awareness is present in each of us, it is the key to a livelier, more challenging and fuller life.
We can then be more open to our fellow human beings and see their needs and potential.
This is a masterful spiritual book that challenges us to become aware in every aspect of our lives.
This is a basic framework for a pitch deck, usually for presenting a business plan. The template combines the best frameworks from around the web and uses them to build a custom plan that suits any startup.
This document provides an overview of troubleshooting storage performance issues in vSphere environments. It discusses using vCenter performance charts and ESXTop to analyze latency and I/O statistics at the storage path, disk, and LUN level. The document also covers topics like disk alignment, considerations for using SCSI versus SATA disks, identifying APD issues, multipathing, and how VMware uses SCSI reservations for metadata locking on shared VMFS datastores.
Texture mapping is a process that maps a 2D texture image onto a 3D object's surface. This allows the 3D object to take on the visual characteristics of the 2D texture. The document discusses key aspects of texture mapping like how textures are represented as arrays of texels, how texture coordinates are assigned to map textures onto object surfaces, and techniques like mipmapping, filtering and wrapping that are used to render textures properly at different distances and orientations. OpenGL functions like glTexImage2D and glTexCoord are used to specify textures and texture coordinates for 3D rendering with texture mapping.
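The wrap modes mentioned above determine what happens when a texture coordinate falls outside the texture. A sketch of the three common behaviors for an integer texel coordinate, mirroring GL_REPEAT, GL_CLAMP_TO_EDGE, and GL_MIRRORED_REPEAT:

```python
def wrap(coord, size, mode):
    """Map an integer texel coordinate into [0, size) per wrap mode:
    'repeat' tiles the texture, 'clamp' clamps to the edge texel,
    'mirror' reflects the coordinate every period."""
    if mode == "repeat":
        return coord % size
    if mode == "clamp":
        return min(max(coord, 0), size - 1)
    if mode == "mirror":
        period = coord % (2 * size)  # one forward run plus one reflected run
        return period if period < size else 2 * size - 1 - period
    raise ValueError("unknown wrap mode: " + mode)
```

Hardware applies such a mapping independently per axis (s, t, r), which is why a texture can, for example, repeat horizontally while clamping vertically.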
The document summarizes a lecture on blending, compositing, and anti-aliasing in computer graphics. It discusses how colors are combined during rendering using blending operations, and how compositing operates on entire images rather than individual pixels. Porter-Duff models for digital image compositing are explained, along with how they relate to OpenGL blending functions.
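The most used Porter-Duff operator, "over", is simple with premultiplied alpha. A sketch (colors as (r, g, b, a) tuples in [0, 1]):

```python
def over(src, dst):
    """Porter-Duff 'over' with premultiplied-alpha colors:
    out = src + (1 - src_alpha) * dst, applied to all four channels.
    This corresponds to OpenGL blending with
    glBlendFunc(GL_ONE, GL_ONE_MINUS_SRC_ALPHA) on premultiplied colors."""
    k = 1.0 - src[3]
    return tuple(s + k * d for s, d in zip(src, dst))
```

Premultiplying the colors first is what makes one blend equation serve all channels, including alpha, and it also keeps filtered texture edges from picking up fringe colors.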
The document summarizes an interview with Kim Clausen-Jensen about the Regional Operations Committee (ROC). The key points are:
1) ROC was formed in 1995 by Region 8 tribes to address environmental issues through partnerships. It encourages cooperation between tribes and federal agencies like EPA.
2) ROC has been influential in improving relationships between tribes and EPA. It helped establish the Tribal Assistance Program office at EPA to better coordinate tribal grants and issues.
3) ROC encourages information sharing between tribes. Tribes now freely share documents, programs, and knowledge to help each other tackle environmental problems.
Exploring Cultures through Cuisines at the Ultimate Travel Show 2016 - Toronto (Yashy Murphy)
A presentation on the topic of "Exploring Cultures through Cuisines". A look at resources to help prep for a food-centric vacation, how to leverage social media and ways to explore a destination's food culture.
Designing for Sensors & the Future of Experiences (Jeremy Johnson)
Are you ready for the next ten years? Wireframes and prototypes may not be enough. Jeremy will take you on a tour of what the design problems of the future look like, from designing for sensors to walls of screens.
With the advent of sensor-based technology, we are designing more for gestures and voice commands. How do we interact in space without tactile feedback? How do we design for universal gestures? What does a future full of screens and software look like? When everything is an interface and hardware disappears, what are the tools and methods to tackle these design problems?
Customary (hak ulayat) land disputes in the city of Jayapura, Papua, arose because land sale transactions did not follow customary-law procedures or land regulations, resulting in overlapping claims of land ownership.
The final keyword in Java is known to forbid class extension and modification of fields. It is less known to have a special meaning in multithreaded code.
Unfortunately, there is not much information on the latter, and even the most thorough talks avoid deep details on the beauty of finals.
In this talk I apply section 17.5 of the Java Language Specification to different examples and show how the spec works. Several myths are busted along the way.
Here's nice article on different aspects of JMM: http://shipilev.net/blog/2014/jmm-pragmatics/
The next generation of GPU APIs for Game Engines (Pooya Eimandar)
Pooya Eimandar is a game engine developer and CEO who has worked on several game engines including Wolf.Engine and Persian Game Engine. The document discusses the evolution of GPU APIs such as OpenGL, DirectX, and newer APIs like Vulkan and DirectX 12. It covers how newer APIs provide lower overhead and more direct GPU control through explicit memory management and multithreaded command buffers. Sample code is provided showing texture loading in OpenGL and Vulkan.
Open Graphics Library (OpenGL) is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardware-accelerated rendering.
OpenGL is a cross-language API for 2D and 3D graphics rendering on the GPU. It was created by Silicon Graphics in 1992 and is now maintained by the Khronos Group. OpenGL provides an interface between software and graphics hardware to perform tasks like rendering, texture mapping, and shading. Developers write OpenGL code that gets translated into GPU commands by a driver for the specific graphics card. This allows hardware-accelerated graphics to be used across many platforms and programming languages.
ngGoBuilder and collaborative development between San Francisco and Tokyo (notolab)
This document discusses ngGoBuilder, a game engine and set of tools for developing games using ngCore. It describes ngGoBuilder 1.x features like scrolling layers and particle effects. It then discusses plans for ngGoBuilder 2.0 which will focus on a better user experience and include ngCore, debugging tools, and sample games. Future roadmaps include improved animation support and integration with the ngServer platform. The document also covers collaboration between the San Francisco and Tokyo teams working on the project.
This document provides information about OpenGL and EGL. It discusses OpenGL concepts like the rendering pipeline, shaders, and GLSL. It explains how to use OpenGL and GLSL on Android, including initializing, creating shaders, and rendering. It also covers EGL and how it acts as an interface between OpenGL and the native platform. It describes EGL objects like displays, surfaces, and contexts. It provides examples of basic EGL usage and lists EGL APIs for initializing, configuring, and managing graphics resources.
Unreal Open Day 2017 UE4 for Mobile: The Future of High Quality Mobile Games, by Epic Games China
This document summarizes a presentation about Unreal Engine 4 for mobile game development. It discusses UE4's mobile rendering pipeline and features for high-end graphics on mobile, including OpenGL ES 3.1, Vulkan and Metal. It provides an overview of the state of the mobile game market and examples of AAA open-world games made with UE4. It also outlines UE4's feature levels for mobile, describes the components of the mobile rendering pipeline, and highlights specific rendering techniques like HDR encoding.
Video replay: http://nvidia.fullviewmedia.com/siggraph2012/ondemand/SS104.html
Date: Wednesday, August 8, 2012
Time: 11:50 AM - 12:50 PM
Location: SIGGRAPH 2012, Los Angeles
Attend this session to get the most out of OpenGL on NVIDIA Quadro and GeForce GPUs. Learn about the new features in OpenGL 4.3, particularly Compute Shaders. Other topics include bindless graphics; Linux improvements; and how to best use the modern OpenGL graphics pipeline. Learn how your application can benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.
Get OpenGL 4.3 beta drivers for NVIDIA GPUs from http://www.nvidia.com/content/devzone/opengl-driver-4.3.html
[03 1][Developer tools for the GPU - Parallel Nsight and AXE] miller axelaparuma
NVIDIA provides a family of software modules called Application Acceleration Engines (AXE) that enable developers to enhance applications with high performance GPU capabilities. The AXE include PhysX, Cg/CgFX, SceniX, CompleX, and OptiX which are free to use and help apps exploit NVIDIA GPUs. These engines provide features like physics simulation, programmable shading, scene management, scaling to large datasets, and ray tracing to accelerate application development.
Community works for multi-core embedded image processing, by Jeongpyo Kong
1. The presentation discusses multi-core embedded image processing and the speaker's work with ETRI and KESSIA on related projects.
2. It provides technical backgrounds on requirements for embedded image processing like low power and high performance. Approaches discussed include hardware based using multi-core processors and software based using efficient algorithms and frameworks.
3. The speaker's current works involve porting OpenCV to various hardware platforms from ETRI and conducting performance tests, and future work may include developing specific applications for smart devices.
The document discusses the evolution of compute APIs from early vendor-specific APIs like CUDA and CTM to current standards like OpenCL and DirectCompute. It summarizes the key aspects of the 1st generation APIs, including their execution model inherited from graphics processing and caveats identified by developers. The document proposes that the 2nd generation of APIs will be better suited to current hardware designed for compute by adopting a task-based execution model that maps more directly to multi-threaded CPU and GPU architectures.
SIGGRAPH Asia 2012 Exhibitor Talk: OpenGL 4.3 and Beyond, by Mark Kilgard
Location: Conference Hall K, Singapore EXPO
Date: Thursday, November 29, 2012
Time: 11:00 AM - 11:50 AM
Presenter: Mark Kilgard (Principal Software Engineer, NVIDIA, Austin, Texas)
Abstract: Attend this session to get the most out of OpenGL on NVIDIA Quadro and GeForce GPUs. Learn about the new features in OpenGL 4.3, particularly Compute Shaders. Other topics include bindless graphics; Linux improvements; and how to best use the modern OpenGL graphics pipeline. Learn how your application can benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.
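As a concrete illustration of the OpenGL 4.3 compute shaders this talk highlights, here is a hedged C sketch: compile a GLSL compute kernel, link it, and dispatch enough work groups to cover a buffer. The GL entry points are left as runtime-loaded pointers (as `glXGetProcAddress`/`wglGetProcAddress` or GLEW would supply them) so the sketch stands alone; token values follow glcorearb.h, and context creation and error checking are omitted.

```c
#include <stddef.h>

/* GLSL compute kernel: 64 invocations per work group, doubling each element
 * of a shader storage buffer bound at binding point 0. */
static const char *computeSrc =
    "#version 430\n"
    "layout(local_size_x = 64) in;\n"
    "layout(std430, binding = 0) buffer Data { float v[]; };\n"
    "void main() {\n"
    "    v[gl_GlobalInvocationID.x] *= 2.0;\n"
    "}\n";

typedef unsigned GLuint; typedef unsigned GLenum; typedef int GLint;
typedef int GLsizei; typedef unsigned GLbitfield;
#define GL_COMPUTE_SHADER             0x91B9      /* token values from glcorearb.h */
#define GL_SHADER_STORAGE_BARRIER_BIT 0x00002000

/* Post-GL-1.1 entry points are fetched at runtime in a real program;
 * declared as pointers here so the sketch is self-contained. */
static GLuint (*glCreateShader)(GLenum type);
static void (*glShaderSource)(GLuint, GLsizei, const char *const *, const GLint *);
static void (*glCompileShader)(GLuint);
static GLuint (*glCreateProgram)(void);
static void (*glAttachShader)(GLuint, GLuint);
static void (*glLinkProgram)(GLuint);
static void (*glUseProgram)(GLuint);
static void (*glDispatchCompute)(GLuint x, GLuint y, GLuint z);
static void (*glMemoryBarrier)(GLbitfield barriers);

/* Compile, link, and dispatch enough 64-wide groups to cover 'count' floats. */
static void runDoubler(GLuint count)
{
    GLuint cs = glCreateShader(GL_COMPUTE_SHADER);
    glShaderSource(cs, 1, &computeSrc, NULL);
    glCompileShader(cs);
    GLuint prog = glCreateProgram();
    glAttachShader(prog, cs);
    glLinkProgram(prog);
    glUseProgram(prog);
    glDispatchCompute((count + 63) / 64, 1, 1);     /* round up to whole groups  */
    glMemoryBarrier(GL_SHADER_STORAGE_BARRIER_BIT); /* make SSBO writes visible  */
}
```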
Topic Areas: Computer Graphics; Development Tools & Libraries; Visualization; Image and Video Processing
Level: Intermediate
OpenGL is a cross-platform API for rendering 3D graphics. It consists of a pipeline that processes vertices, primitives, fragments and pixels. Key stages include vertex processing, tessellation, primitive processing, rasterization, fragment processing and pixel processing. OpenGL uses libraries like GLUT and GLEW and works across Windows, Linux and Mac operating systems.
Frederik Vogel from LINE Fukuoka will give a presentation on Apple's Metal framework. The presentation will provide a history of Metal and graphics processing, demonstrate Metal concepts like the render pipeline, and show code examples. Attendees will learn about rendering 3D graphics, compute capabilities, and possibilities for using Metal in areas like machine learning and finance.
The document discusses the PlayStation Graphics Library (PSGL), an industry standard graphics library for PlayStation 3. It provides precision tools for graphics programming on PS3, including support for OpenGL ES, Cg shaders, and COLLADA. PSGL aims to leverage existing development tools and expertise while guaranteeing quality through conformance testing. It covers topics like the choice of OpenGL ES and Cg over alternatives, PSGL extensions, and the use of COLLADA for content import/export.
The document provides a history of GPUs and GPGPU computing. It describes how GPUs evolved from fixed hardware for graphics to programmable hardware. This allowed general purpose computing on GPUs (GPGPU). It discusses the development of GPGPU languages and APIs like CUDA, OpenCL, and DirectCompute. The anatomy of a modern GPU is explained, highlighting its massively parallel architecture. Typical GPGPU execution and memory models are outlined. Usage of GPGPU for applications like graphics, physics, computer vision, and HPC is mentioned. Leading GPU vendors and their products are briefly introduced.
The document discusses OpenGL ES, a lightweight version of OpenGL designed for embedded systems like mobile phones. It provides an introduction to OpenGL ES, describing its features which include removing redundancy from OpenGL to optimize it for constrained devices. The document outlines the differences between OpenGL and OpenGL ES, and describes the various versions of OpenGL ES from 1.0 to 3.2. It also discusses OpenGL ES fundamentals like its state machine-based model, and basic GL operations like rasterization and datatypes.
Embedded Graphics Drivers in Mesa (ELCE 2019), by Igalia
By Neil Roberts.
Users of mobile platforms are expecting more and more complex graphics on their devices. This means that taking advantage of the mobile GPUs efficiently is essential. A large part of this efficiency is dependent on the user-space drivers. Unfortunately being in user-space means that many GPU providers can get away with only providing a closed-source driver which hides a lot of the secrets needed to be efficient. This talk presents a project providing an open-source alternative including support for embedded platforms.
Mesa is the standard open-source user-space library providing an implementation of the OpenGL, GLES and Vulkan APIs on Linux platforms. It has drivers for a range of different hardware. This talk will present the project, the user-space graphics stack and the inner workings of Mesa. It will then continue to present the embedded drivers that it supports such as Freedreno for the Adreno platform, Panfrost for Mali Midgard and Bifrost GPUs and the drivers for Broadcom GPUs.
(c) Open Source Summit + Embedded Linux Conference Europe 2019
October 28 - 30, 2019
Citi Centre de Congrès de Lyon (Lyon Convention Centre)
Lyon, France
Presented as a pre-conference tutorial at the GPU Technology Conference in San Jose on September 20, 2010.
Learn about NVIDIA's OpenGL 4.1 functionality available now on Fermi-based GPUs.
D11: a high-performance, protocol-optional, transport-optional, window system..., by Mark Kilgard
Consider the dual pressures toward a more tightly integrated workstation window system: 1) the need to efficiently handle high bandwidth services such as video, audio, and three-dimensional graphics; and 2) the desire to achieve the under-realized potential for local window system performance in X11.
This paper proposes a new window system architecture called D11 that seeks higher performance while preserving compatibility with the industry-standard X11 window system. D11 reinvents the X11 client/server architecture using a new operating system facility similar in concept to the Unix kernel's traditional implementation but designed for user-level execution. This new architecture allows local D11 programs to execute within the D11 window system kernel without compromising the window system's integrity. This scheme minimizes context switching, eliminates protocol packing and unpacking, and greatly reduces data copying. D11 programs fall back to the X11 protocol when running remote or connecting to an X11 server. A special D11 program acts as an X11 protocol translator to allow X11 programs to utilize a D11 window system.
[The described system was never implemented.]
Computers, Graphics, Engineering, Math, and Video Games for High School Students, by Mark Kilgard
This document provides an overview of computer graphics and how it is used in video games. It discusses how graphics have progressed from 1995 to 2017, allowing for more complex characters like Lara Croft. It then explains some of the core concepts in computer graphics like geometry, modeling color, lighting, texturing, and motion blur. It discusses how graphics processing units (GPUs) work and how programmable units within the GPU can simulate realistic lighting, shadows, and other effects to generate interactive 3D graphics in real-time.
Virtual Reality Features of NVIDIA GPUs, by Mark Kilgard
This document summarizes virtual reality rendering features of NVIDIA Pascal GPUs. It discusses how Pascal supports efficient single-pass stereo rendering by generating left and right eye views in one rendering pass. It also describes how Pascal implements lens matched shading to better match the rendered images to an HMD lens, reducing pixel resampling. Finally, it notes Pascal's use of window rectangle testing to discard pixels that fall outside the lens region, improving performance.
1. The document discusses migrating from OpenGL to Vulkan, providing analogies comparing the APIs to fixed-function toys, programmable LEGO kits, and raw materials pine wood derby kits.
2. It outlines scenarios that are likely and unlikely to benefit from Vulkan, such as applications with parallelizable CPU-bound graphics work.
3. Key differences between OpenGL and Vulkan are explained, such as Vulkan requiring explicit management of graphics resources, synchronization, and command buffer queuing. The document emphasizes that transitioning to Vulkan means rethinking the entire graphics rendering approach.
EXT_window_rectangles extends OpenGL with a new per-fragment test called the "window rectangles test" for use with FBOs that provides 8 or more inclusive or exclusive rectangles for rasterized fragments. Applications of this functionality include web browsers and virtual reality.
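A hedged sketch of how the window rectangles test might be enabled from C, assuming a context that exposes the extension. The extension's single entry point is left as a runtime-loaded pointer so the sketch is self-contained, and the mode token (`GL_INCLUSIVE_EXT` or `GL_EXCLUSIVE_EXT`, from `<GL/glext.h>`) is passed in by the caller rather than redefined here.

```c
typedef unsigned GLenum; typedef int GLint; typedef int GLsizei;

/* The extension's entry point, fetched at runtime in a real program
 * (e.g. via glXGetProcAddress); declared as a pointer for self-containment. */
static void (*glWindowRectanglesEXT)(GLenum mode, GLsizei count, const GLint *box);

/* Two x/y/width/height rectangles. With GL_INCLUSIVE_EXT, rasterized
 * fragments survive only inside them; with GL_EXCLUSIVE_EXT, fragments
 * inside them are discarded. */
static const GLint rects[] = {
    0,   0,   256, 256,   /* lower-left 256x256 region */
    512, 512, 128, 128,   /* a second, disjoint region */
};

static void setWindowRectangles(GLenum mode) /* pass GL_INCLUSIVE_EXT here */
{
    glWindowRectanglesEXT(mode, 2, rects);
}
```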
Slides: Accelerating Vector Graphics Rendering using the Graphics Hardware Pi..., by Mark Kilgard
Slides for SIGGRAPH paper presentation of "Accelerating Vector Graphics Rendering using the Graphics Hardware Pipeline".
Presented by Vineet Batra (Adobe) on Thursday, August 13, 2015 at 2:00 pm - 3:30 pm, Los Angeles Convention Center, Room 150/151.
Accelerating Vector Graphics Rendering using the Graphics Hardware Pipeline, by Mark Kilgard
SIGGRAPH 2015 paper.
We describe our successful initiative to accelerate Adobe Illustrator with the graphics hardware pipeline of modern GPUs. Relying on OpenGL 4.4 plus recent OpenGL extensions for advanced blend modes and first-class GPU-accelerated path rendering, we accelerate the Adobe Graphics Model (AGM) layer responsible for rendering sophisticated Illustrator scenes. Illustrator documents render in either an RGB or CMYK color mode. While GPUs are designed and optimized for RGB rendering, we orchestrate OpenGL rendering of vector content in the proper CMYK color space and accommodate the 5+ color components required. We support both non-isolated and isolated transparency groups, knockout, patterns, and arbitrary path clipping. We harness GPU tessellation to shade paths smoothly with gradient meshes. We do all this and render complex Illustrator scenes 2 to 6x faster than CPU rendering at Full HD resolutions; and 5 to 16x faster at Ultra HD resolutions.
NV_path_rendering is an OpenGL extension for GPU-accelerated path rendering. Recent functionality improvements provide better performance, better typography, rounded rectangles, conics, and OpenGL ES support. This functionality is available today with NVIDIA's 337.88 drivers.
The latest NV_path_rendering specification documents these new functional improvements:
https://www.opengl.org/registry/specs/NV/path_rendering.txt
You can find sample code here:
https://github.com/markkilgard/NVprSDK
SIGGRAPH Asia 2012: GPU-accelerated Path Rendering, by Mark Kilgard
Presented at SIGGRAPH Asia 2012 in Singapore on Friday, 30 November 14:15 - 16:00 during the "Points and Vectors" session.
Find the paper at http://developer.nvidia.com/game/gpu-accelerated-path-rendering or on Slideshare.
For thirty years, resolution-independent 2D standards (e.g. PostScript, SVG) have relied largely on CPU-based algorithms for the filling and stroking of paths. Learn about our approach to accelerate path rendering with our GPU-based "Stencil, then Cover" programming interface. We've built and productized our OpenGL-based system.
Programming with NV_path_rendering: An Annex to the SIGGRAPH Asia 2012 paper..., by Mark Kilgard
This document provides an overview of the programming interface for NV path rendering, an OpenGL extension for accelerating vector graphics on GPUs. It describes the supported path commands, which are designed to match all major path standards. Paths can be specified explicitly from commands and coordinates, from path strings using grammars like SVG or PostScript, or by generating paths from font glyphs. Additional functions allow copying, interpolating, weighting, or transforming existing path objects.
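For illustration, a hedged sketch of one of the specification routes described above: creating a path object from an SVG-grammar path string with `glPathStringNV`. The entry point is left as a runtime-loaded pointer so the sketch stands alone, and the format token (`GL_PATH_FORMAT_SVG_NV`, from the extension header) is passed in by the caller rather than redefined here.

```c
#include <string.h>

typedef unsigned GLuint; typedef unsigned GLenum; typedef int GLsizei;

/* NV_path_rendering entry point, obtained with glXGetProcAddress /
 * wglGetProcAddress in a real program; a pointer here for self-containment. */
static void (*glPathStringNV)(GLuint path, GLenum format,
                              GLsizei length, const void *pathString);

/* A five-pointed star in SVG path grammar: one moveto, four linetos, close. */
static const char *starPath = "M100,180 L40,10 L190,120 L10,120 L160,10 Z";

/* Specify path object 'pathName'; pass GL_PATH_FORMAT_SVG_NV as svgFormat. */
static void specifyStar(GLuint pathName, GLenum svgFormat)
{
    glPathStringNV(pathName, svgFormat, (GLsizei)strlen(starPath), starPath);
}
```

The same path object could instead be built from explicit command/coordinate arrays, a PostScript-grammar string, or a font glyph, as the summary notes.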
GPUs can accelerate 2D graphics such as resolution-independent web content and complex path rendering. A presentation at SIGGRAPH Asia 2012 showed how GPUs, using a "Stencil, then Cover" approach, can now render overwhelmingly complex 2D scenes in real time. This GPU-accelerated path rendering is available today.
Preprint for SIGGRAPH Asia 2012
Copyright ACM, 2012
For thirty years, resolution-independent 2D standards (e.g. PostScript, SVG) have depended on CPU-based algorithms for the filling and stroking of paths. However, advances in graphics hardware have largely ignored the problem of accelerating resolution-independent 2D graphics rendered from paths.
Our work builds on prior work to re-factor the path rendering task to leverage existing capabilities of modern pipelined and massively parallel GPUs. We introduce a two-step “Stencil, then Cover” (StC) paradigm that explicitly decouples path rendering into one GPU step to determine a path’s filled or stenciled coverage and a second step to rasterize conservative geometry intended to test and reset the coverage determinations of the first step while shading color samples within the path. Our goals are completeness, correctness, quality, and performance—but we go further to unify path rendering with OpenGL’s established 3D rendering pipeline. We have built and productized our approach to accelerate path rendering as an OpenGL extension.
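The two-step paradigm in the abstract can be sketched with the extension's two central entry points. This is a hedged outline under stated assumptions, not a complete renderer: the stencil-test state change between the two steps is elided, the entry points are left as runtime-loaded pointers, and the mode tokens (`GL_COUNT_UP_NV`, `GL_BOUNDING_BOX_NV`, from the extension header) are passed in by the caller.

```c
typedef unsigned GLuint; typedef unsigned GLenum;

/* The two NV_path_rendering entry points behind the two StC steps; fetched
 * with wglGetProcAddress/glXGetProcAddress in a real program, left as
 * pointers here so the sketch stands alone. */
static void (*glStencilFillPathNV)(GLuint path, GLenum fillMode, GLuint mask);
static void (*glCoverFillPathNV)(GLuint path, GLenum coverMode);

/* Use all eight stencil bits for per-sample winding counts. */
static const GLuint kStencilMask = 0xFF;

/* "Stencil, then Cover" for filling one path:
 *   step 1 rasterizes the path into the stencil buffer, counting winding;
 *   step 2 rasterizes conservative cover geometry that stencil-tests those
 *   counts, shades the samples inside the path, and resets the stencil.
 * Pass GL_COUNT_UP_NV and GL_BOUNDING_BOX_NV for the two mode parameters. */
static void stencilThenCoverFill(GLuint path, GLenum countUpMode,
                                 GLenum boundingBoxMode)
{
    glStencilFillPathNV(path, countUpMode, kStencilMask); /* step 1: stencil */
    /* between the steps, the app sets a stencil test of "pass if nonzero"  */
    glCoverFillPathNV(path, boundingBoxMode);             /* step 2: cover   */
}
```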
SIGGRAPH 2012: GPU-Accelerated 2D and Web Rendering, by Mark Kilgard
Video replay: http://nvidia.fullviewmedia.com/siggraph2012/ondemand/SS106.html
Location: West Hall Meeting Room 503, Los Angeles Convention Center
Date: Wednesday, August 8, 2012
Time: 2:40 PM – 3:40 PM
The future of GPU-based visual computing integrates the web, resolution-independent 2D graphics, and 3D to maximize interactivity and quality while minimizing consumed power. See what NVIDIA is doing today to accelerate resolution-independent 2D graphics for web content. This presentation explains NVIDIA's unique "stencil, then cover" approach to accelerating path rendering with OpenGL and demonstrates the wide variety of web content that can be accelerated with this approach.
More information: http://developer.nvidia.com/nv-path-rendering
Presented at the GPU Technology Conference 2012 in San Jose, California.
Tuesday, May 15, 2012.
Standards such as Scalable Vector Graphics (SVG), PostScript, TrueType outline fonts, and immersive web content such as Flash depend on a resolution-independent 2D rendering paradigm that GPUs have not traditionally accelerated. This tutorial explains a new opportunity to greatly accelerate vector graphics, path rendering, and immersive web standards using the GPU. By attending, you will learn how to write OpenGL applications that accelerate the full range of path rendering functionality. Not only will you learn how to render sophisticated 2D graphics with OpenGL, you will learn to mix such resolution-independent 2D rendering with 3D rendering and do so at dynamic, real-time rates.
Presented at the GPU Technology Conference 2012 in San Jose, California.
Monday, May 14, 2012.
Attend this session to get the most out of OpenGL on NVIDIA Quadro and GeForce GPUs. Topics covered include the latest advances available for Cg 3.1, the OpenGL Shading Language (GLSL); programmable tessellation; improved support for Direct3D conventions; integration with Direct3D and CUDA resources; bindless graphics; and more. When you utilize the latest OpenGL innovations from NVIDIA in your graphics applications, you benefit from NVIDIA's leadership driving OpenGL as a cross-platform, open industry standard.
This document provides a review for the final exam in CS 354. It includes:
1) A daily quiz covering surfaces and programmable tessellation from lecture.
2) An overview of topics to be covered on the final exam including fundamentals of computer graphics, practical graphics programming, and content from projects and lectures.
3) Details on the format of the final exam including open notes and textbooks, calculators being allowed, and a prohibition on other electronics. The exam will be cumulative and cover all course material.
4) Examples of potential exam questions covering topics like Bezier curves, subdivision surfaces, ray tracing intersections, and the rendering equation that students should review and understand.
CS 354 Surfaces, Programmable Tessellation, and NPR Graphics, by Mark Kilgard
The document discusses a CS 354 class lecture on surfaces, programmable tessellation, and non-photorealistic rendering. The lecture will cover surfaces and how they are modeled using patches like triangular and quadrilateral patches. It will also cover programmable tessellation, which allows for adaptive level-of-detail and displacement mapping on the GPU. The lecture concludes with information on tools that allow authoring of 3D tessellation content and a comparison of OpenGL specifications that added support for programmable tessellation.
This document contains notes from a CS 354 Performance Analysis lecture on April 26, 2012. It discusses topics including an in-class quiz, the lecturer's office hours, an upcoming project deadline, and the day's lecture on graphics performance analysis using concepts like Amdahl's law, Gustafson's law, and modeling pipeline efficiency. It also provides examples and diagrams related to graphics hardware architecture.
4. 4
Kurt Akeley
• Led development of OpenGL at Silicon Graphics (SGI)
• Co-founded SGI
• Led development of SGI’s high-end graphics hardware
• Co-author of OpenGL specification
• Returned to Stanford University to complete Ph.D.
• Co-developed Cg “C for graphics” language at NVIDIA
• Principal Researcher, Microsoft Research Silicon Valley
• Spent time at Microsoft Research Asia in Beijing
• Member of US National Academy of Engineering
5. 5
Mark Kilgard
• Principal System Software Engineer, NVIDIA, Austin, Texas
• Developed original OpenGL driver for 1st GeForce GPU
• Specified many key OpenGL extensions
• Works on Cg for portable programmable shading
• NVIDIA Distinguished Inventor
• Before NVIDIA, worked at Silicon Graphics
• Worked on X Window System integration for OpenGL
• Developed popular OpenGL Utility Toolkit (GLUT)
• Wrote book on OpenGL and X, co-authored Cg Tutorial
6. 6
Marc Levoy
• Moderator for our facilitated discussion
• Professor of Computer Science and Electrical Engineering, Stanford University
• SIGGRAPH Computer Graphics Achievement Award
• ACM Fellow
8. 8
Check Out the Course Notes (1)
• Look to www.opengl.org web site for our final slides
• New Material
• “An Incomplete History of OpenGL” (Kilgard)
• How the OpenGL graphics system developed
• “Using Vertex Buffer Objects Well” (Kilgard)
• Learn how to use vertex buffer objects for high vertex processing rates
9. 9
Check Out the Course Notes (2)
• Paper Reprints
• OpenGL design rationale from its specification co-authors (Segal, Akeley)
• Realizing OpenGL: two implementations of one architecture (Kilgard)
• Graphics hardware: GTX, RealityEngine, InfiniteReality, GeForce 6800
• Key developments in graphics hardware design over last 20 years
• GPU Programmability: “User-Programmable Vertex Engine” and “Cg” SIGGRAPH papers
• “How GPUs Work” (Luebke, Humphreys)
11. 11
Modern OpenGL
• History
• How did OpenGL get where it is now?
• Present
• Version 3.0
• Functionality beyond 3.0
12. 12
An Overview History of OpenGL
• Pre-history 1991
• IRIS GL, a proprietary Graphics Library by SGI
• OpenGL, an open standard for 3D
• Focus: procedural hardware-accelerated 3D graphics
• Governed by Architectural Review Board (ARB)
• Extensibility planned into design
• Competition
• Proprietary APIs (1991-1995)
• PHIGS & PEX for X Window System (1992-1997)
• Microsoft’s Direct3D (1998-)
14. 14
OpenGL’s Design Philosophy
• High-performance: assumes hardware acceleration
• Defined by a specification, rather than a de-facto implementation
• Rendering state machine
• Procedural
• Not a window system, not a scene graph
• No initial sub-setting
• Extensible
• Data type rich
• Cross-platform
• Window system-independent core: X Window System, Microsoft Windows, OS/2, OS X, etc.
• Multi-language bindings: C, FORTRAN, etc.
• Not merely an API, rather a system
15. 15
Timeline of OpenGL’s Development
[Timeline, 1992-2008]
• Core versions: OpenGL 1.0 approved, 1.1, 1.2, multitexture added (1.2.1), 1.3, 1.4, 1.5, 2.0, 2.1, 3.0
• Other milestones: 1st commercial OpenGL implementation (DEC); OpenGL Utility Toolkit (GLUT) released; NT 3.51 brings OpenGL to PCs; SGI InfiniteReality; Mesa 3D open source; 1st GPU for PCs with single-chip transform & lighting for OpenGL (GeForce); OpenGL ES for embedded devices; Khronos controls OpenGL
16. 16
Competitive 3D APIs
• OpenGL has always existed in competition with other APIs
• Strengthened OpenGL by driving feature parity
• OpenGL’s competitive strengths:
1. Cross platform, open process
2. API stability, extensibility
3. Clean initial design & specification
[Timeline, 1992-2008: proprietary Unix workstation 3D APIs (XGL, Doré, Starbase, IRIS GL); X Consortium 3D standard (PEX); Microsoft Direct3D (DirectX 3, 5, 6, 7, 8, 9, 10)]
17. 17
OpenGL 1.0
•Immediate mode
•Vertex transformation and lighting
•Points, lines, polygons
•Stippling, wide points and lines
•Bitmaps, image rectangles, and pixel reads
•Pixel store and transfer
•1D and 2D textures, fog, and scissor
•Display lists and evaluators
•RGBA and color index color models
•Color, depth, stencil, and accumulation buffers
•Selection and feedback modes
•Queries
19. 19
SGI “Classic” Hardware View of OpenGL
• Entirely fixed-function, no programmability
• High-end SGI hardware manifested functionality in distinct chips
[Pipeline diagram, 1992: 3D application or game → OpenGL API (graphics hardware boundary) → Front End → Vertex Assembly → Vertex Transform & Lighting → Primitive Assembly, Clipping, Setup, and Rasterization → Texture & Fog → Raster Operations; Texture Fetch and Framebuffer Access are memory operations through the Memory Interface; every unit is fixed-function]
30. 30
GeForce 3 & 4 Ti (NV2x) View of OpenGL
• Programmable vertex processing
• Highly configurable fragment processing
[Pipeline diagram, 2001: 3D application or game → OpenGL API (CPU–GPU boundary) → GPU Front End → Vertex Assembly → Vertex Program → Primitive Assembly, Clipping, Setup, and Rasterization → Multi-texture Shaders & Combiners → Raster Operations; Attribute Fetch, Texture Fetch, and Framebuffer Access go through the Memory Interface]
33. 33
OpenGL 1.4
• Automatic mipmap generation
• Shadow-mapping
• Depth textures and shadow comparisons
• Texture level-of-detail bias
• Texture mirrored repeat wrap mode
• Multi-texture combination
• Fog coordinate
• Secondary color
• Configurable point size attenuation
• Color blending improvements
• Stencil wrap operations
• Window-space raster position specification
34. 34
Hardware Shadow Mapping
• Without shadow mapping vs. with shadow mapping
• Depth map from light source’s view (darker is closer); light position marked
• Projective texturing (1.0) & polygon offset (1.1) key enablers
35. 35
Shadow Mapping Explained
• Compare planar distance from light (≤) against depth map projected onto scene
• “Less than” means shadowed; “equals” marks the true un-shadowed region (shown green)
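The comparison above can be sketched in scalar form. This is an illustrative Python model, not OpenGL API code: the one-dimensional "texel" indexing is a simplification, and the `bias` parameter stands in for polygon offset.

```python
def build_depth_map(occluders, num_texels):
    """First pass: render occluders from the light's view, keeping the
    nearest (smallest) depth per texel -- darker is closer."""
    depth = [float("inf")] * num_texels
    for texel, z in occluders:
        depth[texel] = min(depth[texel], z)
    return depth

def in_shadow(depth_map, texel, planar_light_distance, bias=0.001):
    """Second pass, per fragment: shadowed when the fragment's planar
    distance from the light exceeds the stored depth.  The bias plays
    the role of polygon offset, avoiding false self-shadowing where
    the two distances are (nearly) equal."""
    return planar_light_distance > depth_map[texel] + bias
```

The "equals" case (the fragment is itself the nearest surface the light sees) passes the test and is lit; strictly greater, beyond the bias slack, means some occluder lies between the fragment and the light.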
36. 36
OpenGL 1.5
• Vertex buffer objects (VBOs)
• Occlusion queries
• Generalized shadow mapping functions
37. 37
GeForce FX (NV3x) View of OpenGL
• Programmable fragment processing
• 16 texture units, IEEE 754 32-bit floating-point
• Vertex program branching
[Pipeline diagram, 2003: 3D application or game → OpenGL API (CPU–GPU boundary) → GPU Front End → Vertex Assembly → Vertex Program → Primitive Assembly, Clipping, Setup, and Rasterization → Fragment Program → Raster Operations; Attribute Fetch, Texture Fetch, and Framebuffer Access go through the Memory Interface]
39. 39
OpenGL Fragment Program Flowchart
• Begin fragment: initialize parameters; temporary registers initialized to (0,0,0,0); output depth & color registers initialized to (0,0,0,1)
• Fetch & decode next instruction from fragment program instruction memory
• Texture fetch instruction? If yes: compute texture address & level-of-detail, fetch texels from the texture images, and filter texels
• Otherwise: read interpolants and/or registers (primitive interpolants feed this step), then map input values: swizzle, negate, etc.
• Perform instruction math / operation
• Write output register with masking
• More instructions? If yes, loop; if no, emit the output registers and end the fragment
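The control flow above can be modeled with a toy interpreter. This Python sketch is purely illustrative: the instruction names, the scalar (rather than 4-component) registers, and the dictionary-based textures are hypothetical simplifications of the flowchart, not any real fragment program ISA.

```python
def run_fragment_program(program, interpolants, textures):
    """Walk the flowchart for one fragment: initialize registers, then
    fetch, decode, and execute instructions until none remain."""
    regs = {"R0": 0.0, "R1": 0.0}   # temporary registers initialized to 0
    out = {"COL": 1.0, "DEP": 0.0}  # output registers at their defaults

    def read(name):                 # read interpolants and/or registers
        return regs.get(name, interpolants.get(name, 0.0))

    for op, dst, *src in program:   # fetch & decode next instruction
        if op == "TEX":             # texture fetch instruction?
            unit, coord = src       # compute address, fetch (and "filter")
            regs[dst] = textures[unit][coord]
        elif op == "ADD":
            regs[dst] = read(src[0]) + read(src[1])
        elif op == "MUL":
            regs[dst] = read(src[0]) * read(src[1])
        elif op == "OUT":           # write an output register
            out[dst] = read(src[0])
    return out                      # end fragment: emit output registers
```

A three-instruction "program" that modulates a texel by an interpolated color exercises both branches of the texture-fetch decision.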
41. 41
Core OpenGL fragment texturing & coloring
[Diagram: point, line, polygon, pixel rectangle (DrawPixels), and bitmap (Bitmap) rasterization receive work from primitive assembly; conventional texture fetching through texture units 0 and 1 feeds texture environment application; color sum, fog, and coverage application follow, leading to raster operations]
42. 42
NV1x OpenGL fragment texturing & coloring
[Diagram: same rasterization and texture-fetch path as core OpenGL, but the GL_REGISTER_COMBINERS_NV enable replaces texture environment application, color sum, and fog with register combiners: general stages 0 and 1 plus a final stage, fed by texture units 0 and 1]
43. 43
NV2x OpenGL fragment texturing & coloring
[Diagram: the GL_TEXTURE_SHADER_NV enable replaces conventional texture fetching with texture shaders 0–3 (over texture units 0–3), and the GL_REGISTER_COMBINERS_NV enable replaces texture environment application, color sum, and fog with register combiners: general stages 0–7 plus a final combiner]
44. 44
NV3x OpenGL fragment texturing & coloring
[Diagram: the GL_FRAGMENT_PROGRAM_NV enable (!!FP1.0 or !!ARBfp1.0 programs) replaces the fixed path with a fragment program of up to 1024 instructions; without it, the GL_TEXTURE_SHADER_NV texture shaders 0–3 and the GL_REGISTER_COMBINERS_NV register combiners (general stages 0–7 plus a final combiner) remain available]
45. 45
OpenGL 2.0
• Programmable shading
• OpenGL Shading Language (GLSL)
• Multiple color buffer rendering targets
• Non-power-of-two texture dimensions
• Point sprites
• Separate blend equation
• Two-sided stencil testing
46. 46
GeForce 6 & 7 (NV4x/G7x) View of OpenGL
• Limited vertex texturing
• Fragment branching
• Multiple render targets & floating-point blending
[Pipeline diagram, 2004: 3D application or game → OpenGL API (CPU–GPU boundary) → GPU Front End → Vertex Assembly → Vertex Program → Primitive Assembly, Clipping, Setup, and Rasterization → Fragment Program → Raster Operations; Attribute Fetch, Texture Fetch, and Framebuffer Access go through the Memory Interface]
47. 47
GeForce 8 & 9 (G8x/G9x) View of OpenGL
• Primitive (geometry) programs
• Parameter reads from buffer objects
• Transform feedback (stream out)
[Pipeline diagram, 2006: 3D application or game → OpenGL API (CPU–GPU boundary) → GPU Front End → Vertex Assembly → Vertex Program → Primitive Assembly → Primitive Program → Clipping, Setup, and Rasterization → Fragment Program → Raster Operations; Attribute Fetch, Texture Fetch, Parameter Buffer Read, and Framebuffer Access go through the Memory Interface]
48. 48
OpenGL Pipeline Fixed-function Steps
• Much of the functional pipeline remains fixed-function
• Vital to maintaining performance and data flow
• Hard to compete with hard-wired rasterization, Zcull, and pixel compression
[Diagram: the 2006 pipeline again, highlighting its fixed-function units: Front End, Vertex Assembly, Primitive Assembly, Clipping/Setup/Rasterization, Raster Operations, Attribute Fetch, Texture Fetch, Parameter Buffer Read, Framebuffer Access, and the Memory Interface]
49. 49
OpenGL Pipeline Programmable Domains
• New geometry shader domain for per-primitive programmable processing
• Unified Streaming Processor Array (SPA) architecture means same capabilities for all domains
[Diagram: the 2006 pipeline again, highlighting the programmable Vertex Program, Primitive Program, and Fragment Program domains, which can be unified hardware]
50. 50
OpenGL 2.1
• OpenGL Shading Language
(GLSL) improvements
• Non-square matrices
• Pixel buffer objects (PBOs)
• sRGB color space texture formats
51. 51
OpenGL 3.0
• OpenGL Shading Language (GLSL) improvements
• New texture fetches
• True integer data types and operators
• switch/case/default flow control statements
• Conditional rendering based on occlusion query results
• Transform feedback
• Vertex array objects
• Floating-point textures, color buffers, and depth buffers
• Half-precision vertex arrays
• Texture arrays
• Integer textures
• Red and red-green texture formats
• Compressed red and red-green formats
• Framebuffer objects (FBOs)
• Packed depth-stencil pixel formats
• Per-color buffer clearing, blending, and masking
• sRGB color space color buffers
• Fine-grain buffer mapping and flushing
52. 52
Areas of 3.0 Functionality Improvement
• Programmability
• Shader Model 4.0 features
• OpenGL Shading Language (GLSL) 1.30
• Texturing
• New texture representations and formats
• Framebuffer operations
• Framebuffer objects
• New formats
• New copy (blit), clear, blend, and masking operations
• Buffer management
• Non-blocking and fine-grain update of buffer object data stores
• Vertex processing
• Vertex array configuration objects
• Conditional rendering for occlusion culling
• New half-precision vertex attribute formats
• Pixel processing
• New half-precision external pixel formats
(All brand-new core features)
53. 53
OpenGL 3.0 Programmability
• Shader Model 4.0 additions
• True signed & unsigned integer values
• True integer operators: ^, &, |, <<, >>, %, ~
• Texture additions
• Texture arrays
• Base texture size queries
• Texel offsets to fetches
• Explicit LOD and derivative control
• Integer samplers
• Interpolation modifiers: centroid, noperspective, and flat
• Vertex array element number: gl_VertexID
• OpenGL Shading Language (GLSL) improvements
• ## concatenation in pre-processor for macros
• switch/case/default statements
54. 54
OpenGL 3.0 Texturing Functionality
• Texture representation
• Texture arrays: indexed access to a set of 1D or 2D
texture images
• Texture formats
• Floating-point texture formats
• Single-precision (32-bit, IEEE s23e8)
• Half-precision (16-bit, s10e5)
• Red & red/green texture formats
• Intended as FBO framebuffer formats too
• Compressed red & red/green texture formats
• Shared exponent texture formats
• Packed floating-point texture formats
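The half-precision (s10e5) layout above decodes mechanically: 1 sign bit, 5 exponent bits with bias 15, 10 mantissa bits. A small Python sketch of the bit-level rules:

```python
def decode_half(bits):
    """Decode a 16-bit s10e5 half-precision value to a Python float."""
    sign = -1.0 if bits >> 15 else 1.0
    exp = (bits >> 10) & 0x1F       # 5-bit exponent field, bias 15
    mant = bits & 0x3FF             # 10-bit mantissa field
    if exp == 0:                    # zero and denormals: no implied leading 1
        return sign * mant * 2.0 ** -24
    if exp == 31:                   # infinities and NaNs
        return sign * float("inf") if mant == 0 else float("nan")
    return sign * (1.0 + mant / 1024.0) * 2.0 ** (exp - 15)
```

The largest finite half is 0x7BFF = 65504, which bounds the dynamic range available to half-precision textures and vertex attributes.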
55. 55
Texture Arrays
• Conventional texture = One logical pre-filtered image
• Texture array = index-able plurality of pre-filtered images
• Rationale is fewer texture object binds when drawing different objects
• No filtering between mipmap sets in a texture array
• All mipmap sets in array share same format/border & base dimensions
• Both 1D and 2D texture arrays
• Require shaders, no fixed-function support
• Texture image specification
• Use glTexImage3D, glTexSubImage3D, etc. to load 2D texture arrays
• No new OpenGL commands for texture arrays
• 3rd dimension specifies integer array index
• No halving in 3rd dimension for mipmaps
• So 64×128×17 reduces to 32×64×17, all the way down to 1×1×17
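The no-halving rule for the array dimension can be stated in a few lines of Python (an illustrative sketch, not driver code):

```python
def array_mipmap_chain(width, height, layers):
    """Mipmap level sizes for a 2D texture array: width and height halve
    at each level (never below 1), but the layer count -- the integer
    array index dimension -- is never halved."""
    chain = [(width, height, layers)]
    while width > 1 or height > 1:
        width, height = max(width // 2, 1), max(height // 2, 1)
        chain.append((width, height, layers))
    return chain
```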
56. 56
Texture Arrays Example
• Multiple skins packed in texture array
• Motivation: binding to one multi-skin texture array avoids texture
bind per object
[Figure: five skins, each with its mipmap chain, packed in one texture array; horizontal axis is the texture array index (0–4), vertical axis is the mipmap level index (0–4)]
58. 58
Compact Floating-point Texture Formats
• Packed float format
• No sign bit, independent exponents
• Two 11-bit components (6-bit mantissa, 5-bit exponent) and one 10-bit component (5-bit mantissa, 5-bit exponent), packed from bit 31 down to bit 0 of a 32-bit value
• Shared exponent format
• No sign bit, shared exponent, no implied leading 1
• Three 9-bit mantissas plus one 5-bit shared exponent, packed from bit 31 down to bit 0 of a 32-bit value
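The shared-exponent layout trades per-component exponents for one common scale. This Python sketch paraphrases the encode/decode arithmetic of the EXT_texture_shared_exponent extension (9-bit mantissas, 5-bit exponent, bias 15); treat the exact clamping and rounding as an approximation of that specification:

```python
import math

N, E, BIAS = 9, 5, 15                       # mantissa bits, exponent bits, bias
EMAX = (1 << E) - 1                         # largest exponent field value (31)
SHARED_MAX = (2**N - 1) / 2**N * 2.0**(EMAX - BIAS)

def encode_rgb9e5(r, g, b):
    """Encode three non-negative floats into (rm, gm, bm, exp) fields."""
    r, g, b = (min(max(c, 0.0), SHARED_MAX) for c in (r, g, b))
    maxc = max(r, g, b)
    if maxc == 0.0:
        return 0, 0, 0, 0
    # Preliminary shared exponent from the largest component ...
    exp = max(0, math.floor(math.log2(maxc)) + 1 + BIAS)
    # ... bumped up if the largest mantissa would round to 2^N.
    if int(maxc / 2.0**(exp - BIAS - N) + 0.5) == 2**N:
        exp += 1
    scale = 2.0**(exp - BIAS - N)
    return int(r/scale + 0.5), int(g/scale + 0.5), int(b/scale + 0.5), exp

def decode_rgb9e5(rm, gm, bm, exp):
    scale = 2.0**(exp - BIAS - N)
    return rm * scale, gm * scale, bm * scale
```

Because all three components share one exponent, dim components next to a bright one lose precision, an acceptable trade for HDR color data with no sign bit.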
59. 59
1- and 2-component
Block Compression Scheme
• Basic 1-component block compression format
• Borrowed from alpha compression scheme of S3TC DXT5
• Encoded block: 64 bits total; two 8-bit min/max endpoint values (16 bits) plus per-pixel palette indices
• Decoded 4×4 pixel block: 16 pixels × 8 bits/component = 128 bits, so effectively 2:1 compression
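Decoding such a block is simple enough to sketch. This Python model uses the DXT5-alpha palette rules (8 interpolated values when the first endpoint is larger, otherwise 6 values plus explicit 0 and 255); rounding details vary slightly between decoders, so take the interpolation arithmetic as illustrative:

```python
def decode_block(block):
    """Decode one 64-bit, 1-component block: two 8-bit endpoints, then
    sixteen 3-bit palette indices (LSB-first).  Returns 16 texel values
    for the 4x4 block -- 8 encoded bytes against 16 decoded bytes, the
    2:1 compression noted above."""
    a0, a1 = block[0], block[1]
    if a0 > a1:    # 8-entry interpolated palette
        palette = [a0, a1] + [((8 - i)*a0 + (i - 1)*a1) // 7 for i in range(2, 8)]
    else:          # 6 interpolated entries plus explicit extremes
        palette = [a0, a1] + [((6 - i)*a0 + (i - 1)*a1) // 5 for i in range(2, 6)] + [0, 255]
    bits = int.from_bytes(block[2:8], "little")
    return [palette[(bits >> (3 * i)) & 0b111] for i in range(16)]
```

The 2-component (red/green) variant simply stores two such blocks side by side, one per channel.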
60. 60
Framebuffer Operations
• Framebuffer objects
• Standardized framebuffer objects (FBOs) for rendering to textures
and renderbuffers
• Render-to-texture
• Multisample renderbuffers for FBOs
• Framebuffer operations
• Copies from one FBO to another, including multisample data
• Per-color attachment color clears, blending, and write masking
• Framebuffer formats
• Floating-point color buffers
• Floating-point depth buffers
• Rendering into framebuffer format with 3 small unsigned floating-point values packed in a 32-bit value
• Rendering into sRGB color space framebuffers
61. 61
Framebuffer Object Example
• Depth peeling for correctly ordered transparency
• Great render-to-texture application for FBOs
62. 62
Depth Peeling Behind the Scenes
• Depth buffer has closest fragment at all pixels
• Save depth buffer
• Render again, but use depth buffer as shadow map
• Discard fragments in front of shadow map’s depth value
• Effectively peels one layer of depth!
• Resulting color buffer is 2nd-closest fragment
• And depth buffer holds the 2nd-closest fragments’ depth
• Now repeat, peeling more layers
• Use ping-pong depth buffer scheme
• Use occlusion query to detect when no more fragments to peel
• Composite color layers front-to-back (or back-to-front)
• Front-to-back peeling can be done during the peeling process
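The peeling loop can be simulated at a single pixel. In this hedged Python sketch each fragment is a (depth, color, alpha) tuple; the `last_z` comparison stands in for the saved depth buffer used as a shadow map, and the empty-layer check stands in for the occlusion query:

```python
def peel_and_composite(fragments):
    """Peel translucent fragments at one pixel, nearest first, and
    composite front-to-back with the 'under' operator."""
    color, alpha = 0.0, 0.0
    last_z = -1.0                        # "in front of everything"
    while True:
        layer = [f for f in fragments if f[0] > last_z]  # discard peeled depths
        if not layer:                    # occlusion query: nothing left to peel
            break
        z, c, a = min(layer)             # depth test keeps the nearest survivor
        color += (1.0 - alpha) * a * c   # front-to-back compositing
        alpha += (1.0 - alpha) * a
        last_z = z
    return color, alpha
```

Each loop iteration corresponds to one full rendering pass over the scene in the GPU version, which is why the occlusion-query early-out matters.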
63. 63
Delicate Color Fidelity with sRGB
• Problem: PC display devices have a non-linear (sRGB) display response, so delicate color shading looks wrong
[Comparison, NVIDIA’s Adriana GeForce 8 launch demo: conventional rendering (uncorrected color) shows unnaturally deep facial shadows; gamma-correct (sRGB-rendered) output is softer and more natural]
64. 64
What is sRGB?
• A standard color space
• Intended for monitors, printers, and the Internet
• Created cooperatively by HP and Microsoft
• Non-linear, roughly gamma of 2.2
• Intuitively “encodes more dark values”
• OpenGL 2.1 already added sRGB texture formats
• Texture fetch converts sRGB to linear RGB, then filters
• Result takes more than 8-bit fixed-point to represent in shader
• 3.0 adds complementary sRGB framebuffer support
• “sRGB correct blending” converts framebuffer sRGB to linear,
blend with linear color from shader, then convert back to sRGB
• Works with framebuffer objects (FBOs)
[Figure: sRGB chromaticity diagram]
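The sRGB↔linear conversions the hardware applies at texture fetch and at framebuffer write follow the standard piecewise curves, sketched here in Python:

```python
def srgb_to_linear(c):
    """Applied at texture fetch: a linear toe below 0.04045, then a
    2.4-power segment (overall response close to gamma 2.2)."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(c):
    """Applied when writing to an sRGB framebuffer after linear blending."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055
```

Note that srgb_to_linear(0.5) is about 0.214: the encoding spends codes on dark values, which is why the filtered linear result needs more than 8-bit fixed point in the shader.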
65. 65
So why sRGB? Standard Windows Display
is Not Gamma Corrected
• 25+ years of PC graphics, icons, and images depend on not gamma correcting displays
• sRGB textures and color buffers compensate for this
[Comparison: at gamma 2.2, the Windows desktop & icons have their “expected” appearance but 3D lighting is too dark; with a linear (gamma 1.0) color response, the desktop appears washed out but 3D lighting is correct]
66. 66
Vertex Processing
• Vertex array configuration
• Objects to manage vertex array configuration client
state
• Half-precision floating-point vertex array formats
• Vertex output streaming
• Stream transformed vertex results into buffer object
data stores
• Occlusion culling
• Skip rendering based on occlusion query result
67. 67
Miscellaneous
• Pixel Processing
• Half-precision floating-point pixel external formats
• Buffer Management
• Non-blocking and fine-grain update of buffer object data
stores
68. 68
ARB Extensions to OpenGL 3.0
• OpenGL 3.0 standard provides new ARB extensions
• Extensions go beyond OpenGL 3.0
• Standardized at same time as OpenGL 3.0
• Support features in hardware today
• Specifically
• ARB_geometry_shader4—provides per-primitive programmable
processing
• ARB_draw_instanced—gives shader access to instance ID
• ARB_texture_buffer_object—allows buffer object to be sampled
as a huge 1D unfiltered texture
• Shipping today
• NVIDIA driver provides all three
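Before relying on any of these, an application should check the extension string. A common pitfall is a naive substring search; this small Python sketch (the helper name and the sample extension list are ours) shows the exact-token check:

```python
def has_extension(ext_string, name):
    """Exact-token search of a space-separated GL_EXTENSIONS-style
    string.  A plain substring test would wrongly report a prefix
    (e.g. 'GL_ARB_texture') as a supported extension."""
    return name in ext_string.split()
```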
69. 69
Transform Feedback for Terrain Generation
by Recursive Subdivision
• Geometry shaders + transform feedback
1. Render quads (use 4-vertex line adjacency
primitive) from vertex buffer object
2. Fetch height field
3. Stream subdivided positions and normals
to transform feedback “other” buffer
object
4. Use buffer object as vertex buffer
5. Repeat, ping-pong buffer objects
Computation and data all stays on the GPU!
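The ping-pong structure of the loop above can be modeled on the CPU. In this illustrative Python sketch a "quad" is just an axis-aligned cell (x0, y0, x1, y1) and a list plays the role of a transform feedback buffer object; the height-field fetch and normal computation are omitted:

```python
def subdivide_pass(quads):
    """One pass: each input quad streams out its four child quads."""
    out = []
    for x0, y0, x1, y1 in quads:
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        out += [(x0, y0, mx, my), (mx, y0, x1, my),
                (x0, my, mx, y1), (mx, my, x1, y1)]
    return out

def refine(quads, passes):
    """Ping-pong between two 'buffer objects': each pass's stream-out
    becomes the next pass's vertex input, so in the real OpenGL
    version the data never leaves the GPU."""
    for _ in range(passes):
        quads = subdivide_pass(quads)   # stream out, then swap roles
    return quads
```

Each pass quadruples the quad count, so a handful of passes yields fine terrain tessellation from a coarse seed mesh.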
70. 70
Skin Deformation
• Capture & re-use geometric deformations
Transform
feedback allows
the GPU to
calculate the
interactive,
deforming elastic
skin of the frog
71. 71
Silhouette Edge Rendering
• Uses geometry shader
[Figure: a silhouette edge detection geometry shader turns the complete mesh into its silhouette edges]
• Useful for non-photorealistic rendering
• Looks like human sketching
72. 72
More Geometry Shader Examples
• Shimmering point sprites
• Generate fins for lines
• Generate shells for fur rendering
73. 73
Improved Interpolation Techniques
•Using geometry shader functionality
• Quadratic normal interpolation
• True quadrilateral rendering with mean value coordinate interpolation
75. 75
OpenGL 2.x ARB Extensions
• Many OpenGL 3.0 features have corresponding ARB extensions that OpenGL 2.1 implementations can advertise
• Helps get 3.0 functionality out sooner, rather than later
• New ARB extensions for 3.0 functionality
• ARB_framebuffer_object—framebuffer objects (FBOs) for render-to-texture
• ARB_texture_rg—red and red/green texture formats
• ARB_map_buffer_range—non-blocking and fine-grain update of buffer object data stores
• ARB_instanced_arrays—instanced rendering via per-attribute divisors
• ARB_half_float_vertex—half-precision floating-point vertex array formats
• ARB_framebuffer_sRGB—rendering into sRGB color space framebuffers
• ARB_texture_compression_rgtc—compressed red and red/green texture formats
• ARB_depth_buffer_float—floating-point depth buffers
• ARB_vertex_array_object—objects to manage vertex array configuration client state
76. 76
Beyond OpenGL 3.0
OpenGL 3.0
• EXT_gpu_shader4
• NV_conditional_render
• ARB_color_buffer_float
• NV_depth_buffer_float
• ARB_texture_float
• EXT_packed_float
• EXT_texture_shared_exponent
• NV_half_float
• ARB_half_float_pixel
• EXT_framebuffer_object
• EXT_framebuffer_multisample
• EXT_framebuffer_blit
• EXT_texture_integer
• EXT_texture_array
• EXT_packed_depth_stencil
• EXT_draw_buffers2
• EXT_texture_compression_rgtc
• EXT_transform_feedback
• APPLE_vertex_array_object
• EXT_framebuffer_sRGB
• APPLE_flush_buffer_range (modified)
In GeForce 8, 9, & 2xx Series
but not yet core
• EXT_geometry_shader4 (now ARB)
• EXT_bindable_uniform
• NV_gpu_program4
• NV_parameter_buffer_object
• EXT_texture_compression_latc
• EXT_texture_buffer_object (now ARB)
• NV_framebuffer_multisample_coverage
• NV_transform_feedback2
• NV_explicit_multisample
• NV_multisample_coverage
• EXT_draw_instanced (now ARB)
• EXT_direct_state_access
• EXT_vertex_array_bgra
• EXT_texture_swizzle
Plenty of proven OpenGL extensions
for OpenGL Working Group
to draw upon for OpenGL 3.1
77. 77
OpenGL Version Evolution
• Now OpenGL is part of Khronos Group
• Previously OpenGL’s evolution was governed by the OpenGL
Architectural Review Board (ARB)
• Now officially a Khronos working group
• Khronos also standardizes OpenCL, OpenVG, etc.
• How OpenGL version updates happen
• OpenGL participants propose extensions
• Successful extensions are polished and incorporated into core
• OpenGL 3.0 is great example of this process
• Roughly 20 extensions folded into “core”
• Just 3 of those previously unimplemented
78. 78
OpenGL Extensions by Source
[Pie chart of extension sources, with legend: Multi-vendor (EXT), Silicon Graphics (SGI, SGIS, SGIX), Architectural Review Board (ARB), NVIDIA (NV), ATI, Apple (APPLE), Mesa3D (MESA), Sun Microsystems, OpenGL ES, OpenML, IBM, Intense3D, Hewlett Packard, 3Dfx, and others]
• 44% of extensions are “core” or multi-vendor
• Lots of vendors have initiated extensions
• Extending OpenGL is industry-wide collaboration
Source: http://www.opengl.org/registry (Dec 2008)
79. 79
What’s Driving OpenGL Modernization?
• Human desire for visual intuition and entertainment, particularly interactive video games
• Embarrassing parallelism of graphics, particularly the hardware-amenable, latency-tolerant nature of rasterization
• Increasing semiconductor density
81. 81
A personal retrospective
• My background:
• Silicon Graphics, 1982-2001
• OpenGL, 1990-2004
• Today’s topics:
• Computer architecture
• Culture and process
• For a more complete coverage see:
• https://graphics.stanford.edu/wikis/cs448-07-spring/
• Mark Kilgard’s excellent course notes
82. 82
Jim Clark and the Geometry Engine
The Geometry Engine: A VLSI Geometry System for Graphics
Computer Graphics, Volume 16, Number 3
(Proceedings of SIGGRAPH 1982) p127-133, 1982
83. 83
Jim’s helpers: the Stanford gang
[Figure: IRIS GL with the Geometry Engine as hardware front-end and a hardware back-end]
86. 86
What is computer architecture?
• Architecture: “the minimal set of
properties that determine what programs
will run and what results they will produce”
• Implementation: “the logical
organization of the [computer’s] dataflow
and controls”
• Realization: “the physical structure
embodying the implementation”
87. 87
Example: the analog clock
• Architecture
• Circular dial divided into twelfths
• Hour hand (short) and minute hand (long)
• Implementation
• A weight, driving a pendulum, or
• A spring, driving a balance wheel, or
• A battery, driving an oscillator, or ….
• Realization
• Gear ratios, pendulum lengths, battery sizes, ...
Example from Computer Architecture, Concepts and Evolution, Gerrit A. Blaauw and Frederick P. Brooks, Jr., Addison-Wesley, 1997
[Figure: analog clock face]
89. 89
The mainstream view
• Table of Contents:
• Fundamentals
• Instruction Sets
• Pipelining
• Advanced Pipelining and ILP
• Memory-Hierarchy Design
• Storage Systems
• Interconnection Networks
• Multiprocessors
90. 90
OpenGL is an architecture
Blaauw/Brooks vs. OpenGL:
• Different implementations. Blaauw/Brooks: IBM 360 30/40/50/65/75, Amdahl. OpenGL: SGI Indy/Indigo/InfiniteReality; NVIDIA GeForce, ATI Radeon, …
• Compatibility. Blaauw/Brooks: code runs equivalently on all implementations; a top-level goal. OpenGL: conformance tests, …
• Intentional design. Blaauw/Brooks: it’s an architecture, whether it was planned or not. OpenGL: carefully planned, though mistakes were made.
• Configuration. Blaauw/Brooks: can vary amount of resource (e.g., memory). OpenGL: no feature sub-setting; configuration attributes (e.g., framebuffer).
• Speed. Blaauw/Brooks: not a formal aspect of architecture. OpenGL: no performance queries.
• Validity of inputs. Blaauw/Brooks: no undefined operation. OpenGL: all errors specified; no side effects; little undefined operation.
• Enforcement. Blaauw/Brooks: when implementation errors are found, they are fixed. OpenGL: specification rules!
91. 91
But OpenGL is an API
(Application Programming Interface)
• Yes, Blaauw and Brooks talk about (computer) architecture
as though it is always expressed as ISA (Instruction-Set
Architecture)
• But …
• API is just a higher-level programming interface
• “Instruction-Set” Architecture implies other types of
computer architectures (such as “API” Architecture)
• OpenGL has evolved to include ISA-like interfaces
(e.g., the interface below GLSL)
92. 92
We didn’t know …
• No mention in spec (even 3.0)
• “We view OpenGL as a state …”
• First use in “ARB”
• Architecture Review Board
• Coined by Bill Glazier from “Palo
Alto Architecture Review Board”
• First formal usage (I know of)
• Mark J. Kilgard, Realizing OpenGL: two implementations of one
architecture, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS
workshop on Graphics hardware, p.45-55, August 03-04, 1997,
Los Angeles, California, United States.
94. 94
What is implied by “programmable”?
• What does it mean to teach programming?
• Does running a microwave oven count?
• Does defining the geometry of a game “level” count?
• Does specifying OpenGL modes count?
• This seems to be a somewhat open question
• Butler Lampson couldn’t tell me.
• Microsoft developers of teaching tools couldn’t tell me.
• An online search wasn’t very helpful.
• Do we just “know it when we see it”?
• Justice Potter Stewart’s definition of pornography
95. 95
My try at some formalization
• Key ideas:
• Composition: choice of placement, sequence
• Non-obvious: semantics are interesting and novel
• Imperative: maybe there are other kinds of programming
“Composition, the organization of elemental
operations into a non-obvious whole, is the
essence of imperative programming.”
-- Kurt Akeley (Foreword to GPU Gems 3)
96. 96
OpenGL has always been programmable
• Follows directly from being an “architecture”
• OpenGL commands are instructions (API as an ISA)
• They can be “composed” to create programs
• Multi-pass rendering is the prototypical example
• But Peercy et al. implemented a RenderMan shader compiler
• Invariance was specified from the start (e.g., same fragments)
• We set out to enable “usage that we didn’t anticipate”
• Obvious for a traditional ISA (e.g., IA32)
• Not so obvious for a graphics API
• Example: texture applies to all primitives, not just triangles
103. 103
Suppose …
http://www.opengl.org/registry/
Name
ARB_texture_cube_map
Name Strings
GL_ARB_texture_cube_map
Notice
Copyright OpenGL Architectural Review Board, 1999.
Contact
Michael Gold, NVIDIA (gold 'at' nvidia.com)
Status
Complete. Approved by ARB on 12/8/1999
Version
Last Modified Date: December 14, 1999
Number
ARB Extension #7
Dependencies
None.
Written based on the wording of the OpenGL 1.2.1 specification but not dependent on it.
Overview
This extension provides a new texture generation scheme for cube map textures. Instead of the
current texture providing a 1D, 2D, or 3D lookup into a 1D, 2D, or 3D texture image, the texture is a
set of six 2D images representing the faces of a cube. The (s,t,r) texture coordinates …
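The face-selection rule this scheme implies picks the face whose axis has the largest magnitude in the (s,t,r) direction. A Python sketch (the tie-breaking order is our choice; the extension itself also defines the per-face s/t mapping, omitted here):

```python
def cube_face(s, t, r):
    """Pick which of the six cube-map faces a direction vector hits:
    the axis with the largest magnitude, signed by its coordinate."""
    ax, ay, az = abs(s), abs(t), abs(r)
    if ax >= ay and ax >= az:
        return "POSITIVE_X" if s >= 0 else "NEGATIVE_X"
    if ay >= az:
        return "POSITIVE_Y" if t >= 0 else "NEGATIVE_Y"
    return "POSITIVE_Z" if r >= 0 else "NEGATIVE_Z"
```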
104. 104
Complete specification
Name
Name Strings
Notice
Contact
Status
Version
Number
Dependencies
Overview
Issues
New Procedures and Functions
New Tokens
Additions to Chapter 2 of the OpenGL Specification
Additions to Chapter 3 of the OpenGL Specification
Additions to Chapter 4 of the OpenGL Specification
Additions to Chapter 5 of the OpenGL Specification
Additions to Chapter 6 of the OpenGL Specification
Additions to the GLX Specification
Errors
New State (type, query mechanism, initial value, attribute set, specification section)
Usage Examples
105. 105
19 issues
The spec just linearly interpolates the reflection vectors computed
per-vertex across polygons. Is there a problem interpolating
reflection vectors in this way?
Probably. The better approach would be to interpolate the eye
vector and normal vector over the polygon and perform the reflection
vector computation on a per-fragment basis. Not doing so is likely
to lead to artifacts because angular changes in the normal vector
result in twice as large a change in the reflection vector as normal
vector changes. The effect is likely to be reflections that become
glancing reflections too fast over the surface of the polygon.
Note that this is an issue for REFLECTION_MAP_ARB, but not
NORMAL_MAP_ARB.
106. 106
19 issues …
What happens if an (s,t,r) is passed to cube map generation that
is close to (0,0,0), i.e., a degenerate direction vector?
RESOLUTION: Leave undefined what happens in this case (but
may not lead to GL interruption or termination).
Note that a vector close to (0,0,0) may be generated as a
result of the per-fragment interpolation of (s,t,r) between
vertices.
107. 107
Trust and integrity
• Lots of collaboration during the initial design
• But final decisions made by a small group
• SGI played fair
• OpenGL 1.0 didn’t favor SGI equipment (our ports were late)
• SGI obeyed all conformance rules
• SGI didn’t adjust the spec to match our equipment
• The ARB avoided marketing tasks such as benchmarks
• We stuck with technical design issues
• We documented rigorously
• Specification, man pages, …
109. 109
Extension facts
• 442 Vendor and “EXT” extension specifications
• Vendor: specific to a single vendor
• EXT: shared by two or more vendors
• 56 “ARB” extensions
• Standardized, likely to be in the next spec revision
• Lots of text …
Source: OpenGL extension registry, December 2008
110. 110
“Specification” sizes
Lines Words Chars
56 ARB Extensions 48,674 263,908 2,221,347
All 442 Extensions 209,426 1,076,008 9,079,063
King James Bible 114,535 823,647 5,214,085
New Testament 27,319 188,430 1,197,812
Old Testament 86,783 632,515 3,998,303
111. 111
Beyond the specification
• The ARB (now replaced with Khronos)
• Rules of order, secretary, IP, …
• The extension process
• Categories, token syntax, spec templates, enums,
registry, …
• Licensing
• Conformance
• …
112. 112
Summary
• Many mistakes made (see other presentations for lists)
• Created a sustainable culture that values quality and
rigorous documentation
• Defined and evolved the architecture for interactive 3-D
computer graphics
114. 114
Motivation
• Complex APIs and systems have pitfalls
• After 17 years of designed evolution, OpenGL
certainly has its share
• Normal documentation focus:
• What can you do?
• Rather than: What should you do?
115. 115
Communicating Vertex Data
• The way you learn OpenGL:
• Immediate mode
• glBegin, glColor3f, glVertex3f, glEnd
• Straightforward—no ambiguity about what vertex data is
• All vertex components are function parameters
• The problem—too function call intensive
• And all vertex data must flow through CPU
116. 116
Example Scenario
• An OpenGL application has to render a set of rectangles
• Rectangle with its parameters
• x, y, height, width, left color, right color, depth
(figure: a rectangle anchored at (x,y), with width and height, a
left side color, a right side color, and a depth order between
0.0 and 1.0)
117. 117
Scene Representation
• Each rectangle specified by following RectInfo structure:
• Array of RectInfo structures describes “scene”
• Simplistic scene for sake of teaching
typedef struct {
    GLfloat x, y, width, height;
    GLfloat depth_order;
    GLfloat left_side_color[3];   // red, green, then blue
    GLfloat right_side_color[3];  // red, green, then blue
} RectInfo;
120. 120
Critique of Immediate Mode
• Advantages
• Straightforward to code and debug
• Easy-to-understand conceptual model
• Building stream of vertices with OpenGL commands
• Avoids driver & application copies of vertex data
• Flexible, allowing totally dynamic vertex generation
• Disadvantages
• Rendering continuously streams attributes through CPU
• Pollutes CPU cache with vertex data
• Function call intensive
• Unable to saturate fast graphics hardware
• CPUs just too slow
• Contrast with vertex array approach…
121. 121
Vertex Array Approach
• Step 1: Copy vertex attributes into vertex arrays
• From: RectInfo array (CPU memory)
• To: interleaved arrays of vertex attributes (CPU
memory)
• Step 2: To render
• Configure OpenGL vertex array client state
• Use glEnableClientState, glVertexPointer,
glColorPointer
• Render quads based on indices into vertex arrays
• Use glDrawArrays
122. 122
Vertex Array Format
• Interleave vertex attributes in color & position arrays
color
position
float = 4 bytes
vertex 0
vertex 1
red
green
blue
x
y
z
red
green
blue
x
y
z
color
position
24 bytes
per vertex
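The later slides call an initVarrayRectangles routine that builds this interleaved array but never show it. A plausible sketch is below; the corner ordering and the assignment of left/right colors to corners are my assumptions, as is the float* return type (the slides treat the result as a raw pointer).

```c
#include <stdlib.h>

typedef struct {
    float x, y, width, height;
    float depth_order;
    float left_side_color[3];   /* red, green, then blue */
    float right_side_color[3];  /* red, green, then blue */
} RectInfo;

/* Pack each rectangle as 4 vertices of 6 interleaved floats
   (RGB color first, then XYZ position), matching the 24-byte
   stride and offsets used by glColorPointer/glVertexPointer. */
float *initVarrayRectangles(int count, const RectInfo *list)
{
    float *varray = malloc(sizeof(float) * 6 * 4 * count);
    float *p = varray;
    for (int i = 0; i < count; i++) {
        const RectInfo *r = &list[i];
        /* assumed corner order: lower-left, lower-right,
           upper-right, upper-left */
        const float xs[4] = { r->x, r->x + r->width,
                              r->x + r->width, r->x };
        const float ys[4] = { r->y, r->y,
                              r->y + r->height, r->y + r->height };
        for (int v = 0; v < 4; v++) {
            const float *c = (v == 1 || v == 2) ? r->right_side_color
                                                : r->left_side_color;
            *p++ = c[0]; *p++ = c[1]; *p++ = c[2];  /* RGB */
            *p++ = xs[v]; *p++ = ys[v];             /* XY  */
            *p++ = r->depth_order;                  /* Z   */
        }
    }
    return varray;
}
```

glColorPointer and glVertexPointer can then walk this array with a 24-byte stride, exactly as the rendering routines on the following slides assume.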
125. 125
Critique of
Simplistic Vertex Array Rendering
• Advantages
• Far fewer OpenGL commands issued
• Disadvantages
• Every render with drawVarrayRectangles calls
initVarrayRectangles
• Allocates, initializes, & frees vertex array memory
every render
• Improve by separating vertex array construction from
rendering
126. 126
Initialize Once, Render Many Approach
• This routine expects base pointer returned by
initVarrayRectangles
void drawInitializedVarrayRectangles(int count, const void *varray)
{
const GLfloat *p = (const GLfloat*) varray;
const GLsizei stride = sizeof(GLfloat)*6; // 3 RGB floats, 3 XYZ floats
glColorPointer(/*rgb*/3, GL_FLOAT, stride, p+0);
glVertexPointer(/*xyz*/3, GL_FLOAT, stride, p+3);
// Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!
glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
}
127. 127
Client Memory Vertex Attribute Transfer
(diagram: the CPU reads the vertex array from application (client)
memory and writes commands plus vertex data into the command queue;
the GPU's command processor DMA-transfers the commands plus vertex
data to the vertex puller and on to the hardware rendering pipeline;
all vertex data travels through the CPU)
128. 128
Vertex Buffer Object Vertex Attribute Pulling
(diagram: the vertex array now lives in an OpenGL (vertex) buffer
object; the CPU writes only commands plus vertex indices into the
command queue; the GPU DMA-transfers the command data and pulls the
vertex data directly from the buffer object; the CPU never reads the
vertex data)
129. 129
Initializing Vertex Buffer Objects (VBOs)
• Once using vertex arrays, easy to switch to VBOs
• Make the vertex array as before
• Then bind to buffer object and copy data to the buffer
void initVarrayRectanglesInVBO(GLuint bufferName,
int count, const RectInfo *list)
{
char *varray = initVarrayRectangles(count, list);
const GLsizei stride = sizeof(GLfloat)*6; // 3 RGB floats, 3 XYZ floats
const GLint numVertices = 4*count;
const GLsizeiptr bufferSize = stride*numVertices;
glBindBuffer(GL_ARRAY_BUFFER, bufferName);
glBufferData(GL_ARRAY_BUFFER, bufferSize, varray, GL_STATIC_DRAW);
free(varray);
}
130. 130
Rendering from Vertex Buffer Objects
• Once initialized, glBindBuffer to bind to buffer ahead of
vertex array configuration
• Send offsets instead of pointers
void drawVarrayRectanglesFromVBO(GLuint bufferName,
int count)
{
const char *base = NULL;
const GLsizei stride = sizeof(GLfloat)*6; // 3 RGB floats, 3 XYZ floats
glBindBuffer(GL_ARRAY_BUFFER, bufferName);
glColorPointer(/*rgb*/3, GL_FLOAT, stride, base+0*sizeof(GLfloat));
glVertexPointer(/*xyz*/3, GL_FLOAT, stride, base+3*sizeof(GLfloat));
// Assume GL_COLOR_ARRAY and GL_VERTEX_ARRAY are already enabled!
glDrawArrays(GL_QUADS, /*firstIndex*/0, /*indexCount*/count*4);
}
131. 131
Understanding glBindBuffer
• Buffer object bindings are frequent point of confusion for
programmers
• What does glBindBuffer do really?
• Lots of buffer binding targets:
• GL_ARRAY_BUFFER target—for vertex attribute arrays
• Query with GL_ARRAY_BUFFER_BINDING
• GL_ELEMENT_ARRAY_BUFFER target—for vertex indices,
effectively topology
• Query with GL_ELEMENT_ARRAY_BUFFER_BINDING
• Each vertex array has its own buffer, query with
• GL_VERTEX_ARRAY_BUFFER_BINDING
• GL_COLOR_ARRAY_BUFFER_BINDING
• GL_TEXCOORD_ARRAY_BUFFER_BINDING, etc.
133. 133
Latched Vertex Array Buffer Bindings
• Here’s the confusing part:
glBindBuffer(GL_ARRAY_BUFFER, 34);
glColorPointer(3, GL_FLOAT, color_stride,
(void*)color_offset);
• The glBindBuffer doesn’t change any vertex array
binding
• The GL_ARRAY_BUFFER_BINDING state that
glBindBuffer sets does not itself affect rendering
• It is the glColorPointer call that latches the array buffer
binding to change the color array’s buffer binding!
• Same with all vertex array buffer bindings
134. 134
Binding Buffer Zero is Special
• By default, vertex arrays don’t access buffer objects
• Instead client memory is accessed
• This is because
• The initial buffer binding for a context is zero
• And zero is special
• Zero means access client memory
• You can always resume client memory vertex array access for a given array like this
glBindBuffer(GL_ARRAY_BUFFER, 0); // use client memory
glColorPointer(3, GL_FLOAT, color_stride, color_pointer);
• Different treatment of the “pointer” parameter to vertex array specification commands
• When the current array buffer binding is zero, the pointer value is a client
memory pointer
• When the current array buffer binding is non-zero (meaning it names a buffer
object), the pointer value is “recast” as an offset from the beginning of the buffer
• Once again
• The glBindBuffer(GL_ARRAY_BUFFER,0) call alone doesn’t change any vertex
array buffer bindings
• It takes a vertex array specification command such as glColorPointer to latch the
zero
ensures compatibility
with pre-VBO OpenGL
135. 135
Texture Coordinate Set Selector
• A selector in OpenGL is
• A state variable that controls what state a subsequent command
updates
• Examples of commands that modify selectors
• glMatrixMode, glActiveTexture, glClientActiveTexture
• A selector is different from latched state
• Latched state is a specified value that is set (or “latched”) when
a subsequent command is called
• Pitfall warning: glTexCoordPointer both
• Relies on the glClientActiveTexture command’s selector
• And latches the current array buffer binding for the selected
texture coordinate vertex array
• Example
glBindBuffer(GL_ARRAY_BUFFER, 34);
glClientActiveTexture(GL_TEXTURE3);
glTexCoordPointer(2, GL_FLOAT, uv_stride, (void*)buffer_offset);
(in this example, glBindBuffer sets the buffer value that
glTexCoordPointer latches, and glClientActiveTexture sets the
selector that glTexCoordPointer uses)
136. 136
OpenGL’s Modern Buffer-centric
Processing Model
(diagram: how buffer objects feed OpenGL’s pipeline)
• Vertex data: Vertex Array Buffer Object (VaBO) and Array Element
Buffer Object (VeBO) feed the Vertex Puller (glBegin,
glDrawElements, etc.), then Vertex Shading, Geometry Shading,
and Fragment Shading, ending at the Framebuffer
• Transform Feedback Buffer (XBO) captures vertex data streamed
back out of the pipeline
• Parameter data: Parameter Buffer (PaBO) and Bindable Uniform
Buffer (BUB) feed the shading stages (not ARB functionality yet)
• Texel data: Texture Buffer Object (TexBO) feeds Texturing
• Pixel data: Pixel Unpack Buffer (PuBO) feeds the Pixel Pipeline
(glDrawPixels, glTexImage2D, etc.); Pixel Pack Buffer (PpBO)
receives output (glReadPixels, etc.)
137. 137
Usages of OpenGL Buffer Objects
• Vertex uses (VBOs)
• Input to GL: Vertex attribute buffer objects
• Color, position, texture coordinate sets, etc.
• Input to GL: Vertex element buffer objects
• Indices
• Output from GL: Transform feedback
• Streaming vertex attributes out
• Texture uses (TexBOs)
• Texturing from: Texture buffer objects
• Pixel uses (PBOs)
• Output from GL: Pixel pack buffer objects
• glReadPixels
• Input from GL: Pixel unpack buffer objects
• glDrawPixels, glBitmap, glTexImage2D, etc.
• Shader uses (PaBOs, UBOs)
• Input to assembly program: Parameter buffer objects
• Input to GLSL program: Bind-able uniform buffer objects
Key point: OpenGL
buffers are containers for
bytes; a buffer is not tied
to any particular usage
141. 141
Topics in OpenGL Implementation
• Dual-core OpenGL driver operation
• What goes into a texture fetch?
• You give me some texture coordinates
• I give you back a color
• Could it be any simpler?
142. 142
OpenGL Drivers for Multi-core CPUs
• Today dual-core processors in PCs are nearly ubiquitous
• 4, 6, 8, and more cores are clearly coming
• How does OpenGL implementation exploit this trend?
• Answer: develop dual-core OpenGL driver
144. 144
Dual-core Performance Results
• A well-behaved OpenGL application benefiting from a
dual-core mode of OpenGL driver operations
(chart: frames per second, on a 0 to 250 scale, for three modes of
OpenGL driver operation: single core, dual core, and null driver)
145. 145
Good Dual-core Driver Practices
• General advice
• Display lists execute on the driver’s worker thread!
• You want to avoid situations where the application thread must
“sync” with the driver thread
• Specific advice
• Avoid OpenGL state queries
• More on this later
• Avoid querying OpenGL errors in production code
• Bad behavior is detected automatically and leads to exit from the
dual-core mode
• Back to the standard single-core driver mode of operation
• “Do no harm”
146. 146
Consider an OpenGL texture fetch
• Seems very simple
• Input: texture coordinates (s,t,r,q)
• Output: some color (r,g,b,a)
• Just a simple function, written in Cg/HLSL:
uniform sampler2D decal : TEXUNIT2;
float4 texcoord : TEXCOORD3;
float4 rgba = tex2D(decal, texcoord.st);
• Compiles to a single instruction:
TEX o[COLR], f[TEX3], TEX2, 2D;
• Implementation is much more involved!
147. 147
Anatomy of a Texture Fetch
(diagram: a texture coordinate vector enters Texel Selection, which
uses the texture parameters to produce texel offsets into the
texture images; the resulting texel data, together with combination
parameters, enters Texel Combination, which outputs the filtered
texel vector)
156. 156
Interpolation
• First we need to interpolate (s,t,r,q)
• This is the f[TEX3] part of the TXP instruction
• Projective texturing means we want (s/q, t/q)
• And possibly r/q if shadow mapping
• In order to correct for perspective, hardware actually interpolates
• (s/w, t/w, r/w, q/w)
• If not projective texturing, could linearly interpolate inverse w (or 1/w)
• Then compute its reciprocal to get w
• Since 1/(1/w) equals w
• Then multiply (s/w,t/w,r/w,q/w) times w
• To get (s,t,r,q)
• If projective texturing, we can instead
• Compute reciprocal of q/w to get w/q
• Then multiply (s/w,t/w,r/w) by w/q to get (s/q, t/q, r/q)
Observe projective
texturing is same
cost as perspective
correction
157. 157
Interpolation Operations
• Ax + By + C per scalar linear interpolation
• 2 MADs
• One reciprocal to invert q/w for projective texturing
• Or one reciprocal to invert 1/w for perspective
texturing
• Then 1 MUL per component for s/w * w/q
• Or s/w * w
• For (s,t) means
• 4 MADs, 2 MULs, & 1 RCP
• (s,t,r) requires 6 MADs, 3 MULs, & 1 RCP
• All floating-point operations
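The per-fragment arithmetic above can be sketched in plain C (function and variable names are mine, not OpenGL's): the attributes arrive linearly interpolated as (s/w, t/w, q/w), and one reciprocal plus one multiply per component recovers (s/q, t/q).

```c
typedef struct { float s, t; } TexCoord2;

/* Projective texture coordinate recovery: given linearly
   interpolated s/w, t/w, and q/w, compute (s/q, t/q). */
TexCoord2 projectTexCoord(float s_over_w, float t_over_w, float q_over_w)
{
    float w_over_q = 1.0f / q_over_w;   /* RCP: invert q/w to get w/q */
    TexCoord2 st;
    st.s = s_over_w * w_over_q;         /* MUL: (s/w) * (w/q) = s/q */
    st.t = t_over_w * w_over_q;         /* MUL: (t/w) * (w/q) = t/q */
    return st;
}
```

For non-projective texturing the same code applies with q = 1, so q/w is just 1/w and the reciprocal recovers w itself, which is the usual perspective correction.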
158. 158
Texture Space Mapping
• Have interpolated & projected coordinates
• Now need to determine what texels to fetch
• Multiply (s,t) by (width,height) of texture base level
• Could convert (s,t) to fixed-point first
• Or do math in floating-point
• Say the base texture is 256x256
• So compute (s*256, t*256) = (u,v)
159. 159
Mipmap Level-of-detail Selection
• Tri-linear mip-mapping means compute appropriate
mipmap level
• Hardware rasterizes in 2x2 pixel entities
• Typically called quad-pixels or just quads
• Finite difference with neighbors to get change in u
and v with respect to window space
• Approximation to ∂u/∂x, ∂u/∂y, ∂v/∂x, ∂v/∂y
• Means 4 subtractions per quad (1 per pixel)
• Now compute approximation to gradient length
• p = max( sqrt((∂u/∂x)² + (∂u/∂y)²), sqrt((∂v/∂x)² + (∂v/∂y)²) )
• Differences taken at one-pixel separation within the quad
160. 160
Level-of-detail Bias and Clamping
• Convert p length to power-of-two level-of-detail and
apply LOD bias
• λ = log2(p) + lodBias
• Now clamp λ to valid LOD range
• λ’ = max(minLOD, min(maxLOD, λ))
161. 161
Determine Mipmap Levels and
Level Filtering Weight
• Determine lower and upper mipmap levels
• b = floor(λ’) is bottom mipmap level
• t = floor(λ’+1) is top mipmap level
• Determine filter weight between levels
• w = frac(λ’) is filter weight
162. 162
Determine Texture Sample Point
• Get (u,v) for selected top and bottom mipmap levels
• Consider a level l which could be either level t or b
• With (u,v) locations (ul,vl)
• Perform GL_CLAMP_TO_EDGE wrap modes
• uw = max( 1/(2*widthOfLevel(l)),
min( 1 - 1/(2*widthOfLevel(l)), u ) )
• vw = max( 1/(2*heightOfLevel(l)),
min( 1 - 1/(2*heightOfLevel(l)), v ) )
• Get integer location (i,j) within each level
• (i,j) = ( floor(uw * widthOfLevel(l)),
floor(vw * heightOfLevel(l)) )
(figure: border and edge texel positions along the s and t axes)
164. 164
Determine Texel Addresses
• Assuming a texture level image’s base pointer, compute a texel
address of each texel to fetch
• Assume bytesPerTexel = 4 bytes for RGBA8 texture
• Example
• addr00 = baseOfLevel(l) +
bytesPerTexel*(i0+j0*widthOfLevel(l))
• addr01 = baseOfLevel(l) +
bytesPerTexel*(i0+j1*widthOfLevel(l))
• addr10 = baseOfLevel(l) +
bytesPerTexel*(i1+j0*widthOfLevel(l))
• addr11 = baseOfLevel(l) +
bytesPerTexel*(i1+j1*widthOfLevel(l))
• More complicated address schemes are needed for good texture
locality!
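The slide's linear addressing can be written directly (names are mine; as the slide notes, real implementations use tiled or swizzled layouts for better cache locality):

```c
#include <stddef.h>

/* Linear texel address: base of the level plus bytesPerTexel times
   the row-major texel index (i + j * width). */
size_t texelAddr(size_t baseOfLevel, int bytesPerTexel,
                 int i, int j, int widthOfLevel)
{
    return baseOfLevel +
           (size_t)bytesPerTexel * ((size_t)i + (size_t)j * (size_t)widthOfLevel);
}
```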
165. 165
Initiate Texture Reads
• Initiate texture memory reads at the 8 texel addresses
• addr00, addr01, addr10, addr11 for the upper level
• addr00, addr01, addr10, addr11 for the lower level
• Queue the weights a, b, and w
• Latency FIFO in hardware makes these weights
available when texture reads complete
166. 166
Phased Data Flow
• Must hide long memory read latency between Selection
and Combination phases
(diagram: the same Texel Selection/Combination structure as the
earlier anatomy slide, with memory reads for samples between the two
phases and FIFOing of the combination parameters to cover the read
latency)
167. 167
Texel Combination
• When texel reads are returned, begin filtering
• Assume results are
• Top texels: t00, t01, t10, t11
• Bottom texels: b00, b01, b10, b11
• Per-component filtering math is tri-linear filter
• RGBA8 is four components
• result = (1-a)*(1-b)*(1-w)*b00 +
(1-a)*b*(1-w)*b01 +
a*(1-b)*(1-w)*b10 +
a*b*(1-w)*b11 +
(1-a)*(1-b)*w*t00 +
(1-a)*b*w*t01 +
a*(1-b)*w*t10 +
a*b*w*t11;
• 24 MADs per component, or 96 for RGBA
• Lerp-tree could do 14 MADs per component, or 56 for RGBA
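The eight-term tri-linear sum can be sketched per component (a and b are the intra-level bilinear weights, w the level weight; the function name is mine):

```c
/* Tri-linear filter of one component from the 2x2 footprints of the
   bottom (b00..b11) and top (t00..t11) mipmap levels. */
float trilinear(float a, float b, float w,
                float b00, float b01, float b10, float b11,
                float t00, float t01, float t10, float t11)
{
    return (1-a)*(1-b)*(1-w)*b00 + (1-a)*b*(1-w)*b01 +
           a*(1-b)*(1-w)*b10    + a*b*(1-w)*b11 +
           (1-a)*(1-b)*w*t00    + (1-a)*b*w*t01 +
           a*(1-b)*w*t10        + a*b*w*t11;
}
```

A lerp-tree evaluation (two bilinear lerps followed by one level lerp) computes the same result with fewer multiply-adds, which is the efficiency the slide refers to.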
169. 169
Observations about the Texture Fetch
• Lots of ways to implement the math
• Lots of clever ways to be efficient
• Lots more texture operations not considered in this analysis
• Compression
• Anisotropic filtering
• sRGB
• Shadow mapping
• Arguably TEX instructions are “world’s most CISC instructions”
• Texture fetches are incredibly complex instructions
• Good deal of GPU’s superiority at graphics operations over CPUs is
attributable to TEX instruction efficiency
• Good for compute too
171. 171
What drives OpenGL’s future?
• GPU graphics functionality
• Tessellation & geometry amplification
• Ratio of GPU to single-core CPU performance
• Compatibility
• Direct3Disms
• OpenGLisms
• Deprecation
• Compute support
• OpenCL, CUDA, Stream processing
• Unconventional graphics devices
172. 172
Better Graphics Functionality
• Expect more graphics performance
• Easy prediction
• Rasterization nowhere near peaked
• Ray tracing fans—GPUs make rays and triangles
faster
– Market still values triangles more than rays
• Expect more generalized graphics functionality
• Trend for texture enhancements likely to continue
173. 173
Geometry Amplification
• Tessellation
• Programmable hardware support coming
• True market demand probably not tessellation per se
• Games want visual richness
• Texture and shading have created much richness
– Often “pixel richness” as substitute for geometry richness
• Increasingly “visual richness” means geometric complexity
• Geometry Amplification may be better term
• Tessellation is one way to amplify geometry
– Recognize the limits of bi-variate patches for
representing geometry
177. 177
Limits of Patch Tessellation
• What games tend to want
• Here’s 8 vertices (bounding
box), go draw a fire truck
• Here’s a few vertices, go draw
a tree
178. 178
Tessellation Not New to OpenGL
• At least three different bi-variate patch tessellation schemes have
been added to OpenGL
• Evaluators (OpenGL 1.0)
• NV_evaluators (GeForce 3)
• water-tight
• adaptive level-of-detail
• forward differencing approach
• ATI_pn_triangles Curved PN Triangles (Radeon)
• tessellated triangle based on positions+normals
• None succeeded
• Hard to integrate into art pipelines
• Didn’t offer enough performance advantage
GLUT’s wire-frame
teapot
[Moreton 2001]
[Vlachos 2001]
179. 179
Ratio of CPU core-to-GPU Performance
• Well known computer architecture trends now
• Single-threaded CPU performance trends are stalled
• Multi-core is CPU designer response
• GPU performance continues on-trend
• What does this mean for graphics API design?
• CPUs must generate more visually rich API command
streams to saturate GPUs
• Can’t just send more commands faster
• Single-threaded CPUs can only do so much
• So must send more powerful commands
180. 180
Déjà vu
• We’ve been here before
• Early 1980s: Graphics terminals used to be
connected to minicomputers by slow speed
interconnects
• CPUs themselves far too slow for real-time
rendering
• Resulting rendering model
• Download scene database to graphics terminal
• Adjust viewing and modeling parameters
• Send “redraw scene” command
181. 181
What Happened
• Such “scene processor” hardware not very flexible
• Difficult to animate anything beyond rigid dynamics
• Eventually SGI and others matched CPUs and interconnects to
graphics performance
• Result was IRIS GL’s immediate mode
• CPU fast enough to send geometry every frame
• OpenGL took this model
• Over time added vertex arrays, vertex buffers, texturing,
programmable shading, and more performance
• CPU performance still became the limiter
• Better graphics driver tuning helped
• Dual-core drivers help some more
182. 182
OpenGL’s Most Powerful Command
• Available since OpenGL 1.0
• Can render essentially anything OpenGL can render!
• Takes just one parameter
• The command
glCallList(GLuint displayListName);
• Power of display lists comes from
• Playing back arbitrary compiled commands
• Allowing for hierarchical calling of display list
• A display list can contain glCallList or glCallLists
• Ability of application to re-define display lists
• No editing, but can be re-defined
183. 183
Enhanced Display Lists
• OpenGL 1.0 display lists are too inflexible
• Pixel & vertex data “compiled into” display lists
• Binding objects always “by name”
• Rather than “by reference”
• These problems can be fixed
• Modern OpenGL supports buffers for transferring vertices and
pixels
• Compile commands into display lists that defer vertex and
pixel transfers until execute-time
– Rather than compile-time
• Allow objects (textures, buffers, programs) to be bound “by
reference” or “by name”
184. 184
Other Display List Enhancements
• Conditional display list execution
• Relaxed vertex index and command order
• Parallel construction of display lists by multiple threads
General insight: Easier for driver to optimize application’s
graphics command stream if it gets to
1) see the repetition in the command stream clearly
2) take time to analyze and optimize usage
185. 185
Conditional Display List Execution
• Today’s occlusion query
• Application must “query” to learn occlusion result
• Latency too great to respond
• Application can use OpenGL 3.0’s conditional render
capability
• But just skips vertex pulling, not state changes
• Conditional display list execution
• Allow a glCallList to depend on the occlusion result
from an occlusion query object
• Allows in-band occlusion querying
• Skip both vertex pulling and state changes
186. 186
Relaxed Vertex Index and Command Order
• OpenGL today always executes commands “in order”
• Strictly sequential requirement
• Provide compile-time specification of re-ordering allowances
• Allows GL implementation to re-order
• Vertex indices within display list’s vertex batch
• Commands within display list
• Key rule: the state vector a rendering command executes in must
match the state as if the command executed sequentially
• Allow static or dynamic re-ordering
• Static re-ordering needed for multi-pass invariances
• Past practice
• IRIS Performer would sort rendering by state changes for
performance
• [Sander 2007] shows substantial benefit for vertex ordering
187. 187
Parallel Display List Construction
• Today’s model
• Single thread makes all OpenGL rendering calls
• Minimizes GPU context switch overhead
• Ties command generation rate to single core’s
CPU performance
• Enhanced display list model
• Multiple threads can build display lists in parallel
• Single thread still executes display lists
• Countable semaphore objects used to synchronize
hand-off of display lists built by other threads with
main rendering thread
188. 188
Rethinking Display Lists
• Display lists have been proposed for deprecation
• Right as we really need them!
• Much more interesting to enhance display lists
• Dual-core driver already off-loads display list traversal
to driver’s thread
• Multi-core driver could scan frequently executed
display lists to optimize their order and error
processing
• Includes adding pre-fetching to avoid stalling CPU
on cache misses for object accesses
189. 189
Direct3Disms
• Developing a shader-rich game title costs $$$
• For top titles, often US$ 5,000,000+
• Investment typically amortized over multiple platforms
• Consoles are primary target, then PCs
• PC version typically developed for Direct3D
• Reality: OpenGL is often 3rd or worse priority
• API differences = porting & performance pitfalls
• Stops or slows Direct3D-developed 3D content from
working easily on OpenGL platforms
190. 190
Supporting Direct3D: Not New
• OpenGL has always supported multiple formats well
• OpenGL’s plethora of pixel and vertex formats
• Very first OpenGL extension: EXT_bgra
• Provides a pixel component ordering to match the
color component ordering of Windows for 2D GDI
rendering
• Made core functionality by OpenGL 1.3
• Many OpenGL extensions have embraced Direct3Disms
• Secondary color
• Fog coordinate
• Point sprites
191. 191
Direct3D vs. OpenGL
Coordinate System Conventions
• Window origin conventions
• Direct3D = upper-left origin
• OpenGL = lower-left origin
• Pixel center conventions
• Direct3D9 = pixel centers at integer locations
• OpenGL (and Direct3D 10) = pixel centers at half-pixel locations
• Clip space conventions
• Direct3D = [-1,+1] for XY, [0,1] for Z
• OpenGL = [-1,+1] range for XYZ
• Affects
• How projection matrix is loaded
• Fragment shaders that access the window position
• Point sprites have upper-left texture coordinate origin
• OpenGL already lets application choose lower-left or upper-left
192. 192
Direct3D vs. OpenGL
Provoking Vertex Conventions
• Direct3D uses “first” vertex of a triangle or line to
determine which color is used for flat shading
• OpenGL uses “last” vertex for lines, triangles, and quads
• Except for polygons (GL_POLYGON) mode that use the
first vertex
Direct3D 9:
pDev->SetRenderState(D3DRS_SHADEMODE, D3DSHADE_FLAT);
OpenGL:
glShadeModel(GL_FLAT);
Input triangle strip
with per-vertex colors
193. 193
BGRA Vertex Array Order
• Direct3D 9’s most common usage for sending per-vertex
colors is 32-bit D3DCOLOR data type:
• Red in bits 16:23
• Green in bits 8:15
• Blue in bits 0:7
• Alpha in bits 24:31
• Laid out in memory, it looks like BGRA byte order
• OpenGL assumes RGBA order for all vertex arrays
• Direct3Dism EXT_vertex_array_bgra extension allows:
glColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
glSecondaryColorPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
glVertexAttribPointer(GL_BGRA, GL_UNSIGNED_BYTE, stride, pointer);
(figure: D3DCOLOR bit layout: 8-bit alpha in bits 31:24, red in
23:16, green in 15:8, blue in 7:0)
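The reordering involved can be sketched directly (helper names are mine): unpacking a D3DCOLOR into the R, G, B, A component order shows why OpenGL would otherwise need a per-vertex swizzle.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } RGBA8;

/* Unpack a Direct3D 9 D3DCOLOR (alpha in bits 31:24, red 23:16,
   green 15:8, blue 7:0) into separate RGBA components. */
RGBA8 unpackD3DCOLOR(uint32_t c)
{
    RGBA8 out;
    out.r = (uint8_t)((c >> 16) & 0xFF);
    out.g = (uint8_t)((c >>  8) & 0xFF);
    out.b = (uint8_t)( c        & 0xFF);
    out.a = (uint8_t)((c >> 24) & 0xFF);
    return out;
}
```

On a little-endian CPU the same 32-bit value sits in memory as B,G,R,A bytes, which is exactly the layout the GL_BGRA vertex array path consumes without any CPU-side conversion.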
194. 194
OpenGLisms
• Things about OpenGL’s operation that make it hard for
non-OpenGL applications to port to OpenGL
• Examples
• Selectors
• Linked GLSL program objects
195. 195
Eliminating Selectors from OpenGL
• OpenGL has lots of selectors
• Selectors set state that indicates what state subsequent
commands will update
• Already mentioned selectors: glClientActiveTexture
• Other examples: glActiveTexture, glMatrixMode,
glBindTexture, glBindBuffer, glUseProgram,
glBindProgramARB
• OpenGL is full of selectors
– Partly OpenGL’s extensibility strategy
– Partly because objects are bound into context
» Bind-to-edit objects
» Rather than edit-by-name
• Direct State Access extension: EXT_direct_state_access
• Provides complete selector-free additional API for OpenGL
• Shipping in NVIDIA’s 180.43 drivers
196. 196
Reasons to Eliminate Selectors
• Direct3D has an “edit-by-name” model of operation
• Means Direct3D has no selectors
• Having to manage selectors when porting Direct3D or console
code to OpenGL is awkward
• Requires deferring updates to minimize selector and object
bind changes
• Layered libraries can’t count on selector state
• To be safe when updating state controlled by selectors, such
libraries must use the idiom
• Save selector, Set selector, Update state, Restore selector
• Bad for performance, particularly bad for dual-core drivers
since queries are expensive
197. 197
GLSL Program Object Linking
• GLSL requires shader objects from different domains
(vertex, geometry, fragment) to be linked into single
GLSL program object
• Means you can’t mix-and-match shaders easily
• Other APIs don’t have this limitation
• Direct3D
• Prior OpenGL assembly language extensions
• Consoles
• A “separate shader objects” extension could fix this
problem
198. 198
Separate Shader Objects Example
• Combining different GLSL shaders at once
(figure: different GLSL vertex shaders (wobbly torus, smooth torus)
mixed and matched with different GLSL fragment shaders (specular
brick bump mapping, red diffuse))
199. 199
Deprecation
• Part of OpenGL 3.0 is a marking of features for deprecation
• LOTS of functionality is marked for deprecation
• I contend no real application today uses the non-deprecated
subset of OpenGL—all apps would have to change due to
deprecation
• Some vendors believe getting rid of features will make OpenGL
better in some way
• NVIDIA does not believe in abandoning API compatibility this
way
• OpenGL is part of a large ecosystem so removing features this way
undermines the substantial investment partners have made in
OpenGL over years
• API compatibility and stability is one of OpenGL’s great
strengths
200. 200
Synergy between OpenGL and OpenCL
• Complementary capabilities
• OpenGL 3.0 = state-of-the-art, cross-platform graphics
• OpenCL 1.0 = state-of-the-art, cross-platform compute
• Computation & Graphics should work together
• Most natural way to intuit compute results is with graphics
• When Compute is done on a GPU, there’s no need to “copy” the
data to see it visualized
• Appendix B of OpenCL specification
• Details sharing of objects between OpenGL and OpenCL
• Called “GL” and “CL” from here on…
202. 202
OpenGL / OpenCL Sharing
• Requirements for GL object sharing with CL
• CL context must be created with an OpenGL context
• Each platform-specific API will provide its appropriate
way to create an OpenGL-compatible CL context
• For WGL (Windows), CGL (OS X), GLX (X11/Linux),
EGL (OpenGL ES), etc.
• Creating cl_mem for GL Objects does two things
1. Ensures CL has a reference to the GL objects
2. Provides cl_mem handle to acquire GL object for CL’s use
• clRetainMemObject & clReleaseMemObject can create
counted references to cl_mem objects
203. 203
Acquiring GL Objects for Compute Access
• Still must “enqueue acquire” GL objects for compute kernels to
use them
• Otherwise reading or writing GL objects with CL is undefined
• Enqueue acquire and release provide sequential consistency
with GL command processing
• Enqueue commands for GL objects
• clEnqueueAcquireGLObjects
• Takes list of cl_mem objects for GL objects & list of
cl_events that must complete before acquire
• Returns a cl_event for this acquire operation
• clEnqueueReleaseGLObjects
• Takes list of cl_mem objects for GL objects & list of
cl_events that must complete before release
• Returns a cl_event for this release operation
#85: Didn’t continue to succeed, though.
One of my sorrows is that OpenGL didn’t seem to contribute to success for SGI
#101: Not a required “implementation”, just a concise way to specify the architecture (like ISA registers)
Directly inspired changes to the specification (especially to pixel operations, e.g., depth buffer of)