DRI Developer Frequently Asked Questions

José Fonseca

                j_r_fonseca@yahoo.co.uk
            

Revision History
Revision 0.3 2002-01-27 Revised by: jf
Added a section about the purposes of this FAQ. Added some interesting recent posts. Several cosmetic changes.
Revision 0.2 2002-01-25 Revised by: jf
Incorporated information from "Introduction to the Direct Rendering Infrastructure" by Brian Paul. Added a glossary based on the "Glossary of common 3D acronyms and terms" by Nathan Hand. Hyperlinks to the Web interface to the CVS repository in SourceForge.
Revision 0.1 2002-01-20 Revised by: jf
Initial version based on questions posted on the dri-devel mailing list.

This is the list of Frequently Asked Questions for developers of the DRI (Direct Rendering Infrastructure), a framework for allowing direct access to graphics hardware in a safe and efficient manner. This FAQ is meant to be read in conjunction with the DRI Documentation. You can also get Postscript, PDF, HTML, and SGML versions of this document. The DRI Developer Frequently Asked Questions is distributed under the terms of the GNU Free Documentation License.


Table of Contents
1. About this FAQ
1.1. Who is the target audience of this FAQ?
1.1.1. For the users...
1.1.2. For the testers...
1.1.3. For the developers...
1.1.4. For the gurus...
1.2. What this FAQ is not?
2. Getting Started
2.1. How do I get started with development?
2.2. How do I submit a patch?
2.3. How is a DRI driver constituted?
2.4. Do I need to understand the X11 in order to help?
2.5. I want to start the development of a new driver. Which one should I take as a template?
2.6. What's the DRI history?
3. DRI Architecture
3.1. Overview
3.2. Direct Rendering Module (DRM)
3.2.1. What is the DRM?
3.2.2. Where does the DRM reside?
3.2.3. In what way does the DRM support the DRI?
3.2.4. Is it possible to make a DRI driver without a DRM driver for a piece of hardware where all acceleration is done in PIO mode?
3.2.5. Does the DRM driver have support for loading sub-drivers?
3.2.6. What is templated DRM code?
3.3. XFree86
3.3.1. DRI Aware DDX Driver
3.3.1.1. What is the DDX Driver?
3.3.1.2. Where does the DDX driver reside?
3.3.2. DRI Extension
3.3.2.1. What does the XFree86 DRI extension do?
3.3.2.2. Where does the XFree86 DRI extension reside?
3.3.3. GLX Extension
3.3.3.1. What does the XFree86 GLX extension do?
3.3.3.2. Where does the XFree86 GLX extension reside?
3.3.4. GLcore Extension
3.3.4.1. Where does the GLcore extension reside?
3.3.4.2. If I run a GLX enabled OpenGL program on a remote system with the display set back to my machine, will the X server itself render the GLX requests through DRI?
3.3.4.3. What's the difference between local clients and remote clients?
3.3.4.4. Is there a difference between using indirect DRI rendering (e.g., with LIBGL_ALWAYS_INDIRECT) and just linking against the Mesa library?
3.4. libGL
3.4.1. Where does it reside?
3.4.2. 3D Driver
3.4.2.1. Where does the 3D Driver reside?
3.4.3. Of what use is the Mesa code in the xc tree?
3.4.4. Is there any documentation about the XMesa* calls?
3.4.5. How do X modules and X applications communicate?
4. Debugging and benchmarking
4.1. How do you put a breakpoint in the dynamically loaded modules?
4.2. How do I do benchmarking with Unreal Tournament?
4.3. Is there any way for us to detect which features are implemented with hardware support for a given driver?
4.4. Which OpenGL benchmarking program can I use to test and compare various facets of the performance of graphics cards?
4.5. How should I report bugs?
5. Hardware Specific
5.1. AGP
5.1.1. What is AGP?
5.1.2. Where can I get more info about AGP?
5.1.3. Why not use the existing XFree86 AGP manipulation calls?
5.1.4. How do I use AGP?
5.1.5. How to allocate AGP memory?
5.1.6. If one has to insert pages he needs to check for -EBUSY errors and loop through the entire GTT. Wouldn't it be better if the driver fills up pg_start of agp_bind structure instead of user filling up?
5.1.7. How does the DMA transfer mechanism work?
5.2. ATI Cards
5.2.1. How do I get ATI card specifications?
5.2.2. Mach64 based cards
5.2.2.1. I would like to help developing the Mach64 driver...
5.3. 3DFX
5.3.1. What's the relationship between Glide and DRI?
5.4. S3
5.4.1. Are there plans to enable the s3tc extension on any of the cards that currently support it?
6. Miscellaneous Questions
6.1. How about the main X drawing surface? Are 2 extra "window sized" buffers allocated for primary and secondary buffers in a page-flipping configuration?
6.2. Is anyone working on adding SSE support to the transform/lighting code in Mesa?
6.3. How often are checks done to see if things need clipped/redrawn/redisplayed?
6.4. What's the deal with fullscreen and DGA?
6.5. DRI without X
6.5.1. Can DRI run without X?
6.5.2. What would be the advantages of a non-X version of DRI?
6.5.3. I would like to make X11-free access to 3D...
6.6. DRI driver initialization process
6.7. Utah-GLX
6.7.1. How similar/different are the driver architectures of Utah and DRI?
6.7.2. Should one dig into the Utah source first, then go knee-deep in DRI, or the other way around?
7. Authorship and Acknowledgments
Glossary
References

1. About this FAQ

1.1. Who is the target audience of this FAQ?

1.1.1. For the users...

This FAQ is not really intended for users. But if you were curious enough to come this far, you might consider collaborating with the developers by testing the development branches and helping to improve the support for your 3D graphics card.


1.1.2. For the testers...

This FAQ will give you a good overview of what the DRI is all about, enabling you to file better bug reports. It also addresses the specific topics of debugging and benchmarking in the section Debugging and benchmarking.


1.1.3. For the developers...

This FAQ will try to give you a good jumpstart in the knowledge of the DRI infrastructure so that you can more easily start to code in the area of your interest.


1.1.4. For the gurus...

This FAQ is a convenient and easy way for you to share your wisdom with the newbies, so consider submitting corrections, updates or additions to it.


1.2. What this FAQ is not?

Although I try to gather as much useful information as possible, this FAQ is not, and will never be, a replacement for:

  • the original DRI documentation. Although much of the information in the original DRI documentation is included in this FAQ, not all of it was suited to a FAQ. Always take a look at it when you need to know more.

  • the source code. Any documentation gets out of date as soon as someone starts to code, so always refer to the source to know the state of development and the implementation details.

    Don't be intimidated by the size of the XFree86 tree. This FAQ will give you very good pointers to the specific places you need to look into.

    Do not forget to take a good peek at drivers other than the one you're interested in. You will find many similarities and hints on how to proceed.

  • the dri-devel mailing list. Always keep in touch with the other developers. We have a quite active developer base that will help you.

    Try to attend the IRC meetings. You'll get the chance to talk directly with the other developers and get an insight into the DRI's future.


2. Getting Started

2.1. How do I get started with development?

To get started with development you have to first understand the DRI and XFree86 architecture. You can start by reading the developer documents in the documentation section of this website.

Then you can continue by checking out the DRI CVS tree. Poke around the driver source code to find out some more about the inner workings of the DRI. There also are some text documents within the XFree86 source tree that contain useful information.

Once you feel that you have sufficient understanding of the DRI to begin coding you can start by submitting a patch. You have to submit at least one patch to get write access to our CVS. Have a look at the Sourceforge bug tracker for open issues. That is a good place to find an issue that you can fix by submitting a patch.

After you have submitted your patch you can start working on a more concrete project. Have a look at the status page or read the newsgroups for projects that need to be worked on.

Of course you don't have to submit a patch before you can work on a project. But since you won't be able to check in your work until you submit a patch, it is very desirable to submit a patch first. You do want people to test your work, right?

Also, don't be shy about asking questions on the dri-devel newsgroup. The main purpose of the newsgroup is the discussion of development issues. So, feel free to ask questions.


2.2. How do I submit a patch?

There are two ways you can submit a patch.

Preferably you will submit the patch through our Sourceforge project page. That way we can easily find your user id to add you to the project as a developer and give you CVS access.

You can also submit a patch by posting it to the dri-devel newsgroup as an attachment to a message.

In any case, make sure you explain clearly what your patch is for.


2.3. How is a DRI driver constituted?

All DRI drivers are made up of 3 parts:


2.4. Do I need to understand the X11 in order to help?

 

I do not understand the X11 codebase. Not one bit. Never even looked at it, probably never intend to. Do I need to understand this in order to work on the DRI/Mesa drivers? Nope. At a bare minimum, I'd need to know how the Mesa Device Driver (DD) interface works, and have a basic understanding of OpenGL. This would allow me to work on the client-side driver code (xc/lib/GL/mesa/src/drv/*). Having more knowledge about the internals of Mesa certainly helps, and if you're keen you can check out the chipset specific stuff in the DRM (which is fairly closely tied to the client-side driver code mentioned above, and mainly deals with DMA transfers etc). No one ever said it would be easy to pick up, but it's not like you have to have a working knowledge of the entire X server to work on the 3D drivers…

 
--Gareth Hughes 

2.5. I want to start the development of a new driver. Which one should I take as a template?

The tdfx driver is rather old, and was the place a lot of experimentation was done. It isn't an example of a good driver.

To start out I'd concentrate on the i810. The design of that driver is very clean, and makes a good base to build upon. (Given more time and resources I'd rewrite the tdfx driver to act more like the i810.)

There are a few possibilities you have to consider:

  • If your card has a DMA model which is secure (there is no way that the client can emit instructions which cause writes to system memory, for instance), the current i810 driver is what you should examine.

  • If your card has a DMA model which is insecure (DMA buffers can cause writes to PCI space), look at the current mga driver.

  • If your card has a FIFO/MMIO model which is secure, and there is no vertex buffer or DMA buffer mechanism, the i810 driver is probably the closest thing to look at for state management, but you will need to take a different approach to emitting cliprects — the tdfx driver has some examples of this.

  • If your card has a FIFO/MMIO model which is insecure and there is no vertex buffer or DMA buffer mechanism, you are really in a world of hurt… There are ways around most problems, but hopefully we don't need to get into details.

The security issue that is seen most often is being able to write to PCI memory. Cards might do this via a 2D blit mode which allows blitting to/from main memory, or perhaps by a special DMA instruction which writes a counter dword into PCI space. These are very useful operations, but can be exploited to write to, e.g., the bit of memory which holds a process's UID number.

Most hardware seems to have been designed for consumer versions of Windows, which don't really have a security model.

So, you need to verify there is no mechanism to write to PCI space (in any way or form).

If there is no DMA interface, then as long as the card is secure, that is probably going to simplify the task of writing the driver, as there will be only a tiny kernel component. The tdfx kernel driver should be the basis for your kernel part; there should be very few changes. You can use the basic structure of the i810 3D driver for state management, etc., but you will want to emit state directly via PIO instead of via DMA. You will need to look at how the tdfx driver emits cliprects in the triangle routines; you'll need to do something similar.


2.6. What's the DRI history?

 

This is mostly a history of the DRI [1]. If you're interested, print it out and snuggle up by the fire…

At SigGraph of '98 (August 1998) there was a Linux 3D BOF. Brian, Daryll and others gave good talks at this meeting. I had a short talk near the end where I basically asked for any community members interested in developing a direct rendering infrastructure to contact me. From this, I got two responses with no follow up.

So, Kevin and I sat down and wrote the high level design document, which outlined many alternatives and recommended a single path for implementing direct rendering. This document was released less than two months after my siggraph plea for involvement. We posted the following e-mail to the mailing list in September of 1998:

Subject: Direct Rendering for 3D
From: Jens Owen (jens@precisioninsight.com)
Date: Sun, 20 Sep 1998 12:40:07 -0600 

We have released our high-level design document which describes a
direct rendering architecture for 3D applications. This document
can be found at: 

http://www.precisioninsight.com/dr/dr.html 

Our intention is to create a sample implementation of the 3D 
infrastructure using Mesa and XFree86 with full source available
under an XFree86-style license. 

We are sending this e-mail to XFree86, Mesa, and kernel developers
in an effort to receive feedback on the high level design. We
would like to receive comments back by 12 Oct 98 as we plan to
start on the low level design at that point. 

Regards, 
Jens Owen and Kevin E. Martin
			

This e-mail received no responses on the devel list. However, Allen Akin did see it and replied with some excellent comments directly to Kevin and myself. After a couple of months with no feedback we pushed on. We were able to hire Rik Faith in November and by early December dedicated a couple of weeks to working together to come up with a low level design document. Allen Akin volunteered his time in that effort, and the four of us spent every other day for two weeks on the phone and the off days thinking over and writing up our design.

By the end of 1998, we had a small amount of funding (relative to the size of the project) and a low level design in hand; however, we were still missing some key components. We needed a reference implementation of OpenGL to base our 3D drivers on. SGI had a sample implementation, but they weren't ready to release it as open source yet. They did, however, release their GLX library, which was very helpful. They also gave us an initial round of funding, but they understood the problem, and shared with us (after the fact) that they didn't think it could be done on the aggressive schedule and shoestring budget we had to work with. Red Hat also funded this effort. Red Hat was new to 3D graphics, and didn't realize the mountain we were trying to climb; still, their support was invaluable.

The two key pieces of technology that came our way (just in time) were SGI's GLX release and Brian Paul's willingness to give us a Mesa license that was compatible with XFree86. With the two missing pieces of the puzzle finally available, we posted to the list again in February of '99. Here is a copy of our post:

Subject: SGI's GLX Source Release
From: Jens Owen (jens@precisioninsight.com)
Date: Tue, 16 Feb 1999 16:18:32 -0700 

SGI has announced their GLX source release to the open source 
community. You can read the press release at 
http://www.sgi.com/newsroom/press_releases/1999/february/opengl.html 

Precision Insight is working with SGI to incorporate their GLX source 
base into XFree86. We are also working on integrating the SGI's GLX 
code base and Brian Paul's Mesa core rendering functionality within
the XFree86 4.0 X Server development branch.

This initial GLX/Mesa integration is an effort to quickly get an early 
version of software only indirect rendering into the XFree86 4.0 
development tree. That will then be the basis for our Direct Rendering 
Infrastructure (DRI) development which will be available in mid '99. 

If you would like more detailed information about our Direct Rendering 
Infrastructure, we have posted an updated project description at 
http://www.precisioninsight.com/DRI021699.html 

Regards, 
Jens Owen 
			

This was exciting. We still didn't get much community involvement, but we did have enough money in hand to fund 3 full-time developers to crank this out by the end of June of 1999. We each took a driver component and pushed forward as fast as we could. Kevin took the 3D component and tied the GLX, Mesa and a hardware implementation together. Rik took the kernel component and developed the DRMlib and our first DRM hardware implementation. I took the X Server piece and developed the DRI server extension and extended a DDX driver to be "DRI aware". We burned the midnight oil at a pace that would make Gareth proud :-) We knew we had a huge task ahead. We had committed to a demo at the Linux show in the middle of May and needed to wrap up the project by the end of June (when our funding ran out). One slip-up by any of us and we weren't going to be able to pull this off. We hadn't been getting much feedback from the open source community, so we just put our heads down and developed like mad men.

By the middle of May, we had a handful of design documents and we were ready for the trade show demo. We posted this to :

Subject: Direct Rendering Design Docs Available
From: Jens Owen (jens@precisioninsight.com)
Date: Wed, 12 May 1999 17:09:23 -0600 

Pointers to our DRI design docs can be found at 
http://www.precisioninsight.com/piinsights.com 

Please direct comments to glx@xfree86.org 

Regards, 
Jens 
			

The only feedback we received was two posts to correct the URL we posted. I admit we weren't too surprised. We hadn't gotten any feedback on the early designs, so why should the more detailed documents be any different? We pulled off our demo and pushed on. We needed to clean up the sources enough to get them into XFree86. We thought that if design documents weren't helping other developers understand our work, maybe the sources would. By the middle of June 1999, we had an alpha release submitted to XFree86. Here's the old announcement:

Subject: README for Direct Rendering
From: Jens Owen (jens@precisioninsight.com)
Date: Sat, 12 Jun 1999 22:51:59 -0600 

Attached is the README for the alpha release of our Direct Rendering 
Infrastructure that has been submitted to XFree86. Look for the code
in an upcoming 3.9 alpha patch release. 

Regards, 
Jens Owen 

  Direct Rendering Infrastructure Alpha release 
  --------------------------------------------- 

  Patches for the alpha release of Precision Insight's Direct Rendering 
  Infrastructure (DRI) have been submitted to XFree86. The final sample 
  implementation (SI) will be available at the end of June. The purpose 
  of this release is to allow XFree86 and others to start evaluating the 
  code. 

  *NOTE* This is an alpha release and there will be changes between now 
  and the final SI release. 

  Please direct all comments about this release to glx@xfree86.org. 

  * What comes with this release? 

	There are four main parts of this patch: 

	1. the client- and server-side DRI, 
	2. a 2D DDX driver for 3Dlabs' GMX2000, 
	3. an OpenGL client side direct rendering driver for the GMX2000,
  and 
	4. a generic kernel driver for Linux 2.2.x. 
  
	The DRI handles the communication and synchronization between the X 
	server, the client driver and the kernel driver. 

	The 3Dlabs XFree86 DDX driver has been enhanced to support the 
	GMX2000. It has also been extended to communicate with and provide 
	callbacks for the DRI. 

	The client driver implements a subset of OpenGL. The subset required 
	for id Software's Quake 2 was chosen to demonstrate the capabilities 
	of the DRI. This driver communicates with the device by filling DMA 
	buffers and sending them to the kernel driver. Note that the Gamma 
	chip implements OpenGL 1.1 in hardware, and therefore, does not use 
	the Mesa internals at this time. However, support for the majority
	of current generation of 3D hardware devices will require
	integration with Mesa, so an example DRI driver using the Mesa
	software-only pipeline was implemented (and is mostly complete for
	the alpha release). 

	The generic kernel driver handles the allocation of the DMA buffers, 
	distribution of the buffers to the clients, sending the buffers to
	the device, and the management of synchronization between the client 
	driver, the X server, and the kernel driver (this includes the
	device lock and a shared memory region). Note that hardware that
	does not support DMA or that does support special synchronization
	methods will only make use of a subset of these capabilities. 

  * What are the known problems and/or limitations of the alpha 
	release? 

	We are actively working on fixing the items listed below, and will 
	attempt to fix as many of them as possible before the SI release. 

	- The X server seg faults due to a context switching bug when there 
	  are 10 or more 3D clients running simultaneously 
	- Dynamic loading of the OpenGL client driver is not yet implemented 
	- 3D client death while holding the drawable lock causes deadlock 
	- The kernel module only works with Linux kernels 2.2.[0-5] 
	- A better authentication mechanism needs to be implemented 
	- A better DMA buffer queuing algorithm needs to be implemented 
	- A device specific shared memory region needs to be added to SAREA 
	- The DRI protocol request for the framebuffer layout needs to be 
	  extended to support FB width and depth information (for 24 vs. 32 
	  bpp, 8+24 layouts, etc) 
	- Add options for the DRI to XF86Config 

	Here are other problems that we are not going to have time to fix
	for the SI. However, we and other open source developers are going
	to continue developing and extending the DRI in follow-on projects. 

	- Direct rendering to a pixmap is not supported 
	- A more sophisticated texture management routine is required to 
	  handle texture swapping efficiently 
	- Multi-threaded OpenGL clients are not supported 
	- glXCopyContext and glXUseXFont are not supported in the DRI 
	- SwapBuffers does not wait on vertical retrace 
	- Support wait for vertical retrace in kernel driver 
	- Handling overlays is not currently supported 
	- Integrate with DBE 
	- Completing the software-only Mesa example driver 
	- Completing the other OpenGL paths for the GMX2000 
	- Support for video modes other than 640x480 in both the GMX2000 2D 
	  DDX driver and the 3D client driver 
	- More than minimal 2D acceleration of the GMX2000 2D DDX driver 
	  should be implemented 
	- Implement finer grained locking scheme in X server to improve 
	  interactivity 
	- Only grab the drawable lock and update the drawable stamp when a
	  3D window is altered 
	- The viewport does not scale properly when a 3D window is resized 
	- Double buffered 3D windows are not clipped to the screen 
	- glXSwapBuffers is not clipped to the client's viewport 
	- Only one client is allowed to use texture memory 
	- glFinish does not wait until the HW completes processing the 
	  outstanding DMA buffers 
	- Version numbers of the DDX and kernel driver are not verified 
	- Make lock available during SIGSTOP 
	- Make drmFinish work while holding the device lock 
	- Improve /proc/drm 
	- Improve documentation 
	- Improve example device-specific kernel driver (not used for SI) 

  * Where can I get more information? 

	We have made our design and implementation documents available on
	our website: 

		http://www.precisioninsight.com/piinsights.html 

	More documentation will be available with the SI release. 

  * Where should I send comments? 

	Please send all comments and questions to the glx@xfree86.org list 
			

Wow, we had made it. The base DRI infrastructure had been released on a near-impossible budget and deadline. We were excited to have crossed this bridge, but from the list of limitations outlined in our README above, we knew there was still a lot of work to be done. We needed more funding, more work on the infrastructure and more complete driver implementations to move the DRI forward. Daryll Strauss had joined our team just before the trade show in May and was quickly ramping up on the DRI. As the initial 3 developers collapsed in a heap by the end of June, Daryll was able to pick up the slack and push the DRI through another amazing evolution on an impossible schedule. 3Dfx hired us to develop a DRI driver for the Banshee and Voodoo2 chipsets and they wanted a demo by the SigGraph trade show in August of 1999. This seemed impossible in my opinion, but Daryll already had under his belt the experience of implementing the first complete hardware-accelerated OpenGL for Linux under Mesa. In the past, he had used 3Dfx's Glide library to achieve great results. Now, he attacked the DRI, learned its strengths, and worked around its weaknesses to develop a demo of the first complete 3D driver for the DRI in just two short months.

That summer, Intel had also commissioned us to develop a 3D driver for the i810 chipset. They didn't have the head start of existing 2D drivers and a ported Glide library, but they were serious about doing Linux right. They wanted complete 2D drivers for the current XFree86 release first. We were able to bring Keith Whitwell onto the team. Keith knew Mesa well, and had been a key contributor to the first native Mesa drivers under the Utah-GLX project. Keith spun up on 2D XFree86 drivers in no time flat and developed Intel's i810 2D drivers for XFree86 3.3 and 4.0. Then, when the 2D drivers were complete, he used his wizardry to develop the first cut of a fully native Mesa DRI driver in just a few short weeks. Keith's focus on performance and ability to quickly generate complex 3D drivers was nothing short of amazing; however, it is his consummate dedication to open development that helped move the DRI forward the most. Keith saw first-hand how well a simple framework like the Utah-GLX project was able to foster new graphics talent, and he was the initial driving force behind moving the DRI project to a completely open source repository and mailing lists. The entire team embraced his ideals, and the DRI SourceForge project was born.

By the end of the summer of 1999, Wall Street had found Linux and no less than four major graphics hardware vendors had secured our services for a 3D driver under Linux. We had committed to have complete driver suites in place for the latest chipsets from 3Dfx, Intel, ATI and Matrox by early 2000. This required a large effort across the team and some new hands as well. We picked up Jeff Hartmann, an AGP specialist, who enabled us to utilize true AGP bus mastering for our drivers. Next we added Brian Paul to our team: as the author of Mesa, his knowledge was second to none, and his dedication to OpenGL conformance helped our drivers reach a higher level of completeness and quality.

Brian had a history of working well with members of the OpenGL Architecture Review Board and usually provided the first implementations of new ARB approved extensions via his Mesa software renderer. He was able to quickly drive forward a standard for dynamically resolving driver extensions at run time and implement a nice jump table mechanism to allow multiple DRI drivers to be handled via a single OpenGL library. These mechanisms were integrated into our drivers and OpenGL library just in time for the XFree86 4.0 release.

The XFree86 4.0 release had been our target platform since the early design days of the DRI. We wanted (and needed) to be closely integrated with the standard for open source windowing software — XFree86. David Dawes, a founding member and president of XFree86, joined our team in January of 2000 and helped us bring the DRI project into even closer alignment with the needs of XFree86.

With a heroic effort by Kevin, Rik, Daryll, Keith, Jeff, Brian and David, we were able to deliver no less than 4 complete driver suites for the XFree86 4.0 release in early 2000. This moved the DRI from the status of "sample framework" to a solid 3D platform in eight short months. We had moved from a "make announcements on progress every few months" mode to a fully open development process hosted at SourceForge.

A few months later Precision Insight was acquired by VA Linux Systems. The team grew further to include some additional fantastic developers: Gareth Hughes, Alan Hourihane and Nathan Hand. We took more steps forward in the progression of the DRI, but alas, VA was not meant to be in the 3D Linux business.

Today, the team has split up and moved forward in a few different directions. Some of the team went to work for Red Hat (Alan, Kevin and Rik); Gareth is working for NVidia; and Brian, Keith and David have started a new company called Tungsten Graphics with Frank Lamonica and myself. Jeff has gone back to school and Nathan is working on other projects from Australia.

Hopefully this background can give you a perspective for how the DRI has always been rooted in the open source community and how we have evolved to using more and more effective open development techniques. I sincerely hope we can find and provide the needed catalysts for bringing new developers into the exciting technical area of 3D graphics development.

 
--Jens Owen 

3. DRI Architecture


3.2. Direct Rendering Module (DRM)

3.2.1. What is the DRM?

This is a kernel module that gives direct hardware access to DRI clients.

This module deals with DMA, AGP memory management, resource locking, and secure hardware access. In order to support multiple, simultaneous 3D applications the 3D graphics hardware must be treated as a shared resource. Locking is required to provide mutual exclusion. DMA transfers and the AGP interface are used to send buffers of graphics commands to the hardware. Finally, there must be security to prevent out-of-control clients from crashing the hardware.


3.2.2. Where does the DRM reside?

Since internal Linux kernel interfaces and data structures may be changed at any time, DRI kernel modules must be specially compiled for a particular kernel version. The DRI kernel modules reside in the /lib/modules/.../kernel/drivers/char/drm directory. Normally, the X server automatically loads whatever DRI kernel modules are needed.

For each 3D hardware driver there is a kernel module. DRI kernel modules are named device.o where device is a name such as tdfx, mga, r128, etc.

The source code resides in xc/programs/Xserver/hw/xfree86/os-support/linux/drm/ .


3.2.3. In what way does the DRM support the DRI?

The DRM supports the DRI in three major ways:

  1. The DRM provides synchronized access to the graphics hardware.

    The direct rendering system has multiple entities (i.e., the X server, multiple direct-rendering clients, and the kernel) competing for direct access to the graphics hardware. Hardware that is currently available for PC-class machines will lock up if more than one entity is accessing the hardware (e.g., if two clients intermingle requests in the command FIFO or (on some hardware) if one client reads the framebuffer while another writes the command FIFO).

    The DRM provides a single per-device hardware lock to synchronize access to the hardware. The hardware lock may be required when the X server performs 2D rendering, when a direct-rendering client is performing a software fallback that must read or write the frame buffer, or when the kernel is dispatching DMA buffers.

    This hardware lock may not be required for all hardware (e.g., high-end hardware may be able to intermingle command requests from multiple clients) or for all implementations (e.g., one that uses a page fault mechanism instead of an explicit lock). In the latter case, the DRM would be extended to provide support for this mechanism.

    For more details on the hardware lock requirements and a discussion of the performance implications and implementation details, please see [FOM99].

  2. The DRM enforces the DRI security policy for access to the graphics hardware.

    The X server, running as root, usually obtains access to the frame buffer and MMIO regions on the graphics hardware by mapping these regions using /dev/mem. The direct-rendering clients, however, do not run as root, but still require similar mappings. Like /dev/mem, the DRM device interface allows clients to create these mappings, but with the following restrictions:

    1. The client may only map regions if it has a current connection to the X server. This forces direct-rendering clients to obey the normal X server security policy (e.g., using xauth).

    2. The client may only map regions if it can open /dev/drm?, which is only accessible by root and by a group specified in the XF86Config file (a file that only root can edit). This allows the system administrator to restrict direct rendering access to a group of trusted users.

    3. The client may only map regions that the X server allows to be mapped. The X server may also restrict those mappings to be read-only. This allows regions with security implications (e.g., those containing registers that can start DMA) to be restricted.

  3. The DRM provides a generic DMA engine.

    Most modern PC-class graphics hardware provides for DMA access to the command FIFO. Often, DMA access has been optimized so that it provides significantly better throughput than does MMIO access. For these cards, the DRM provides a DMA engine with the following features:

    1. The X server can specify multiple pools of different sized buffers which are allocated and locked down.

    2. The direct-rendering client maps these buffers into its virtual address space, using the DRM API.

    3. The direct-rendering client reserves some of these buffers from the DRM, fills the buffers with commands, and requests that the DRM send the buffers to the graphics hardware. Small buffers are used to ensure that the X server can get the lock between buffer dispatches, thereby providing X server interactivity. Typical 40MB/s PCI transfer rates may require 10000 4kB buffer dispatches per second (40MB/s divided by 4kB per buffer).

    4. The DRM manages a queue of DMA buffers for each OpenGL GLXContext, and detects when a GLXContext switch is necessary. Hooks are provided so that a device-specific driver can perform the GLXContext switch in kernel-space, and a callback to the X server is provided when a device-specific driver is not available (for the SI, the callback mechanism is used because it provides an example of the most generic method for GLXContext switching). The DRM also performs simple scheduling of DMA buffer requests to prevent GLXContext thrashing. When a GLXContext is swapped, a significant amount of data must be read from and/or written to the graphics device (between 4kB and 64kB for typical hardware).

    5. The DMA engine is generic in the sense that the X server provides information at run-time on how to perform DMA operations for the specific hardware installed on the machine. The X server does all of the hardware detection and setup. This allows easy bootstrapping for new graphics hardware under the DRI, while providing for later performance and capability enhancements through the use of a device-specific kernel driver.
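In the DRI, the per-device hardware lock from item 1 is a single word in the shared SAREA that clients normally take with an atomic compare-and-swap, entering the kernel only when that fails. The sketch below models the fast path in user space with C11 atomics. The flag values mirror _DRM_LOCK_HELD and _DRM_LOCK_CONT from the DRM headers and the logic is modelled loosely on the lightweight-lock macros in xf86drm.h, but the type and function names are hypothetical and the kernel's slow path (sleeping, waking waiters, context switching) is reduced to a stub.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag bits as in the DRM headers (_DRM_LOCK_HELD, _DRM_LOCK_CONT).
 * The remaining low bits hold a context ID. */
#define LOCK_HELD     0x80000000u
#define LOCK_CONT     0x40000000u
#define LOCK_CTX_MASK (~(LOCK_HELD | LOCK_CONT))

/* Hypothetical stand-in for the lock word at the start of the SAREA. */
typedef struct {
    _Atomic uint32_t lock;
} hw_lock_t;

/* Fast path: succeeds only if the lock is free AND this context was the
 * last holder (a free lock keeps the previous owner's ID). On failure a
 * real client would enter the kernel (the DRM lock ioctl), which sleeps
 * on contention and lets the client know its state may be stale. */
static bool hw_lock_fast(hw_lock_t *hw, uint32_t ctx)
{
    assert((ctx & ~LOCK_CTX_MASK) == 0); /* ID must not overlap flag bits */
    uint32_t expected = ctx;
    return atomic_compare_exchange_strong(&hw->lock, &expected, LOCK_HELD | ctx);
}

/* Stub for the kernel granting the lock: take it if nobody holds it.
 * The real kernel would set LOCK_CONT and sleep instead of failing. */
static bool hw_lock_slow(hw_lock_t *hw, uint32_t ctx)
{
    uint32_t old = atomic_load(&hw->lock);
    while (!(old & LOCK_HELD)) {
        if (atomic_compare_exchange_weak(&hw->lock, &old, LOCK_HELD | ctx))
            return true;
    }
    return false;
}

/* Fast unlock: drop the HELD bit but keep our ID as the last owner.
 * Fails if LOCK_CONT was set; a real client then asks the kernel to
 * release the lock so that sleeping waiters are woken. */
static bool hw_unlock_fast(hw_lock_t *hw, uint32_t ctx)
{
    uint32_t expected = LOCK_HELD | ctx;
    return atomic_compare_exchange_strong(&hw->lock, &expected, ctx);
}

/* Current owner's context ID, or 0 if the lock is free. */
static uint32_t hw_lock_owner(hw_lock_t *hw)
{
    uint32_t v = atomic_load(&hw->lock);
    return (v & LOCK_HELD) ? (v & LOCK_CTX_MASK) : 0;
}
```

Keeping the last owner's ID in a free lock is what makes the common case cheap: a client that re-takes the lock without anyone else having held it in between knows its hardware state is intact and never leaves user space.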


3.2.4. Is it possible to make a DRI driver without a DRM driver for hardware on which all acceleration is done in PIO mode?

The kernel provides three main things:

  1. the ability to wait on a contended lock (the waiting process is put to sleep), and to free the lock of a dead process;

  2. the ability to mmap areas of memory that non-root processes can't usually map;

  3. the ability to handle hardware interrupts and a DMA queue.

All of these are hard to do outside the kernel, but they aren't required components of a DRM driver. For example, the tdfx driver doesn't use hardware interrupts at all: it is one of the simplest DRM drivers, and would be a good model for the hardware you are thinking about (in its current form, it is quite generic).

Note

DRI was designed with a very wide range of hardware in mind, ranging from very low-end PC graphics cards through very high-end SGI-like hardware (which may not even need the lock). The DRI is a very flexible infrastructure or framework: most of the example drivers we have use hardware interrupts, but that isn't a requirement.


3.2.5. Does the DRM driver support loading sub-drivers?

Although [Faith99] states that the DRM driver has support for loading sub-drivers by calling drmCreateSub, Linus didn't like that approach. He wanted all drivers to be independent, so it went away.


3.2.6. What is templated DRM code?

It was first discussed in an email about what Gareth had done to bring up the mach64 kernel module.

Not wanting to simply copy-and-paste another version of _drv.[ch], _context.c, _bufs.c and so on, Gareth did some refactoring along the lines of what he and Rik Faith had discussed a long time ago.

This is very much along the lines of a lot of Mesa code, where there exists a template header file that can be customized with a few defines. At the time, this was done for _drv.c and _context.c, creating driver_tmp.h and context_tmp.h, which could be used to build up the core module.

An inspection of mach64_drv.c on the mach64-0-0-1-branch reveals the following code:

#define DRIVER_AUTHOR           "Gareth Hughes"

#define DRIVER_NAME             "mach64"
#define DRIVER_DESC             "DRM module for the ATI Rage Pro"
#define DRIVER_DATE             "20001203"

#define DRIVER_MAJOR            1
#define DRIVER_MINOR            0
#define DRIVER_PATCHLEVEL       0


static drm_ioctl_desc_t         mach64_ioctls[] = {
	[DRM_IOCTL_NR(DRM_IOCTL_VERSION)]       = { mach64_version,    0, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_GET_UNIQUE)]    = { drm_getunique,     0, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_GET_MAGIC)]     = { drm_getmagic,      0, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_IRQ_BUSID)]     = { drm_irq_busid,     0, 1 },

	[DRM_IOCTL_NR(DRM_IOCTL_SET_UNIQUE)]    = { drm_setunique,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_BLOCK)]         = { drm_block,         1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_UNBLOCK)]       = { drm_unblock,       1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AUTH_MAGIC)]    = { drm_authmagic,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_ADD_MAP)]       = { drm_addmap,        1, 1 },

	[DRM_IOCTL_NR(DRM_IOCTL_ADD_BUFS)]      = { drm_addbufs,       1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_MARK_BUFS)]     = { drm_markbufs,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_INFO_BUFS)]     = { drm_infobufs,      1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_MAP_BUFS)]      = { drm_mapbufs,       1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_FREE_BUFS)]     = { drm_freebufs,      1, 0 },

	[DRM_IOCTL_NR(DRM_IOCTL_ADD_CTX)]       = { mach64_addctx,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_RM_CTX)]        = { mach64_rmctx,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_MOD_CTX)]       = { mach64_modctx,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_GET_CTX)]       = { mach64_getctx,     1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_SWITCH_CTX)]    = { mach64_switchctx,  1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_NEW_CTX)]       = { mach64_newctx,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_RES_CTX)]       = { mach64_resctx,     1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_ADD_DRAW)]      = { drm_adddraw,       1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_RM_DRAW)]       = { drm_rmdraw,        1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_LOCK)]          = { mach64_lock,       1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_UNLOCK)]        = { mach64_unlock,     1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_FINISH)]        = { drm_finish,        1, 0 },

#if defined(CONFIG_AGP) || defined(CONFIG_AGP_MODULE)
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_ACQUIRE)]   = { drm_agp_acquire,   1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_RELEASE)]   = { drm_agp_release,   1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_ENABLE)]    = { drm_agp_enable,    1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_INFO)]      = { drm_agp_info,      1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_ALLOC)]     = { drm_agp_alloc,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_FREE)]      = { drm_agp_free,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_BIND)]      = { drm_agp_bind,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_UNBIND)]    = { drm_agp_unbind,    1, 1 },
#endif

	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_INIT)]   = { mach64_dma_init,   1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_CLEAR)]  = { mach64_dma_clear,  1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_SWAP)]   = { mach64_dma_swap,   1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_IDLE)]   = { mach64_dma_idle,   1, 0 },
};

#define DRIVER_IOCTL_COUNT      DRM_ARRAY_SIZE( mach64_ioctls )

#define HAVE_CTX_BITMAP         1

#define TAG(x) mach64_##x
#include "driver_tmp.h"

And that's all you need. A trivial amount of code is needed for the context handling:

#define __NO_VERSION__
#include "drmP.h"
#include "mach64_drv.h"

#define TAG(x) mach64_##x
#include "context_tmp.h"

And as far as I can tell, the only thing that's keeping this out of mach64_drv.c is the __NO_VERSION__, which is a 2.2 thing and is not used in 2.4 (right?).

All the context bitmap code is enabled by the single line
#define HAVE_CTX_BITMAP 1
shown above.
To enable things like AGP, MTRRs and DMA management, the author simply needs to define the correct symbols. With less than five minutes of mach64-specific coding, I had a full kernel module that would do everything a basic driver requires — enough to bring up a software-fallback driver. The above code is all that is needed for the tdfx driver, with appropriate name changes. Indeed, any card that doesn't do kernel-based DMA can have a fully functional DRM module with the above code. DMA-based drivers will need more, of course.

The plan is to extend this to basic DMA setup and buffer management, so that the creation of PCI or AGP DMA buffers, installation of IRQs and so on is as trivial as this. What will then be left is the hardware-specific parts of the DRM module that deal with actually programming the card to do things, such as setting state for rendering or kicking off DMA buffers. That is, the interesting stuff.

A couple of points:

  • Why was it done like this, and not with C++ features like virtual functions (i.e. why don't I do it in C++)? Because it's the Linux kernel, dammit! No offense to any C++ fan who may be reading this :-) Besides, a lot of the initialization is order-dependent, so inserting or removing blocks of code with #defines is a nice way to achieve the desired result, at least in this situation.

  • Much of the core DRM code (like bufs.c, context.c and dma.c) will essentially move into these template headers. I feel that this is a better way to handle the common code. Take context.c as a trivial example — the i810, mga, tdfx, r128 and mach64 drivers have exactly the same code, with name changes. Take bufs.c as a slightly more interesting example — some drivers map only AGP buffers, some do both AGP and PCI, some map differently depending on their DMA queue management and so on. Again, rather than cutting and pasting the code from drm_addbufs into my driver, removing the sections I don't need and leaving it at that, I think keeping the core functionality in bufs_tmp.h and allowing this to be customized at compile time is a cleaner and more maintainable solution.
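Each entry of the mach64_ioctls table shown earlier is a handler followed by two flags; in the DRM code of this period they are the auth_needed and root_only fields of drm_ioctl_desc_t. They are how the security policy from section 3.2.3 is applied per ioctl: auth_needed requires that the client has been authenticated through the X server, and root_only restricts the call to root (in practice, the X server). The user-space sketch below shows a dispatcher driven by such a table; the types, ioctl numbers and handlers are hypothetical, and the real check lives in the kernel's ioctl entry point, where it uses capable(CAP_SYS_ADMIN) rather than a plain flag.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-open-file state. */
typedef struct {
    int authenticated; /* set once the X server has vouched for the client */
    int is_root;       /* stands in for capable(CAP_SYS_ADMIN) */
} client_t;

typedef int (*ioctl_fn)(client_t *client, void *arg);

/* Same shape as the table entries above: handler, auth_needed, root_only. */
typedef struct {
    ioctl_fn func;
    int auth_needed;
    int root_only;
} ioctl_desc_t;

static int demo_version(client_t *c, void *arg) { (void)c; (void)arg; return 0; }
static int demo_add_map(client_t *c, void *arg) { (void)c; (void)arg; return 0; }
static int demo_lock(client_t *c, void *arg)    { (void)c; (void)arg; return 0; }

/* Hypothetical ioctl numbers, used like DRM_IOCTL_NR(...) indices. */
enum { NR_VERSION = 0, NR_ADD_MAP = 1, NR_LOCK = 2, NR_COUNT = 3 };

static const ioctl_desc_t demo_ioctls[NR_COUNT] = {
    [NR_VERSION] = { demo_version, 0, 0 }, /* anyone */
    [NR_ADD_MAP] = { demo_add_map, 1, 1 }, /* X server only */
    [NR_LOCK]    = { demo_lock,    1, 0 }, /* any authenticated client */
};

static int dispatch(client_t *client, unsigned nr, void *arg)
{
    assert(client != NULL);
    if (nr >= NR_COUNT || demo_ioctls[nr].func == NULL)
        return -EINVAL;
    const ioctl_desc_t *d = &demo_ioctls[nr];
    if ((d->root_only && !client->is_root) ||
        (d->auth_needed && !client->authenticated))
        return -EACCES;
    return d->func(client, arg);
}
```

Because the permission bits sit next to the handler in one table, a driver author adding a new ioctl states its access policy in the same line that registers it.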

This looks way sweet. Have you thought about what it would take to generalize this to other OSs? I think that it has the possibility to make keeping the FreeBSD code up to date a lot easier.

The current mach64 branch is only using one template in the driver.

Check out the r128 driver from the trunk for a good example. Notice that there are files in there such as r128_tritmp.h. This is a template that gets included in r128_tris.c. Basically, it consolidates code that is largely duplicated across several functions, so that you only have to set a few macros. For example:

#define IND (R128_TWOSIDE_BIT)
#define TAG(x) x##_twoside

followed by

#include "r128_tritmp.h"

Notice that the name of the inline function defined in r128_tritmp.h is the result of the TAG macro, and that the function's body depends on the value of IND. So the inline function is essentially a template for several functions that have a lot in common. That way you consolidate common code and keep things consistent.
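The TAG/IND pattern can be reproduced in one self-contained file. In the sketch below the feature bits, the result_t type and the color and depth-offset logic are all hypothetical, and the template body is an object-like macro rather than a separate header such as r128_tritmp.h. The principle is the same: IND and TAG are expanded wherever the template is instantiated, using whatever definitions are in force at that point. A real template would typically test IND with #if; the plain if statements here are constant-folded by the compiler instead.

```c
#include <assert.h>

/* Hypothetical feature bits standing in for R128_TWOSIDE_BIT etc. */
#define DEMO_TWOSIDE_BIT  0x1
#define DEMO_OFFSET_BIT   0x2
#define DEMO_DEPTH_OFFSET 0.25f

enum { FRONT_COLOR = 1, BACK_COLOR = 2 };

typedef struct {
    float z;   /* depth after any offset */
    int color; /* which color set was selected */
} result_t;

/* Template body: IND and TAG are taken from the definitions in force
 * where TRIANGLE_TEMPLATE is used. A negative signed area means
 * back-facing in this sketch; depth is assumed to be in [0, 1]. */
#define TRIANGLE_TEMPLATE                                       \
    static result_t TAG(triangle)(float area, float z)          \
    {                                                           \
        result_t r = { z, FRONT_COLOR };                        \
        assert(z >= 0.0f && z <= 1.0f);                         \
        if ((IND) & DEMO_TWOSIDE_BIT) {                         \
            if (area < 0.0f)                                    \
                r.color = BACK_COLOR;                           \
        }                                                       \
        if ((IND) & DEMO_OFFSET_BIT)                            \
            r.z += DEMO_DEPTH_OFFSET;                           \
        return r;                                               \
    }

/* Three instantiations, as a driver would #include its template
 * header three times with different IND/TAG settings. */
#define IND 0
#define TAG(x) x
TRIANGLE_TEMPLATE
#undef IND
#undef TAG

#define IND (DEMO_TWOSIDE_BIT)
#define TAG(x) x##_twoside
TRIANGLE_TEMPLATE
#undef IND
#undef TAG

#define IND (DEMO_TWOSIDE_BIT | DEMO_OFFSET_BIT)
#define TAG(x) x##_twoside_offset
TRIANGLE_TEMPLATE
#undef IND
#undef TAG
```

This yields triangle(), triangle_twoside() and triangle_twoside_offset(), each fully specialized with no run-time flag tests, at the cost of code that is harder to read and step through in a debugger than one generic function.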

Look at xc/programs/Xserver/hw/xfree86/os-support/linux/drm/kernel/r128.h, for example. That's the template architecture at its best. Most of the code is shared between the drivers and customized with a few defines. Compare that to the duplication and inconsistency before.


3.3. XFree86

3.3.1. DRI Aware DDX Driver

3.3.1.1. What is the DDX Driver?

For each type of graphics card there is an XFree86 2D (or DDX) driver which does initialization, manages the display and performs 2D rendering. XFree86 4.0 introduced a new device driver interface called XAA which should allow XFree86 drivers to be backward compatible with future versions of the Xserver.

Each 2D driver has a bit of code to bootstrap the 3D/DRI features.


3.3.1.2. Where does the DDX driver reside?

The XFree86 drivers usually reside in the /usr/X11R6/lib/modules/drivers/ directory with names of the form *_drv.o.

The XFree86 drivers source code resides in xc/programs/Xserver/hw/xfree86/drivers/ usually in a file *_dri.[ch].


3.3.2. DRI Extension

3.3.2.1. What does the XFree86 DRI extension do?

The XFree86-DRI X server extension is basically used for communication between the other DRI components (the X server, the kernel module, libGL.so and the 3D DRI drivers).

The XFree86-DRI module maintains DRI-specific data structures related to screens, windows, and rendering contexts. When the user moves a window, for example, the other DRI components need to be informed so that rendering appears in the right place.


3.3.2.2. Where does the XFree86 DRI extension reside?

The DRI extension module usually resides at /usr/X11R6/lib/modules/extensions/libdri.a.

The DRI extension source code resides at xc/programs/Xserver/GL/dri/ .


3.3.3. GLX Extension

3.3.3.1. What does the XFree86 GLX extension do?

The GLX extension to the X server handles the server-side tasks of the GLX protocol. This involves setup of GLX-enhanced visuals, GLX context creation, context binding and context destruction.

When using indirect rendering, the GLX extension decodes GLX command packets and dispatches them to the core rendering engine.


3.3.3.2. Where does the XFree86 GLX extension reside?

The GLX extension module usually resides at /usr/X11R6/lib/modules/extensions/libglx.a.

The GLX extension source code resides at xc/programs/Xserver/GL/glx/ .


3.3.4. GLcore Extension

The GLcore module takes care of rendering for indirect or non-local clients.

Currently, Mesa is used as a software renderer. In the future, indirect rendering will also be hardware accelerated.


3.3.4.1. Where does the GLcore extension reside?

The GLcore extension module usually resides at /usr/X11R6/lib/modules/extensions/libGLcore.a.

The GLcore extension source code resides at xc/programs/Xserver/GL/mesa/src/ .


3.3.4.2. If I run a GLX enabled OpenGL program on a remote system with the display set back to my machine, will the X server itself render the GLX requests through DRI?

The X server will render the requests but not through the DRI. The rendering will be software only. Having the X server spawn a DRI client to process the requests is on the TODO list.


3.3.4.3. Is there a difference between local clients and remote clients?

There is no difference as far as the client is concerned. The only difference is speed.

The difference between direct and indirect rendering is that the former can't take place over the network. The DRI currently concentrates on the direct rendering case.

The application still gets a fully functional OpenGL implementation, which is all that's required by the specification. The implementation is entirely in software, but that's completely legal. In fact, all implementations fall back to software when they can't handle a request in hardware. It's just that in this case the implementation can't handle anything in hardware.

Most people don't run GLX applications remotely, because most applications run very poorly when run remotely. It's not really the application's fault: OpenGL pushes around a lot of data.

Therefore there hasn't been a lot of interest in hardware-accelerated remote rendering, and there's plenty left to do for local rendering. It is on the TODO list, but at a low priority.

The solution is actually fairly straightforward. When the X server gets a remote client, it forks off a small application that just reads GLX packets and then remakes the same OpenGL calls. This new application is then just a standard direct rendering client, and the problem reduces to one already solved.


3.3.4.4. Is there a difference between using indirect DRI rendering (e.g., with LIBGL_ALWAYS_INDIRECT) and just linking against the Mesa library?

Yes. DRI libGL used in indirect mode sends GLX protocol messages to the X server, which are executed by the GLcore renderer. Stand-alone Mesa's non-DRI libGL doesn't know anything about GLX. It effectively translates OpenGL calls into Xlib calls.

The GLcore renderer is based on Mesa. At this time the GLcore renderer cannot take advantage of hardware acceleration.


3.4. libGL

OpenGL-based programs must link with the libGL library. libGL implements the GLX interface as well as the main OpenGL API entrypoints. When using indirect rendering, libGL creates GLX protocol messages and sends them to the X server via a socket. When using direct rendering, libGL loads the appropriate 3D DRI driver then dispatches OpenGL library calls directly to that driver.

libGL also has the ability to support heterogeneous, multi-head configurations. That means one could have two or more graphics cards (of different types) in one system and libGL would allow an application program to use all of them simultaneously.


3.4.2. 3D Driver

The 3D driver is a DRI-aware driver, currently based on Mesa.


3.4.2.1. Where does the 3D driver reside?

Normally libGL loads 3D DRI drivers from the /usr/X11R6/lib/modules/dri directory, but the search path can be overridden by setting the LIBGL_DRIVERS_PATH environment variable.

The source code of the DRI-aware 3D drivers resides in xc/lib/GL/mesa/src/drv/.
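The driver lookup described above can be sketched as follows. The sketch assumes that LIBGL_DRIVERS_PATH may hold several colon-separated directories and that a driver called, for example, r128 lives in a file named r128_dri.so; the function names are hypothetical, and a real libGL would go on to dlopen() the file it finds rather than merely checking that it is readable.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define DEFAULT_DRI_DIR "/usr/X11R6/lib/modules/dri"

/* Build the n-th candidate "<dir>/<name>_dri.so" from a colon-separated
 * search path. Returns 0 on success, -1 if there is no n-th directory
 * or the result does not fit in out. */
static int dri_candidate(const char *search_path, const char *name,
                         int n, char *out, size_t out_len)
{
    assert(search_path != NULL && name != NULL && n >= 0);
    const char *dir = search_path;
    for (int i = 0; i < n; i++) {
        dir = strchr(dir, ':');
        if (dir == NULL)
            return -1;
        dir++; /* skip the ':' */
    }
    size_t dir_len = strcspn(dir, ":");
    int written = snprintf(out, out_len, "%.*s/%s_dri.so",
                           (int)dir_len, dir, name);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}

/* Fill out with the first readable driver file; 0 if found, else -1.
 * Stops early if a candidate path does not fit in out. */
static int find_dri_driver(const char *name, char *out, size_t out_len)
{
    const char *search_path = getenv("LIBGL_DRIVERS_PATH");
    if (search_path == NULL || *search_path == '\0')
        search_path = DEFAULT_DRI_DIR;
    for (int n = 0; dri_candidate(search_path, name, n, out, out_len) == 0; n++) {
        if (access(out, R_OK) == 0)
            return 0;
    }
    return -1;
}
```

For example, with LIBGL_DRIVERS_PATH set to /home/me/dri:/usr/X11R6/lib/modules/dri, a freshly built r128_dri.so in /home/me/dri is picked up before the installed one, which is convenient when testing a driver without reinstalling it.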


3.4.3. Of what use is the Mesa code in the xc tree?

Mesa is used to build some server side modules/libraries specifically for the benefit of the DRI. The libGL.so is the client side aspect of Mesa which works closely with the server side components of Mesa.

The libGLU and libglut libraries are entirely client-side, so they are distributed separately.


3.4.4. Is there any documentation about the XMesa* calls?

There is no documentation for those functions. However, one can point out a few things.

First, despite the prolific use of the word "Mesa" in the client (and server) side DRI code, the DRI is not dependent on Mesa. It's a common misconception that the DRI was designed just for Mesa. It's just that the drivers that we at Precision Insight have done so far have Mesa at their core. Other groups are working on non-Mesa-based DRI drivers.

In the client-side code, you could mentally replace the string "XMesa" with "Driver" or some other generic term. All the code below xc/lib/GL/mesa/ could be replaced by alternate code and libGL would still work. libGL.so has no knowledge whatsoever of Mesa. It's the drivers which it loads that have the Mesa code.

On the server side there's more of the same. The XMesa code used for indirect/software rendering was originally borrowed from stand-alone Mesa and its pseudo-GLX implementation. There are some crufty side-effects from that.


3.4.5. How do X modules and X applications communicate?

X modules are loaded like kernel modules, with symbol resolution at load time, and can thus call each other's functions. For kernel modules, the communication between applications and modules is done via the /dev/* files.

X applications call X library functions, which create a packet and send it over a socket to the server, which processes it. That's all well documented in the standard X documentation.

There are three ways 3D clients can communicate with the server or each other:

  1. Via X protocol requests (the DRI defines its own X extension requests for this).

  2. Via the SAREA (the shared memory segment)

  3. Via the kernel driver.


4. Debugging and benchmarking

4.1. How do you put a breakpoint in the dynamically loaded modules?

You need xfree86-gdb, which is a version of gdb modified to understand the module binary format that XFree86 uses in addition to the standard elf/coff binary formats.

Example 1. Using xfree86-gdb

Use
modules mach64_drv.o mach64_dri.o
after starting the debugger to load symbols, etc.

xfree86-gdb is freely available from here.


4.2. How do I do benchmarking with Unreal Tournament?

Start a practice level. Type timedemo 1 in the console or, alternatively, select the "TimeDemo Statistic" menu entry under the "Tools" section.

You should see two numbers in white text on the right side of the screen showing the average frame rate (Avg) and the frame rate over the last second (Last Sec). If this doesn't work, check whether the stats get reported in your ~/.loki/ut/System/UnrealTournament.log file.


4.3. Is there any way for us to detect which features are implemented with hardware support for a given driver?

OpenGL doesn't have such a query. This is a potential problem with any OpenGL implementation. The real question one wants answered is "is this feature or GL state combination fast enough for my needs?". Whether a feature is implemented in hardware or software doesn't always answer that question.

You might consider implementing a benchmark function to test the speed during start-up and making a decision depending on the result. The info could be cached in a file keyed by GL_RENDERER.

Check isfast
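The start-up benchmark suggested above might look like the following sketch. It assumes the caller has a current context, passes in the string returned by glGetString(GL_RENDERER), and supplies a callback that exercises the feature being tested. The cache format (one renderer, tab, rate line per entry), the iteration count and all function names are hypothetical, and isfast itself is not necessarily implemented this way.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define BENCH_ITERATIONS 1000 /* arbitrary for this sketch */

/* Call op() `iterations` times and return the achieved calls per second.
 * A real GL test should glFinish() inside op so the GPU work is timed. */
static double measure_rate(void (*op)(void), int iterations)
{
    assert(op != NULL && iterations > 0);
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iterations; i++)
        op();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    if (secs <= 0.0)
        secs = 1e-9; /* guard against a zero-length measurement */
    return iterations / secs;
}

/* Find renderer in a cache of "renderer\trate\n" lines; 0 if found. */
static int cache_lookup(FILE *f, const char *renderer, double *rate)
{
    char line[512];
    rewind(f);
    while (fgets(line, sizeof line, f) != NULL) {
        char *tab = strchr(line, '\t');
        if (tab == NULL)
            continue;
        *tab = '\0';
        if (strcmp(line, renderer) == 0 && sscanf(tab + 1, "%lf", rate) == 1)
            return 0;
    }
    return -1;
}

/* Append a measured rate for renderer to the cache. */
static void cache_store(FILE *f, const char *renderer, double rate)
{
    fseek(f, 0, SEEK_END);
    fprintf(f, "%s\t%f\n", renderer, rate);
    fflush(f);
}

/* Benchmark only on a cache miss; nonzero if rate >= min_rate. */
static int feature_is_fast(FILE *cache, const char *renderer,
                           void (*op)(void), double min_rate)
{
    double rate;
    if (cache_lookup(cache, renderer, &rate) != 0) {
        rate = measure_rate(op, BENCH_ITERATIONS);
        cache_store(cache, renderer, rate);
    }
    return rate >= min_rate;
}
```

Keying the cache by GL_RENDERER means the benchmark reruns automatically when the user changes cards or drivers, while normal start-ups skip it.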


4.4. Which OpenGL benchmarking program can I use to test and compare various facets of the performance of graphics cards?

Games

You can use OpenGL games such as Quake 3, Unreal Tournament, etc.

SPECviewperf

SPECviewperf is a portable OpenGL performance benchmark program written in C providing a vast amount of flexibility in benchmarking OpenGL performance.

SPECglperf

SPECglperf is an executable toolkit that measures the performance of OpenGL 2D and 3D graphics operations. These operations are low-level primitives (points, lines, triangles, pixels, etc.) rather than entire models.

glean

glean is a suite of tools for evaluating the quality of an OpenGL implementation and diagnosing any problems that are discovered. It also has the ability to compare two OpenGL implementations and highlight the differences between them.

machtest

machtest is a thorough benchmark for graphics cards. It has literally thousands of command line options, is easily extensible, and can produce machine-readable output.

Mesa demos


4.5. How should I report bugs?

Please submit bugs through the bug tracking system in SourceForge. It's the only way we can keep track of all of them. Write up one problem in each bug report. It's best if you can create a small example that shows what you think is the problem.

For those who really want to be Open Source heroes: you can create a test for the bug under glean. The intention would be to run glean quite often, so any functionality you can verify there is much less likely to reappear in a broken form at some random time in the future.


5. Hardware Specific

5.1. AGP

5.1.1. What is AGP?

AGP is a dedicated high-speed bus that allows the graphics controller to move large amounts of data directly from system memory. It uses a Graphics Address Re-Mapping Table (GART) to provide a physically contiguous view of scattered pages in system memory for DMA transfers.
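The GART's job can be illustrated with a toy model: the aperture is divided into pages, and each table entry holds the physical address of the system-memory page that backs one aperture page, so the device sees one contiguous range while the pages themselves are scattered. The sketch assumes 4 kB pages and made-up addresses; real GART entry formats are chipset-specific and carry valid bits and other flags not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GART_PAGE_SHIFT 12u                                /* 4 kB pages (assumed) */
#define GART_PAGE_MASK  ((1u << GART_PAGE_SHIFT) - 1u)

typedef struct {
    uint64_t aperture_base;  /* bus address where the aperture starts */
    const uint64_t *entries; /* physical page address per aperture page; 0 = unbound */
    size_t num_entries;
} gart_t;

/* Translate an aperture bus address into the system-memory address it
 * is remapped to; returns 0 if outside the aperture or not bound. */
static uint64_t gart_translate(const gart_t *g, uint64_t bus_addr)
{
    assert(g != NULL);
    if (bus_addr < g->aperture_base)
        return 0;
    uint64_t offset = bus_addr - g->aperture_base;
    uint64_t index = offset >> GART_PAGE_SHIFT;
    if (index >= g->num_entries || g->entries[index] == 0)
        return 0;
    return g->entries[index] | (offset & GART_PAGE_MASK);
}
```

Consecutive aperture pages may be backed by physical pages that are nowhere near each other, which is what lets the graphics chip DMA from a buffer assembled out of scattered system pages.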

Also check the Intel 440BX AGPset System Address Map


5.1.2. Where can I get more info about AGP?

http://www.agpforum.org/faq_ans.htm

http://developer.intel.com/technology/agp/agp_index.htm


5.1.3. Why not use the existing XFree86 AGP manipulation calls?

You have to understand that the DRI functions have a different purpose than the ones in XFree86. The DRM has to know about AGP, so it talks to the AGP kernel module itself. It has to be able to protect certain regions of AGP memory from the client-side 3D drivers, yet it has to export some regions of it as well. While most of this functionality (most, not all) can be accomplished with the /dev/agpgart interface, it makes sense to use the DRM's current authentication mechanism. This means that there is less complexity on the client side. If we used /dev/agpgart, then the client would have to open two devices, authenticate to both of them, and make half a dozen calls to agpgart, instead of dealing only with the DRM device.

Note

As a side note, the XFree86 calls were written after the DRM functions.

Also, to answer a previous question about not using XFree86 calls for memory mapping: under most OSes (probably Solaris as well), XFree86's functions will only work for root-privileged processes. The whole point of the DRI is to allow processes that can connect to the X server to do some form of direct-to-hardware rendering. If we limited ourselves to using XFree86's functionality, we would not be able to do this. We don't want everyone to be root.


5.1.4. How do I use AGP?

You can also use this test program as a bit more documentation as to how agpgart is used.


5.1.5. How to allocate AGP memory?

Generally programs do the following:

  1. open /dev/agpgart

  2. ioctl(ACQUIRE)

  3. ioctl(INFO) to determine the amount of memory for AGP

  4. mmap the device

  5. ioctl(SETUP) to set the AGP mode

  6. ioctl(ALLOCATE) a chunk of memory, specifying its size in pages

  7. ioctl(BIND) that same chunk of memory, specifying its offset in the aperture

Every time you update the GATT, you have to flush the cache and/or TLBs. This is expensive. Therefore, you allocate and bind the pages you'll use, and mmap() just returns the right pages when needed.

Then you need a mapping of the AGP aperture in the kernel that you can access. Use ioremap to do that.

After that you have access to the AGP memory. You probably want to make sure that there is a write-combining MTRR over the aperture. There is code in mga_drv.c in our kernel directory that shows you how to do that.
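The user-space steps 1 to 7 listed above can be sketched against the Linux agpgart interface. This assumes the <linux/agpgart.h> header with its AGPIOC_* ioctls and the agp_info, agp_setup, agp_allocate and agp_bind types; the function name, the choice of AGP mode (simply the bridge's reported mode) and the page counts are illustrative, and the program needs permission to open /dev/agpgart. The kernel-side ioremap and MTRR setup are not covered.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/agpgart.h>

/* Allocate pg_count pages and bind them at aperture page pg_start.
 * On success returns the agpgart fd (the backend stays acquired) and
 * stores the mapping of the whole aperture in *aperture and its size in
 * *aperture_bytes. Returns -1 on failure, undoing what was done. */
static int agp_alloc_and_bind(const char *dev, size_t pg_count, off_t pg_start,
                              void **aperture, size_t *aperture_bytes)
{
    assert(dev != NULL && aperture != NULL && aperture_bytes != NULL);

    int fd = open(dev, O_RDWR);                          /* 1. open */
    if (fd < 0)
        return -1;
    if (ioctl(fd, AGPIOC_ACQUIRE) != 0)                  /* 2. acquire */
        goto fail_close;

    agp_info info;
    if (ioctl(fd, AGPIOC_INFO, &info) != 0)              /* 3. query */
        goto fail_release;
    *aperture_bytes = info.aper_size * 1024 * 1024;      /* aper_size is in MB */

    *aperture = mmap(NULL, *aperture_bytes, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);                 /* 4. map */
    if (*aperture == MAP_FAILED)
        goto fail_release;

    agp_setup setup = { .agp_mode = info.agp_mode };     /* 5. set mode */
    if (ioctl(fd, AGPIOC_SETUP, &setup) != 0)
        goto fail_unmap;

    agp_allocate alloc_req = { .pg_count = pg_count, .type = 0 }; /* 6. allocate */
    if (ioctl(fd, AGPIOC_ALLOCATE, &alloc_req) != 0)
        goto fail_unmap;

    agp_bind bind_req = { .key = alloc_req.key, .pg_start = pg_start }; /* 7. bind */
    if (ioctl(fd, AGPIOC_BIND, &bind_req) != 0) {
        ioctl(fd, AGPIOC_DEALLOCATE, alloc_req.key);
        goto fail_unmap;
    }
    return fd;

fail_unmap:
    munmap(*aperture, *aperture_bytes);
fail_release:
    ioctl(fd, AGPIOC_RELEASE);
fail_close:
    close(fd);
    return -1;
}
```

Once bound, the chunk is visible at byte offset pg_start times the page size inside the mapped aperture, which is why the controlling process chooses pg_start itself.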


5.1.6. If one has to insert pages, one needs to check for -EBUSY errors and loop through the entire GTT. Wouldn't it be better if the driver filled in pg_start of the agp_bind structure instead of the user?

All this allocation should be done by only one process. If you need memory in the GTT you should be asking the Xserver for it (or whatever your controlling process is). Things are implemented this way so that the controlling process can know intimate details of how memory is laid out. This is very important for the I810, since you want to set tiled memory on certain regions of the aperture. If you made the kernel do the layout, then you would have to create device-specific code in the kernel to make sure that the backbuffer/dcache are aligned for tiled memory. This adds complexity to the kernel that doesn't need to be there, and imposes restrictions on what you can do with AGP memory. Also, the current Xserver implementation (4.0) actually locks out other applications from adding to the GTT. While the Xserver is active, the Xserver is the only one who can add memory. Only the controlling process may add things to the GTT, and while a controlling process is active, no other application can be the controlling process.

Microsoft's VGART does things the way you are describing, I believe. I think it's bad design. It enforces a policy on whoever uses it, and is not flexible. When you are designing low-level system routines, I think it is very important to make sure your design has the minimum of policy. Otherwise, when you want to do something different, you have to change the interface or create custom drivers for each application that needs to do things differently.


5.1.7. How does the DMA transfer mechanism work?

Here's a proposal for a zero-ioctl (best case) DMA transfer mechanism.

Let's call it 'kernel ringbuffers'. The premise is to replace the calls to the 'fire-vertex-buffer' ioctl with code to write to a client-private mapping shared by the kernel (like the current sarea, but for each client).

Starting from the beginning:

  • Each client has a private piece of AGP memory, into which it will put secure commands (typically vertices and texture data). The client may expand or shrink this region according to load.

  • Each client has a shared user/kernel region of cached memory. (Per-context sarea). This is managed like a ring, with head and tail pointers.

  • The client emits vertices to AGP memory (as it currently does with DMA buffers).

  • When a statechange, clear, swap, flush, or other event occurs, the client:

    • Grabs the hardware lock.

    • Re-emits any invalidated state to the head of the ring.

    • Emits a command to fire the portion of AGP space as vertices.

    • Updates the head pointer in the ring.

    • Releases the lock.

  • The kernel is responsible for processing all of the rings. Several events might cause the kernel to examine active rings for commands to be dispatched:

    • A flush ioctl. (Called by impatient clients)

    • A periodic timer. (If this is low overhead?)

    • An interrupt previously emitted by the kernel. (If timers don't work)
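The per-client ring in this proposal is a single-producer, single-consumer ring: the client writes commands and advances the head, and the kernel consumes from the tail whenever one of the events above prompts it. Below is a single-process sketch with hypothetical names; it assumes a power-of-two ring of 32-bit command words and leaves out the shared user/kernel mapping, the hardware lock and the memory barriers a real implementation would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* entries */
_Static_assert((RING_SIZE & (RING_SIZE - 1u)) == 0, "RING_SIZE must be a power of two");

typedef struct {
    uint32_t cmds[RING_SIZE];
    uint32_t head; /* advanced by the client after writing a command */
    uint32_t tail; /* advanced by the kernel after dispatching one */
} cmd_ring_t;

/* Indices run freely and wrap at 2^32; head - tail is the queue length. */
static uint32_t ring_used(const cmd_ring_t *r)
{
    return r->head - r->tail;
}

/* Client side: queue one command word; false if the ring is full. */
static bool ring_emit(cmd_ring_t *r, uint32_t cmd)
{
    if (ring_used(r) == RING_SIZE)
        return false;
    r->cmds[r->head & (RING_SIZE - 1u)] = cmd;
    r->head++; /* a shared ring would need a write barrier before this */
    return true;
}

/* Kernel side: move up to max queued commands to out, which stands in
 * for the hardware FIFO; returns the number dispatched. */
static size_t ring_dispatch(cmd_ring_t *r, uint32_t *out, size_t max)
{
    assert(out != NULL || max == 0);
    size_t n = 0;
    while (n < max && r->tail != r->head) {
        out[n++] = r->cmds[r->tail & (RING_SIZE - 1u)];
        r->tail++; /* frees the slot for the client */
    }
    return n;
}
```

Because only the client writes head and only the kernel writes tail, neither side needs a lock to update its own index, which is what makes the zero-ioctl best case possible.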

Additionally, for those who've been paying attention, you'll notice that some of the assumptions that we use currently to manage hardware state between multiple active contexts are broken if client commands to hardware aren't executed serially in an order which is knowable to the clients. Otherwise, a client that grabs the heavy lock doesn't know what state has been invalidated or textures swapped out by other clients.

This could be solved by keeping per-context state in the kernel and implementing a proper texture manager. That's something we need to do anyway, but it's not a requirement for this mechanism to work.

Instead, force the kernel to fire all outstanding commands on client ringbuffers whenever the heavyweight lock changes hands. This provides the same serialized semantics as the current mechanism, and also simplifies the kernel's task as it knows that only a single context has an active ring buffer (the one last to hold the lock).

An additional mechanism is required to allow clients to know which pieces of their AGP buffer are pending execution by the hardware, and which pieces of the buffer are available to be reused. This is also exactly what NV_vertex_array_range requires.
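One common way to provide such a mechanism is a monotonically increasing sequence number (an "age"): the client tags each piece of its AGP buffer with the age it emitted alongside the commands that use it, and the kernel or hardware writes back the last age it has completed. A minimal sketch with hypothetical names, assuming the completed counter is visible to the client through shared memory; comparisons use a signed difference so they stay correct when the counter wraps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t last_emitted;   /* client: age of the most recent submission */
    uint32_t last_completed; /* written back by the kernel or hardware */
} age_tracker_t;

/* Client: stamp a new submission and return its age. */
static uint32_t age_emit(age_tracker_t *t)
{
    return ++t->last_emitted;
}

/* Kernel/hardware: record that everything up to `age` has executed. */
static void age_retire(age_tracker_t *t, uint32_t age)
{
    assert((int32_t)(age - t->last_completed) >= 0); /* ages only move forward */
    t->last_completed = age;
}

/* A buffer piece tagged with `age` may be reused once that age has
 * completed; the signed difference handles wraparound. */
static bool age_is_idle(const age_tracker_t *t, uint32_t age)
{
    return (int32_t)(t->last_completed - age) >= 0;
}
```

The client can then recycle any piece of AGP memory whose age is idle without asking the kernel, and only has to wait (or flush) when the piece it wants is still pending.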


5.2. ATI Cards

5.2.1. How do I get ATI card specifications?

http://www.ati.com/na/pages/resource_centre/dev_rel/devrel.html


5.2.2. Mach64 based cards

5.2.2.1. I would like to help developing the Mach64 driver...

The first step would be to check out the current mach64 branch from dri CVS, the tag is 'mach64-0-0-2-branch.' Follow the instructions on dri.sf.net to compile and install the tree. A couple of things you need to know are:

1. Make sure to check out the branch, not the head (use '... co -r mach64-0-0-2-branch xc')

2. You need libraries and headers from a full X install. I used lndir to add symlinks from /usr/X11R6/include and /usr/X11R6/lib into /usr/X11R6-DRI.

You'll need to have AGP support for your chipset configured in your kernel and have the module loaded before starting X (assuming you build it as a module). At this point, you need agpgart for the driver to load, but AGP isn't currently used by the driver yet.

Take a look at the code, the list archives and the DRI documentation on dri.sf.net (it's a little stale, but a good starting point). We are also using the driver from the Utah-GLX project as a guide, so you might want to check that out (utah-glx.sf.net). Many of us have documentation from ATI as well; you can apply to their developer program for docs at http://apps.ati.com/developers/devform1.asp

Our first priority right now is to get the 3D portion of the driver using DMA transfers (GUI mastering) rather than direct register programming. Frank Earl is currently working on this. Then we need to get the 2D driver to honor the DRM locking scheme so we can enable 2D acceleration, which is currently disabled. Right now switching back to X from a text console or switching modes can cause a lockup because 2D and 3D operations are not synchronized. Also on the todo list is using AGP for texture uploads and finishing up the Mesa stuff (e.g. getting points and lines working, alpha blending...).


5.3. 3DFX

5.3.1. What's the relationship between Glide and DRI?

Right now the picture looks like this:

Client -> OpenGL/GLX -> Glide as HAL (DRI) -> hw

In this layout Glide (DRI) is really a hardware abstraction layer. The only API exposed is OpenGL, and Glide (DRI) only works with OpenGL. It isn't useful by itself.

There are a few Glide-only games, and 3dfx would like to see those work. The current solution, shown above, doesn't work for them since the Glide API isn't available. Instead we need:

Client -> Glide as API (DRI) -> hw

Right now Mesa does a bunch of the DRI work, and then hands that data down to Glide. Also Mesa does all the locking of the hardware. If we're going to remove Mesa, then Glide now has to do the DRI work, and we have to do something about the locking.

The solution is actually a bit more complicated. Glide wants to use all the memory as well. We don't want the X server to draw at all. Glide will turn off drawing in the X server and grab the lock and never let it go. That way no other 3D client can start up and the X server can still process keyboard events and such for you. When the Glide app goes away we just force a big refresh event for the whole screen.

I hope that explains it. We're really not trying to encourage people to use the Glide API, it is just to allow those existing games to run. We really want people to use OpenGL directly.

Another interesting project that a few people have discussed is removing Glide from the picture altogether and just letting Mesa send the actual commands to the hardware. That's the way most of our drivers were written. It would simplify the install process (you don't need Glide separately) and it might improve performance a bit, and since we're only doing this for one type of hardware (Voodoo3+), Glide isn't doing that much as a hardware abstraction layer. It's some work. There are about 50 Glide calls we use and those aren't simple, but it might be a good project for a few people to tackle.


5.4. S3

5.4.1. Are there plans to enable the s3tc extension on any of the cards that currently support it?

There's not a lot we can do with S3TC because of S3's patent/license restrictions.

Normally, OpenGL implementations would do software compression of textures and then send them to the board. The patent seems to prevent that, so we're staying away from it.

If an application has compressed textures (it compressed them itself or they were compressed offline), we can download the compressed textures to the board. Unfortunately, that's of little use since most applications don't work that way.
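Because a precompressed texture is an opaque run of bytes, the driver only needs to know its size to download it unchanged. The S3TC formats store texels in 4x4 blocks, 8 bytes per block for DXT1 and 16 bytes for DXT3 and DXT5. A minimal sketch of that size calculation follows; the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three common S3TC variants. */
enum s3tc_format { S3TC_DXT1, S3TC_DXT3, S3TC_DXT5 };

/* Size in bytes of one mipmap level of an S3TC-compressed image.
 * Width and height are rounded up to whole 4x4 blocks. */
size_t s3tc_image_size(enum s3tc_format fmt, size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    size_t block_bytes = (fmt == S3TC_DXT1) ? 8 : 16;
    size_t blocks_x = (width + 3) / 4;
    size_t blocks_y = (height + 3) / 4;
    return blocks_x * blocks_y * block_bytes;
}
```

For example, a 256x256 DXT1 image is 64 x 64 blocks of 8 bytes, or 32768 bytes, which the driver can copy to the board without ever decompressing or recompressing it.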


6. Miscellaneous Questions

6.1. How about the main X drawing surface? Are 2 extra "window sized" buffers allocated for primary and secondary buffers in a page-flipping configuration?

Right now, we don't do page flipping at all. Everything is a blit from back to front. The biggest problem with page flipping is detecting when you're in full screen mode, since OpenGL doesn't really have a concept of full screen mode. We want a solution that works for existing games. So we've been designing a solution for it. It should get implemented fairly soon since we need it for antialiasing on the V5.

In the current implementation the X front buffer is the 3D front buffer. When we do page flipping we'll continue to do the same thing. Since you have an X window that covers the screen, it is safe for us to use the X surface's memory. Then we'll do page flipping. The only issue will be falling back to blitting if the window ever stops covering the whole screen.
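The flip-or-blit decision described above can be sketched as a simple coverage test. This is a hypothetical helper, not DRI code, and it simplifies by ignoring other windows stacked on top of the 3D window, which a real implementation would also have to check.

```c
#include <assert.h>
#include <stdbool.h>

struct rect { int x, y, w, h; };

enum swap_method { SWAP_BLIT, SWAP_PAGE_FLIP };

/* True if the window rectangle covers the whole screen. */
static bool covers_screen(struct rect win, int screen_w, int screen_h)
{
    return win.x <= 0 && win.y <= 0 &&
           win.x + win.w >= screen_w && win.y + win.h >= screen_h;
}

/* Page flipping reuses the X front buffer, which is only safe when the
 * window covers the entire screen; otherwise fall back to a back-to-front blit. */
enum swap_method choose_swap_method(struct rect win, int screen_w, int screen_h)
{
    assert(screen_w > 0 && screen_h > 0);
    return covers_screen(win, screen_w, screen_h) ? SWAP_PAGE_FLIP : SWAP_BLIT;
}
```

A driver would re-evaluate this whenever the window moves or is resized, so that it drops back to blitting as soon as the window no longer covers the screen.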


6.2. Is anyone working on adding SSE support to the transform/lighting code in Mesa?

SSE stuff was somewhat broken in the kernels until recently. In fact, we (Gareth Hughes to be precise) just submitted a big kernel patch that should fully support SSE. I don't know if anyone is working on SSE for Mesa; I haven't seen much in that area lately.

I'd start with profiling your app against the current Mesa base, to decide where the optimization effort should go. I'm not convinced SSE is the next right step. There may be more fundamental optimizations to do first. We haven't spent much time on optimizing it.


6.3. How often are checks done to see if things need to be clipped/redrawn/redisplayed?

The locking system is designed to be highly efficient. It is based on a two tiered lock. Basically it works like this:

The client wants the lock. It uses the CAS (compare and swap) instruction. If the client was the last application to hold the lock, you're done and you move on.

If it wasn`t the last one, then we use an IOCTL to the kernel to arbitrate the lock.

In this case some or all of the state on the card may have changed. The shared memory carries a stamp number for the X server. When the X server does a window operation it increments the stamp. If the client sees that the stamp has changed, it uses a DRI X protocol request to get the new window location and clip rects. This only happens on a window move. Assuming your clip rects/window position hasn't changed, the redisplay happens entirely in the client.

The client may have other state to restore as well. In the case of the tdfx driver we have three more flags: command FIFO invalid, 3D state invalid, and textures invalid. If any of those are set, the corresponding state is restored.

So, if the X server wakes up to process input, it currently grabs the lock but doesn't invalidate any state. I'm actually fixing this now so that it doesn't grab the lock for input processing.

If the X server draws, it grabs the lock and invalidates the command fifo.

If the X server moves a window, it grabs the lock, updates the stamp, and invalidates the command fifo.

If another 3D app runs, it grabs the lock, invalidates the command fifo, invalidates the 3D state and possibly invalidates the texture state.
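A minimal sketch of the two-tiered scheme, written with C11 atomics. The lock word layout, the `shared_area` and `client` structures, and `kernel_get_lock()` (a stand-in for the real ioctl) are all hypothetical. The sketch shows the CAS fast path, the kernel slow path, and the stamp and invalidation-flag checks that follow a contended acquisition.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock word: low bits hold the id of the last context to own
 * the lock, the top bit says whether it is currently held. */
#define LOCK_HELD 0x80000000u

/* Hypothetical invalidation flags, set by whoever else touched the hardware. */
#define DIRTY_FIFO     (1u << 0)
#define DIRTY_3D       (1u << 1)
#define DIRTY_TEXTURES (1u << 2)

struct shared_area {
    _Atomic uint32_t lock;
    uint32_t window_stamp;   /* bumped by the X server on window operations */
    uint32_t dirty;          /* DIRTY_* flags */
};

struct client {
    uint32_t ctx;            /* this client's context id (nonzero) */
    uint32_t last_stamp;
    int slow_paths, cliprect_refreshes, restores;
};

/* Stand-in for the kernel ioctl; a real kernel would block until the lock is free. */
static void kernel_get_lock(struct shared_area *sa, uint32_t ctx)
{
    assert(!(atomic_load(&sa->lock) & LOCK_HELD));  /* sketch: assume uncontended */
    atomic_store(&sa->lock, ctx | LOCK_HELD);
}

void client_lock(struct shared_area *sa, struct client *c)
{
    uint32_t expected = c->ctx;  /* free, and we were the last holder */
    if (atomic_compare_exchange_strong(&sa->lock, &expected, c->ctx | LOCK_HELD))
        return;                  /* fast path: nobody else touched the hardware */

    kernel_get_lock(sa, c->ctx);
    c->slow_paths++;
    if (sa->window_stamp != c->last_stamp) {
        c->cliprect_refreshes++; /* would send the DRI protocol request here */
        c->last_stamp = sa->window_stamp;
    }
    if (sa->dirty & (DIRTY_FIFO | DIRTY_3D | DIRTY_TEXTURES))
        c->restores++;           /* would re-emit the flagged state here */
    sa->dirty = 0;               /* simplification, see the note below */
}

void client_unlock(struct shared_area *sa, struct client *c)
{
    /* Leave our id in the word so our next CAS can take the fast path. */
    atomic_store(&sa->lock, c->ctx);
}
```

Clearing the shared `dirty` word after restoring is only correct with a single 3D client; with several clients a driver has to track which context last owned each piece of hardware state.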


6.4. What's the deal with fullscreen and DGA?

The difference between running in a window and running fullscreen is actually quite minor. When you're in fullscreen mode, what you've done is zoomed in your desktop and moved the zoomed portion to cover just the window (just the same as hitting Ctrl-Alt-Plus and Ctrl-Alt-Minus). The game still runs in a window in either case, so the behavior shouldn't be any different.

DGA is turned off in the current configuration. I've just started adding those features back in. The latest code in the trunk has some support for DGA but is broken; the code in my tdfx-1-1 branch should be working.


6.5. DRI without X

6.5.1. Can DRI run without X?

The first question you have to ask is whether you really should throw out X11. X11 does event handling, multiple windows, etc. It can also be made quite lightweight: it's running in 600k on an iPAQ.

If you decide you do need to throw out X, then you have to ask yourself if the DRI is the right solution for your problem. The DRI handles multiple 3D clients accessing hardware at the same time, sharing the same screen, in a robust and secure manner. If you don't need those properties the DRI isn't necessarily the right solution for you.

If you get to this point, then it would be theoretically possible to remove the DRI from X11 and have it run without X11. There's no code to support that at this point, so it would require some significant work to do that.


6.5.2. What would be the advantages of a non-X version of DRI?

The main reasons one would be interested in a non-X version of DRI are listed below as Pros, each followed by a reply (Cons):

Pros: Eliminate any performance bottlenecks the XServer may be causing. Since we are 3D only, any extraneous locking/unlocking, periodic refreshes of the (hidden) 2D portion of the display, etc., will cause unexpected slowdowns.

Cons: If the X server never does any drawing then the overhead is minimal. Locking is done around Glide operations. A lock check is a single Compare And Swap (CAS) instruction. Assuming your 3D window covers the X root, there are no 2D portions to redisplay.

Pros: Eliminate wasted system memory requirements.

Cons: Yes, there will be some resources from the X server, but I think not much.

Pros: Eliminate on-card font/pixmap/surface/etc caches that just waste memory.

Cons: If you don't use them they aren't taking any resources. Right now, there is a small pixmap cache that's statically allocated to 2D. Removing that is a trivial code change. Making it dynamic (3D steals it away from 2D) is not too tough and is a better solution than any static allocation.

Pros: Eliminate the need for extra peripherals, such as mice.

Cons: Allowing operation without a mouse should be trivial, if it isn't a configuration option already.

Pros: Reduction in the amount of software necessary to install/maintain on a customer's system. Certainly none of my customers would have been able to install XFree 4.0 on their own.

Cons: XFree 4.0 installs with appropriate packaging are trivial. What you're saying is that no one has done the packaging work for you, and that's correct. If you create your own stripped DRI version you'll be in for a lot more packaging work on your own.

The impact of the X server on the 3D graphics pipeline is very small; essentially none. It uses some resources, but I think not many. Realize the CAS is in the driver, so you're looking at creating a custom version of that as well. I think the effort spent avoiding the CAS, creating your own window system binding for GL, and moving the DRI functionality out of the X server would be much better spent optimizing Mesa and the driver instead. You have to focus resources where they provide the biggest payoff.


6.5.3. I would like to make X11-free access to 3D...

Take a look at the fbdri project. They're trying to get the DRI running directly on a console with the Radeon. If all you want is one window and only running OpenGL then this makes sense.

I'll throw out another option: make the DRI work with TinyX. TinyX runs in 600k and gives you all of X. It should be a fairly straightforward project. As soon as you want more than one window, it makes a lot of sense to use the X framework that already exists.


6.6. DRI driver initialization process

		I've been going through the DRI code and documents trying to figure out how
		all this stuff works.  I decided to start at the start, so I've been looking
		at the driver initialization process.  Boys and girls, when you're going
		through a huge chunk of code like the DRI, cscope is your friend!  I've
		taken some notes along the way, which are included in the bottom of this
		message.
		
		Could the people who actually know look at my notes and correct me where I'm
		wrong?
		
		- The whole process begins when an application calls glXCreateContext
		  (lib/GL/glx/glxcmds.c).  glXCreateContext is just a stub that calls
		  CreateContext.  The real work begins when CreateContext calls
		  __glXInitialize (lib/GL/glx/glxext.c).
		
		- The driver specific initialization process starts with __driCreateScreen.
		  Once the driver is loaded (via dlopen), dlsym is used to get a pointer to
		  this function.  The function pointer for each driver is stored in the
		  createScreen array in the __DRIdisplay structure.  This initialization is
		  done in driCreateDisplay (lib/GL/dri/dri_glx.c), which is called by
		  __glXInitialize.
		
		I should also point out, to make it clear, that __driCreateScreen() really
		is the bootstrap of a DRI driver.  It's the only* function in a DRI driver
		that libGL directly knows about.  All the other DRI functions are accessed via
		the __DRIdisplayRec, __DRIscreenRec, __DRIcontextRec and __DRIdrawableRec
		structs (defined in glxclient.h).  Those structs are pretty well documented in
		that file.
		
		*Footnote: that's not really true; there's also the __driRegisterExtensions()
		function that libGL uses to implement glXGetProcAddress().  That's another
		long story.
		
		
		- After performing the __glXInitialize step, CreateContext calls the
		  createContext function for the requested screen.  Here the driver creates
		  two data structures.  The first, GLcontext (extras/Mesa/src/mtypes.h),
		  contains all of the device independent state, device dependent constants
		  (i.e., texture size limits, light limits, etc.), and device dependent
		  function tables.  The driver also allocates a structure that contains all
		  of the device dependent state.  The GLcontext structure links to the
		  device dependent structure via the DriverCtx pointer.  The device
		  dependent structure also has a pointer back to the GLcontext structure.
		
		  The device dependent structure is where the driver will store context
		  specific hardware state (register settings, etc.) for when
		  context (in terms of OpenGL / X context) switches occur.  This structure is
		  analogous to the buffers where the OS stores CPU state when a program
		  context switch occurs.
		
		
			The texture images really are stored within Mesa's
			data structures.  Mesa supports about a dozen texture formats which
			happen to satisfy what all the DRI drivers need.  So, the texture format/
			packing is dependent on the hardware, but Mesa understands all the
			common formats.  See Mesa/src/texformat.h.  Gareth and Brian spent a lot of
			time on that.
		
		
		- createScreen (i.e., the driver specific initialization function) is called
		  for each screen from AllocAndFetchScreenConfigs (lib/GL/glx/glxext.c).
		  This is also called from __glXInitialize.
		
		- For all of the existing drivers, the __driCreateScreen function is just a
		  wrapper that calls __driUtilCreateScreen (lib/GL/dri/dri_util.c) with a
		  pointer to the driver's API function table (of type __DriverAPIRec).  This
		  creates a __DRIscreenPrivate structure for the display and fills it in
		  (mostly) with the supplied parameters (i.e., screen number, display
		  information, etc.).  
		  
		  It also opens and initializes the connection to the DRM.  This includes
		  opening the DRM device, mapping the frame buffer (note: the DRM
		  documentation says that the function used for this is called drmAddMap, but
		  it is actually called drmMap), and mapping the SAREA.  The final step is
		  to call the driver initialization function for the driver (from the
		  InitDriver field in the __DriverAPIRec, which is the DriverAPI field of the
		  __DRIscreenPrivate).
		
		- The InitDriver function does (at least in the Radeon and i810 drivers) two
		  broad things.  It first verifies the version of the services (XFree86,
		  DDX, and DRM) that it will use.  In the two drivers that I examined, this
		  code was exactly the same and could probably be moved to dri_util.[ch]. 
		  
		  (I'd look at more drivers first to see if factoring out that code is really
		a good idea.)
		
		  The driver then creates an internal representation of the screen and
		  stores it (the pointer to the structure) in the private field of the
		  __DRIscreenPrivate structure.  The driver-private data may include things
		  such as mappings of MMIO registers, mappings of display and texture
		  memory, information about the layout of video memory, chipset version
		  specific data (feature availability for the specific chip revision, etc.),
		  and other similar data.  This is the handle that identifies the specific
		  graphics card to the driver (in case there is more than one card in the
		  system that will use the same driver).
		
		- After performing the __glXInitialize step, CreateContext calls the
		  createContext function for the requested screen.  This is where it gets
		  pretty complicated.  I have only looked at the Radeon driver.
		  radeonCreateContext (lib/GL/mesa/src/drv/radeon/radeon_context.c)
		  allocates a GLcontext structure (actually 'struct __GLcontextRec' from
		  extras/Mesa/src/mtypes.h).  Here it fills in function tables for virtually
		  every OpenGL call.  Additionally, the __GLcontextRec has pointers to
		  buffers where the driver will store context specific hardware state
		  (textures, register settings, etc.) for when context (in terms of
		  OpenGL / X context) switches occur. 
		
		The __GLcontextRec (i.e. GLcontext in Mesa) doesn't have any buffers
		of hardware-specific data (except texture image data if you want to be
		picky).  All Radeon-specific, per-context data should be hanging off
		of the struct radeon_context.
		
		All the DRI drivers define a hardware-specific context structure
		(such as struct radeon_context, typedef'd to be radeonContextRec, or
		struct mga_context_t typedef'd to be mgaContext).
		
		radeonContextRec has a pointer back to the Mesa __GLcontextRec.
		And Mesa's __GLcontextRec->DriverCtx pointer points back to the
		radeonContextRec.
		
		If we were writing all this in C++ (don't laugh) we'd treat Mesa's
		__GLcontextRec as a base class and create driver-specific derived
		classes from it.
		
		Inheritance like this is actually pretty common in the DRI code,
		even though it's sometimes hard to spot.
		
		
		These buffers are analogous to the
		  buffers where the OS stores CPU state when a program context switch occurs.
		
		Note that we don't do any fancy hardware context switching in our drivers.
		When we make-current a new context, we basically update all the hardware
		state with that new context's values.
		
		- When each of the function tables is initialized (see radeonInitSpanFuncs
		  for an example), an internal Mesa function is called.  This function
		  (e.g., _swrast_GetDeviceDriverReference) both allocates the buffer and
		  fills in the function pointers with the software fallbacks.  If a driver
		  were to just call these allocation functions and not replace any of the
		  function pointers, it would be the same as the software renderer.
		
		- The next part seems to start when the createDrawable function in the
		  __DRIscreenRec is called, but I don't see where this happens.
		
		createDrawable should be called via glXMakeCurrent() since that's the
		first time we're given an X drawable handle.  Somewhere during
		glXMakeCurrent() we use a DRI hash lookup to translate the X Drawable handle
		into a pointer to a __DRIdrawable.  If we get a NULL pointer that means
		we've never seen that handle before and now have to allocate the
		__DRIdrawable and initialize it (and put it in the hash table).
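The make-current lookup described in the last reply of these notes can be sketched as a small chained hash table keyed by the X drawable ID. The types and function names are hypothetical; the real code uses the DRI's own hash table and calls the driver's createDrawable hook where this sketch only records the ID.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins: an X drawable id and the client-side record. */
typedef uint32_t xid_t;

struct dri_drawable {
    xid_t id;
    struct dri_drawable *next;  /* hash chain */
    /* ... cliprects, stamp, driver-private pointer ... */
};

#define DRAWABLE_HASH_SIZE 64

struct drawable_table {
    struct dri_drawable *buckets[DRAWABLE_HASH_SIZE];
};

static struct dri_drawable *lookup(struct drawable_table *t, xid_t id)
{
    for (struct dri_drawable *d = t->buckets[id % DRAWABLE_HASH_SIZE]; d; d = d->next)
        if (d->id == id)
            return d;
    return NULL;
}

/* Look the handle up; only the first time it is seen, allocate a new
 * drawable, initialize it and put it in the hash table. */
struct dri_drawable *get_or_create_drawable(struct drawable_table *t, xid_t id)
{
    assert(t != NULL);
    struct dri_drawable *d = lookup(t, id);
    if (d)
        return d;
    d = calloc(1, sizeof *d);
    if (!d)
        return NULL;
    d->id = id;  /* a real driver would run its createDrawable hook here */
    d->next = t->buckets[id % DRAWABLE_HASH_SIZE];
    t->buckets[id % DRAWABLE_HASH_SIZE] = d;
    return d;
}
```

Every later make-current on the same X drawable then returns the existing record instead of allocating a new one.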
		
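The bootstrap step in these notes (load the driver with dlopen, then use dlsym to find __driCreateScreen) can be sketched with the POSIX dynamic-loading calls. The function-pointer type is a simplified assumption: the real __driCreateScreen takes display, screen, and configuration arguments that are omitted here, and `load_create_screen()` is a hypothetical helper rather than libGL code.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified signature for the driver's bootstrap function. */
typedef void *(*create_screen_fn)(int screen);

/* Open a driver module and resolve its single well-known entry point.
 * Returns NULL if the module or the symbol cannot be found. */
create_screen_fn load_create_screen(const char *driver_path)
{
    assert(driver_path != NULL);
    void *handle = dlopen(driver_path, RTLD_NOW | RTLD_GLOBAL);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return NULL;
    }
    /* POSIX-sanctioned way to convert dlsym's void * to a function pointer. */
    create_screen_fn fn;
    *(void **)&fn = dlsym(handle, "__driCreateScreen");
    if (!fn) {
        dlclose(handle);
        return NULL;
    }
    return fn;  /* the handle stays open for the lifetime of the driver */
}
```

On older glibc systems this needs to be linked with -ldl.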

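The context layout described in these notes, with Mesa's GLcontext pointing at the driver context through DriverCtx and the driver context pointing back, can be sketched with two toy structures. Everything except the DriverCtx field name is hypothetical, including the `toyContext` type, the `TOY_CONTEXT` cast macro, and the simulated register file. The make-current function reflects the remark that the drivers simply reload all hardware state instead of doing hardware context switching.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_REGS 4

/* Toy stand-in for Mesa's device-independent context. */
typedef struct gl_context {
    void *DriverCtx;          /* points at the driver-specific context */
    int max_texture_size;     /* example device-dependent constant */
} GLcontext;

/* Toy driver context ("derived class"). */
typedef struct toy_context {
    GLcontext *glCtx;              /* back pointer to the Mesa context */
    unsigned int hw_regs[NUM_REGS]; /* per-context copy of hardware state */
} toyContext;

/* Recover the driver context from a Mesa context. */
#define TOY_CONTEXT(ctx) ((toyContext *)(ctx)->DriverCtx)

/* Simulated register file of a single graphics chip. */
unsigned int chip_regs[NUM_REGS];

toyContext *toy_create_context(void)
{
    GLcontext *gl = calloc(1, sizeof *gl);
    toyContext *tc = calloc(1, sizeof *tc);
    if (!gl || !tc) {
        free(gl);
        free(tc);
        return NULL;
    }
    gl->max_texture_size = 1024;
    gl->DriverCtx = tc;       /* base -> derived */
    tc->glCtx = gl;           /* derived -> base */
    return tc;
}

/* No hardware context switching: make-current reloads every register. */
void toy_make_current(GLcontext *ctx)
{
    toyContext *tc = TOY_CONTEXT(ctx);
    memcpy(chip_regs, tc->hw_regs, sizeof chip_regs);
}

void toy_destroy_context(toyContext *tc)
{
    assert(tc && TOY_CONTEXT(tc->glCtx) == tc);
    free(tc->glCtx);
    free(tc);
}
```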

6.7. Utah-GLX

6.7.1. How similar/different are the driver architectures of Utah and DRI?

Utah is based on earlier Mesa code. Some of the work is the "DRI" work, and some is the "Mesa" work. The Mesa work will transfer over reasonably well. The DRI work is mostly initialization and kernel drivers.


6.7.2. Should one dig into the Utah source first, then go knee-deep in DRI, or the other way around?

You want to study a good DRI driver first and then move the Utah code over.


7. Authorship and Acknowledgments

This FAQ is compiled and maintained by José Fonseca, j_r_fonseca@yahoo.co.uk, with assistance and comments from the DRI developers mailing list subscribers.

Since it is impossible to get permission from every person quoted, if you are the author of any material here and don't want it reproduced, please contact the author and it will be promptly removed.

Glossary

3DFX

A company that produced the Voodoo Chipset, the first successful consumer 3D chipset for the IBM PC.

Application Programming Interface
(API)

Common name used to describe the interface a programmer uses to access a library. Common 3D APIs include OpenGL, Direct3D, QuickDraw3D, Renderman, Glide, and there are dozens of other lesser known ones. The two most popular real-time 3D APIs these days are OpenGL and Direct3D. Glide enjoyed a dominant but brief popularity on the PC platform in the late 1990s.

See Also: Direct3D, Glide, Open Graphics Library.

Direct3D
(D3D)

Microsoft's proprietary 3D API. It was originally written by RenderMorphics and was acquired by Microsoft in 1995. Direct3D is available only on the Windows series of operating systems. Direct3D has captured the mindshare amongst Windows developers, mostly due to the complete lack of any viable alternatives on the PC platform. Microsoft has been pushing Direct3D over OpenGL for many years, sometimes against the opposing wishes of Windows developers.

Accelerated Graphics Port
(AGP)

A dedicated high-speed bus that allows the graphics controller to move large amounts of data directly from system memory. It uses a Graphics Address Re-Mapping Table (GART) to provide a physically contiguous view of scattered pages in system memory for Direct Memory Access (DMA) transfers.

See Also: Direct Memory Access.
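The GART idea can be illustrated with a toy page table: the graphics chip sees one linear aperture, and each aperture page is backed by an arbitrary system page. The structure, the 4 KB page size, and the return-0-on-error convention are assumptions for this sketch, not a description of any particular chipset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GART_PAGE_SIZE 4096u

/* Hypothetical GART: entry i holds the physical address of the system page
 * backing page i of the contiguous aperture seen by the graphics chip. */
struct gart {
    const uint64_t *entries;
    size_t num_entries;
};

/* Translate an aperture-relative offset to a physical address.
 * Returns 0 if the offset is outside the mapped range (sketch convention). */
uint64_t gart_translate(const struct gart *g, uint32_t aperture_offset)
{
    assert(g != NULL);
    size_t page = aperture_offset / GART_PAGE_SIZE;
    if (page >= g->num_entries)
        return 0;
    return g->entries[page] + (aperture_offset % GART_PAGE_SIZE);
}
```

With pages backed by physical addresses 0x7000, 0x3000 and 0x9000, aperture offset 4096 + 16 lands at 0x3010: contiguous for the chip, scattered in system memory.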

Architecture Review Board
(ARB)

The group of companies (including SGI, IBM, Intel, ATI, Nvidia, Microsoft, and others) responsible for the continued refinement and improvement of the OpenGL specifications. It is largely because of the ARB that OpenGL remains a clean, consistent, portable, vendor-neutral API.

See Also: Open Graphics Library.

Direct Rendering

Direct Rendering lets an OpenGL client write directly to the 3D hardware. This is much faster than Indirect Rendering but it can be very complicated to achieve, especially on multi-user systems like UNIX.

See Also: Indirect Rendering.

Direct Memory Access
(DMA)

A facility of some architectures which allows a peripheral to read and write memory without intervention by the CPU. DMA is a limited form of bus mastering.

See Also: Accelerated Graphics Port.

Direct Rendering Interface
(DRI)

One of the problems with Direct Rendering is that other applications (say the X11 server, or other OpenGL clients) might want to talk to the hardware at the same time. In addition there can be problems with security, multiple users, and several other things caused by the complexity of UNIX. An intricate software design called the Direct Rendering Infrastructure coordinates everything and prevents problems. The DRI used on Linux was designed and implemented by Precision Insight (who were recently bought by VA Linux). The DRI only works on XFree86 4.0 or later.

See Also: Direct Rendering.

First-In, First-Out
(FIFO)

An approach to handling work requests from queues so that the oldest request is handled next.

See Also: Direct Rendering.

Direct Rendering Module
(DRM)

This is a kernel module that gives direct hardware access to DRI clients. Every video card has its own specially customised DRM. The DRM handles security, hardware access, protects the system from instability, etc.

See Also: Direct Rendering Interface.

Full Screen Anti Aliasing
(FSAA)

A technique which oversamples each pixel to achieve a more realistic image.

Glide

A 3D API developed by the 3DFX company. Glide only works with Voodoo based 3D cards and in many ways it is very similar to OpenGL. You can write an entire 3D application using Glide and it should (in theory) run on any Voodoo based 3D card. Glide is now open source.

See Also: 3DFX.

Graphics Library Utilities
(GLU)

OpenGL is good but doing some common operations is a regular pain in the proverbial. GLU is a platform independent library that can build spheres, perform collision detection, determine if a point is inside a 3D shape, etc. GLU works on top of OpenGL.

See Also: Open Graphics Library.

Graphics Library Utility Toolkit
(GLUT)

OpenGL is platform independent but certain operations (like creating windows, receiving mouse clicks, resizing windows) are done differently over the many platforms supported by OpenGL. GLUT lets you write a single application that will compile and work on Windows, X11, MacOS, etc. GLUT is fairly limited and so it's more useful for simple demonstration programs rather than full blown applications or games.

See Also: Open Graphics Library.

GLX

X11 is a networked windowing system. Your client and your server might be on different machines. GLX packages up OpenGL commands into network packets, spits them across the X11 network pipe, then unpacks them at the other end. This lets you run accelerated 3D remotely: the client could be a simulation on a mainframe, the display could be your desktop machine in your office. GLX does a number of other X11 related things that couldn't be packaged into OpenGL.

See Also: Indirect Rendering, Open Graphics Library, X11.

Indirect Rendering

When OpenGL commands are packaged up with the GLX library and transported across a network pipe (even if that network pipe is local) it is termed Indirect Rendering.

See Also: Direct Rendering.

Mesa

Brian Paul wrote a free open-source implementation of OpenGL called Mesa. The name has no hidden meaning; it just sounds nice. The original versions of Mesa only did software rendering. Recent versions of Mesa have had accelerated backends for Glide, DRI, etc.

See Also: Open Graphics Library.

Memory-Mapped Input-Output
(MMIO)

Refers to operations that access a region of graphics card memory that has been memory-mapped into the virtual address space, or to operations that access graphics hardware registers via a memory mapping of the registers into the virtual address space (in contrast to PIO).

Note

Graphics hardware "registers" may actually be pseudo-registers that provide access to the hardware FIFO command queue.

See Also: Programmed Input-Output.

Memory Type Range Register
(MTRR)

On Intel P6 family processors (Pentium Pro, Pentium II and later) the Memory Type Range Registers (MTRRs) may be used to control processor access to memory ranges. This is most useful when you have a video (VGA) card on a PCI or AGP bus. Enabling write-combining allows bus write transfers to be combined into a larger transfer before bursting over the PCI/AGP bus. This can increase performance of image write operations 2.5 times or more.

Open Graphics Library
(OpenGL)

SGI originally developed a graphics library called IrisGL for their high-end 3D hardware. They made some clever changes to make it work on any platform and renamed it OpenGL. Very recently SGI relaxed their restrictions for licensing and also released conformance tests for OpenGL. OpenGL abstracts 3D operations such as projections, lighting, rendering, texturing, matrix operations, etc, making it very easy for developers to produce high quality 3D applications.

Programmed Input-Output
(PIO)

Operations that must use the Intel in and out instructions (or equivalent non-Intel instructions) to access the graphics hardware (in contrast to using memory-mapped graphics hardware registers, which allow for the use of mov instructions).

Note

Graphics hardware "registers" may actually be pseudo-registers that provide access to the hardware FIFO command queue.

See Also: Memory-Mapped Input-Output.

Simple Direct-Media Layer
(SDL)

OpenGL only handles graphics (2D and 3D). Sam Lantinga (who now works at Loki Software Inc) wanted to handle CD-ROMs, audio, mixers, joysticks, keyboards, MPEG playback, and anything else related to multimedia and games. He also wanted to do all this in a platform independent way so he could compile and use the same source code on Windows, MacOS, Linux, etc. The SDL library achieves this. SDL relies heavily on other libraries like Mesa to do the grunt work. You can think of SDL as a more powerful version of GLUT.

See Also: Graphics Library Utility Toolkit.

Scan Line Interleaving
(SLI)

A technique where 2 or more 3D processors are used in parallel. Each processor renders only a fraction of the scanlines in the final image. This lets you achieve much higher framerates in games.
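A hypothetical helper showing how scanlines could be divided among the processors. It assumes a simple round-robin split in which scanline y goes to processor y mod n; with two processors that is an even/odd split.

```c
#include <assert.h>

/* Which of n_chips renders scanline y under round-robin interleaving. */
int sli_chip_for_scanline(int y, int n_chips)
{
    assert(y >= 0 && n_chips > 0);
    return y % n_chips;
}

/* Number of scanlines processor `chip` renders in an image `height` lines tall,
 * i.e. how many y in [0, height) satisfy y % n_chips == chip. */
int sli_lines_for_chip(int chip, int n_chips, int height)
{
    assert(chip >= 0 && chip < n_chips && height >= 0);
    return (height - chip + n_chips - 1) / n_chips;
}
```

Each processor therefore handles about 1/n of the lines, which is where the framerate gain comes from.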

Texture mapping

A graphic design process in which a 2D surface, called a texture map, is "wrapped around" a 3D object. Thus, the 3-D object acquires a surface texture similar to that of the 2-D surface.

Transform & Lighting
(T&L)

This is a technique where the 3D card does even more of the 3D calculations. This eases the load on the host CPU which results in significant speed increases for certain applications (especially games).

Utah-GLX

An early project to integrate accelerated 3D into XFree86 3.3. The primitive architecture of Utah-GLX makes it slower than the DRI but it is much simpler to implement and is also easier to write drivers for. This meant Utah-GLX was available earlier than the DRI and with a greater range of supported 3D cards. Utah-GLX is still the only available option for accelerated 3D under XFree86 3.3 unless you have a Voodoo based card.

See Also: Direct Rendering Interface, GLX.

X11

Eleventh version of the X Window System. The X Window System is a network transparent window system which runs on a wide range of computing and graphics machines. X11 has primitives which are useful for the creation of graphical desktops (e.g., Windows, Colors, Displays, Screens). You can almost think of X11 as a 2D graphics library but in practice it does far more than that. X11 is also responsible for delivering a unified stream of events describing the user's interaction with input devices like the keyboard, mouse, touch pads, etc.

XFree86

A free open-source implementation of X11. XFree86 implements drivers for a vast array of popular graphics cards and input devices.

See Also: X11.

Z-buffering

An algorithm used in 3-D graphics to ensure that perspective works the same way in the virtual world as it does in the real one: a solid object in the foreground will block the view of one behind it. It works by testing pixel depth and comparing the current position (z coordinate) with stored data in a buffer (called a z-buffer) that holds information about each pixel's last position. The pixel in the closer position to the viewer is the one that will be displayed, just as the person in front of the television is what the viewer sees rather than the screen.

Z-buffering is one of three Visible Surface Determination (VSD) algorithms commonly used for this purpose. The other two, BSP trees and depth sorting, work with polygons and consequently are less effective for portrayal of movement and overlap. Since it works at the pixel level, z-buffering can be demanding in terms of memory and processing time. Nevertheless, its more complex and life-like simulation of real-world object dynamics ensures its continuing popularity as a 3-D graphics development tool.

See Also: X11.
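A minimal z-buffer, assuming depth values in [0, 1] with smaller values closer to the viewer (the same convention as OpenGL's default GL_LESS depth test and default clear depth of 1.0). The framebuffer structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct framebuffer {
    int width, height;
    uint32_t *color;  /* width * height packed colors */
    float *depth;     /* width * height depth values: the z-buffer */
};

/* Reset every pixel to the clear color and the farthest depth. */
void fb_clear(struct framebuffer *fb, uint32_t clear_color)
{
    size_t n = (size_t)fb->width * fb->height;
    for (size_t i = 0; i < n; i++) {
        fb->color[i] = clear_color;
        fb->depth[i] = 1.0f;
    }
}

/* Write a fragment only if it is closer than what the pixel already holds. */
bool fb_plot(struct framebuffer *fb, int x, int y, float z, uint32_t c)
{
    assert(x >= 0 && x < fb->width && y >= 0 && y < fb->height);
    size_t i = (size_t)y * fb->width + x;
    if (z >= fb->depth[i])
        return false;  /* hidden behind an earlier fragment */
    fb->depth[i] = z;
    fb->color[i] = c;
    return true;
}
```

Because the test is per pixel, fragments can arrive in any order and the closest one still wins.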


References

DRI

A Multipipe Direct Rendering Architecture for 3D, Jens Owen and Kevin Martin, 15 Sep 1998.

Introduction to the Direct Rendering Infrastructure, Brian Paul, 10 Aug 2000.

[FOM99]  Hardware Locking for the Direct Rendering Infrastructure, Rickard Faith, Jens Owen, and Kevin Martin, 1999. (This document is in decent shape. However, we don't use hardware locking very much for DMA-based drivers, and it's really an infrastructure design justification. Not recommended for first-time driver writers; better for developers wanting to extend the infrastructure and needing a low-overhead lock.)

[Faith99]  The Direct Rendering Manager: Kernel Support for the Direct Rendering Infrastructure, Rickard Faith, 1999.


OpenGL

OpenGL Developer FAQ and TroubleShooting Guide, Paul Martz.

OpenGL Programming Guide.

OpenGL.org.


Jargon

Free On-line Dictionary of Computing, Denis Howe, 1993.

whatis?com.

Notes

[1]

Extracted and slightly edited from an email between Jens Owen and Matt Sottek from Intel.