AGP is a dedicated high-speed bus that allows the graphics controller to move large amoumts of data directly from system memory. Uses a Graphics Address Re-Mapping Table (GART) to provide a physically-contiguous view of scattered pages in system memory for DMA transfers.
Also check the Intel 440BX AGPset System Address Map
You have to understand that the DRI functions have a different purpose then the ones in XFree. The DRM has to know about AGP, so it talks to the AGP kernel module itself. It has to be able to protect certain regions of AGP memory from the client side 3D drivers, yet it has to export some regions of it as well. While most of this functionality (most, not all) can be accomplished with the /dev/agpgart interface, it makes sense to use the DRM's current authentication mechanism. This means that there is less complexity on the client side. If we used /dev/agpgart then the client would have to open two devices, authenticate to both of them, and make half a dozen calls to agpgart, then only care about the DRM device.
As a side note, the XFree86 calls were written after the DRM functions.
Also to answer a previous question about not using XFree86 calls for memory mapping, you have to understand that under most OS`es (probably solaris as well), XFree86`s functions will only work for root privileged processes. The whole point of the DRI is to allow processes that can connect to the X server to do some form of direct to hardware rendering. If we limited ourselves to using XFree86's functionality, we would not be able to do this. We don`t want everyone to be root.
You can also use this test program as a bit more documentation as to how agpgart is used.
Generally programs do the following:
ioctl(INFO) to determine amountof memory for AGP
mmap the device
ioctl(SETUP) to set the AGP mode
ioctl(ALLOCATE) a chunk o memory, specifying offset in aperture
ioctl(BIND) that same chunk o memory
Every time you update the GATT, you have to flush the cache and/or TLBs. This is expensive. Therefore, you allocate and bind the pages you'll use, and mmap() just returns the right pages when needed.
Then you need to have a remap of the agp aperture in the kernel which you can access. Use ioremap to do that.
After that you have access to the agp memory. You probably want to make sure that there is a write combining mtrr over the aperture. There is code in mga_drv.c in our kernel directory that shows you how to do that.
All this allocation should be done by only one process. If you need memory in the GTT you should be asking the Xserver for it (or whatever your controlling process is). Things are implemented this way so that the controlling process can know intimate details of how memory is laid out. This is very important for the I810, since you want to set tiled memory on certain regions of the aperture. If you made the kernel do the layout, then you would have to create device specific code in the kernel to make sure that the backbuffer/dcache are aligned for tiled memory. This adds complexity to the kernel that doesn`t need to be there, and imposes restrictions on what you can do with agp memory. Also, the current Xserver implementation (4.0) actually locks out other applications from adding to the GTT. While the Xserver is active, the Xserver is the only one who can add memory. Only the controlling process may add things to the GTT, and while a controlling process is active, no other application can be the controlling process.
Microsoft`s VGART does things like you are describing I believe. I think its bad design. It enforces a policy on whoever uses it, and is not flexible. When you are designing low level system routines I think it is very important to make sure your design has the minimum of policy. Otherwise when you want to do something different you have to change the interface, or create custom drivers for each application that needs to do things differently.
Here's a proposal for an zero-ioctl (best case) dma transfer mechanism.
Let's call it 'kernel ringbuffers'. The premise is to replace the calls to the 'fire-vertex-buffer' ioctl with code to write to a client-private mapping shared by the kernel (like the current sarea, but for each client).
Starting from the beginning:
Each client has a private piece of AGP memory, into which it will put secure commands (typically vertices and texture data). The client may expand or shrink this region according to load.
Each client has a shared user/kernel region of cached memory. (Per-context sarea). This is managed like a ring, with head and tail pointers.
The client emits vertices to AGP memory (as it currently does with DMA buffers).
When a statechange, clear, swap, flush, or other event occurs, the client:
Grabs the hardware lock.
Re-emits any invalidated state to the head of the ring.
Emits a command to fire the portion of AGP space as vertices.
Updates the head pointer in the ring.
Releases the lock.
The kernel is responsible for processing all of the rings. Several events might cause the kernel to examine active rings for commands to be dispatched:
A flush ioctl. (Called by impatient clients)
A periodic timer. (If this is low overhead?)
An interrupt previously emitted by the kernel. (If timers don't work)
Additionally, for those who've been paying attention, you'll notice that some of the assumptions that we use currently to manage hardware state between multiple active contexts are broken if client commands to hardware aren't executed serially in an order which is knowable to the clients. Otherwise, a client that grabs the heavy lock doesn't know what state has been invalidated or textures swapped out by other clients.
This could be solved by keeping per-context state in the kernel and implementing a proper texture manager. That's something we need to do anyway, but it's not a requirement for this mechanism to work.
Instead, force the kernel to fire all outstanding commands on client ringbuffers whenever the heavyweight lock changes hands. This provides the same serialized semantics as the current mechanism, and also simplifies the kernel's task as it knows that only a single context has an active ring buffer (the one last to hold the lock).
An additional mechanism is required to allow clients to know which pieces of their AGP buffer is pending execution by the hardware, and which pieces of the buffer are available to be reused. This is also exactly what NV_vertex_array_range requires.
The first step would be to check out the current mach64 branch from dri CVS, the tag is 'mach64-0-0-2-branch.' Follow the instructions on dri.sf.net to compile and install the tree. A couple of things you need to know are:
1. Make sure to check out the branch, not the head (use '... co -r mach64-0-0-2-branch xc')
2. You need libraries and headers from a full X install. I used lndir to add symlinks from /usr/X11R6/include and /usr/X11R6/lib into /usr/X11R6-DRI.
You'll need to have AGP support for your chipset configured in your kernel and have the module loaded before starting X (assuming you build it as a module). At this point, you need agpgart for the driver to load, but AGP isn't currently used by the driver yet.
Take a look at the code, the list archives and the DRI documentation on dri.sf.net (it's a little stale, but a good starting point). We are also using the driver from the Utah-GLX project as a guide, so you might want to check that out (utah-glx.sf.net). Many of us have documentation from ATI as well, you can apply to their developer program for docs at http://apps.ati.com/developers/devform1.asp
Our first priority right now is to get the 3D portion of the driver using DMA transfers (GUI mastering) rather than direct register programming. Frank Earl is currently working on this. Then we need to get the 2D driver to honor the drm locking scheme so we can enable 2D acceleration, which is currently disabled. Right now switching back to X from a text console or switching modes can cause a lockup because 2D and 3D operations are not synchronized. Also on the todo list is using AGP for texture uploads and finishing up the Mesa stuff (e.g. getting points and lines working, alpha blending...).
Right now the picture looks like this:
Client -> OpenGL/GLX -> Glide as HAL (DRI) -> hw
In this layout the Glide(DRI) is really a hardware abstraction layer. The only API exposed it OpenGL and Glide(DRI) only works with OpenGL. It isn`t useful by itself.
There are a few Glide only games. 3dfx would like to see those work. So the current solution, shown above, doesn`t work since the Glide API isn`t available. Instead we need:
Client -> Glide as API (DRI) -> hw
Right now Mesa does a bunch of the DRI work, and then hands that data down to Glide. Also Mesa does all the locking of the hardware. If we`re going to remove Mesa, then Glide now has to do the DRI work, and we have to do something about the locking.
The solution is actually a bit more complicated. Glide wants to use all the memory as well. We don`t want the X server to draw at all. Glide will turn off drawing in the X server and grab the lock and never let it go. That way no other 3D client can start up and the X server can still process keyboard events and such for you. When the Glide app goes away we just force a big refresh event for the whole screen.
I hope that explains it. We`re really not trying to encourage people to use the Glide API, it is just to allow those existing games to run. We really want people to use OpenGL directly.
Another interesting project that a few people have discussed is removing Glide from the picture at all. Just let Mesa send the actual commands to the hardware. That`s the way most of our drivers were written. It would simplify the install process (you don`t need Glide separately) and it might improve performance a bit, and since we`re only doing this for one type of hardware (Voodoo3+) Glide isn`t doing that much as a hardware abstraction layer. It`s some work. There`s about 50 calls from Glide we use and those aren`t simple, but it might be a good project for a few people to tackle.
There's not a lot we can do with S3TC because of S3's patent/license restrictions.
Normally, OpenGL implementations would do software compression of textures and then send them to the board. The patent seems to prevent that, so we're staying away from it.
If an application has compressed texture (they compressed them themselves or compressed them offline) we can download the compressed texture to the board. Unfortunetly, that's of little use since most applications don't work that way.