3. DRI Architecture

3.1. Overview

3.2. Direct Rendering Module (DRM)

3.2.1. What is the DRM?

This is a kernel module that gives direct hardware access to DRI clients.

This module deals with DMA, AGP memory management, resource locking, and secure hardware access. In order to support multiple, simultaneous 3D applications the 3D graphics hardware must be treated as a shared resource. Locking is required to provide mutual exclusion. DMA transfers and the AGP interface are used to send buffers of graphics commands to the hardware. Finally, there must be security to prevent out-of-control clients from crashing the hardware.

3.2.2. Where does the DRM reside?

Since internal Linux kernel interfaces and data structures may be changed at any time, DRI kernel modules must be specially compiled for a particular kernel version. The DRI kernel modules reside in the /lib/modules/.../kernel/drivers/char/drm directory. Normally, the X server automatically loads whatever DRI kernel modules are needed.

For each 3D hardware driver there is a kernel module. DRI kernel modules are named device.o where device is a name such as tdfx, mga, r128, etc.

The source code resides in xc/programs/Xserver/hw/xfree86/os-support/linux/drm/ .
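From a client's point of view these modules are not opened by hand; access goes through the libdrm wrapper (xf86drm.h) in the DRI tree. The following is a minimal sketch, not taken from any real driver, that opens a DRM device by module name and prints its version. "tdfx" is just an example name and error handling is trimmed:

#include <stdio.h>
#include "xf86drm.h"    /* client-side wrapper around the /dev/drm? ioctls */

int main(void)
{
    /* drmOpen() locates (creating it if necessary) the device node that
     * belongs to the named kernel module and returns a file descriptor.
     * "tdfx" is only an example; "mga", "r128", etc. work the same way. */
    int fd = drmOpen("tdfx", NULL);
    drmVersionPtr v;

    if (fd < 0) {
        fprintf(stderr, "could not open the DRM device\n");
        return 1;
    }

    v = drmGetVersion(fd);
    if (v) {
        printf("%s %d.%d.%d (%s)\n", v->name, v->version_major,
               v->version_minor, v->version_patchlevel, v->date);
        drmFreeVersion(v);
    }

    drmClose(fd);
    return 0;
}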

3.2.3. In what way does the DRM support the DRI?

The DRM supports the DRI in three major ways:

  1. The DRM provides synchronized access to the graphics hardware.

    The direct rendering system has multiple entities (i.e., the X server, multiple direct-rendering clients, and the kernel) competing for direct access to the graphics hardware. Hardware that is currently available for PC-class machines will lock up if more than one entity is accessing the hardware (e.g., if two clients intermingle requests in the command FIFO or (on some hardware) if one client reads the framebuffer while another writes the command FIFO).

    The DRM provides a single per-device hardware lock to synchronize access to the hardware. The hardware lock may be required when the X server performs 2D rendering, when a direct-rendering client is performing a software fallback that must read or write the frame buffer, or when the kernel is dispatching DMA buffers.

    This hardware lock may not be required for all hardware (e.g., high-end hardware may be able to intermingle command requests from multiple clients) or for all implementations (e.g., one that uses a page fault mechanism instead of an explicit lock). In the latter case, the DRM would be extended to provide support for this mechanism.

    For more details on the hardware lock requirements and a discussion of the performance implications and implementation details, please see [FOM99].

  2. The DRM enforces the DRI security policy for access to the graphics hardware.

    The X server, running as root, usually obtains access to the frame buffer and MMIO regions on the graphics hardware by mapping these regions using /dev/mem. The direct-rendering clients, however, do not run as root, but still require similar mappings. Like /dev/mem, the DRM device interface allows clients to create these mappings, but with the following restrictions:

    1. The client may only map regions if it has a current connection to the X server. This forces direct-rendering clients to obey the normal X server security policy (e.g., using xauth).

    2. The client may only map regions if it can open /dev/drm?, which is only accessible by root and by a group specified in the XF86Config file (a file that only root can edit). This allows the system administrator to restrict direct rendering access to a group of trusted users.

    3. The client may only map regions that the X server allows to be mapped. The X server may also restrict those mappings to be read-only. This allows regions with security implications (e.g., those containing registers that can start DMA) to be restricted.

  3. The DRM provides a generic DMA engine.

    Most modern PC-class graphics hardware provides for DMA access to the command FIFO. Often, DMA access has been optimized so that it provides significantly better throughput than does MMIO access. For these cards, the DRM provides a DMA engine with the following features:

    1. The X server can specify multiple pools of different sized buffers which are allocated and locked down.

    2. The direct-rendering client maps these buffers into its virtual address space, using the DRM API.

    3. The direct-rendering client reserves some of these buffers from the DRM, fills the buffers with commands, and requests that the DRM send the buffers to the graphics hardware. Small buffers are used to ensure that the X server can get the lock between buffer dispatches, thereby providing X server interactivity. Typical 40MB/s PCI transfer rates may require 10000 4kB buffer dispatches per second. (A client-side sketch of the lock-and-dispatch sequence appears after this list.)

    4. The DRM manages a queue of DMA buffers for each OpenGL GLXContext, and detects when a GLXContext switch is necessary. Hooks are provided so that a device-specific driver can perform the GLXContext switch in kernel-space, and a callback to the X server is provided when a device-specific driver is not available (for the SI, the callback mechanism is used because it provides an example of the most generic method for GLXContext switching). The DRM also performs simple scheduling of DMA buffer requests to prevent GLXContext thrashing. When a GLXContext is swapped, a significant amount of data must be read from and/or written to the graphics device (between 4kB and 64kB for typical hardware).

    5. The DMA engine is generic in the sense that the X server provides information at run-time on how to perform DMA operations for the specific hardware installed on the machine. The X server does all of the hardware detection and setup. This allows easy bootstrapping for new graphics hardware under the DRI, while providing for later performance and capability enhancements through the use of a device-specific kernel driver.
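To make items 1 and 3 above more concrete, here is a minimal client-side sketch of the lock-and-dispatch sequence, written against the xf86drm.h wrapper (drmGetLock, drmDMA, drmUnlock). The helper is hypothetical, the type names follow the current xf86drm.h and have varied slightly between tree versions, and all error handling is omitted:

#include <string.h>
#include "xf86drm.h"

/* Dispatch one previously mapped DMA buffer full of commands.
 * buf_index is an index into the buffer pool the X server created;
 * byte_count is how much of the buffer was actually filled. */
static void dispatch_buffer(int fd, drm_context_t ctx,
                            int buf_index, int byte_count)
{
    drmDMAReq dma;

    memset(&dma, 0, sizeof(dma));

    /* Item 1: take the single per-device hardware lock so that the X
     * server, other clients and the kernel cannot touch the FIFO at
     * the same time.  A contended lock puts this process to sleep. */
    drmGetLock(fd, ctx, DRM_LOCK_READY);

    /* Item 3: hand the buffer to the DRM's generic DMA engine, which
     * queues it on this GLXContext's queue and sends it to the card. */
    dma.context    = ctx;
    dma.send_count = 1;
    dma.send_list  = &buf_index;
    dma.send_sizes = &byte_count;
    drmDMA(fd, &dma);

    /* Release the lock quickly so the X server stays interactive. */
    drmUnlock(fd, ctx);
}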

3.2.4. Is it possible to make a DRI driver without a DRM driver, for a piece of hardware where we do all acceleration in PIO mode?

The kernel provides three main things:

  1. the ability to wait on a contended lock (the waiting process is put to sleep), and to free the lock of a dead process;

  2. the ability to mmap areas of memory that non-root processes can't usually map;

  3. the ability to handle hardware interrupts and a DMA queue.

All of these are hard to do outside the kernel, but they aren't required components of a DRM driver. For example, the tdfx driver doesn't use hardware interrupts at all; it is one of the simplest DRM drivers, and would be a good model for the hardware you are thinking about (in its current form, it is quite generic).

Note

DRI was designed with a very wide range of hardware in mind, ranging from very low-end PC graphics cards through very high-end SGI-like hardware (which may not even need the lock). The DRI is an infrastructure or framework that is very flexible: most of the example drivers we have use hardware interrupts, but that isn't a requirement.

3.2.5. Does the DRM driver have support for loading sub-drivers?

Although [Faith99] states that the DRM driver has support for loading sub-drivers by calling drmCreateSub, Linus didn't like that approach. He wanted all drivers to be independent, so it went away.

3.2.6. What is templated DRM code?

It was first discussed in an email about what Gareth had done to bring up the mach64 kernel module.

Not wanting to simply copy-and-paste another version of _drv.[ch], _context.c, _bufs.c and so on, Gareth did some refactoring along the lines of what he and Rik Faith had discussed a long time ago.

This is very much along the lines of a lot of Mesa code, where there exists a template header file that can be customized with a few defines. At the time, this was done for _drv.c and _context.c, creating driver_tmp.h and context_tmp.h that could be used to build up the core module.

An inspection of mach64_drv.c on the mach64-0-0-1-branch reveals the following code:

#define DRIVER_AUTHOR           "Gareth Hughes"

#define DRIVER_NAME             "mach64"
#define DRIVER_DESC             "DRM module for the ATI Rage Pro"
#define DRIVER_DATE             "20001203"

#define DRIVER_MAJOR            1
#define DRIVER_MINOR            0
#define DRIVER_PATCHLEVEL       0


static drm_ioctl_desc_t         mach64_ioctls[] = {
	[DRM_IOCTL_NR(DRM_IOCTL_VERSION)]       = { mach64_version,    0, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_GET_UNIQUE)]    = { drm_getunique,     0, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_GET_MAGIC)]     = { drm_getmagic,      0, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_IRQ_BUSID)]     = { drm_irq_busid,     0, 1 },

	[DRM_IOCTL_NR(DRM_IOCTL_SET_UNIQUE)]    = { drm_setunique,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_BLOCK)]         = { drm_block,         1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_UNBLOCK)]       = { drm_unblock,       1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AUTH_MAGIC)]    = { drm_authmagic,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_ADD_MAP)]       = { drm_addmap,        1, 1 },

	[DRM_IOCTL_NR(DRM_IOCTL_ADD_BUFS)]      = { drm_addbufs,       1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_MARK_BUFS)]     = { drm_markbufs,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_INFO_BUFS)]     = { drm_infobufs,      1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_MAP_BUFS)]      = { drm_mapbufs,       1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_FREE_BUFS)]     = { drm_freebufs,      1, 0 },

	[DRM_IOCTL_NR(DRM_IOCTL_ADD_CTX)]       = { mach64_addctx,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_RM_CTX)]        = { mach64_rmctx,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_MOD_CTX)]       = { mach64_modctx,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_GET_CTX)]       = { mach64_getctx,     1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_SWITCH_CTX)]    = { mach64_switchctx,  1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_NEW_CTX)]       = { mach64_newctx,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_RES_CTX)]       = { mach64_resctx,     1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_ADD_DRAW)]      = { drm_adddraw,       1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_RM_DRAW)]       = { drm_rmdraw,        1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_LOCK)]          = { mach64_lock,       1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_UNLOCK)]        = { mach64_unlock,     1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_FINISH)]        = { drm_finish,        1, 0 },

#if defined(CONFIG_AGP) || defined(CONFIG_AGP_MODULE)
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_ACQUIRE)]   = { drm_agp_acquire,   1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_RELEASE)]   = { drm_agp_release,   1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_ENABLE)]    = { drm_agp_enable,    1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_INFO)]      = { drm_agp_info,      1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_ALLOC)]     = { drm_agp_alloc,     1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_FREE)]      = { drm_agp_free,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_BIND)]      = { drm_agp_bind,      1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_AGP_UNBIND)]    = { drm_agp_unbind,    1, 1 },
#endif

	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_INIT)]   = { mach64_dma_init,   1, 1 },
	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_CLEAR)]  = { mach64_dma_clear,  1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_SWAP)]   = { mach64_dma_swap,   1, 0 },
	[DRM_IOCTL_NR(DRM_IOCTL_MACH64_IDLE)]   = { mach64_dma_idle,   1, 0 },
};

#define DRIVER_IOCTL_COUNT      DRM_ARRAY_SIZE( mach64_ioctls )

#define HAVE_CTX_BITMAP         1

#define TAG(x) mach64_##x
#include "driver_tmp.h"

And that's all you need. A trivial amount of code is needed for the context handling:

#define __NO_VERSION__
#include "drmP.h"
#include "mach64_drv.h"

#define TAG(x) mach64_##x
#include "context_tmp.h"

And as far as I can tell, the only thing that's keeping this out of mach64_drv.c is the __NO_VERSION__, which is a 2.2 thing and is not used in 2.4 (right?).

The #define HAVE_CTX_BITMAP 1 line enables all the generic context bitmap code. To enable things like AGP, MTRRs and DMA management, the author simply needs to define the corresponding symbols. With less than five minutes of mach64-specific coding, I had a full kernel module that would do everything a basic driver requires, enough to bring up a software-fallback driver. The above code is all that is needed for the tdfx driver, with appropriate name changes. Indeed, any card that doesn't do kernel-based DMA can have a fully functional DRM module with the above code. DMA-based drivers will need more, of course.
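To illustrate the mechanism (this is a hypothetical, heavily simplified fragment in the spirit of driver_tmp.h, not the real file), the TAG macro turns generic names into driver-specific ones and the HAVE_* symbols switch optional blocks on and off:

/* hypothetical driver template fragment -- the including file must
 * #define TAG(x) and the DRIVER_* symbols first, and may
 * #define HAVE_CTX_BITMAP 1 to pull in the context bitmap code;
 * drm_device_t comes from drmP.h */

#ifndef TAG
#error "driver template included without #define TAG(x) mydriver_##x"
#endif

/* Expands to mach64_takedown(), tdfx_takedown(), ... depending on TAG. */
static int TAG(takedown)( drm_device_t *dev )
{
#if HAVE_CTX_BITMAP
	/* only compiled in when the driver asked for the generic
	 * context bitmap code with #define HAVE_CTX_BITMAP 1 */
	TAG(ctxbitmap_cleanup)( dev );	/* hypothetical helper name */
#endif
	return 0;
}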

The plan is to extend this to basic DMA setup and buffer management, so that the creation of PCI or AGP DMA buffers, installation of IRQs and so on is as trivial as this. What will then be left is the hardware-specific parts of the DRM module that deal with actually programming the card to do things, such as setting state for rendering or kicking off DMA buffers. That is, the interesting stuff.

A couple of points:

  • Why was it done like this, and not with C++ features like virtual functions (i.e. why don't I do it in C++)? Because it's the Linux kernel, dammit! No offense to any C++ fan who may be reading this :-) Besides, a lot of the initialization is order-dependent, so inserting or removing blocks of code with #defines is a nice way to achieve the desired result, at least in this situation.

  • Much of the core DRM code (like bufs.c, context.c and dma.c) will essentially move into these template headers. I feel that this is a better way to handle the common code. Take context.c as a trivial example — the i810, mga, tdfx, r128 and mach64 drivers have exactly the same code, with name changes. Take bufs.c as a slightly more interesting example — some drivers map only AGP buffers, some do both AGP and PCI, some map differently depending on their DMA queue management and so on. Again, rather than cutting and pasting the code from drm_addbufs into my driver, removing the sections I don't need and leaving it at that, I think keeping the core functionality in bufs_tmp.h and allowing this to be customized at compile time is a cleaner and more maintainable solution.

This looks way sweet. Have you thought about what it would take to generalize this to other OSs? I think that it has the possibility to make keeping the FreeBSD code up to date a lot easier.

The current mach64 branch is only using one template in the driver.

Check out the r128 driver from the trunk for a good example. Notice there are files in there such as r128_tritmp.h. This is a template that gets included in r128_tris.c. What it does, basically, is consolidate code that is largely duplicated across several functions, so that you only set a few macros. For example:

#define IND (R128_TWOSIDE_BIT)
#define TAG(x) x##_twoside

followed by

#include "r128_tritmp.h"

Notice that the name of the inline function defined in r128_tritmp.h is the result of the TAG macro, and that the function's content depends on which IND value is defined. So essentially the inline function is a template for various functions that have a lot in common. That way you consolidate common code and keep things consistent.
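The same pattern in a self-contained form (hypothetical names, not the actual r128_tritmp.h; emit_vertices() is an invented helper):

/* tri_tmp.h -- the including .c file defines IND and TAG, then includes
 * this fragment once per variant it wants generated */

static void TAG(triangle)( int v0, int v1, int v2 )
{
#if (IND & TWOSIDE_BIT)
	/* work that only the *_twoside variants contain, e.g. choosing
	 * the front- or back-facing colour, would go here */
#endif
	/* code common to every variant */
	emit_vertices( v0, v1, v2 );	/* invented helper */
}

#undef IND
#undef TAG

and the .c file that uses it:

#define TWOSIDE_BIT	0x1

static void emit_vertices( int v0, int v1, int v2 );

#define IND (0)
#define TAG(x) x##_plain
#include "tri_tmp.h"		/* generates triangle_plain() */

#define IND (TWOSIDE_BIT)
#define TAG(x) x##_twoside
#include "tri_tmp.h"		/* generates triangle_twoside() */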

Look at e.g. xc/programs/Xserver/hw/xfree86/os-support/linux/drm/kernel/r128.h though. That's the template architecture in all its beauty. Most of the code is shared between the drivers, customized with a few defines. Compare that to the duplication and inconsistency before.

3.3. XFree86

3.3.1. DRI Aware DDX Driver

3.3.1.1. What is the DDX Driver?

For each type of graphics card there is an XFree86 2D (or DDX) driver which does initialization, manages the display and performs 2D rendering. XFree86 4.0 introduced a new device driver interface called XAA which should allow XFree86 drivers to be backward compatible with future versions of the X server.

Each 2D driver has a bit of code to bootstrap the 3D/DRI features.
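As a rough sketch of what that bootstrap code looks like (condensed and partly hypothetical; the calls and fields follow the DRI extension's dri.h, but the "mydrv" name and the surrounding driver structure are invented for the example):

#include "xf86.h"
#include "dri.h"
#include "sarea.h"

static Bool MYDRVDRIScreenInit( ScreenPtr pScreen )
{
    DRIInfoPtr pDRIInfo = DRICreateInfoRec();
    int drmFD;

    if ( !pDRIInfo )
        return FALSE;

    /* Which kernel module and which client-side 3D driver go with this
     * card ("mydrv" is a placeholder -- think "mga", "r128", "tdfx"). */
    pDRIInfo->drmDriverName    = "mydrv";
    pDRIInfo->clientDriverName = "mydrv";
    pDRIInfo->SAREASize        = SAREA_MAX;
    /* ... plus busIdString, ddxDriver*Version, devPrivate, etc. ... */

    /* Creates the SAREA, opens the DRM device and hooks the DRI
     * extension into this screen; drmFD comes back for further setup. */
    if ( !DRIScreenInit( pScreen, pDRIInfo, &drmFD ) ) {
        DRIDestroyInfoRec( pDRIInfo );
        return FALSE;
    }

    /* ... device-specific setup: drmAddMap(), DMA buffers, IRQs ... */

    return DRIFinishScreenInit( pScreen );
}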

3.3.1.2. Where does the DDX driver reside?

The XFree86 drivers usually reside in the /usr/X11R6/lib/modules/drivers/ directory with names of the form *_drv.o.

The XFree86 driver source code resides in xc/programs/Xserver/hw/xfree86/drivers/; the DRI-specific portions are usually found in files named *_dri.[ch].

3.3.2. DRI Extension

3.3.2.1. What does the XFree86 DRI extension do?

The XFree86-DRI X server extension is basically used for communication between the other DRI components (the X server, the kernel module, libGL.so and the 3D DRI drivers).

The XFree86-DRI module maintains DRI-specific data structures related to screens, windows, and rendering contexts. When the user moves a window, for example, the other DRI components need to be informed so that rendering appears in the right place.

3.3.2.2. Where does the XFree86 DRI extension reside?

The DRI extension module usually resides at /usr/X11R6/lib/modules/extensions/libdri.a.

The DRI extension source code resides at xc/programs/Xserver/GL/dri/ .

3.3.3. GLX Extension

3.3.3.1. What does the XFree86 GLX extension do?

The GLX extension to the X server handles the server-side tasks of the GLX protocol. This involves setup of GLX-enhanced visuals, GLX context creation, context binding and context destruction.

When using indirect rendering, the GLX extension decodes GLX command packets and dispatches them to the core rendering engine.

3.3.3.2. Where does the XFree86 GLX extension reside?

The GLX extension module usually resides at /usr/X11R6/lib/modules/extensions/libglx.a.

The GLX extension source code resides at xc/programs/Xserver/GL/glx/ .

3.3.4. GLcore Extension

The GLcore module takes care of rendering for indirect or non-local clients.

Currently, Mesa is used as a software renderer. In the future, indirect rendering will also be hardware accelerated.

3.3.4.1. Where does the GLcore extension reside?

The GLcore extension module usually resides at /usr/X11R6/lib/modules/extensions/libGLcore.a.

The GLcore extension source code resides at xc/programs/Xserver/GL/mesa/src/ .

3.3.4.2. If I run a GLX enabled OpenGL program on a remote system with the display set back to my machine, will the X server itself render the GLX requests through DRI?

The X server will render the requests but not through the DRI. The rendering will be software only. Having the X server spawn a DRI client to process the requests is on the TODO list.

3.3.4.3. Is there a difference between local clients and remote clients?

There is no difference as far as the client is concerned. The only difference is speed.

The difference between direct and indirect rendering is that the former can't take place over the network. The DRI currently concentrates on the direct rendering case.

The application still gets a fully functional OpenGL implementation, which is all that's required by the specification. The fact is that the implementation is entirely in software, but that's completely legal. In fact, all implementations fall back to software when they can't handle a request in hardware. It's just that in this case the implementation can't handle anything in hardware.

Most people don't run GLX applications remotely, because most applications run very poorly over the network. It's not really the application's fault; OpenGL just pushes around a lot of data.

Therefore there hasn't been a lot of interest in hardware-accelerated remote rendering, and there is plenty to do on the local rendering side. It is on the TODO list, but at a low priority.

The solution is actually fairly straightforward. When the X server gets a remote client, it forks off a small application that just reads GLX packets and then remakes the same OpenGL calls. This new application is then just a standard direct rendering client and the problem reduces to one already solved.

3.3.4.4. Is there a difference between using indirect DRI rendering (e.g., with LIBGL_ALWAYS_INDIRECT) and just linking against the Mesa library?

Yes. The DRI libGL, used in indirect mode, sends GLX protocol messages to the X server, which are executed by the GLcore renderer. Stand-alone Mesa's non-DRI libGL doesn't know anything about the GLX protocol; it effectively translates OpenGL calls into Xlib calls.

The GLcore renderer is based on Mesa. At this time the GLcore renderer can not take advantage of hardware acceleration.
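An application can check which path it got. With LIBGL_ALWAYS_INDIRECT set (or when displaying on a remote X server), glXIsDirect() returns False and rendering goes through GLX protocol to GLcore. A minimal example using only standard GLX calls, with error handling trimmed:

#include <stdio.h>
#include <X11/Xlib.h>
#include <GL/glx.h>

int main( void )
{
    Display *dpy = XOpenDisplay( NULL );
    int attribs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, None };
    XVisualInfo *vis = glXChooseVisual( dpy, DefaultScreen(dpy), attribs );

    /* The last argument asks for a direct context; libGL silently falls
     * back to indirect if the DRI is unavailable or indirect was forced. */
    GLXContext ctx = glXCreateContext( dpy, vis, NULL, True );

    printf( "rendering is %s\n",
            glXIsDirect( dpy, ctx ) ? "direct (DRI)"
                                    : "indirect (GLX protocol + GLcore)" );

    glXDestroyContext( dpy, ctx );
    XCloseDisplay( dpy );
    return 0;
}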

3.4. libGL

OpenGL-based programs must link with the libGL library. libGL implements the GLX interface as well as the main OpenGL API entrypoints. When using indirect rendering, libGL creates GLX protocol messages and sends them to the X server via a socket. When using direct rendering, libGL loads the appropriate 3D DRI driver then dispatches OpenGL library calls directly to that driver.

libGL also has the ability to support heterogeneous, multi-head configurations. That means one could have two or more graphics cards (of different types) in one system and libGL would allow an application program to use all of them simultaneously.

3.4.2. 3D Driver

The DRI-aware 3D drivers are currently based on Mesa.

3.4.2.1. Where does the 3D driver reside?

Normally libGL loads 3D DRI drivers from the /usr/X11R6/lib/modules/dri directory, but the search path can be overridden by setting the LIBGL_DRIVERS_PATH environment variable.

The source code for the DRI-aware 3D drivers resides in xc/lib/GL/mesa/src/drv/.
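A rough sketch of the loading step (this is not the actual dri_glx.c code; the "__driCreateScreen" bootstrap symbol name and the helper itself are assumptions for illustration):

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>

/* Load the 3D DRI driver for a card, e.g. load_dri_driver("r128"). */
static void *load_dri_driver( const char *name )
{
    const char *dir = getenv( "LIBGL_DRIVERS_PATH" );
    char filename[1024];
    void *handle;

    if ( !dir )
        dir = "/usr/X11R6/lib/modules/dri";     /* the built-in default */

    snprintf( filename, sizeof(filename), "%s/%s_dri.so", dir, name );

    /* Pull the driver into the client's address space ... */
    handle = dlopen( filename, RTLD_NOW | RTLD_GLOBAL );
    if ( !handle ) {
        fprintf( stderr, "%s\n", dlerror() );
        return NULL;
    }

    /* ... and look up its bootstrap entry point (symbol name assumed). */
    if ( !dlsym( handle, "__driCreateScreen" ) ) {
        fprintf( stderr, "%s has no bootstrap symbol\n", filename );
        dlclose( handle );
        return NULL;
    }

    return handle;
}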

3.4.3. Of what use is the Mesa code in the xc tree?

Mesa is used to build some server-side modules/libraries specifically for the benefit of the DRI. libGL.so is the client-side aspect of Mesa and works closely with those server-side Mesa components.

The libGLU and libglut libraries are entirely client-side things, and so they are distributed separately.

3.4.4. Is there any documentation about the XMesa* calls?

There is no documentation for those functions. However, one can point out a few things.

First, despite the prolific use of the word "Mesa" in the client (and server) side DRI code, the DRI is not dependent on Mesa. It's a common misconception that the DRI was designed just for Mesa. It's just that the drivers that we at Precision Insight have done so far have Mesa at their core. Other groups are working on non-Mesa-based DRI drivers.

In the client-side code, you could mentally replace the string "XMesa" with "Driver" or some other generic term. All the code below xc/lib/GL/mesa/ could be replaced by alternate code. libGL would still work. libGL.so has no knowledge whatsoever of Mesa. It's the drivers which it loads that have the Mesa code.

On the server side there's more of the same. The XMesa code used for indirect/software rendering was originally borrowed from stand-alone Mesa and its pseudo-GLX implementation. There are some crufty side-effects from that.

3.4.5. How do X modules and X applications communicate?

X modules are loaded like kernel modules, with symbol resolution at load time, and can thus call each other's functions. For kernel modules, the communication between applications and modules is done via the /dev/* files.

X applications call X library functions, which create protocol packets and send them to the X server over a socket; the server then processes them. That's all well documented in the standard X documentation.

There are three ways 3D clients can communicate with the server or with each other:

  1. Via X protocol requests (the XFree86-DRI extension defines DRI-specific requests).

  2. Via the SAREA (the shared memory segment).

  3. Via the kernel driver.
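A condensed sketch that touches all three channels, roughly in the order libGL's DRI start-up code uses them (prototypes follow xf86dri.h and xf86drm.h; the exact type names have varied a little between tree versions, and error handling is omitted):

#include <X11/Xlib.h>
#include "xf86dri.h"    /* client side of the XFree86-DRI protocol */
#include "xf86drm.h"    /* user-space wrapper around the DRM ioctls */
#include "sarea.h"      /* SAREA_MAX */

int dri_bootstrap( Display *dpy, int screen )
{
    drm_handle_t hSAREA;
    char *busid;
    drm_magic_t magic;
    drmAddress sarea;
    int fd;

    /* 1. X protocol: XFree86-DRI extension requests to the server. */
    if ( !XF86DRIOpenConnection( dpy, screen, &hSAREA, &busid ) )
        return -1;

    /* 3. The kernel driver: open /dev/drm? and authenticate through
     *    the X server so the DRM will honour our mapping requests. */
    fd = drmOpen( NULL, busid );
    drmGetMagic( fd, &magic );
    XF86DRIAuthConnection( dpy, screen, magic );

    /* 2. The SAREA: map the shared memory segment that the X server,
     *    the direct-rendering clients and the kernel all share. */
    drmMap( fd, hSAREA, SAREA_MAX, &sarea );

    /* ... driver-specific setup continues from here ... */
    return fd;
}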