6. Miscellaneous Questions

6.1. How about the main X drawing surface? Are 2 extra "window sized" buffers allocated for primary and secondary buffers in a page-flipping configuration?

Right now, we don't do page flipping at all; everything is a blit from back to front. The biggest problem with page flipping is detecting when you're in full screen mode, since OpenGL doesn't really have a concept of full screen mode, and we want a solution that works for existing games. We've been designing a solution for it, and it should get implemented fairly soon since we need it for antialiasing on the V5.

In the current implementation the X front buffer is the 3D front buffer. When we do page flipping we'll continue to do the same thing. Since you have an X window that covers the screen it is safe for us to use the X surface's memory. Then we'll do page flipping. The only issue will be falling back to blitting if the window is ever moved from covering the whole screen.

6.2. Is anyone working on adding SSE support to the transform/lighting code in Mesa?

SSE stuff was somewhat broken in the kernels until recently. In fact, we (Gareth Hughes, to be precise) just submitted a big kernel patch that should fully support SSE. I don't know if anyone is working on SSE support for Mesa; I haven't seen much activity in that area lately.

I'd start with profiling your app against the current Mesa base to decide where the optimization effort should go. I'm not convinced SSE is the right next step; there may be more fundamental optimizations to do first. We haven't spent much time on optimizing it.

6.3. How often are checks done to see if things need to be clipped/redrawn/redisplayed?

The locking system is designed to be highly efficient. It is based on a two-tiered lock. Basically it works like this:

The client wants the lock, so it uses the CAS instruction (I was corrected that the instruction is compare-and-swap; I knew that was the functionality, but I got the name wrong). If the client was the last application to hold the lock, you're done and you move on.

If it wasn't the last one, then we use an ioctl to the kernel to arbitrate the lock.

In this case some or all of the state on the card may have changed. The shared memory carries a stamp number for the X server. When the X server does a window operation it increments the stamp. If the client sees that the stamp has changed, it uses a DRI X protocol request to get the new window location and clip rects; this only happens on a window move. Assuming your clip rects and window position haven't changed, the redisplay happens entirely in the client.

The client may have other state to restore as well. In the case of the tdfx driver we have three more flags: command fifo invalid, 3D state invalid, and textures invalid. If any of those are set, the corresponding state is restored. The whole sequence is sketched below.
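
The following is a minimal sketch of that two-tiered scheme in C. It is illustrative only: the names (shared_area, LOCK_HELD, lock_ioctl(), fetch_cliprects(), and the restore helpers) are hypothetical stand-ins for the real DRM compare-and-swap macros, lock ioctl, and per-driver restore code.

    /* Hypothetical sketch of the two-tiered lock; every name below is
     * illustrative, not the real DRI/DRM API. */
    #define LOCK_HELD 0x80000000u

    typedef struct {
        volatile unsigned lock;          /* encodes the last lock holder      */
        unsigned window_stamp;           /* bumped by X on window operations  */
        int fifo_invalid, state_invalid, textures_invalid;
    } shared_area;

    extern void lock_ioctl(int drm_fd, unsigned ctx_id);    /* kernel arbitration  */
    extern void fetch_cliprects(void);                       /* DRI X protocol req  */
    extern void restore_fifo(void), restore_3d_state(void), restore_textures(void);

    void grab_hardware_lock(shared_area *sarea, unsigned my_id,
                            unsigned *last_stamp, int drm_fd)
    {
        /* Tier 1: a single compare-and-swap.  If this client was the last
         * holder of the lock, the CAS succeeds and we are done. */
        if (__sync_bool_compare_and_swap(&sarea->lock, my_id, my_id | LOCK_HELD))
            return;

        /* Tier 2: someone else held it; ask the kernel to arbitrate. */
        lock_ioctl(drm_fd, my_id);

        /* State on the card may have changed while we were out. */
        if (sarea->window_stamp != *last_stamp) {
            fetch_cliprects();           /* window moved: get new clip rects  */
            *last_stamp = sarea->window_stamp;
        }
        if (sarea->fifo_invalid)     restore_fifo();
        if (sarea->state_invalid)    restore_3d_state();
        if (sarea->textures_invalid) restore_textures();
    }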

So, if the X server wakes up to process input, it currently grabs the lock but doesn't invalidate any state. I'm actually fixing this now so that it doesn't grab the lock for input processing.

If the X server draws, it grabs the lock and invalidates the command fifo.

If the X server moves a window, it grabs the lock, updates the stamp, and invalidates the command fifo.

If another 3D app runs, it grabs the lock, invalidates the command fifo, invalidates the 3D state and possibly invalidates the texture state.

6.4. What's the deal with fullscreen and DGA?

The difference between running in a window and running fullscreen is actually quite minor. When you're in fullscreen mode, your desktop has been zoomed in and the zoomed portion moved to cover just the window (just the same as hitting ctrl-alt-plus and ctrl-alt-minus). The game still runs in a window in either case, so the behavior shouldn't be any different.

DGA is turned off in the current configuration. I've just started adding those features back in. The latest code in the trunk has some support for DGA but is broken; the code in my tdfx-1-1 branch should be working.

6.5. DRI without X

6.5.1. Can DRI run without X?

The first question you have to ask is whether you really should throw out X11. X11 does event handling, multiple windows, etc. It can also be made quite lightweight; it's running in 600k on an iPAQ.

If you decide you do need to throw out X, then you have to ask yourself if the DRI is the right solution for your problem. The DRI handles multiple 3D clients accessing hardware at the same time, sharing the same screen, in a robust and secure manner. If you don't need those properties the DRI isn't necessarily the right solution for you.

If you get to this point, then it would be theoretically possible to remove the DRI from X11 and have it run without X11. There's no code to support that at this point, so it would require significant work.

6.5.2. What would be the advantages of a non-X version of DRI?

The main reasons one would be interested in a non-X version of the DRI, along with responses to each, are:

Pros: Eliminate any performance bottlenecks the X server may be causing. Since we are 3D only, any extraneous locking/unlocking, periodic refreshes of the (hidden) 2D portion of the display, etc., will cause unexpected slowdowns.

Cons: If the X server never does any drawing then the overhead is minimal. Locking is done around Glide operations, and a lock check is a single compare-and-swap (CAS) instruction. Assuming your 3D window covers the X root, there are no 2D portions to redisplay.

Pros: Eliminate wasted system memory requirements.

Cons: Yes, there will be some resources from the X server, but I think not much.

Pros: Eliminate on-card font/pixmap/surface/etc caches that just waste memory.

Cons: If you don't use them they aren't taking any resources. Right now, there is a small pixmap cache that's statically allocated for 2D. Removing that is a trivial code change. Making it dynamic (3D steals it away from 2D) is not too tough and a better solution than any static allocation.

Pros: Eliminate the need for extra peripherals, such as mice.

Cons: Allowing operations without a mouse should be trivial if it isn't a configuration option already.

Pros: Reduction in the amount of software necessary to install/maintain on a customer's system. Certainly none of my customers would have been able to install XFree 4.0 on their own.

Cons: XFree 4.0 installs with appropriate packaging are trivial. What you're saying is that no one has done the packaging work for you, and that's correct. If you create your own stripped DRI version you'll be in for a lot more packaging work on your own.

The impact of the X server on the 3D graphics pipeline is very small: essentially none in the 3D pipeline itself, and some resource usage, but I think not much. Realize the CAS is in the driver, so you're looking at creating a custom version of that as well. I think the effort spent avoiding the CAS, creating your own window-system binding for GL, and moving the DRI functionality out of the X server would be much better spent optimizing Mesa and the driver instead. You have to focus resources where they provide the biggest payoff.

6.5.3. I would like to make X11-free access to 3D...

Take a look at the fbdri project. They're trying to get the DRI running directly on a console with the Radeon. If all you want is one window and you're only running OpenGL, then this makes sense.

I'll throw out another option. Make the DRI work with TinyX. TinyX runs in 600k and gives you all of X. It should be a fairly straight forward project. As soon as you want more than one window, it makes a lot of sense to use the X framework that already exists.

6.6. DRI driver initialization process

		I've been going through the DRI code and documents trying to figure out how
		all this stuff works.  I decided to start at the start, so I've been looking
		at the driver initialization process.  Boys and girls, when you're going
		through a huge chunk of code like the DRI, cscope is your friend!  I've
		taken some notes along the way, which are included in the bottom of this
		message.
		
		Could the people who actually know look at my notes and correct me where I'm
		wrong?
		
		- The whole process begins when an application calls glXCreateContext
		  (lib/GL/glx/glxcmds.c).  glXCreateContext is just a stub that calls
		  CreateContext.  The real work begins when CreateContext calls
		  __glXInitialize (lib/GL/glx/glxext.c).
		
		- The driver specific initialization process starts with __driCreateScreen.
		  Once the driver is loaded (via dlopen), dlsym is used to get a pointer to
		  this function.  The function pointer for each driver is stored in the
		  createScreen array in the __DRIdisplay structure.  This initialization is
		  done in driCreateDisplay (lib/GL/dri/dri_glx.c), which is called by
		  __glXInitialize.
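
		A condensed sketch of that dlopen/dlsym step is below.  It is
		illustrative only: the module path and the function-pointer typedef are
		simplified guesses, and the real code (with the full __driCreateScreen
		prototype) lives in driCreateDisplay in lib/GL/dri/dri_glx.c.

		    #include <dlfcn.h>
		    #include <stdio.h>

		    /* Simplified: the real entry point takes the Display, screen
		     * number, and visual configs rather than void. */
		    typedef void *(*CreateScreenFunc)(void);

		    static CreateScreenFunc loadDriver(const char *name)
		    {
		        char path[256];
		        void *handle;

		        /* e.g. "tdfx" -> ".../dri/tdfx_dri.so"; the directory is illustrative */
		        snprintf(path, sizeof(path),
		                 "/usr/X11R6/lib/modules/dri/%s_dri.so", name);

		        handle = dlopen(path, RTLD_NOW | RTLD_GLOBAL);
		        if (!handle) {
		            fprintf(stderr, "dlopen: %s\n", dlerror());
		            return NULL;
		        }
		        /* __driCreateScreen is the one symbol libGL looks up directly. */
		        return (CreateScreenFunc) dlsym(handle, "__driCreateScreen");
		    }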
		
		I should also point out, to make it clear, that __driCreateScreen() really
		is the bootstrap of a DRI driver.  It's the only* function in a DRI driver
		that libGL directly knows about.  All the other DRI functions are accessed via
		the __DRIdisplayRec, __DRIscreenRec, __DRIcontextRec and __DRIdrawableRec
		structs (defined in glxclient.h).  Those structs are pretty well
		documented in the file.
		
		*Footnote: that's not really true- there's also the __driRegisterExtensions()
		function that libGL uses to implement glXGetProcAddress().  That's another
		long story.
		
		
		- After performing the __glXInitialize step, CreateContext calls the
		  createContext function for the requested screen.  Here the driver creates
		  two data structures.  The first, GLcontext (extras/Mesa/src/mtypes.h),
		  contains all of the device independent state, device dependent constants
		  (i.e., texture size limits, light limits, etc.), and device dependent
		  function tables.  The driver also allocates a structure that contains all
		  of the device dependent state.  The GLcontext structure links to the
		  device dependent structure via the DriverCtx pointer.  The device
		  dependent structure also has a pointer back to the GLcontext structure.
		
		  The device dependent structure is where the driver will store context
		  specific hardware state (register settings, etc.) for when context
		  switches (in the OpenGL / X context sense) occur.  This structure is
		  analogous to the buffers where the OS stores CPU state when a program
		  context switch occurs.
		
		
			The texture images really are stored within Mesa's
			data structures.  Mesa supports about a dozen texture formats which
			happen to satisfy what all the DRI drivers need.  So, the texture format/
			packing is dependent on the hardware, but Mesa understands all the
			common formats.  See Mesa/src/texformat.h.  Gareth and Brian spent a lot of
			time on that.
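
			For example, a driver's texture-format hook usually just returns one
			of Mesa's predefined formats.  The sketch below is from memory, and
			the exact symbol names (ctx->Driver.ChooseTextureFormat,
			_mesa_texformat_*) may differ slightly in the tree:

			    /* assumes Mesa internal headers: mtypes.h and texformat.h */
			    static const struct gl_texture_format *
			    myChooseTextureFormat(GLcontext *ctx, GLint internalFormat,
			                          GLenum srcFormat, GLenum srcType)
			    {
			        /* Pick a packed format the hardware can texture from;
			         * Mesa handles converting the user's data into it. */
			        switch (internalFormat) {
			        case GL_RGB:
			        case 3:
			            return &_mesa_texformat_rgb565;
			        default:
			            return &_mesa_texformat_argb8888;
			        }
			    }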
		
		
		- createScreen (i.e., the driver specific initialization function) is called
		  for each screen from AllocAndFetchScreenConfigs (lib/GL/glx/glxext.c).
		  This is also called from __glXInitialize.
		
		- For all of the existing drivers, the __driCreateScreen function is just a
		  wrapper that calls __driUtilCreateScreen (lib/GL/dri/dri_util.c) with a
		  pointer to the driver's API function table (of type __DriverAPIRec).  This
		  creates a __DRIscreenPrivate structure for the display and fills it in
		  (mostly) with the supplied parameters (i.e., screen number, display
		  information, etc.).  
		  
		  It also opens and initializes the connection to DRM.  This includes
		  opening the DRM device, mapping the frame buffer (note: the DRM
		  documentation says that the function used for this is called drmAddMap, but
		  it is actually called drmMap), and mapping the SAREA.  The final step is
		  to call the driver initialization function for the driver (from the
		  InitDriver field in the __DriverAPIRec, which is the DriverAPI field
		  of the __DRIscreenPrivate).
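
		  As a concrete illustration, a driver's bootstrap wrapper looks roughly
		  like the sketch below.  The function names are placeholders and the
		  __DriverAPIRec field list is abbreviated, but the pattern matches the
		  existing drivers:

		      #include "dri_util.h"   /* __DRIscreenPrivate, __DriverAPIRec, ... */

		      /* Table of driver entry points handed to the common DRI layer.
		       * Only the first few fields are shown; myInitDriver etc. are the
		       * driver's own functions, declared elsewhere. */
		      static struct __DriverAPIRec myAPI = {
		          myInitDriver,        /* InitDriver    */
		          myDestroyScreen,     /* DestroyScreen */
		          myCreateContext,     /* CreateContext */
		          /* ... remaining entry points ... */
		      };

		      void *__driCreateScreen(Display *dpy, int scrn, __DRIscreen *psc,
		                              int numConfigs, __GLXvisualConfig *config)
		      {
		          /* All the real work happens in the common helper. */
		          return (void *) __driUtilCreateScreen(dpy, scrn, psc, numConfigs,
		                                                config, &myAPI);
		      }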
		
		- The InitDriver function does (at least in the Radeon and i810 drivers) two
		  broad things.  It first verifies the version of the services (XFree86,
		  DDX, and DRM) that it will use.  In the two drivers that I examined, this
		  code was exactly the same and could probably be moved to dri_util.[ch]. 
		  
		  (I'd look at more drivers first to see if factoring out that code is really
		a good idea.)
		
		  The driver then creates an internal representation of the screen and
		  stores it (the pointer to the structure) in the private field of the
		  __DRIscreenPrivate structure.  The driver-private data may include things
		  such as mappings of MMIO registers, mappings of display and texture
		  memory, information about the layout of video memory, chipset version
		  specific data (feature availability for the specific chip revision, etc.),
		  and other similar data.  This is the handle that identifies the specific
		  graphics card to the driver (in case there is more than one card in the
		  system that will use the same driver).
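
		  A hypothetical driver-private screen structure might look like the
		  following; every field name here is made up for illustration, and the
		  real ones live in each driver's screen header:

		      /* assumes dri_util.h (for __DRIscreenPrivate) and xf86drm.h */
		      typedef struct {
		          __DRIscreenPrivate *driScreen;  /* back pointer to common layer */
		          drmAddress          mmio;       /* mapped MMIO registers        */
		          drmAddress          fb;         /* mapped framebuffer           */
		          unsigned            fbStride;   /* layout of video memory       */
		          unsigned            texOffset;  /* start of texture memory      */
		          unsigned            texSize;
		          int                 chipRev;    /* per-revision feature flags   */
		      } myScreenPrivate;

		      /* The driver's InitDriver function stores it in the common
		       * structure, e.g.:  sPriv->private = (void *) myScreen;  */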
		
		- After performing the __glXInitialize step, CreateContext calls the
		  createContext function for the requested screen.  This is where it gets
		  pretty complicated.  I have only looked at the Radeon driver.
		  radeonCreateContext (lib/GL/mesa/src/drv/radeon/radeon_context.c)
		  allocates a GLcontext structure (actually struct __GLcontextRec from
		  extras/Mesa/src/mtypes.h).  Here it fills in function tables for virtually
		  every OpenGL call.  Additionally, the __GLcontextRec has pointers to
		  buffers where the driver will store context specific hardware state
		  (textures, register settings, etc.) for when context switches (in the
		  OpenGL / X context sense) occur.
		
		The __GLcontextRec (i.e. GLcontext in Mesa) doesn't have any buffers
		of hardware-specific data (except texture image data if you want to be
		picky).  All Radeon-specific, per-context data should be hanging off
		of the struct radeon_context.
		
		All the DRI drivers define a hardware-specific context structure
		(such as struct radeon_context, typedef'd to be radeonContextRec, or
		struct mga_context_t typedef'd to be mgaContext).
		
		radeonContextRec has a pointer back to the Mesa __GLcontextRec.
		And Mesa's __GLcontextRec->DriverCtx pointer points back to the
		radeonContextRec
		
		If we were writing all this in C++ (don't laugh) we'd treat Mesa's
		__GLcontextRec as a base class and create driver-specific derived
		classes from it.
		
		Inheritance like this is actually pretty common in the DRI code,
		even though it's sometimes hard to spot.
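
		In C that inheritance pattern boils down to a pair of pointers.  A
		minimal sketch (the struct and field names are invented, but the shape
		matches radeon_context and friends):

		    #include <stdlib.h>
		    /* assumes Mesa's GLcontext from extras/Mesa/src/mtypes.h */

		    struct my_hw_context {
		        GLcontext *glCtx;        /* the Mesa "base class" context       */
		        unsigned   hwState[64];  /* hardware-specific state lives here  */
		    };

		    static struct my_hw_context *createHwContext(GLcontext *mesaCtx)
		    {
		        struct my_hw_context *hw = calloc(1, sizeof(*hw));

		        hw->glCtx = mesaCtx;       /* driver -> Mesa                    */
		        mesaCtx->DriverCtx = hw;   /* Mesa -> driver (DriverCtx field)  */
		        return hw;
		    }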
		
		
		These buffers are analogous to the
		  buffers where the OS stores CPU state when a program context switch occurs.
		
		Note that we don't do any fancy hardware context switching in our drivers.
		When we make-current a new context, we basically update all the hardware
		state with that new context's values.
		
		- When each of the function tables is initialized (see radeonInitSpanFuncs
		  for an example), an internal Mesa function is called.  This function
		  (e.g., _swrast_GetDeviceDriverReference) both allocates the buffer and
		  fills in the function pointers with the software fallbacks.  If a driver
		  were to just call these allocation functions and not replace any of the
		  function pointers, it would be the same as the software renderer.
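
		  The pattern looks roughly like this; the driver-side names are
		  invented, while swrast_device_driver and
		  _swrast_GetDeviceDriverReference are the Mesa pieces mentioned above:

		      /* assumes Mesa's swrast.h; myWriteRGBASpan/myReadRGBASpan are the
		       * driver's own accelerated span functions */
		      static void myInitSpanFuncs(GLcontext *ctx)
		      {
		          struct swrast_device_driver *swdd =
		              _swrast_GetDeviceDriverReference(ctx);

		          /* swdd already holds software fallbacks for every hook;
		           * override only the ones the hardware can accelerate. */
		          swdd->WriteRGBASpan = myWriteRGBASpan;
		          swdd->ReadRGBASpan  = myReadRGBASpan;
		          /* anything left untouched keeps the software path */
		      }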
		
		- The next part seems to start when the createDrawable function in the
		  __DRIscreenRec is called, but I don't see where this happens.
		
		createDrawable should be called via glXMakeCurrent() since that's the
		first time we're given an X drawable handle.  Somewhere during
		glXMakeCurrent() we use a DRI hash lookup to translate the X Drawable
		handle into a pointer to a __DRIdrawable.  If we get a NULL pointer that means
		we've never seen that handle before and now have to allocate the
		__DRIdrawable and initialize it (and put it in the hash table).
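
		In pseudocode, that lookup is something like the following; the hash
		helper names are made up and the argument lists are simplified:

		    __DRIdrawable *pdraw = drawableHashLookup(drawHash, xDrawable);

		    if (pdraw == NULL) {
		        /* Never seen this X drawable before: let the driver allocate
		         * and initialize its __DRIdrawable via the screen's
		         * createDrawable hook, then remember it in the hash table
		         * keyed by the X drawable ID. */
		        pdraw = psc->createDrawable(dpy, scrn, xDrawable /* , ... */);
		        drawableHashInsert(drawHash, xDrawable, pdraw);
		    }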
		

6.7. Utah-GLX

6.7.1. How similar/different are the driver architectures of Utah and the DRI?

Utah is based on earlier Mesa code. Some of the work is the "DRI" work, and some is the "Mesa" work. The Mesa work will transfer over reasonably well. The DRI work is mostly initialization and kernel drivers.

6.7.2. Should one dig into the Utah source first, then go knee-deep in the DRI, or the other way around?

You want to study a good DRI driver first, and then move the Utah code over.