Right now, we don't do page flipping at all. Everything is a blit from back to front. The biggest problem with page flipping is detecting when you're in full screen mode, since OpenGL doesn't really have a concept of full screen mode. We want a solution that works for existing games, so we've been designing one. It should get implemented fairly soon, since we need it for antialiasing on the V5.
In the current implementation the X front buffer is the 3D front buffer, and when we do page flipping we'll continue to do the same thing. Since you have an X window that covers the screen, it is safe for us to use the X surface's memory, and then we'll page flip. The only issue will be falling back to blitting if the window is ever moved so that it no longer covers the whole screen.
SSE stuff was somewhat broken in the kernels until recently. In fact, we (Gareth Hughes to be precise) just submitted a big kernel patch that should fully support SSE. I don't know if anyone is working on SSE optimizations for Mesa; I haven't seen much in that area lately.
I'd start by profiling your app against the current Mesa base to decide where the optimization effort should go. I'm not convinced SSE is the next right step; there may be more fundamental optimizations to do first. We haven't spent much time on optimizing it.
The locking system is designed to be highly efficient. It is based on a two-tiered lock. Basically it works like this:
The client wants the lock. It uses a CAS instruction (compare-and-swap; I was corrected on the name earlier, I had the functionality right but got the name wrong). If the client was the last application to hold the lock, you're done; you move on.
If it wasn't the last one, then we use an ioctl to the kernel to arbitrate the lock.
In this case some or all of the state on the card may have changed. The shared memory carries a stamp number for the X server. When the X server does a window operation it increments the stamp. If the client sees that the stamp has changed, it uses a DRI X protocol request to get the new window location and clip rects. This only happens on a window move. Assuming your clip rects and window position haven't changed, the redisplay happens entirely in the client.
The client may have other state to restore as well. In the case of the tdfx driver we have three more flags: command fifo invalid, 3D state invalid, and textures invalid. If any of those are set, the corresponding state is restored.
So, if the X server wakes up to process input, it currently grabs the lock but doesn't invalidate any state. I'm actually fixing this now so that it doesn't grab the lock for input processing.
If the X server draws, it grabs the lock and invalidates the command fifo.
If the X server moves a window, it grabs the lock, updates the stamp, and invalidates the command fifo.
If another 3D app runs, it grabs the lock, invalidates the command fifo, invalidates the 3D state and possibly invalidates the texture state.
The difference between running in a window and running fullscreen is actually quite minor. When you're in fullscreen mode, what you've done is zoom in your desktop and move the zoomed portion to cover just the window (just the same as hitting ctrl-alt-plus and ctrl-alt-minus). The game still runs in a window in either case, so the behavior shouldn't be any different.
DGA is turned off in the current configuration. I've just started adding those features back in. The latest code in the trunk has some support for DGA but it is broken; the code in my tdfx-1-1 branch should be working.
The first question you have to ask is whether you really should throw out X11. X11 does event handling, multiple windows, etc. It can also be made quite lightweight: it's running in 600k on an IPAQ.
If you decide you do need to throw out X, then you have to ask yourself if the DRI is the right solution for your problem. The DRI handles multiple 3D clients accessing hardware at the same time, sharing the same screen, in a robust and secure manner. If you don't need those properties the DRI isn't necessarily the right solution for you.
If you get to this point, then it would be theoretically possible to remove the DRI from X11 and have it run without X11. There's no code to support that at this point, so it would require some significant work.
The main reasons one would be interested in a non-X version of the DRI:
Pros: Eliminate any performance bottlenecks the XServer may be causing. Since we are 3D only, any extraneous locking/unlocking, periodic refreshes of the (hidden) 2D portion of the display, etc., will cause unexpected slowdowns.
Cons: If the X server never does any drawing, the overhead is minimal. Locking is done around Glide operations, and a lock check is a single compare-and-swap (CAS) instruction. Assuming your 3D window covers the X root, there are no 2D portions to redisplay.
Pros: Eliminate wasted system memory requirements.
Cons: Yes, the X server will use some resources, but I think not much.
Pros: Eliminate on-card font/pixmap/surface/etc caches that just waste memory.
Cons: If you don't use them they aren't taking any resources. Right now there is a small pixmap cache that's statically allocated to 2D. Removing that is a trivial code change. Making it dynamic (3D steals it away from 2D) is not too tough, and a better solution than any static allocation.
Pros: Eliminate the need for extra peripherals, such as mice.
Cons: Allowing operation without a mouse should be trivial, if it isn't a configuration option already.
Pros: Reduction in the amount of software necessary to install/maintain on a customer's system. Certainly none of my customers would have been able to install XFree 4.0 on their own.
Cons: XFree 4.0 installs with appropriate packaging are trivial. What you're saying is that no one has done the packaging work for you, and that's correct. If you create your own stripped DRI version, you'll be in for a lot more packaging work on your own.
The impact of the X server on the 3D graphics pipeline is very small: essentially none in the pipeline itself, and some resources, but I think not much. Realize the CAS is in the driver, so you're looking at creating a custom version of that as well. I think the effort spent avoiding the CAS, creating your own window-system binding for GL, and moving the DRI functionality out of the X server would be much better spent optimizing Mesa and the driver instead. You have to focus resources where they provide the biggest payoff.
Take a look at the fbdri project. They're trying to get the DRI running directly on a console with the Radeon. If all you want is one window and only running OpenGL then this makes sense.
I'll throw out another option: make the DRI work with TinyX. TinyX runs in 600k and gives you all of X. It should be a fairly straightforward project. As soon as you want more than one window, it makes a lot of sense to use the X framework that already exists.
I've been going through the DRI code and documents trying to figure out how all this stuff works. I decided to start at the start, so I've been looking at the driver initialization process. Boys and girls, when you're going through a huge chunk of code like the DRI, cscope is your friend! I've taken some notes along the way, which are included at the bottom of this message. Could the people who actually know look at my notes and correct me where I'm wrong?

- The whole process begins when an application calls glXCreateContext (lib/GL/glx/glxcmds.c). glXCreateContext is just a stub that calls CreateContext. The real work begins when CreateContext calls __glXInitialize (lib/GL/glx/glxext.c).

- The driver-specific initialization process starts with __driCreateScreen. Once the driver is loaded (via dlopen), dlsym is used to get a pointer to this function. The function pointer for each driver is stored in the createScreen array in the __DRIdisplay structure. This initialization is done in driCreateDisplay (lib/GL/dri/dri_glx.c), which is called by __glXInitialize.

I should also point out, to make it clear, that __driCreateScreen() really is the bootstrap of a DRI driver. It's the only* function in a DRI driver that libGL directly knows about. All the other DRI functions are accessed via the __DRIdisplayRec, __DRIscreenRec, __DRIcontextRec and __DRIdrawableRec structs defined in glxclient.h. Those structs are pretty well documented in that file.

*Footnote: that's not really true; there's also the __driRegisterExtensions() function that libGL uses to implement glXGetProcAddress(). That's another long story.

- After performing the __glXInitialize step, CreateContext calls the createContext function for the requested screen. Here the driver creates two data structures. The first, GLcontext (extras/Mesa/src/mtypes.h), contains all of the device-independent state, device-dependent constants (i.e., texture size limits, light limits, etc.), and device-dependent function tables.
The driver also allocates a structure that contains all of the device-dependent state. The GLcontext structure links to the device-dependent structure via the DriverCtx pointer, and the device-dependent structure has a pointer back to the GLcontext structure. The device-dependent structure is where the driver will store context-specific hardware state (register settings, etc.) for when context (in terms of OpenGL / X context) switches occur. This structure is analogous to the buffers where the OS stores CPU state when a program context switch occurs.

The texture images really are stored within Mesa's data structures. Mesa supports about a dozen texture formats, which happen to satisfy what all the DRI drivers need. So the texture format/packing is dependent on the hardware, but Mesa understands all the common formats. See Mesa/src/texformat.h. Gareth and Brian spent a lot of time on that.

- createScreen (i.e., the driver-specific initialization function) is called for each screen from AllocAndFetchScreenConfigs (lib/GL/glx/glxext.c). This is also called from __glXInitialize.

- For all of the existing drivers, the __driCreateScreen function is just a wrapper that calls __driUtilCreateScreen (lib/GL/dri/dri_util.c) with a pointer to the driver's API function table (of type __DriverAPIRec). This creates a __DRIscreenPrivate structure for the display and fills it in (mostly) with the supplied parameters (i.e., screen number, display information, etc.). It also opens and initializes the connection to the DRM. This includes opening the DRM device, mapping the frame buffer (note: the DRM documentation says that the function used for this is called drmAddMap, but it is actually called drmMap), and mapping the SAREA. The final step is to call the driver's initialization function (from the InitDriver field in the __DriverAPIRec, the DriverAPI field of the __DRIscreenPrivate).

- The InitDriver function does (at least in the Radeon and i810 drivers) two broad things.
It first verifies the versions of the services (XFree86, DDX, and DRM) that it will use. In the two drivers that I examined, this code was exactly the same and could probably be moved to dri_util.[ch]. (I'd look at more drivers first to see if factoring out that code is really a good idea.) The driver then creates an internal representation of the screen and stores the pointer to it in the private field of the __DRIscreenPrivate structure. The driver-private data may include things such as mappings of MMIO registers, mappings of display and texture memory, information about the layout of video memory, chipset-version-specific data (feature availability for the specific chip revision, etc.), and other similar data. This is the handle that identifies the specific graphics card to the driver (in case there is more than one card in the system that will use the same driver).

- After performing the __glXInitialize step, CreateContext calls the createContext function for the requested screen. This is where it gets pretty complicated. I have only looked at the Radeon driver. radeonCreateContext (lib/GL/mesa/src/drv/radeon/radeon_context.c) allocates a GLcontext structure (actually struct __GLcontextRec from extras/Mesa/src/mtypes.h). Here it fills in function tables for virtually every OpenGL call. Additionally, the __GLcontextRec has pointers to buffers where the driver will store context-specific hardware state (textures, register settings, etc.) for when context (in terms of OpenGL / X context) switches occur.

The __GLcontextRec (i.e., GLcontext in Mesa) doesn't have any buffers of hardware-specific data (except texture image data, if you want to be picky). All Radeon-specific, per-context data should be hanging off of the struct radeon_context. All the DRI drivers define a hardware-specific context structure (such as struct radeon_context, typedef'd to be radeonContextRec, or struct mga_context_t, typedef'd to be mgaContext).
radeonContextRec has a pointer back to the Mesa __GLcontextRec, and Mesa's __GLcontextRec->DriverCtx pointer points back to the radeonContextRec. If we were writing all this in C++ (don't laugh) we'd treat Mesa's __GLcontextRec as a base class and create driver-specific derived classes from it. Inheritance like this is actually pretty common in the DRI code, even though it's sometimes hard to spot.

These buffers are analogous to the buffers where the OS stores CPU state when a program context switch occurs. Note that we don't do any fancy hardware context switching in our drivers. When we make-current a new context, we basically update all the hardware state with that new context's values.

- When each of the function tables is initialized (see radeonInitSpanFuncs for an example), an internal Mesa function is called. This function (e.g., _swrast_GetDeviceDriverReference) both allocates the buffer and fills in the function pointers with the software fallbacks. If a driver were to just call these allocation functions and not replace any of the function pointers, it would be the same as the software renderer.

- The next part seems to start when the createDrawable function in the __DRIscreenRec is called, but I don't see where this happens. createDrawable should be called via glXMakeCurrent(), since that's the first time we're given an X drawable handle. Somewhere during glXMakeCurrent() we use a DRI hash lookup to translate the X Drawable handle into a pointer to a __DRIdrawable. If we get a NULL pointer, that means we've never seen that handle before and now have to allocate the __DRIdrawable and initialize it (and put it in the hash table).
Utah is based on earlier Mesa code. Some of the work is the "DRI" work, and some is the "Mesa" work. The Mesa work will transfer over reasonably well. The DRI work is mostly initialization and kernel drivers.
So you want to study a good DRI driver and then move Utah code over.