/DEV/FASTIO$ - the Final Way
Okay. We now have a transforming (32->16 bit) call gate that just happens to point to the wrong address. Finding the address of the corresponding GDT entry and redirecting it to the expected position was a matter of seconds. A kernel debugger is really a neat tool for the hacker. It worked!
At this point, calling the DevHlp_DynamicAPI function becomes pointless: it would merely occupy a kernel entry point that cannot be used afterwards. A quick look through the list of device helper functions turns up DevHlp_AllocGDTSelector. With it, we acquire an ordinary GDT selector for exclusive use by the driver and "adjust" it to form a 32->16 bit, R3->R0 call gate into the I/O routine section of the driver.
Have a look at the code fragment in the FASTIO$ driver (figure 4) which does it all.
.386p
_acquire_gdt proc far
pusha
mov ax, word ptr [_io_gdt32] ; get selector
or ax,ax
jnz aexit ; already have one: done
; otherwise make one
xor ax, ax
mov word ptr [_io_gdt32], ax ; clear gdt save
mov word ptr [gdthelper], ax ; helper
push ds
pop es ; ES:DI = addr of
mov di, offset _io_gdt32 ; _io_gdt32
mov cx, 2 ; two selectors
mov dl, DevHlp_AllocGDTSelector ; get GDT selectors
call [_Device_Help]
jc aexit ; exit if failed
sgdt qword ptr [gdtsave] ; access the GDT ptr
mov ebx, dword ptr [gdtsave+2] ; get lin addr of GDT
movzx eax, word ptr [_io_gdt32] ; build offset into table
and eax, 0fffffff8h ; mask away RPL and TI bits
add ebx, eax ; build address in EBX
mov ax, word ptr [gdthelper] ; selector to map GDT at
mov ecx, 08h ; a single entry (8 bytes)
mov dl, DevHlp_LinToGDTSelector
call [_Device_Help]
jc aexit0 ; if failed exit
mov ax, word ptr [gdthelper]
mov es, ax ; build address to GDT
xor bx, bx
mov word ptr es:[bx], offset _io_call ; fix address off
mov word ptr es:[bx+2], cs ; fix address sel
mov word ptr es:[bx+4], 0ec00h ; present, DPL 3, 386 call gate, 0 params
mov word ptr es:[bx+6], 0000h ; high offset
mov dl, DevHlp_FreeGDTSelector ; free gdthelper (still in AX)
call [_Device_Help]
jnc short aexit
aexit0: xor ax,ax ; clear selector
mov word ptr [_io_gdt32], ax
aexit: popa ; restore all registers
mov ax, word ptr [_io_gdt32]
ret
_acquire_gdt endp
Figure 4: Initialization routine of FASTIO$ driver
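To make the magic numbers in figure 4 easier to follow, here is a minimal C sketch of the same bit arithmetic. It is not part of the driver; the names call_gate, gdt_offset and make_call_gate are made up for illustration. It shows how the selector is turned into a byte offset into the GDT, and how the four words written through ES:[BX] combine into one 8-byte 386 call gate descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* One 8-byte 386 call gate, laid out as the four words that
   figure 4 stores at ES:[BX], ES:[BX+2], ES:[BX+4] and ES:[BX+6]. */
typedef struct {
    uint16_t offset_low;   /* bits 0..15 of the target offset         */
    uint16_t selector;     /* code segment selector of the target     */
    uint16_t access;       /* low byte: parameter count,
                              high byte: P, DPL and gate type         */
    uint16_t offset_high;  /* bits 16..31 of the target offset        */
} call_gate;

enum { GATE_TYPE_386_CALL = 0x0C };

/* Byte offset of a selector's descriptor inside the GDT:
   clear the RPL (bits 0-1) and the table indicator (bit 2). */
static uint32_t gdt_offset(uint16_t selector)
{
    return (uint32_t)selector & 0xFFFFFFF8u;
}

/* Assemble a call gate; hypothetical helper, names are illustrative. */
static call_gate make_call_gate(uint16_t target_sel, uint32_t target_off,
                                unsigned dpl, unsigned param_count)
{
    call_gate g;
    unsigned type_byte;

    assert(dpl <= 3 && param_count <= 31);
    type_byte     = 0x80u | (dpl << 5) | GATE_TYPE_386_CALL; /* P=1 */
    g.offset_low  = (uint16_t)(target_off & 0xFFFFu);
    g.selector    = target_sel;
    g.access      = (uint16_t)((type_byte << 8) | param_count);
    g.offset_high = (uint16_t)(target_off >> 16);
    return g;
}
```

With dpl = 3 and param_count = 0, the access word comes out as 0EC00h, the constant used in figure 4. The zero parameter count is what lets the gate skip copying stack arguments to the ring 0 stack.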
Since a device driver is initialized in ring 3, this routine does not work during startup. Instead, the driver runs this code once, the first time a client opens the device. Thus, before using the driver, a client must first call a small routine, io_init(). Refer to the file iolib.asm that comes with this issue of EDM/2.
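The set-up-on-first-open behavior mirrors the test at the top of _acquire_gdt: a stored selector of zero means "not set up yet". The following C sketch models just that control flow. The function setup_gate and the selector value 0158h are hypothetical stand-ins for the real allocate-and-patch work, which only makes sense in ring 0.

```c
#include <assert.h>
#include <stdint.h>

static uint16_t io_gate_selector = 0; /* 0 = no gate set up yet        */
static int      setup_runs       = 0; /* how often the slow path ran   */

/* Hypothetical stand-in for the allocate-and-patch work of figure 4.
   Returns a made-up selector; a real driver would return 0 on failure. */
static uint16_t setup_gate(void)
{
    assert(io_gate_selector == 0);    /* only run while nothing is cached */
    setup_runs++;
    return 0x0158;
}

/* Called from the open handler: set up the gate on first use only. */
static uint16_t acquire_gate(void)
{
    if (io_gate_selector == 0)
        io_gate_selector = setup_gate();
    return io_gate_selector;
}
```

Every later open finds a non-zero selector and returns it without touching the GDT again.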
A final improvement: Usually, C code passes arguments on the stack. A call gate can be configured to copy these parameters over to the new ring. But why should we do this? For really fast I/O access we pass the data in registers. This allows an I/O instruction in assembler code to be replaced directly by a simple indirect call, as shown in figure 5. The target of the indirect call is set up by the io_init() procedure mentioned above.
EXTRN ioentry:FWORD
:
MOV DX, portaddr
MOV AL, 123
MOV BX, 4 ; function code 4 = write byte
CALL FWORD PTR [ioentry]
:
Figure 5: Calling I/O from assembler
If the code needs to be called from C, we simply write a small stub that wraps a stack frame envelope around it, just as shown in figure 6.
; Calling convention:
; void c_outb(short port,char data)
;
;
PUBLIC _c_outb
PUBLIC c_outb
_c_outb PROC
c_outb:
PUSH EBP
MOV EBP, ESP ; set standard stack frame
PUSH EBX ; save register
MOV DX, WORD PTR [EBP+8] ; get port
MOV AL, BYTE PTR [EBP+12] ; get data
MOV BX, 4 ; function code 4 = write byte
CALL FWORD PTR [ioentry] ; call intersegment indirect 16:32
POP EBX ; restore EBX
POP EBP ; restore caller's frame pointer
RET ; return
ALIGN 4
_c_outb ENDP
Figure 6: A C callable I/O function
The file iolib.asm contains a set of functions c_inX() and c_outX() for using I/O from any 32 bit compiler that supports the standard stack frame. The files iolib.a and iolib.lib are precompiled versions; the file iolib.h contains the C prototypes.
In the complete driver, I gave up a small amount of the theoretically attainable performance. There are six basic I/O operations: IN and OUT instructions exist for transferring bytes, 16 bit words and 32 bit long words. To be really fast, one would have to provide a separate GDT selector for each of them. In a typical OS/2 system, this should not be a problem. However, if everyone started adding more routines, each with its own entry point, GDT selectors could quickly become a scarce resource. So I use a function code, passed in the BX register, to multiplex the six functions through a single GDT selector. Refer to the io_call entry point in the fastio_a.asm driver source file.
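The multiplexing idea can be sketched in C as a single entry point that dispatches on a function code. This is an illustration only, not the driver's io_call: the real routine is in fastio_a.asm. Only function code 4 (write byte) is taken from figures 5 and 6; the other five code values are assumptions chosen for this example, and the I/O ports are simulated by a plain array.

```c
#include <assert.h>
#include <stdint.h>

#define PORT_SPACE 0x10000u              /* x86 has 64K I/O ports */

/* Simulated port space; +3 so a dword access at the last port fits. */
static uint8_t sim_ports[PORT_SPACE + 3];

/* Function codes. Only IO_WRITE_BYTE = 4 comes from the article;
   the remaining values are assumed for illustration. */
enum io_func {
    IO_READ_BYTE  = 1, IO_READ_WORD  = 2, IO_READ_DWORD  = 3,
    IO_WRITE_BYTE = 4, IO_WRITE_WORD = 5, IO_WRITE_DWORD = 6
};

/* Little-endian read of 'width' consecutive simulated ports. */
static uint32_t port_load(uint16_t port, unsigned width)
{
    uint32_t v = 0;
    for (unsigned i = 0; i < width; i++)
        v |= (uint32_t)sim_ports[port + i] << (8 * i);
    return v;
}

/* Little-endian write of 'width' consecutive simulated ports. */
static void port_store(uint16_t port, uint32_t v, unsigned width)
{
    for (unsigned i = 0; i < width; i++)
        sim_ports[port + i] = (uint8_t)(v >> (8 * i));
}

/* Single entry point: func plays the role of BX, port of DX,
   data of AL/AX/EAX. Reads return the value, writes return 0. */
static uint32_t io_call(unsigned func, uint16_t port, uint32_t data)
{
    assert(func >= IO_READ_BYTE && func <= IO_WRITE_DWORD);
    switch (func) {
    case IO_READ_BYTE:   return port_load(port, 1);
    case IO_READ_WORD:   return port_load(port, 2);
    case IO_READ_DWORD:  return port_load(port, 4);
    case IO_WRITE_BYTE:  port_store(port, data, 1); break;
    case IO_WRITE_WORD:  port_store(port, data, 2); break;
    case IO_WRITE_DWORD: port_store(port, data, 4); break;
    }
    return 0;
}
```

The price of the single entry point is one extra register load in the caller and one dispatch in the driver per I/O operation. In exchange, the driver consumes one GDT selector instead of six.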