Optimizing Subroutines in Assembly Language by Agner Fog - HTML preview


8 Making function libraries compatible with multiple compilers and platforms

 

There are a number of compatibility problems to take care of if you want to make a function library that is compatible with multiple compilers, multiple programming languages, and multiple operating systems. The most important compatibility problems have to do with:

1.  Name mangling

2.  Calling conventions

3.  Object file formats

The easiest solution to these portability problems is to make the code in a high level language such as C++ and make any necessary low-level constructs with the use of intrinsic functions or inline assembly. The code can then be compiled with different compilers for the different platforms. Note that not all C++ compilers support intrinsic functions or inline assembly and that the syntax may differ.
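As an illustration of the intrinsic-function approach, here is a minimal sketch that assumes a compiler providing the SSE2 header emmintrin.h. The function name add_arrays and the macro HAVE_SSE2_SKETCH are hypothetical and chosen only for this example. A plain C++ loop serves as the fallback when SSE2 is not detected, so the same source file compiles on other platforms.

```cpp
#include <cassert>
#include <cstddef>

// Detect SSE2 support. The predefined macros tested here differ between
// compilers, which is one of the syntax differences mentioned above.
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2_SKETCH 1   // hypothetical macro name
#endif

// Adds b[i] to a[i] for all i < n (hypothetical example function).
void add_arrays(double* a, const double* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    std::size_t i = 0;
#ifdef HAVE_SSE2_SKETCH
    // Process two doubles per iteration with unaligned loads and stores
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(a + i, _mm_add_pd(va, vb));
    }
#endif
    // Remaining elements, or all elements if SSE2 is not available
    for (; i < n; ++i) {
        a[i] += b[i];
    }
}
```

The low-level part is confined to a few lines, and the compiler takes care of name mangling, calling conventions and the object file format.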

If assembly language programming is necessary or desired then there are various methods for overcoming the compatibility problems between different x86 platforms. These methods are discussed in the following paragraphs.

8.1 Supporting multiple name mangling schemes

The easiest way to deal with the problems of compiler-specific name mangling schemes is to turn off name mangling with the extern "C" directive, as explained on page 30.

The extern "C" directive cannot be used for class member functions, overloaded functions and operators. This problem can be solved by making an inline function with a mangled name that calls an assembly function with an unmangled name:

// Example 8.1. Avoid name mangling of overloaded functions in C++

// Prototypes for unmangled assembly functions:

extern "C" double power_d (double x, double n);

extern "C" double power_i (double x, int n);

 

// Wrap these into overloaded functions:

inline double power (double x, double n) {return power_d(x, n);}

inline double power (double x, int n)    {return power_i(x, n);}

The compiler will simply replace a call to the mangled function with a call to the appropriate unmangled assembly function without any extra code. The same method can be used for class member functions, as explained on page 49.

However, in some cases it is desirable to preserve the name mangling, either because it makes the C++ code simpler or because the mangled names contain information about calling conventions and other compatibility issues.

An assembly function can be made compatible with multiple name mangling schemes simply by giving it multiple public names. Returning to example 4.1c page 32, we can add mangled names for multiple compilers in the following way:

; Example 8.2. (Example 4.1c rewritten)

; Function with multiple mangled names (32-bit mode)

 

; double sinxpnx (double x, int n) {return sin(x) + n*x;}

 

ALIGN     4

_sinxpnx  PROC NEAR            ; extern "C" name

 

; Make public names for each name mangling scheme:

?sinxpnx@@YANNH@Z  LABEL NEAR  ; Microsoft compiler

@sinxpnx$qdi       LABEL NEAR  ; Borland compiler

_Z7sinxpnxdi       LABEL NEAR  ; Gnu compiler for Linux

__Z7sinxpnxdi      LABEL NEAR  ; Gnu compiler for Windows and Mac OS

PUBLIC ?sinxpnx@@YANNH@Z, @sinxpnx$qdi, _Z7sinxpnxdi, __Z7sinxpnxdi

 

; parameter x = [ESP+4]

; parameter n = [ESP+12]

; return value = ST(0)

    fild  dword ptr [esp+12] ; n

    fld   qword ptr [esp+4]  ; x

    fmul  st(1), st(0)       ; n*x

    fsin                     ; sin(x)

    fadd                     ; sin(x) + n*x

    ret                      ; return value is in st(0)

_sinxpnx  ENDP

Example 8.2 works with most compilers in both 32-bit Windows and 32-bit Linux because the calling conventions are the same. A function can have multiple public names and the linker will simply search for a name that matches the call from the C++ file. But a function call cannot have more than one external name.

The syntax for name mangling for different compilers is described in manual 5: "Calling conventions for different C++ compilers and operating systems". Applying this syntax manually is a difficult job. It is much easier and safer to generate each mangled name by compiling the function in C++ with the appropriate compiler. Command line versions of most compilers are available for free or as trial versions.

The Intel, Digital Mars and Codeplay compilers for Windows are compatible with the Microsoft name mangling scheme. The Intel compiler for Linux is compatible with the Gnu name mangling scheme. Gnu compilers version 2.x and earlier have a different name mangling scheme which I have not included in example 8.2. Mangled names for the Watcom compiler contain special characters which are only allowed by the Watcom assembler.
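The Gnu-style names used in the examples can be cross-checked by demangling them. The following is a minimal sketch that assumes a Gnu or Clang toolchain providing abi::__cxa_demangle in cxxabi.h. It does not understand the Microsoft, Borland or Watcom schemes. The helper name demangle_gnu is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <cxxabi.h>   // Gnu/Clang specific header, assumed to be available

// Returns the demangled form of a Gnu-style mangled name,
// or an empty string if the name cannot be demangled.
std::string demangle_gnu(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // With a null output buffer the result is allocated with malloc
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result;
    if (status == 0 && text != nullptr) {
        result = text;
    }
    std::free(text);   // free(nullptr) is harmless
    return result;
}
```

For example, demangle_gnu("_Z7sinxpnxdi") gives "sinxpnx(double, int)". The names for Windows and Mac OS carry one extra leading underscore, which has to be skipped before the name is passed to this helper.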

8.2 Supporting multiple calling conventions in 32 bit mode

Member functions in 32-bit Windows do not always have the same calling convention. The Microsoft-compatible compilers use the __thiscall convention with 'this' in register ecx, while Borland and Gnu compilers use the __cdecl convention with 'this' on the stack. One solution is to use friend functions as explained on page 49. Another possibility is to make a function with multiple entries. The following example is a rewrite of example 7.1a page 49 with two entries for the two different calling conventions:

; Example 8.3a (Example 7.1a with two entries)

; Member function, 32-bit mode

; int MyList::Sum()

 

; Define structure corresponding to class MyList:

MyList   STRUC

length_  DD  ?

buffer   DD  100 DUP (?)

MyList   ENDS

 

_MyList_Sum       PROC NEAR   ; for extern "C" friend function

 

; Make mangled names for compilers with __cdecl convention:

@MyList@Sum$qv    LABEL NEAR  ; Borland compiler

_ZN6MyList3SumEv  LABEL NEAR  ; Gnu comp. for Linux

__ZN6MyList3SumEv LABEL NEAR  ; Gnu comp. for Windows and Mac OS

PUBLIC @MyList@Sum$qv, _ZN6MyList3SumEv, __ZN6MyList3SumEv

 

      ; Move 'this' from the stack to register ecx:

      mov ecx, [esp+4]

 

; Make mangled names for compilers with __thiscall convention:

?Sum@MyList@@QAEHXZ LABEL NEAR     ; Microsoft compiler

PUBLIC ?Sum@MyList@@QAEHXZ

assume ecx: ptr MyList             ; ecx points to structure MyList

      xor eax, eax                 ; sum = 0

      xor edx, edx                 ; Loop index i = 0

      cmp [ecx].length_, 0         ; this->length

      je  L9                       ; Skip if length = 0

L1:   add eax, [ecx].buffer[edx*4] ; sum += buffer[i]

      add edx, 1                   ; i++

      cmp edx, [ecx].length_       ; while (i < length)

      jb  L1                       ; Loop

L9:   ret                          ; Return value is in eax

_MyList_Sum ENDP                   ; End of int MyList::Sum()

assume ecx: nothing                ; ecx no longer points to anything

The difference in name mangling schemes is actually an advantage here because it enables the linker to lead the call to the entry that corresponds to the right calling convention.
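Both entries in example 8.3a rely on the same memory layout for the class on every compiler. The following sketch shows an assumed C++ counterpart of the MyList STRUC with compile-time layout checks, together with a plain C++ reference version of Sum that can be used to test the assembly function. The members are public here only to keep the sketch short, and the name MyList_Sum_ref is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed C++ counterpart of the MyList STRUC in example 8.3a
struct MyList {
    int length;        // length_ DD ?            at offset 0
    int buffer[100];   // buffer  DD 100 DUP (?)  at offset 4
};

// The assembly code hard-codes these sizes and offsets
static_assert(sizeof(int) == 4, "assembly code assumes 32-bit int");
static_assert(offsetof(MyList, length) == 0, "length_ must be at offset 0");
static_assert(offsetof(MyList, buffer) == 4, "buffer must be at offset 4");
static_assert(sizeof(MyList) == 404, "no padding expected in MyList");

// Reference version of int MyList::Sum(), mirroring the assembly loop.
// The loop counter is unsigned because the assembly code uses jb.
int MyList_Sum_ref(const MyList* list) {
    assert(list != nullptr);
    int sum = 0;
    for (unsigned i = 0; i < static_cast<unsigned>(list->length); ++i) {
        sum += list->buffer[i];
    }
    return sum;
}
```

If one of the static assertions fails on a given compiler then the STRUC definition in the assembly file does not match the class on that platform.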

The method becomes more complicated if the member function has more parameters. Consider the function void MyList::AttItem(int item) on page 40. The __thiscall convention has the parameter 'this' in ecx and the parameter item on the stack at [esp+4] and requires that the stack is cleaned up by the function. The __cdecl convention has both parameters on the stack with 'this' at [esp+4] and item at [esp+8] and the stack cleaned up by the caller. A solution with two function entries requires a jump:

; Example 8.3b

; void MyList::AttItem(int item);

 

_MyList_AttItem       PROC NEAR    ; for extern "C" friend function

 

; Make mangled names for compilers with __cdecl convention:

@MyList@AttItem$qi    LABEL NEAR   ; Borland compiler

_ZN6MyList7AttItemEi  LABEL NEAR   ; Gnu comp. for Linux

__ZN6MyList7AttItemEi LABEL NEAR   ; Gnu comp. for Windows and Mac OS

PUBLIC @MyList@AttItem$qi, _ZN6MyList7AttItemEi, __ZN6MyList7AttItemEi

 

      ; Move parameters into registers:

      mov  ecx, [esp+4]            ; ecx = this

      mov  edx, [esp+8]            ; edx = item

      jmp  L0                      ; jump into common section

 

; Make mangled names for compilers with __thiscall convention:

?AttItem@MyList@@QAEXH@Z LABEL NEAR ; Microsoft compiler

PUBLIC ?AttItem@MyList@@QAEXH@Z

      pop  eax                     ; Remove return address from stack

      pop  edx                     ; Get parameter 'item' from stack

      push eax                     ; Put return address back on stack

 

L0:   ; common section where parameters are in registers

      ; ecx = this, edx = item

 

      assume ecx: ptr MyList       ; ecx points to structure MyList

      mov  eax, [ecx].length_      ; eax = this->length

      cmp  eax, 100                ; Check if too high

      jnb  L9                      ; List is full. Exit

      mov  [ecx].buffer[eax*4],edx ; buffer[length] = item

      add  eax, 1                  ; length++

      mov  [ecx].length_, eax

L9:   ret

_MyList_AttItem ENDP               ; End of MyList::AttItem

assume ecx: nothing                ; ecx no longer points to anything

In example 8.3b, the two function entries each load all parameters into registers and then jump to a common section that doesn't need to read parameters from the stack. The __thiscall entry must remove parameters from the stack before the common section.

Another compatibility problem occurs when we want to have a static and a dynamic link version of the same function library in 32-bit Windows. The static link library uses the __cdecl convention by default, while the dynamic link library uses the __stdcall convention by default. The static link library is the most efficient solution for C++ programs, but the dynamic link library is needed for several other programming languages.

One solution to this problem is to specify the __cdecl or the __stdcall convention for both libraries. Another solution is to make functions with two entries.

The following example shows the function from example 8.2 with two entries for the __cdecl and __stdcall calling conventions. Both conventions have the parameters on the stack. The difference is that the stack is cleaned up by the caller in the __cdecl convention and by the called function in the __stdcall convention.

; Example 8.4a (Example 8.2 with __stdcall and __cdecl entries)

; Function with entries for __stdcall and __cdecl (32-bit Windows):

 

ALIGN     4

; __stdcall entry:

; extern "C" double __stdcall sinxpnx (double x, int n);

_sinxpnx@12        PROC NEAR

    ; Get all parameters into registers

    fild  dword ptr [esp+12] ; n

    fld   qword ptr [esp+4]  ; x

 

    ; Remove parameters from stack:

    pop eax                  ; Pop return address

    add esp, 12              ; remove 12 bytes of parameters

    push eax                 ; Put return address back on stack

    jmp L0

 

; __cdecl entry:

; extern "C" double __cdecl sinxpnx (double x, int n);

_sinxpnx           LABEL NEAR

PUBLIC _sinxpnx

    ; Get all parameters into registers

    fild  dword ptr [esp+12] ; n

    fld   qword ptr [esp+4]  ; x

    ; Don't remove parameters from the stack. This is done by caller

 

L0: ; Common entry with parameters all in registers

; parameter x = st(0)

; parameter n = st(1)

    fmul  st(1), st(0)       ; n*x

    fsin                     ; sin(x)

    fadd                     ; sin(x) + n*x

    ret                      ; return value is in st(0)

_sinxpnx@12  ENDP

The method of removing parameters from the stack in the function prolog rather than in the epilog is admittedly rather kludgy. A more efficient solution is to use conditional assembly:

; Example 8.4b

; Function with versions for __stdcall and __cdecl (32-bit Windows)

; Choose function prolog according to calling convention:

IFDEF STDCALL_               ; If STDCALL_ is defined

    _sinxpnx@12  PROC NEAR   ; extern "C" __stdcall function name

ELSE

    _sinxpnx     PROC NEAR   ; extern "C" __cdecl function name

ENDIF

 

; Function body common to both calling conventions:

    fild  dword ptr [esp+12] ; n

    fld   qword ptr [esp+4]  ; x

    fmul  st(1), st(0)       ; n*x

    fsin                     ; sin(x)

    fadd                     ; sin(x) + n*x

 

; Choose function epilog according to calling convention:

IFDEF STDCALL_               ; If STDCALL_ is defined

    ret 12                   ; Clean up stack if __stdcall

    _sinxpnx@12  ENDP        ; End of function   

ELSE

    ret                      ; Don't clean up stack if __cdecl

    _sinxpnx  ENDP           ; End of function

ENDIF

This solution requires that you make two versions of the object file, one with __cdecl calling convention for the static link library and one with __stdcall calling convention for the dynamic link library. The distinction is made on the command line for the assembler. The __stdcall version is assembled with /DSTDCALL_ on the command line to define the macro STDCALL_, which is detected by the IFDEF conditional.
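The C++ declaration seen by the calling program has to agree with the version of the object file that is linked in. The following is a minimal sketch of a matching declaration, assuming that the C++ code is compiled with the same STDCALL_ definition as the assembly file. The macro names SINXPNX_CALL and USE_ASM_SINXPNX and the function sinxpnx_selfcheck are hypothetical. A plain C++ stand-in for the function is included so that the sketch can be tested without the assembly object.

```cpp
#include <cassert>
#include <cmath>

// Select the calling convention keyword. __stdcall is only meaningful in
// 32-bit Windows, so the macro expands to nothing on other platforms.
#if defined(STDCALL_) && defined(_WIN32) && !defined(_WIN64)
#define SINXPNX_CALL __stdcall
#else
#define SINXPNX_CALL
#endif

// Declaration matching _sinxpnx@12 or _sinxpnx in example 8.4b
extern "C" double SINXPNX_CALL sinxpnx(double x, int n);

#ifndef USE_ASM_SINXPNX
// C++ stand-in, used only when the assembly object is not linked in
extern "C" double SINXPNX_CALL sinxpnx(double x, int n) {
    return std::sin(x) + n * x;
}
#endif

// Simple check that can be run against whichever version is linked
void sinxpnx_selfcheck() {
    assert(sinxpnx(0.0, 7) == 0.0);   // sin(0) + 7*0
}
```

Defining USE_ASM_SINXPNX when compiling removes the stand-in so that the linker picks up the assembly version instead.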

8.3 Supporting multiple calling conventions in 64 bit mode

Calling conventions are better standardized in 64-bit systems than in 32-bit systems. There is only one calling convention for 64-bit Windows and one calling convention for 64-bit Linux and other Unix-like systems. Unfortunately, the two sets of calling conventions are quite different. The most important differences are:

  • Function parameters are transferred in different registers in the two systems.
  • Registers RSI, RDI, and XMM6 - XMM15 have callee-save status in 64-bit Windows but not in 64-bit Linux.