Optimizing Subroutines in Assembly Language by Agner Fog - HTML preview

PLEASE NOTE: This is an HTML preview only and some elements such as links or page numbers may be incorrect.
Download the book in PDF, ePub, Kindle for a complete version.

4 ABI standards

 

ABI stands for Application Binary Interface. An ABI is a standard for how functions are called, how parameters and return values are transferred, and which registers a function is allowed to change. It is important to obey the appropriate ABI standard when combining assembly with high level language. The details of calling conventions etc. are covered in manual 5: "Calling conventions for different C++ compilers and operating systems". The most important rules are summarized here for your convenience.

4.1 Register usage

img15.png

The floating point registers ST(0) - ST(7) must be empty before any call or return, except when used for function return value. The MMX registers must be cleared by EMMS before any call or return. The YMM registers must be cleared by VZEROUPPER before any call or return to non-VEX code.

The arithmetic flags can be changed freely. The direction flag may be set temporarily, but must be cleared before any call or return in 32-bit and 64-bit systems. The interrupt flag cannot be cleared in protected operating systems. The floating point control word and bit 6- 15 of the MXCSR register must be saved and restored in functions that modify them.

Register FS and GS are used for thread information blocks etc. and should not be changed. Other segment registers should not be changed, except in segmented 16-bit models.

4.2 Data storage

Variables and objects that are declared inside a function in C or C++ are stored on the stack and addressed relative to the stack pointer or a stack frame. This is the most efficient way of storing data, for two reasons. Firstly, the stack space used for local storage is released when the function returns and may be reused by the next function that is called. Using the same memory area repeatedly improves data caching. The second reason is that data stored on the stack can often be addressed with an 8-bit offset relative to a pointer rather than the 32 bits required for addressing data in the data segment. This makes the code more compact so that it takes less space in the code cache or trace cache.

Global and static data in C++ are stored in the data segment and addressed with 32-bit absolute addresses in 32-bit systems and with 32-bit RIP-relative addresses in 64-bit systems. A third way of storing data in C++ is to allocate space with new or malloc. This method should be avoided if speed is critical.

4.3 Function calling conventions

Calling convention in 16 bit mode DOS and Windows 3.x

Function parameters are passed on the stack with the first parameter at the lowest address. This corresponds to pushing the last parameter first. The stack is cleaned up by the caller.

Parameters of 8 or 16 bits size use one word of stack space. Parameters bigger than 16 bits are stored in little-endian form, i.e. with the least significant word at the lowest address. All stack parameters are aligned by 2.

Function return values are passed in registers in most cases. 8-bit integers are returned in AL, 16-bit integers and near pointers in AX, 32-bit integers and far pointers in DX:AX, Booleans in AX, and floating point values in ST(0).

Calling convention in 32 bit Windows, Linux, BSD, Mac OS X

Function parameters are passed on the stack according to the following calling conventions:

img16.png

The __cdecl calling convention is the default in Linux. In Windows, the __cdecl convention is also the default except for member functions, system functions and DLL's. Statically linked modules in .obj and .lib files should preferably use __cdecl, while dynamic link libraries in .dll files should use __stdcall. The Microsoft, Intel, Digital Mars and Codeplay compilers use __thiscall by default for member functions under Windows, the Borland compiler uses __cdecl with 'this' as the first parameter.

The fastest calling convention for functions with integer parameters is __fastcall, but this calling convention is not standardized.

Remember that the stack pointer is decreased when a value is pushed on the stack. This means that the parameter pushed first will be at the highest address, in accordance with the _pascal convention. You must push parameters in reverse order to satisfy the __cdecl and __stdcall conventions.

Parameters of 32 bits size or less use 4 bytes of stack space. Parameters bigger than 32 bits are stored in little-endian form, i.e. with the least significant DWORD at the lowest address, and aligned by 4.

Mac OS X and the Gnu compiler version 3 and later align the stack by 16 before every call instruction, though this behavior is not consistent. Sometimes the stack is aligned by 4. This discrepancy is an unresolved issue at the time of writing. See manual 5: "Calling conventions for different C++ compilers and operating systems" for details.

Function return values are passed in registers in most cases. 8-bit integers are returned in AL, 16-bit integers in AX, 32-bit integers, pointers, references and Booleans in EAX, 64-bit integers in EDX:EAX, and floating point values in ST(0).

See manual 5: "Calling conventions for different C++ compilers and operating systems" for details about parameters of composite types (struct, class, union) and vector types (__m64, __m128, __m256).

Calling conventions in 64 bit Windows

The first parameter is transferred in RCX if it is an integer or in XMM0 if it is a float or double. The second parameter is transferred in RDX or XMM1. The third parameter is trans- ferred in R8 or XMM2. The fourth parameter is transferred in R9 or XMM3. Note that RCX is not used for parameter transfer if XMM0 is used, and vice versa. No more than four parameters can be transferred in registers, regardless of type. Any further parameters are transferred on the stack with the first parameter at the lowest address and aligned by 8. Member functions have 'this' as the first parameter.

The caller must allocate 32 bytes of free space on the stack in addition to any parameters transferred on the stack. This is a shadow space where the called function can save the four parameter registers if it needs to. The shadow space is the place where the first four parameters would have been stored if they were transferred on the stack according to the __cdecl rule. The shadow space belongs to the called function which is allowed to store the parameters (or anything else) in the shadow space. The caller must reserve the 32 bytes of shadow space even for functions that have no parameters. The caller must clean up the stack, including the shadow space. Return values are in RAX or XMM0.

The stack pointer must be aligned by 16 before any CALL instruction, so that the value of RSP is 8 modulo 16 at the entry of a function. The function can rely on this alignment when storing XMM registers to the stack.

See manual 5: "Calling conventions for different C++ compilers and operating systems" for details about parameters of composite types (struct, class, union) and vector types (__m64, __m128, __m256).

Calling conventions in 64 bit Linux, BSD and Mac OS X

The first six integer parameters are transferred in RDI, RSI, RDX, RCX, R8, R9, respectively. The first eight floating point parameters are transferred in XMM0 - XMM7. All these registers can be used, so that a maximum of fourteen parameters can be transferred in registers. Any further parameters are transferred on the stack with the first parameters at the lowest address and aligned by 8. The stack is cleaned up by the caller if there are any parameters on the stack. There is no shadow space. Member functions have 'this' as the first parameter. Return values are in RAX or XMM0.

The stack pointer must be aligned by 16 before any CALL instruction, so that the value of RSP is 8 modulo 16 at the entry of a function. The function can rely on this alignment when storing XMM registers to the stack.

The address range from [RSP-1] to [RSP-128] is called the red zone. A function can safely store data above the stack in the red zone as long as this is not overwritten by any PUSH or CALL instructions.

See manual 5: "Calling conventions for different C++ compilers and operating systems" for details about parameters of composite types (struct, class, union) and vector types (__m64, __m128, __m256).

4.4 Name mangling and name decoration

The support for function overloading in C++ makes it necessary to supply information about the parameters of a function to the linker. This is done by appending codes for the parameter types to the function name. This is called name mangling. The name mangling codes have traditionally been compiler specific. Fortunately, there is a growing tendency towards standardization in this area in order to improve compatibility between different compilers. The name mangling codes for different compilers are described in detail in manual 5: "Calling conventions for different C++ compilers and operating systems".

The problem of incompatible name mangling codes is most easily solved by using extern "C" declarations. Functions with extern "C" declaration have no name mangling. The only decoration is an underscore prefix in 16- and 32-bit Windows and 32- and 64-bit Mac OS. There is some additional decoration of the name for functions with __stdcall and __fastcall declarations.

The extern "C" declaration cannot be used for member functions, overloaded functions, operators, and other constructs that are not supported in the C language. You can avoid name mangling in these cases by defining a mangled function that calls an unmangled function. If the mangled function is defined as inline then the compiler will simply replace the call to the mangled function by the call to the unmangled function. For example, to define an overloaded C++ operator in assembly without name mangling:

class C1;

// unmangled assembly function;

extern "C" C1 cplus  (C1 const & a, C1 const & b);

// mangled C++ operator

inline C1 operator + (C1 const & a, C1 const & b) {

   // operator + replaced inline by function cplus

   return cplus(a, b);

}

Overloaded functions can be inlined in the same way. Class member functions can be translated to friend functions as illustrated in example 7.1b page 49.

4.5 Function examples

The following examples show how to code a function in assembly that obeys the calling conventions. First the code in C++:

// Example 4.1a

extern "C" double sinxpnx (double x, int n) {

   return sin(x) + n * x;

}

The same function can be coded in assembly. The following examples show the same function coded for different platforms.

; Example 4.1b. 16-bit DOS and Windows 3.x

ALIGN     4

_sinxpnx  PROC   NEAR

; parameter x = [SP+2]

; parameter n = [SP+10]

; return value = ST(0)

 

    push  bp                ; bp must be saved

    mov   bp, sp            ; stack frame

    fild  word ptr [bp+12]  ; n

    fld   qword ptr [bp+4]  ; x

    fmul  st(1), st(0)      ; n*x

    fsin                    ; sin(x)

    fadd                    ; sin(x) + n*x

    pop   bp                ; restore bp

    ret                     ; return value is in st(0)

_sinxpnx  ENDP

In 16-bit mode we need BP as a stack frame because SP cannot be used as base pointer. The integer n is only 16 bits. I have used the hardware instruction FSIN for the sin function.