C++/C++ address starting point
Expert: Ralph McArdell - 7/17/2007
Question
Hi,
A C++ pointer must be an integer, and it is the memory address of an object. I want to know where a C++ compiler and runtime count addresses from?
Thanks,
lzzzz
Answer
They do not count them from anywhere in particular as a matter of standard.
What memory addresses are available to a process, how they are specified, and for what sort of memory, depend totally on the processor and operating system (or the raw system design if there is no operating system), and on the tools and formats in use (compilers, linkers, object code formats etc.).
Note that there are three types of memory generally used by a program:
static memory, that the compiler knows about and can arrange to have allocated when a program is started
stack based memory that is local to a function call stack frame and would usually use the process application stack (note: having a separate system stack is common in operating systems)
dynamic memory that the compiler does not quantify and is allocated at runtime from the free store (or heap).
Each of these can be allocated in totally different locations.
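To see this on your own system, something like the following minimal sketch can be used. The names static_value, SampleAddresses, as_number and sample_addresses are mine, purely for illustration, and converting a pointer to an integer is implementation-defined, so the numbers only mean something on the platform that produced them.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Static storage: the compiler and linker arrange for this to exist at start-up.
static int static_value = 1;

// Hypothetical helper type holding one sample address of each kind of memory.
struct SampleAddresses {
    std::uintptr_t static_address;
    std::uintptr_t stack_address;
    std::uintptr_t heap_address;
};

// Converting a pointer to an integer is implementation-defined; the result is
// only meaningful on the platform that produced it.
std::uintptr_t as_number(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p);
}

SampleAddresses sample_addresses() {
    int local_value = 2;                          // automatic storage (stack)
    std::unique_ptr<int> heap_value(new int(3));  // dynamic storage (free store)
    assert(heap_value != nullptr);
    SampleAddresses result = { as_number(&static_value),
                               as_number(&local_value),
                               as_number(heap_value.get()) };
    return result;
}
```

Calling sample_addresses() from main and printing the three members in hexadecimal will usually show three values that are nowhere near each other, and that may change from run to run on systems that randomise address layout.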
For example, on a 16-bit Intel x86 operating system such as MS-DOS (if you can call it an operating system) you could choose the memory model used. Some models used one 64KB segment for all data: the stack memory grew down from the top of the segment, the static memory was presumably wired in at the bottom and the dynamic memory would presumably use the bit left over in the middle. Where this 64KB (or smaller) block was in memory would depend on what else was loaded into the system - segments were allocated on 16-byte boundaries if I remember correctly. The compiler of course had control over how much static data was used and I think the linker could control the amount of stack space to use within this segment, and addresses would be 16-bit near address values within this segment, presumably starting at zero.
On the other hand, other models would allow multiple 64KB segments to be used for data. In this scheme I would guess the stack uses up to one 64KB segment, static data uses one or more pre-determined segments and dynamic memory would come from one or more 64KB segments as required at runtime, allocated by asking the operating system for them (I forget the exact details). The pointers would be 32-bit far pointers (having two 16-bit parts: a 16-bit segment address and a 16-bit segment offset). Here no pointer would be truly zero (0000:0000) - this address would almost certainly be part of the system - either the BIOS or MS-DOS. However the compiler may generate addresses starting from 0000:0000 and expect the linking and loading process to fix up the segment part of the addresses to something usable when the program was run.
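As an illustration of how such a segment:offset pair maps onto a real memory location, here is a small sketch of the real-mode 8086 rule (physical address = segment * 16 + offset). The function name to_linear is mine, and the sketch ignores details such as address wrap-around at the top of the 1MB range.

```cpp
#include <cassert>
#include <cstdint>

// Real-mode 8086 rule: the 16-bit segment value is shifted left by 4 bits
// (multiplied by 16) and the 16-bit offset is added to it.
// Hypothetical helper name; wrap-around above 1MB is deliberately ignored.
constexpr std::uint32_t to_linear(std::uint16_t segment, std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 4) + offset;
}

// Two different far pointers can name the same byte of memory.
static_assert(to_linear(0x1234, 0x0010) == to_linear(0x1235, 0x0000),
              "segments overlap every 16 bytes");
```

So 1234:0010 and 1235:0000 both refer to location 0x12350, which is another reason why asking where addresses start from has no single answer on such a system.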
Many years ago I used mini computers - a Prime 50 series, running Primos, the proprietary operating system for Pr1me Computer 50 series machines. This OS allocated memory for processes such that the lower portion was common to all processes (the operating system code and the like) and the higher section was used by application code. Hence I would not expect a compiler to necessarily generate values starting from zero here either (and the Primes also used a segmented memory scheme!). I also remember something about some weird offset of 400 octal or similar that confused things when looking at memory map files and trying to determine where things had gone wrong!
Even a null pointer value, although convertible from zero in C++, may not be a bit pattern of all zero bits in reality. I have a vague feeling that this may apply to Primos and that odd 400 octal offset but honestly cannot remember now - it was back in the mid to late 1980s...
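If you want to check what your own implementation does, a sketch along these lines would do it. The function names are mine. The first function is true on every conforming implementation; the second merely reports what this particular platform happens to do and is true on most current systems, though not required to be.

```cpp
#include <cassert>
#include <cstring>

// Always true: the constant zero converts to the null pointer value.
bool null_pointer_compares_equal_to_zero() {
    int* p = 0;
    return p == nullptr;
}

// Platform-specific: inspects the bytes actually stored for a null int*.
bool null_pointer_is_all_zero_bits() {
    int* p = nullptr;
    unsigned char bytes[sizeof p];
    std::memcpy(bytes, &p, sizeof p);  // copy the object representation
    for (unsigned char b : bytes) {
        if (b != 0) {
            return false;
        }
    }
    return true;
}
```

This is also why filling a structure containing pointers with zero bytes (for example with memset) is not strictly guaranteed to leave those pointers null.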
Often a compiler might use an offset rather than an actual address. This is especially true of local data in function calls - they tend to be referred to from the stack frame base plus or minus an offset. This stack frame base address is saved, usually in a register, often one for this specific purpose, on entry to the function's code.
It should be noted that dynamic memory obtained from the free store (or heap) is C++ runtime support specific and will depend on the operating system and compiler, and possibly on third party add-ons such as SmartHeap (http://www.microquill.com/index.html), which might be used to improve performance over the compiler supplied heap support, particularly in multithreaded applications. So the memory addresses obtained dynamically are going to be very compiler, platform and possibly runtime heap support/library specific.
I should also point out that certain pointer types may contain information in addition to the memory address. A pointer to char in particular may be larger than a pointer to other types. This is because on some systems the unit addressed is larger than a byte (Prime 50 series mini computers for example used 16-bit words rather than being byte addressed). In such systems additional information is required to specify which part of a multi-char word at an address is being referenced by a pointer to char (the other option I suppose would be to make a char the same size as an addressed word). If an address addresses a 16-bit word and characters are 8 bits in size then an additional bit would be needed to specify either the high or low 8 bits of an addressed word. However it is probable that a pointer to char would be twice the size of other pointers just to maintain decent word-alignment in memory. Processors have rules about data and alignment - sometimes you pay a time penalty if the data is mis-aligned and sometimes the processor faults (i.e. your program will crash) - it depends on the processor type.
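A quick way to see whether your platform is one of these is to compare pointer sizes directly, as in the sketch below. PointerSizes and pointer_sizes are names I made up for the example. On today's byte-addressed desktop systems all three sizes will normally be equal; the only relationship the language itself promises here is that void* and char* share the same representation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: sizes, in bytes, of three object pointer types.
struct PointerSizes {
    std::size_t to_char;
    std::size_t to_int;
    std::size_t to_void;
};

constexpr PointerSizes pointer_sizes() {
    return { sizeof(char*), sizeof(int*), sizeof(void*) };
}

// Guaranteed by the language: void* and char* have the same representation.
static_assert(sizeof(void*) == sizeof(char*),
              "void* and char* must have the same size");
```

On a word-addressed machine of the kind described above, to_char could come out larger than to_int.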
The other pointer type that is larger than a usual pointer is not really a pointer at all in the usual sense. That is a pointer to member. Such pointers need to be combined with an object's address to form a pointer to the member for a specific object, so they are more like offsets from an object base address to a member inside it. The problems come when the pointer to member is a pointer to a member function. In such cases information on the function's virtualness, multiple inheritance etc. needs to be kept. See for example http://linuxquality.sunsite.dk/articles/memberpointers/ for more information.
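A short sketch of both kinds of pointer to member may help here. All of the type and function names (Point, Shape, Square, read_member, call_through) are invented for the example. Note that neither pointer to member is usable until it is combined with an object, and that the call through the member function pointer still dispatches virtually.

```cpp
#include <cassert>
#include <cstddef>

struct Point {
    int x;
    int y;
};

// A pointer to data member says which member, not which object.
int read_member(const Point& p, int Point::* member) {
    return p.*member;
}

struct Shape {
    virtual ~Shape() = default;
    virtual int sides() const { return 0; }
};

struct Square : Shape {
    int sides() const override { return 4; }
};

// A pointer to member function must carry enough information to honour
// virtual dispatch when it is finally applied to an object.
int call_through(const Shape& s, int (Shape::*fn)() const) {
    return (s.*fn)();
}

// Sizes are implementation specific; on some compilers the first of these
// is larger than the second, on others they are equal.
constexpr std::size_t member_function_pointer_size = sizeof(int (Shape::*)() const);
constexpr std::size_t plain_function_pointer_size = sizeof(int (*)());
```

For example, read_member(p, &Point::y) picks out y from whichever Point is passed, and call_through(square, &Shape::sides) ends up in Square::sides.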
Finally, code generated by a compiler may be position independent, or position dependent but fix-up-able, such that the actual position in the executable output from a linker may not be the same as that in the object code output by the compiler. Similarly, the executable may contain position independent code or have information that allows all addresses to be fixed up during loading of the executable. This occurs if the pre-built location is already in use, and occurs most often when dynamically linked (or shared object) libraries are mapped into a process and two or more request loading at the same base address. Obviously re-basing all addresses takes time and so it is to be avoided if possible by selecting base addresses that are not thought to be required for other libraries mapped into an application's process. Such details come into the optimisation phase of development and require the use of linker options and tools such as the MS EDITBIN utility with the /REBASE option, which can be used with object files, application executables and DLLs.
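In spirit, fixing up addresses for a new base is no more than the following highly simplified sketch. The function rebase_addresses and its parameters are hypothetical; a real loader works from relocation tables in the executable format and patches the addresses in place in the loaded image, but the arithmetic is the same idea.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, highly simplified fix-up: every stored address was computed
// for preferred_base, so each one is shifted by the difference between the
// base actually obtained and the base that was asked for.
std::vector<std::uint32_t> rebase_addresses(std::vector<std::uint32_t> addresses,
                                            std::uint32_t preferred_base,
                                            std::uint32_t actual_base) {
    const std::uint32_t delta = actual_base - preferred_base;  // wraps modulo 2^32
    for (std::uint32_t& address : addresses) {
        assert(address >= preferred_base);  // must lie inside the image
        address += delta;
    }
    return addresses;
}
```

A library built for base 0x10000000 but loaded at 0x20000000 would have an address such as 0x10001000 turned into 0x20001000. Doing this for every address in a large library is the cost that choosing non-clashing base addresses avoids.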
If you are interested in what code your compiler generates, try generating assembler output from your compiler. With MS Visual C++, for example, you can use the /FAs option (Assembly with Source Code output), and many UNIX/Linux based compilers have an option to generate assembler output, such as the -S option with gcc/g++. Remember that the code produced _will_ depend on the compiler options as well as the compiler. Note that even if VC++ and gcc are both generating assembler for Intel x86 or AMD x64 they (annoyingly) use different assembler formats: MS VC++ uses the MS MASM assembler format and gcc uses the AT&T assembler format for x86/x64.
----------------------------------------------------
ADDITIONAL
----------------------------------------------------
After sending my answer to your question I read some interesting discussion on a Prime newsgroup (comp.sys.prime) about getting software to run under Primos running on an emulator someone has developed (on a PlayStation 3 of all things), and came across this as part of a discussion on getting a C compiler to function and compiling some C code:
"But if your plan is to port software from other platforms,
well..... You would be amazed at how much software out there relies on characters being in the range of 0-127 and NULL to actually be 0."
Which indicates that my vague memories about odd memory values are at least partially correct.
Oh, the mention of 0..127 for character values is due to PRIMOS using a 7-bit character encoding within an 8-bit char type with the highest bit set to 1, thus they had char values 128..255 for character values 0..127!
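In code terms the mapping just described amounts to setting or clearing the top bit, as in this small sketch. The function names to_prime_style and from_prime_style are my own; I am only illustrating the arithmetic of the encoding, not any actual PRIMOS library routine.

```cpp
#include <cassert>

// Store a 7-bit character code (0..127) with the highest bit set,
// giving a stored value in the range 128..255.
unsigned char to_prime_style(unsigned char ascii) {
    assert(ascii <= 127);  // only 7-bit codes are valid input
    return static_cast<unsigned char>(ascii | 0x80u);
}

// Recover the 7-bit character code by clearing the highest bit.
unsigned char from_prime_style(unsigned char stored) {
    return static_cast<unsigned char>(stored & 0x7Fu);
}
```

So the letter 'A', which is 65 (0x41) in ASCII, would be stored as 193 (0xC1), and any ported code that compares stored characters against plain ASCII constants would quietly stop working.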