C++/C++ address starting point


Question
Hi,

A C++ pointer must be an integer, and it is the memory address of an object. I want to know where a C++ compiler and runtime count addresses from?

Thanks,
lzzzz

Answer
There is no standard starting point: C++ itself does not say where addresses are counted from.

Which memory addresses are available to a process, how they are specified, and what sort of memory they refer to depend entirely on the processor and operating system (or on the raw system design if there is no operating system) and on the tools and formats in use (compilers, linkers, object code formats and so on).

Note that there are three types of memory generally used by a program:

static memory, which the compiler knows about and can arrange to have allocated when a program is started

stack-based memory, which is local to a function call's stack frame and would usually use the process's application stack (note: having a separate system stack is common in operating systems)

dynamic memory, which the compiler does not quantify and which is allocated at runtime from the free store (or heap).

Each of these can be allocated in totally different locations.

For example, on a 16-bit Intel x86 operating system such as MS-DOS (if you can call it an operating system) you could choose the memory model used. Some models used one 64KB segment for all data: the stack memory grew down from the top of the segment, the static memory was presumably wired in at the bottom and the dynamic memory would presumably use the bit left over in the middle. Where this 64KB (or smaller) block was in memory would depend on what else was loaded into the system - segments were allocated on 16-byte boundaries if I remember correctly. The compiler of course had control over how much static data was used and I think the linker could control the amount of stack space to use within this segment. Addresses would be 16-bit near address values within this segment, presumably starting at zero.

On the other hand, other models would allow multiple 64KB segments to be used for data. In this scheme I would guess the stack uses up to one 64KB segment, static data uses one or more pre-determined segments and dynamic memory comes from one or more 64KB segments requested from the operating system as required at runtime (I forget the exact details). Pointers would be 32-bit far pointers, having two 16-bit parts: a 16-bit segment address and a 16-bit offset within the segment. Here no pointer to program data would be truly zero (0000:0000) - this address would almost certainly be part of the system - either the BIOS or MS-DOS. However the compiler may generate addresses starting from 0000:0000 and expect the linking and loading process to fix up the segment part of the addresses to something usable when the program is run.

Many years ago I used mini computers - a Prime 50 series, running Primos, the proprietary operating system for Pr1me Computer 50 series machines. This OS allocated memory for processes such that the lower portion was common to all processes (the operating system code and the like) and the higher section was used by application code. Hence I would not expect a compiler to necessarily generate values starting from zero here either (and the Primes also used a segmented memory scheme!). I also remember something about a weird offset of 400 octal or similar that confused things when looking at memory map files and trying to determine where things had gone wrong!

Even a null pointer value, although convertible from zero in C++, may not be a bit pattern of all zero bits in reality. I have a vague feeling that this may apply to Primos and that odd 400 octal offset but honestly cannot remember now - it was back in the mid to late 1980s...

Often a compiler might use an offset rather than an actual address. This is especially true of local data in function calls - it tends to be referred to by the stack frame base plus or minus an offset. This stack frame base address is saved on entry to the function's code, usually in a register and often one reserved for this specific purpose.

It should be noted that dynamic memory obtained from the free store (or heap) is specific to the C++ runtime support and will depend on the operating system and compiler, and possibly on third party add-ons such as SmartHeap http://www.microquill.com/index.html (which might be used to improve performance over the compiler supplied heap support, particularly in multithreaded applications). The memory addresses obtained dynamically are therefore going to be very compiler, platform and possibly runtime heap support/library specific.

I should also point out that certain pointer types may contain information in addition to the memory address. A pointer to char in particular may be larger than a pointer to other types. This is because on some systems the unit addressed is larger than a byte (Prime 50 series mini computers for example used 16-bit words rather than being byte addressed). In such systems additional information is required to specify which part of a multi-char word at an address is being referenced by a pointer to char (the other option I suppose would be to make a char the same size as an addressed word). If an address addresses a 16-bit word and characters are 8 bits in size then an additional bit would be needed to specify either the high or low 8 bits of an addressed word. However it is probable that a pointer to char would be twice the size of other pointers just to maintain decent word-alignment in memory. Processors have rules about data and alignment - sometimes you pay a time penalty if the data is mis-aligned and sometimes the processor faults (i.e. your program will crash) - it depends on the processor type.

The other pointer type that is larger than a usual pointer is not really a pointer at all in the usual sense. That is a pointer to member. Such pointers need to be combined with an object's address to form a pointer to the member for a specific object, so they are more like offsets from an object base address to a member inside it. The problems come when the pointer to member is a pointer to a member function. In such cases information on the function's virtualness, multiple inheritance etc. needs to be kept. See for example http://linuxquality.sunsite.dk/articles/memberpointers/ for more information.

Finally, code generated by a compiler may be position independent, or position dependent but fix-up-able, such that the actual position in the executable output from a linker may not be the same as that in the object code output by the compiler. Similarly, the executable may contain position independent code or have information that allows all addresses to be fixed up during loading of the executable. Such load-time fix-up occurs if the pre-built location is already in use, and happens most often when dynamically linked (or shared object) libraries are mapped into a process and two or more request loading at the same base address. Obviously re-basing all addresses takes time and so it is to be avoided if possible by selecting base addresses that are not thought to be required for other libraries mapped into an application's process. Such details come into the optimisation phase of development and require the use of linker options and tools such as the MS EDITBIN utility with the /REBASE option, which can be used with object files, application executables and DLLs.

If you are interested in what code your compiler generates, try generating assembler output from your compiler. With MS Visual C++ for example you can use the /FAs option (Assembly with Source Code Output), and many UNIX/Linux based compilers have an option to generate assembler output, such as the -S option with gcc/g++. Remember that the code produced _will_ depend on the compiler options as well as the compiler. Note that even if VC++ and gcc are both generating assembler for Intel x86 or AMD x64 they (annoyingly) use a different assembler format: MS VC++ uses MS MASM assembler format and gcc by default uses AT&T assembler syntax for x86/x64.



----------------------------------------------------
ADDITIONAL
----------------------------------------------------

After sending my answer to your question I read some interesting discussion on a Prime news group (comp.sys.prime) about getting software to run under Primos on an emulator someone has developed (on a Playstation 3 of all things), and came across this as part of a discussion on getting a C compiler to function and compiling some C code:

"But if your plan is to port software from other platforms,
well.....   You would be amazed at how much software out there relies on characters being in the range of 0-127 and NULL to actually be 0."

Which indicates that my vague memories about odd memory values are at least partially correct.

Oh, the mention of 0..127 for character values is due to PRIMOS using a 7-bit character encoding within an 8-bit char type with the highest bit set to 1, thus they had char values 128..255 for character values 0..127!

Ralph McArdell

Expertise

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part-task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominantly in C++ on MS Windows NT, 2000, XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.

©2016 About.com. All rights reserved.