C++/C++ source code -> assembly language (CPU instruction set) -> binary code
Expert: Ralph McArdell - 5/6/2008
Question
Hi,
My understanding of processing a C++ program is as follows:
-------------------------------------------
The C++ compiler translates C++ source code into assembly language, which is the CPU instruction set, then the CPU translates the assembly language into binary code such as 010101 to run.
--------------------------------------------
Do you know anything about this?
Thanks,
lzzzz
Answer
Yes I do, and you are wrong.
A CPU does _not_ execute assembly language. It executes machine instructions, which are encoded in binary.
Consider a time when digital computers were new and language tools were not yet available. In such a time you had to program by writing binary machine instructions directly into memory. However, entering binary all the time, or some other numeric base equivalent, is tedious to say the least, as is the act of figuring out the numbers in the first place: to do so you need a very good understanding of the way the instructions are encoded.
So people invented assembly language, which wraps up instructions in short mnemonics such as MOV or LD (move or load), ADD, JMP or BRA (jump or branch), CMP (compare) etc., together with assemblers that take care of the tedious work of encoding the various forms of these basic instructions. Usually these variations have to do with the locations of the operands. For example, moving a value from one place to another could be from one register to another, from memory to register, immediate to register, memory indirect to register, register to memory and so on.
Jumps (or branches) require the address of the destination of the jump to be specified, and this can usually be either relative or absolute. The fastest jump is often a short relative jump, so calculating the correct offsets can again be very tedious, and is another chore which an assembler helps out with.
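To make that encoding chore concrete, here is a minimal sketch of the kind of work an assembler does. It assumes 32-bit x86 and covers just two instruction forms: MOV of an immediate value into a register, and a short relative JMP. The function names encode_mov_imm32 and encode_short_jmp are made up for this illustration; a real assembler handles far more forms than this.

```cpp
#include <cstdint>
#include <vector>

// Encode "mov r32, imm32": opcode 0xB8 plus the register number,
// followed by the 32-bit immediate value in little-endian byte order.
// Register numbers: 0 = eax, 1 = ecx, 2 = edx, 3 = ebx, ...
std::vector<std::uint8_t> encode_mov_imm32(unsigned reg, std::uint32_t value)
{
    std::vector<std::uint8_t> bytes;
    bytes.push_back(static_cast<std::uint8_t>(0xB8 + (reg & 7)));
    for (int shift = 0; shift < 32; shift += 8)
        bytes.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
    return bytes;
}

// Encode a short relative jump "jmp rel8": opcode 0xEB plus a signed
// 8-bit offset counted from the end of this 2-byte instruction.
// Returns false if the target is too far away for the short form.
bool encode_short_jmp(std::uint32_t instruction_address,
                      std::uint32_t target_address,
                      std::vector<std::uint8_t>& bytes)
{
    const std::int64_t offset =
        static_cast<std::int64_t>(target_address) -
        (static_cast<std::int64_t>(instruction_address) + 2);
    if (offset < -128 || offset > 127)
        return false;
    bytes = {0xEB, static_cast<std::uint8_t>(offset)};
    return true;
}
```

For example, encode_mov_imm32(0, 42) gives the five bytes B8 2A 00 00 00 (mov eax, 42), and a short jump located at address 0x100 that targets 0x110 encodes as EB 0E, because the offset is counted from the end of the two-byte jump instruction.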
So an assembler for a CPU is just another form of translator. You write the assembly language in source files and run them through the assembler application, which produces an assembled machine code representation called object code.
As an example, the popular Intel x86 family and its variations have at least two assembler syntax variations in use: the Intel variety and the AT&T variety. You might wonder why this is important. Well, Microsoft and many other assemblers and compilers use the Intel version, while the GNU translator set uses the AT&T version (by default), and they are quite different. (Note: you can often get assembler output from compilers, and may be able to include inline assembler in the source code of certain languages.)
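To give a feel for how different the two syntaxes are, the sketch below annotates a small C++ function with one plausible x86 instruction per statement, written both ways. The function name and the register choices are purely illustrative assumptions; a real compiler picks its own registers and may well emit different instructions.

```cpp
// Note the differences: Intel puts the destination first and writes memory
// operands as [register+offset]; AT&T puts the destination last, prefixes
// registers with % and immediates with $, and writes offset(register).
int sum_with_offset(int y, const int* p)
{
    int x = 42;   // Intel: mov eax, 42         AT&T: movl $42, %eax
    x += y;       // Intel: add eax, ebx        AT&T: addl %ebx, %eax
    x += p[2];    // Intel: add eax, [ecx+8]    AT&T: addl 8(%ecx), %eax
    return x;
}
```

The offset 8 in the last line assumes a 4-byte int, so p[2] sits 8 bytes past the address held in the register.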
For more on assembly language try
http://en.wikipedia.org/wiki/Assembly_language for starters, and
http://en.wikipedia.org/wiki/X86_assembly_language for x86 assembler (which appears to concentrate on the Intel variety). See
http://en.wikipedia.org/wiki/GNU_Assembler for a little on the GNU assembler. For more on object code see
http://en.wikipedia.org/wiki/Object_file and
http://en.wikipedia.org/wiki/Linker on linkers.
OK so that is assembly code.
A C++ (or other language) compiler might produce assembler code. However the assembler will then need to be executed to assemble the assembly code into object code ready for linking. This might in fact happen under the covers when compiling - the assembler just gets invoked after the compiler and before the linker.
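With the GNU tools, for instance, you can run these stages one at a time and look at what each produces. The sketch below assumes g++ and the GNU assembler (as) are installed and that the source is saved as square.cpp, a file name made up for this example; the commands are shown as comments.

```cpp
// square.cpp - hypothetical file name used for this illustration
//
// Stop after compilation and keep the assembly language source:
//     g++ -S square.cpp            (writes square.s)
// Run the assembler on that source to get an object file:
//     as square.s -o square.o
// Or let the compiler driver run both steps under the covers:
//     g++ -c square.cpp            (writes square.o)

int square(int value)
{
    return value * value;
}
```

The resulting object file still has to be linked with code that supplies main before anything can be executed.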
A C++ compiler might produce some other language code - most notably C code - which is then compiled (and assembled if the C compiler produces assembler) to produce object code which is then linked to form an executable. Early C++ compilers worked in this fashion and some still do today - the Comeau C++ compiler for example (see
http://www.comeaucomputing.com/). Again, any such compiler would probably be invoked automatically after the C++ compilation stage.
A C++ compiler might produce object code containing machine instructions native to the machine's CPU directly without recourse to other translation programs.
A C++ compiler might produce some intermediate machine language object code - as in the case of producing managed code from the Microsoft Visual C++ compiler to execute within the .NET environment. In this mode the Microsoft compiler produces object code containing CIL (Common Intermediate Language) - formerly MSIL (Microsoft Intermediate Language) - see
http://en.wikipedia.org/wiki/Common_Intermediate_Language
A C++ compiler might produce object code in some foreign system format containing machine code for some other CPU. Such a compiler is called a cross compiler - not because it is annoyed <g>, but because it produces code that needs to be moved across to some other system to be executed (OK, so that is probably not the real reason; see the section on Canadian Cross in the Wikipedia article mentioned below). Such compilers are useful, for example, when developing for small devices that cannot support a development environment (e.g. a washing machine controller). See
http://en.wikipedia.org/wiki/Cross_compiler for more on cross compilers.
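One way to see that a compiler's target need not be the machine it runs on is to look at the predefined macros it sets for the target CPU. The sketch below checks a few macros commonly predefined by GCC, Clang and Microsoft compilers; the function name target_architecture is made up for this example, and the list of CPUs is far from complete.

```cpp
// Report the CPU family this code was compiled *for*, which under a
// cross compiler differs from the CPU the compiler itself ran on.
const char* target_architecture()
{
#if defined(__x86_64__) || defined(_M_X64)
    return "x86-64";
#elif defined(__i386__) || defined(_M_IX86)
    return "x86 (32-bit)";
#elif defined(__aarch64__) || defined(_M_ARM64)
    return "ARM (64-bit)";
#elif defined(__arm__) || defined(_M_ARM)
    return "ARM (32-bit)";
#else
    return "unknown";
#endif
}
```

Built with a native compiler on an x86-64 desktop this would return "x86-64"; the same source built on that desktop with a cross compiler targeting 32-bit ARM would return "ARM (32-bit)", but only once the executable is moved to and run on the ARM device.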
So basically the process is:
- take a source file and translate it to produce an object file
- link object files and libraries to produce an executable
The first stage is repeated until all object files required are produced, whereupon they can all be linked to form an executable.
I have just used the term 'translate' in the first step. This can involve executing various pre-processors and pre-compilers, various main language compilers, and an assembler (I cannot see requiring more than one of these!). Hopefully most such stages of translation will be wrapped up in a single command, together with the linking stage for simple programs.
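As a rough sketch of the two steps, the listing below shows what would normally be three separate files, marked by comments, in one place. The file names greet.h, greet.cpp and caller.cpp and the function names are invented for this example, and the g++ commands in the comments assume the GNU tools.

```cpp
#include <string>

// ---- greet.h: declaration shared by both source files ----
std::string make_greeting(const std::string& name);

// ---- greet.cpp: translated on its own, e.g. g++ -c greet.cpp (writes greet.o) ----
std::string make_greeting(const std::string& name)
{
    return "Hello, " + name + "!";
}

// ---- caller.cpp: translated separately, e.g. g++ -c caller.cpp (writes caller.o) ----
// While being compiled it sees only the declaration from greet.h; the
// linker later connects the call below to the machine code in greet.o.
std::string greet_world()
{
    return make_greeting("world");
}

// Link step, e.g.: g++ caller.o greet.o <object file providing main> -o hello
```

For a simple program the single command g++ greet.cpp caller.cpp plus the file containing main would perform all of these translation steps and the link in one go.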
Libraries come in two main forms:
- Simple archives of object files (static libraries). In fact the UN*X/Linux utility for managing static object file libraries is called ar - short for archiver.
- Dynamic or shared object libraries, which have to be linked like an executable. Such libraries can be loaded into memory at runtime (using operating system specific API functions), and may be shared between processes.
See
http://en.wikipedia.org/wiki/Library_(computing) for more information on libraries.
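As an illustration of loading a shared library at runtime, here is a minimal sketch using the POSIX functions dlopen, dlsym and dlclose. It assumes a Linux system where the C maths library is available under the name libm.so.6; the function name cosine_from_shared_library is made up for this example, as are the file names in the build commands shown in the comments.

```cpp
#include <dlfcn.h>   // POSIX: dlopen, dlsym, dlclose

// Building libraries with the GNU tools (hypothetical file names):
//     ar rcs libgreet.a greet.o                      static library
//     g++ -shared -fPIC greet.cpp -o libgreet.so     shared library

// Load the maths library at runtime and call its cos function.
// Returns false if the library or the function cannot be found.
bool cosine_from_shared_library(double angle, double& result)
{
    void* library = dlopen("libm.so.6", RTLD_LAZY);   // assumed library name
    if (library == nullptr)
        return false;

    using cosine_function = double (*)(double);
    cosine_function cosine =
        reinterpret_cast<cosine_function>(dlsym(library, "cos"));
    const bool found = (cosine != nullptr);
    if (found)
        result = cosine(angle);

    dlclose(library);
    return found;
}
```

On some older systems the program has to be linked with -ldl for these functions to be available. Windows offers the same facility through different API functions, which is why the text above calls them operating system specific.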