C++/C++ source code -> assembly language (CPU instruction set) -> binary code

Question
Hi,

My understanding of how a C++ program is processed is as follows:
-------------------------------------------
The C++ compiler translates C++ source code into assembly language, which is the CPU instruction set, then the CPU translates the assembly language into binary code such as 010101 to run.
--------------------------------------------

Do you know anything about this?

Thanks,
lzzzz

Answer
Yes I do and you are wrong.

A CPU does _not_ execute assembly language. It executes machine instructions, which are encoded in binary.
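To make "encoded in binary" concrete, here is a small sketch that prints the bits of one machine instruction. It assumes a 32-bit x86 instruction, `mov eax, 42`, whose encoding is the opcode byte 0xB8 followed by the 32-bit value stored least significant byte first. The names `mov_eax_42` and `to_binary` are made up for this illustration.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>

// The five bytes that encode the x86 instruction "mov eax, 42":
// opcode 0xB8 (load a 32-bit immediate into EAX) followed by the
// immediate value 42 (0x2A), least significant byte first.
constexpr std::array<std::uint8_t, 5> mov_eax_42 = {0xB8, 0x2A, 0x00, 0x00, 0x00};

// Render a byte sequence as groups of 0s and 1s, one group per byte.
template <std::size_t N>
std::string to_binary(const std::array<std::uint8_t, N>& bytes) {
    std::string out;
    for (std::uint8_t b : bytes) {
        if (!out.empty()) out += ' ';
        out += std::bitset<8>(b).to_string();
    }
    return out;
}
```

Calling `to_binary(mov_eax_42)` gives `10111000 00101010 00000000 00000000 00000000`. That bit pattern, not the text `mov eax, 42`, is what the CPU fetches and executes.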

Consider a time when digital computers were new and language tools were not yet available. At such a time you had to program by writing binary machine instructions directly into memory. However, entering binary all the time, or some other numeric base equivalent, is tedious to say the least, as is the act of figuring out the numbers in the first place, since to do so you need a very good understanding of the way the instructions are encoded.

So people invented assembly language, which wraps up instructions in short mnemonics such as MOV or LD (move or load), ADD, JMP or BRA (jump or branch), CMP (compare) etc., and takes care of the tedious work of encoding the various forms of these basic instructions. Usually these variations have to do with the locations of the operands. For example, moving a value from one place to another could be from one register to another, from memory to register, immediate to register, memory indirect to register, register to memory and so on. Jumps (or branches) require the address of the destination of the jump to be specified, and this can usually be either relative or absolute. The fastest jump is often a short relative jump, so calculating the correct offsets can again be very tedious and is another chore which an assembler helps out with.
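As a rough illustration of the offset arithmetic an assembler does for you, the sketch below encodes an x86 short relative jump. It assumes the x86 short form (opcode 0xEB followed by a signed 8-bit offset measured from the end of the two-byte jump instruction). The function name `encode_short_jmp` and its interface are invented for this example.

```cpp
#include <array>
#include <cstdint>

// Encode an x86 short jump: opcode 0xEB followed by a signed 8-bit offset.
// The offset is relative to the address of the *next* instruction, that is,
// the address of the jump plus its own length of two bytes.
// Returns false (leaving 'out' untouched) if the target is out of range,
// in which case a longer jump form would be needed.
bool encode_short_jmp(std::uint32_t jump_address,
                      std::uint32_t target_address,
                      std::array<std::uint8_t, 2>& out) {
    const std::int64_t offset =
        static_cast<std::int64_t>(target_address) -
        (static_cast<std::int64_t>(jump_address) + 2);
    if (offset < -128 || offset > 127) {
        return false;
    }
    out[0] = 0xEB;
    out[1] = static_cast<std::uint8_t>(offset);  // two's complement byte
    return true;
}
```

For a jump located at address 0x100 with a target of 0x110 this yields the bytes EB 0E. A jump to itself yields EB FE, since the offset is -2. In assembly language you simply write a label and the assembler works this out.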

So an assembler for a CPU is just another form of translator. You write the assembly language in source files and run them through the assembler application, which produces an assembled machine code representation called object code.

As an example, the popular Intel x86 and its variations have at least two assembler syntax variations in use: the Intel variety and the AT&T variety. You might wonder why this is important. Well, Microsoft's and many other assemblers and compilers use the Intel version, while the GNU tools use the AT&T version (by default), and the two are quite different. (Note: you can often get assembler output from compilers and may be able to include inline assembler in certain language source code.)
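To give a feel for the difference, here is a tiny function with comments showing one instruction in both syntaxes. The file name `answer.cpp` is hypothetical, and the commands assume GCC on an x86 machine. The assembler output you actually get depends on the compiler version and options, so treat the instruction shown as an illustration of the two syntaxes, not as the exact output.

```cpp
// answer.cpp (hypothetical file name)
//
// With GCC on x86 the compiler's assembler output can be requested
// in either syntax:
//   g++ -S answer.cpp               AT&T syntax (the GNU default)
//   g++ -S -masm=intel answer.cpp   Intel syntax
//
// Loading the constant 42 into the EAX register is written as:
//   AT&T:   movl $42, %eax     (source first, destination last)
//   Intel:  mov  eax, 42       (destination first, source last)

int answer() {
    return 42;
}
```

Note the reversed operand order and the `$` and `%` prefixes in the AT&T form. These are the kind of differences that make code written for one assembler unusable with the other without conversion.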

For more on assembly language try http://en.wikipedia.org/wiki/Assembly_language for starters, and http://en.wikipedia.org/wiki/X86_assembly_language for x86 assembler (which appears to concentrate on the Intel variety). See http://en.wikipedia.org/wiki/GNU_Assembler for a little on the GNU assembler. For more on object code see http://en.wikipedia.org/wiki/Object_file and http://en.wikipedia.org/wiki/Linker on linkers.

OK so that is assembly code.

A C++ (or other language) compiler might produce assembler code. However the assembler will then need to be executed to assemble the assembly code into object code ready for linking. This might in fact happen under the covers when compiling - the assembler just gets invoked after the compiler and before the linker.

A C++ compiler might produce some other language code - most notably C code - which is then compiled (and assembled if the C compiler produces assembler) to produce object code which is then linked to form an executable. Early C++ compilers worked in this fashion and some still do today - the Comeau C++ compiler for example (see http://www.comeaucomputing.com/). Again, any such compiler would probably be invoked automatically after the C++ compilation stage.
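To picture what translating C++ into C involves, here is a rough sketch. It is not the output of Comeau or any real translator. It only shows the general idea: a class becomes a plain struct and member functions become ordinary functions that take an explicit pointer to the object. All the names (`Counter`, `Counter_c` and so on) are invented for this illustration.

```cpp
// C++ as the programmer writes it.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}
    void add(int n) { value_ += n; }
    int value() const { return value_; }
private:
    int value_;
};

// Roughly the shape of C code a C++-to-C translator could emit for it:
// the data members go into a struct, and each member function becomes
// a free function whose first parameter plays the role of 'this'.
struct Counter_c {
    int value_;
};

void Counter_c_ctor(Counter_c* self, int start) { self->value_ = start; }
void Counter_c_add(Counter_c* self, int n) { self->value_ += n; }
int Counter_c_value(const Counter_c* self) { return self->value_; }
```

Both versions behave the same way: constructing with 10 and adding 5 gives 15 from `value()` and from `Counter_c_value()`. A real translator also has to deal with overloading, inheritance, virtual functions, templates and exceptions, which is where most of the work lies.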

A C++ compiler might produce object code containing machine instructions native to the machine's CPU directly without recourse to other translation programs.

A C++ compiler might produce some intermediate machine language object code - as in the case of producing managed code from the Microsoft Visual C++ compiler to execute within the .NET environment. In this mode the Microsoft compiler produces object code containing CIL (Common Intermediate Language) - formerly MSIL (MicroSoft Intermediate Language) - see http://en.wikipedia.org/wiki/Common_Intermediate_Language.

A C++ compiler might produce object code in some foreign system format containing machine code for some other CPU. Such a compiler is called a cross compiler - not because it is annoyed <g>, but because it produces code that needs to be moved across to some other system to be executed (OK, so that is probably not the real reason; see the section on Canadian Cross in the Wikipedia article mentioned below). Such compilers are useful, for example, when developing for small devices that cannot support a development environment (e.g. a washing machine controller). See http://en.wikipedia.org/wiki/Cross_compiler for more on cross compilers.

So basically the process is:

- take a source file and translate it to produce an object file
- link object files and libraries to produce an executable

The first stage is repeated until all object files required are produced, whereupon they can all be linked to form an executable.

I have just used the term 'translate' in the first step. This can involve executing various pre-processors and pre-compilers, various main language compilers, and an assembler (I cannot see requiring more than one of these!). Hopefully most such stages of translation will be wrapped up in a single command, together with the linking stage for simple programs.
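As one concrete instance of these stages, the comments in the sketch below show how GCC lets you run each of them separately. The file names `square.cpp`, `main.o` and `program` are hypothetical, and other compilers use different options.

```cpp
// square.cpp (hypothetical file name)
//
// With GCC, each translation stage can be run on its own:
//   g++ -E square.cpp -o square.ii    preprocess only
//   g++ -S square.cpp -o square.s     compile to assembly language
//   g++ -c square.cpp -o square.o     compile and assemble to object code
//   g++ square.o main.o -o program    link object files into an executable
//
// main.o is assumed to come from another source file that defines main()
// and calls square(). For a simple program the single command
//   g++ square.cpp main.cpp -o program
// runs all of the stages, including the link.

int square(int x) {
    return x * x;
}
```

The object file `square.o` holds the machine code for `square` but is not runnable by itself. Only the link step, which combines it with the object file containing `main` and the required libraries, produces an executable.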

Libraries come in two main forms:

- Simple archives of object files (static libraries). In fact the UN*X/Linux utility for managing static object file libraries is called ar - short for archiver.

- Dynamic or shared object libraries which have to be linked like an executable. Such libraries can be loaded into memory at runtime (using operating system specific API functions), and may be shared between processes.
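As a sketch of loading a shared library at runtime, the example below uses the POSIX functions `dlopen`, `dlsym` and `dlclose`. It assumes a Linux system where the C maths library is available under the name `libm.so.6`. On other systems the library name differs, and Windows uses a different API altogether. The function name `call_cos_dynamically` is invented for this example, and on some older systems you may need to link with `-ldl`.

```cpp
#include <dlfcn.h>  // POSIX: dlopen, dlsym, dlclose

// Try to call cos(x) from the C maths library loaded at run time.
// Returns false if the library or the symbol cannot be found.
// "libm.so.6" is an assumption that holds on typical glibc-based Linux.
bool call_cos_dynamically(double x, double& result) {
    void* handle = dlopen("libm.so.6", RTLD_LAZY);
    if (!handle) {
        return false;
    }
    using cos_fn = double (*)(double);
    cos_fn fn = reinterpret_cast<cos_fn>(dlsym(handle, "cos"));
    const bool ok = (fn != nullptr);
    if (ok) {
        result = fn(x);
    }
    dlclose(handle);
    return ok;
}
```

The program containing this code does not need to be linked against the maths library when it is built. The library is located and mapped into the process only when `dlopen` is called, which is why the function has to cope with it being missing.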

See http://en.wikipedia.org/wiki/Library_(computing) for more information on libraries.  

Ralph McArdell

Expertise

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominately in C++ on MS Windows NT, 2000, XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.


©2016 About.com. All rights reserved.