C/C++ needs header files
Expert: Ralph McArdell - 9/25/2007
Question
Hi,
I am trying to understand why C/C++ needs header files but Java doesn't. Do you know anything about this, by any chance?
Thanks,
lzzzz
Answer
I have a bit of knowledge, yes, although I am in no way a Java expert, so I lack some of the finer details. A Java expert could obviously give a much better, and possibly more accurate, explanation of the Java side of things, as I have not looked at Java recently.
Some short answers are:
- Because C++ is C++ and Java is Java. They are different beasties.
- History. (C++)
- Available resources and environment.
Java and other similar technologies (e.g. .NET) do not require header files because all the information they need about other shared entities is pulled in from those entities' compiled modules. The Java compiler writes the output from each compiled Java source file into Java class files. Each class file is named after the class it defines, with the suffix .class, and these files tend to be stored in a directory structure that corresponds to the package the class is part of. Thus the class whose fully qualified name is dev.Lzzzz.SomeClass would typically be placed in a file with a path like dev/Lzzzz/SomeClass.class.
The Java compiler can locate these class files as needed and extract the declarations and definitions it requires from them. Remember that Java does not compile to straight native object code: the class file structure contains enough information for the Java compiler to extract what it needs.
Later the Java interpreter or JIT (just in time) compiler executes your class code and loads additional class files as required.
C++, however, inherited the C mechanism which, like many other languages of the time, relied on textual inclusion of shared code. Each source file, together with everything it includes, forms a single one-off compilation unit that is compiled as a whole into a separate intermediate native object code file; these object files are then linked together by a separate linker/loader tool to produce a native executable.
There were very good reasons for this in the early days of C++ (well, C with classes originally; the name C++ came a bit later). C was already established, and the new language did not want to introduce a radically different way of doing things, partly because that would have been more work and partly because keeping to the familiar way made the new language easier for people to implement and use, which helped it gain popularity and acceptance.
C++'s compatibility with C goes quite deep, as I am sure you know. However, it is a two-edged sword: on the one hand it has helped C++ gain great acceptance; on the other hand, C++ has many nasty little corners and wrinkles that are a direct result of its compatibility with C. The constraint of the pre-process -> compile -> link build tool chain is one of them (in fact this is a simplification, but it will do for our purposes).
Note that including the initial pre-process stage above is important, as it is this stage that processes all the pre-processor directives: all those lines that start with #, such as #define, #pragma, #ifdef and, of course, #include.
So by the time the compiler proper sees what it is compiling, it has one big file with the text of all the included files contained in it. OK, so maybe not all C++ compilers work in exactly this way, but they should behave as if they do.
The point is that the compiler (generally) does not know if your source file contained a declaration or definition or if it pulled it in from another file using a #include directive.
The main reason for using header files in C and C++ (and other languages that use similar mechanisms) is to ensure consistency. Header files even offer a degree of abstraction, in that I can use the definitions and declarations from an included header file without needing to know the gory details!
If you do not see the argument for consistency, then consider the following:
Suppose I define a type alias, AnIntType, for short int, and a function GetTheValue that returns an AnIntType object. Another source file also defines the type alias and declares (but does not define) GetTheValue because it wishes to call it. Now what happens if I redefine the alias in my module so that AnIntType aliases int, and change GetTheValue so that it requires a parameter? To start with, we have:
In MyModule.cpp:
typedef short AnIntType; // master type alias
AnIntType GetTheValue() // function definition
{
    // ...
}
In OtherModule.cpp, which calls GetTheValue:
typedef short AnIntType; // copy of type alias
AnIntType GetTheValue(); // function declaration
void UsingFunction()
{
    AnIntType result( GetTheValue() ); // call GetTheValue
    // ...
}
Both files compile, and the linker resolves the call to GetTheValue from OtherModule.obj (or OtherModule.o) to the GetTheValue defined in MyModule.obj (or MyModule.o).
Next we modify the original definitions in MyModule.cpp:
typedef int AnIntType; // master type alias
AnIntType GetTheValue( AnIntType scale ) // function definition
{
    // ...
}
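Now the declarations in OtherModule.cpp are wrong, but we will only find this out at link time, often with less informative error messages: OtherModule.cpp is still internally consistent, so it compiles, but it is no longer consistent with the actual definition of GetTheValue. (Even this link-time check relies on C++ compilers encoding a function's parameter types in the symbol name the linker sees; C compilers do not, so in C such a mismatch could go unnoticed until runtime.)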
If MyModule.cpp used a header file, MyModule.h, that is included by users of its facilities and kept consistent with MyModule.cpp, then OtherModule.cpp, having included MyModule.h, would firstly have picked up the changes to the AnIntType alias and the GetTheValue function declaration automatically, and secondly would not compile until its code was updated to match. It is always better to catch errors early: at compile time rather than link time, and at link time rather than runtime. I suppose catching errors before compiling, or not putting them in in the first place, would be better still <g>!
The point to take away here is that you _can_ do without header files in C and C++, but not for any reasonable length of time or size of project, and that Java and other platforms and languages use different mechanisms that share code in a more robust manner. I should say that this is true of most C++ compilers in use today: a really clever compiler could look across compilation units, but most (all those I have used) do not.
I consider the textual inclusion of code, rather than proper module support, a weak point of both C and C++. You might be interested to know that this lack of proper module support in C++ has not gone unnoticed: people are working on proposals for modules in C++; see:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2006/n2073.pdf
A related subject, oddly, is the work on dynamic libraries, which in fact looks similar to modules but at the link stage rather than the compile stage; see:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2003/n1496.html
Sadly, neither of these proposals is mature enough to make it into the next version of the C++ standard. Ho hum, one day...
If you are interested in how C++ came to be how it is then I can strongly recommend "The Design and Evolution of C++" by Bjarne Stroustrup - the inventor of C++. See particularly the section on the linkage model (2.5, 2.5.1). I found the book a good read.