How do I find the number of objects present in a file, if the file contains a collection of items/objects of unequal length?

That really depends on the structure of the file.

Here are some obvious ways you could achieve this. I do not claim this is an exhaustive list:

1/ Most obviously, you read the whole file to determine how many objects are in it. Note that you do not have to hold all, or even any, actual objects in memory, depending on how the file format and the code that creates in-memory objects interact. At most you should need to create one object at a time of the type currently being read from the file; once read, it can be deleted so long as it is not required for any other purpose.

2/ If the file contained a linked list of file position offsets from object to object you could follow the links and keep a count.

3/ If the file contained an index of (fixed size) file position offsets you could use the size of this index to determine the number of objects in the file. This is a file equivalent of having a vector of pointers to objects.

4/ The file could contain a count of the number of objects it contains.

5/ The file could consist of fixed length and variable length areas (or you could use two or more files). Variable length data primarily consists of string data, but could include media data such as sound, audio or video, for example. You can then deduce a count from the size of the fixed length area. This could be extended to using multiple fixed length areas, each having its own size specified somewhere in the file. The count of all objects is then computed from the sum of the counts for each fixed sized area.

You can of course mix and match. For example, you could combine 2, 3, 4 and 5 to a) store the count anyway, and b) have an index, or rather a table of contents (TOC), that specifies the fixed and variable data areas. Each area could use a linked list of pointers to allow the areas to be split (which makes it easy to extend the data file). One possible way this might look is as follows:

TOC chunk:
[Number of Objects]
[Number of TOC entries in this chunk]
[Offset to further TOC chunks or 0 if no more]
[Size of Object fixed data or 0 for variable data chunk]
[Offset to data]
[Size of Object fixed data or 0 for variable data chunk]
[Offset to data]

Fixed length data chunk:
[Size of fixed length data chunk]
[Offset to next data chunk for same fixed data size or 0 if no more]
[Object Data]
[Object Data]

Variable length data chunk:
[Size of variable length data chunk]
[Offset to next variable length data chunk or 0 if no more]
[Variable length data entity size]
[Variable length entity data]
[Variable length data entity size]
[Variable length entity data]

Of course this has some problems. For a start you might not know how many TOC entries you require to start with, and similarly for other chunk types. You could of course add the housekeeping information to the end of the file after all other processing has been done, which I believe is the case with some formats.

Building the file is also a tricky thing to do. To build a whole new file from scratch there would be one chunk for each fixed data size and one variable sized data chunk. You might wish to build these up in separate temporary files and then append them together at the end, remembering to go back and write the correct offset values into the TOC.

If you need to extend the data you could either build the file from scratch again or just append new data and TOC chunks to the end of the file and go back and fix up the existing 0 'next' offsets to point to the new chained chunks.

Consider how to modify this scheme to allow for spare space in the data chunks so that you do not have to chain a new chunk to the file for each object or new fixed sized data value.

Consider if using the fixed size of an object is appropriate on its own. What happens if two object types have the same size?

Consider what sort of operations could be performed to check for consistency and optimise such a file.

If you think this is similar to the problems of memory management and allocation then you would be correct. Both a disk file and memory can be considered as a vector of binary words. Disk words are nearly always 8-bit bytes these days, which is also the smallest entity directly addressable by many modern processors. Both can be randomly accessed. The differences are that disk storage is much, much slower than main memory, and that disk storage is persistent while current main memory technology is not.

I could also mention that you have to use a special API to access and manipulate data in a file (e.g. C++ IOStreams, std::fstream etc.), whereas you use memory simply by creating objects in your programs. However, even this difference can be largely removed on systems that allow files to be mapped into memory, allowing the file to be accessed as if it were memory (e.g. as an array of unsigned char). Win32 based systems, UNIX and Linux systems are examples of systems that support such a feature.

Hope this has given you a solution or at least some ideas towards a solution.

One final point: the more complex schemes require that you use the file in binary mode, not the default text mode. This makes little difference on a UNIX or Linux system, but it does on a Microsoft based system, where writing the byte 10 (i.e. ctrl-J or \n) in text mode actually writes the two bytes 13 10 (CR LF or \r\n), and reading 13 10 is translated back into just 10 (LF or \n). Such byte values can occur as part of a larger entity, such as a 4-byte offset value that just happens to contain them. (I assume 8-bit characters based on the ASCII character set.) Effectively this makes such supposedly fixed sized values variably sized and could therefore mess up number, size and offset calculations.

To open a std::fstream or similar in binary mode, specify std::ios::binary as one of the open mode flags, either in the object's constructor or in the open member function.


Answered by: Ralph McArdell