I was wondering if you could help me learn about files. Before I write a program I need a little bit of knowledge about them.
I have a fact wrong somewhere and maybe you can help.
1) Everything on a computer is a 0 or a 1.
2) Therefore a file (name.extension) is made from 0's and 1's.
3) Any program can open any file; however, when processing a file, if data that is expected to be found isn't there, there will be an error (e.g. opening a.txt in a media player).

I believe all of these to be true. However, I right-click on picture.jpg -> Open with -> notepad.exe.
I save the file as image.jpg. However, the saved file doesn't open.

My logic is: the binary is read by Notepad and converted to characters. Notepad then saves this, which converts the characters back to binary, so my image viewer should be able to read it.

Yes, it's a strange question. The ultimate reason is to understand these files so I can use a programming language to do things like:
copy a file
invert the binary of a file
convert several images/sounds/programs into one large file which can be separated again.

Mostly for educational purposes, but also to send batches of pictures without zipping them (I have the most computer-illiterate friends).

You can see my original plan easily: start a text document with something like @#@#this is image 1, then have the next lines contain image 1's data.
Then use a program to read lines until it finds @#@# (not sure about this, as a converted line may contain it), then save the lines it read to a file.


First, this question has very little, if anything, to do with C++ and programming, so it is strictly outside the area I answer questions on. The following may therefore not be what you are after.

Your assumption is that reading and writing a binary data file, such as a JPEG image file, using a utility for processing text, such as Notepad, is a straight-through operation that causes no change in the data.

This may not be true (and, from what you report in your question, appears not to be true). One way this could happen is if Notepad uses the C (or C++) file handling routines. On Microsoft operating systems these routines handle text differently from binary data, and you specify which when you open the file for processing. In text mode the character ctrl-Z is taken to mean end of file, and <cr><lf> sequences are converted to <lf> on input, then converted back to <cr><lf> on output.

This latter behaviour should have no net effect in most cases when you read and write the file. However, the ctrl-Z effect is more obviously serious, as binary data handled in this way gets truncated at the first byte having a value of 26 (ASCII ctrl-Z). Also, MS text files should never have a <cr> or <lf> character on its own, as they should always come in <cr><lf> pairs, so this could be another source of corruption. Further, Notepad may well wrap long lines, inserting line breaks and so changing the data.
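
To make the difference concrete, here is a rough Python sketch (Python is my choice for illustration, nothing in the question requires it). It copies the same made-up bytes twice: once in binary mode, and once through a text-mode round trip that translates line endings. The function names, file names and sample bytes are all invented for the example, and copy_as_text is only a stand-in for what a text editor might do, not a description of how Notepad actually works. It shows the line-ending effect only; the ctrl-Z truncation is not reproduced here.

```python
import os
import tempfile

def copy_binary(src, dst):
    """Copy a file byte for byte; "rb"/"wb" switch off all text translation."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read())

def copy_as_text(src, dst):
    """Hypothetical text-mode round trip, standing in for a text editor."""
    # latin-1 maps every byte to exactly one character, so any change in
    # the output comes from newline handling alone.
    with open(src, "r", encoding="latin-1") as fin:
        text = fin.read()      # \r\n and a lone \r are both read as \n
    with open(dst, "w", encoding="latin-1", newline="\r\n") as fout:
        fout.write(text)       # every \n is written back out as \r\n

# Made-up sample bytes (not a real JPEG): one \r\n pair and one lone \n.
original = b"\xff\xd8\r\n\x00\n\x1a\xff\xd9"

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "picture.bin")
with open(src, "wb") as f:
    f.write(original)

copy_binary(src, os.path.join(workdir, "copy.bin"))
copy_as_text(src, os.path.join(workdir, "text.bin"))

with open(os.path.join(workdir, "copy.bin"), "rb") as f:
    binary_result = f.read()
with open(os.path.join(workdir, "text.bin"), "rb") as f:
    text_result = f.read()

print(binary_result == original)   # True: binary copy is exact
print(text_result == original)     # False: the lone \n grew into \r\n
```

The binary copy is also all you need for the "copy a file" item on your list: open both files in binary mode and write out exactly what you read.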

In general any text character sequence you use as delimiters around binary data can potentially occur in that binary data. The usual solution is to convert (encode) each byte of binary data to a text format using only safe characters - letters, numerals etc. Any such encoding of course needs the reverse to be able to convert these values back to raw bytes again - the decoding process. So for example the raw byte sequence:

       00 01 02 fe ed de ca

(as hexadecimal digits) could be converted to the character equivalent for each digit - effectively doubling (or worse if you are using a character encoding larger than a byte) the length of the data:

      '0' '0' '0' '1' '0' '2' 'f' 'e' 'e' 'd' 'd' 'e' 'c' 'a'

The quotes indicate that I mean a character and not a raw value. In ASCII the above characters are represented as the (hexadecimal) values:

       30 30 30 31 30 32 66 65 65 64 64 65 63 61
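
As a small sketch of this hexadecimal encoding and its reverse, the following Python uses the built-in bytes.hex and bytes.fromhex methods. The variable names are just for illustration.

```python
# The raw byte sequence from the example.
raw = bytes([0x00, 0x01, 0x02, 0xFE, 0xED, 0xDE, 0xCA])

# Encode: each byte becomes two hexadecimal digit characters.
text = raw.hex()
print(text)                        # 000102feeddeca

# The ASCII codes of those characters, shown in hexadecimal.
ascii_codes = " ".join(f"{b:02x}" for b in text.encode("ascii"))
print(ascii_codes)                 # 30 30 30 31 30 32 66 65 65 64 64 65 63 61

# Decode: turn the digit characters back into the raw bytes.
decoded = bytes.fromhex(text)
print(decoded == raw)              # True
print(len(raw), len(text))         # 7 14
```

The encoded text is twice the length of the raw data, which is the cost of using only "safe" characters in this scheme.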

A more efficient encoding would be Base64 encoding (see http://www.faqs.org/rfcs/rfc3548.html for example), which produces four text characters for every three bytes of data rather than two characters per byte. This is exactly the sort of encoding used in email MIME attachments - which would be a possible basis for your file format.
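
To show how this could fit the plan in the question, here is a minimal Python sketch of a text bundle that uses the "@#@#" marker lines with Base64 data in between. The pack and unpack functions, the 76-character line width and the file names are my own assumptions for illustration, not any standard format. Because '@' and '#' are not part of the Base64 alphabet, an encoded data line can never be mistaken for a marker line, even when the raw bytes themselves contain "@#@#".

```python
import base64

MARKER = "@#@#"    # marker from the question; not in the Base64 alphabet
LINE_WIDTH = 76    # arbitrary wrap width chosen for this sketch

def pack(files):
    """Combine a dict of name -> bytes into one text document."""
    lines = []
    for name, data in files.items():
        lines.append(MARKER + name)
        encoded = base64.b64encode(data).decode("ascii")
        for i in range(0, len(encoded), LINE_WIDTH):
            lines.append(encoded[i:i + LINE_WIDTH])
    return "\n".join(lines) + "\n"

def unpack(text):
    """Split a document made by pack() back into a dict of name -> bytes."""
    files = {}
    name = None
    chunks = []
    for line in text.splitlines():
        if line.startswith(MARKER):
            if name is not None:
                files[name] = base64.b64decode("".join(chunks))
            name = line[len(MARKER):]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        files[name] = base64.b64decode("".join(chunks))
    return files

# Made-up contents; the first deliberately contains the marker bytes.
files = {
    "image1.jpg": b"\xff\xd8@#@#\x00\x1a\r\n",
    "sound.wav": bytes(range(256)),
}
bundle = pack(files)
restored = unpack(bundle)
print(restored == files)   # True
```

A sketch like this assumes file names contain no line breaks, and it does nothing about errors or damaged input, which a real tool would have to handle.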

Note that each hexadecimal digit can be represented as a 4-bit binary value between 0000 (0) and 1111 (f), so two hexadecimal digits describe one 8-bit byte.
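
A short Python sketch of that relationship, which also covers the "invert the binary of a file" idea from the question: it prints bytes as bits and flips every bit by XOR-ing each byte with ff. The helper names and sample bytes are invented for the example, and it works on bytes in memory; for a real file you would read the bytes in binary mode first.

```python
def hex_digit_to_bits(digit):
    """Show one hexadecimal digit as its 4-bit binary value."""
    return f"{int(digit, 16):04b}"

def to_bits(data):
    """Show each byte as 8 binary digits."""
    return " ".join(f"{b:08b}" for b in data)

def invert(data):
    """Flip every bit of every byte (XOR with 0xFF)."""
    return bytes(b ^ 0xFF for b in data)

print(hex_digit_to_bits("0"))   # 0000
print(hex_digit_to_bits("f"))   # 1111

raw = bytes([0x00, 0x01, 0xFE])
inverted = invert(raw)
print(to_bits(raw))             # 00000000 00000001 11111110
print(to_bits(inverted))        # 11111111 11111110 00000001
print(invert(inverted) == raw)  # True: inverting twice restores the data
```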

If you and your friends are using Windows XP then it handles ZIP files directly in Windows Explorer anyway.

If not, then a little tutoring from you might be the easiest way forward, as anything you produce will be non-standard and so you will have to support it all by yourself - tools, viewers etc. You might just as well use standard formats that are well supported and teach your friends to use them.

As mentioned earlier, emailing them with attachments is probably one of the easiest solutions, if this is the sort of way you were thinking of distributing the data.

Remember KISS - Keep It Simple, Silly!

Hope this helps a bit.  


Ralph McArdell


I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.


My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominately in C++ on MS Windows NT, 2000, XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.


©2016 About.com. All rights reserved.