You are here:

C++/C++ vectors (incrementing an element)

Advertisement


Question
after i read each character from an input stream (a text file) one by one how do I count the frequency of each character? I assume it is by incrementing the frequency for the appropriate element in a vector, like if there is the letter 'a' in the file then Vector[97] = incremented by one?

Vector[97] would be the ascii value for a...
Vector[98] would be the ascii value for b..  for example

i am stumped on this one..thanks

Answer
I am not sure why you say you are stumped as you have the basic idea totally correct!

The only thing you seem not to have not realised is that you can use the character value read as a number.

When you say:

   Vector[97] would be the ascii value for a...

You can then take it one stage further and say:

   Vector['a']

Character literals such as 'a' translate to the number representing that character in the character set in use for the compiler/platform in question. Hence we can use them in places an integer literal or constant can be used, as in the index usage above. Other obvious examples are:

For case values in switch statements:

   switch (command_char)
   {
   case 'h':    // Help requested
   case 'H':
   case '?':
       DoHelp();
       break;

   case 'q':    // Quit requested
   case 'Q':
       return;

   // ...

   }

And as conditions in if and loops:

   if ( 'q' == the_char || 'Q' == the_char )
   {
   // ...
   }

   do
   {
   // ...
   }
   while ( 'q' != the_char && 'Q' != the_char )


The char type is nothing more than a small integer type, typically of 8-bit byte size. Unlike other C and C++ integer types it has three variations: char, signed char and unsigned char. This is because, unlike say int which is shorthand for signed int, char may be signed or unsigned, depending on the compiler and quite possibly the compiler switches in effect.

Thus as char is a small integer type variables of char type can be used where integer variables are allowed, for example:

   Vector[read_character]

Where read_character is a variable of char type that contains the character read from the input stream.

[Note: The code fragments I show are for exposition only. The code has not been tested or even compiled so may contains typos or other errors. If this is the case then I apologise!].

Oh, strictly speaking assuming that the character set in use is ASCII or based on ASCII may be a dangerous assumption. Not all systems use such a character set, IBM systems traditionally used EBCDIC for example (see http://en.wikipedia.org/wiki/EBCDIC - and note the Criticism and Humour section of the article).

I assume that now you have the method to access the frequency element for each character you are able to increment that value using the C/C++ increment operator ++.

Remember that for C++ pre-increment (++value) is preferred over post-increment (value++) in situations where either could be used. Which form is used is not a concern for built in types as used in this case. However C++ allows increment operators to be defined for user defined classes. In such cases post-increment operations are typically defined in terms of pre-increment operations and are therefore less efficient and so should only be used where really necessary. Hence it is a good idea to get into the habit of using pre-increment where either pre- or post-increment could be used. The same goes for decrement operators by the way.

Finally, although not really part of your question here are some things to consider:

While using a vector is OK for counting the frequency of letters in English/ASCII or similar western scripts, where a single-byte character code is used and thus there are only 256 possibilities (128 if we stick to ASCII, which is a 7-bit encoding, and even less if we exclude control codes) such a method may not be so feasible for larger character sets. The UNICODE character encoding for example (see http://unicode.org/) currently allows the encoding of characters in around 21 bits (so on typical operating system platforms only 32-bit or wider data types can represent any UNICODE character in a single value – the UNICODE UTF-32 encoding). However not all this possible space is used (apparently only around 99100 characters are currently assigned codes). To count the frequency of text that could contain any current UNICODE character would thus require a larger vector, much of which would probably not be used in many cases. If we wished to cater for any future possible UNICODE character we would require an even larger vector.

Alternative UNICODE encodings using 16-bit (UTF-16) or 8-bit (UTF-8) values require multiple values for a single character for some characters. Such character encoding schemes using 8-bit bytes (UNICODE’s UTF-8 byte encoding is by no way the only such encoding) are called multibyte character sets. Handling such multibyte (or multi word) character data is obviously more complex than those that always encode one character in one value, and thus would require a modified scheme to the nice simple one we have here!

C++ has support for wide characters (i.e. more bits than a char) in the wchar_t type. Unlike other integer types there is only one wchar_t type and it is commonly a 16 or 32 bit integer type and may be signed or unsigned. For example Microsoft Windows uses 16-bit unsigned values, thus the Microsoft Visual C++ compiler has an unsigned 16-bit wchar_t type. On the other hand the GNU g++ compiler on my 64-bit Ubuntu Linux system has a signed 32-bit wchar_t type. In addition to the wchar_t type C++ also has wide strings (std::wstring) and wide streams (std::wistream, std::wostream etc.) that read from / write to narrow (char) character data externally and convert the values to/from wide (wchar_t) character data internally. The standard C++ library locales have facet types to convert between narrow and wide character encodings. C++ inherits wide versions of various C-string and character functions from C such as wcslen and wcscpy (include <cwchar>). Note that in C wchar_t is a type alias (typedef) to one of the other integer types e.g.

   typedef usigned short  wchar_t;

However in C++ wchar_t is a proper built in type distinct from other integer types.

Hope this helps and gives you some food for thought.  

C++

All Answers


Answers by Expert:


Ask Experts

Volunteer


Ralph McArdell

Expertise

I am a software developer with more than 15 years C++ experience and over 25 years experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days so maybe ask in the C section about purely C matters), software development and many platform specific and system development problems.

Experience

My career started in the mid 1980s working as a batch process operator for the now defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominately in C++ on MS Windows NT, 2000. XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.

Education/Credentials

©2016 About.com. All rights reserved.