
C++/Count the number of occurrences of a word in string



Question
Hi, I'm trying to learn C++. I'm a beginner. I have written some code to count the number of occurrences of a letter in a string, but I'm not sure how to count the number of occurrences of a word in a string.

For example, this is my string:
"Hello there, take care and be happy and smile"

I use this code to count the letter 't', and the output is 2.

int countVowels(char str[])
{
    int count = 0, k = 0;
    while (str[k] != '\0')
    {
        if (str[k] == 't')
            count++;
        k++;
    }
    return count;
}

How do I count how many times 'and' appears in the string? I used the same code and changed the 't' to 'and', but the output is 0.

Thank you in advance!

Answer
Well, I would say for a start that you are learning C rather than C++, or at least those parts of C++ inherited from C.

Secondly, if you are trying to count vowels, why are you in fact counting lowercase 't'? Maybe the function should be called countLowerTees or something?

Now counting words can be tricky - I mean how do you define a word?

For example:

Count the occurrences of "and" in:

"And123and567and"

Should the result be 0, 2 or 3?

How about:

"hand and band"

Or:

"There were blue and green and, to make it hard we used punctuation"

So first decide how you wish to break your string into words, break it into words, then compare each word against your required word, for which you can use the C (and C++) standard library function strcmp. Here is a code snippet modified from your original posting:

    char * pWord = GetWord( str );
    while ( pWord != NULL )
    {
        if ( 0==strcmp( pWord, "and") )
        {
            ++count;
        }
        pWord = GetWord( NULL );
    }

strcmp returns 0 if both strings match (meaning no difference). It returns a value < 0 if the first string is lexicographically less than the second, and a value > 0 if the first string is lexicographically greater than the second.

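Now there is a standard library function that can do the job of GetWord in the above example: it is called strtok, and it splits a string into tokens (like words) separated by any of a set of characters provided as its second argument. On the first call you pass the string to be split; on each later call you pass NULL, and strtok carries on from where it left off in that same string, returning NULL when no tokens remain. So we could write GetWord like so: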

       char * GetWord( char * str )
       {
         return strtok( str, " ,;:!.?" );
       }

Notice that str is passed as a non-const pointer. This indicates that strtok (and therefore GetWord) will modify the passed-in string, so it must be a modifiable character array, not a string literal. In fact strtok writes '\0' characters into the string to terminate each word at the right point.

I specified a likely selection of characters that may follow the end of a word, such as space, comma and full stop, as the second argument: the set of word-ending (or delimiting) characters.

Now these string functions are declared in a standard header file, called string.h for C or cstring (no .h) for C++. So you have to include it in your program:

       #include <cstring>

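In standard C++, including <cstring> declares the C library functions in the namespace std (where most of the C++-only standard library names also live), and in practice most implementations make them available in the global namespace as well, so both std::strcmp and plain strcmp usually work. This will probably not mean much to you yet but may be helpful later!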

Now the next question is: why are you using C-style strings, meaning zero-terminated character arrays? The C++ standard library includes a string class, std::string, which relieves you of many tedious details, although it offers nothing quite like strtok; ho hum, such is life.

Here is a simple example code fragment using the C++ string class:

    #include <string> // C++ library include for std::string

    // ...

    std::string str
        ("Hello there, take care and be happy and smile");
    std::string::size_type and_pos( 0 );
    int count(0);
    while ( and_pos!=std::string::npos )
    {
        and_pos = str.find("and", and_pos );
        if ( and_pos != std::string::npos )
        {
            ++count;
            and_pos += 3; // start next search after this "and"
        }
    }

Here I have hard-coded "and" and its length of 3, which is not a good idea. However, we could write a function like so:

int WordOccurrenceCount
( std::string const & str, std::string const & word )
{
    // An empty word would match at every position without ever
    // advancing the search, so treat it as having no occurrences.
    if ( word.empty() )
    {
        return 0;
    }

    int count(0);
    std::string::size_type word_pos( 0 );
    while ( word_pos!=std::string::npos )
    {
        word_pos = str.find(word, word_pos );
        if ( word_pos != std::string::npos )
        {
            ++count;

            // start next search after this word
            word_pos += word.length();
        }
    }

    return count;
}

Note that this is fairly easy to at least partially understand, with operations on the string objects such as find and length. However, this simple example does lose the ability to tell "and" as a whole word from "and" as part of another word, as in "hand"...

You will have noticed several things about my style of writing C++: initialising objects using initialiser syntax (i.e. int a(1); instead of int a = 1;), preferring pre-increment (++a) and pre-decrement to post-increment (a++) and post-decrement, and of course the use of fully qualified names, that is, names that use :: to spell out the namespaces and classes they belong to. Namespaces are created with the namespace keyword, and defining a class creates a similar named scope, so the C++ string class is a name in the std namespace, and it in turn contains the names npos and size_type, hence std::string::npos and std::string::size_type.

In fact the standard C++ library contains a great many useful things, such as collection classes for various kinds of collections like lists and vectors, algorithms such as sort or find to perform operations on those collections, and input/output streams to read from and write to files, consoles and other things such as strings. I suggest you get acquainted with it as well as with the core language. You might try Accelerated C++ by Koenig and Moo; however, I have heard that this book is better for people who already have some programming experience, so you may prefer a new book called "You Can Do It - A Beginner's Introduction to Computer Programming" by Francis Glassborow and Roberta Allen.

As to other sources of information, a good place to start is the ACCU site at http://www.accu.org. They have a book reviews section and resource links, as well as mentored development areas for members.

You might like to look at the C++ links at:

http://www.fz-juelich.de/zam/cxx/extern.html#ezine

However, I cannot vouch for the quality of everything you find there. Some of it I do know, such as "The On-Line C++ FAQ" by Marshall Cline on the parashift site, which is indeed the first place you are directed to before posting a question on the comp.lang.c++.moderated newsgroup.



Answers by Expert:




Ralph McArdell

Expertise

I am a software developer with more than 15 years of C++ experience and over 25 years of experience developing a wide variety of applications for Windows NT/2000/XP, UNIX, Linux and other platforms. I can help with basic to advanced C++, C (although I do not write just-C much if at all these days, so maybe ask in the C section about purely C matters), software development and many platform-specific and system development problems.

Experience

My career started in the mid-1980s working as a batch process operator for the now-defunct Inner London Education Authority, working on Prime mini computers. I then moved into the role of Programmer / Analyst, also on the Primes, then into technical support and finally into the micro computing section, using a variety of 16 and 8 bit machines. Following the demise of the ILEA I worked for a small company, now gone, called Hodos. I worked on a part task train simulator using C and the Intel DVI (Digital Video Interactive) - the hardware based predecessor to Indeo. Other projects included a CGI based train simulator (different goals to the first), and various other projects in C and Visual Basic (er, version 1 that is). When Hodos went into receivership I went freelance and finally managed to start working in C++. I initially had contracts working on train simulators (surprise) and multimedia - I worked on many of the Dorling Kindersley CD-ROM titles and wrote the screensaver games for the Wallace and Gromit Cracking Animator CD. My more recent contracts have been more traditionally IT based, working predominantly in C++ on MS Windows NT, 2000, XP, Linux and UN*X. These projects have had wide ranging additional skill sets including system analysis and design, databases and SQL in various guises, C#, client server and remoting, cross porting applications between platforms and various client development processes. I have an interest in the development of the C++ core language and libraries and try to keep up with at least some of the papers on the ISO C++ Standard Committee site at http://www.open-std.org/jtc1/sc22/wg21/.

Education/Credentials

©2016 About.com. All rights reserved.