C++/c++


Question
QUESTION: Could you please help me? I really have no idea how to start, and I hope you can guide me on this program. I would greatly appreciate any help. I am given a large text file, sample.txt, and I need to index the words stored in it: separate out the words and index them according to their frequency. The program shall count the number of unique words and store them in an appropriate Standard Template Library container.
The words are to be normalized to lower case so that I do not have to deal with case sensitivity. Thereafter, the program
shall generate an output file, index.txt, which consists of the least common 50% of the words in the following format:

example:
Total unique words: 4590
Word1,1
Word2,1
Word3,2

The first line will indicate the total number of unique words, followed by the list of least frequent words and their counts.
The program will ignore the following:
-Punctuation
-Numbers written as digits (1, 2, etc., but ‘one’, ‘two’ are to be treated as words)
-Common words (e.g. a, an, and, if, the, etc.) that are stored in a text file, common.txt.
I have to select the most appropriate Standard Template Library container to develop this application.

ANSWER: Hello Annabelle. Do you have a sister named Lisa? I'll help you, but I don't have much time at the moment. What STL containers do you know, and which one do YOU think would be best to store your index? Think about what kind of data you need to store. Tell me, in English, not in C, the steps your program needs to do. In other words, tell me in English what functions you think you need. Output to the screen is one such function.

Best regards.
Zlatko


---------- FOLLOW-UP ----------

QUESTION: No, I don't. This is my code so far; could you please help me? I would greatly appreciate it. What should I do to display the words in alphabetical order, all in lower case? I have removed the numbers, but I don't know why the output still shows an entry with a count of 1.

example text contains:
I have a dog. His name is lucky. He is only 2 years old.
He is very cute.

output:
Text file name : example.txt

Total unique words: 15 // wrong count
         1 // wrong
         He         2
         His         1
         I         1
         a         1
         cute         1
         dog         1
         have         1
         is         3
        lucky         1
         name         1
         old         1
         only         1
         very         1
        years         1

code:
#include <iostream>
#include <iomanip>
#include <sstream>
#include <fstream>
#include <map>
#include <cctype>
#include <string>


using namespace std;

map<string,int> freq;
map<string,int>::iterator i;
void removePunct(string&);
bool isLetter (char);



int main(int argc, char *argv[])
{
   string file, line, word;
   stringstream ss;

   cout << "Text file name : ";
   getline(cin, file);
   ifstream myfile(file.c_str());
   if (myfile.is_open())
   {
      while (getline(myfile, line))
      {
         ss.str(line);
         while (ss.good())
         {
            if (ss >> word)
            {
               removePunct(word);
               if ((i = freq.find(word)) == freq.end())
               {
                  freq.insert(pair<string,int>(word, 1));
               }
               else
               {
                  freq[word]++;
               }
            }
         }
         ss.clear();
      }
   }
   else
   {
      cout << "error opening " << file << endl;
   }
   cout << endl;

   cout << "Total unique words: " << freq.size() << endl;
   for (i = freq.begin(); i != freq.end(); ++i)
   {
      cout << setw(14) << i->first;
      cout << setw(10) << i->second << endl;
   }

   return 0;
}

  

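void removePunct(string &s)
{
   string::iterator i = s.begin();
   while (i != s.end())
   {
      // The <cctype> functions expect a value representable as unsigned char.
      unsigned char c = static_cast<unsigned char>(*i);
      if (ispunct(c) || isdigit(c))
      {
         // erase() invalidates i, so continue from the iterator it returns.
         i = s.erase(i);
      }
      else
      {
         ++i;
      }
   }
}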






ANSWER: Hello Annabelle

Your choice of a map for the STL container was excellent.

You need to add a function to convert a string to lowercase. You can use this:

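void lowerCase(string& s)
{
   string::iterator i = s.begin();
   while (i != s.end())
   {
       // tolower expects a value representable as unsigned char, so cast first.
       *i = tolower(static_cast<unsigned char>(*i));
       ++i;
   }
}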

The tolower function is a standard C function declared in <ctype.h>. In C++ the same function is available through <cctype>, which your code already includes.
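If you would rather not write the loop yourself, the same conversion can be sketched with std::transform. This is only an alternative, not something your assignment requires; the name lowerCaseTransform is made up for this example, and the lambda assumes a C++11 compiler.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical alternative to the hand-written loop: lower-case a string in place.
void lowerCaseTransform(std::string& s)
{
    // Taking the character as unsigned char keeps the argument to tolower
    // within the range the function is defined for.
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
}
```

Either version gives the same result for plain English text, so use whichever you find easier to read.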

After you remove the punctuation and digits from a word, you may have an empty string. If the string is empty, you should not add it to your map. The word processing code should look like this:

         if (ss >> word)
         {
            removePunct(word);
            if (!word.empty())
            {
               lowerCase(word);
               if ((i = freq.find(word)) == freq.end())
               {
                  freq.insert(pair<string,int>(word,1));
               }
               else
               {
                  freq[word]++;
               }
            }
         }
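To see the whole step in one place, here is a minimal sketch that reads words from any input stream, removes punctuation and digits, lower-cases what is left, skips empty strings, and counts the rest. The names normalize and countWords are made up for this example, the map is returned instead of being global, and the range-based for loop assumes C++11. It also relies on the fact that freq[word] creates a missing entry with a count of 0, so the find/insert pair is not strictly needed.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>

// Strip punctuation and digits, then lower-case what is left.
std::string normalize(const std::string& raw)
{
    std::string out;
    for (unsigned char c : raw)
    {
        if (!std::ispunct(c) && !std::isdigit(c))
            out += static_cast<char>(std::tolower(c));
    }
    return out;
}

// Count every non-empty normalized word read from the stream.
std::map<std::string, int> countWords(std::istream& in)
{
    std::map<std::string, int> freq;
    std::string word;
    while (in >> word)
    {
        std::string clean = normalize(word);
        if (!clean.empty())
            ++freq[clean];   // operator[] creates the entry with 0 if it is missing
    }
    return freq;
}
```

You could try it on your example text by wrapping the text in a std::istringstream and passing that to countWords. The "2" becomes an empty string and is skipped, and "He" is counted as "he".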



Good Work!

Best regards
Zlatko

---------- FOLLOW-UP ----------

QUESTION: Thanks for helping me. I have a problem with the part that says: "Thereafter, the program shall generate an output file, index.txt, which consists of 50% of the least common words."
The program will also have to ignore the following:
-Common words (e.g. a, an, and, if, the, etc.) that are stored in a text file, common.txt. How should I do this part? I really have no idea. I hope you can help me.

example common.txt contains:
a
all
also
an
and
are


ANSWER: Hello Annabelle

I think the common words are supposed to be ignored, just like numbers and punctuation. So, you need to read the common words file into an STL set. A set is like a map, but simpler: it stores only keys, with no value attached to each one. Then you need to check whether the word you want to put into the map is already in the set. If it is in the set, don't put it into the map. Set has a find method just like map. If set::find returns set::end, then the word is not in the set. Check the set after removing the punctuation and converting to lower case.
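The check described above might look something like this. It is only a sketch: the function name addIfNotCommon is made up, and the set and map are passed in as parameters here instead of being global so that the example stands on its own.

```cpp
#include <map>
#include <set>
#include <string>

// Add a word to the frequency map unless it is in the common-word set.
// The word is assumed to be already stripped of punctuation and lower-cased.
void addIfNotCommon(const std::string& word,
                    const std::set<std::string>& common,
                    std::map<std::string, int>& freq)
{
    if (word.empty())
        return;                              // nothing left after cleaning
    if (common.find(word) == common.end())   // not found, so not a common word
        ++freq[word];
}
```

With "a" and "is" in the set, calling this for "dog", "a", "dog" would leave the map holding only "dog" with a count of 2.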

You already know how to open a file for reading. You can make the set global, just like your map, and initialize it in main before working on your map.

Don't forget to change all the words in common.txt into lowercase before putting them into the set. Your instructor may try to trick you there.
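Loading the set could be sketched as follows. The name loadCommonWords is made up for this example, and it assumes common.txt holds words separated by whitespace, as in your sample. It takes any input stream, so you can pass it an ifstream opened on common.txt.

```cpp
#include <cctype>
#include <istream>
#include <set>
#include <sstream>   // std::istringstream, handy for trying this without a file
#include <string>

// Read whitespace-separated words from the stream into a set,
// lower-casing each one so it matches the normalized words in the map.
std::set<std::string> loadCommonWords(std::istream& in)
{
    std::set<std::string> common;
    std::string word;
    while (in >> word)
    {
        for (std::string::size_type k = 0; k < word.size(); ++k)
            word[k] = static_cast<char>(
                std::tolower(static_cast<unsigned char>(word[k])));
        common.insert(word);
    }
    return common;
}
```

In main you would open the file the same way you open your text file and call loadCommonWords on it before you start filling the map.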

Bye for now.

©2016 About.com. All rights reserved.