C++ dictionary implementation
I have to code a dictionary in C++ that behaves like this: it reads a text (from a file), gives each word an index (based on the word's frequency), selects the n most frequent words, and writes the text again (to a file) with all other words replaced by the special <UNK> (unknown) token.
Would you guide me on how to write this in C++? I know the algorithm, but I am not that familiar with the syntax.
I don't know what your algorithm is, but I would do it like this.
Create a structure to hold your word and its count, like this:
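struct WordEntry {  // WordEntry is an illustrative name; needs #include <string>
    std::string word; int count;
    WordEntry(std::string text) { word = text; count = 1; }
};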
The count is set to 1 when the structure is first created with some text.
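Hold the words in a vector called wordList. Declare it like this (WordEntry stands for whatever you name the structure above; the name is only illustrative):
std::vector<WordEntry> wordList;  // needs #include <vector>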
A vector is just like an array, except that it can grow to accept more elements. You add things to a vector using its push_back method. I will show you that later.
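A minimal sketch of declaring the vector and adding to it with push_back, assuming the structure is called WordEntry and using a hypothetical helper named addNewWord (both names are illustrative, not from the original answer):

```cpp
#include <string>
#include <vector>

// Word/count structure; WordEntry is an illustrative name.
struct WordEntry
{
    std::string word;
    int count;

    // A new entry starts with a count of 1.
    explicit WordEntry(const std::string& text) : word(text), count(1) {}
};

// The list of distinct words seen so far; it grows as push_back is called.
std::vector<WordEntry> wordList;

// Hypothetical helper: appends a new entry for a word not yet on the list.
void addNewWord(const std::string& text)
{
    wordList.push_back(WordEntry(text));
}
```

After `addNewWord("cat")` and `addNewWord("dog")`, the vector holds two entries, and `wordList[1].word` is `"dog"`.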
You will also need to find a word in the wordList vector. You can do this with a simple linear search, or you can use a std::map to associate a text string with an index into the vector. I will not show you how to use the map. You need to write a function that finds a particular word in the vector and returns its index.
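Here is an example:
int findWord(const std::string& text)  // returns -1 if text is not in wordList
{
    for (int ix = 0; ix < static_cast<int>(wordList.size()); ++ix)
        if (text == wordList[ix].word) return ix;  // found: return its position
    return -1;
}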
The function returns -1 if the text is not found in wordList. As I said, wordList is a vector. Notice that it is accessed with [], just like a regular array. That will be important for you to know, because you will need to write a routine to sort the vector, and you will need to access the individual elements to do that.
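For comparison, here is a minimal sketch of the std::map alternative mentioned above. The names wordIndex, findWordWithMap and countWord are hypothetical, and WordEntry is an illustrative name for the word/count structure:

```cpp
#include <map>
#include <string>
#include <vector>

// Word/count pair; WordEntry is an illustrative name.
struct WordEntry
{
    std::string word;
    int count;
};

std::vector<WordEntry> wordList;
// Maps each word's text to its position in wordList.
std::map<std::string, int> wordIndex;

// Returns the index of text in wordList, or -1 if it has not been seen yet.
int findWordWithMap(const std::string& text)
{
    std::map<std::string, int>::const_iterator it = wordIndex.find(text);
    if (it == wordIndex.end())
        return -1;
    return it->second;
}

// Records one occurrence of text, keeping the map and the vector in step.
void countWord(const std::string& text)
{
    int index = findWordWithMap(text);
    if (index >= 0)
    {
        ++wordList[index].count;
    }
    else
    {
        wordIndex[text] = static_cast<int>(wordList.size());
        WordEntry entry = {text, 1};
        wordList.push_back(entry);
    }
}
```

The map avoids scanning the whole vector for every word, but its stored indices become stale once the vector is sorted, so do not rely on it after sorting.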
Now you need to open the file, read the words, and put them into the vector.
To access the file, use the ifstream object.
To read a line from the file, use getline.
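A minimal sketch of the reading step, assuming words are separated by whitespace (punctuation stays attached to its word). It reads each line with std::getline and splits it with std::istringstream. The function names splitWords and readText are hypothetical. The words are kept line by line so the text can be written back out later:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits one line into words on whitespace. Punctuation stays attached to
// its word; a real tokenizer may need to treat it separately.
std::vector<std::string> splitWords(const std::string& line)
{
    std::istringstream stream(line);
    std::vector<std::string> words;
    std::string word;
    while (stream >> word)
        words.push_back(word);
    return words;
}

// Reads the whole input one line at a time with std::getline and keeps the
// words of each line, so the text can be written back out line by line.
std::vector<std::vector<std::string> > readText(std::istream& in)
{
    std::vector<std::vector<std::string> > lines;
    std::string line;
    while (std::getline(in, line))
        lines.push_back(splitWords(line));
    return lines;
}
```

To use it with a file, open an `std::ifstream file("input.txt");` (a placeholder file name; needs `#include <fstream>`), check `if (file)`, and pass `file` to `readText`. Each word in the result can then be counted.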
Once you have a word from the file, check whether it is already in the vector using the findWord function I showed you. If it is, simply increment its count. If it is not, add it to the vector with push_back. Here is sample code to do that:
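int index = findWord(text);
if (index >= 0) ++wordList[index].count;   // already on the list: count it again
else wordList.push_back(WordEntry(text));  // new word, count starts at 1 (WordEntry: illustrative struct name)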
Next you will have to sort the vector by count, from most to least frequent. I'm sure you know how to write a sort routine, and I have shown you enough C++ syntax that you should be able to translate a sort algorithm into C++. Of course, you can use the standard library's std::sort instead, but your instructor might want you to do the work yourself.
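A minimal sketch of both options, assuming an illustrative WordEntry structure. sortByCount is a hand-written insertion sort that accesses elements with [], and sortByCountStd uses the standard library. Both function names are hypothetical. Both sort in descending order of count and keep words with equal counts in their original order:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Word/count pair; WordEntry is an illustrative name.
struct WordEntry
{
    std::string word;
    int count;
};

// Hand-written insertion sort: shifts entries with lower counts to the right
// until the current entry reaches its place, so the highest counts end up first.
void sortByCount(std::vector<WordEntry>& list)
{
    for (std::size_t i = 1; i < list.size(); ++i)
    {
        WordEntry current = list[i];
        std::size_t j = i;
        while (j > 0 && list[j - 1].count < current.count)
        {
            list[j] = list[j - 1];
            --j;
        }
        list[j] = current;
    }
}

// Comparator for the standard-library version: higher count comes first.
bool moreFrequent(const WordEntry& a, const WordEntry& b)
{
    return a.count > b.count;
}

// Standard-library alternative; stable_sort keeps ties in their original order.
void sortByCountStd(std::vector<WordEntry>& list)
{
    std::stable_sort(list.begin(), list.end(), moreFrequent);
}
```

Plain `std::sort` with the same comparator also works if the order of equally frequent words does not matter.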
Once you have the vector sorted, just print out the first n elements, whatever n is.
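A minimal sketch of this step, using a hypothetical function formatTopWords and an assumed output format of "index word count" per line. Here the index is the word's rank by frequency (0 for the most frequent word). It guards against n being larger than the vector:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Word/count pair; WordEntry is an illustrative name.
struct WordEntry
{
    std::string word;
    int count;
};

// Formats the first n entries of a list already sorted by count (most
// frequent first), one per line, as "index word count". If the list holds
// fewer than n words, all of them are formatted.
std::string formatTopWords(const std::vector<WordEntry>& sorted, std::size_t n)
{
    std::ostringstream out;
    for (std::size_t i = 0; i < n && i < sorted.size(); ++i)
        out << i << ' ' << sorted[i].word << ' ' << sorted[i].count << '\n';
    return out.str();
}
```

Printing is then `std::cout << formatTopWords(wordList, n);` (needs `#include <iostream>`).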
If you need to write them to a file, use the ofstream object.
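For the last part of the question, the text is written back through an std::ofstream, and every word outside the top n is replaced with <UNK>. This is a minimal sketch under stated assumptions: the text is held as a vector of lines, each a vector of words; the word list is already sorted by count; and the names writeWithUnk and saveWithUnk are hypothetical. A std::set holds the kept words for quick lookup:

```cpp
#include <cstddef>
#include <fstream>
#include <ostream>
#include <set>
#include <string>
#include <vector>

// Word/count pair; WordEntry is an illustrative name.
struct WordEntry
{
    std::string word;
    int count;
};

// Writes the text, keeping the n most frequent words and replacing every
// other word with <UNK>. `sorted` must be sorted by count, highest first.
void writeWithUnk(const std::vector<std::vector<std::string> >& lines,
                  const std::vector<WordEntry>& sorted,
                  std::size_t n,
                  std::ostream& out)
{
    std::set<std::string> kept;
    for (std::size_t i = 0; i < n && i < sorted.size(); ++i)
        kept.insert(sorted[i].word);

    for (std::size_t l = 0; l < lines.size(); ++l)
    {
        for (std::size_t w = 0; w < lines[l].size(); ++w)
        {
            if (w > 0)
                out << ' ';  // single space between words
            const std::string& word = lines[l][w];
            if (kept.count(word) > 0)
                out << word;
            else
                out << "<UNK>";
        }
        out << '\n';
    }
}

// Opens the output file with std::ofstream and writes the substituted text.
// Returns false if the file could not be opened or a write failed.
bool saveWithUnk(const std::string& fileName,
                 const std::vector<std::vector<std::string> >& lines,
                 const std::vector<WordEntry>& sorted,
                 std::size_t n)
{
    std::ofstream file(fileName.c_str());
    if (!file)
        return false;
    writeWithUnk(lines, sorted, n, file);
    return !file.fail();
}
```

Because words are rejoined with single spaces, the original spacing inside each line is not preserved. Only the line breaks survive.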
That is plenty of guidance for you. All the links I've given you contain sample code for you to read. Now it is up to you to do the rest of the work, but if there is something I wrote that you don't understand, please ask again.