You are here:

C++/Frequency analysis



I've an cryptographic related assignment where I need to code frequency analysis program that will count frequency of letter as well as frequency of 2-letters occurrence and position. Example:
Cipher text: dacps sdaps


A: 2
B: 0
C: 1
Z: 0

DA: 2
PS: 2

I've successfully code the frequency of letter but I'm having problem in 2-letters frequency which I'm only successful searching for first 2-letters occurrence (da). I'm not sure how to shift and find next 2-letters occurrence till the end of the text as well as to keep track of each position. I'm thinking of using struct to store the word, the count and position but I'm not sure how to make my array struct equal to each 2-letters of the word.

Here is my current code:
#include <iostream>         // Standard Library Function needed for input & output
#include<conio.h>          // Standard Library Function needed for getch
#include <iostream>
using namespace std;

#define N 1000

struct wordRepeat          
  char word[10];
  int count;

int main()
   char msg[N]={};
   char msg2[N]={};
   string temp="";   
   int charCount[26]={0};
   int wordCount=0;
   cout<<"Enter cipher text: "<<endl;
   cin.getline(msg, N);
   strupr(msg); //convert msg to upper case
   string text(msg);

   for(int i=0; i<strlen(msg); i++)
         if(msg[i]>='A' && msg[i]<='Z')
         charCount[msg[i]-65]++;  //'A' = 65
   cout<<endl<<endl<<"Characters Frequency: "<<endl<<endl;
    for (int i= 0; i < 26; i++)
       cout << char(i + 65) << ": " << charCount[i] << endl;
    /*for (int i=0; i<text.length(); i++)
    if (!=' ')
     {temp = temp +;

    for (int i=0; i<temp.length(); i++)
       text = temp;
    cout<<endl<<"Without space: "<<text<<endl;
    cout<<endl<<"Without space: "<<msg2<<endl;*/

   cout<<endl<<"Repetition of Words: "<<endl<<endl;
  int i, j, M=2, N2 = strlen(msg), count = 0, pos = 0;
     for (i = 0; i < N2; i++)
       for (j = 0; j < M; j++)
         if (msg[i+j] != msg[j]) break;
         if (j == M-1)
         { pos=i+1;
   cout<<"Count: "<<count<<" Position: "<<pos;
   system ("pause");
   return 0;

Sorry for my bad English. Thank you for reading this e-mail and hope you can give me hint how to solve these problems.

Thank you...

ANSWER: Hello Nur

I have a suggestion for you to take a different approach if you need only pairs of letters. Use an idea similar to what you did for the single letter frequency count. Create a 2 dimensional array 26x26 of integers.

int pairs[26][26];
Each dimension represents one letter in the pair.

Then have a single loop going from 0 to less than the length of the string - 1 which will examine the character at i and i+1

  for (i = 0; i < len-1; ++i)
       int char1 = toupper(text[i]) - 'A';
       int char2 = toupper(text[i+1]) - 'A';

Think about what to do if one of the characters is not a real letter. For example, what if one of the characters is a space?

I think you know how to continue from this point.

If you really want a general solution, with M being any number, then you need to do more programming. A three dimensional array of 26*26*26 might work but the scheme will quickly break down for larger M as the array becomes too large. You would need a wordRepeat structure like you have, and you would need to store the wordRepeat structures in a set or in a vector and make sure that duplicate words are not saved. Vectors are easiest to learn if you are not familiar with C++ containers.

The pseudo code would be something like this:

for (i = 0; i < len - M; ++i)
  string word = "";
  for(j = 0; j < M ; ++j)
    word += text[i+j];
  search for the word in the vector or other container.
  If found increment the count
  if not found store the word with a count of 1

Let me know if you need more help.

Best regards

---------- FOLLOW-UP ----------

QUESTION: I've got problem when entering cipher text such as 'aaaa'
where it count occurrence of aa: 3 instead of 2.

Hello Nur.

I guess you would need to remember the last 2 character sequence which was recorded. If the current 2 character sequence is not the same as the last, then you do an increment of the count. If the current 2 character sequence is the same as the last, then don't increment the count, but also clear the last sequence. I recommend that you use a string to hold the last sequence.

If that is not enough of a hint, then read the next part.

Looking at aaaa (a1a2a3a4)
The first time through the loop you have a1a2, and nothing in lastSequence, so increment for aa, and set lastSequence to aa.

The second time through you have a2a3, but lastSequence already has aa, so a2a3 is ignored. The lastSequence is then cleared.

The third time through the loop you have a3a4, and it is just like the first time through.

This would work for an M character long sequence too.

Whenever the count is not incremented, lastSequence is cleared.
Whenever the count is incremented, lastSequence is set.

Can you do the implementation?


All Answers

Answers by Expert:

Ask Experts




No longer taking questions.


No longer taking questions.

No longer taking questions.

©2016 All rights reserved.