C/Help needed to write a program in C
Expert: Abhishek Kumar - 11/18/2011
Question
Hi,
I need to write a C program to remove duplicates from a file.
I have a wordlist.txt file with around 700k words. There are lots of duplicates in the file, and I need to write a program in C to remove them.
I have just started learning C. I have written only the beginning of the program by referring to books and websites, and I don't know if it's correct. Please take a look at the code below and tell me how to write this program. Thanks.
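#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/* Print usage information (syntax() was called but never defined) */
static void syntax(void)
{
    fprintf(stderr, "Usage: program <wordlist-file>\n");
}

int main(int argc, char *argv[])
{
    FILE *fp;
    long lfilelen;  /* Length of file */
    char *cFile;    /* Dynamically allocated buffer (entire file) */
    size_t nread;   /* Bytes actually read */

    if (argc != 2)
    {
        syntax();
        return 1;
    }

    fp = fopen(argv[1], "r");  /* Open file in text mode */
    if (fp == NULL)  /* Could not open file */
    {
        int err = errno;  /* Save errno before other calls can change it */
        fprintf(stderr, "Error opening %s: %s (%d)\n", argv[1], strerror(err), err);
        return 1;
    }

    /* Find the file length: seek to the end, read the position, rewind */
    if (fseek(fp, 0L, SEEK_END) != 0 || (lfilelen = ftell(fp)) < 0)
    {
        fprintf(stderr, "Could not determine the size of %s\n", argv[1]);
        fclose(fp);
        return 1;
    }
    rewind(fp);

    cFile = calloc((size_t)lfilelen + 1, sizeof(char));  /* +1 for the '\0' */
    if (cFile == NULL)
    {
        fprintf(stderr, "Insufficient memory to read file\n");
        fclose(fp);
        return 1;  /* Report failure, not success */
    }

    /* Read the entire file into cFile. In text mode fread may return fewer
       bytes than ftell reported (e.g. CRLF line endings on Windows). */
    nread = fread(cFile, 1, (size_t)lfilelen, fp);
    cFile[nread] = '\0';
    fclose(fp);

    /* Next step: find and remove the duplicate words in cFile */

    free(cFile);
    return 0;
}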
Answer
Hi Nildeep,
So far your program looks good. At the end you copy the entire file into the buffer. Now you just have to work out how to detect duplicates and how to remove them.
If by duplication you mean the same word occurring more than once, then to start with you can write an O(n^2) algorithm using two nested loops that checks whether each word occurs more than once. Whenever you see a word that already occurred earlier, you can overwrite that later copy (for example, by replacing the word with spaces).
In this way you can remove the duplicates.
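A minimal sketch of the two-loop approach, assuming the file has already been read into a NUL-terminated buffer such as cFile and that words are separated by whitespace (for example one per line, as in wordlist.txt). The names split_words, blank_duplicates and print_unique are hypothetical, chosen only for illustration. split_words uses strtok, which modifies the buffer in place by writing '\0' after each word; blank_duplicates then overwrites every later copy of a word with spaces, as suggested above.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split buf in place into whitespace-separated words.
 * Returns a malloc'd array of pointers into buf and stores the count in *count.
 * The caller frees the array (not the words, which live inside buf).
 */
char **split_words(char *buf, size_t *count)
{
    size_t cap = 1024, n = 0;
    char **words = malloc(cap * sizeof *words);
    char *tok;

    if (words == NULL)
        return NULL;
    for (tok = strtok(buf, " \t\r\n"); tok != NULL; tok = strtok(NULL, " \t\r\n"))
    {
        if (n == cap)  /* Array is full: double its capacity */
        {
            char **tmp = realloc(words, 2 * cap * sizeof *words);
            if (tmp == NULL)
            {
                free(words);
                return NULL;
            }
            words = tmp;
            cap *= 2;
        }
        words[n++] = tok;
    }
    *count = n;
    return words;
}

/*
 * O(n^2) duplicate removal: for each word, overwrite every later copy of it
 * with spaces. Returns the number of copies blanked out.
 */
size_t blank_duplicates(char **words, size_t n)
{
    size_t i, j, removed = 0;

    for (i = 0; i < n; i++)
    {
        if (words[i][0] == ' ')  /* Already blanked as a duplicate */
            continue;
        for (j = i + 1; j < n; j++)
        {
            if (words[j][0] != ' ' && strcmp(words[i], words[j]) == 0)
            {
                memset(words[j], ' ', strlen(words[j]));  /* Keeps the '\0' */
                removed++;
            }
        }
    }
    return removed;
}

/* Write every word that was not blanked out, one per line. */
void print_unique(FILE *out, char **words, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (words[i][0] != ' ')
            fprintf(out, "%s\n", words[i]);
}
```

In the question's main, after the fread call, one could call split_words(cFile, &n), then blank_duplicates(words, n), then print_unique(stdout, words, n), and finally free(words) before free(cFile). The first occurrence of each word is kept, so the original order is preserved.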
Try doing this first; later we can go over other algorithms to make it run faster.
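One common way to make it faster, offered here as a swapped-in technique rather than the specific algorithm the answer has in mind: sort the words, after which all copies of a word sit next to each other and a single pass can drop the repeats. With around 700k words, a two-loop comparison needs roughly n(n-1)/2 ≈ 2.45 × 10^11 string comparisons, whereas sorting needs on the order of n log n. This sketch assumes the words have already been split into an array of char * (for example with strtok); unique_sorted and cmp_words are hypothetical names. Note that the result comes out in sorted order, not in the original file order.

```c
#include <stdlib.h>
#include <string.h>

/* qsort comparator: a and b point at elements of a char * array. */
static int cmp_words(const void *a, const void *b)
{
    const char *const *wa = a;
    const char *const *wb = b;
    return strcmp(*wa, *wb);
}

/*
 * Sort the word pointers, then keep only the first word of each run of
 * equal words. Returns the number of distinct words, which end up in
 * words[0] .. words[result - 1] in sorted order. Strings are not copied.
 */
size_t unique_sorted(char **words, size_t n)
{
    size_t i, kept = 0;

    if (n == 0)
        return 0;
    qsort(words, n, sizeof *words, cmp_words);
    for (i = 1; i < n; i++)
    {
        if (strcmp(words[i], words[kept]) != 0)  /* Start of a new run */
            words[++kept] = words[i];
    }
    return kept + 1;
}
```

After k = unique_sorted(words, n), printing words[0] through words[k - 1] gives each distinct word exactly once.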
Regards,
Abhishek Kumar