How are you?
I'm trying to write a program that, among other things, needs to validate a website. Basically I need to know if a user-entered website exists or not. I don't know where to start. I was wondering if you could give me a lead with this. How can I do this using C/C++? Thank you so much!
First you need to decide what you mean by "web site exists".
Would you need to validate the form of the URL to start with (probably)?
If so then what URL schemes would you accept? (http, https, ftp etc?)
Does the domain having a DNS entry suffice?
What if the user entered raw IP address values?
Would pinging that address suffice?
Or would actually trying to obtain the page data from the site be what is required?
And in all scenarios, what happens if a facility is not available, e.g. your Internet access is down, the site or DNS server is temporarily down, or there is some problem in between that makes that portion of the Internet unavailable? In these cases the user may well have entered a valid URL; it's just currently unreachable.
You will have to decide these sorts of things yourself.
So let us look at how you might achieve some of these things using C++.
First off, C++ has no built in support for networking at all.
Nor does it currently have any support for pattern matching of the kind you might use to validate the form of an entered URL. However, in this case there is some hope. The C++ TR1 library update to the current C++ standard (see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2914.pdf) includes regular expression support, which can be used to validate a URL string. Unfortunately not many compilers currently ship with TR1 library support; I currently only know of the SP1 update to Microsoft Visual C++ 2008 (and presumably later editions of MSVC++). However, regular expression support is also available in the Boost libraries (Boost.Regex).
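To give a feel for the URL-form check, here is a minimal sketch. It assumes a compiler that ships std::regex (the standardised descendant of the TR1 facility, C++11 or later); with a TR1 or Boost implementation the same idea should carry over using std::tr1::regex or boost::regex. The names UrlParts and parse_url, the accepted schemes (http, https, ftp) and the pattern itself are my own illustrative choices. The pattern is deliberately simple and is not a full RFC 3986 validator: it ignores user info, IPv6 literals and internationalised host names.

```cpp
#include <regex>
#include <string>

// Hypothetical helper type: the pieces of a URL we care about.
struct UrlParts {
    bool valid = false;   // true if the string matched the pattern
    std::string scheme;   // e.g. "http"
    std::string host;     // e.g. "www.example.com" or "192.168.0.1"
    std::string port;     // empty if no port was given
};

// Checks the *form* of a URL only; it says nothing about whether
// the host exists or the page can be fetched.
UrlParts parse_url(const std::string& url) {
    // scheme "://" host [":" port] [path, query or fragment]
    static const std::regex pattern(
        R"(^(https?|ftp)://([A-Za-z0-9.-]+)(?::(\d{1,5}))?([/?#]\S*)?$)",
        std::regex::icase);

    UrlParts parts;
    std::smatch match;
    if (std::regex_match(url, match, pattern)) {
        parts.valid = true;
        parts.scheme = match[1].str();
        parts.host = match[2].str();
        parts.port = match[3].str();  // empty when the optional group did not match
    }
    return parts;
}
```

For example, parse_url("https://example.com:8080/a?b=c") should give you the host "example.com" and the port "8080", while a string such as "not a url" comes back with valid set to false. The extracted host is what you would pass on to a name lookup.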
For networking support we have to look elsewhere.
You could use raw operating system socket and networking APIs such as gethostbyname (see http://linux.die.net/man/3/gethostbyname or the equivalent Winsock documentation on MSDN, for example) to look up a host name and obtain its address. Note that this implies being able to extract the host name part of a URL from the whole URL string (regular expressions again).
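As a rough sketch of the name-lookup approach on a POSIX system: note that I have swapped in getaddrinfo here rather than gethostbyname, since gethostbyname is marked obsolete in current POSIX and getaddrinfo also copes with IPv6 and with raw IP address strings. The function name host_resolves is hypothetical. On Windows the same call exists in Winsock, but you would need different headers and a WSAStartup call first, which this sketch does not show.

```cpp
#include <netdb.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <string>

// Returns true if the host name (or numeric IP address string)
// resolves to at least one address.
// A false result does NOT prove the host is invalid: the lookup can
// also fail because the network or the DNS server is unavailable
// (getaddrinfo reports EAI_AGAIN for a temporary failure).
bool host_resolves(const std::string& host) {
    if (host.empty()) {
        return false;
    }

    addrinfo hints{};                 // zero-initialise all fields
    hints.ai_family = AF_UNSPEC;      // accept IPv4 or IPv6
    hints.ai_socktype = SOCK_STREAM;  // avoid duplicate entries per socket type

    addrinfo* results = nullptr;
    const int rc = getaddrinfo(host.c_str(), nullptr, &hints, &results);
    if (rc != 0) {
        return false;                 // gai_strerror(rc) gives a readable message
    }

    freeaddrinfo(results);            // we only wanted to know that it resolved
    return true;
}
```

If you need to tell a temporary failure apart from a name that really does not exist, you could return the getaddrinfo result code instead of a bool and inspect it in the caller.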
You could use third party networking libraries such as ACE (http://www.cs.wustl.edu/~schmidt/ACE.html) or the Boost asio library (http://www.boost.org/doc/libs/1_39_0/doc/html/boost_asio.html).
However, these are all somewhat low level if you wish to perform, say, an HTTP request to the URL and see if you get any (valid) data back. In these cases something like the C cURL library, libcurl (see http://curl.haxx.se/), as wrapped and used by PHP, may be more suitable.
One set of libraries, the PoCo C++ libraries (see http://pocoproject.org/), seems particularly interesting for this sort of problem. They support many useful facilities including regular expressions, URIs/URLs and HTTP. Looking at some of the Poco library samples might be instructive. See the description document (http://pocoproject.org/documentation/PoCoSamples.pdf); the URI Sample (section 2.1.15) and HTTPGet Sample (section 2.4.4) in particular look relevant here.
Note that for the Boost and PoCo libraries you will require an up to date C++ compiler and standard library implementation, e.g. Microsoft Visual C++ 7.1 (with service packs) or later, or GNU g++ 3.4.3 or later (which versions of g++ work depends on the operating system).
So your course of action would be:
- decide exactly what it is you need to do
- look around for useful libraries/facilities to perform as much of the heavy lifting as possible
- install these libraries for your system / compiler etc.
- use the facilities of these libraries to get the job done
If you do not like or want or are unable to use the libraries mentioned here then you can use Internet search sites (Google, Yahoo, Bing etc.) to locate such libraries and API information. One site that is trying to create a comprehensive database of such libraries is http://c-plusplus.org/ (you may like to poke around there to start with).
Hope this gives you some pointers at least.