The deep web (or deep Internet) is the part of the Internet that ordinary search engines like Google cannot visit, which is why the information it contains is unindexed and hard to access if you do not know how to get there.
Despite its somewhat gruesome, horror-movie name, it is simply the part of the information on the web that search engines cannot reach, whether the information itself is ordinary, simple, confidential, or complex.
And why can't Google access it? Because the information it contains is created the moment someone makes a query, and it is destroyed when that query is finished.
Suppose a web page reports the weather. A visitor enters the city for which they want a forecast and the time they want it for; suppose you ask for the weather in Seville the day after tomorrow at 6 in the evening, and then in Murcia tomorrow morning.
With the first request, the website generates a forecast for Seville and displays it on screen; once you finish that query and enter the second one, the Seville forecast is destroyed, replaced by the one for Murcia.
Search engine spiders are built to crawl that web page and index its content, but when they visit it they will not ask for the weather at a specific place and a specific time, so the page generates no answer and the search engine has nothing to index.
As you can see, nothing esoteric or secretive is being hidden; it is simply that the search engine is unable to make the appropriate query. That part of the web never gets indexed, and nobody can ask Google for the weather in Seville or Murcia and obtain a result based on that web page, because Google simply will not have it. These sites are dynamic: their content depends on the parameters entered. Search engine spiders are not designed to traverse dynamic pages, only static ones.
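To make this concrete, here is a minimal sketch in Python (standard library only) of such a dynamic page. The URL, the city/when parameters, and the make_forecast function are all hypothetical, invented just for illustration: the forecast exists only while a request supplying both parameters is being served, so a spider fetching the bare URL finds nothing worth indexing.

    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.parse import urlparse, parse_qs

    def make_forecast(city, when):
        # Hypothetical stand-in for a real forecasting backend.
        return f"Forecast for {city} at {when}: sunny, 24 C"

    class WeatherHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            params = parse_qs(urlparse(self.path).query)
            city = params.get("city", [None])[0]
            when = params.get("when", [None])[0]
            if city and when:
                # The answer is created now and discarded after the response.
                body = make_forecast(city, when)
            else:
                # A crawler asking for the bare page gets no forecast to index.
                body = "Enter a city and a time to get a forecast."
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.end_headers()
            self.wfile.write(body.encode("utf-8"))

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), WeatherHandler).serve_forever()

A person would request something like /weather?city=Seville&when=tomorrow+18:00 and get an answer; a spider requesting the page with no parameters sees only the empty prompt.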
Pages of this kind account for some 90% of the deep web. Try to imagine the multitude of pages that show specific content based on data you enter: commercial flight information, product details in online stores, queries to wikipedias and specialized encyclopedias, ...
Of course, there is another part of the web that is hidden from search engines deliberately, for one reason or another. The reasons can be several, from compliance with right-to-be-forgotten laws to the desire to keep information available to a certain group of people but not open to the general public. Suppose, for example, that your school's alumni association maintains a record of former students that is accessible only to members of the association.
Another example: the balance of your bank account. The bank creates it, and even prints it, only at the request of the account holder, and it is naturally kept hidden from Google's search engines. As you can see, there can be many different reasons to keep information out of the robots' reach.
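As a toy illustration of that kind of gating, here is a sketch of content that exists on a server but is handed out only to authenticated members, so an anonymous spider can never see it. All names and credentials are made up, and a real site would hash passwords and use proper session handling:

    # Hypothetical member directory and records, for illustration only.
    MEMBERS = {"ana": "s3cret", "luis": "pa55word"}
    ALUMNI_RECORDS = ["Ana Garcia, class of 1998", "Luis Perez, class of 2001"]

    def get_alumni_records(username, password):
        # Anonymous visitors, search engine spiders included, are refused.
        if MEMBERS.get(username) != password:
            raise PermissionError("members only")
        return ALUMNI_RECORDS

    print(get_alumni_records("ana", "s3cret"))  # a member sees the list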
The methods for keeping search engines away range from a simple directive placed on the page so that spiders do not index it, such as putting this meta tag in the <HEAD> section of your blog's template:
- <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
to much more sophisticated methods, such as storing the content with some kind of encryption, so that it is understandable only to those who know how to decrypt it.
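As one possible example of the encrypted variant (the text does not name a scheme, so this sketch assumes the third-party cryptography package and its Fernet symmetric cipher), the stored bytes are meaningless to anyone, spider or human, who lacks the key:

    # Requires: pip install cryptography
    from cryptography.fernet import Fernet

    key = Fernet.generate_key()   # only key holders can ever read the content
    cipher = Fernet(key)

    stored = cipher.encrypt(b"Members-only article text")
    print(stored)                  # opaque token: nothing for a robot to index
    print(cipher.decrypt(stored))  # b'Members-only article text'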
Personally, I believe this deliberately hidden part should be no more than 1% of the total deep web.