How the Internet Works: The Deep Web

The Deep Web. The Deepnet. The Invisible Web. The Hidden Web. 

Maybe you have heard of the Deep Web. Maybe you even know how to access the Deep Web. 

Chances are, though, you've never heard of the Deep Web and have no idea how to access it. The Deep Web sounds mysterious, elusive and somewhat dangerous. By all accounts, it is all these things. 

So, what is the Deep Web? How does it work? How do you access it?

In this installment of "How the Internet Works", we tackle the mysterious Deepnet.   

The Surface Net

To understand the Deepnet, we first have to discuss how surface Internet web pages are tracked. The surface Internet is the public-facing Internet, which is actively crawled, indexed, tracked and accessed.

Being included in the surface net depends on web search engines (Google, for all intents and purposes) indexing web pages. Google indexes web pages by running large-scale automated programs called spiders. These spiders, in a process known as spidering, crawl new web pages, archive their locations, make records of their content and create a working history of each page's outbound and inbound links. Spidering (also known as crawling) is how Google makes a web page searchable and findable. If you think of the surface net as a public library, each web page has a Dewey Decimal number assigned to it. Spiders assign that number so searchers can find their way to the page.
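The spidering process above can be sketched as a breadth-first crawl. This is a minimal illustration under stated assumptions, not how Google actually works: the site is a hypothetical in-memory dictionary instead of real HTTP fetches, and the crawler only records each page's text plus its outbound and inbound links.

```python
from collections import deque

# Hypothetical site: URL -> (page text, outbound links). A real spider would
# fetch each URL over HTTP and parse links from the HTML; a dict keeps this offline.
SITE = {
    "/": ("Welcome to the home page", ["/news", "/about"]),
    "/news": ("Latest news stories", ["/", "/news/2014"]),
    "/news/2014": ("Archived news from 2014", ["/news"]),
    "/about": ("About this site", ["/"]),
}

def crawl(site, seed):
    """Breadth-first crawl from seed, recording each page's content and links."""
    content = {}   # url -> page text (the record of its content)
    outbound = {}  # url -> links found on that page
    inbound = {}   # url -> pages that link to it
    queue, seen = deque([seed]), {seed}
    while queue:
        url = queue.popleft()
        text, links = site[url]
        content[url] = text
        outbound[url] = list(links)
        for link in links:
            inbound.setdefault(link, []).append(url)
            if link in site and link not in seen:  # follow links not yet queued
                seen.add(link)
                queue.append(link)
    return content, outbound, inbound

content, outbound, inbound = crawl(SITE, "/")
print(sorted(content))   # ['/', '/about', '/news', '/news/2014']
print(inbound["/news"])  # ['/', '/news/2014']
```

A production crawler would also handle politeness delays, duplicate URLs and HTML parsing; those are left out so the link bookkeeping stays visible.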

Public internet traffic works through the surface net. Every time you access Facebook.com, Wikipedia.org, InformIT.com or any other publicly accessible site you can think of, you are accessing the surface web.

Interestingly enough, by some estimates the surface web (also known as the visible web) comprises only 4% of all Internet content. Much like the tip of an iceberg, the surface web represents a small fraction of the total volume.

Understanding the Content of The Deep Web

At this point you might be asking yourself: if only 4% of all Internet content resides within the surface net, what content does the Deepnet hold and, more importantly, why is there a need for Deep Web content?

Deep Web Content

By its very name, the Deep Web brings adjectives like nefarious, dark, mysterious, bad and malevolent to mind. While it is true the Hidden Web is used to hold and share lurid, illegal and compromising content, those adjectives don't apply to all of the unsearchable web pages it contains. From Motherboard:

"The deep web is not all fun and games (weird, illegal, or otherwise). It’s full of databases of information from the likes of the US National Oceanic and Atmospheric Administration, JSTOR, NASA, and the Patent and Trademark Office. There are also lots of Intranets—internal networks for companies and universities—that mostly contain dull personnel information."
"Then there’s a small corner of the deep web called Tor, short for The Onion Routing project, which was initially built by the US Naval Research Laboratory as a way to communicate online anonymously. This, of course, is where the notorious Silk Road and other deep web black markets come in."

Part of the Invisible Web is filled with dark, black market content, and part of it is filled with private data that requires private network access. It isn't all nefarious, yet it does raise a question: if not all of the content is black market, why does the Hidden Web exist in such staggering quantity?

The Invisible Web and Security

It should come as no surprise that the Internet isn't a wholly safe place for data storage. With major Internet companies and brands being hacked on a nearly daily or weekly basis, the public now understands that the Internet, and certainly the Cloud, has major security faults. First reason for using the Deep Web: content that is harder to find is less likely to be hacked, stolen and leaked.

Some people, companies and organizations, because of the information they deal in, need to hide internally held content from outside sources. A good example: a new blueprint for an upcoming NASA/JPL engine, which NASA/JPL wants to keep secret from the public while sharing it internally through private network access. A common misconception about the Deep Web is that all of its content is nefarious. This isn't the case. A lot of content is held within the Deep Web because the Deep Web lends itself to secrecy and to enhanced web security. Second reason for using the Deep Web: keeping useful, legal content away from the eyes of the public.

Some people simply do not trust the powers that be. A lot of Deep Web content is kept and accessed only via the Deep Web because its owner does not trust the government and firmly believes that if the content were placed on the visible web, it would be used as a weapon. Third reason for using the Deep Web: some Invisible Web content is intentionally placed there to thwart government intrusion and snooping.

The Deep Web is full of good, useful content. It is also full of illegal, black market content which, for obvious reasons, users want kept secret. Fourth reason for using the Deep Web: to post, share and buy illegal content with a lower risk of repercussions.

The bottom line: by some estimates, the Deep Web holds roughly 500 times as much content as the public Internet does. No one knows exactly how big the Deep Web is; we only have estimates, and those estimates show it to be massive.

How Does the Deep Web Work?

Setting up a Deep Web page can be as simple as deploying a website that requires a password to view its content. The same concept applies to individual password-protected or encrypted pages held within a larger public site. Because crawlers don't have the credentials or keys, they cannot index the content of such sites or pages.
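As a minimal sketch of password-protected content, assuming HTTP Basic authentication as the protection mechanism, the Python standard-library server below refuses any request that lacks the right Authorization header. The credentials, the /report path and the helper names are hypothetical. A crawler arriving without credentials gets a 401 response and has nothing to index.

```python
import base64
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical credentials, for illustration only.
USERNAME, PASSWORD = "staff", "s3cret"
EXPECTED = "Basic " + base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()

class ProtectedHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.headers.get("Authorization") != EXPECTED:
            # No valid credentials (e.g. a search engine spider): refuse.
            self.send_response(401)
            self.send_header("WWW-Authenticate", 'Basic realm="private"')
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"Internal report: not for search engines"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the default per-request logging

# Ignore any proxy settings in the environment so requests go straight to localhost.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

def fetch(url, auth_header=None):
    """GET url, optionally with an Authorization header; return (status, body)."""
    request = urllib.request.Request(url)
    if auth_header:
        request.add_header("Authorization", auth_header)
    try:
        with opener.open(request, timeout=5) as response:
            return response.getcode(), response.read().decode()
    except urllib.error.HTTPError as err:
        err.close()
        return err.code, ""

server = HTTPServer(("127.0.0.1", 0), ProtectedHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/report"

crawler_status, _ = fetch(url)                   # a spider has no credentials
staff_status, staff_body = fetch(url, EXPECTED)  # an authorized user does
server.shutdown()
server.server_close()
print(crawler_status, staff_status)  # 401 200
```

Basic authentication only base64-encodes the credentials, so a real deployment would serve the page over HTTPS. Many sites use login forms and session cookies instead, with the same effect on crawlers.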

Another form of Deep Web content is unlisted content a site uses for routine maintenance. No other pages link to these pages, so spiders, which discover pages by following links, have no way to find and index them.
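The maintenance-page case can be checked with a few lines of Python. This sketch assumes the site owner keeps a hypothetical map of every page and the pages it links to; any page that nothing else links to is one a link-following spider cannot discover.

```python
# Hypothetical site map kept by the site owner: page -> pages it links to.
SITE_LINKS = {
    "/": ["/news", "/about"],
    "/news": ["/", "/news/2014"],
    "/news/2014": ["/news"],
    "/about": ["/"],
    "/admin/maintenance": ["/"],  # links out, but nothing links in
}

def find_orphans(site_links, home="/"):
    """Return pages that no other page links to (the home page is the crawl's entry point)."""
    linked_to = {
        target
        for source, targets in site_links.items()
        for target in targets
        if target != source  # a self-link does not make a page discoverable
    }
    return sorted(page for page in site_links if page not in linked_to and page != home)

print(find_orphans(SITE_LINKS))  # ['/admin/maintenance']
```

This is the simplest test. A stricter one would crawl from the home page and report everything it never reaches, because a page linked only from other orphaned pages is just as hidden.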

For a lot of site owners, adding content to the Deep Web is as simple as blocking crawling spiders from indexing it. The content doesn't have to be nefarious or black market. It could be an older news story archived on a publication's website, which you can only reach by browsing the site or using its internal search. Some content (older content is key here) falls into the Deep Web because it lives in internal site archives rather than in indexed surface web content.
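One common way to block crawling spiders is a robots.txt file at the root of a site. The sketch below uses Python's standard `urllib.robotparser` module to evaluate a hypothetical robots.txt that disallows an /archive/ section; the bot name and URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publication that keeps its archive away from crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ["https://example.com/news/today",
            "https://example.com/archive/2009/old-story"]:
    allowed = parser.can_fetch("ExampleBot", url)
    print(url, "->", "crawl" if allowed else "skip")
```

robots.txt is a voluntary convention: well-behaved crawlers honor it, but it is not access control. It also only asks crawlers not to fetch pages; keeping a page out of search results entirely is usually done with a noindex directive.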

On the other hand, the growth of the Dark Web, the darkest corner of the Deep Web, runs parallel with that of Tor.

Tor - Access to The Darkest Sections of the Deep Web

Tor, or The Onion Router, is software that can be used to access the Deepnet. Most people use it through the Tor Browser, which establishes the connections needed to access Deep Web sites. The above image shows how a Tor network works. Tor enables anonymous access to the Deepnet by routing connections through volunteer-run relays located around the world, wrapped in layers of encryption. The basic idea is to drive traffic through multiple encrypted hops so that no single relay knows both who is sending the traffic and where it is going. When the destination is an ordinary website, the last layer of Tor's encryption is removed at the final relay (the exit node), and the traffic travels the rest of the way outside Tor's encryption.
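The layering idea behind onion routing can be shown with a toy example. This is a sketch under loud assumptions: the "cipher" is a SHA-256 keystream XORed with the data, which is NOT secure and is not what Tor uses (Tor negotiates a separate key with each relay and encrypts with AES in counter mode). The relay names are hypothetical. The client wraps the message in one layer per relay, each relay removes exactly one layer, and only the exit relay ends up with the original message.

```python
import hashlib
import secrets

def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256(key || counter). NOT secure; illustration only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def xor_layer(key: bytes, data: bytes) -> bytes:
    """Add or remove one layer; XORing with the same keystream twice cancels out."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Hypothetical three-hop circuit with one secret key per relay.
relays = ["guard", "middle", "exit"]
keys = {name: secrets.token_bytes(32) for name in relays}

message = b"GET / HTTP/1.1"

# The client wraps the message in one layer per relay, innermost layer for the exit.
onion = message
for name in reversed(relays):
    onion = xor_layer(keys[name], onion)

# Each relay peels exactly one layer with its own key and passes the rest on.
sees_plaintext = {}
for name in relays:
    onion = xor_layer(keys[name], onion)
    sees_plaintext[name] = onion == message
    print(name, "sees plaintext:", sees_plaintext[name])
```

Running it prints False for the guard and middle relays and True for the exit. In real Tor each relay also learns only the previous and next hop, which this sketch does not model.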

The visible web runs on commonly known domains such as .com, .org and .net. Tor provides access to Deep Web sites whose addresses end in .onion. Black market sites like the now-defunct Silk Road or the Bluesky Marketplace (still operating at the time of writing) use .onion addresses.
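Unlike .com names, .onion names are not registered; they are derived from a cryptographic key. The sketch below follows the version 3 format described in Tor's rendezvous specification (rend-spec-v3): base32(public key + 2-byte checksum + version byte 0x03) followed by ".onion", where the checksum is the first two bytes of SHA3-256 over ".onion checksum" + key + version. Silk Road-era sites used the older 16-character version 2 format, which Tor has since retired. The 32 key bytes in the demo are a placeholder, not a real ed25519 key, and the validator checks only format and checksum, not whether the key is a valid curve point.

```python
import base64
import hashlib

VERSION = b"\x03"  # version byte for v3 onion addresses

def onion_address(pubkey: bytes) -> str:
    """Encode a 32-byte ed25519 public key as a v3 .onion address."""
    if len(pubkey) != 32:
        raise ValueError("expected a 32-byte ed25519 public key")
    checksum = hashlib.sha3_256(b".onion checksum" + pubkey + VERSION).digest()[:2]
    return base64.b32encode(pubkey + checksum + VERSION).decode().lower() + ".onion"

def is_valid_onion(address: str) -> bool:
    """Check length, version byte and checksum (not that the key is a real curve point)."""
    if not address.endswith(".onion"):
        return False
    label = address[: -len(".onion")]
    if len(label) != 56:  # 35 bytes -> 56 base32 characters, no padding
        return False
    try:
        raw = base64.b32decode(label.upper())
    except ValueError:  # binascii.Error (bad characters) subclasses ValueError
        return False
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    if version != VERSION:
        return False
    return hashlib.sha3_256(b".onion checksum" + pubkey + version).digest()[:2] == checksum

demo_key = bytes(range(32))  # placeholder bytes, not a real ed25519 public key
address = onion_address(demo_key)
print(len(address), address.endswith("d.onion"))  # 62 True
print(is_valid_onion(address))                    # True
print(is_valid_onion(address[:55] + "e.onion"))   # False (version byte altered)
```

Because the version byte 0x03 lands in the final base32 character, every v3 address ends in "d" before ".onion".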

An offshoot of this conversation is an online currency you might have heard of: Bitcoin. To make black market purchases, users typically pay with Bitcoin; for the purposes of this article, Bitcoin is the black market currency of choice. While Bitcoin has beneficial uses, the story of the currency will not be told here.

How Did Tor Begin?

The Onion Router began as a U.S. government project. Originally built to give government whistleblowers and political refugees anonymous routes for communication and information sharing, Tor proved so effective at providing anonymity that criminal elements adopted it as well. As you might be able to tell, the fact that government entities created Tor while other government entities now prosecute some of its users creates an interesting legal dynamic.

Returning to the Surface

Is the Deep Web wholly nefarious? No. Does the Deepnet hold massive potential for good? Without a doubt. Will we ever know how large the Hidden Web is? Unless web crawlers become capable of indexing deep pages, we will have to rely on estimates. Does Tor do an excellent job of masking traffic? Yes. Now, the most important question: should you use a Tor network to access the Deep Web? That choice is up to you.

Remember, if you like this content and want to chat about it, you can reach me at the following social spaces:

  1. Twitter: @bleibowi
  2. Instagram: @byalie
  3. LinkedIn: Brad Yale
  4. Google+: Brad Yale

Lastly, to check out previous installments of "How the Internet Works", follow the links below:

How the Internet Works: A Call for Personal Security

How the Internet Works: The Layers of the Cloud

How the Internet Works: TCP/IP, Trace Routes and Hops