KEY POINTS:
We all know that the web is big, but I don't think any of us had any idea it was this big.
Google first indexed the web in 1998, when its index already held 26 million pages, and by 2000 the Google index had reached the one billion mark.
Over the last eight years we've heard a few big numbers bounced around, but until recently we never really knew how quickly the web was actually growing. So it's no wonder that Google search engineers stopped in awe when their systems that process links to find new content hit a milestone: 1 trillion (as in 1,000,000,000,000) unique URLs on the web at once!
How did they find all those pages? Well, they start at a set of well-connected initial pages and follow each of their links to new pages. Then they follow the links on those new pages to even more pages, and so on, until they have a huge list of links.
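The link-following process described above can be sketched as a breadth-first traversal. This is only a toy illustration, not Google's actual crawler: the function name `discover_urls`, the `get_links` callback, and the in-memory `TOY_WEB` graph are all assumptions made up for the example, and a real crawler would fetch pages over the network and handle politeness, errors, and scale.

```python
from collections import deque


def discover_urls(seed_urls, get_links):
    """Follow links outward from the seed pages and return every URL found.

    get_links is a hypothetical callback that returns the URLs linked
    from a given page; here it stands in for fetching and parsing a page.
    """
    seen = set(seed_urls)      # every URL discovered so far
    queue = deque(seed_urls)   # URLs whose links have not been followed yet
    while queue:
        url = queue.popleft()
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


# A made-up five-page web: each key links to the URLs in its list.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
    "d.example/": ["a.example/"],  # nothing links here, so it is never found
}

found = discover_urls(["a.example/"], lambda url: TOY_WEB.get(url, []))
```

Note that a page no other page links to (here `d.example/`) is never reached, which is one reason the choice of well-connected starting pages matters.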
In fact, Google engineers have found even more than 1 trillion individual links, but not all of them lead to unique web pages. Many pages have multiple URLs with exactly the same content or URLs that are auto-generated copies of each other.
Even after removing those exact duplicates, there were still a trillion unique URLs, and the number of individual web pages out there is growing by several billion pages per day.
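One simple way to picture the removal of exact duplicates is to group URLs by a fingerprint of their content. The sketch below is an assumption for illustration only: the article does not say how Google detects duplicates, and the function name `group_by_content` and the sample pages are hypothetical. It uses SHA-256 from Python's standard `hashlib` module as the fingerprint.

```python
import hashlib


def group_by_content(pages):
    """Group URLs whose page content is byte-for-byte identical.

    pages maps each URL to its content as a string. The result maps a
    content fingerprint to the list of URLs that share that content.
    """
    groups = {}
    for url, content in pages.items():
        fingerprint = hashlib.sha256(content.encode("utf-8")).hexdigest()
        groups.setdefault(fingerprint, []).append(url)
    return groups


# Made-up sample: two URLs serve the same content, one is different.
sample_pages = {
    "shop.example/item?id=7": "<h1>Blue widget</h1>",
    "shop.example/item?id=7&ref=home": "<h1>Blue widget</h1>",
    "shop.example/contact": "<h1>Contact us</h1>",
}

groups = group_by_content(sample_pages)
unique_page_count = len(groups)
```

Three URLs collapse to two unique pages here. Exact hashing only catches identical content; pages that differ by a timestamp or an ad would need a near-duplicate technique instead.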
Google's systems that process this data have come a long way since the first set of web data was processed to answer queries.
Back then, they did everything in batches: one workstation could compute the PageRank on 26 million pages in a couple of hours, and that set of pages would be used as Google's index for a fixed period of time.
Today, Google downloads the web continuously, collecting updated page information and re-processing the entire web-link graph several times per day. This graph of one trillion URLs is similar to a map made up of one trillion intersections.
So multiple times every day, Google does the computational equivalent of fully exploring every intersection of every road in the United States. Except it would be a map about 50,000 times as big as the U.S., with 50,000 times as many roads and intersections, all just to prepare to answer one important question: your next Google search.