By PETER SINCLAIR
Just after last week's column about the ProQuest Historical Newspapers Project came a sort of corollary to it - confirmation that the web is a great deal larger than anyone suspected; also murkier and less accessible.
Deep Web, as it's been dubbed, consists of all the iceberg except its tip - the multimedia files, indexes, bibliographies and vast databases hidden from the average search-engine. Add the enormous volume of news that flits in and out of focus before the engines can get a handle on much of it, and you've got a situation where only a fraction of the available material is easily retrievable.
The figures: according to the best guesstimate of search company BrightPlanet, less than 1 per cent of the material that exists on the web - up to 500 billion pieces in all - can be found and catalogued by traditional engines. The rest constitutes a Deep Web up to 500 times larger than the visible web, not including intranets or firewalled content.
Search engines routinely ignore Adobe's PDF (portable document format) files, for instance, mostly trawling only HTML (hypertext markup language) and text pages. Yet much of the most useful information for the business and academic communities is to be found in them.
A recent engine from Adobe itself now summarises over a million documents (in web terms, just a few). Don't worry: clicking doesn't automatically load the document, for some PDFs are huge. Instead, you're shown a summary before you download the file or view it in Acrobat Reader 4.0 (free download onsite).
One solution is to address the problem of retrieval differently - by drilling deeper rather than casting the net wider.
Danny Sullivan, editor of Search Engine Watch, predicts engines of the future which will concentrate only on specialised areas - legal, medical, scientific, whatever.
The New York Times cites the specialty product-search engine mySimon, which does shopping, and FindLaw (legal information) as harbingers of the trend.
Or there's Moreover, which winnows nearly 2000 of the most-visited online news websites and attracted 340,000 visitors in December.
The great challenge to the present generation of search engines, says Sullivan, is specificity - that is, the ability to automatically focus a search and detect what deep databases are relevant to it, before the searcher senses this needs to be done.
Until they do, a visit to AltaVista for the average surfer will remain a fairly hit - or, quite often, miss - affair.
BOOKMARKS
MOST TAXING: WestpacTrust
WestpacTrust takes the lead in letting you open your veins to the taxman online - until now New Zealand taxes have not been payable on the net. There are 27 different types of payment available - one of them, surely, has to be slightly less excruciating than the rest? General manager online services Phil Doak says the system is especially suitable for self-employed people with shifting cashflows. A user-friendly wizard guides technobunnies through the set-up process and the trauma of the actual payment itself.
Advisory: the Ides of March.
BEST USE OF THE WEB: An Art Exhibition
Of all the arts, the big winner on the web is - or should be - painting. This local site, featuring the work of Darlene Te Young, shows how it can - and should - be done. As the works scroll by from left to right, the computer screen becomes a virtual gallery for the strong yet tranquil images of koru, fronds and waves, their soft, stylised foam pleasingly distinct from, say, the fanged surf of Japan. Other works pay tribute to Mark Rothko, and their thin, luminous washes of colour are stunning - boardroom works at a lunchroom price.
Advisory: those wishing to show this columnist their appreciation will buy him Ocean.
LIES, DAMNED LIES
Christmas Rush: Ernst & Young say online sales in November-December climbed to about $11 billion from $7.5 billion in 1999. By 2005, they guesstimate, the web will have grabbed over 10 per cent of global sales in categories like health and beauty, clothing and toys; plus a whopping 25 per cent of the book, music, software and consumer electronics market (see: www.ey.com).
Kidnapped Surfers: US Christmas shoppers may have enjoyed buying on the net, but the rest of the world was not so keen - some nations even have an intense fear of logging on at all, says a survey by Research International United Kingdom. South American surfers are said to be afraid of sharing personal details for fear of being kidnapped; Germans are scared credit cards will lead them into debt; Spaniards distrust websites without a Spanish translation, while Russians are more inclined to trust sites in English than in Russian.
Attack! E-mail viruses soared 300 per cent last year. Virus-monitoring company MessageLabs says it stopped over 155,000 by the end of November, one every three minutes.
Links:
Adobe
Search Engine Watch
mySimon
FindLaw
Moreover
WestpacTrust
An Art Exhibition
Mark Rothko
www.ey.com
Research International
MessageLabs
E-mail: petersinclair@email.com
Peter Sinclair: Search engines skimming over the dark Deep Web