By PETER GRIFFIN
The managers of New Zealand's news websites are becoming wary of having their sites' content cherry-picked by foreign website owners employing "web scraping" tools.
Web scraping involves content being stripped from sites using software "robots" that search for information automatically.
Similar "crawler" software is used by search engines to cross-reference the activities of web surfers.
Wilson & Horton Interactive's technical manager Patrick Van Rinsvelt said the New Zealand Herald website nzherald.co.nz was bombarded last week by automated robots sent by search engine company Google.
"They sent 15 bots all at one time which slowed down the site. In the space of 45 minutes they took over 700 stories."
Van Rinsvelt said the Google bots regularly visited the site but not in such large numbers. The invasion coincided with the launch of news.google.com, which sources parts of stories from newspapers around the world and from services like CNN, Ananova and ITV.
While the Herald had no qualms about websites linking to its stories, Van Rinsvelt said he intended to contact Google and ask it not to send so many bots at one time again.
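For readers curious what a less intrusive robot might look like, here is a minimal Python sketch of a crawler that visits one page at a time, pauses between requests, and skips anything a site's robots.txt file disallows. It is an illustration only, not how Google's bots work. The bot name "example-bot", the example.com addresses and the two-second default pause are all hypothetical, and the "Crawl-delay" line is a non-standard robots.txt extension that not every crawler honours.

```python
import time
from urllib.robotparser import RobotFileParser


def build_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule set (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch_all(urls, rules, fetch, user_agent="example-bot",
                     default_delay=2.0, sleep=time.sleep):
    """Fetch pages one at a time, pausing between requests.

    `fetch` is any function that takes a URL and returns the page body.
    `user_agent` and `default_delay` are hypothetical values for this sketch.
    """
    # Use the site's requested delay if it states one, else our own default.
    delay = rules.crawl_delay(user_agent) or default_delay
    pages = {}
    for url in urls:
        if not rules.can_fetch(user_agent, url):
            continue  # the site has asked robots to stay out of this path
        if pages:
            sleep(delay)  # wait before every request after the first
        pages[url] = fetch(url)
    return pages
```

Because requests are made strictly one after another, a site would never see 15 simultaneous visitors from a robot written this way, at the cost of the crawl taking longer.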
The bots have also been visiting TVNZ's online portal nzoom.com.
That site's executive producer, Glyn Jones, said there was a fine line between extracting news stories so they could be linked to and the wholesale copying of work.
"Although we don't encourage it, if it's restricted to a headline and the first paragraph, that's the accepted industry guide."
He said some overseas sports news sites had been "grabbing whole stories" and rewriting them.
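The headline-and-first-paragraph guide Jones describes can be sketched in a few lines of Python. This example assumes, purely for illustration, that a story page carries its headline in the first h1 tag and its opening paragraph in the first p tag. Real news pages vary widely, and the class and function names here are hypothetical.

```python
from html.parser import HTMLParser


class TeaserParser(HTMLParser):
    """Collect the text of the first <h1> and the first <p> on a page."""

    def __init__(self):
        super().__init__()
        self.headline = ""
        self.first_paragraph = ""
        self._current = None   # tag currently being captured, if any
        self._done = set()     # tags already captured once

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "p") and tag not in self._done and self._current is None:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._done.add(tag)
            self._current = None

    def handle_data(self, data):
        if self._current == "h1":
            self.headline += data
        elif self._current == "p":
            self.first_paragraph += data


def extract_teaser(html, url):
    """Return only the headline, the opening paragraph and a link back."""
    parser = TeaserParser()
    parser.feed(html)
    parser.close()
    return {
        "headline": " ".join(parser.headline.split()),
        "first_paragraph": " ".join(parser.first_paragraph.split()),
        "link": url,
    }
```

Everything after the first paragraph is ignored, so the rest of the story stays on the publisher's own site and the link sends readers there.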
Local internet providers had been known to scrape the site to create links for their own news service.
A spokesman for Telecom's online portal xtramsn.co.nz, Matt Bostwick, said the site's content was regularly scraped by Google but never to the extent that it slowed access to the site.
While web scraping has been used by purveyors of "spam" email to strip target email addresses, the software has legitimate uses.
A number of companies were offering web aggregation services whereby users could group all their regularly visited web pages on one site that automatically updated the content.
My Netscape
Octopus.com
Yodlee
Robots digging out the news of the day