By SIMON COLLINS, science reporter
Google says it is superseding the dictionary to cope with users who have so far come up with more than 800 misspellings of "Britney Spears".
The world's biggest search engine, founded at California's Stanford University just nine years ago, is expected to raise about US$3.3 billion ($5 billion) when it floats on Nasdaq in the next few days.
Senior Google scientist Dr Mehran Sahami told the Pacific Rim International Conference on Artificial Intelligence in Auckland last week that the company's 2200 employees were constantly battling to cope with users' needs and to fight crafty website designers who packed their pages with whole dictionaries of keywords.
"There are commercial interests who try to spam or provide misinformation," he said.
"This creates an adversarial situation with the search engine, which is trying to meet the needs of the people out there that are trying to actually get their results back."
Sahami could not answer financial questions at the conference, hosted by Auckland University of Technology, as Google is in its "quiet period" before its Nasdaq float. It began taking bids on its website on July 31 to gauge an initial price for the 24.6 million shares it is offering.
But he said the company had indexed more than six billion web documents in 35 languages, including more than one billion images.
Traditional search engines simply searched for the words that users typed into their terminals, using dictionaries of about 100,000 words to correct misspellings.
Google was shifting to a web-based "contextual lexicon" which automatically included terms in common use on the web, including new words like "Sars" and names like "Britney Spears".
"'Britney' is different from the standard spellings," Sahami said. "We have more than 800 variants of misspellings of Britney Spears."
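The idea of correcting a query against terms people actually use, rather than against a fixed dictionary, can be illustrated with a short sketch. This is not Google's method, which the talk as reported here did not detail. The query log, the function names `build_lexicon` and `suggest`, and the similarity cutoff are all assumptions made up for illustration, and the matching relies on Python's standard `difflib` module.

```python
from collections import Counter
from difflib import get_close_matches


def build_lexicon(observed_queries):
    """Count how often each query string appears in a (hypothetical) log."""
    return Counter(q.strip().lower() for q in observed_queries)


def suggest(query, lexicon, cutoff=0.8):
    """Return the most frequent lexicon entry that closely resembles the query.

    If nothing in the lexicon is similar enough, the query is returned unchanged.
    """
    query = query.strip().lower()
    candidates = get_close_matches(query, list(lexicon), n=5, cutoff=cutoff)
    if not candidates:
        return query
    # Prefer the spelling that most users typed, not the closest string.
    return max(candidates, key=lambda c: lexicon[c])


# Toy query log, invented for illustration only.
log = (
    ["britney spears"] * 50
    + ["brittany spears"] * 3
    + ["britny spears"] * 2
    + ["sars"] * 10
)
lexicon = build_lexicon(log)

print(suggest("brittney spears", lexicon))
```

Because the lexicon is built from observed usage, a new word such as "sars" or a name such as "britney spears" becomes a valid correction target as soon as enough people type it, without anyone editing a word list by hand.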
He said some website designers tried to get their sites to the top of search engine hitlists by "cloaking" long lists of keywords at the bottom of their web pages, which were invisible to readers but were picked up by search engines. "Some would take an entire dictionary at the bottom of their web pages so a web page has the potential to match any single query in that language."
Google was developing more sophisticated techniques to distinguish "natural language" from long lists of keywords.
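One simple way to tell ordinary prose from a pasted-in word list, offered only as a rough sketch and not as the technique Google was developing, is to measure how many of the words are common function words such as "the" and "of". Natural sentences are full of them, while a dictionary dumped at the bottom of a page has very few. The word list, the 0.15 threshold, the minimum length and the function names below are all assumptions chosen for illustration.

```python
import re

# A small, hand-picked set of English function words (an assumption, not a standard list).
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "and", "in", "is", "it", "that",
    "for", "on", "with", "as", "was", "at", "by", "are", "be", "this",
}


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z']+", text.lower())


def function_word_ratio(words):
    """Fraction of the given words that are function words."""
    if not words:
        return 0.0
    return sum(w in FUNCTION_WORDS for w in words) / len(words)


def looks_like_keyword_list(text, threshold=0.15, min_words=8):
    """Flag text whose share of function words is unusually low."""
    words = tokenize(text)
    if len(words) < min_words:
        return False  # too short to judge either way
    return function_word_ratio(words) < threshold


prose = ("The company said that it was trying to meet the needs "
         "of the people who use the search engine.")
stuffed = "aardvark abacus abandon abbey abbot abdomen ability able aboard abolish"

print(looks_like_keyword_list(prose), looks_like_keyword_list(stuffed))
```

A check this crude would be easy for a determined page designer to evade, which is consistent with Sahami's description of the problem as an adversarial one.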
It also aimed to develop personalised web searches for individual users, better image searching and ways to handle multimedia websites.
He invited people to help develop new services listed on Google's laboratory web pages, including a glossary of acronyms and local search engines.
'Playboy' interview causes IPO jitters
Google's share float looked threatened at the weekend after its founders granted an interview to Playboy magazine, potentially in breach of US stock market rules.
Bloomberg later reported that the float would go ahead, but added that the founders, Sergey Brin, 30, and Larry Page, 31, might still be vulnerable to lawsuits by shareholders.
Google had said in an SEC filing on Friday that it might have to buy back shares sold in its initial public offering if a court found that Page's and Brin's comments in the Playboy article violated rules about what companies could disclose when selling shares to the public.
Google said the interview took place in April, before the company filed its registration statement with the SEC.
labs.google.com
www.pricai04.info