Many others demur. They say it is easily reversible — and one software developer built us something to show how.
Overseas, the objectors include the US Federal Trade Commission and the European Data Protection Supervisor. Closer to home, various software consultants contacted RNZ after the story ran to express their concerns.
“I was absolutely stunned that this is happening and I think there is a very clear privacy breach going on here,” one emailed.
“The IRD seem to be saying: we have a secure method (hashed data) to communicate with data-hungry multinational tech organisations who make money by building products based on the data they can collect about you. Clearly, that’s a bad argument,” another said.
A third, ex-hacker Adam Boileau, was blunt about sharing details with organisations that already had billions of data points to start with.
“Using hashing or other data aggregation in this context is, sadly, just a technological sleight-of-hand trick to bamboozle,” he said.
One of Inland Revenue’s defences for choosing to use what was arguably the best-bang-for-buck approach to targeted advertising that tax money could buy — after all, papers show it spent only about $400,000 with Facebook this way in the past six years — was a technical one: “Hashing is a type of cryptographic security method that turns identifiers into randomised code and cannot be reversed so identities are protected,” it said.
“For example, john.doe@ird.govt.nz may come out hashed as wLKziR/6RoXDv1MDaXLH1UNUC9nIVr97jrTnL4TcxsM=. Meta, for example, uses this hashed information and compares it to its own hashed information to build custom audiences.”
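The matching Inland Revenue describes relies on one property of hashing: the same input always produces the same hash. The Python sketch below (standard library only) hashes an email address and base64-encodes the result, which is the format of IRD's example string. It then shows how a platform holding its own list of addresses could find the overlap. The trimming and lower-casing step and the function names are illustrative assumptions, not Meta's documented upload format, and the sketch makes no claim to reproduce IRD's example hash.

```python
import base64
import hashlib


def hash_identifier(value: str) -> str:
    """SHA-256 an identifier and return the digest base64-encoded.

    Assumption: identifiers are normalised by trimming whitespace and
    lower-casing before hashing. Real platforms publish their own rules.
    """
    normalised = value.strip().lower()
    digest = hashlib.sha256(normalised.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def match_audience(uploaded_hashes: set[str], platform_emails: list[str]) -> list[str]:
    """Return the platform's own addresses whose hash appears in the upload.

    The platform hashes what it already holds and compares; identical
    inputs always give identical hashes, so the overlap falls out directly.
    """
    return [e for e in platform_emails if hash_identifier(e) in uploaded_hashes]


uploaded = {hash_identifier(e) for e in ["john.doe@ird.govt.nz", "jane@example.com"]}
platform_users = ["John.Doe@ird.govt.nz", "someone.else@example.org"]
print(match_audience(uploaded, platform_users))  # ['John.Doe@ird.govt.nz']
```

The same determinism that lets the platform match its users is what lets anyone with a list of likely inputs match them too.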
‘Cannot be reversed’ — really?
Maybe 15 years ago (or whenever exactly it was that Inland Revenue started down this route — online marketer Jack Yan said it was an early adopter), “cannot be reversed” carried weight.
But times, and tech, change.
After the RNZ story revealed the practice, one software developer decided, of his own volition, to do some tyre-kicking.
“To prove how ineffective hashing is for anonymising a finite set of values, I created a simple programme that converts any hashed (encrypted) NZ landline [phone] number back to the original (unencrypted) number,” he told RNZ.
How long does it take?
“0.15 seconds.”
He called his programme the SHA256 Generator. SHA stands for Secure Hash Algorithm; SHA256 is the algorithm Facebook uses, and it was introduced, along with three other hashing algorithms, more than 20 years ago.
Here is the consultant’s DIY recipe for reversing irreversible hashing: First, generate a list of all possible phone numbers for each area code. “For example, for the South Island, 03 000 0000 to 03 999 9999.”
Next, generate a SHA256 hash for each number. The Generator will do that for you super-quick.
Store each hash in a database alongside the number it came from. Then, when a hash lands and you think it might be a phone number, look it up in the database.
“This is a well-known technique for attacking hashed values,” the consultant said.
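The consultant's recipe can be sketched in a few lines of Python. This is not his SHA256 Generator, just a minimal illustration under stated assumptions: numbers are hashed as bare digit strings such as 035551234, an in-memory dictionary stands in for the database, and only a hypothetical 10,000-number slice of the 03 range is covered so the demo runs instantly.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Hex-encoded SHA-256 of a UTF-8 string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_lookup_table(prefix: str, digits: int) -> dict[str, str]:
    """Map the hash of every number with the given prefix back to the number.

    Assumption: numbers are hashed as plain digit strings with no spaces.
    A real list might use another format (for example a +64 country code),
    but the method is unchanged: enumerate, hash, store.
    """
    table = {}
    for n in range(10 ** digits):
        number = f"{prefix}{n:0{digits}d}"  # e.g. "03555" + "0042"
        table[sha256_hex(number)] = number
    return table


# Hypothetical slice 03 555 0000 to 03 555 9999; the full 03 range
# (03 000 0000 to 03 999 9999) is 10 million entries built the same way.
table = build_lookup_table("03555", 4)

leaked_hash = sha256_hex("035551234")
print(table.get(leaked_hash))                       # 035551234
print(table.get(sha256_hex("not a phone number")))  # None
```

At full scale the table would more sensibly live on disk or in a real database, but reversing a hash is still a single key lookup.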
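As you might imagine, this approach has since been refined and packaged up. One well-known refinement is the “rainbow table”, which keeps only the first and last entries of long precomputed chains of guesses, trading a little extra computation at lookup time for a much smaller table.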
“A rainbow table attack is a password-cracking method that uses a special table (a ‘rainbow table’) to crack the password hashes in a database,” a tech website said.
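A rainbow table goes a step further than storing every hash. The toy version below works over a hypothetical 10,000-number slice of the 03 range. It walks chains that alternate hashing with a “reduction” step (a function that maps a hash back to some phone number, not to the one that produced it), stores only each chain's start and end, and at lookup time rebuilds just the chain that might hold the answer. The prefix, chain length and reduction function are illustrative choices, not taken from any particular cracking tool.

```python
import hashlib
from typing import Optional

PREFIX = "03555"      # hypothetical slice: 03 555 0000 to 03 555 9999
SPACE = 10_000        # number of candidate numbers in that slice
CHAIN_LENGTH = 100    # hash-and-reduce steps per chain (illustrative)


def h(candidate: str) -> bytes:
    """Raw SHA-256 digest of a candidate phone number."""
    return hashlib.sha256(candidate.encode("utf-8")).digest()


def reduce(digest: bytes, step: int) -> str:
    """Map a hash to *some* phone number in the slice (not its preimage).

    Adding the step number gives every column of the chain its own
    reduction function, the feature that makes this a rainbow table.
    """
    index = (int.from_bytes(digest[:8], "big") + step) % SPACE
    return f"{PREFIX}{index:04d}"


def build_table(starts: list[str]) -> dict[str, str]:
    """Walk one chain per start point and store only end -> start."""
    table = {}
    for start in starts:
        x = start
        for step in range(CHAIN_LENGTH):
            x = reduce(h(x), step)
        table[x] = start  # if two chains share an end, the later one wins
    return table


def lookup(table: dict[str, str], target: bytes) -> Optional[str]:
    """Try to recover the number whose SHA-256 digest is target."""
    # Assume the target sits in column pos, trying the last column first.
    for pos in range(CHAIN_LENGTH - 1, -1, -1):
        x = reduce(target, pos)
        for step in range(pos + 1, CHAIN_LENGTH):
            x = reduce(h(x), step)
        if x in table:
            # Candidate chain found: regenerate it from its start point.
            y = table[x]
            for step in range(CHAIN_LENGTH):
                if h(y) == target:
                    return y
                y = reduce(h(y), step)
            # No match: a false alarm from a chain that merged into this end.
    return None


starts = [f"{PREFIX}{i:04d}" for i in range(0, SPACE, 50)]  # 200 chains
table = build_table(starts)

# Pick a number known to lie inside a stored chain, then recover it from its hash.
number = next(iter(table.values()))
for step in range(5):
    number = reduce(h(number), step)
print(lookup(table, h(number)) == number)  # True
```

Unlike a full lookup table, a rainbow table can miss values that no stored chain happens to pass through; in practice many chains, and several tables, push coverage close to complete.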
Boileau, the technical editor at Risky.biz, does weekly podcasts on security news. He compared hashing to a meat grinder for lumps of data.
“You can’t tell by looking at a hash sausage which bits of the pig went in.”
A cyberattacker who stole a file of hashed passwords might be expected to try to decode the hashes, “to put the sausage into the grinder, turn the handle backwards, and get a pig out”, he said.
“Instead of this folly, what we — I spent 20 years as a professional hacker — do is just hash every word in the dictionary and see if we get a match.”
Surely that takes ages?
No. “Using the power of modern 3D gaming graphics equipment, we can do this at speeds of hundreds of billions of words per second. The maths for both are basically the same.”
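Boileau's “hash every word in the dictionary” approach is a dictionary attack, and in plain Python it is a single loop: hash each candidate once and check it against the stolen hashes. This sketch assumes unsalted, hex-encoded SHA-256 and a tiny hypothetical wordlist; GPU tools perform the same comparison, just enormously faster.

```python
import hashlib


def crack_with_wordlist(stolen_hashes: set[str], wordlist: list[str]) -> dict[str, str]:
    """Hash every candidate word once and report which stolen hashes it matches.

    Assumes the stolen hashes are unsalted, hex-encoded SHA-256.
    Returns a mapping of matched hash -> recovered word.
    """
    cracked = {}
    for word in wordlist:
        digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
        if digest in stolen_hashes:
            cracked[digest] = word
    return cracked


# Hypothetical leak: one dictionary word, one randomly generated password.
stolen = {hashlib.sha256(pw.encode("utf-8")).hexdigest() for pw in ["sunshine", "Xq9!v#27pLm"]}
words = ["password", "sunshine", "dragon", "letmein"]
print(list(crack_with_wordlist(stolen, words).values()))  # ['sunshine']
```

The dictionary word falls immediately; the random password survives only because it is not on the list being tried.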
If you know something about the nature of the data that has been hashed, reversing it gets even easier. For instance, if it is probably gender, dates of birth, phone numbers or credit card numbers, then simply computing a hash for every possible phone number or credit card number “is trivial, mere seconds or minutes of compute”.
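Some rough arithmetic shows why knowing the shape of the data matters: the time to hash every possible value is just the size of the keyspace divided by the hashing rate. The rate below, one billion hashes per second, is a hypothetical and deliberately low figure next to the “hundreds of billions” Boileau cites; real throughput depends on the hardware and the hash function. The keyspace sizes are simplified assumptions noted in the comments.

```python
def seconds_to_enumerate(keyspace: int, hashes_per_second: float) -> float:
    """Time to hash every value in a keyspace at a fixed rate."""
    return keyspace / hashes_per_second


RATE = 1e9  # hypothetical: one billion hashes per second

keyspaces = {
    # 03 000 0000 to 03 999 9999: seven free digits
    "South Island landline (03)": 10 ** 7,
    # every calendar date over 120 years, rounded up to 366 days a year
    "date of birth (120 years)": 120 * 366,
    # 16-digit card with a known 6-digit issuer prefix; the last digit is a
    # Luhn check digit, leaving 16 - 6 - 1 = 9 free digits
    "16-digit card, known issuer prefix": 10 ** 9,
}

for label, size in keyspaces.items():
    print(f"{label}: {size:,} values, about {seconds_to_enumerate(size, RATE):.6f} s")
```

Even at this conservative rate, every 03 landline takes a hundredth of a second and every card number under one issuer prefix about a second.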
“Ultimately, there is no easy way to share data with someone or an organisation you don’t trust, especially if that organisation already has billions of data points to start with,” Boileau said.
“If they want to correlate or investigate to de-anonymise data, they can do so.”
They can. But do Facebook and Google and LinkedIn want to? What is in it for them if they already have your name, date of birth, address, phone number and email address?
“Look at the kinds of adverts that are being posted, which are targeted at specific people by the IRD,” Daniel Wilson, a lecturer in the School of Computer Science at Auckland University, said.
‘Sensitive stuff’
Inland Revenue said it targeted ads at people who had income tax, GST or student loan debt due, or who needed to update their Working for Families details.
“What happens if the aim of IRD is successful and someone clicks on one of the IRD adverts dished up by Facebook?” Wilson said.
“Facebook, for instance, keeps track of your ad activity.” (You can check that out by going to “menu”, then “Recent Ad Activity”.)
“So if I click on the IRD ‘sort out your income tax debt’ advert, that is logged ... giving information to Meta that, for instance, I am likely to have an income tax debt is pretty sensitive stuff.
“This is in a different league from Meta knowing that I am a fan of, say, popular science books.”
Inland Revenue offered other defences, including that the practice was both within the law and an effective way to recover tax owed.
It also stressed that it trusted the tech companies to do the right thing, including deleting taxpayers’ information promptly after use.
Wilson said Inland Revenue might think deletion limited its responsibility.
“But in the broader system context, if IRD is successful in their aim of getting a client to click on a specific ad that indicates a particular tax liability, this information is logged and, in the current environment, is free to be used by social media companies for activities like training AI systems,” he said.
“Social media organisations would not have been able to collect this specific kind of information without IRD’s targeted advertising campaigns.”
Where is the regulator in this?
The Office of the Privacy Commissioner told RNZ it did not have a general position on hashing, but could look at developing one if need be. The US Federal Trade Commission and European regulators saw the need years ago.
One emailer said the commissioner needed to look more closely at what Inland Revenue was doing, and raised the prospect of a slippery slope. “Up until a few years ago there was a red line that health data should not go offshore. That has been gradually whittled down.”
Another said Inland Revenue might be sailing close to the wind. They described how the Customer Match feature in Google Ads (formerly AdWords) allowed a customer like Inland Revenue to upload a list of people's details to Google and target those individuals directly with advertising.
One of the conditions in the terms of service was that the advertiser had to have a privacy policy allowing it to share customer data with advertisers and third parties.
The emailer said they did not believe Inland Revenue had acquired “the knowing, uncoerced consent for this usage of my private information” as required under the Privacy Act.
Inland Revenue defended hashing — then, after the RNZ story ran, said it would take another look “to ensure it is still safe to use”.
But when was it last safe?