As an example, in May this year it was revealed that NZ Police had on the quiet trialled the American Clearview AI facial recognition system that is built using billions of images scraped from people's social media profiles.
Some would say that if you post pictures of yourself or others on social media, you're asking for them to be used by anyone, but even by that measure Clearview AI seems to have gone too far with its business. The company now faces investigation by the Australian, British and Canadian privacy watchdogs.
There's an argument, of course, that the long arm of the law should have access to the same resources that hackers do. However, as the state-sponsored NotPetya attack showed when it took out Maersk's IT systems as collateral damage, there is plenty of potential for things to go wrong.
Likewise, not even the US National Security Agency can keep its stashes of exploits secret and safe. Instead, they have leaked out and been incorporated into dangerous malware.
Despite the above I was surprised to read that police in the United States appear to have been buying data from breaches - hacks, if you like. It's all done in the open via a vendor called Spycloud that collects breached data and cleans it up.
It's a colossal amount of breach data too: over 102 billion "assets", Spycloud boasts, with the trove growing by more than 50 breaches a week. That's breaches, not records; a single breach could contain any number of those.
The data held by Spycloud appears to be very detailed too, with plenty of sensitive personal information:
"We primarily look to acquire leaked or stolen assets in the criminal underground that contain: user credentials: email/username and password, highly enriched PII [personally identifiable information] such as first and last names, addresses, phone numbers, dates of birth, SSNs, and over 150 data types that power fraud investigations," Spycloud says.
The data can be fed into software like Maltego, an open-source intelligence (OSINT) tool that finds links between pieces of information.
Spycloud says it collects the data through "human intelligence" and claims to have the most, and cleanest, data of any provider. Microsoft, through a venture capital subsidiary, has sunk tens of millions of dollars into Spycloud.
There's nothing to suggest that Spycloud is doing anything nefarious with the data it has.
On the contrary, the company says it has a cybersecurity mission, and tries to develop methods and technologies to prevent account takeovers, and to investigate fraud.
The media release I spotted last month talked about Spycloud working with law enforcement and another tech firm, Zero Trafficking, that also uses large-scale data analytics to fight the horrible scourge of human trafficking.
At first glance, it seems like a great idea to use people's hacked personal data against the criminals that stole it and other miscreants. Besides, the data is already out there and often in multiple sets that are traded by spammers and other criminals.
The problem is that nobody asked the people whose stolen data is assembled in the 102 billion "assets" used for analysis and processing. I'm in their system, and June was the first time I had heard of Spycloud. As mentioned before, the data is "enriched", meaning several sources are joined up to provide a fuller picture of subjects.
How is a private, commercial data collection and processing company regulated, though? If you're outside US jurisdiction, do international regulations apply? Does the law enforcement collaboration require warrants? What exactly is the data used for by police? How many other companies are involved in such work, and who are their customers? Can we opt out somehow and have our hacked data deleted?
If by now you're asking whether this is compliant with privacy laws around the world, you're not alone. Privacy researcher and activist Wolfie Christl has suggested that under the strict European Union General Data Protection Regulation rules you can't take people's hacked information and process it, no matter that it's been "made public" by criminals, so to speak.
Maybe the Spycloud approach is helpful and a good resource for law enforcement against digital criminals; maybe it's not. We do need transparency around it, just like we expect Google and Facebook to tell us what they do with our data.
If, on the other hand, it can be argued that law enforcement agencies need large data lakes of PII to successfully track alleged criminals, we need to hold that thought and carefully consider what kind of privacy-less future such a move could bring.