A story about robots, Tor/I2P, and how they just don’t mix.
Please don’t rain hate down on the operators of this site. They’re trying to stay legal in Finland, and given the sheer volume of entries they receive, we understand why they deployed a robotic system to assist. But as YouTube knows (and doesn’t fucking care), robots have a tendency to flag innocent content as a potential threat. Our site was one of those false positives.
Would you like to know more?
What is Ahmia?
For a lot of people, Ahmia (Onion address here for those already reading this on Tor) is pretty much their entry point into the Tor/Onion network. It gives people a ‘peek’ into the kind of websites that exist on the other side of the fence without having to install a special browser or load up the Tor software. Because they openly allow anyone to submit an onion address to their index, problem sites show up from time to time.
Ahmia vs. Child Porn.
One of the few things we and the government of Finland agree upon is “Fuck CP!”. One of those problem sites is the kind that hosts CP (Child Porn). Frankly, people who even have thoughts like that need psychological help at best, and a serious ass kicking at worst if they’re ever caught and convicted. In a job we had long ago in a galaxy far, far away, we were in charge of a small social network, and during image and virus scans we noticed the server was getting stuck on these incredibly large images (50 MB JPEG files). After further investigation, we found it was users exchanging CP back and forth. Even in America, if your web provider catches wind that your server is hosting this, you will be erased from the net! You might even get a call from the police.
So, we get Ahmia’s paranoia about this subject!
Ahmia publishes their blacklist publicly in the form of MD5 checksums, and you can get the RAW text file on this page. So, props to Ahmia for being transparent about what is happening in their engine! How it works is you download the text file, take your onion address, run an MD5 checksum on it, and then search the text file for the resulting hash. It’s a fairly safe way to show who’s been banned without other search engines accidentally indexing the list and having it inadvertently turn into an advertisement page for some of the real shitbags out there!
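As a rough sketch of that check in Python: the snippet below hashes an onion address with MD5 and looks for the digest in the downloaded list. The function names and the example addresses are made up for illustration, and exactly what string Ahmia hashes (with or without the `.onion` suffix, any normalization) is an assumption on our part, so treat this as a sketch and not their actual procedure.

```python
import hashlib


def onion_md5(address: str) -> str:
    """Return the MD5 hex digest of an onion address.

    Assumption: the address is hashed as lowercase text with surrounding
    whitespace stripped. The real list may normalize differently.
    """
    normalized = address.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def is_blacklisted(address: str, blacklist_text: str) -> bool:
    """Check whether the address's digest appears as a line in the list."""
    banned = {
        line.strip().lower()
        for line in blacklist_text.splitlines()
        if line.strip()
    }
    return onion_md5(address) in banned


if __name__ == "__main__":
    # Hypothetical addresses and a stand-in for the downloaded text file.
    example = "exampleexampleexample.onion"
    fake_list = "\n".join([onion_md5("someothersite.onion"), onion_md5(example)])
    print(is_blacklisted(example, fake_list))
    print(is_blacklisted("notonthelist.onion", fake_list))
```

Since the list only holds digests, you can confirm an address you already know is on it, but you can't read the banned addresses back out of it by just browsing the file.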
Needless to say: Ban-hammered.
I went into the contact area, and it had a project leader’s e-mail at the bottom. So I e-mailed them with our addresses, explaining the situation. And within hours something remarkable happened that no other automated search engine has ever done.
We got a response.
He was very cordial and told me that he’d make an exception in his engine to start indexing again. He did mention that this site has a lot of data and that there were certain keywords that triggered the CP ban.
Huh. So Ahmia’s system runs on certain words. Perhaps it’s things like this very post mentioning CP and Child Porn. Or my FAQ telling “underaged” people to GTFO. Or hell, maybe even the October blog entry we made about Trick-or-Treat, where mentioning what life was like as a “Child” is enough to freak out his system. Or it could be the time when the entire Onion network was getting duplicated by a group of people that put CP banners at the bottom of all content on the Tor/Onion, like what was illustrated earlier. And because those imitation websites led to CP, the original site that everything was copied from was also banned. Guilty by association in the eyes of the robot.
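To show why a word-based filter trips on innocent pages, here’s a minimal Python sketch. The keyword list, the function name, and the matching logic are all invented for illustration; we have no idea what Ahmia’s robot actually looks for beyond what the admin told us.

```python
import re

# Hypothetical trigger words; the real list is unknown to us.
TRIGGER_WORDS = {"child", "underaged"}


def flagged_words(page_text: str) -> set[str]:
    """Return the trigger words found in the text, ignoring case and context."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return words & TRIGGER_WORDS


if __name__ == "__main__":
    faq = "Underaged people should GTFO."
    halloween = "What Trick-or-Treat was like as a child."
    hardware = "Our server runs on a tiny board."
    for text in (faq, halloween, hardware):
        print(sorted(flagged_words(text)), "<-", text)
```

A filter like this can’t tell a page condemning something from a page hosting it, which is the false-positive problem in a nutshell.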
So, by having content. By having stories. The robot punishes. Seems a little counterproductive. But okay.
Within 24 hours we were back up there. The admin whitelisted us. Not sure if “No description provided” is something that the spider is supposed to fill in, but because we’re still technically banned, it’s probably never going to get filled in. And on the name alone, as you can see, Ahmia’s engine flies wildly out of control after our entry, probably looking up anything with “S” and anything with “Config” and treating the dash as a separator.
We did e-mail again asking if there’s a way to be removed from the MD5 blacklist. The admin responded that it’s not possible, because even if he did remove the MD5, the robot would look at my articles again and add me right back on.
Whitelisted but still banned.
We’re not mad at Ahmia, and we do hope they can get to the bottom of their robot issues, because if we’re a false positive, who’s to say how many other sites are in the same boat?
They also seem to have other wars they’re fighting, with various drug dealers using what is almost a dictionary attack on their engine to ensure they rank at the top for every possible word that an end-user types in.
Finally, if you do find an interesting link, it might be offline. If Ahmia’s spider isn’t re-checking certain types of servers, dead sites could stay up on their search listing indefinitely, leading to massive link-rot.
So in summary, Ahmia was certainly one of the bigger search engines to hit the clear web offering users a ‘view’ into what’s beyond. But all of the quasi-attacks on it have really made it an uphill battle for them. We hope they pull through.
This short entry is just an illustration that alternative and free networks are totally not friendly toward automation. Automation can be exploited, and automation can be ruined with zero consequence to the attacker, as they’re technically an ‘anonymous reader’ in that state.
It was demonstrated in our imitation websites article that anyone with enough scripting can replicate a site and hijack it for their own ends. Yes, we’re even aware of the Darkweb scraper that uses this website as an example. They didn’t ask for permission, but they don’t really have to. It’s cool, we’ll give them a link to their project anyway! Because we understand that websites on these alternative networks go up and down all of the time, and some people want to capture a site for personal or historical purposes and not for fucking people over. It’s just a tool; how you use said tool is what determines whether you’re an asshole or not.
Those who stay on the Onion Network eventually rely on hand-curated listings like Let’s Decentralize because, believe it or not, there’s an actual person checking out those websites. We even hand-check sites in our cellar door once a month to ensure the website isn’t being hijacked by some evil fuck spreading malware or CP. It’s a lot harder for a shitty site to enter a human-made list because they have to convince the human running said list they are ‘good’. If they are in fact shit, they’re denied on sight.
Anyhow, that’s what server said.
END OF LINE+++