Cataloging the Unindexed Internet
How Parity by Backchannel is being used by researchers to investigate hateful content from fringe groups in the US.
Instead I spent the past week preparing to ship an analytics platform far ahead of my initially planned launch, in order to empower investigators researching the January 6th insurrection at the US Capitol.
I have named this project Parity, and it is already being used by researchers, journalists, and investigators to analyze millions of real-time messages from fringe and hate groups.
This post elaborates a bit on how I am using the tool to study fringe content, and in turn gives you a small preview of what I plan to announce soon for Backchannel. While Parity is intended to store anything and connect to everything, it is currently finding a comfortable role as a data warehouse for hateful content.
Garbage In, Insights Out
The messages warehoused by the Parity platform originate from distinct chats where fringe groups organize and coordinate hate speech and operations. There are many references to the January 6th insurrection at the US Capitol, as well as plenty of rising action from the week preceding that event. This kind of content can be unappetizing to the average internet user. I plan on building Parity to reduce overall exposure to this content, with a transparent rules engine that can help content moderators build advanced filters.
For example, while most of the messages are structured with a common schema and format, there is still a need to handle rich text and multimedia payloads stored within each message. This is particularly important since the groups being studied are prolific meme producers. My goal is to design Parity so that analysts and investigators can spend less time spelunking into the gutter of the internet, and be freed up to do what they do best (analysis).
Parity is meant to warehouse diverse datasets, but not hoard them in a single place. There are simple connectors to business intelligence tools and visualization frameworks that can be plug-and-play or developer-friendly.
I chose to do some light analysis based on keywords, knowing that far-right militias are planning another march in the month of January. By clustering terms that appear next to the term “January” throughout the Parity warehouse, I constructed the word cloud below to see whether any particular dates are planned for the upcoming operations. The word cloud surfaced several dates from January 17th through 20th, along with other high-volume terms that frequently co-occur with the term “January.”
The results align with multimedia collected from these chats, such as a flyer promoting an armed march on state capitals around the country on January 17th.
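The keyword analysis described above can be approximated with a simple co-occurrence count. The following is a minimal sketch, not Parity's actual implementation: the sample messages, the function names, and the two-token window are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical sample messages; real warehouse contents are not shown in this post.
messages = [
    "Meet at the capitol on January 17th, bring friends",
    "January 20th is the day, spread the word",
    "Armed march January 17th at every state capitol",
    "Nothing planned for February yet",
]

TOKEN = re.compile(r"[a-z0-9]+")


def neighbors(text, keyword="january", window=2):
    """Yield tokens found within `window` positions of each `keyword` occurrence."""
    tokens = TOKEN.findall(text.lower())
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                yield tokens[j]


def cooccurrence_counts(msgs, keyword="january", window=2):
    """Count how often each term appears near `keyword` across all messages."""
    counts = Counter()
    for msg in msgs:
        counts.update(neighbors(msg, keyword, window))
    return counts


counts = cooccurrence_counts(messages)
print(counts.most_common(5))
```

The resulting counts could feed a word cloud library or any charting tool. In practice, a stop-word filter would likely be needed so that terms like "at" and "on" do not crowd out the dates.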
Given how typical information security workflows revolve around the enrichment of indicators and observables, I wanted to extract the common URLs and domains from these messages so that I can look them up in other tools like VirusTotal or RiskIQ.
The vast majority of the most commonly observed domains should be familiar to researchers who study the online communications of fringe groups. Sites like YouTube and Bitchute have a history of hosting hateful media and content that slips through their content moderation detection. And Twitter, Discord, and Instagram continue to be popular avenues for fringe group discussion, despite a cat-and-mouse game of reports and bans from these platforms.
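For readers who want to try the same kind of observable extraction, here is a rough sketch of pulling URLs out of message text and ranking their domains. It is an illustration under stated assumptions rather than the code Parity runs: the regular expression is deliberately simple, the helper names are hypothetical, and the sample messages use placeholder domains.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# A deliberately simple pattern (an assumption): it only catches http(s) links.
URL_PATTERN = re.compile(r"https?://[^\s<>]+", re.IGNORECASE)


def extract_urls(text):
    """Return URLs found in `text`, trimming trailing sentence punctuation."""
    return [url.rstrip(".,;:!?") for url in URL_PATTERN.findall(text)]


def domain_of(url):
    """Return the lowercased hostname of `url` without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def top_domains(msgs, n=10):
    """Rank the domains that appear most often across all messages."""
    counts = Counter()
    for msg in msgs:
        for url in extract_urls(msg):
            domain = domain_of(url)
            if domain:
                counts[domain] += 1
    return counts.most_common(n)


# Hypothetical messages using placeholder domains.
sample = [
    "watch https://www.example.com/video?id=1 now",
    "see https://example.com/a, and http://example.org/b.",
]
print(top_domains(sample))
```

The ranked domains and the raw URLs can then be submitted to enrichment tools such as VirusTotal or RiskIQ through whatever interface those services provide.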
Threat intelligence companies have sunk millions of dollars into scraping and crawling areas of interest to cybersecurity practitioners. However, much of the underbelly of the internet has shifted to decentralized, private, and group-oriented mediums, and away from forums and imageboards. There is a clear opportunity to expand coverage into chats, and it is one I hope to pursue. But I also believe that the capabilities accrued in the cyber threat intelligence industry this past decade could be doing so much more for organizations and individuals across industries.
My vision for Parity is for users to define what they want collected, and to have an incredibly simple way to collect, store, and study that information. I also want users to be independent owners of their data rather than held to the limits of a service provider. I have no interest in creating a data mining business; rather, I envision a democratized capability that enables people to mine their own business.
I have sped up the shipment of Parity so that interested parties can inquire about a pilot program immediately. Please send an email to firstname.lastname@example.org to discuss getting access.