One such page is entitled: "A guide to kill Americans in Saudi Arabia."
Programmers can often leave digital clues as to their identity: in their greetings and terms, punctuation and syntax, and the coding they employ for multimedia attachments and links.
Accordingly, a University of Arizona project is developing a tool that uses these clues to automate the analysis of online jihadism. The Dark Web Terrorism Research project scours Web sites, forums, and chat rooms to find jihadists and learn how they reel in adherents.
Lab director Hsinchun Chen calls the project "al-Qaida University on the Web."
The sheer volume of data is a huge problem, which makes the project all the more valuable. Jihadist content appearing online has increased tenfold in the last two years.
One other existing computer-driven research effort on terrorist Web sites is at the Pacific Northwest National Laboratory.
Work funded by a $1.3 million National Science Foundation grant to Chen's group will focus on who produces IEDs (improvised explosive devices). Chen started the project with about $3 million from other Artificial Intelligence Lab programs.
The AP carried a story about how the project works:
Dark Web's software, Writeprint, samples 480 different factors to identify whether the same people are posting to multiple radical forums. It can analyze everything from a fragment of an e-mail to videos depicting American soldiers blown up in Humvees and fuel tankers.
Writeprint is derived from a program originally used to determine the authenticity of William Shakespeare's works. It looks at writing style, word usage and frequency and greetings, and at technical elements ranging from Web addresses to the coding on multimedia attachments. It also looks at linguistic features such as special characters, punctuation, word roots, font size and color.
Dark Web compares writings it finds to others in its logs of about 500 million pages of jihadist-produced content.
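The comparison described above can be illustrated with a toy stylometry sketch. This is not Writeprint itself: the feature list (a few function words and punctuation marks), the cosine-similarity measure, and every function name below are assumptions chosen for illustration, and the real tool is reported to sample 480 factors.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny feature set chosen for illustration only.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "is", "that", "it"]
PUNCTUATION = [".", ",", "!", "?", ";", ":"]


def style_vector(text):
    """Return a small list of stylometric features for one piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    total_words = max(len(words), 1)  # avoid division by zero on empty text
    total_chars = max(len(text), 1)
    word_counts = Counter(words)
    char_counts = Counter(text)

    # Average word length, scaled down so it does not dominate the vector.
    features = [sum(len(w) for w in words) / total_words / 10.0]
    # Relative frequency of each function word.
    features += [word_counts[w] / total_words for w in FUNCTION_WORDS]
    # Relative frequency of each punctuation mark.
    features += [char_counts[p] / total_chars for p in PUNCTUATION]
    return features


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(unknown_text, known_posts):
    """Return (label, score) for the known post whose style is closest."""
    target = style_vector(unknown_text)
    scored = [
        (label, cosine_similarity(target, style_vector(text)))
        for label, text in known_posts.items()
    ]
    return max(scored, key=lambda pair: pair[1])
```

Calling `most_similar(new_post, {"forum_a": text_a, "forum_b": text_b})` returns the label of the stored text whose style vector lies closest to the new post. A real system would need far more features, language-specific handling, and a calibrated threshold before treating a high score as evidence of common authorship.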
Most of the material is in Arabic, but the terrorist network has expanded to include Chinese, Spanish, and French sources; others will be added soon.
The methods used here are unproven, but data collection and analysis are common in enterprise applications, and I see no reason to doubt that their efforts could lead to breakthroughs. One of the best collections of violent postings is maintained by the Search for International Terrorist Entities, a group that tracks jihadists. One shocking fact to consider is how many of these violent sites are hosted by U.S. ISPs.