Eureka: Six steps to build smart artificial intelligence by Joseph Gatt

In an age where most information is shared and received online, several corporations have been asking how danger can be identified online. That is, how do you tell the difference between articles that express hate or incite terrorism and those that simply comment on terrorism? Here are six steps that can help you tell the difference.

Open coding

If Google or Facebook hired a few open coders to code content circulating online, they could start to identify patterns that separate hate-inciting material from innocent online articles. Open coding means assigning a broad category to each sentence or paragraph in a text. For example, if you open coded this very paragraph, you would assign it the category “definition of open coding” or something similar. The idea is that each open code should be as broad as possible.
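Here is a minimal sketch of how open-coded text could be stored once a human coder has assigned the labels. The names `CodedSegment` and `open_code_segments` are made up for illustration, and the sample labels are assumptions rather than a fixed scheme.

```python
from dataclasses import dataclass


@dataclass
class CodedSegment:
    text: str       # the sentence or paragraph being coded
    open_code: str  # broad label assigned by a human coder


def open_code_segments(segments, labels):
    """Pair each text segment with the open code a coder gave it."""
    if len(segments) != len(labels):
        raise ValueError("every segment needs exactly one open code")
    return [CodedSegment(text=s, open_code=c) for s, c in zip(segments, labels)]


# Hypothetical example: two paragraphs and the labels a coder chose for them.
segments = [
    "Open coding means assigning a broad category to each paragraph.",
    "Axial coding groups open codes under an even broader category.",
]
labels = ["definition of open coding", "definition of axial coding"]

coded = open_code_segments(segments, labels)
for item in coded:
    print(item.open_code, "<-", item.text)
```

The point of the structure is only to keep each label attached to the exact text it describes, so later steps can group and count by label.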

Axial coding

Once you have open coded the sentences or paragraphs in an article, you move on to axial coding: you take the different open codes and group them under an even broader category. For example, this paragraph and the one on open coding above would both be assigned the very broad category of “text coding.”
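Axial coding can be sketched as a lookup from open codes to broader categories. The mapping below is an invented example, and `group_by_axial_code` is a hypothetical helper; in practice the mapping would come from the coders themselves.

```python
from collections import defaultdict

# Hypothetical mapping from open codes to broader axial codes.
AXIAL_MAP = {
    "definition of open coding": "text coding",
    "definition of axial coding": "text coding",
    "definition of word frequency": "vocabulary measures",
}


def group_by_axial_code(open_codes, axial_map):
    """Group open codes under their axial code; unknown codes go to 'uncategorized'."""
    groups = defaultdict(list)
    for code in open_codes:
        groups[axial_map.get(code, "uncategorized")].append(code)
    return dict(groups)


grouped = group_by_axial_code(
    ["definition of open coding", "definition of axial coding", "greeting"],
    AXIAL_MAP,
)
print(grouped)
```

Anything that lands in "uncategorized" is a hint that the axial scheme needs another category or that the open code was too narrow.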

Type-tokens

Once you’ve applied open and axial codes to your text, you want to know the type-token ratio for each broad and narrow category of text. The type-token ratio is the number of distinct words (types) divided by the total number of words (tokens) in a passage. A low ratio means a few words are repeated many times; a high ratio means the authors visibly know a lot of vocabulary and don’t repeat words frequently.
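As a rough sketch, the type-token ratio can be computed as distinct words divided by total words, so a lower value signals more repetition. The tokenizer here is a deliberately simple assumption (lowercase runs of letters and apostrophes), and the two sample sentences are invented. The ratio tends to fall as texts get longer, so it is safest to compare passages of similar length.

```python
import re


def tokenize(text):
    """Very simple tokenizer: lowercase runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    """Distinct words (types) divided by total words (tokens)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


repetitive = "they lie they lie they lie"
varied = "the report examines several distinct arguments"

print(round(type_token_ratio(repetitive), 2))  # 2 types over 6 tokens
print(round(type_token_ratio(varied), 2))      # 6 types over 6 tokens
```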

Word frequency

Now that you have your open codes, axial codes and type-tokens for each category, you want to run a word-frequency list. Hate-inciting articles tend to show different word frequencies than articles that simply comment on hate incitement. You want to see which words rank at the top and which appear only once or twice. Assign a word-frequency list to each category.
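A word-frequency list per category can be sketched with Python's `collections.Counter`. The category name and sample sentences are invented, and `frequency_by_category` is a hypothetical helper, not an established tool.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count how often each lowercase word appears in the text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def frequency_by_category(coded_texts):
    """Merge word counts for every (category, text) pair into one Counter per category."""
    totals = {}
    for category, text in coded_texts:
        totals.setdefault(category, Counter()).update(word_frequencies(text))
    return totals


sample = [
    ("commentary", "The attack was condemned. Officials condemned the attack."),
    ("commentary", "Analysts discussed the attack."),
]

freqs = frequency_by_category(sample)
top_words = freqs["commentary"].most_common(2)  # words at the top of the ranking
rare_words = sorted(w for w, n in freqs["commentary"].items() if n == 1)  # said only once
print(top_words)
print(rare_words)
```

Keeping one counter per category makes it easy to put the top-ranked and rarely used words of two categories side by side.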

Sublanguage

Now that you have your coding, type-tokens and word frequencies, you have a better idea of the sublanguage used in the text. Each office, organization or partnership uses its own sublanguage; as long as people interact frequently, they develop one. Beyond type-tokens and word frequency, you will want to check for grammar patterns, syntax, and perhaps distinct semantic meanings in the text. You want to encode the structure of the sublanguage being used.
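One crude way to start encoding structure is to count word pairs (bigrams), which capture recurring phrasing that single-word counts miss. This is only a stand-in: real grammar or syntax analysis would need a part-of-speech tagger or parser, which this sketch does not attempt. The helper names and sample sentences are assumptions.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs as a rough proxy for recurring phrasing."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def shared_patterns(text_a, text_b):
    """Return the word pairs that appear in both texts."""
    return set(bigram_counts(text_a)) & set(bigram_counts(text_b))


counts = bigram_counts("we must act now we must")
overlap = shared_patterns("we must act", "they must act")
print(counts.most_common(1))
print(overlap)
```

Texts written within the same group would be expected to share more of these pairs than texts from unrelated sources, though that is an assumption to test rather than a guarantee.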

Color type-token

Not all communication is done with text; some of it is done with color as well. You want to check pictures with color type-tokens, that is, which colors dominate the picture and how the colors are balanced in the image. Color use also depends on the subculture: some subcultures use certain colors more than others.
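A color profile can be sketched by grouping pixels into coarse color buckets and measuring each bucket's share of the image. To stay self-contained, this example takes a plain list of RGB tuples instead of loading an image file; the bucket count and the sample pixels are assumptions.

```python
from collections import Counter


def quantize(pixel, levels=4):
    """Map each 0-255 channel to one of `levels` coarse buckets."""
    return tuple(channel * levels // 256 for channel in pixel)


def color_profile(pixels, levels=4):
    """Return each color bucket's share of the image, most dominant first."""
    counts = Counter(quantize(p, levels) for p in pixels)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.most_common()}


# Hypothetical 10-pixel image: mostly red, some black, a little white.
pixels = [(250, 10, 10)] * 6 + [(10, 10, 10)] * 3 + [(240, 240, 240)]

profile = color_profile(pixels)
dominant = max(profile, key=profile.get)
print(dominant, profile[dominant])
```

The dominant bucket answers which color leads the picture, and the full profile describes the balance of colors across it.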

This is hopefully a recipe to tell the difference between hate-filled materials and simple commentary. Good luck with the coding!

