Intelartes Web Intelligence Distiller®

Mr. Poirot is an intelligence analyst. His job is to monitor web pages, emails and chats for particular types of information. He produces a conceptual model of what he wishes to find and deploys it on an information extraction system. The system processes the text collected by a web crawler, interprets it in the light of the conceptual model he specified, and selects the items of information that deserve his special attention.

 

Intelartes Web Intelligence Distiller (IWID) performs forensic analysis of natural language texts to identify linguistic evidence and red flags, based on a conceptual forensic model of concepts and relations.

 

Users and work process

The system is intended for intelligence analysts, information monitors and surveillance personnel. The work process it implies is an iteration of 7 development steps.

 

FF POIROT develops and practises a methodology for ontology-based conceptual modelling, the Application Knowledge Engineering Methodology, to maximise the productivity of knowledge engineering and the adaptability to dynamic changes. It has been used to create applications that detect email fraud.

 

System and technologies

IWID is based on information extraction and natural language understanding technologies.

Central to IWID is semantic interpretation, which interprets elements of information in terms of semantic templates. As the web is filled with information in natural language, IWID has a language processor component that performs robust parsing on texts of variable length and composes the parses into a semantic interpretation.

 

IWID consists of two processing units: Semantic Extraction and Semantic Interpretation. The Language Processor in Semantic Extraction processes unstructured data and passes its output to Semantic Interpretation. This is the line of operation that the current study discusses. For unstructured data, IWID uses three data stores directly (lexicon, syntax and semantic definitions) and the ontology indirectly.

 

The language processor consists of a tokeniser with part-of-speech tagging and a robust parser. The parser is robust against an incomplete grammar and lexicon: it ignores what it does not know or is not supposed to know. It performs syntactic constituent analysis of phrases and sentences whose grammar is defined, and functional analysis of the phrases with feature unification. Its grammar has a phrase structure grammar as backbone, augmented with feature unification. It takes words, phrases and sentences as input and identifies the semantic elements and features expressed in them.
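
The interplay of a phrase structure rule and feature unification can be illustrated with a minimal sketch. It is not the IWID parser: the toy lexicon, the single rule NP -> det noun and the function names are all assumptions made for illustration. It shows two behaviours described above: agreement is checked by unifying features, and unknown words are ignored rather than treated as errors.

```python
from typing import Optional

Features = dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two feature sets; return None if a shared feature clashes."""
    merged = dict(a)
    for name, value in b.items():
        if name in merged and merged[name] != value:
            return None
        merged[name] = value
    return merged


# Hypothetical toy lexicon (assumption, not the IWID lexicon).
LEXICON: dict[str, Features] = {
    "this": {"cat": "det", "num": "sg"},
    "these": {"cat": "det", "num": "pl"},
    "payment": {"cat": "noun", "num": "sg"},
    "payments": {"cat": "noun", "num": "pl"},
}


def parse_noun_phrase(det: str, noun: str) -> Optional[Features]:
    """Apply the rule NP -> det noun with number agreement.

    Words missing from the lexicon yield None instead of an exception,
    which mimics a parser that skips what it does not know.
    """
    d, n = LEXICON.get(det), LEXICON.get(noun)
    if d is None or n is None:
        return None
    if d["cat"] != "det" or n["cat"] != "noun":
        return None
    agreement = unify({"num": d["num"]}, {"num": n["num"]})
    if agreement is None:
        return None
    return {"cat": "np", **agreement}
```

Under these assumptions, parse_noun_phrase("this", "payment") yields a singular noun phrase, while "these payment" fails unification and an unknown word is simply skipped.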

 

 

Semantic interpretation is performed by two components. The composer selects the relevant semantic templates and fills them with the semantic elements extracted previously. The evaluator selects templates by scores calculated from the semantic constraints on each slot of the template and the contextual distance between the elements filling the slots. It outputs fully or partially filled semantic templates.
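
The division of labour between composer and evaluator can be sketched as follows. The data structures, the scoring formula (slot coverage multiplied by a closeness factor) and the max_span parameter are assumptions chosen for illustration; the source does not specify how IWID computes its scores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Element:
    text: str
    sem_type: str   # semantic type assigned during extraction
    position: int   # token offset in the text


@dataclass
class Template:
    name: str
    slots: dict[str, str]  # slot name -> required semantic type


def compose(template: Template, elements: list[Element]) -> dict[str, Optional[Element]]:
    """Fill each slot with the first unused element that meets its type constraint."""
    filled: dict[str, Optional[Element]] = {}
    used: set[int] = set()
    for slot, required in template.slots.items():
        filled[slot] = None
        for i, el in enumerate(elements):
            if i not in used and el.sem_type == required:
                filled[slot] = el
                used.add(i)
                break
    return filled


def evaluate(filled: dict[str, Optional[Element]], max_span: int = 20) -> float:
    """Score = share of filled slots * closeness of the fillers (assumed formula)."""
    fillers = [el for el in filled.values() if el is not None]
    if not fillers:
        return 0.0
    coverage = len(fillers) / len(filled)
    span = max(el.position for el in fillers) - min(el.position for el in fillers)
    closeness = max(0.0, 1.0 - span / max_span)
    return coverage * closeness


def interpret(templates: list[Template], elements: list[Element]):
    """Return (score, template name, filled slots) for the best-scoring template."""
    candidates = []
    for t in templates:
        filled = compose(t, elements)
        candidates.append((evaluate(filled), t.name, filled))
    return max(candidates, key=lambda c: c[0])


# Hypothetical example data.
ELEMENTS = [Element("John", "person", 0), Element("$5000", "amount", 4)]
TEMPLATES = [
    Template("payment_request", {"requester": "person", "sum": "amount"}),
    Template("transfer", {"sender": "person", "receiver": "person", "sum": "amount"}),
]
```

With this example data, interpret(TEMPLATES, ELEMENTS) prefers the fully filled payment_request template over the partially filled transfer template, which still receives a score and could be reported as a partial interpretation.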

 

Semantic extraction and interpretation are supported by a context manager that keeps a record of the semantic and textual context and scores each semantic element, semantic template and interpretation in view of the current context.
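
One simple way to picture such a context manager is a bounded record of recently seen semantic types, scored by textual distance. The class below is a sketch under that assumption only: the window size, the sentence-based distance and the 1 / (1 + distance) formula are illustrative choices, not details taken from IWID.

```python
from collections import deque


class ContextManager:
    """Keeps the most recent (semantic type, sentence index) observations."""

    def __init__(self, window: int = 3) -> None:
        # Older observations fall out once the window is full.
        self.recent: deque[tuple[str, int]] = deque(maxlen=window)

    def record(self, sem_type: str, sentence: int) -> None:
        """Remember that a semantic type occurred in a given sentence."""
        self.recent.append((sem_type, sentence))

    def score(self, sem_type: str, sentence: int) -> float:
        """Return 1 / (1 + distance) to the nearest recorded occurrence
        of the same type, or 0.0 if the type is not in the current context."""
        distances = [abs(sentence - s) for t, s in self.recent if t == sem_type]
        if not distances:
            return 0.0
        return 1.0 / (1 + min(distances))
```

In this sketch, an element whose semantic type was seen one sentence earlier scores higher than one last seen two sentences earlier, and a type absent from the recorded context scores zero.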

Contact:

Gang Zhao (FF POIROT Project Coordinator)

STARLab, Computer Science, Vrije Universiteit Brussel

Pleinlaan 2, 1050 Brussels, Belgium

Email: gang.zhao@vub.ac.be,  Phone: + 32 2 629 3543  Fax: + 32 2 629 3819