Data extractor architecture


The core of the service is a portable application written in C++ that uses a set of predefined configuration files (stencils) to identify and parse the input (Unicode text) into a structured message. The application is built from modules, each of which performs one subtask.
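The identify-then-parse flow can be sketched as two small modules. This is only an illustration: the real stencil format is not described here, so the sketch assumes a hypothetical stencil made of one regular expression that recognises the input and one regular expression per field. The names Stencil, Message, identify and parse are invented for the example, and std::regex works on bytes, so it treats UTF-8 input as plain byte sequences.

```cpp
#include <cassert>
#include <map>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stencil layout (an assumption, not the real file format):
// one pattern that recognises the input, plus one pattern per field.
struct Stencil {
    std::string name;
    std::regex identify;                       // matches if this stencil applies
    std::map<std::string, std::regex> fields;  // field name -> pattern with one capture group
};

// The structured message: field name -> extracted value.
using Message = std::map<std::string, std::string>;

// Identification module: returns the first stencil whose pattern is found
// in the text, or nullptr when no stencil applies.
const Stencil* identify(const std::vector<Stencil>& stencils, const std::string& text) {
    for (const auto& s : stencils) {
        if (std::regex_search(text, s.identify)) return &s;
    }
    return nullptr;
}

// Parsing module: copies the first capture group of every matching field
// pattern into the message. Fields that do not match are left out.
Message parse(const Stencil& stencil, const std::string& text) {
    Message msg;
    for (const auto& kv : stencil.fields) {
        assert(kv.second.mark_count() >= 1 && "field pattern needs a capture group");
        std::smatch m;
        if (std::regex_search(text, m, kv.second)) {
            msg[kv.first] = m[1].str();
        }
    }
    return msg;
}
```

Keeping identification and parsing as separate functions mirrors the modular build described above: either step can be replaced without touching the other.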

The extractor core

The extractor core consumes Unicode text as its source. The service, however, can handle any format for which an input filter is defined.
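One way to picture an input filter is as a function that converts raw input in some format into Unicode text for the core. The sketch below is a minimal illustration under that assumption: FilterRegistry, its methods and the "latin1" format key are hypothetical names, and the actual filter interface of the service may look different. The Latin-1 to UTF-8 conversion is included only as a small, self-contained example of a filter.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical filter signature: raw bytes in, UTF-8 text out.
using InputFilter = std::function<std::string(const std::string& raw)>;

// Hypothetical registry that maps a format name to its filter.
class FilterRegistry {
public:
    void add(const std::string& format, InputFilter filter) {
        assert(filter && "a filter must be callable");
        filters_[format] = std::move(filter);
    }

    // Writes the converted text to `out` and returns true, or returns false
    // (leaving `out` untouched) when no filter is defined for the format.
    bool toText(const std::string& format, const std::string& raw, std::string& out) const {
        auto it = filters_.find(format);
        if (it == filters_.end()) return false;
        out = it->second(raw);
        return true;
    }

private:
    std::map<std::string, InputFilter> filters_;
};

// Example filter: ISO-8859-1 (Latin-1) bytes to UTF-8.
// Bytes below 0x80 are copied; the rest become two-byte UTF-8 sequences.
std::string latin1ToUtf8(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

With this shape, supporting a new format means registering one more filter, and the core keeps working on Unicode text only.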

Web service

The extractor can be deployed in a Docker container that runs the Linux operating system.

The extractor web service

A running instance of the extractor can be managed via the REST API or the built-in management web page. The management interface allows adding and removing stencils, observing execution logs, and managing user roles. It also makes it possible to edit stencils from the Stencil Editor desktop application using the Publisher module.

Each extractor invocation produces a log entry that includes a result code (none, partial, or complete). It is possible to retain a copy of input files that produce an incomplete match, so that the failed input can be used to adjust existing stencils or create new ones. Collecting input files is optional.
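The logging and retention rule can be sketched as follows. The three result codes come from the description above; everything else is an assumption made for illustration: the LogEntry fields, the function names, and the reading of "incomplete match" as covering both none and partial results.

```cpp
#include <cassert>
#include <string>

// Result codes as listed in the text.
enum class ResultCode { None, Partial, Complete };

// Text form of a result code, for example for writing it to a log.
const char* toString(ResultCode code) {
    switch (code) {
        case ResultCode::None:     return "none";
        case ResultCode::Partial:  return "partial";
        case ResultCode::Complete: return "complete";
    }
    return "unknown";
}

// Hypothetical log entry; the real entry may carry more fields.
struct LogEntry {
    std::string inputName;
    ResultCode result;
    bool inputRetained;
};

// Assumption: an input is kept only when collection is switched on and
// the match was not complete (result is none or partial).
bool shouldRetainInput(ResultCode result, bool collectionEnabled) {
    return collectionEnabled && result != ResultCode::Complete;
}

// Builds the log entry for one invocation.
LogEntry makeLogEntry(const std::string& inputName, ResultCode result, bool collectionEnabled) {
    assert(!inputName.empty());
    return LogEntry{inputName, result, shouldRetainInput(result, collectionEnabled)};
}
```

Because collection is a separate flag, switching it off keeps the log entries unchanged apart from the retention marker.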