Filters are a generic way to apply transformations to any data. The mechanism itself provides no functionality beyond abstracting away the details of invoking one or more modifiers. A filter is stored as an XML document that defines the processing steps for invoking programs and handles the logistics of transferring data between steps. An application can invoke a filter by writing data into a stream or by providing a file path. The filter handles the transfer from file to pipe and vice versa when needed.
A filter declares its input and output MIME types, which makes it possible to look up a suitable filter for converting format A to format B. The same filter can be used by both desktop tools and runtime components.
Filters are typically installed next to the application executable in a subfolder called Filters, and filter files have the .filter filename extension. Applications find the filters in the specified folder automatically, so there is no need to register a filter in any other configuration file.
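The discovery rule above can be sketched in a few lines of Python. This is only an illustration of the folder and extension convention described here, not the applications' actual lookup code; the function name find_filter_files is hypothetical.

```python
from pathlib import Path


def find_filter_files(app_executable):
    """List *.filter files in the Filters subfolder next to the executable."""
    folder = Path(app_executable).resolve().parent / "Filters"
    if not folder.is_dir():
        # No Filters subfolder means no filters are available.
        return []
    return sorted(folder.glob("*.filter"))
```

Because the folder is scanned directly, dropping a new .filter file into it is enough to make the filter available in this sketch.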
The data extractor takes Unicode text as input, so any other data format to be processed with the extractor must first be converted into a plain UTF-8 stream. The following filter does that by invoking a Windows PowerShell script for the conversion.
<?xml version="1.0"?>
<filters>
  <filter name="Excel"
          command="$SystemRoot\System32\WindowsPowerShell\v1.0\powershell.exe"
          itype="application/vnd.ms-excel"
          otype="text/plain">
    <argument type="value">-File</argument>
    <argument type="value">$FilterPath\exceltotext.ps1</argument>
    <argument type="infile"></argument>
    <argument type="outfile" absolute="1"></argument>
  </filter>
</filters>
When multiple <filter> steps are defined, the first step must declare the input MIME type in its itype attribute and the last step must declare the output MIME type in its otype attribute. To make the filter usable by the extractor, the otype must be set to "text/plain". The application detects some MIME types from common file signatures. To make sure the data is recognized properly, applications such as merge and extract have command-line parameters for declaring the MIME type.
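As a rough illustration of the itype and otype rule, the Python sketch below reads a filter document and reports the types it declares. It assumes that steps are sibling <filter> elements under <filters>, as in the Excel example; the helper names declared_types and usable_by_extractor are hypothetical, and the sample document is shortened.

```python
import xml.etree.ElementTree as ET

# Shortened version of the Excel filter, used only as sample input.
FILTER_XML = """<?xml version="1.0"?>
<filters>
  <filter name="Excel" command="powershell.exe"
          itype="application/vnd.ms-excel" otype="text/plain">
    <argument type="value">-File</argument>
  </filter>
</filters>"""


def declared_types(xml_text):
    """Return (input MIME type, output MIME type) declared by a filter document."""
    steps = ET.fromstring(xml_text).findall("filter")
    if not steps:
        raise ValueError("no <filter> steps found")
    # The first step carries the input type, the last step the output type.
    return steps[0].get("itype"), steps[-1].get("otype")


def usable_by_extractor(xml_text):
    """The extractor needs plain text, so the final otype must be text/plain."""
    return declared_types(xml_text)[1] == "text/plain"
```

With a single step, the first and last step are the same element, so both attributes are read from it.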
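The type attribute of an argument element can be one of the following:
- value, a command-line parameter or option passed as written
- infile, an input file name provided by the filter mechanism at runtime
- outfile, an output file name provided by the filter mechanism at runtime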
When the argument element's 'absolute' attribute is set to 1 or true, the filter system provides the file name as a full path. When the 'uri' attribute is set to 1 or true, the file name is provided in URI format, for instance file:///home/user/file.txt
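The argument types and attributes described above could be resolved into a command line roughly as follows. This Python sketch is not the filter system's implementation: the names is_set and build_arguments are hypothetical, and it assumes that a URI is always built from the absolute path.

```python
import os
import xml.etree.ElementTree as ET
from pathlib import Path


def is_set(value):
    """Treat "1" or "true" as enabled, as the text describes."""
    return value is not None and value.strip().lower() in ("1", "true")


def build_arguments(filter_element, infile, outfile):
    """Turn <argument> children into a list of command-line arguments."""
    args = []
    for arg in filter_element.findall("argument"):
        kind = arg.get("type")
        if kind == "value":
            # Literal parameter or option, passed through as written.
            args.append((arg.text or "").strip())
            continue
        if kind == "infile":
            path = infile
        elif kind == "outfile":
            path = outfile
        else:
            raise ValueError(f"unknown argument type: {kind!r}")
        if is_set(arg.get("uri")):
            # Assumption: a file URI is always built from the absolute path.
            path = Path(os.path.abspath(path)).as_uri()
        elif is_set(arg.get("absolute")):
            path = os.path.abspath(path)
        args.append(path)
    return args
```

For the Excel filter, this would yield the -File option, the script path, the input file name as given, and the output file name as a full path.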
Variables in filter
Environment variables defined for the executing process can be used in a filter. This helps to avoid writing absolute paths into the configuration. Some additional variables are evaluated automatically:
- $FilterPath: the location of the filter file
- $AppPath: the location of the application executable
- $AppPlatform: currently defined on Windows only, as "x64" or "x86"
The $SystemRoot variable in the example above is typically defined in the Windows environment and has a value similar to c:\Windows.
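Variable substitution of this kind can be sketched in Python as shown below. The details are assumptions made for illustration: variable names are taken to consist of letters, digits and underscores, the automatically evaluated variables take precedence over environment variables, and unknown variables are left unchanged. The function name expand is hypothetical.

```python
import os
import re

# Matches $Name, where Name is made of letters, digits and underscores.
VARIABLE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def expand(text, builtins, environ=None):
    """Replace $Name with a built-in value or an environment variable."""
    env = os.environ if environ is None else environ

    def lookup(match):
        name = match.group(1)
        if name in builtins:
            return builtins[name]
        # Assumption: unknown variables are left untouched.
        return env.get(name, match.group(0))

    return VARIABLE.sub(lookup, text)
```

Here builtins would hold values such as FilterPath and AppPath, so that $FilterPath\exceltotext.ps1 expands to the script located next to the filter file.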
Filters can be created by editing the XML file manually or by using the filter editor desktop application.