The PHIL System

PHIL is a software tool, written in Haskell, for filtering information from XML data. The system implements a simple declarative language for matching patterns against an XML document, which allows one to extract relevant data and to exclude useless or misleading content.

The matching mechanism (inspired by [2]) employs a cost-based pattern transformation algorithm which searches for patterns in an approximate way (i.e. modulo renaming, insertion, and deletion of XML items) and ranks the results by their cost. In order to improve efficiency, the implementation uses sophisticated indexing techniques and exploits laziness to automatically avoid the construction of unnecessary data structures.
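The idea of cost-based approximate matching can be illustrated with a minimal sketch. It is written in Python rather than Haskell and is not PHIL's actual algorithm: the `Node` type, the unit costs `RENAME`, `INSERT`, and `DELETE`, and the functions `match_cost` and `ranked_matches` are all hypothetical names chosen for illustration, and the cost model is a simplifying assumption (for example, two pattern children may be mapped to the same document node). Generators stand in for lazy evaluation, so ranked results are produced one at a time.

```python
import heapq
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

# Hypothetical unit costs; PHIL's real cost model is not described on this page.
RENAME, INSERT, DELETE = 2, 1, 3


@dataclass
class Node:
    """A simplified XML element: a label plus child elements."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the tree rooted at t."""
    return 1 + sum(size(c) for c in t.children)


def descendants(t: Node, skipped: int = 0) -> Iterator[Tuple[int, Node]]:
    """Lazily yield (levels_skipped, node) for every proper descendant of t."""
    for c in t.children:
        yield skipped, c
        yield from descendants(c, skipped + 1)


def subtrees(t: Node) -> Iterator[Node]:
    """Lazily yield every node of t in document (pre-)order."""
    yield t
    for c in t.children:
        yield from subtrees(c)


def match_cost(pattern: Node, doc: Node) -> int:
    """Cheapest way to map the pattern root onto doc under the assumed costs."""
    # Renaming: pay if the labels differ.
    cost = 0 if pattern.label == doc.label else RENAME
    for pc in pattern.children:
        # Deletion: drop the whole pattern subtree.
        drop = DELETE * size(pc)
        # Insertion: map the pattern child to a deeper descendant,
        # paying INSERT for each document level that is skipped.
        best = min(
            (INSERT * k + match_cost(pc, d) for k, d in descendants(doc)),
            default=drop,
        )
        cost += min(best, drop)
    return cost


def ranked_matches(pattern: Node, doc: Node) -> Iterator[Tuple[int, Node]]:
    """Yield (cost, node) pairs from cheapest to most expensive, one at a time."""
    # The index breaks ties by document order and keeps nodes out of comparisons.
    heap = [(match_cost(pattern, n), i, n) for i, n in enumerate(subtrees(doc))]
    heapq.heapify(heap)
    while heap:
        cost, _, node = heapq.heappop(heap)
        yield cost, node


doc = Node("library", [
    Node("book", [Node("title"), Node("info", [Node("author")])]),
    Node("magazine", [Node("title")]),
])
pattern = Node("book", [Node("title"), Node("author")])

# Only the best match is requested, so the remaining results are never popped.
best_cost, best_node = next(ranked_matches(pattern, doc))
# best_node.label == "book", best_cost == 1
```

In this example the `book` element matches the pattern at cost 1: its `author` is nested one level deeper than the pattern expects, which is charged as a single insertion.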

A technical report describing the system is available [here].

Download

PHIL Preview Release

Binary packages
- PHIL v1.0 for Mac OS X (Intel Core 2 Duo) [terminal version] [GUI version]
- PHIL v1.0 for Linux (i386) [terminal version] [GUI version]
- PHIL v1.0 for Windows [terminal version] [GUI version]
Source packages

Note: in order to compile the PHIL sources you need GHC (6.4 or above), the Happy parser generator (v1.16 or above), and wxHaskell v0.9.4 (the last package is only required to build the GUI version).
- PHIL v1.0 [terminal version sources]
- PHIL v1.0 [GUI version sources]

Related papers

[1] M. Baggi, D. Ballis. A Lazy Implementation of a Language for Approximate Filtering of XML Documents. Technical Report, University of Udine, 2007 [pdf].
[2] T. Schlieder, H. Meuss. Querying and Ranking XML Documents. Journal of the American Society for Information Science and Technology (JASIST), vol. 53, no. 6, pages 489-503, 2002.