SGREP (structured grep) is a tool for searching and indexing text, SGML, XML and HTML files and for filtering text streams using structural criteria. The data model of sgrep is based on regions, which are nonempty substrings of text. Regions are typically occurrences of constant strings, SGML tags, or meaningful text elements that can be recognized through delimiting strings or by the built-in SGML, XML and HTML parser. Regions can be arbitrarily long, arbitrarily overlapping, and arbitrarily nested. There is also a paper that is useful reading for anyone wishing to use SGREP.
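The region model described above can be pictured with a small Python sketch. This is only an illustration of the idea, not sgrep's implementation or query language: the names Region, occurrences, contains and overlaps are hypothetical, and representing a region as a pair of half-open character offsets is an assumption made for the example.

```python
from typing import List, NamedTuple


class Region(NamedTuple):
    """A nonempty substring of a text, stored as half-open offsets [start, end).

    Hypothetical type for illustration; sgrep's internal representation may differ.
    """
    start: int
    end: int


def occurrences(text: str, needle: str) -> List[Region]:
    """Return every occurrence of a constant string, including overlapping ones."""
    if not needle:
        raise ValueError("regions are nonempty, so the search string must be too")
    found = []
    pos = text.find(needle)
    while pos != -1:
        found.append(Region(pos, pos + len(needle)))
        # Advance by one character only, so overlapping occurrences are kept.
        pos = text.find(needle, pos + 1)
    return found


def contains(outer: Region, inner: Region) -> bool:
    """True when `inner` lies entirely within `outer` (nesting)."""
    return outer.start <= inner.start and inner.end <= outer.end


def overlaps(a: Region, b: Region) -> bool:
    """True when two regions share text but neither contains the other."""
    shares_text = a.start < b.end and b.start < a.end
    return shares_text and not contains(a, b) and not contains(b, a)


# Regions delimited by tags: the <i> element is nested inside the <b> element.
text = "<b>free <i>software</i></b>"
bold = Region(text.index("<b>"), text.index("</b>") + len("</b>"))
italic = Region(text.index("<i>"), text.index("</i>") + len("</i>"))
```

With these definitions, contains(bold, italic) is true because the italic element is nested in the bold one, while occurrences("aaa", "aa") yields two regions that overlap without either containing the other.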
TLA-team: Sgrep is a free (GPL) command-line tool. It processes plain text and has special modes for XML and SGML. The manual page gives several usage examples. One possible query is “word(‘free’) near(10) word(‘software’)”; a possible match for this would be “free and open source software”. On Mac OS, sgrep can be installed via MacPorts. Ubuntu and other Linux distributions provide a package.