The CSV file format used for parsers in the ‘Parse Text from Files’ tool is as follows:
"Description", "Name", "Pattern"[, "Pattern 2", etc.]
where:
Description – contains a short description to display in the tool table.
Name – contains either the file name if using the ‘Text Output’ tab, or the database table name if using the ‘Database Output’ tab.
Pattern (n) – contains one or more regular expressions used to parse the data.
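As a rough illustration of the layout above, the following Python sketch reads one parser definition row. The description, table name and patterns are made up for the example, and the helper name read_parser_rows is hypothetical. It also assumes standard CSV quoting with optional spaces after the commas, which may not match exactly how NastPad reads the file.

```python
import csv
import io

# Hypothetical parser definition: Description, Name, then two patterns.
SAMPLE = r'"Grid point forces", "gpforce", "SUBCASE (?P<subcase>\d+)(.*)", "^(?P<grid>\d+)"'


def read_parser_rows(text):
    """Return one dict per row holding the description, name and pattern list."""
    rows = []
    # skipinitialspace lets the quoted fields follow ", " rather than a bare ","
    for fields in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(fields) < 3:
            continue  # a row needs Description, Name and at least one Pattern
        rows.append({
            "description": fields[0],
            "name": fields[1],
            "patterns": fields[2:],
        })
    return rows


parsers = read_parser_rows(SAMPLE)
```

Everything from the third field onwards is treated as a pattern, so a row can carry as many patterns as the parse needs.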
NastPad performs something akin to a Batch Search (Find All), but writes any captured text to the specified file or database table. Only named capture groups in the regular expression are output. The first unnamed capture group is not output; instead it is passed to the next pattern as the subject string. This is useful when it is impractical or impossible to parse all the data with a single regular expression.
For example, the default F06 parser usually uses the first pattern to find a ‘page’ of results (all text between a page header and footer) whilst also capturing the Subcase ID, etc. The page text is then passed to a second regular expression, which captures the individual lines of results data. The Subcase ID is prepended to the data captured from each line, and the result is output as rows to the text file or database. From there it is easy to process the data using queries or pivot tables.
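The chaining described above can be sketched in Python as follows. This is only an approximation of the behaviour, not NastPad's implementation: the function names, the sample text, the two patterns and the choice of regex flags are all assumptions made for the example, and a real F06 page looks quite different.

```python
import re


def first_unnamed_group(match):
    """Return the text of the first capture group that has no name, or None."""
    named = set(match.re.groupindex.values())
    for index in range(1, match.re.groups + 1):
        if index not in named:
            return match.group(index)
    return None


def run_chain(patterns, text, inherited=None):
    """Apply patterns in order, carrying named captures down the chain."""
    inherited = inherited or {}
    rows = []
    for match in re.finditer(patterns[0], text, re.MULTILINE):
        # Named groups from this pattern are appended after inherited ones.
        captured = {**inherited, **match.groupdict()}
        subject = first_unnamed_group(match)
        if len(patterns) > 1 and subject is not None:
            # The unnamed group becomes the subject string of the next pattern.
            rows.extend(run_chain(patterns[1:], subject, captured))
        else:
            rows.append(captured)
    return rows


# Made-up stand-in for two 'pages' of results.
SAMPLE = (
    "SUBCASE 1\n"
    "  101   12.5\n"
    "  102   -3.0\n"
    "END PAGE\n"
    "SUBCASE 2\n"
    "  101   7.25\n"
    "END PAGE\n"
)

PATTERNS = [
    # Pattern 1: capture the subcase ID (named) and the page body (unnamed).
    r"SUBCASE (?P<subcase>\d+)\n((?:.*\n)*?)END PAGE",
    # Pattern 2: capture one line of results from the page body.
    r"^ *(?P<grid>\d+) +(?P<force>[-+.\dE]+) *$",
]

rows = run_chain(PATTERNS, SAMPLE)
for row in rows:
    print(row)
```

Each printed row starts with the subcase captured by the first pattern, followed by the values captured from one results line, which is the shape that is convenient for queries or pivot tables.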
Feel free to reply to this topic with any questions, or to suggest additions to the default parsers.