Configuring Parsers

Configuring Parsers

Contents:

Parsing HTML

gsqlcmd allows extracting data from HTML tables.

You may extract the required table data using the rootPath option.

For example, you may download dividends from the nasdaq.com website like

gsqlcmd download http://www.nasdaq.com/symbol/aapl/dividend-history aapl.htm

Then, get available tables using the command:

gsqlcmd show-html-tables aapl.htm tables.htm /openOutput

The resulting tables.htm file contains data like this:

You may see that the dividends are in table 3.

So, use the command like this to convert the downloaded file:

gsqlcmd convert aapl.htm dividends.csv /rootPath=3 /quoteChar=

or like this to extact data from the web page:

gsqlcmd convert http://www.nasdaq.com/symbol/aapl/dividend-history dividends.csv /rootPath=3 /quoteChar=

The resulting dividends.csv contains data like the following:

ExEffDate;Type;CashAmount;DeclarationDate;RecordDate;PaymentDate
2019-02-08;Cash;0.73;2019-01-29;2019-02-11;2019-02-14
2018-11-08;Cash;0.73;2018-11-01;2018-11-12;2018-11-15
2018-08-10;Cash;0.73;2018-07-31;2018-08-13;2018-08-16
2018-05-11;Cash;0.73;2018-05-01;2018-05-14;2018-05-17
2018-02-09;Cash;0.63;2018-02-01;2018-02-12;2018-02-15

gsqlcmd detects the correct table in most cases automatically.

So, you may try a simpler command first:

gsqlcmd convert aapl.htm dividends.csv

If the required page contains several tables with the required data, just apply the convert mode for every table.

If the tables have the same structure, specify multiple indexes of the tables in the /rootPath option separated by commas like /rootPath=20,21,22.

If the table does not contain headers, use /noSourceHeaders option. In this case, gsqlcmd names columns like F1, F2, and so on.

Parsing XML

gsqlcmd allows extracting table data from hierarchical XML files.

In most cases, gsqlcmd detects the desired data automatically.

However, you may use the rootPath and other parser options to customize the parser.

For example, test.xml contains the following contents:

<?xml version="1.0" encoding="utf-8"?>
<root>
    <parent>
        <row id="1"><f1>data11</f1><f2>data12</f2></row>
        <row id="2"><f1>data21</f1><f2>data22</f2></row>
        <row id="3"><f1>data31</f1><f2>data32</f2></row>
    </parent>
</root>

The basic convert command looks like this:

gsqlcmd convert test.xml test.csv

The resulting test.csv will have the following content:

id;"f1";"f2"
1;"data11";"data12"
2;"data21";"data22"
3;"data31";"data32"

gsqlcmd selects the "best" root of the data.

This command is equivalent to

gsqlcmd convert test.xml test.csv /rootPath=root.parent.row

You may change the root path to get data from the specified node.

For example:

gsqlcmd convert test.xml test.csv /rootPath=root.parent

The result is the following:

row_id;"row_f1";"row_f2"
1;"data11";"data12"
2;"data21";"data22"
3;"data31";"data32"

You may use the /quoteChar option to remove quotes.

For example:

gsqlcmd convert test.xml test.csv /rootPath=root.parent /quoteChar=

row_id;row_f1;row_f2
1;data11;data12
2;data21;data22
3;data31;data32

Use the show-roots mode to list applicable roots.

For example:

gsqlcmd show-roots test.xml

root
root.parent
root.parent.row
root.parent.row.id
root.parent.row.f1
root.parent.row.f2

Take a look at the following options that allows customizing the parser:

/attributedColumns=<col>.<attr>[;...]
/collapsedNodes=<node>[;...]
/ignoredTags=<tag>[;...]
/keptNodes=<node>[;...]
/requiredColumns=<name>[;...]
/skippedNodes=<node>[;...]

For example, you may skip required nodes using skippedNodes option like:

gsqlcmd convert test.xml test.csv /quoteChar= /skippedNodes=f2

id;f1
1;data11
2;data21
3;data31

Parsing JSON

Parsing JSON is similar to the parsing XML as both document formats have hierarchical structures.

For example, test.json contains the following contents:

{"root":
  {"parent":[
    {"id":1,"f1":"data11","f2":"data12"},
    {"id":2,"f1":"data21","f2":"data22"},
    {"id":3,"f1":"data31","f2":"data32"}
  ]
}

The basic convert command returns the following:

gsqlcmd convert test.json test.csv

Id;"F1";"F2"
1;"data11";"data12"
2;"data21";"data22"
3;"data31";"data32"

The JSON parser capitalizes the first letters of column names.

Use the show-roots mode to list applicable roots.

For example:

gsqlcmd show-roots test.json

root
root.parent
root.parent.row
root.parent.row.id
root.parent.row.f1
root.parent.row.f2

You ma select the required data using the rootPath option.

For example:

gsqlcmd convert test.json test.csv /rootPath=root /quoteChar=

Parent_id;Parent_f1;Parent_f2
1;data11;data12
2;data21;data22
3;data31;data32

The JSON parser support multiple roots. For example:

gsqlcmd convert test.json test.csv /rootPath=root.parent.f1,root.parent.f2 /quoteChar=

F1;F2
data11;data12
data21;data22
data31;data32

You may use * instead of node names. For example:

gsqlcmd convert test.json test.csv /rootPath=*.*.f1,*.*.f2 /quoteChar=

You may find a lot of ready-to-use samples in the Downloading and converting to CSV folder of the downloaded gsqlcmd package.

Also, take a look at the following options that allows customizing the parser:

/keptNodes=<node>[;...]
/requiredColumns=<name>[;...]
/skippedNodes=<node>[;...]

Parsing Plain Text

gsqlcmd allows getting data from plain text files if the text contains detected columns like these:

id  f1      f2
1   data11  data12
2   data21  data22
3   data31  data32

Use the convert command to extract data:

gsqlcmd convert test.txt test.csv

The resulting test.csv file contains the following:

id;"f1";"f2"
1;"data11";"data12"
2;"data21";"data22"
3;"data31";"data32"

Note that you may also use Microsoft ODBC and OLE DB text drivers to select and convert plain text data.

In the initial step, update the schema.ini file using the make-ini command:

gsqlcmd make-ini test.txt schema.ini

Then use the exec command to select data:

gsqlcmd exec test.txt "SELECT * FROM test.txt" test.csv

You may use SQL supported features like WHERE or GROUP in such commands.

Parsing CSV

gsqlcmd allows extracting data from CSV files and converting to any supported output format.

Take a look at the useful formatting options:

/add=<header>=<value|function>[;...]
/addRowNum
/dateFormat=<format>
/dateTimeFormat=<format>
/escapeChar=<char>
/noBOM
/outputCodepage=<codepage>
/outputCulture=<name>
/[outputFormat=]asText|asTxt|asCsv|asHtm|asHtml|asXml|asJson
/[output]Separator=<separator>|Tab
/quoteChar=<char>
/rowNumBase=<integer>
/timeFormat=<format>

For example, you may use the following command to add new columns like Symbol and Date, and convert the output file to the UTF-8 encoding.

gsqlcmd convert test.csv test-65001.csv /outputEncoding=65001 /add=Symbol=AAPL;Date=Date()

Note that you may also use Microsoft ODBC and OLE DB text drivers to select and convert CSV data.

In the initial step, update the schema.ini file using the make-ini command:

gsqlcmd make-ini test.csv schema.ini

Then, use the exec command to select data:

gsqlcmd exec test.csv "SELECT 'AAPL' AS Symbol, :date AS [Date], * FROM test.csv" output.csv /set=date=Date()

You may use SQL supported features like WHERE or GROUP in such commands.

This website is using cookies. By continuing to browse, you give us your consent to our use of cookies as explained in our Cookie Policy.