gsqlcmd HTML Cleaner Options
/autoCorrectedTags=<tag>[;...]
Use this option to specify HTML tags that should be automatically closed.
For example, in XHTML, the syntax looks like this:
<ul> <li>item1</li> <li>item2</li> </ul>
Here, the <li> tag requires a closing </li> tag.
To parse HTML in which the closing tag is omitted, as in the following fragment, add the tag to the /autoCorrectedTags option:
<ul> <li>item1 <li>item2 </ul>
The default value for this option includes the tags li, p, and a.
You can modify the default value in the configuration file.
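The auto-closing behavior described above can be illustrated with a small Python sketch. This is not gsqlcmd's implementation; it only assumes that an auto-corrected tag is closed when the same tag opens again or when its parent closes. The names AutoCloser, auto_close, AUTO_CORRECTED_TAGS, and VOID_TAGS are hypothetical and exist only for this example.

```python
from html.parser import HTMLParser

# Assumption: mirrors the documented default tags (li, p, a).
AUTO_CORRECTED_TAGS = {"li", "p", "a"}
# Void elements never take a closing tag, so they are not tracked.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class AutoCloser(HTMLParser):
    """Re-emits HTML, inserting missing closing tags for selected tags."""

    def __init__(self, auto_tags=AUTO_CORRECTED_TAGS):
        super().__init__(convert_charrefs=False)
        self.auto_tags = set(auto_tags)
        self.stack = []  # currently open, non-void tags
        self.out = []    # output fragments

    def _close_dangling(self):
        # Close the innermost open tag if it is an auto-corrected one.
        if self.stack and self.stack[-1] in self.auto_tags:
            self.out.append(f"</{self.stack.pop()}>")

    def handle_starttag(self, tag, attrs):
        # A second <li> directly inside an open <li> implies </li> first.
        if tag in self.auto_tags and self.stack and self.stack[-1] == tag:
            self._close_dangling()
        if tag not in VOID_TAGS:
            self.stack.append(tag)
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are passed through untouched.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # A parent closing while an auto-corrected child is still open.
        if self.stack and self.stack[-1] != tag:
            self._close_dangling()
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def auto_close(html, auto_tags=AUTO_CORRECTED_TAGS):
    parser = AutoCloser(auto_tags)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(auto_close("<ul> <li>item1 <li>item2 </ul>"))
```

Well-formed input passes through unchanged, so the sketch can be run on both fragments shown above to compare the results.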
/cleanHtml
Use this option to clean downloaded HTML files.
Use the clean-html mode to clean local files and to find the appropriate cleaning options.
Customize the cleaning rules using the configuration file along with the options described below.
/cutAttributes=<attribute>[;...]
Use this option to remove specified HTML attributes in the clean-html mode or the /cleanHtml option.
For example:
/cutAttributes=data-vars-event-action,data-vars-event-label
/cutComments
Use this option to remove HTML comments in the clean-html mode or the /cleanHtml option.
/cutIDs=<id>[;...]
Use this option to remove HTML nodes with specified ID values in the clean-html mode or the /cleanHtml option.
This option is useful for removing navigation and advertisement elements.
/cutScripts
Use this option to remove HTML <script> and <noscript> tags, as well as on* event attributes, in the clean-html mode or the /cleanHtml option.
/cutStyles
Use this option to remove HTML <style> tags and the style and class attributes in the clean-html mode or the /cleanHtml option.
/cutTags=<tag>[;...]
Use this option to remove HTML nodes with specified tags in the clean-html mode or the /cleanHtml option.
For example:
/cutTags=amp-install-serviceworker,amp-state,amp-analytics,amp-user-notification
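To make the effect of the cut options more concrete, the following Python sketch applies comparable rules to an HTML string: it drops nodes by tag name or ID, drops named attributes, and optionally drops comments. It is an illustration under stated assumptions, not gsqlcmd's implementation, and the names Cleaner, clean_html, and their parameters are hypothetical. Kept start tags are rebuilt from the parsed attributes, so quoting may differ from the input.

```python
from html import escape
from html.parser import HTMLParser

# Void elements have no closing tag and must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Cleaner(HTMLParser):
    """Re-emits HTML while dropping selected nodes, attributes, comments."""

    def __init__(self, cut_tags=(), cut_ids=(), cut_attributes=(),
                 cut_comments=False):
        super().__init__(convert_charrefs=False)
        self.cut_tags = set(cut_tags)
        self.cut_ids = set(cut_ids)
        self.cut_attributes = set(cut_attributes)
        self.cut_comments = cut_comments
        self.skip_depth = 0  # > 0 while inside a removed node
        self.out = []

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        if tag in self.cut_tags or dict(attrs).get("id") in self.cut_ids:
            if tag not in VOID_TAGS:
                self.skip_depth = 1  # skip everything up to the matching end tag
            return
        parts = [tag]
        for name, value in attrs:
            if name in self.cut_attributes:
                continue
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def _emit(self, text):
        if not self.skip_depth:
            self.out.append(text)

    def handle_data(self, data):
        self._emit(data)

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")

    def handle_comment(self, data):
        if not self.cut_comments:
            self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")


def clean_html(html, **options):
    parser = Cleaner(**options)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = ('<div id="nav"><a href="/">Home</a></div><!-- ad -->'
            '<p class="x" data-vars-event-label="y">Text'
            '<amp-state>s</amp-state></p>')
    print(clean_html(page, cut_tags={"amp-state"}, cut_ids={"nav"},
                     cut_attributes={"data-vars-event-label"},
                     cut_comments=True))
```

In this sketch, passing cut_tags={"script", "noscript"} or cut_attributes={"style", "class"} approximates part of what /cutScripts and /cutStyles are described as doing; on* event attributes are not handled here.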