InfinityGrab
Thursday, 16 January 2020

If you need a hand using the HTML Parser code, we can help you set up your pipes. Simply post a detailed private "Technical Question" on this helpdesk describing exactly what you are trying to achieve, and we will get back to you ASAP with our advice. However, if you prefer that we set up entire pipes for you, we will have to charge that against the developer hours that you have purchased from us or received as a freebie. Our hosting partners receive a credit of one free developer hour for every 120 USD they spend with us on hosting.

The HTML Parser processor is an alternative to the “Get Fulltext” processor for cases where that one doesn’t work for your source. It has its own small command language. This is how it works:

https://www.youtube.com/watch?v=K9cN9yOVfnQ

It offers five commands for working on the HTML source: ginner, remove, split, wrap and replace. Each command must be placed on its own line.
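To make the one-command-per-line layout concrete, here is a minimal Python sketch that breaks a script into a command name and its fields. It is only an illustration: the helper name parse_script is hypothetical, and it assumes that the pipe character never occurs inside a field. How the real processor tokenizes its lines is not documented here.

```python
def parse_script(script):
    """Split a parser script into (command, fields) tuples, one per non-empty line."""
    commands = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        parts = line.split("|")
        # Each documented command ends with a trailing "|",
        # which leaves one empty field at the end; drop it.
        if parts[-1] == "":
            parts = parts[:-1]
        commands.append((parts[0], parts[1:]))
    return commands


script = "ginner|0|div|post|L|1|\nremove|1|span|ads|"
for command, fields in parse_script(script):
    print(command, fields)
```

Reading a line this way shows that the first field is always the command and the second is always the line number to read from.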

Function: ginner

Gets the inner content of an HTML tag from the input HTML source.

Syntax

ginner|{LINE}|{TAG}|{DELIMITER}|{RETURN}|{DEBUG}|
  • {LINE}: the line whose output is used as input. A script can contain many lines, each line produces its own output, and the output of one line can be used as the input of another. “0” means the original input of the processor, “1” means the output of line#1.
  • {TAG}: the target HTML tag.
  • {DELIMITER}: a string inside that target tag.
  • {RETURN}: the number of the part to return, counting from 0; L stands for the last part.
  • {DEBUG}: what to do when the {DELIMITER} cannot be found in the input HTML source.
    • 0: return "" (empty string) if an error occurs.
    • 1: stop immediately if an error occurs.
    • 2: return the input HTML source.

Example

ginner|0|div|post|L|1|

Gets the inner content of the “div” tag that contains the string “post”, no matter whether that string is an id, a class or any other attribute. For example, with the sample line above, the input <html>...<body>...<div class="post" id="whatever" what_ever="attribute">I want to get this text</div>...</body></html> returns “I want to get this text”.
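The behaviour described for ginner can be approximated in Python as follows. This is a rough sketch, not the processor's actual code. It assumes that {RETURN} selects among the matching tags in document order, that debug mode 1 can be modelled by raising an exception, and that the markup is balanced with no self-closing tags of the target type.

```python
import re


def ginner(html, tag, delimiter, ret="0", debug=0):
    """Return the inner HTML of the chosen <tag ...> whose opening tag contains delimiter."""
    tag_re = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    matches = []
    for m in tag_re.finditer(html):
        if m.group(1) or delimiter not in m.group(0):
            continue  # skip closing tags and opening tags without the delimiter
        depth = 1
        # Walk forward, counting nested tags of the same name until this one closes.
        for n in tag_re.finditer(html, m.end()):
            depth += -1 if n.group(1) else 1
            if depth == 0:
                matches.append(html[m.end():n.start()])
                break
    try:
        return matches[-1] if ret == "L" else matches[int(ret)]
    except IndexError:
        if debug == 1:
            raise LookupError("no matching tag found")
        return html if debug == 2 else ""


page = '<html><body><div class="post" id="whatever">I want to get this text</div></body></html>'
print(ginner(page, "div", "post", "L", 1))  # I want to get this text
```

The depth counter is there so that a div nested inside the target div does not end the match early.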

Function: remove

Removes an HTML tag from the input HTML source.

Syntax

remove|{LINE}|{TAG}|{DELIMITER}|

Example

remove|0|div|post|

Removes the div tag whose opening tag contains the string “post”. For example:

<html>...<body>...ABC<div class="post" id="whatever" what_ever="attribute">I want to get this text</div>XYZ...</body></html>

Will return

<html>...<body>...ABCXYZ...</body></html>

which is the input HTML source without the div tag containing the string “post”.
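As a rough illustration of the remove example, the Python sketch below deletes the matching tag together with its content. It assumes that every matching tag is removed (the text does not say whether only the first one is) and that the markup is balanced. The function is a stand-in, not the processor's implementation.

```python
import re


def remove(html, tag, delimiter):
    """Delete every <tag ...>...</tag> element whose opening tag contains delimiter."""
    tag_re = re.compile(r"<(/?)%s\b[^>]*>" % re.escape(tag), re.IGNORECASE)
    out, pos = [], 0
    while True:
        m = tag_re.search(html, pos)
        if m is None:
            break
        if m.group(1) or delimiter not in m.group(0):
            # Not a tag we want to remove: copy it through unchanged.
            out.append(html[pos:m.end()])
            pos = m.end()
            continue
        depth, end = 1, None
        for n in tag_re.finditer(html, m.end()):
            depth += -1 if n.group(1) else 1
            if depth == 0:
                end = n.end()
                break
        if end is None:
            break  # unbalanced markup: leave the remainder untouched
        out.append(html[pos:m.start()])
        pos = end
    out.append(html[pos:])
    return "".join(out)


page = '<body>ABC<div class="post" id="whatever">I want to get this text</div>XYZ</body>'
print(remove(page, "div", "post"))  # <body>ABCXYZ</body>
```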

Function: split

Splits/separates the HTML source into several parts based on a delimiter. This function is very similar to the explode function in PHP (if you know the PHP programming language).

Syntax

split|{LINE}|{DELIMITER}|{RETURN}|{DEBUG}|
  • {LINE}: the line whose output is used as input. A script can contain many lines, each line produces its own output, and the output of one line can be used as the input of another. “0” means the original input of the processor, “1” means the output of line#1.
  • {DELIMITER}: the HTML or text at which the input HTML source is split.
  • {RETURN}: the number of the part to return, counting from 0; L stands for the last part.
  • {DEBUG}: what to do when the {DELIMITER} cannot be found in the input HTML source.
    • 0: return "" (empty string) if an error occurs.
    • 1: stop immediately if an error occurs.
    • 2: return the input HTML source.

Example

Example 1
split|0|<div class="post">|L|1|

Splits the input HTML source into parts at the delimiter <div class="post"> and returns the last part. If the delimiter is not found, the processor stops immediately and starts over with the next item.

Example 2
split|2|<p class="paragraph">|1|2|

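Splits the output of line#2 at the delimiter <p class="paragraph"> and returns part 1. Because parts are counted from 0, this is the second part, i.e. the text right after the first occurrence of the delimiter. If the delimiter is not found, the line returns its own input unchanged.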
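Since split is compared to PHP's explode, its behaviour can be sketched with Python's str.split. The function name split_part is hypothetical. Modelling debug mode 1 as an exception and treating an out-of-range {RETURN} as an error are assumptions, not documented behaviour.

```python
def split_part(source, delimiter, ret="0", debug=0):
    """Split source at delimiter and return part number ret ("L" means the last part)."""
    if delimiter and delimiter in source:
        parts = source.split(delimiter)
        index = len(parts) - 1 if ret == "L" else int(ret)
        if 0 <= index < len(parts):
            return parts[index]
    # Error handling, following the three documented debug modes.
    if debug == 1:
        raise LookupError("delimiter or part not found")
    return source if debug == 2 else ""


html = 'head<div class="post">first<div class="post">second'
print(split_part(html, '<div class="post">', "L", 1))  # second
print(split_part(html, '<div class="post">', "1", 2))  # first
print(split_part(html, '<div class="post">', "0", 2))  # head
```

Part 0 is whatever precedes the first delimiter, which is why a {RETURN} of 1 gives the text that follows the first occurrence.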

Function: wrap

Wraps/combines one or more parts (returned by other lines) in a new HTML format.

Syntax

wrap|{INPUT_LINE1,INPUT_LINE2,...}|{WRAP_HTML}|
  • {INPUT_LINE1,INPUT_LINE2,...}: the lines whose outputs are to be wrapped.
  • {WRAP_HTML}: the new HTML, which can contain these variables:
    • {ogb-0} stands for the first line parameter (INPUT_LINE1) and is replaced by the output of INPUT_LINE1.
    • {ogb-1} stands for the second line parameter (INPUT_LINE2) and is replaced by the output of INPUT_LINE2.

Example

wrap|3,5|<div class="content">{ogb-0}<hr />{ogb-1}|

Combines line#3 and line#5 into the new formatted HTML source: {ogb-0} is replaced by the output of the first line parameter (line#3), and {ogb-1} by the output of the second line parameter (line#5).
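The placeholder substitution can be pictured with the short Python sketch below. The outputs dictionary, which maps a line number to that line's output, is a hypothetical stand-in for the processor's internal state. Leaving unknown placeholders untouched is an assumption.

```python
import re


def wrap(outputs, input_lines, wrap_html):
    """Fill each {ogb-N} in wrap_html with the output of the N-th listed input line."""
    def substitute(match):
        position = int(match.group(1))
        if position < len(input_lines):
            return outputs[input_lines[position]]
        return match.group(0)  # assumption: unknown placeholders are left as they are
    # One pass over the template, so text inserted for one placeholder is never re-scanned.
    return re.sub(r"\{ogb-(\d+)\}", substitute, wrap_html)


outputs = {3: "<h1>Title</h1>", 5: "<p>Body</p>"}
print(wrap(outputs, [3, 5], '<div class="content">{ogb-0}<hr />{ogb-1}'))
# <div class="content"><h1>Title</h1><hr /><p>Body</p>
```

Note that the number in {ogb-N} is the position in the list of input lines, not the line number itself: {ogb-1} here refers to line#5.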

Function: replace

Replaces a string in the input source with a new one.

Syntax

replace|{INPUT_LINE}|{SEARCH}|{REPLACE}|
  • {INPUT_LINE}: the line whose output is used as input.
  • {SEARCH}: the string to search for.
  • {REPLACE}: the string to replace it with.

Example

replace|5|<div class="abc"|<div class="xyz" |

Finds <div class="abc" in the output of line#5 and replaces it with <div class="xyz" (the sample line also includes a trailing space after the closing quote).
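The sketch below shows, in Python, how replace lines might chain through their {INPUT_LINE} numbers, with 0 being the original input and n the output of line n. It assumes literal (non-regex) matching, replacement of every occurrence, and no pipe characters inside the fields. The runner function is hypothetical and only understands replace.

```python
def run_replace_lines(source, script):
    """Run a script made only of replace lines and return all outputs (index 0 = input)."""
    outputs = [source]  # outputs[0] is the processor input, outputs[n] the result of line n
    for line in script.splitlines():
        command, input_line, search, replacement = line.split("|")[:4]
        if command != "replace":
            raise ValueError("this sketch only understands replace")
        outputs.append(outputs[int(input_line)].replace(search, replacement))
    return outputs


source = '<div class="abc">Hello</div>'
script = 'replace|0|<div class="abc"|<div class="xyz"|\nreplace|1|Hello|Goodbye|'
results = run_replace_lines(source, script)
print(results[1])  # <div class="xyz">Hello</div>
print(results[2])  # <div class="xyz">Goodbye</div>
```

The second line reads from line 1 rather than from 0, so both replacements show up in its output.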
