Saturday, 7 September 2013

Getting attributed html element

I'm trying to get the table of MMEL codes from this site, and I'd like to
do it with CSS selectors.
What I've got so far is:
require_relative 'sources/Downloader'
require 'nokogiri'
html_content = Downloader.download_page('http://www.s-techent.com/ATA100.htm')
parsed_html = Nokogiri::HTML(html_content)
tmp = parsed_html.css("tr[*]")
puts tmp.text
I get an error when I try to select this tr by attribute. How can I get
this table in a simple form? I want to convert it to JSON. It would be
nice to get it in sections and iterate over them in an .each block.



EDIT: It'd be nice if I could get things in blocks like this (look at the
page's source):
<TR><TD WIDTH="10%" VALIGN="TOP" ROWSPAN=5>
<B><FONT FACE="Arial" SIZE=2><P ALIGN="CENTER">11</B></FONT></TD>
<TD WIDTH="40%" VALIGN="TOP" COLSPAN=2>
<B><FONT FACE="Arial" SIZE=2><P>PLACARDS AND MARKINGS</B></FONT></TD>
<TD WIDTH="50%" VALIGN="TOP">
<FONT FACE="Arial" SIZE=2><P ALIGN="LEFT">All procurable placards, labels,
etc., shall be included in the illustrated Parts Catalog. They shall be
illustrated, showing the part number, Legend and Location. The
Maintenance Manual shall provide the approximate Location (i.e., FWD
-UPPER -RH) and illustrate each placard, label, marking, self
-illuminating sign, etc., required for safety information, maintenance
significant information or by government regulations. Those required by
government regulations shall be so identified.</FONT></TD>
</TR>
