Monday, 30 September 2013

What does [^.]* mean in regular expression?


I'm trying to get 482.75 from the following text: <span id="yfs_l84_aapl">482.75</span>
The regex I used is regex = '<span id="yfs_l84_[^.]*">(.+?)</span>', and it worked.
But what I don't understand is why [^.]* matches aapl here.
My understanding is that . means any character except a newline, and ^
means negation. So [^.] should match a newline, and [^.]* should match any
number of newlines. However, this contradicts what actually happens.
Any help is appreciated, and thanks in advance.
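A quick Python 3 check of what the bracket expression actually matches may help here. Inside a character class, `.` loses its special meaning and stands for a literal period, so `[^.]` means "any single character except `.`" (newlines included), and `[^.]*` is a run of such characters. The test strings below are illustrative only and are not taken from the Yahoo page.

```python
import re

# Inside [...] the dot is literal, so [^.] means "any character except '.'".
not_dot = re.compile(r"[^.]*")

# 'aapl' contains no period, so the whole string matches.
print(bool(not_dot.fullmatch("aapl")))       # True
# The run must stop at the first '.', so the whole string cannot match.
print(bool(not_dot.fullmatch("482.75")))     # False
# A newline is not a period, so [^.] matches it as well.
print(bool(not_dot.fullmatch("a\nb")))       # True

# Outside brackets, . is the wildcard that excludes newline (without re.DOTALL).
print(bool(re.fullmatch(r".*", "a\nb")))     # False
# match() from the start stops just before the first period.
print(not_dot.match("482.75").group())       # 482
```

So `[^.]*` in the pattern simply consumes `aapl`, which contains no period, before the `">` that follows it.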

The Python code I used:
import urllib
import re

# Fetch the quote page (Python 2: urllib.urlopen)
htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=AAPL&ql=0")
htmltext = htmlfile.read()
# (.+?) captures the text between the span tags, non-greedily
regex = r'<span id="yfs_l84_[^.]*">(.+?)</span>'
pattern = re.compile(regex)
price = pattern.findall(htmltext)
print "the price of aapl is", price[0]
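The Yahoo page layout and its `yfs_l84_` ids date from 2013 and may no longer exist, so this minimal Python 3 sketch applies the same pattern offline to the snippet quoted in the question. The variable names `html_text`, `original` and `tighter` are illustrative. The second pattern, using `[^"]*`, is an alternative I'm suggesting rather than part of the original code: it cannot run past the closing quote of the `id` attribute.

```python
import re

# The HTML snippet quoted in the question, used instead of fetching the page.
html_text = '<span id="yfs_l84_aapl">482.75</span>'

# Original pattern: [^.]* covers "aapl" because it contains no period.
original = re.compile(r'<span id="yfs_l84_[^.]*">(.+?)</span>')
print(original.findall(html_text))   # ['482.75']

# Alternative: [^"]* stops at the attribute's closing quote; it is also captured here.
tighter = re.compile(r'<span id="yfs_l84_([^"]*)">(.+?)</span>')
print(tighter.findall(html_text))    # [('aapl', '482.75')]
```

With two capture groups, `findall` returns a list of tuples rather than a list of strings.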
