If it's not on-topic, it's in here.

script question

Fri Mar 01, 2013 12:25 am

I was asked if I could help with a script. Since I was the one asked and don't need help myself, I'm putting this in the general nonsense section.
I gave up pretty fast, but then thought that it is an interesting subject.

In Germany we have the "Gelbe Seiten" (yellow pages). There you can look up people by trade (dentist, gardener, well: anything).
They have a website:
http://www.gelbeseiten.de/
The search results from that site are what's needed, and the fields to extract from each result are:
name, address, phone, mail and www

I first thought this might be easy. But for me it isn't.
The 2 search boxes ask for:
< search term > (say gardener)
< city > (say Berlin)

The first problem I have is that I don't know how to tell wget (or whatever one uses) to "enter" the search term.
Since I couldn't figure that out, I just picked a search term ("Tischler", which is a carpenter) and a city (Arnsdorf), took the resulting URL:
http://www.gelbeseiten.de/tischler/arnsdorf
and simply fetched it with wget.
Then I was lost again when trying to find a pattern for name, address, phone, etc., and gave up.

Any ideas or experiences or ever-run-into-something-which-sounded-like-that?
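
One way around "entering" the search term: the result URL above already carries both values as path segments, so a script can build the URL itself instead of filling in the boxes. This is only a sketch, assuming every search follows the same /term/city pattern as the Tischler/Arnsdorf example (untested for terms with spaces or umlauts). gs_url is a made-up helper name, not something the site or wget provides.

```shell
#!/bin/sh
# Build a result URL from a search term and a city.
# Assumption: the site accepts lowercase path segments in the form
# /<term>/<city>, as in the Tischler/Arnsdorf example.
# gs_url is a hypothetical helper name.
gs_url() {
    term=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    city=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    printf 'http://www.gelbeseiten.de/%s/%s\n' "$term" "$city"
}

# Usage (needs network access, so it is left commented out):
# wget -O arnsdorf.html "$(gs_url Tischler Arnsdorf)"
```

The -O option tells wget which file to write to, so the later sed and grep commands have a known file name to work on.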

Re: script question

Fri Mar 01, 2013 3:26 pm

OK, I'll start:
Code:
# get the names: print the line that follows each itemprop="name" line
sed -n '/itemprop="name"/{n;p;}' arnsdorf.html | sed 's/<\/span>//g'

# get the street addresses: strip the surrounding span tags
grep street-address arnsdorf.html | sed 's/<span class="subscriber_street_address" itemprop="street-address">//g' | sed 's/<\/span>,//g'
I'll leave it to you to sort the output so that the names match the street addresses.
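
A possible way to do that matching is paste, which joins line N of one file with line N of another. The sketch below runs on a small invented sample, not on the real page: the HTML layout is only inferred from the two commands above, and the business names and streets are placeholders. It also assumes every entry has exactly one name and one street address. If an entry lacks an address, the two lists drift out of step.

```shell
#!/bin/sh
# Sample input. Assumption: this layout is inferred from the sed/grep
# commands in the thread; it is not a copy of the real page.
tmp=$(mktemp -d)
cat > "$tmp/sample.html" <<'EOF'
<span itemprop="name">
Tischlerei Muster</span>
<span class="subscriber_street_address" itemprop="street-address">Hauptstr. 1</span>,
<span itemprop="name">
Holzbau Beispiel</span>
<span class="subscriber_street_address" itemprop="street-address">Dorfweg 7</span>,
EOF

# Names: the line after each itemprop="name" line, closing tag removed.
sed -n '/itemprop="name"/{n;p;}' "$tmp/sample.html" |
    sed 's/<\/span>//g' > "$tmp/names.txt"

# Street addresses: strip the opening and closing span tags.
grep street-address "$tmp/sample.html" |
    sed 's/<span class="subscriber_street_address" itemprop="street-address">//g' |
    sed 's/<\/span>,//g' > "$tmp/streets.txt"

# paste joins the two lists line by line, separated by a semicolon.
pairs=$(paste -d ';' "$tmp/names.txt" "$tmp/streets.txt")
printf '%s\n' "$pairs"
rm -r "$tmp"
```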

And now I'll give up. My gut feeling is that there's a way to do it with JavaScript, since the page is already laid out that way. But I really have no idea what I'm talking about. Are they pulling the data from a database, or do they have to manually add each entry to the page? If it comes from a database, you'd probably do better to access it directly. And I'll guess ahead of time that you can't do that.

Re: script question

Sat Mar 02, 2013 3:04 pm

What does {n;p;} mean?
I can see what it does, but I would not be able to change it if I ever need it for something different.


A bit naively, I tried this:

for i in $(
sed -n '/itemprop="name"/{n;p;}' arnsdorf | sed 's/<\/span>//g'
)
do
# get the street addresses
grep street-address arnsdorf | sed 's/<span\ class=\"subscriber_street_address\"\ itemprop=\"street-address\">//g' |sed 's/<\/span>,//g'
done

-
Not much better, but perhaps on the right track:

for i in $(
sed -n '/itemprop="name"/{n;p;}' arnsdorf | sed 's/<\/span>//g'
)
do
echo $i
# grep street-address arnsdorf | sed 's/<span\ class=\"subscriber_street_address\"\ itemprop=\"street-address\">//g' |sed 's/<\/span>,//g'
echo
done
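
A likely reason both loops misbehave: for i in $( ... ) splits the output on every space, so a two-word name turns into two iterations. Reading whole lines with while read avoids that, and a second file descriptor can feed the street list in step with the names. A rough sketch, assuming the two pipelines have already been written to files with one entry per line in the same order. The file names and sample entries are invented.

```shell
#!/bin/sh
# Assumption: names.txt and streets.txt stand in for the output of the
# two sed/grep pipelines; the entries are placeholders.
tmp=$(mktemp -d)
printf 'Tischlerei Muster\nHolzbau Beispiel\n' > "$tmp/names.txt"
printf 'Hauptstr. 1\nDorfweg 7\n' > "$tmp/streets.txt"

# for ... in $(...) splits on whitespace: two names become four words.
words=0
for i in $(cat "$tmp/names.txt"); do words=$((words + 1)); done

# while read takes one whole line per iteration. File descriptor 3
# supplies the street list in lockstep with the names on standard input.
out=$(
    while IFS= read -r name && IFS= read -r street <&3; do
        printf '%s, %s\n' "$name" "$street"
    done < "$tmp/names.txt" 3< "$tmp/streets.txt"
)
printf '%s\n' "$out"
rm -r "$tmp"
```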

Re: script question

Sun Mar 03, 2013 1:32 am

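I'm not really sure what the letters mean. I think p is print and n is next: with -n nothing gets printed automatically, n moves on to the next line of input, and p prints that line. You can also do {n;n;p;} and so on, to get a line further down after the line with the pattern.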
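
A small experiment makes this easier to see than the manual does. The input here is invented, just four short lines with the pattern on the second one:

```shell
#!/bin/sh
# Four lines of input; "match" sits on line 2.
input=$(printf 'one\nmatch\nthree\nfour\n')

# -n: print nothing unless told to. On a matching line, n loads the next
# input line into the pattern space, then p prints it.
next1=$(printf '%s\n' "$input" | sed -n '/match/{n;p;}')

# A second n skips one more line before p prints.
next2=$(printf '%s\n' "$input" | sed -n '/match/{n;n;p;}')

echo "$next1"   # three
echo "$next2"   # four
```

So each extra n moves one line further away from the match, and only the line that is current when p runs gets printed.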

Re: script question

Sun Mar 03, 2013 4:38 am

Thanks. Really.

The bad news: sed will have to wait for me. I looked at the link you gave in the sed thread, and I might as well have watched the stars for half an hour.