Python - Learning Notes & Basic Web Scraping

# **if usage & length**
import sys

filename = 'default.properties'
if len(sys.argv) != 1:  # an argument was supplied on the command line
    filename = sys.argv[1]
print(filename)

# **Extracting the contents of div class="title"**
shop_links_a_tags = soup.find_all('div', {'class': 'title'})
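A minimal, self-contained sketch of the same `find_all` call — the HTML string here is invented for illustration, and `.get_text()` pulls the text out of each matching div:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = '''
<div class="title"><a href="/shop/1">Shop One</a></div>
<div class="title"><a href="/shop/2">Shop Two</a></div>
<div class="other">ignored</div>
'''

soup = BeautifulSoup(html, 'html.parser')
# find_all returns every <div class="title">; get_text extracts the inner text
titles = [div.get_text(strip=True) for div in soup.find_all('div', {'class': 'title'})]
print(titles)  # ['Shop One', 'Shop Two']
```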

# **Finding a substring**
You can use find or index. Note that find() returns -1 when the substring is missing (it never raises), while index() raises ValueError — so only index() needs a try/except:
stra = "This is a string"
print(stra.find("aaa"))  # -1, no exception
try:
    print(stra.index("aaa"))
except ValueError:
    print("not found")


import time
time.sleep(2)  # pause 2 seconds, e.g. to throttle consecutive requests
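As a sketch of how this is typically used in a crawl loop (the URLs are placeholders and nothing is actually fetched), sleeping between iterations throttles the request rate:

```python
import time

# Placeholder URLs; a real crawler would fetch each one.
urls = ['http://example.com/page1', 'http://example.com/page2']

start = time.monotonic()
for url in urls:
    # requests.get(url) would go here; we only demonstrate the pacing
    time.sleep(0.1)  # short delay for the demo; use ~2s against a real site
elapsed = time.monotonic() - start
print(round(elapsed, 1))
```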

# **send POST**
http://docs.python-requests.org/en/master/user/quickstart/
Now, let's try to get a webpage. For this example, let's get GitHub's public timeline:

r = requests.get('https://api.github.com/events')
Now, we have a Response object called r. We can get all the information we need from this object.

Requests' simple API means that all forms of HTTP request are as obvious. For example, this is how you make an HTTP POST request:

r = requests.post('http://httpbin.org/post', data={'key': 'value'})
Nice, right? What about the other HTTP request types: PUT, DELETE, HEAD and OPTIONS? These are all just as simple:

r = requests.put('http://httpbin.org/put', data={'key': 'value'})
r = requests.delete('http://httpbin.org/delete')
r = requests.head('http://httpbin.org/get')
r = requests.options('http://httpbin.org/get')
That's all well and good, but it's also only the start of what Requests can do.

Passing Parameters In URLs
You often want to send some sort of data in the URL's query string. If you were constructing the URL by hand, this data would be given as key/value pairs in the URL after a question mark, e.g. httpbin.org/get?key=val. Requests allows you to provide these arguments as a dictionary, using the params keyword argument. As an example, if you wanted to pass key1=value1 and key2=value2 to httpbin.org/get, you would use the following code:

payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('http://httpbin.org/get', params=payload)
You can see that the URL has been correctly encoded by printing the URL:

print(r.url)
http://httpbin.org/get?key2=value2&key1=value1
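The query-string encoding Requests performs here can be reproduced with the standard library's urllib.parse, which is handy when you only need to build the URL without making a request (a sketch, independent of Requests):

```python
from urllib.parse import urlencode

payload = {'key1': 'value1', 'key2': 'value2'}

# urlencode turns the dict into percent-encoded key=value pairs joined by '&'
query = urlencode(payload)
url = 'http://httpbin.org/get?' + query
print(url)  # http://httpbin.org/get?key1=value1&key2=value2
```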

# **substring (slicing)**
 x = "Hello World!"
 x[2:]
'llo World!'
 x[:2]
'He'
 x[:-2]
'Hello Worl'
 x[-2:]
'd!'
 x[2:-2]
'llo Worl'
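The examples above follow one rule: x[a:b] takes characters from index a up to but not including b, and negative indices count back from the end. A short sketch confirming each case:

```python
x = "Hello World!"

# start omitted -> from the beginning; end omitted -> to the end
assert x[2:] == 'llo World!'
assert x[:2] == 'He'

# negative indices count back from the end of the string
assert x[:-2] == 'Hello Worl'
assert x[-2:] == 'd!'
assert x[2:-2] == 'llo Worl'

# out-of-range slice bounds are clipped instead of raising IndexError
print(x[5:100])  # ' World!'
```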

# **Basic demo: scraping iPeen (愛評網)**

import requests
from bs4 import BeautifulSoup

HTML_PARSER = "html.parser"
ROOT_URL = 'http://www.ipeen.com.tw'
LIST_URL = 'http://www.ipeen.com.tw/search/taiwan/000/1-0-0-0/'

def get_shop_link_list():
    list_req = requests.get(LIST_URL)
    if list_req.status_code == 200:
        soup = BeautifulSoup(list_req.content, HTML_PARSER)
        # anchors whose data-label attribute is 店名 ("shop name")
        shop_links_a_tags = soup.find_all('a', attrs={'data-label': '店名'})

        shop_links = []
        for link in shop_links_a_tags:
            shop_links.append(ROOT_URL + link['href'])
        return shop_links
    return []
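Concatenating ROOT_URL with link['href'] only works when every href is relative; urllib.parse.urljoin from the standard library handles relative and absolute hrefs alike (a sketch with made-up paths):

```python
from urllib.parse import urljoin

ROOT_URL = 'http://www.ipeen.com.tw'

# Hypothetical hrefs as they might appear in scraped <a> tags:
# one relative, one already absolute.
hrefs = ['/shop/12345', 'http://www.ipeen.com.tw/shop/67890']

# urljoin prefixes relative paths and leaves absolute URLs untouched
links = [urljoin(ROOT_URL, h) for h in hrefs]
print(links)
```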

ROOT_URL_vw = 'http://verywed.com/'
LIST_URL_vw = 'http://verywed.com/forum/expexch'

def get_shop_link_list2():
    list_req = requests.get(LIST_URL_vw)
    if list_req.status_code == 200:
        soup = BeautifulSoup(list_req.content, HTML_PARSER)
        # grab every anchor; narrow with e.g. soup.find_all('div', class_='clearfix') if needed
        shop_links_a_tags = soup.find_all('a')

        shop_links = []
        for link in shop_links_a_tags:
            href = link.get('href')  # some anchors have no href attribute
            if href:
                shop_links.append(ROOT_URL_vw + href)
        return shop_links
    return []
          
          
if __name__ == '__main__':
    get_shop_link_list2()