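Finally solved it. The culprit was my own code: when building the HTTP headers, it also declared that gzip compression was acceptable, so pages from sites that support gzip came back compressed and BeautifulSoup could no longer parse them. It turns out 1ting.com now supports gzip compression. They have also switched to a server called nProxy, which is most likely nginx with its code modified, reconfigured and recompiled~ Really, what a...~~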
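The fix here is to stop advertising gzip. An alternative is to keep accepting compressed responses and decompress the body before handing it to BeautifulSoup. Below is a minimal sketch using only the standard library, assuming you already have the raw response body as bytes and the value of the response's `Content-Encoding` header; the helper name `decode_body` is hypothetical. It also handles deflate, which the headers in this post still accept.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def decode_body(body: bytes, content_encoding: str = "") -> bytes:
    """Return uncompressed bytes, undoing gzip or deflate if the server applied it.

    Hypothetical helper: `content_encoding` is assumed to be the value of the
    response's Content-Encoding header (empty if the header was absent).
    """
    encoding = content_encoding.strip().lower()
    # Trust the header, but also sniff the magic bytes in case it was missing.
    if encoding in ("gzip", "x-gzip") or body[:2] == GZIP_MAGIC:
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # "deflate" should mean zlib-wrapped data (RFC 1950)...
            return zlib.decompress(body)
        except zlib.error:
            # ...but some servers send raw deflate without the zlib header.
            return zlib.decompress(body, -zlib.MAX_WBITS)
    return body  # identity: nothing to undo
```

The decoded bytes can then be passed to BeautifulSoup as usual. If you stay with pycurl, libcurl can also do the decompression itself through the `pycurl.ENCODING` option (libcurl's `CURLOPT_ACCEPT_ENCODING`).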
```python
# Use pycurl: the returned list is meant for c.setopt(pycurl.HTTPHEADER, headers)
def buildHeaders(browser, referer=""):
    """
    Build the HTTP headers needed to download wma files.

    Arguments:
    - `browser`: User-Agent string of the browser to imitate
    - `referer`: Referer URL (optional; omitted when empty)
    """
    headers = ['User-Agent: ' + browser,
               'Accept: text/html, application/xml;q=0.9, audio/x-ms-wma, '
               'application/xhtml+xml, image/png, gzip, x-gzip, image/jpeg, '
               'image/gif, image/x-xbitmap, */*;q=0.1',
               'Accept-Language: en-us',
               # gzip is not listed and *;q=0 rejects every unlisted coding,
               # so the server should not send a gzip-compressed page
               'Accept-Encoding: deflate, identity, *;q=0',
               'Accept-Charset: iso-8859-1, utf-8, utf-16, *;q=0.1',
               'Cookie: PIN=G39J3kmH2AU0SBieDgavAg==']
    if referer != "":
        headers.append('Referer: ' + referer)
    return headers
```
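Compression is negotiated only through `Accept-Encoding`; the `gzip, x-gzip` entries in the `Accept` line sit where media types belong and do not control it. To check which codings a given `Accept-Encoding` value actually permits, here is a minimal sketch of the matching rules from RFC 7231, section 5.3.4. The helper `accepts_coding` is hypothetical, and it treats `x-gzip` as `gzip`, as RFC 7230 recommends. Applied to the value used above, `deflate, identity, *;q=0`, it shows that gzip is ruled out while deflate and identity remain allowed.

```python
ALIASES = {"x-gzip": "gzip", "x-compress": "compress"}  # legacy names (RFC 7230)


def _normalize(coding: str) -> str:
    coding = coding.strip().lower()
    return ALIASES.get(coding, coding)


def accepts_coding(accept_encoding: str, coding: str) -> bool:
    """Return True if `coding` is acceptable under an Accept-Encoding value."""
    coding = _normalize(coding)
    explicit = None  # q-value if the coding is listed by name
    wildcard = None  # q-value of "*", if present
    for item in accept_encoding.split(","):
        if not item.strip():
            continue
        name, *params = item.split(";")
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value)
        name = _normalize(name)
        if name == coding:
            explicit = q
        elif name == "*":
            wildcard = q
    if explicit is not None:
        return explicit > 0  # listed by name: acceptable unless q=0
    if wildcard is not None:
        return wildcard > 0  # "*" covers every coding not listed by name
    # Not listed and no wildcard: only identity (no compression) is acceptable.
    return coding == "identity"
```

For example, `accepts_coding("deflate, identity, *;q=0", "gzip")` is `False`, whereas `accepts_coding("gzip, deflate", "gzip")` is `True`, which is the kind of header that made the server return compressed pages.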