
Requests: Working with Cookies Directly

Published: 2015-11-26 17:26:08

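Cookies are the passport of the web. Together with the IP address and the request headers they make up a client's ID card, and with that card many sites that log in through JavaScript or some other non-plaintext scheme can be handled directly!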

 

import requests

s = requests.Session()
my_headers = {
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; .NET4.0C; .NET4.0E)',
}
# The cookies below can be copied straight from Firebug, but they need some manual
# editing; pasting them into json.load() usually fails with assorted errors.
# Note: the cookie name is "__gads"; "ID=..." is part of its value.
my_cookies = {
    '__gads': 'ID=1912f5********794:T=1448516996:S=ALNI_MYVU900dFxwJ571AFZctGV_ekSPXw',
    '_ga': 'GA1.2.9****718.14****3788',
    '.CNBlogsCookie': "5536C673C042CFE26101E64F427D89D10A****0CFDBF6FA1D437BF9E1B6BEB4DCBB7E3****0F87E6D65823****5F884CA845DE3303CD2EDE****C5BA6C1B779BB30C2B3BE1D1B05A8",
    '__utma': '226****.917********.144****788.1448****788.144****3788.1',
    '__utmz': '226****.14********.1.1.utmcsr=baidu|utmccn=(organic)|utmcmd=organic',
    '__utmc': '226****',
}
# Visit the home page first to obtain the initial cookies, so they can then be updated.
s.get('http://www.cnblogs.com/', headers=my_headers)
# Merge the copied login cookies into the session's cookie jar.
s.cookies.update(my_cookies)
result = s.get('http://home.cnblogs.com/followees/').text
print(result)
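The comment in the code above notes that cookies copied from Firebug need hand-editing before they fit a Python dict. One way around that, sketched here under the assumption that you copy the raw `Cookie:` request header (the `name=value; name=value` string) from the browser's network panel, is to parse it with a small helper. `cookie_header_to_dict` is a hypothetical name, not part of requests, and the cookie values in the example are placeholders.

```python
def cookie_header_to_dict(header):
    """Parse a raw Cookie header value ("a=1; b=2") into a dict.

    Hypothetical helper for illustration; it is not part of requests.
    """
    cookies = {}
    for part in header.split(";"):
        part = part.strip()
        if not part:
            continue  # skip empty segments such as a trailing ";"
        # Split on the FIRST "=" only: a value like "ID=...:T=...:S=..."
        # (the __gads cookie) contains "=" itself.
        name, sep, value = part.partition("=")
        if sep:
            cookies[name.strip()] = value.strip()
    return cookies


# Placeholder values standing in for a header copied from the browser.
raw = "__gads=ID=abc123:T=1448516996:S=xyz; _ga=GA1.2.345.678; .CNBlogsCookie=DEADBEEF"
print(cookie_header_to_dict(raw))
```

The returned dict can be passed to `s.cookies.update()` in place of the hand-written `my_cookies`. Splitting on the first `=` is what keeps `__gads` as the cookie name rather than `__gads=ID`.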

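All of the above is needed because cnblogs encrypts its login with JavaScript. My next plan was to use PyV8 to interact with the JS, but after some struggling I could not get PyV8 installed on Python 3.4 any time soon, so PyQt's WebKit was the only remaining option; I really did not want to go back to Python 2.7.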

 

For pages whose tags are generated dynamically by JavaScript, fetching them with Python's urllib2 does not give you the final page.

Qt's QtWebKit module can emulate a browser, execute the JavaScript, and extract the final page.

There are many posts online about scraping dynamic pages with PyQt4, but PyQt5 differs slightly, so I am recording it here for later reference.

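import sys

from lxml import etree
from PyQt5.QtCore import QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebKitWidgets import QWebPage


# Use QtWebKit to load the page, run its JavaScript, and keep the final page.
class WebRender(QWebPage):
    def __init__(self, url):
        # A QApplication must exist before the page object is created.
        self.app = QApplication(sys.argv)
        QWebPage.__init__(self)
        self.loadFinished.connect(self.__loadFinished)
        self.mainFrame().load(QUrl(url))
        # Block here until __loadFinished() calls quit().
        self.app.exec_()

    def __loadFinished(self, result):
        # result is True if the page loaded successfully.
        self.frame = self.mainFrame()
        self.app.quit()


url = 'http://www.cnblogs.com/'  # example URL; replace with the page to scrape
r = WebRender(url)
html = r.frame.toHtml()
page = etree.HTML(html.encode('utf-8'))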


Notes:

1. In PyQt5, QWebPage must be imported from PyQt5.QtWebKitWidgets and QApplication from PyQt5.QtWidgets.

2. Encode the HTML returned by r.frame.toHtml() as UTF-8 so that Chinese characters are handled correctly.
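Once `toHtml()` has returned the rendered page as a string, any HTML parser can pick data out of it. The code above uses lxml's `etree.HTML`; as a dependency-free alternative, here is a minimal sketch with the standard library's `html.parser` that collects link targets and link texts. The sample HTML string stands in for a real rendered page, and `LinkCollector` and `extract_links` are illustrative names only.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from <a> tags in an HTML string."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Stand-in for the string returned by r.frame.toHtml().
rendered = '<html><body><div id="list"><a href="/p/1">第一篇</a><a href="/p/2">Second</a></div></body></html>'
print(extract_links(rendered))
```

Because `HTMLParser` works on `str` directly, this variant needs no UTF-8 encoding step; links without an `href` are skipped.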

 

 

Reading JavaScript values from Python


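import PyV8

with PyV8.JSContext() as env1:
    # Assignments without "var" create global variables in the context.
    env1.eval("""
                var_i = 1;
                var_f = 1.0;
                var_s = "test";
                var_b = true;
            """)
    js_vars = env1.locals  # the context's global variables
    var_i = js_vars.var_i
    print(var_i)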


Reading Python values from JavaScript


import PyV8

with PyV8.JSContext() as env1:
    env1.securityToken = "foo"
    # Set a global from Python, then read it back in JavaScript.
    env1.locals.prop = 3
    print(int(env1.eval("prop")))

Calling functions across Python and JavaScript

Calling a JavaScript function from Python: eval() returns the JavaScript function object, and calling func from Python runs it.


import PyV8

with PyV8.JSContext() as ctxt:
    # eval() returns the anonymous outer function; a() is defined inside it.
    func = ctxt.eval("""
                (function ()
                {
                    function a()
                    {
                        return "abc";
                    }

                    return a();
                })
                """)
    print(func())

This also works:


import PyV8

with PyV8.JSContext() as ctxt:
    # Top-level function declarations become globals of the context.
    ctxt.eval("""
                    function a()
                    {
                        return "abc";
                    }

                    function c()
                    {
                        return "abc";
                    }
                """)
    a = ctxt.locals.a
    print(a())

Source: http://www.jb51.net/article/44116.htm

 

Damn, I finally found a module that can run JS under Python 3, haha... https://github.com/doloopwhile/PyExecJS

PyExecJS


Run JavaScript code from Python.

PyExecJS is a port of ExecJS from Ruby. PyExecJS automatically picks the best runtime available to evaluate your JavaScript program.

A short example:

>>> import execjs
>>> execjs.eval("'red yellow blue'.split(' ')")
['red', 'yellow', 'blue']
>>> ctx = execjs.compile("""
...     function add(x, y) {
...         return x + y;
...     }
... """)
>>> ctx.call("add", 1, 2)
3

The advantage of PyExecJS is that you do not need to take care of the JavaScript environment. In particular, it works on Windows without installing extra libraries.

One drawback of PyExecJS is performance: it communicates with the JavaScript runtime through text, which is slow. Another is that it does not fully support runtime-specific features.
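To make the "communicates through text" point concrete, here is a simplified sketch, assuming arguments and results cross the process boundary as JSON text. It illustrates the idea and is not PyExecJS's actual source; `build_call_source` and `decode_result` are hypothetical helpers. It shows the JavaScript text that would be generated for a call and how printed JSON is turned back into Python objects.

```python
import json


def build_call_source(func_name, *args):
    """Build the JavaScript text that would be sent to an external runtime.

    Arguments are serialized with json.dumps, so only JSON-compatible values
    (numbers, strings, booleans, None, lists, dicts) can cross the boundary.
    """
    arg_text = ", ".join(json.dumps(a) for a in args)
    return "JSON.stringify({}({}))".format(func_name, arg_text)


def decode_result(output_text):
    """Turn the JSON text printed by the runtime back into Python objects."""
    return json.loads(output_text)


# Text that would be generated for a call like ctx.call("add", 1, 2).
print(build_call_source("add", 1, 2))

# This is what JSON.stringify({var_i: 1, var_f: 1.0, var_s: "test", var_b: true})
# prints: 1.0 is written as 1, so it comes back as a Python int.
print(decode_result('{"var_i":1,"var_f":1,"var_s":"test","var_b":true}'))
```

Every call pays for serializing, handing text to a separate runtime, and parsing the reply, which is where the slowness comes from; it also means a JavaScript `1.0` arrives in Python as the int `1`.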

PyV8 might be a better choice for some use cases.

Installation

$ pip install PyExecJS 

or

$ easy_install PyExecJS 

Details

PyExecJS supports these runtimes:

If the EXECJS_RUNTIME environment variable is set, PyExecJS picks that JavaScript runtime as the default:

>>> execjs.get().name  # this value depends on your environment.
>>> os.environ["EXECJS_RUNTIME"] = "Node"
>>> execjs.get().name
'Node.js (V8)'

You can choose a JavaScript runtime with execjs.get():

>>> default = execjs.get()  # the automatically picked runtime
>>> default.eval("1 + 2")
3
>>> jscript = execjs.get("JScript")
>>> jscript.eval("1 + 2")
3
>>> node = execjs.get("Node")
>>> node.eval("1 + 2")
3
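The selection rule described in this section (an explicit EXECJS_RUNTIME wins, otherwise the best available runtime is picked automatically) can be sketched in a few lines. The preference order and the `pick_runtime` function are hypothetical and only illustrate the rule; PyExecJS's own detection code and runtime list differ.

```python
import os

# Hypothetical preference order, for illustration only.
PREFERRED = ["PyV8", "Node", "JavaScriptCore", "SpiderMonkey", "JScript"]


def pick_runtime(available, environ=None):
    """Return the name of the runtime to use.

    An EXECJS_RUNTIME value wins if that runtime is available; otherwise the
    first available runtime in PREFERRED order is chosen.
    """
    if environ is None:
        environ = os.environ
    requested = environ.get("EXECJS_RUNTIME")
    if requested:
        if requested not in available:
            raise RuntimeError("requested runtime not available: " + requested)
        return requested
    for name in PREFERRED:
        if name in available:
            return name
    raise RuntimeError("no JavaScript runtime found")


print(pick_runtime({"Node", "JScript"}, environ={}))
print(pick_runtime({"Node", "JScript"}, environ={"EXECJS_RUNTIME": "JScript"}))
```

Passing `environ` explicitly keeps the function easy to test; by default it reads the real process environment, just as setting `os.environ["EXECJS_RUNTIME"]` does in the session above.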

License

Copyright (c) 2012 Omoto Kenji. Copyright (c) 2011 Sam Stephenson and Josh Peek.

Released under the MIT license. See LICENSE for details.

 

 

ExecJS lets you run JavaScript code from Ruby. It automatically picks the best runtime available to evaluate your JavaScript program, then returns the result as a Ruby object.

ExecJS supports these runtimes:

require "execjs"
require "open-uri"
source = open("http://jashkenas.github.com/coffee-script/extras/coffee-script.js").read

context = ExecJS.compile(source)
context.call("CoffeeScript.compile", "square = (x) -> x * x", :bare => true)
# => "var square;\nsquare = function(x) {\n  return x * x;\n};"






