2013-12-08

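Python WSGI Tutorial 4: GET Requests

Run the previous program, then open a URL such as http://localhost:8051/?a=10&b=w&b=r in a browser.

The environment dictionary holds the request information REQUEST_METHOD and QUERY_STRING. Everything after the question mark is the value of the query string. You can write your own function to parse it, or use the parse_qs function from the cgi module, which returns a dictionary whose values are lists.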
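To make the shape of the parsed result concrete, here is a minimal sketch for the query string of the URL above. It assumes Python 3, where parse_qs lives in urllib.parse rather than in the cgi module used in this tutorial; the variable names are only for illustration.

```python
from urllib.parse import parse_qs

# Query string taken from the example URL above (the part after "?").
query_string = "a=10&b=w&b=r"

# parse_qs maps every field name to a list of its values.
params = parse_qs(query_string)

# Single-valued fields still come back as one-element lists.
a_value = params.get("a", [""])[0]
# Repeated fields keep all their values, in order.
b_values = params.get("b", [])

print(params)
print(a_value, b_values)
```

Supplying a list as the default to get() keeps the indexing safe when a field is missing.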

Here is a simple example:

#!/usr/bin/env python

from wsgiref.simple_server import make_server
from cgi import parse_qs, escape

html = """
<html>
<body>
   <form method="get" action="">
      <p>
         Age: <input type="text" name="age">
         </p>
      <p>
         Hobbies:
         <input name="hobbies" type="checkbox" value="software"> Software
         <input name="hobbies" type="checkbox" value="tuning"> Auto Tuning
         </p>
      <p>
         <input type="submit" value="Submit">
         </p>
      </form>
   <p>
      Age: %s<br>
      Hobbies: %s
      </p>
   </body>
</html>"""

def application(environ, start_response):

   # Returns a dictionary containing lists as values.
   d = parse_qs(environ['QUERY_STRING'])

   # In this idiom you must issue a list containing a default value.
   age = d.get('age', [''])[0] # Returns the first age value.
   hobbies = d.get('hobbies', []) # Returns a list of hobbies.

   # Always escape user input to avoid script injection
   age = escape(age)
   hobbies = [escape(hobby) for hobby in hobbies]

   response_body = html % (age or 'Empty',
               ', '.join(hobbies or ['No Hobbies']))

   status = '200 OK'

   # Now content type is text/html
   response_headers = [('Content-Type', 'text/html'),
                  ('Content-Length', str(len(response_body)))]
   start_response(status, response_headers)

   return [response_body]

httpd = make_server('localhost', 8051, application)
# Now it is serve_forever() instead of handle_request().
# In Windows you can kill it in the Task Manager (python.exe).
# In Linux a Ctrl-C will do it.
httpd.serve_forever()

2013-12-08

Web Development

Python WSGI Tutorial 3: The Response

Change the return statement of the previous example from:

return [response_body]

to:

return response_body

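Run it again and you will notice that it is slower. This is because the server iterates over the returned string one byte at a time, so it is better to wrap the returned string in an iterable such as a list.

If the returned iterable yields more than one string, the length of the body is the sum of the lengths of those strings.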
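Both points can be seen in a small sketch. It assumes Python 3, where a WSGI body is made of byte strings; `body` and `chunks` are illustrative names, not part of the tutorial's code.

```python
body = b"Hello, WSGI"

# Iterating over the bare byte string produces one item per byte,
# so a server would have to send the body in many tiny pieces.
pieces_from_string = list(body)

# Wrapping it in a list produces a single item: the whole body.
pieces_from_list = list([body])

# With several chunks, Content-Length is the sum of their lengths.
chunks = [b"The Beginning\n", b"*" * 30 + b"\n", b"The End"]
content_length = sum(len(chunk) for chunk in chunks)

print(len(pieces_from_string), len(pieces_from_list), content_length)
```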

Here is an example:

#! /usr/bin/env python

from wsgiref.simple_server import make_server

def application(environ, start_response):

   response_body = ['%s: %s' % (key, value)
                    for key, value in sorted(environ.items())]
   response_body = '\n'.join(response_body)

   # response_body now has more than one string
   response_body = ['The Beginning\n',
                    '*' * 30 + '\n',
                    response_body,
                    '\n' + '*' * 30 ,
                    '\nThe End']

   # So the Content-Length is the sum of all the strings' lengths
   content_length = 0
   for s in response_body:
      content_length += len(s)

   status = '200 OK'
   response_headers = [('Content-Type', 'text/plain'),
                  ('Content-Length', str(content_length))]
   start_response(status, response_headers)

   return response_body

httpd = make_server('localhost', 8051, application)
httpd.handle_request()

2013-12-08

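Web Development

Python WSGI Tutorial 2: The Environment Dictionary

The environment dictionary contains CGI-like variables. The server populates it on every request.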

#! /usr/bin/env python

# Our tutorial's WSGI server
from wsgiref.simple_server import make_server

def application(environ, start_response):

   # Sorting and stringifying the environment key, value pairs
   response_body = ['%s: %s' % (key, value)
                    for key, value in sorted(environ.items())]
   response_body = '\n'.join(response_body)

   status = '200 OK'
   response_headers = [('Content-Type', 'text/plain'),
                  ('Content-Length', str(len(response_body)))]
   start_response(status, response_headers)

   return [response_body]

# Instantiate the WSGI server.
# It will receive the request, pass it to the application
# and send the application's response to the client
httpd = make_server(
   'localhost', # The host name.
   8051, # A port number where to wait for the request.
   application # Our application object name, in this case a function.
   )

# Wait for a single request, serve it and quit.
httpd.handle_request()

Run the script, then open http://localhost:8051/ in a browser to see the result.

This example prints every key and value in the environment dictionary.

2013-12-08

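Web Development

Python WSGI Tutorial 1: Introduction

WSGI (Web Server Gateway Interface) is not a server but a specification. It was originally defined for Python, and many other languages now have comparable interfaces. The details are here: http://www.python.org/dev/peps/pep-3333/

The WSGI application interface is a callable object. It must accept two positional arguments: a dictionary containing CGI-like variables, and a callable used to send the HTTP status code and headers back to the server.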

# This is our application object. It could have any name,
# except when using mod_wsgi where it must be "application"
def application( # It accepts two arguments:
      # environ points to a dictionary containing CGI like environment variables
      # which is filled by the server for each received request from the client
      environ,
      # start_response is a callback function supplied by the server
      # which will be used to send the HTTP status and headers to the server
      start_response):

   # build the response body possibly using the environ dictionary
   response_body = 'The request method was %s' % environ['REQUEST_METHOD']

   # HTTP response code and message
   status = '200 OK'

   # These are HTTP headers expected by the client.
   # They must be wrapped as a list of tupled pairs:
   # [(Header name, Header value)].
   response_headers = [('Content-Type', 'text/plain'),
                       ('Content-Length', str(len(response_body)))]

   # Send them to the server using the supplied function
   start_response(status, response_headers)

   # Return the response body.
   # Notice it is wrapped in a list although it could be any iterable.
   return [response_body]

This is the basic skeleton of an application. Since there is no server yet, this code cannot run on its own.
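Even without a server, a WSGI application is only a callable, so it can be exercised by hand. The sketch below assumes Python 3 (where the body must be bytes) and uses wsgiref.util.setup_testing_defaults to fill in a fake environ; `demo_app`, `call_app` and `fake_start_response` are hypothetical names introduced for this illustration.

```python
from wsgiref.util import setup_testing_defaults

def demo_app(environ, start_response):
    # Same shape as the application above, with a bytes body for Python 3.
    body = ("The request method was %s" % environ["REQUEST_METHOD"]).encode("utf-8")
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]

def call_app(app):
    """Call a WSGI app once, playing the role of the server."""
    captured = {}

    def fake_start_response(status, response_headers):
        # A real server would send these to the client; here they are recorded.
        captured["status"] = status
        captured["headers"] = response_headers

    environ = {}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, PATH_INFO, ...
    body = b"".join(app(environ, fake_start_response))
    return captured["status"], captured["headers"], body

status, headers, body = call_app(demo_app)
print(status, body)
```

Calling the application this way is also a convenient basis for unit tests, since no socket is involved.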

2013-12-07

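Web Development

Python CGI Tutorial 6: Sessions

A session is the server-side counterpart of a cookie. It is stored in a file or database on the server. Each session is identified by a session ID (SID).

Cookie-based SID

A cookie can keep the SID for a long time, until the cookie expires. This approach is faster and more secure, but it only works if the client's browser supports cookies.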

#!/usr/bin/env python

import sha, time, Cookie, os

cookie = Cookie.SimpleCookie()
string_cookie = os.environ.get('HTTP_COOKIE')

# If new session
if not string_cookie:
   # The sid will be a hash of the server time
   sid = sha.new(repr(time.time())).hexdigest()
   # Set the sid in the cookie
   cookie['sid'] = sid
   # Will expire in a year
   cookie['sid']['expires'] = 12 * 30 * 24 * 60 * 60
# If already existent session
else:
   cookie.load(string_cookie)
   sid = cookie['sid'].value

print cookie
print 'Content-Type: text/html\n'
print '<html><body>'

if string_cookie:
   print '<p>Already existent session</p>'
else:
   print '<p>New session</p>'

print '<p>SID =', sid, '</p>'
print '</body></html>'

We hash the server time to generate a unique session ID.
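A hash of the clock can be guessed by anyone who knows roughly when the session started, and it may repeat if two requests arrive at the same instant. As a hedged alternative that is not part of the original tutorial, the sketch below assumes Python 3.6 or later and uses the standard secrets module; `new_session_id` is a hypothetical helper name.

```python
import secrets

def new_session_id(num_bytes=16):
    """Return a random hexadecimal session ID.

    secrets.token_hex draws from the operating system's secure
    random source, so the value does not depend on the server time.
    """
    return secrets.token_hex(num_bytes)

sid = new_session_id()
print(sid)
```

Each byte becomes two hexadecimal digits, so the default of 16 bytes gives a 32-character SID.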

Query-string-based SID

#!/usr/bin/env python

import sha, time, cgi, os

sid = cgi.FieldStorage().getfirst('sid')

if sid: # If session exists
   message = 'Already existent session'
else: # New session
   # The sid will be a hash of the server time
   sid = sha.new(repr(time.time())).hexdigest()
   message = 'New session'

qs = 'sid=' + sid

# The "reload" link carries the sid back in the query string
print """\
Content-Type: text/html\n
<html><body>
<p>%s</p>
<p>SID = %s</p>
<p><a href="?sid=%s">reload</a></p>
</body></html>
""" % (message, sid, sid)

Here the SID is passed directly in the URL.

Using a hidden field

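#!/usr/bin/env python

import sha, time, cgi, os

sid = cgi.FieldStorage().getfirst('sid')

if sid: # If session exists
   message = 'Already existent session'
else: # New session
   # The sid will be a hash of the server time
   sid = sha.new(repr(time.time())).hexdigest()
   message = 'New session'

qs = 'sid=' + sid

# The form posts the sid back in a hidden input
print """\
Content-Type: text/html\n
<html><body>
<p>%s</p>
<p>SID = %s</p>
<form method="post">
<input type="hidden" name="sid" value="%s">
<input type="submit" value="Submit">
</form>
</body></html>
""" % (message, sid, sid)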

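Here the SID is submitted as a hidden field in a form.

The shelve module

A session ID alone is not enough; the session data also has to be saved to a file or database. The shelve module can be used to save it to a file.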

session = shelve.open('/tmp/.session/sess_' + sid, writeback=True)

This opens the file and returns a dictionary-like object.

session['lastvisit'] = repr(time.time())

This sets a value in the session.

lastvisit = session.get('lastvisit')

This reads back the value that was just set.

session.close()

Finally, remember to close the file once you are done.
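Put together, the four steps above might look like the following sketch. It assumes Python 3, uses a temporary directory in place of /tmp/.session, and uses a placeholder SID; none of these names come from the original.

```python
import os
import shelve
import tempfile
import time

# Assumption: a throwaway directory stands in for /tmp/.session.
session_dir = tempfile.mkdtemp()
sid = "example-sid"  # placeholder session ID
path = os.path.join(session_dir, "sess_" + sid)

# First request: open the shelf, store a value, close it.
session = shelve.open(path, writeback=True)
session["lastvisit"] = repr(time.time())
session.close()

# Later request: reopen the same file and read the value back.
session = shelve.open(path)
lastvisit = session.get("lastvisit")
session.close()

print(lastvisit)
```

Closing the shelf is what flushes the data to disk, which is why the value is still there when the file is reopened.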

Cookie and shelve

The next example shows Cookie and shelve used together.

#!/usr/bin/env python
import sha, time, Cookie, os, shelve

cookie = Cookie.SimpleCookie()
string_cookie = os.environ.get('HTTP_COOKIE')

if not string_cookie:
   sid = sha.new(repr(time.time())).hexdigest()
   cookie['sid'] = sid
   message = 'New session'
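else:
   cookie.load(string_cookie)
   sid = cookie['sid'].value
   # Default message in case the session holds no 'lastvisit' yet
   message = 'Already existent session'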
cookie['sid']['expires'] = 12 * 30 * 24 * 60 * 60

# The shelve module will persist the session data
# and expose it as a dictionary
session_dir = os.environ['DOCUMENT_ROOT'] + '/tmp/.session'
session = shelve.open(session_dir + '/sess_' + sid, writeback=True)

# Retrieve last visit time from the session
lastvisit = session.get('lastvisit')
if lastvisit:
   message = 'Welcome back. Your last visit was at ' + \
      time.asctime(time.gmtime(float(lastvisit)))
# Save the current time in the session, then close it to flush to disk
session['lastvisit'] = repr(time.time())
session.close()

print """\
%s
Content-Type: text/html\n
<html><body>
<p>%s</p>
<p>SID = %s</p>
</body></html>
""" % (cookie, message, sid)

2013-12-07

Web Development

Python CGI Tutorial 5: Cookies

Setting a cookie

There are two cookie operations: setting a cookie and reading it back.

The following example shows how to set a cookie.

#!/usr/bin/env python
import time

# This is the message that contains the cookie
# and will be sent in the HTTP header to the client
print 'Set-Cookie: lastvisit=' + str(time.time())

# To save one line of code
# we replaced the print command with a '\n'
print 'Content-Type: text/html\n'
# End of HTTP header

print '<html><body>'
print 'Server time is', time.asctime(time.localtime())
print '</body></html>'

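The cookie is set by sending a Set-Cookie line in the HTTP headers.

Retrieving a cookie

The cookie sent back by the browser is available in the os.environ dictionary under the key 'HTTP_COOKIE'. Here is an example: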

#!/usr/bin/env python
import Cookie, os, time

cookie = Cookie.SimpleCookie()
cookie['lastvisit'] = str(time.time())

print cookie
print 'Content-Type: text/html\n'

print '<html><body>'
print '<p>Server time is', time.asctime(time.localtime()), '</p>'

# The returned cookie is available in the os.environ dictionary
cookie_string = os.environ.get('HTTP_COOKIE')

# The first time the page is run there will be no cookies
if not cookie_string:
   print '<p>First visit or cookies disabled</p>'

else: # Run the page twice to retrieve the cookie
   print '<p>The returned cookie string was "' + cookie_string + '"</p>'

   # load() parses the cookie string
   cookie.load(cookie_string)
   # Use the value attribute of the cookie to get it
   lastvisit = float(cookie['lastvisit'].value)
   
   print '<p>Your last visit was at',
   print time.asctime(time.localtime(lastvisit)), '</p>'

print '</body></html>'

The load() method of the SimpleCookie object parses the cookie string.
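The parsing step can be tried without a web server. This sketch assumes Python 3, where the Cookie module is named http.cookies; the cookie string is a made-up example of what a browser might send.

```python
from http.cookies import SimpleCookie

# A header value as a browser might send it in HTTP_COOKIE
# (the values here are made up for the example).
cookie_string = "lastvisit=1386400000.0; sid=abc123"

cookie = SimpleCookie()
cookie.load(cookie_string)  # parse the raw string into morsels

# Each morsel exposes the stored text through .value
lastvisit = float(cookie["lastvisit"].value)
sid = cookie["sid"].value

print(lastvisit, sid)
```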

2013-12-07

Web Development

Python CGI Tutorial 4: Running Shell Commands

A CGI script can run shell commands with subprocess.Popen or os.popen4.

#!/usr/bin/python
import cgitb; cgitb.enable()

# The subprocess module is new in 2.4
import os, urllib, subprocess as sub

# Retrieve the command from the query string
# and unencode the escaped %xx chars
str_command = urllib.unquote(os.environ['QUERY_STRING'])

p = sub.Popen(['/bin/bash', '-c', str_command], 
    stdout=sub.PIPE, stderr=sub.STDOUT)
output = urllib.unquote(p.stdout.read())

print """\
Content-Type: text/html\n
<html><body>
<pre>
$ %s
%s
</pre>
</body></html>
""" % (str_command, output)

Note: this is only an example. Using it like this in production is extremely unsafe.

Cookies and sessions can be used to authenticate users and improve security.
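Authentication limits who can send commands, but not what they send. One further precaution, which is not part of the original tutorial, is to accept only command names from a fixed allow-list and never pass user text to a shell. The sketch below assumes Python 3.5 or later; `ALLOWED_COMMANDS` and `run_allowed` are hypothetical names, and the two commands are placeholders.

```python
import subprocess
import sys

# Assumption: only these named commands may ever be run.
# Each entry is a fixed argument list, so no user text reaches a shell.
ALLOWED_COMMANDS = {
    "version": [sys.executable, "--version"],
    "hello": [sys.executable, "-c", "print('hello')"],
}

def run_allowed(name):
    """Run a pre-approved command by name and return its output as text."""
    if name not in ALLOWED_COMMANDS:
        raise ValueError("command not allowed: %r" % name)
    result = subprocess.run(
        ALLOWED_COMMANDS[name],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,  # decode output to str
    )
    return result.stdout

print(run_allowed("hello"))
```

The query string then only selects a key in the dictionary; anything else is rejected before a process is started.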

2013-12-07

Web Development

Python CGI Tutorial 3: Forms

The cgi module provides a FieldStorage class for processing forms.

Single field name

Given the following HTML form:

<html><body>
<form method="get" action="/cgi-bin/form1.py">
Name: <input type="text" name="name">
<input type="submit" value="Submit">
</form>
</body></html>

form1.py contains:

#!/usr/bin/env python
import cgi
form = cgi.FieldStorage() # instantiate only once!
name = form.getfirst('name', 'empty')

# Avoid script injection escaping the user input
name = cgi.escape(name)

print """\
Content-Type: text/html\n
<html><body>
<p>The submitted name was "%s"</p>
</body></html>
""" % name

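The getfirst method returns the first value of the named field. If the field does not exist it returns the default given as the second argument ('empty' here), or None when no default is supplied. It works the same way if the form method is changed to post.

To guard against dangerous user-submitted content, use cgi.escape() to escape it.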
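The effect of escaping is easiest to see on a concrete string. This sketch assumes Python 3, where the equivalent function is html.escape; the input is a made-up example of hostile form data.

```python
from html import escape

user_input = '<script>alert("x")</script>'

# escape() replaces &, < and > with HTML entities, and by default
# the quote characters as well.
safe = escape(user_input)

print(safe)
```

Once escaped, the browser displays the text literally instead of running it as a script.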

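Multiple fields with the same name

When several fields share the same name, use the getlist() method. It returns a list containing all of their values.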

<html><body>
<form method="post" action="/cgi-bin/form2.py">
Red<input type="checkbox" name="color" value="red">
Green<input type="checkbox" name="color" value="green">
<input type="submit" value="Submit">
</form>
</body></html>

form2.py contains:

#!/usr/bin/env python
import cgi
form = cgi.FieldStorage()

# getlist() returns a list containing the
# values of the fields with the given name
colors = form.getlist('color')

print "Content-Type: text/html\n"
print '<html><body>'
print 'The colors list:', colors
for color in colors:
   print '<p>', cgi.escape(color), '</p>'
print '</body></html>'

File upload

<html><body>
<form enctype="multipart/form-data" action="/cgi-bin/form3.py" method="post">
<p>File: <input type="file" name="file"></p>
<p><input type="submit" value="Upload"></p>
</form>
</body></html>

getfirst() and getlist() return only the file's content. To get the file name, access the nested FieldStorage item itself.

form3.py contains:

#!/usr/bin/env python
import cgi, os
import cgitb; cgitb.enable()

try: # Windows needs stdio set for binary mode.
    import msvcrt
    msvcrt.setmode (0, os.O_BINARY) # stdin  = 0
    msvcrt.setmode (1, os.O_BINARY) # stdout = 1
except ImportError:
    pass

form = cgi.FieldStorage()

# A nested FieldStorage instance holds the file
fileitem = form['file']

# Test if the file was uploaded
if fileitem.filename:

   # strip leading path from file name to avoid directory traversal attacks
   fn = os.path.basename(fileitem.filename)
   open('files/' + fn, 'wb').write(fileitem.file.read())
   message = 'The file "' + fn + '" was uploaded successfully'

else:
   message = 'No file was uploaded'

print """\
Content-Type: text/html\n
<html><body>
<p>%s</p>
</body></html>
""" % (message,)

Uploading large files

When a file is too large to hold in memory, a generator can be used to read it in small chunks.
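The chunking idea can be tried in isolation before wiring it into a CGI script. This sketch assumes Python 3 and uses in-memory streams in place of the uploaded file and the destination file; `upload`, `destination` and `chunk_sizes` are illustrative names.

```python
import io

def fbuffer(f, chunk_size=10000):
    """Yield successive chunks of at most chunk_size bytes from file object f."""
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        yield chunk

# Assumption: an in-memory stream stands in for the uploaded file.
upload = io.BytesIO(b"x" * 25000)
destination = io.BytesIO()

chunk_sizes = []
for chunk in fbuffer(upload):
    destination.write(chunk)
    chunk_sizes.append(len(chunk))

print(chunk_sizes)
```

Only one chunk is held in memory at a time, so peak memory use depends on chunk_size rather than on the size of the upload.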

The previous script can be rewritten as follows:

#!/usr/bin/env python
import cgi, os
import cgitb; cgitb.enable()

try: # Windows needs stdio set for binary mode.
    import msvcrt
    msvcrt.setmode (0, os.O_BINARY) # stdin  = 0
    msvcrt.setmode (1, os.O_BINARY) # stdout = 1
except ImportError:
    pass

form = cgi.FieldStorage()

# Generator to buffer file chunks
def fbuffer(f, chunk_size=10000):
   while True:
      chunk = f.read(chunk_size)
      if not chunk: break
      yield chunk

# A nested FieldStorage instance holds the file
fileitem = form['file']

# Test if the file was uploaded
if fileitem.filename:

   # strip leading path from file name to avoid directory traversal attacks
   fn = os.path.basename(fileitem.filename)
   f = open('files/' + fn, 'wb', 10000)

   # Read the file in chunks
   for chunk in fbuffer(fileitem.file):
      f.write(chunk)
   f.close()
   message = 'The file "' + fn + '" was uploaded successfully'

else:
   message = 'No file was uploaded'

print """\
Content-Type: text/html\n
<html><body>
<p>%s</p>
</body></html>
""" % (message,)

2013-12-06

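Web Development

Python CGI Tutorial 2: Debugging

Header errors can be hard to track down unless you have access to the server logs.

Fortunately Python has the cgitb module, which can place the exception traceback in the response body as HTML.

Here is a simple example: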

#!/usr/bin/env python
print "Content-Type: text/html\n"
import cgitb; cgitb.enable()
print 1/0

You can also call handler() to report an exception that you have caught yourself.

#!/usr/bin/env python
print "Content-Type: text/html"
print
import cgitb
try:
   f = open('non-existent-file.txt', 'r')
except:
   cgitb.handler()

A more direct approach is to set the Content-Type header to "text/plain" and redirect standard error to standard output.

print "Content-Type: text/plain"
print
import sys
sys.stderr = sys.stdout
f = open('non-existent-file.txt', 'r')

Note: these techniques are for development only. Disable them in production so that attackers cannot exploit the exception details.
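For production, one possible arrangement is to write the traceback to a log and show the visitor only a generic message. The sketch below is an assumption about how that could be done, not the tutorial's method; it uses Python 3 and the standard traceback module, and `run_safely` and `broken_handler` are hypothetical names.

```python
import io
import traceback

def run_safely(handler, log):
    """Run handler(); on failure, log the traceback and return a generic page."""
    try:
        return handler()
    except Exception:
        # Full details go to the log, which only the developer can read.
        log.write(traceback.format_exc())
        # The visitor sees no internal information.
        return "Content-Type: text/plain\n\nInternal error"

def broken_handler():
    return str(1 / 0)  # raises ZeroDivisionError

log = io.StringIO()  # stands in for a log file on disk
page = run_safely(broken_handler, log)

print(page)
```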

2013-12-06

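Web Development

Python CGI Tutorial 1: Hello World

Introduction

CGI stands for Common Gateway Interface. It is a standard for passing data between the client and a program on the server.

A CGI program can be written in any language. It is usually placed in the cgi-bin directory of the web server (for example Apache).

Example

Here is a simple example.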

#!/usr/bin/env python
print "Content-Type: text/html"
print
print """\
<html>
<body>
<h2>Hello World!</h2>
</body>
</html>
"""

The first line of the script specifies the path to the Python interpreter. On your system it might instead be one of:

#!/usr/bin/python
#!/usr/bin/python2
#!c:\Python26\python.exe
#!c:\Python27\python.exe

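print "Content-Type: text/html"
print

The script must output an HTTP header section made up of one or more header lines, followed by a blank line. The blank line is required; it marks the end of the headers.

Here we want the output to be interpreted as HTML, so Content-Type is set to text/html.

This can also be written as:

print "Content-Type: text/html\n"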
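The rule of header lines followed by one blank line can be captured in a small helper. This is a sketch in Python 3; `build_cgi_response` is a hypothetical function name, not something provided by the cgi module.

```python
def build_cgi_response(headers, body):
    """Join header lines, a mandatory blank line, and the body."""
    header_lines = ["%s: %s" % (name, value) for name, value in headers]
    # The empty string between headers and body produces the blank line
    # that marks the end of the header section.
    return "\n".join(header_lines + ["", body])

html_body = "<html><body><h2>Hello World!</h2></body></html>"
response = build_cgi_response([("Content-Type", "text/html")], html_body)

print(response)
```

Building the response in one place makes it harder to forget the blank line when more headers are added later.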

Save the script above and make it executable. Then request it from a browser; you should see the words "Hello World".