    nohup python fcgihandler.py &

Then (re)start the web server with:

    /etc/init.d/lighttpd restart

Notice that FastCGI binds the web2py server to a Unix socket, not to an IP socket:

    /tmp/fcgi.sock

This is where Lighttpd forwards HTTP requests and where it receives the responses. Unix sockets are lighter than Internet sockets, and this is one of the reasons Lighttpd+FastCGI+web2py is fast. As in the case of Apache, it is possible to set up Lighttpd to serve static files directly, and to force some applications over HTTPS. Refer to the Lighttpd documentation for details.

The administrative interface must be disabled when web2py runs on a shared host with FastCGI, or it will be exposed to the other users.

11.8 Apache2 and mod_python in a shared hosting environment

There are times, specifically on shared hosts, when one does not have permission to edit the Apache configuration files directly. You can still run web2py. Here we show an example of how to set it up using mod_python⁶:

• Place the contents of web2py into the "htdocs" folder.

• In the web2py folder, create a file "web2py_modpython.py" with the following contents:

    from mod_python import apache
    import modpythonhandler

    def handler(req):
        # Hand the requested URL path to web2py as PATH_INFO
        req.subprocess_env['PATH_INFO'] = \
            req.subprocess_env['SCRIPT_URL']
        return modpythonhandler.handler(req)

• Create/update the file ".htaccess" with the following contents:

    SetHandler python-program
    PythonHandler web2py_modpython
    ##PythonDebug On

⁶ Examples provided by Niktar.

11.9 Setup Cherokee with FastCGI

Cherokee is a very fast web server and, like web2py, it provides an AJAX-enabled web-based interface for its configuration. Its web interface is written in Python. In addition, most configuration changes do not require a restart.
Here are the steps required to set up web2py with Cherokee:

• Download Cherokee [76].

• Untar, build, and install:

    tar -xzf cherokee-0.9.4.tar.gz
    cd cherokee-0.9.4
    ./configure --enable-fcgi && make
    make install

• Start web2py normally at least once to make sure it creates the "applications" folder.

• Write a shell script named "startweb2py.sh" with the following code:

    #!/bin/bash
    cd /var/web2py
    python /var/web2py/fcgihandler.py &

  Give the script execute privileges and run it. This will start web2py under the FastCGI handler.

• Start Cherokee and cherokee-admin:

    sudo nohup cherokee &
    sudo nohup cherokee-admin &

  By default, cherokee-admin only listens on the local interface on port 9090. This is not a problem if you have full, physical access to that machine. If this is not the case, you can force it to bind to an IP address and port by using the following options:

    -b, --bind[=IP]
    -p, --port=NUM

  or do an SSH port-forward (more secure, recommended):

    ssh -L 9090:localhost:9090 remotehost

• Open "http://localhost:9090" in your browser. If everything is OK, you will get cherokee-admin.

• In the cherokee-admin web interface, click "info sources". Choose "Local Interpreter". Enter the following values, then click "Add New":

    Nick: web2py
    Connection: /tmp/fcgi.sock
    Interpreter: /var/web2py/startweb2py.sh

• Click "Virtual Servers", then click "Default".

• Click "Behavior", then, under that, click "default".

• Choose "FastCGI" instead of "List and Send" from the list box.

• At the bottom, select "web2py" as "Application Server".

• Put a check in all the checkboxes (you can leave Allow-x-sendfile). If a warning is displayed, disable and re-enable one of the checkboxes. (This should automatically re-submit the application server parameter. Sometimes it does not, which is a bug.)

• Point your browser to "http://ipaddressofyoursite", and "Welcome to web2py" will appear.
11.10 Setup PostgreSQL

PostgreSQL is a free and open source database which is used in demanding production environments, for example, to store the .org domain name database, and has been proven to scale well into hundreds of terabytes of data. It has very fast and solid transaction support, and provides an auto-vacuum feature that frees the administrator from most database maintenance tasks.

On Ubuntu or another Debian-based Linux distribution, it is easy to install PostgreSQL and its Python API with:

    sudo apt-get -y install postgresql
    sudo apt-get -y install python-psycopg2

It is wise to run the web server(s) and the database server on different machines. In this case, the machines running the web servers should be connected to the database server through a secure internal (physical) network, or should establish SSL tunnels to connect to it securely.

Start the database server with:

    sudo /etc/init.d/postgresql restart

When restarting, the PostgreSQL server should report which port it is running on. Unless you have multiple database servers, it should be 5432.

The PostgreSQL configuration file is:

    /etc/postgresql/x.x/main/postgresql.conf

(where x.x is the version number).

The PostgreSQL logs are in:

    /var/log/postgresql/

Once the database server is up and running, create a user and a database so that web2py applications can use it:

    sudo -u postgres createuser -P -s myuser
    createdb mydb
    echo 'The following databases have been created:'
    psql -l
    psql mydb

The first of these commands creates a new user, called myuser, and grants it superuser access. It will prompt you for a password.

Any web2py application can connect to this database with the command:

    db = DAL("postgres://myuser:mypassword@localhost:5432/mydb")

where mypassword is the password you entered when prompted, and 5432 is the port where the database server is running.
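The connection string shown above has the shape postgres://user:password@host:port/dbname. As a minimal sketch, the helper below assembles such a string from its parts. The function name postgres_uri is hypothetical and is not part of web2py; it performs no escaping, so it assumes the user name and password contain no characters such as "@", ":" or "/".

```python
def postgres_uri(user, password, dbname, host="localhost", port=5432):
    """Hypothetical helper: build a URI of the form used in the text.

    No escaping is done, so the parts are assumed to be free of
    characters that have a meaning inside the URI.
    """
    return "postgres://%s:%s@%s:%d/%s" % (user, password, host, int(port), dbname)


uri = postgres_uri("myuser", "mypassword", "mydb")
print(uri)
```

In a model file, the resulting string would be passed to DAL in place of the literal URI.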
Normally you use one database for each application, and multiple instances of the same application connect to the same database. It is also possible for different applications to share the same database.

For database backup details, read the PostgreSQL documentation; specifically the commands pg_dump and pg_restore.

11.11 Security Issues

It is very dangerous to publicly expose the admin application and the appadmin controllers unless they run over HTTPS. Moreover, your password and credentials should never be transmitted unencrypted. This is true for web2py and any other web application.

In your applications, if they require authentication, you should make the session cookies secure with:

    session.secure()

An easy way to set up a secure production environment on a server is to first stop web2py and then remove all the parameters_*.py files from the web2py installation folder. Then start web2py without a password. This will completely disable admin and appadmin.

Next, start a second Python instance accessible only from localhost:

    nohup python web2py.py -p 8001 -i 127.0.0.1 -a '<ask>' &

and create an SSH tunnel from the local machine (the one from which you wish to access the administrative interface) to the server (the one where web2py is running, example.com), using:

    ssh -L 8001:127.0.0.1:8001 username@example.com

Now you can access the administrative interface locally via the web browser at localhost:8001.

This configuration is secure because admin is not reachable when the tunnel is closed (the user is logged out).

This solution is secure on shared hosts if and only if other users do not have read access to the folder that contains web2py; otherwise users may be able to steal session cookies directly from the server.

11.12 Scalability Issues

web2py is designed to be easy to deploy and to set up. This does not mean that it compromises on efficiency or scalability, but it does mean that you may need to tweak it to make it scalable.
In this section we assume multiple web2py installations behind a NAT server that provides local load-balancing.

In this case, web2py works out-of-the-box if some conditions are met. In particular, all instances of each web2py application must access the same database server and must see the same files. This latter condition can be implemented by making the following folders shared:

    applications/myapp/sessions
    applications/myapp/errors
    applications/myapp/uploads
    applications/myapp/cache

The shared folders must support file locking. Possible solutions are ZFS⁷, NFS⁸, or Samba (SMB).

⁷ ZFS was developed by Sun Microsystems and is the preferred choice.
⁸ With NFS you may need to run the nlockmgr daemon to allow file locking.

It is possible, but not a good idea, to share the entire web2py folder or the entire applications folder, because this would cause a needless increase in network bandwidth usage.

We believe the configuration discussed above to be very scalable because it reduces the database load by moving to the shared filesystems those resources that need to be shared but do not need transactional safety (only one client at a time is supposed to access a session file, cache always needs a global lock, and uploads and errors are write-once/read-many files).

Ideally, both the database and the shared storage should have RAID capability. Do not make the mistake of storing the database on the same storage as the shared folders, or you will create a new bottleneck there.

On a case-by-case basis, you may need to perform additional optimizations, which we discuss below. In particular, we discuss how to get rid of these shared folders one by one, and how to store the associated data in the database instead. While this is possible, it is not necessarily a good solution. Nevertheless, there may be reasons to do so. One such reason is that sometimes we do not have the freedom to set up shared folders.
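The requirement that shared folders support file locking can be probed before a folder is put into service. The following is a minimal sketch, assuming a POSIX system where Python's fcntl module is available; the function name supports_locking is hypothetical and is not part of web2py. It only checks that the local machine can take a lock on a file in the folder, which is a necessary condition but does not by itself prove that locks are honoured across all machines that mount it.

```python
import fcntl
import os
import tempfile


def supports_locking(folder):
    """Hypothetical check: can a POSIX lock be taken on a file in folder?"""
    # Create a scratch file inside the folder under test.
    fd, path = tempfile.mkstemp(dir=folder)
    try:
        try:
            # Try to take, then release, a non-blocking exclusive lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            fcntl.lockf(fd, fcntl.LOCK_UN)
            return True
        except OSError:
            # The filesystem refused the lock request.
            return False
    finally:
        # Always remove the scratch file.
        os.close(fd)
        os.unlink(path)
```

It could be called as supports_locking('applications/myapp/sessions') on each machine that mounts the shared folder.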
Sessions in Database

It is possible to instruct web2py to store sessions in a database instead of in the sessions folder. This has to be done for each individual web2py application, although they may all use the same database to store sessions.

Given a database connection

    db = DAL(...)

you can store the sessions in this database (db) by simply stating the following, in the same model file that establishes the connection:

    session.connect(request, response, db)

If it does not exist already, web2py creates a table in the database called web2py_session_appname containing the following fields:

    Field('locked', 'boolean', default=False),
    Field('client_ip'),
    Field('created_datetime', 'datetime', default=now),
    Field('modified_datetime', 'datetime'),
    Field('unique_key'),
    Field('session_data', 'text')

"unique_key" is a uuid key used to identify the session in the cookie. "session_data" is the cPickled session data.

To minimize database access, you should avoid storing sessions when they are not needed with:

    session.forget()

With this tweak the "sessions" folder does not need to be a shared folder because it will no longer be accessed.

Notice that, if sessions are disabled, you must not pass the session to form.accepts and you cannot use session.flash or CRUD.

Pound, a High Availability Load Balancer

If you need multiple web2py processes running on multiple machines, instead of storing sessions in the database or in cache, you have the option to use a load balancer with sticky sessions.

Pound [78] is an HTTP load balancer and reverse proxy that provides sticky sessions.

By sticky sessions, we mean that once a session cookie has been issued, the load balancer will always route requests from the client associated with that session to the same server. This allows you to store the session in the local filesystem.
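The idea of sticky sessions can be illustrated with a toy router. This is only a sketch of the concept and makes no claim about how Pound is implemented: the class name StickyBalancer, its parameters, and the round-robin choice for new clients are all assumptions made for the example. Each client identifier (for instance an IP address) is remembered together with the backend it was first sent to, and the mapping expires after ttl seconds of inactivity.

```python
import time


class StickyBalancer:
    """Toy sticky-session router (illustrative only, not Pound's code)."""

    def __init__(self, backends, ttl=3600, clock=time.time):
        self.backends = list(backends)
        self.ttl = ttl          # seconds a client stays pinned while idle
        self.clock = clock      # injectable clock, useful for testing
        self._table = {}        # client id -> (backend, last seen time)
        self._next = 0          # round-robin counter for new clients

    def route(self, client_id):
        now = self.clock()
        entry = self._table.get(client_id)
        if entry is not None and now - entry[1] <= self.ttl:
            # Known client with a live mapping: reuse its backend.
            backend = entry[0]
        else:
            # New or expired client: pick the next backend in turn.
            backend = self.backends[self._next % len(self.backends)]
            self._next += 1
        # Refresh the last-seen time on every request.
        self._table[client_id] = (backend, now)
        return backend
```

Because a given client keeps reaching the same backend, that backend can keep the session file on its own disk.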
To use Pound:

First, install Pound on our Ubuntu test machine:

    sudo apt-get -y install pound

Second, edit the configuration file "/etc/pound/pound.cfg" and enable Pound at startup:

    startup=1

Bind it to a socket (IP, port):

    ListenHTTP 123.123.123.123,80

Specify the IP addresses and ports of the machines in the farm running web2py:

    UrlGroup ".*"
    BackEnd 192.168.1.1,80,1
    BackEnd 192.168.1.2,80,1
    BackEnd 192.168.1.3,80,1
    Session IP 3600
    EndGroup

The ",1" indicates the relative strength of the machines. The "Session" line maintains sessions by client IP for 3600 seconds.

Third, enable this config file and start Pound:

    /etc/default/pound

Cleanup Sessions

If you choose to keep your sessions in the filesystem, you should be aware that in a production environment they pile up fast. web2py provides a script called:

    scripts/sessions2trash.py

that, when run in the background, periodically deletes all sessions that have not been accessed for a certain amount of time. This is the content of the script:

    SLEEP_MINUTES = 5
    EXPIRATION_MINUTES = 60
    import os, time, stat
    path = os.path.join(request.folder, 'sessions')
    while 1:
        now = time.time()
        for file in os.listdir(path):
            filename = os.path.join(path, file)
            # Last modification time of the session file
            t = os.stat(filename)[stat.ST_MTIME]
            if now - t > EXPIRATION_MINUTES * 60:
                os.unlink(filename)
        time.sleep(SLEEP_MINUTES * 60)

You can run the script with the following command:

    nohup python web2py.py -S yourapp -R scripts/sessions2trash.py &

where yourapp is the name of your application.

Upload Files in Database

By default, all uploaded files handled by SQLFORMs are safely renamed and stored in the filesystem under the "uploads" folder. It is possible to instruct web2py to store uploaded files in the database instead.

Consider the following table:

    db.define_table('dog',
        Field('name'),
        Field('image', 'upload'))

where dog.image is of type upload.
To make the uploaded image go in the same record as the name of the dog, you must modify the table definition by adding a blob field and linking it to the upload field:

    db.define_table('dog',
        Field('name'),
        Field('image', 'upload', uploadfield='image_data'),
        Field('image_data', 'blob'))

Here "image_data" is just an arbitrary name for the new blob field.

The uploadfield argument instructs web2py to safely rename uploaded images as usual, store the new name in the image field, and store the data in the field called "image_data" instead of storing the data on the filesystem. All of this is done automatically by SQLFORMs and no other code needs to be changed.

With this tweak, the "uploads" folder is no longer needed.

On Google App Engine, files are stored in the database by default, without the need to define an uploadfield; one is created by default.

Collecting Tickets

By default, web2py stores tickets (errors) on the local file system. It would not make sense to store tickets directly in the database, because the most common origin of error in a production environment is database failure.

Storing tickets is never a bottleneck, because this is ordinarily a rare event. Hence, in a production environment with multiple concurrent servers, it is more than adequate to store them in a shared folder. Nevertheless, since only the administrator needs to retrieve tickets, it is also OK to store tickets in a non-shared local "errors" folder and periodically collect them and/or clear them.

One possibility is to periodically move all local tickets to a database.
For this purpose, web2py provides the following script:

    scripts/tickets2db.py

which contains:

    import sys
    import os
    import time
    import stat
    import datetime

    from gluon.utils import md5_hash
    from gluon.restricted import RestrictedError

    SLEEP_MINUTES = 5
    DB_URI = 'sqlite://tickets.db'
    ALLOW_DUPLICATES = True

    path = os.path.join(request.folder, 'errors')

    db = SQLDB(DB_URI)
    db.define_table('ticket', SQLField('app'), SQLField('name'),
                    SQLField('date_saved', 'datetime'), SQLField('layer'),
                    SQLField('traceback', 'text'), SQLField('code', 'text'))

    hashes = {}

    while 1:
        for file in os.listdir(path):
            filename = os.path.join(path, file)

            if not ALLOW_DUPLICATES:
                file_data = open(filename, 'r').read()
                key = md5_hash(file_data)

                if key in hashes:
                    continue

                hashes[key] = 1

            error = RestrictedError()
            error.load(request, request.application, filename)

            modified_time = os.stat(filename)[stat.ST_MTIME]
            modified_time = datetime.datetime.fromtimestamp(modified_time)

            db.ticket.insert(app=request.application,
                             date_saved=modified_time,
                             name=file,
                             layer=error.layer,
                             traceback=error.traceback,
                             code=error.code)

            os.unlink(filename)

        db.commit()
        time.sleep(SLEEP_MINUTES * 60)

This script should be edited. Change the DB_URI string so that it connects to your database server and run it with the command:

    nohup python web2py.py -S yourapp -M -R scripts/tickets2db.py &

where yourapp is the name of your application.

This script runs in the background and every 5 minutes moves all tickets to the database server, into a table called "ticket", and removes the local tickets. If ALLOW_DUPLICATES is set to False, it will only store tickets that correspond to different types of errors. With this tweak, the "errors" folder does not need to be a shared folder any more, since it will only be accessed locally.
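The duplicate filter used when ALLOW_DUPLICATES is False amounts to remembering a hash of every file already seen. The sketch below shows that idea in isolation, using the standard hashlib module instead of web2py's md5_hash so that it runs outside web2py; the function name unique_ticket_files is hypothetical.

```python
import hashlib
import os


def unique_ticket_files(folder):
    """Yield one path per distinct file content found in folder."""
    seen = set()
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        if digest in seen:
            # Same content as an earlier file: skip it.
            continue
        seen.add(digest)
        yield path
```

Files with identical content are reported only once, and the first one in alphabetical order wins.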
Memcache

We have shown that web2py provides two types of cache: cache.ram and cache.disk. Both can be used in a distributed environment with multiple concurrent servers, but they do not behave as expected. In particular, cache.ram will only cache at the server level; thus it becomes useless. cache.disk will also cache at the server level unless the "cache" folder is a shared folder that supports locking; thus, instead of speeding things up, it becomes a major bottleneck.

The solution is not to use them, but to use memcache instead. web2py comes with a memcache API.

To use memcache, create a new model file, for example 0_memcache.py, and in this file write (or append) the following code:

    from gluon.contrib.memcache import MemcacheClient
    memcache_servers = ['127.0.0.1:11211']
    cache.memcache = MemcacheClient(request, memcache_servers)
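To see why a cache held in the memory of each server stops helping behind a load balancer, consider the toy cache below. It is not web2py's cache.ram; the class name LocalCache and its call signature are assumptions made for this sketch. Two instances stand in for two server processes, and each keeps its own private store.

```python
import time


class LocalCache:
    """Toy per-process cache (illustrative only, not web2py's cache.ram)."""

    def __init__(self):
        self._store = {}  # key -> (time stored, value)

    def __call__(self, key, f, time_expire=300):
        now = time.time()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < time_expire:
            return hit[1]          # fresh entry: reuse it
        value = f()                # miss or stale: recompute
        self._store[key] = (now, value)
        return value


calls = []


def expensive():
    # Record each real computation so the misses can be counted.
    calls.append(1)
    return len(calls)


server_a = LocalCache()
server_b = LocalCache()
server_a("k", expensive)   # miss on server A
server_a("k", expensive)   # hit on server A
server_b("k", expensive)   # miss again: server B has its own store
print(len(calls))
```

The expensive function runs once per server rather than once overall, which is the behaviour a shared memcache server is meant to avoid.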