Installing the Pfam website from scratch can be a tricky and time-consuming task. To make it easier for users to install and run the site locally, we have built a virtual machine (VM) with everything necessary pre-installed and, as far as possible, pre-configured.
The following pages explain how to install the virtual machine in your local virtualisation software and then configure it to run the website. With the exception of the "Install VM" section, you can find exactly the same information in a set of text files in the home directory of the main "pfamadmin" user account within the VM itself.
Note that this VM encapsulates only the code needed to run the Pfam website. It does not include the Pfam database, since that runs to several hundred gigabytes when installed and running. You will need to retrieve and install the database separately. The VM also does not include the code needed to run sequence and other searches through the website. That functionality requires another set of scripts and software and, in our case, an entirely separate set of machines. We'll look at building another VM for that task later, if there's interest.
Setting up the VM
These are the basic steps involved:
- Install the virtual machine image
- Configure the virtual machine itself
- Install the Pfam database and create the "web_user" database
- Configure the PfamWeb application
- Set up the cron jobs
- Start PfamWeb and lighttpd
- Enable monit to make the site start on boot
|Operating system||Ubuntu 11.10 (oneiric) (64 bit)|
|Last package update||2012-01-24|
|Hard drive||20 Gb (virtual size, dynamically allocated)|
|OVA file||PfamWeb_20120124.ova (1.0Gb) (md5 checksum 9777f4723a4a3f65ffa2ac8911a96115)|
Install the VM image
This page gives some pointers on how to install the virtual machine image in your local virtualisation software. The description covers the process in two of the most popular packages, VMware Fusion (for Mac) and VirtualBox (for Mac, Linux and Windows).
Note: the VM was built using VMware and tested only briefly under VirtualBox. These instructions are based on running the software on a desktop machine, rather than on a full-blown virtualisation infrastructure.
The VM is packaged as an Open Virtual Appliance (OVA), essentially a bundle of files that make up an Open Virtualisation Format (OVF) virtual machine. The package should be usable in many virtualisation frameworks. You can find the OVA file on our FTP area.
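Before importing the OVA it can be worth confirming that the download is intact. The following is a minimal sketch of one way to compare a file against the md5 checksum quoted in the table above. It assumes the GNU "md5sum" tool is available on the host (on a Mac, "md5 -q" is the usual equivalent); the function name "verify_md5" is ours, not part of the Pfam distribution.

```shell
#!/bin/sh
# Sketch only: verify_md5 is a hypothetical helper, not shipped with the VM.
# Assumes GNU coreutils "md5sum" is installed on the host.

# verify_md5 FILE EXPECTED_MD5
# Exits with status 0 only if the checksum of FILE equals EXPECTED_MD5.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<checksum>  <filename>"; keep only the first field
    actual=$(md5sum "$file" | awk '{ print $1 }')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Example call, using the file name and checksum given in the table:
#   verify_md5 PfamWeb_20120124.ova 9777f4723a4a3f65ffa2ac8911a96115
```

If the function reports a mismatch, download the file again before trying to import it.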
The OVA package must be converted to a VMX file for use in
VMware. This is done using
(You may need to install
The OVA file can be imported directly into VirtualBox. Choose "Import" from the "File" menu.
It's useful to start the VM at least once under the GUI of your virtualisation software, if only to check that it boots and allows you to log in. If you are using VirtualBox, you will also need to install the "guest additions" using the GUI.
Once you've checked that you can successfully log into the VM and that it connects to your network etc., you may want to start it in "headless" mode, i.e. without a GUI interface. If you're planning to run the VM as a server for a long period, you will almost certainly want to run in this mode.
If you run your VM in a GUI, you will be able to log in directly, but if you run the VM in "headless" mode you may need to query the guest OS to find the assigned IP address.
The following script gives an example of how to retrieve the IP address of the VM semi-automatically using the "vmrun" command with VMware. The VMware tools must be installed in the guest before this script will work. The VM comes with VMware tools pre-installed, but if you update packages in the guest OS you may need to re-install them.
You can then log in using something like:
As with VMware, VirtualBox allows querying of the guest OS from the host. Also like VMware, VirtualBox requires software on the guest to allow the host to interact with it. These "guest additions" need to be installed before you can query the VM for information like its IP address.
Once the guest additions are installed, the VM can be interrogated from the host using VBoxManage, something like this:
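As a rough sketch of that query: "VBoxManage guestproperty get" prints a line of the form "Value: <address>" when the property is set and "No value set!" otherwise, so the address can be picked out with a small filter. The helper names "parse_guest_ip" and "vm_ip" are our own, and the VM name used in the example is whatever name you gave the VM when importing it.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; adjust to your own setup.

# parse_guest_ip: read "VBoxManage guestproperty get" output on stdin and
# print the value. Fails (non-zero exit) if no "Value:" line was seen.
parse_guest_ip() {
    awk '/^Value:/ { print $2; found = 1 } END { exit(found ? 0 : 1) }'
}

# vm_ip VM_NAME: ask VirtualBox for the IPv4 address of the guest's first
# network interface. Requires the guest additions to be installed.
vm_ip() {
    VBoxManage guestproperty get "$1" "/VirtualBox/GuestInfo/Net/0/V4/IP" \
        | parse_guest_ip
}

# Example (assuming the VM was imported under the name "PfamWeb"):
#   ssh pfamadmin@"$(vm_ip PfamWeb)"
```

Keeping the parsing separate from the VBoxManage call makes it easy to check the filter without a running VM.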
The VM, as shipped, has 1Gb of memory, 20Gb of disk space and a single CPU core. These are probably the minimum requirements for running the PfamWeb server. If your server will be used heavily, you may want to consider increasing the memory allocated to the VM to 2Gb or more. Adding more CPUs or CPU cores will improve the performance of the website, though the specification of your database server will also have a significant effect. Consult the documentation for your virtualisation software to find out how to re-configure the VM.
The VM is configured with a single user account, "pfamadmin". The password for the account is "admin password". It is highly recommended that you change the password immediately after you log in for the first time.
Installing the guest additions into VirtualBox can be somewhat involved. Here are some pointers to getting it to work.
Mount the guest additions CD in the VM
VirtualBox can present the guest additions installer to the VM as a CD image. Choose "Install Guest Additions" from the "Devices" menu of the VM, which effectively inserts a CD into the virtual CD drive of the virtual machine. You then need to mount the CD:
Next you need to install the necessary pre-requisites for building linux kernel modules: the guest additions include a module that allows the host to interact with the guest.
First, install the module-assistant package, which helps install the required kernel headers and other essentials for building kernel modules. Note: since this involves using apt-get to download and install packages, you may need to configure apt-get to use your web proxy before you can do this.
Install the guest additions
Finally, run the installer:
If the installation has been successful, you should now be able to probe the VM from the host machine:
In particular you should now see an entry /VirtualBox/GuestInfo/Net/0/V4/IP, which was not previously visible. The value for that property gives the IP address for the VM.
Configure the virtual machine
This file documents the steps needed to set up the virtual machine itself.

jt6 20111012 WTSI

1. Change the "pfamadmin" password
==================================

On traditional unix systems, the user "root" had total control over all files
and processes. With modern linux distributions such as Ubuntu, the root account
is locked by default and individual user accounts are instead granted
"super-user" permissions for certain operations using the "sudo" command.

This VM comes configured with a single user account, "pfamadmin", with a
default password of "admin password". Since this account can be used to make
any change to the system, it's important to reset the password immediately.

Use the "passwd" command. You will be asked to give the old password
("admin password") and then to enter the new password twice.

  pfamadmin@ubuntu:~$ passwd
  Changing password for pfamadmin.
  (current) UNIX password:
  Enter new UNIX password:
  Retype new UNIX password:
  passwd: password updated successfully
  pfamadmin@ubuntu:~$

If you'll be logging into the VM frequently, you'll probably want to set up
passwordless logins, using public/private keys for authentication instead:

  https://help.ubuntu.com/11.10/serverguide/C/openssh-server.html

2. Configure a web-proxy
========================

Some institutions require their users to access the wider internet through a
network gateway, or "proxy". Any software that needs to access the network will
need to be configured to direct requests through the proxy.

"apt-get"
---------

Ubuntu uses a packaging system for installing and maintaining software.
"apt-get" and related commands can be used to manage packages, including
updating them when bug-fixes or security patches are released.

If you need to configure your browser to use a web proxy in order to access the
web from your site, you will also need to configure "apt-get" to use the same
proxy, so that it can retrieve information about updates and the update
packages themselves.
Edit the following file:

  └── etc/
      └── apt/
          └── apt.conf.d
              └── 70debconf

Append the following line:

  Acquire::http::proxy "http://<proxy URL>:<proxy port>";

including the URL and port number for your site's proxy. If you need to use a
proxy for http connections, you'll probably also need to use one for secure
connections. If so, append a second line:

  Acquire::https::proxy "https://<secure proxy URL>:<secure proxy port>";

Check the Ubuntu documentation and "howtos" on the web for information on
keeping the VM up-to-date.

cpan
----

Perl modules are generally installed using a perl-specific packaging tool
called "cpan". Again, in order to contact repositories and to check for
updates, cpan needs to know if you use a web proxy:

  pfamadmin@ubuntu:~$ sudo cpan
  [sudo] password for pfamadmin:

  cpan shell -- CPAN exploration and modules installation (v1.9402)
  Enter 'h' for help.

  cpan> o conf http_proxy http://<proxy URL>:<proxy port>
      http_proxy         [http://<proxy URL>:<proxy port>]
  Please use 'o conf commit' to make the config permanent!

  cpan> o conf commit
  commit: wrote '/etc/perl/CPAN/Config.pm'

  cpan> q
  Lockfile removed.

You may need to use cpan to update perl modules occasionally, but mostly you
should leave the perl installation untouched.

In the shell
------------

Finally, you may want to configure the proxy in the ".bashrc" file, so that
it's set for all new shells.

  └── home/
      └── pfamadmin/
          └── .bashrc

Append the following lines:

  export http_proxy=<proxy URL>:<proxy port>
  export https_proxy=<secure proxy URL>:<secure proxy port>

3. Set up an email address to receive monitoring alerts
=======================================================

We use Monit (http://mmonit.com/monit/) to maintain and control the server
processes. When it detects a change in a service or when it restarts a failed
process, monit will send an email to an administrator to let them know about
the event.

The VM is configured so that monit will send event emails to the "pfamadmin"
user on the VM itself.
You may want to change the email address to the main address of the person who
will be administering the machine.

The monit configuration file is:

  └── etc/
      └── monit/
          └── conf.d/
              └── general

Edit "general" and change the email address on the "set alert" line:

  set alert pfamadmin@localhost

4. Configure the "physical" parameters of the VM
================================================

One of the advantages of a virtual machine is that its "physical"
characteristics can be changed easily. For example, the Pfam website VM is
initially configured as a single-core, 64-bit machine with 1Gb of RAM.

One parameter that you might choose to change is the available memory. If the
website is heavily used, you may want to increase the memory of the VM to
perhaps 4Gb or more, if available. Alternatively, if the host machine is
relatively small, you may want to reduce the memory allocated to the guest
(this VM), so that the host machine doesn't run out of memory itself.

Before you can alter the characteristics of the VM, you will probably need to
shut down the VM. Use:

  pfamadmin@ubuntu:~$ sudo shutdown -h now

to shut down the guest operating system cleanly, then reconfigure and restart
the VM using your virtualisation software.

5. Set your timezone
====================

The VM is configured for the "Europe/London" timezone. You will need to
reconfigure it if you're in a different zone. The safest way is to use
"dpkg-reconfigure tzdata", which presents you with a list of geographical areas
and timezones and sets the value in /etc/timezone for you:

  pfamadmin@ubuntu:~$ cat /etc/timezone
  America/Los_Angeles
  pfamadmin@ubuntu:~$ sudo dpkg-reconfigure tzdata

  Current default time zone: 'Europe/London'
  Local time is now:      Mon Jan 16 11:47:18 GMT 2012.
  Universal Time is now:  Mon Jan 16 11:47:18 UTC 2012.

  pfamadmin@ubuntu:~$ cat /etc/timezone
  Europe/London
  pfamadmin@ubuntu:~$

6. Configure mail
=================

The mail system on the VM is configured for local mail only.
Mail sent to the "root" user will also be forwarded to "pfamadmin"; you can
read it using "mail" or "alpine".

"root" is commonly sent notices about problems with the system, such as
warnings from monit, so you may want to configure the mail system to send mails
to a user outside of the VM. You will need to reconfigure the postfix system to
do that:

  pfamadmin@ubuntu:~$ sudo dpkg-reconfigure postfix
  ...

This will take you through a series of forms to set up the mail system as
needed.

7. Configure access to the machine from your network
====================================================

The VM is configured as a basic Ubuntu server, with no restrictions on access
or limits on who can view the website. You will need to configure security
features such as firewall rules or access restrictions according to your local
network policy. Please check the standard Ubuntu documentation for help on
security improvements to the machine.

Useful links
============

  http://en.wikipedia.org/wiki/Sudo
  https://help.ubuntu.com/community/InstallingSoftware
  http://mmonit.com/monit/
  http://en.wikipedia.org/wiki/Virtual_machine
  https://help.ubuntu.com/11.10/serverguide/C/postfix.html
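The proxy settings from step 2 have to be repeated in several files, so it can help to generate the lines from a single host name and port. This is only a sketch under our own assumptions: the function names are hypothetical, "proxy.example.org" and "3128" are placeholder values, and the shell variant writes the address with an explicit "http://" scheme, a form that most command-line tools accept.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers that print proxy configuration lines.
# Nothing is written to disk; redirect or pipe the output yourself.

# apt_proxy_lines HOST PORT
# Prints the line to append to /etc/apt/apt.conf.d/70debconf.
apt_proxy_lines() {
    printf 'Acquire::http::proxy "http://%s:%s";\n' "$1" "$2"
}

# shell_proxy_lines HOST PORT
# Prints the line to append to /home/pfamadmin/.bashrc.
shell_proxy_lines() {
    printf 'export http_proxy=http://%s:%s\n' "$1" "$2"
}

# Example (placeholder host and port):
#   apt_proxy_lines proxy.example.org 3128 | sudo tee -a /etc/apt/apt.conf.d/70debconf
#   shell_proxy_lines proxy.example.org 3128 >> ~/.bashrc
```

The cpan setting still needs to be made interactively with "o conf http_proxy", as shown in step 2.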
This file documents the steps involved with setting up the Pfam databases.

The Pfam release database uses around 300Gb of disk space when installed and
running, making it too large to distribute easily within a VM. Instead, you
will need to set up a local installation of MySQL and install the Pfam database
within it, using dump files available on the Pfam FTP area.

The Pfam website uses two separate databases. One, the main Pfam release
database, contains all of the data about Pfam families. The other, "web_user",
stores ancillary data, such as the content scraped from Wikipedia. You will
need to set up both databases in order to run the website.

jt6 20111012 WTSI

1. Install MySQL
================

You can download MySQL from:

  http://dev.mysql.com/downloads/mysql/

The main Pfam websites use MySQL 5.0 and we recommend that you install at least
version 5.0. Installing and configuring MySQL itself is beyond the scope of
this document, but there are many tutorials and guides on the web.

2. Install the Pfam database
============================

You can find the database dump files on the Pfam FTP area, at:

  ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/database_files/

The "database_files" directory contains two files for every table, a ".sql"
file with the SQL "CREATE TABLE" statement, and a ".txt" file with the raw
data. You will need to download and install all of the tables.
One way to load, for example, the VERSION table is:

  mysql> source VERSION.sql
  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.08 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.02 sec)

  Query OK, 0 rows affected (0.04 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  Query OK, 0 rows affected (0.00 sec)

  mysql> load data infile "/nfs/users/nfs_j/jt6/Downloads/VERSION.txt" into table VERSION;
  Query OK, 1 row affected (0.01 sec)
  Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

You can confirm that the table now has data by doing an extra query:

  mysql> select * from VERSION\G
  *************************** 1. row ***************************
  pfam_release: 25.0
  pfam_release_date: 2011-03-31
  swiss_prot_version: 2010_05
  trembl_version: 2010_05
  hmmer_version: 3.0b2
  pfamA_coverage: 76.7
  pfamB_additional_coverage: 7.9
  pfamA_residue_coverage: 53.9
  pfamB_additional_residue_coverage: 6.9
  number_families: 12273
  1 row in set (0.00 sec)

  mysql>

3. Create the "web_user" database
=================================

Creating the "web_user" database should be as simple as running an SQL script
to create the database and a set of empty tables. You can find the script for
generating the "web_user" database at:

  └── home/
      └── pfamadmin/
          └── create_web_user.sql

You can execute the script either from the shell using something like:

  pfamadmin@ubuntu:~$ mysql < create_web_user.sql

or from within the mysql client, with:

  mysql> source create_web_user.sql

Useful links
============

  http://dev.mysql.com/doc/refman/5.1/en/installing.html
  http://dev.mysql.com/doc/refman/5.0/en/batch-commands.html
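Loading every table by hand, as in the VERSION example in step 2, quickly becomes tedious. The sketch below generates the same pair of mysql client commands ("source" followed by "load data infile") for each table that has both a ".sql" and a ".txt" file in a download directory. The function name "gen_load_sql" is ours, and the sketch assumes that the files are uncompressed, that the MySQL server can read the ".txt" files at the given path, and that the order in which tables are loaded does not matter; none of these is guaranteed by the release files.

```shell
#!/bin/sh
# Sketch only: gen_load_sql is a hypothetical helper. It prints commands
# for the mysql client and does not contact any database itself.

# gen_load_sql DIR
# For every DIR/TABLE.sql with a matching DIR/TABLE.txt, print the
# commands that create the table and then load its data.
gen_load_sql() {
    dir=$1
    for sql in "$dir"/*.sql; do
        [ -e "$sql" ] || continue               # directory has no .sql files
        table=$(basename "$sql" .sql)
        [ -e "$dir/$table.txt" ] || continue    # no data file for this table
        printf 'source %s\n' "$sql"
        printf "load data infile '%s/%s.txt' into table %s;\n" \
            "$dir" "$table" "$table"
    done
}

# Example (the account name is a placeholder; "pfam_25_0" is the database
# name used in the PfamWeb configuration):
#   gen_load_sql /path/to/database_files | mysql -u <user> -p pfam_25_0
```

It is worth reviewing the generated commands before piping them into mysql, and then checking a table such as VERSION afterwards, as shown above.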
Configure PfamWeb application
This file explains how to configure the PfamWeb application.

The Pfam website is built as a Perl application, PfamWeb. The dynamic content
in the pages is retrieved from a pair of MySQL databases and the web
application needs to be configured to connect to them.

Sitting in front of the PfamWeb application is a light-weight web server,
lighttpd, which serves static content such as images and CSS files, and proxies
dynamic content from the perl application to the browser. The web server is
configured to use port 8000 by default, but this can be changed if necessary.

jt6 20111012 WTSI

1. Set the database connection parameters
=========================================

PfamWeb is the Perl application that runs the Pfam website. It's configured via
a pair of Apache-style configuration files:

  └── opt/
      └── www/
          └── conf/
              ├── pfamweb.conf
              └── pfamweb_local.conf

The only changes you should need to make will be in the "pfamweb_local.conf"
file. You should not need to edit the main "pfamweb.conf" file.

The main configuration parameters are the database connection settings:
  schema_class "PfamDB"
  connect_info "dbi:mysql:database=pfam_25_0;host=database_server;port=3306"
  connect_info pfamwebro
  connect_info

The first "connect_info" line specifies the data source name (DSN), essentially
the connection details of the database itself. You will need to have the host
name of the machine where MySQL is running, as well as the port where it's
running, and the name of the database itself (usually something like
"pfam_25_0").

The next two lines give the username and password for the database account
that will be used to connect to the Pfam database. We recommend making this a
read-only account. For further security you should restrict the account to
accessing only the Pfam database schema.

The next lines give the parameters for the DAS-components of the site. If your
VM will need to use a web-proxy to connect to the wider internet, you need to
give it here on the "dasProxy" line, otherwise leave the value blank ("").

Finally, you need to give connection details for the "web_user" schema. The
database account that you use here should have read/write access to the
web_user schema:

  dasDsn "http://das.sanger.ac.uk/das/pfam"
  dasTo 4
  dasProxy "http:// : "

  schema_class "WebUser"
  connect_info "dbi:mysql:database=web_user;host=database_server;port=3306"
  connect_info webuser
  connect_info

You should leave all other configuration parameters untouched.

2. Set the port number for the website
======================================

You may need to change the default port number (8000), depending on your
network environment. The port is set in one of the lighttpd configuration
files, "20-pfamweb.conf":

  └── etc/
      └── lighttpd/
          └── conf-available/
              └── 20-pfamweb.conf

Change the value on the following line:

  server.port = 8000

3. Configure PfamWeb to expect a front-end proxy
================================================

When designing network architectures, it's common to make all web traffic come
through a dedicated machine, a "front-end" proxy, which then directs traffic to
the appropriate internal machine, something like this:

           Internal network               |        Internet
                                          |
  +------------+      +------------+      |      +------------+
  |  PfamWeb   |<-----| Front-end  |<-~   |   ~--|   Client   |
  |   server   |      |   proxy    |      |      |  browser   |
  +------------+      +------------+      |      +------------+
  "request from        192.168.0.10       |      188.8.131.52
   184.108.40.206"                        |

If you plan to make your Pfam website accessible outside of your immediate
network, and if external traffic will be arriving via an internal proxy, you
may need to adjust the configuration of the PfamWeb application to tell it that
it is running behind a front-end proxy.

Telling PfamWeb about the proxy allows it to record the correct IP for incoming
requests, using the IP of the client rather than that of the proxy, which can
be important for auditing or for dealing with malicious traffic from the wider
network. More importantly, PfamWeb builds many of the URLs in the Pfam website
dynamically and these links can be incorrectly generated if requests are not
properly treated as coming from outside of your network.

To tell PfamWeb that it's running behind a front-end proxy, edit the local
PfamWeb configuration file:

  └── opt/
      └── www/
          └── conf/
              └── pfamweb_local.conf

Change the value of "using_frontend_proxy" to 1:

  using_frontend_proxy 1

Useful links
============

  http://search.cpan.org/dist/Config-General/
  http://redmine.lighttpd.net/wiki/lighttpd/Docs:ConfigurationOptions

AutoCommit 1
mysql_enable_utf8 1
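The port change in step 2 is a one-line edit, and the following sketch shows how it might be scripted with sed. The function name "set_lighttpd_port" is hypothetical, and the sketch assumes the file contains a single "server.port = N" line in the form shown above. It prints the modified file rather than editing it in place, so that the result can be inspected first.

```shell
#!/bin/sh
# Sketch only: set_lighttpd_port is a hypothetical helper. It prints the
# rewritten configuration on stdout and leaves the file itself untouched.

# set_lighttpd_port FILE PORT
# Replace the number on the "server.port = N" line with PORT.
set_lighttpd_port() {
    sed -E "s/^([[:space:]]*server\.port[[:space:]]*=[[:space:]]*)[0-9]+/\1$2/" "$1"
}

# Example (8080 is an arbitrary choice of port):
#   conf=/etc/lighttpd/conf-available/20-pfamweb.conf
#   set_lighttpd_port "$conf" 8080 > /tmp/20-pfamweb.conf
#   sudo cp /tmp/20-pfamweb.conf "$conf"
```

The web server has to be restarted before the new port takes effect. The monit status output for lighttpd reports a check against "localhost:8000", so the monit configuration for lighttpd probably needs the same port change; check the files under /etc/monit/conf.d/ to be sure.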
Configure cron jobs
This file explains how to configure the cron jobs that maintain some of the
ancillary data needed by the website.

jt6 20111012 WTSI

There are three cron scripts on the VM:

  mapping_cron.pl
  scrape_cron.pl
  update_das_sources.pl

The first script is responsible for updating the live mapping between Pfam
families and Wikipedia articles. The second downloads the content of any
Wikipedia articles that are referenced by a Pfam family. The final script
updates the list of available DAS sources from the DAS registry.

1. Edit the cron script configuration file
==========================================

All three scripts deposit data in the "web_user" database. The database
connection parameters are given in:

  └── opt/
      └── www/
          └── conf/
              └── crons.conf

The connection parameters are given in a slightly different format in this
file, compared to the other configuration files, but the host name, port and
account details will be the same:
  db_name  web_user
  db_host  database_server
  db_port  3306
  username webuser
  password

You can use the same database account for this as for the web_user database in
the PfamWeb configuration.

If you need to use a web proxy to access the web, you will need to configure it
in two places for the cron scripts. First, in the "crons.conf" file, set it on
the "das_proxy" line:

  das_proxy "http:// : "

Leave the value blank if you don't have to use a proxy.

2. Configure the crontab
========================

If you need to use a proxy, that also needs to be set in the crontab file. Edit
the crontab:

  pfamadmin@ubuntu:~$ crontab -e

which will drop you into your configured editor. If you would prefer to use a
different editor, you can quit without saving and then choose a different one
using the "select-editor" command at the shell prompt.

In the crontab file, look for the line:

  # http_proxy=http:// :

Add your server name and port, then uncomment the line by removing the leading
"#". If you don't need to use a proxy, leave that line untouched.

By default, the output of the cron scripts will be sent to the "pfamadmin" user
on the local machine. If you would prefer to choose a different address to
receive these logging emails, set the value of "MAILTO" as appropriate.

3. Enable the cron scripts
==========================

Again in the crontab ("crontab -e"), look for the following three lines and
remove the leading "#" to enable the jobs:

  # 00 00,12 * * * $PERLBIN /opt/www/PfamScripts/wiki/mapping_cron.pl -c $CRON_CONFIG
  # 01 01,13 * * * $PERLBIN /opt/www/PfamScripts/wiki/scrape_cron.pl -c $CRON_CONFIG
  # 30 01 * * * $PERLBIN /opt/www/PfamBackend/scripts/update_das_sources.pl -c $CRON_CONFIG

Save the file and exit the editor to install the new crontab.

4. Update the Wikipedia mapping and content
===========================================

You can now wait until the cron scripts run automatically overnight, or you can
run them manually to populate the two relevant tables immediately.
To run the scripts:

  pfamadmin@ubuntu:~$ export http_proxy=http:// :
  pfamadmin@ubuntu:~$ export PERL5LIB=/opt/www/PfamLib:/opt/www/PfamSchemata
  pfamadmin@ubuntu:~$ PERLBIN=/usr/bin/perl
  pfamadmin@ubuntu:~$ CRON_CONFIG=/opt/www/conf/crons.conf
  pfamadmin@ubuntu:~$ $PERLBIN /opt/www/PfamScripts/wiki/mapping_cron.pl -c $CRON_CONFIG
  main:::90 INFO: retrieved 2091 article-to-entry rows for db rfam
  main:::90 INFO: retrieved 4916 article-to-entry rows for db pfam
  main:::97 INFO: got 7007 accessions in final mapping
  pfamadmin@ubuntu:~$ $PERLBIN /opt/www/PfamScripts/wiki/scrape_cron.pl -c $CRON_CONFIG
  ...

The "http_proxy" and "PERL5LIB" variables are exported so that the perl
processes can see them; skip the "http_proxy" line if you don't use a proxy.

Useful links
============

  http://en.wikipedia.org/wiki/Cron
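Step 3 can also be done without opening an editor. The sketch below is one possible filter that strips the leading "#" from the three job lines and leaves every other line of the crontab alone. The function name "enable_pfam_crons" is ours, and the pattern assumes that the commented lines look exactly like those shown in step 3.

```shell
#!/bin/sh
# Sketch only: enable_pfam_crons is a hypothetical helper. It reads a
# crontab on stdin and writes the modified crontab to stdout.

# Uncomment lines that call $PERLBIN on one of the three cron scripts.
# Other comments, including the proxy line, are passed through unchanged.
enable_pfam_crons() {
    sed -E 's/^#[[:space:]]*(.*\$PERLBIN .*(mapping_cron|scrape_cron|update_das_sources)\.pl .*)$/\1/'
}

# Example: preview the result, then install it
#   crontab -l | enable_pfam_crons
#   crontab -l | enable_pfam_crons | crontab -
```

Previewing the output before installing it is a cheap way to confirm that only the intended three lines have changed.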
This document explains how to start the Pfam website on the VM.

Both the PfamWeb application and the lighttpd web server are controlled using
"monit". Monit takes care of starting the processes initially and then watches
to make sure they are running constantly, restarting them if they crash or
become unresponsive.

jt6 20111012 WTSI

1. Start PfamWeb
================

To start the PfamWeb application:

  pfamadmin@ubuntu:~$ sudo monit start pfamweb

There will be no output from this command, but you can check the progress of
the start-up using the shell alias "server":

  pfamadmin@ubuntu:~$ server
  root      3476     1  0 00:05 ?        00:00:00 /bin/sh /etc/init.d/pfamweb start
  www-data  3488     1 42 00:05 ?        00:00:01 perl /opt/www/PfamWeb/script/pfamweb_fastcgi.pl -M FCGI::ProcManager::MaxRequests -n 10 -l /tmp/pfamweb_fastcgi.socket -p /var/run/pfamweb.pid
  1000      3494  2000  0 00:05 pts/5    00:00:00 egrep --color=auto cgi|famweb|lighttpd

In this listing, process 3488 is the "master" server process starting up.
Checking again a few seconds later shows the server processes up and running:

  pfamadmin@ubuntu:~$ server
  www-data  3488     1  1 00:05 ?        00:00:04 perl-fcgi-pm [PfamWeb]
  www-data  3499  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3500  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3503  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3504  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3505  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3506  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3507  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3508  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3509  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3510  3488  0 00:05 ?        00:00:00 perl-fcgi
  1000      3536  2000  0 00:09 pts/5    00:00:00 egrep --color=auto cgi|famweb|lighttpd

2. Start lighttpd
=================

Similarly, starting lighttpd is done through monit:

  pfamadmin@ubuntu:~$ sudo monit start lighttpd

and again, the progress can be seen using "server":

  pfamadmin@ubuntu:~$ server
  www-data  3488     1  1 00:05 ?        00:00:04 perl-fcgi-pm [PfamWeb]
  www-data  3499  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3500  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3503  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3504  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3505  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3506  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3507  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3508  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3509  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3510  3488  0 00:05 ?        00:00:00 perl-fcgi
  www-data  3551     1  0 00:10 ?        00:00:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
  1000      3553  2000  0 00:10 pts/5    00:00:00 egrep --color=auto cgi|famweb|lighttpd

Now a lighttpd process (3551) can be seen in the list.

You can check that the server is working correctly using "curl":

  pfamadmin@ubuntu:~$ curl http://localhost:8000/
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
  <html>
  <head>
  <title>Pfam: Home page</title>
  <meta name="verify-v1" content="GjV+z5lf7mSCShhAOJZh1UW8J+iiCgWmbxIFg2GkG0Q=" />
  <meta name="verify-v1" content="FA9AR+bh3BmS05vcSp0mbiAB80DgELEAkFvu4q9ViC8=" />
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  ...

3. Stopping the website
=======================

If you need to shut down the website, you need to stop both the lighttpd and
PfamWeb processes. It's slightly cleaner to shut down lighttpd first, since
shutting down PfamWeb but leaving lighttpd running will cause the web server to
serve error messages to users. Shutting down lighttpd means simply that users
will not be able to connect to the site.

To stop lighttpd:

  pfamadmin@ubuntu:~$ sudo monit stop lighttpd

and to stop PfamWeb:

  pfamadmin@ubuntu:~$ sudo monit stop pfamweb

Although lighttpd responds very quickly to a "stop" command, PfamWeb can be
much slower. Check with "server" and wait for all "perl-fcgi" processes to
disappear.

4. Check process statuses using monit
=====================================

You can see what monit thinks is the current state of the server processes
using:

  pfamadmin@ubuntu:~$ sudo monit status
  The Monit daemon 5.2.1 uptime: 55m

  Process 'pfamweb'
    status                            running
    monitoring status                 monitored
    pid                               3488
    parent pid                        1
    uptime                            11m
    children                          10
    memory kilobytes                  108232
    memory kilobytes total            1147132
    memory percent                    10.5%
    memory percent total              112.1%
    cpu percent                       0.0%
    cpu percent total                 0.0%
    unix socket response time         5.005s to /tmp/pfamweb_fastcgi.socket [generic]
    data collected                    Thu Oct 13 00:16:56 2011

  Process 'lighttpd'
    status                            running
    monitoring status                 monitored
    pid                               3590
    parent pid                        1
    uptime                            0m
    children                          0
    memory kilobytes                  1388
    memory kilobytes total            1388
    memory percent                    0.1%
    memory percent total              0.1%
    cpu percent                       0.0%
    cpu percent total                 0.0%
    port response time                0.000s to localhost:8000 [DEFAULT via TCP]
    data collected                    Thu Oct 13 00:16:56 2011

  System 'system_ubuntu'
    status                            running
    monitoring status                 monitored
    load average                      [0.00] [0.02] [0.05]
    cpu                               0.1%us 0.7%sy 0.0%wa
    memory usage                      314188 kB [30.7%]
    swap usage                        1052 kB [0.1%]
    data collected                    Thu Oct 13 00:16:56 2011

The important lines are the "status" rows, which should be showing "running" if
all processes are up and working.

5. Check the server logs
========================

The web server keeps copious logs of usage and errors.
You can see basic access information in the lighttpd "access.log", while "error.log" gives a detailed log of what the PfamWeb application is doing:

  └── var/
      └── log/
          └── lighttpd/
              ├── access.log
              └── error.log

The access log shows the IP address of each incoming request, the time of the request and the resource that was requested, along with several other pieces of information that can be useful for debugging:

  pfamadmin@ubuntu:~$ sudo tail -f /var/log/lighttpd/access.log
  172.16.100.1 172.16.100.130:8000 - [13/Oct/2011:17:16:39 +0100] "GET /static/images/box_darker.gif HTTP/1.1" 200 2894 "http://172.16.100.130:8000/static/css/cb.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
  172.16.100.1 172.16.100.130:8000 - [13/Oct/2011:17:16:39 +0100] "GET /static/images/borders_darker.gif HTTP/1.1" 200 178 "http://172.16.100.130:8000/static/css/cb.css" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1"
  172.16.100.1 172.16.100.130:8000 - [13/Oct/2011:17:16:43 +0100] "GET /shared/images/__utm.gif?utmwv=1&utmn=1594200320&utmcs=UTF-8&utmsr=1920x1200&utmsc=24-bit&utmul=en-us&utmje=1&utmfl=10.3%20r183&utmdt=Pfam%3A%20Search%20Pfam&utmhn=172.16.100.130&utmr=-&utmp=/tab/search ...

By default, the error log shows a detailed debug log for every request that the PfamWeb application handles:

  pfamadmin@ubuntu:~$ sudo tail -f /var/log/lighttpd/error.log
  2011-10-14 11:35:55: (mod_fastcgi.c.2701) FastCGI-stderr: [info] *** Request 1 (0.000/s)  [Fri Oct 14 11:35:53 2011] ***
  [debug] "GET" request for "/" from "172.16.100.1"
  [debug] PfamWeb::index: generating site index
  [debug] Rendering template "pages/index.tt"
  [debug] Response Code: 200; Content-Type: text/html; charset=utf-8; Content-Length: 26289
  [info] Request took 1.709306s (0.585/s)
  .------------------------------------------------------------+-----------.
  | Action                                                     | Time      |
  +--------------
  2011-10-14 11:35:55: (mod_fastcgi.c.2701) FastCGI-stderr: ----------------------------------------------+-----------+
  | /auto                                                      | 1.596192s |
  | /index                                                     | 0.000794s |
  | /end                                                       | 0.074495s |
  | -> PfamWeb::View::TT->process                              | 0.070278s |
  '------------------------------------------------------------+-----------'

The occasional "FastCGI-stderr" lines interspersed with the debug information are an annoying side-effect of the interaction between the perl application and the web-server and can be safely ignored.

Once the server is up and running properly, you may want to turn off the verbose debugging information in the error log. You can do this by editing the start-up script for the PfamWeb application:

  └── etc/
      └── init.d/
          └── pfamweb

Edit the "pfamweb" script using the "sudo" command, because this file is considered to be a system file:

  pfamadmin@ubuntu:~$ sudo vi /etc/init.d/pfamweb

Change the value of "PFAMWEB_DEBUG" to 0 to disable debug messages:

  export PFAMWEB_DEBUG=0

The error log will now show only true errors and warnings from the perl back-end.

6. Cleaning up processes
========================

It's possible for the PfamWeb server processes to get into various strange states. In many cases, it may be simplest to reboot the virtual machine; the server processes will be started as soon as the machine is rebooted, so the downtime may be less than a minute:

  pfamadmin@ubuntu:~$ sudo shutdown -r now

If you want to try to clean up the server without rebooting, first stop the server processes using monit:

  pfamadmin@ubuntu:~$ sudo monit stop lighttpd
  pfamadmin@ubuntu:~$ sudo monit stop pfamweb

If PfamWeb processes ("perl-fcgi") are left running after monit thinks that the service is stopped, you can list them and then kill them manually by process ID:

  pfamadmin@ubuntu:~$ server
  www-data   736     1  0 11:04 ?        00:00:00 /usr/sbin/lighttpd -f /etc/lighttpd/lighttpd.conf
  www-data   855     1  0 11:04 ?        00:00:06 perl-fcgi-pm [PfamWeb]
  www-data  1355   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1356   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1357   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1358   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1359   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1360   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1361   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1362   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1363   855  0 11:04 ?        00:00:00 perl-fcgi
  www-data  1364   855  0 11:04 ?        00:00:00 perl-fcgi
  1000      1586  1372  0 12:56 pts/2    00:00:00 egrep --color=auto cgi|famweb|lighttpd
  pfamadmin@ubuntu:~$ sudo kill -9 855 1355 1356 1357 1358 ....

7. Enable and start monit
=========================

Monit is initially disabled on the VM, so that the website is not started on first boot. Before it can be started, monit needs to be enabled. Edit the configuration file for the monit init script:

  └── etc/
      └── default/
          └── monit

Change the value of "startup" to 1:

  startup=1

Start the monit daemon using:

  pfamadmin@ubuntu:~$ sudo /etc/init.d/monit start

Monit should now start the PfamWeb and lighttpd services automatically when the machine boots.

Useful links
============

http://mmonit.com/monit/
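The two configuration edits described in this section (setting "PFAMWEB_DEBUG" to 0 in /etc/init.d/pfamweb and "startup" to 1 in /etc/default/monit) can also be made without opening an editor. The following is only a minimal sketch: the helper name "set_option" is hypothetical and not part of the VM, and it assumes that each setting sits on a single line of the form "KEY=value" or "export KEY=value", as shown above.

```shell
#!/bin/sh
# set_option FILE KEY VALUE   (hypothetical helper, not shipped with the VM)
#
# Rewrites every line of the form "KEY=..." or "export KEY=..." in FILE so
# that KEY is assigned VALUE. Commented-out lines are left untouched.
# A copy of the original file is kept as FILE.bak.
# Assumes KEY contains no regular-expression metacharacters and VALUE
# contains no "/" or "&" characters.
set_option() {
    file=$1; key=$2; value=$3
    # Keep a backup, then rewrite the file from that backup.
    cp -p "$file" "$file.bak" || return 1
    sed "s/^\([[:space:]]*\(export[[:space:]]\{1,\}\)\{0,1\}\)$key=.*/\1$key=$value/" \
        "$file.bak" > "$file"
}

# Example calls (run as root, since both files are system files):
#   set_option /etc/init.d/pfamweb PFAMWEB_DEBUG 0
#   set_option /etc/default/monit  startup       1
```

Because the function rewrites the file in place, it is worth checking the result against the ".bak" copy before restarting any services.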
Configure PfamWeb application
These are the files and directories that are used by the Pfam website and related processes on this VM.

jt6 20120109 WTSI

/
├── etc/
│   ├── aliases
│   ├── apt/
│   │   └── apt.conf.d/
│   │       └── 70debconf
│   ├── default/
│   │   └── monit
│   ├── init.d/
│   │   ├── lighttpd
│   │   ├── monit
│   │   └── pfamweb
│   ├── lighttpd/
│   │   ├── conf-available/
│   │   │   └── 20-pfamweb.conf
│   │   ├── conf-enabled/
│   │   │   └── 20-pfamweb.conf -> ../conf-available/20-pfamweb.conf
│   │   └── lighttpd.conf
│   ├── monit/
│   │   ├── conf.d/
│   │   │   ├── general
│   │   │   ├── lighttpd
│   │   │   └── pfamweb
│   │   └── monitrc
│   └── timezone
├── home/
│   └── pfamadmin/
│       ├── mail/
│       ├── .bashrc
│       ├── .bash_aliases
│       ├── 00README.txt
│       ├── 01configure_vm.txt
│       ├── 02install_databases.txt
│       ├── 03configure_pfamweb.txt
│       ├── 04configure_crons.txt
│       ├── 05start_website.txt
│       ├── 06files.txt
│       └── create_web_user.sql
├── opt/
│   └── www/
│       ├── PfamBackend/
│       │   └── scripts/
│       │       └── update_das_sources.pl
│       ├── PfamBase/
│       ├── PfamLib/
│       ├── PfamSchemata/
│       ├── PfamScripts/
│       │   └── wiki/
│       │       ├── approvals.cgi
│       │       ├── mapping.cgi
│       │       ├── mapping_cron.pl
│       │       ├── rescrape.pl
│       │       ├── revisions.cgi
│       │       ├── scrape_cron.pl
│       │       ├── sync_articles_cron.pl
│       │       └── update_cron.pl
│       ├── PfamWeb/
│       │   ├── bin/
│       │   ├── inc/
│       │   ├── instructions.txt
│       │   ├── lib/
│       │   ├── LICENSE
│       │   ├── Makefile.PL
│       │   ├── root/
│       │   ├── script/
│       │   │   ├── pfamweb_cgi.pl
│       │   │   ├── pfamweb_create.pl
│       │   │   ├── pfamweb_fastcgi.pl
│       │   │   ├── pfamweb_server.pl
│       │   │   └── pfamweb_test.pl
│       │   └── t/
│       └── conf/
│           ├── changelog.conf
│           ├── crons.conf
│           ├── pfamweb.conf
│           ├── pfamweb_local.conf
│           └── robots/
│               ├── inra.conf
│               ├── janelia.conf
│               ├── sbc.conf
│               └── wtsi.conf
├── tmp/
│   └── pfamweb_fastcgi.socket
└── var/
    ├── log/
    │   └── lighttpd/
    │       ├── access.log
    │       └── error.log
    ├── run/
    │   ├── lighttpd.pid
    │   └── monit.pid
    └── tmp/
        └── opt/
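When something fails to start, a quick first check is whether the key files from the listing above are actually present. The following is a rough sketch only: "report_missing" is a hypothetical helper, not something shipped with the VM, and the paths in the example call are simply a selection taken from the listing.

```shell
#!/bin/sh
# report_missing ROOT PATH...   (hypothetical helper, not shipped with the VM)
#
# Prints "missing: ROOT/PATH" for every PATH that does not exist beneath
# ROOT. Returns 0 if everything was found and 1 otherwise.
report_missing() {
    root=$1; shift
    rc=0
    for p in "$@"; do
        if [ ! -e "$root/$p" ]; then
            echo "missing: $root/$p"
            rc=1
        fi
    done
    return $rc
}

# Example call on the VM itself (an empty ROOT means paths are resolved
# from "/"):
#   report_missing "" \
#       etc/init.d/pfamweb \
#       etc/default/monit \
#       etc/lighttpd/lighttpd.conf \
#       etc/lighttpd/conf-enabled/20-pfamweb.conf \
#       etc/monit/monitrc \
#       opt/www/PfamWeb/script/pfamweb_fastcgi.pl
```

Files that exist only while the services are running, such as /tmp/pfamweb_fastcgi.socket and the ".pid" files under /var/run, are deliberately left out of the example call.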