#*****************************************************************************/
#   This software was developed by Marcelo Reis, Joao C. Setubal,
#   David Haake and James Matsunaga
#
#   It should not be redistributed or used for any commercial purpose
#   without written permission from Joao C. Setubal setubal@vbi.vt.edu
#
#   release date September 2005
#
#   Please cite the following publication:
#
#    J. C. Setubal, M. Reis, J. Matsunaga and D. A. Haake. Lipoprotein
#    computational prediction in spirochaetal genomes. Microbiology
#    (2006) 152: 113-121.
#
#
# This software is experimental in nature and is
# supplied "AS IS", without obligation by the authors to provide
# accompanying services or support.  The entire risk as to the quality
# and performance of the Software is with you. The authors
# EXPRESSLY DISCLAIM ANY AND ALL WARRANTIES REGARDING THE SOFTWARE,
# WHETHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO WARRANTIES
# PERTAINING TO MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
#
#*****************************************************************************/


+------------------------------------------+
| SpLip software documentation             |
|                                          |
| Author: Marcelo S. Reis, September 2005  |
+------------------------------------------+

-------------------------------------------------------------------------------

* CONTENTS:

This .tar.gz file contains:

- This "README.txt" file;

- "build_weight_matrix.pl" (perl) file;

- "find_orf_score.pl" (perl) file;

- "matrices" directory, with some weight-matrix files (the B. burgdorferi
  weight-matrix - used for the examples - is called "bb.matrix");

- "example.fasta", a FASTA file with 4 Borrelia burgdorferi sequences,
  provided as an example input for SpLip;

- "example.out", the output produced for the given "example.fasta".

-------------------------------------------------------------------------------

* RUNNING SpLip:

a) To generate a weight-matrix file:

  Run the following at the command line:

perl -w build_weight_matrix.pl <1> <2> <3>

where:

	<1> is the training set (sequences known to be lipoproteins);
	<2> is the set of genome proteins (used for the background frequencies);
	<3> is the name of the weight-matrix file to be generated.

Example:

perl -w build_weight_matrix.pl trainingSet.fasta B_burgdorferi.fasta bb.matrix

--

b) To look for lipoproteins with a calculated weight-matrix:

  Run the following at the command line:

perl -w find_orf_score.pl <1> <2> <3>

where:

	<1> is the weight-matrix file;
	<2> is the multifasta file containing the sequences to be analysed;
	<3> is the results output file.

Example:

perl -w find_orf_score.pl matrices/bb.matrix example.fasta example.out

--

NOTE: You must have the Perl interpreter installed on your operating system.
The software was installed and run on POSIX systems (Linux, Solaris).
However, it should run on other operating systems as well (Windows, Mac, etc.).

-------------------------------------------------------------------------------

* UNDERSTANDING THE OUTPUT FILE:

The output file classifies each analysed ORF as one of the types below:

a) PROBABLE LIPOPROTEIN: very likely to be a lipoprotein;
b) POSSIBLE LIPOPROTEIN: may be a lipoprotein;
c) NOT LIPOPROTEIN: predicted not to be a lipoprotein.

Example:

>Results for ORF 6382311: PROBABLE LIPOPROTEIN
BEST HIT:
        cleavage site = 23
        score H-Region = 29.500000
        score Lipobox = 6.848000
        size of H-Region = 9
        score N-Region = 3
SECONDARY HIT:
        cleavage site = 14
        score H-Region = 0.000000
        score Lipobox = -94.193000
        size of H-Region = 0
        score N-Region = 3

Where:

cleavage site: position at which the protein is predicted to be cleaved;
score H-Region: the score of the hydrophobic region;
score Lipobox: the score of the lipoprotein's lipobox;
score N-Region: the score of the N-terminal region;
size of H-Region: size of the hydrophobic region (in amino acids);
BEST HIT: the best hit found;
SECONDARY HIT: the second-best hit (if found).

If no hits are found, the "cleavage site" field shows a negative value.
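
The record layout described above is regular enough to be summarised with
standard POSIX tools. What follows is only a minimal sketch and is not part of
the SpLip distribution. The function name "splip_table" is hypothetical. The
sketch assumes that every record starts with a ">Results for ORF <id>: <type>"
line, that the ORF identifier itself contains no colon, and that the first
"cleavage site" line after "BEST HIT:" belongs to the best hit. It prints one
tab-separated line per ORF: identifier, classification, and best-hit cleavage
site (or NA when a record has no BEST HIT block).

```shell
# Hypothetical helper, not shipped with SpLip.
# Usage: splip_table <SpLip output file>
# Prints: ORF id <TAB> classification <TAB> best-hit cleavage site (NA if absent)
splip_table() {
    awk '
        # Print the record collected so far, if any.
        function emit() {
            if (id != "") printf "%s\t%s\t%s\n", id, type, site
        }
        # Header line: ">Results for ORF <id>: <classification>"
        /^>Results for ORF / {
            emit()
            line = $0
            sub(/^>Results for ORF /, "", line)
            sub(/[ \t\r]+$/, "", line)
            id = line;   sub(/:.*$/, "", id)
            type = line; sub(/^[^:]*:[ \t]*/, "", type)
            site = "NA"; section = ""
            next
        }
        /^BEST HIT:/      { section = "best"; next }
        /^SECONDARY HIT:/ { section = "secondary"; next }
        # Assumption: the cleavage site of the best hit is the first
        # "cleavage site" line inside the BEST HIT block.
        section == "best" && /cleavage site[ \t]*=/ && site == "NA" {
            site = $0
            sub(/^.*=[ \t]*/, "", site)
            sub(/[ \t\r]+$/, "", site)
        }
        END { emit() }
    ' "$1"
}
```

With the function defined in the current shell, the number of ORFs in each
class could then be obtained with, for instance:

splip_table example.out | cut -f2 | sort | uniq -c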

-------------------------------------------------------------------------------
