Full Code of pltrdy/files2rouge for AI

master d3d09c838741 cached
56 files
648.4 KB
207.7k tokens
14 symbols
1 requests
Download .txt
Showing preview only (675K chars total). Download the full file or copy to clipboard to get everything.
Repository: pltrdy/files2rouge
Branch: master
Commit: d3d09c838741
Files: 56
Total size: 648.4 KB

Directory structure:
gitextract_z1nyx_hr/

├── .gitignore
├── Dockerfile
├── LICENSE
├── MANIFEST.in
├── README.md
├── experiments/
│   └── openNMT.0.md
├── files2rouge/
│   ├── RELEASE-1.5.5/
│   │   ├── README.txt
│   │   ├── RELEASE-NOTE.txt
│   │   ├── ROUGE-1.5.5.pl
│   │   ├── XML/
│   │   │   ├── DOM/
│   │   │   │   ├── AttDef.pod
│   │   │   │   ├── AttlistDecl.pod
│   │   │   │   ├── Attr.pod
│   │   │   │   ├── CDATASection.pod
│   │   │   │   ├── CharacterData.pod
│   │   │   │   ├── Comment.pod
│   │   │   │   ├── DOMException.pm
│   │   │   │   ├── DOMImplementation.pod
│   │   │   │   ├── Document.pod
│   │   │   │   ├── DocumentFragment.pod
│   │   │   │   ├── DocumentType.pod
│   │   │   │   ├── Element.pod
│   │   │   │   ├── ElementDecl.pod
│   │   │   │   ├── Entity.pod
│   │   │   │   ├── EntityReference.pod
│   │   │   │   ├── NamedNodeMap.pm
│   │   │   │   ├── NamedNodeMap.pod
│   │   │   │   ├── Node.pod
│   │   │   │   ├── NodeList.pm
│   │   │   │   ├── NodeList.pod
│   │   │   │   ├── Notation.pod
│   │   │   │   ├── Parser.pod
│   │   │   │   ├── PerlSAX.pm
│   │   │   │   ├── ProcessingInstruction.pod
│   │   │   │   ├── Text.pod
│   │   │   │   └── XMLDecl.pod
│   │   │   ├── DOM.pm
│   │   │   ├── Handler/
│   │   │   │   └── BuildDOM.pm
│   │   │   └── RegExp.pm
│   │   ├── data/
│   │   │   ├── WordNet-1.6-Exceptions/
│   │   │   │   ├── adj.exc
│   │   │   │   ├── adv.exc
│   │   │   │   ├── buildExeptionDB.pl
│   │   │   │   ├── noun.exc
│   │   │   │   └── verb.exc
│   │   │   ├── WordNet-2.0-Exceptions/
│   │   │   │   ├── adj.exc
│   │   │   │   ├── adv.exc
│   │   │   │   ├── buildExeptionDB.pl
│   │   │   │   ├── noun.exc
│   │   │   │   └── verb.exc
│   │   │   └── smart_common_words.txt
│   │   └── runROUGE-test.pl
│   ├── __init__.py
│   ├── files2rouge.py
│   ├── settings.py
│   └── utils.py
├── setup.py
└── setup_rouge.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
env/
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
*.egg-info/
.installed.cfg
*.egg

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*,cover
.hypothesis/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# IPython Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# dotenv
.env

# virtualenv
venv/
ENV/

# Spyder project settings
.spyderproject

# Rope project settings
.ropeproject


================================================
FILE: Dockerfile
================================================
FROM python:3.6-stretch

MAINTAINER cgebe

RUN apt-get update && \
    apt-get install -y cpanminus && \
    cpanm --force XML::Parser

COPY . /etc/rouge
WORKDIR /etc/rouge

RUN pip install -U git+https://github.com/pltrdy/pyrouge && \
    echo | python setup_rouge.py && \
    python setup.py install

ENV DATA_DIR /etc/rouge/data
VOLUME ["/etc/rouge/data"]

ENTRYPOINT ["/bin/bash"]


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2017 

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: MANIFEST.in
================================================
include files2rouge/settings.json


================================================
FILE: README.md
================================================
# Files2ROUGE
## Motivations
Given two files with the same number of lines, `files2rouge` calculates the average ROUGE scores of each sequence (=line). Each sequence may contain multiple sentences. In this case, the end of sentence string must be passed using the `--eos` flag (default: "."). Running `files2rouge` with a wrong eos delimiter may lead to incorrect ROUGE-L score.


You may also be interested in a Python implementation (instead of a wrapper): <https://github.com/pltrdy/rouge>.

```bash
$ files2rouge --help
usage: files2rouge [-h] [-v] [-a ARGS] [-s SAVETO] [-e EOS] [-m] [-i]
                   reference summary

Calculating ROUGE score between two files (line-by-line)

positional arguments:
  reference             Path of references file
  summary               Path of summary file

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Prints ROUGE logs
  -a ARGS, --args ARGS  ROUGE Arguments
  -s SAVETO, --saveto SAVETO
                        File to save scores
  -e EOS, --eos EOS     End of sentence separator (for multisentence).
                        Default: "."
  -m, --stemming        DEPRECATED: stemming is now default behavior
  -nm, --no_stemming    Switch off stemming
  -i, --ignore_empty
```

## Getting Started
**0) Install prerequisites**
```bash
pip install -U git+https://github.com/pltrdy/pyrouge
```
(**NOTE:** running `pip install pyrouge` would not work as the package is out of date on PyPI)


**1) Clone the repo, setup the module and ROUGE**
```bash
git clone https://github.com/pltrdy/files2rouge.git     
cd files2rouge
python setup_rouge.py
python setup.py install
```
**Do not forget to run `setup_rouge`**    

**2) Run `files2rouge.py`** 
```bash
files2rouge references.txt summaries.txt 
```

**Outputs:**
When `--verbose` is set, the script prints progress and remaining time on `stderr`.  This can be changed using `--verbose` in order to outputs ROUGE execution logs. 

Default output example:
```
Preparing documents...
Running ROUGE...
---------------------------------------------
1 ROUGE-1 Average_R: 0.28242 (95%-conf.int. 0.25721 - 0.30877)
1 ROUGE-1 Average_P: 0.30157 (95%-conf.int. 0.27114 - 0.33506)
1 ROUGE-1 Average_F: 0.28196 (95%-conf.int. 0.25704 - 0.30722)
---------------------------------------------
1 ROUGE-2 Average_R: 0.10395 (95%-conf.int. 0.08298 - 0.12600)
1 ROUGE-2 Average_P: 0.11458 (95%-conf.int. 0.08873 - 0.14023)
1 ROUGE-2 Average_F: 0.10489 (95%-conf.int. 0.08303 - 0.12741)
---------------------------------------------
1 ROUGE-L Average_R: 0.25231 (95%-conf.int. 0.22709 - 0.27771)
1 ROUGE-L Average_P: 0.26830 (95%-conf.int. 0.23834 - 0.29818)
1 ROUGE-L Average_F: 0.25142 (95%-conf.int. 0.22741 - 0.27533)

Elapsed time: 0.458 secondes
```

## Call `files2rouge` from Python
```
import files2rouge
files2rouge.run(hyp_path, ref_path)
```

## ROUGE Args
One can specify which ROUGE args to use using the flag `--args` (or `-a`).    
The default behavior is equivalent to: 
```
files2rouge reference.txt summary.txt -a "-c 95 -r 1000 -n 2 -a" # be sure to write args betwen double-quotes
```
You can find more informations about these arguments [here](./files2rouge/RELEASE-1.5.5/README.txt)

## Known issues
* `ROUGE-1.5.5.pl - XML::Parser dependency error`: see [issue #9](https://github.com/pltrdy/files2rouge/issues/9).

## More informations
* [ROUGE Original Paper (Lin 2004)](http://www.aclweb.org/anthology/W04-1013)
* [ROUGE-1.5.5/README.txt](./files2rouge/RELEASE-1.5.5/README.txt)
* **Use cases:**
  * [Text Summarization using OpenNMT](./experiments/openNMT.0.md)
* About `files2rouge.py`: run `files2rouge.py --help`


================================================
FILE: experiments/openNMT.0.md
================================================

# Motivations

* Replicate results for Text Summarization task on Gigaword (see 'About')
* Getting started with Text Summarization using `OpenNMT` ([src](https://github.com/OpenNMT/OpenNMT))
* Getting started with ROUGE scoring using `files2rouge` ([src](https://github.com/pltrdy/files2rouge)) 

# About
 * Reference: http://opennmt.net//Models/#english-summarization
 * Dataset: https://github.com/harvardnlp/sent-summary 
 * Expected results:
   * R1: 33.13 
   * R2: 16.09 
   * RL: 31.00
 * OpenNMT v0.2.0. (precisely using commit from the 4th of Jan., 2017, 561994adcd147f9f77cc744a041152c3182a9300)
 * file2rouge commit: 5397befa8397017964d21aa61a4e399dedd5c340

# Setup

```shell
git clone https://github.com/OpenNMT/OpenNMT.git opennmt
git clone --recursive https://github.com/pltrdy/files2rouge.git files2rouge
```
Download data from [here](https://github.com/harvardnlp/sent-summary) and extract (`tar -xzf summary.tar.gz`) to `./data`.


**We assume that your file system is like:**

```
./   
  opennmt/   
  data/   
  file2rouge/   
```

# Building model
Following the [guide](http://opennmt.net//Guide/)
```shell
# First, move to OpenNMT dir
cd opennmt
```
**1) Preprocess**   
```shell
th preprocess.lua -train_src ../data/train/train.article.txt -train_tgt ../data/train/train.title.txt -valid_src ../data/train/valid.article.filter.txt -valid_tgt ../data/train/valid.title.filter.txt -save_data ../data/train/textsum
```
**2) Train**   
```shell
th train.lua -data ./textsum_train/textsum-train.t7  -save_model textsum
```
or using GPU:
```shell
th train.lua -data ./textsum_train/textsum_model-train.t7  -save_model textsum -gpuid 1
```
**3) Generate summary**   
```shell
th translate.lua -model textsum_final.t7 -src ../data/Giga/inputs.txt
```
**(add `-gpuid 1` if you trained the model using GPU)**     
The output will be in `pred.txt`

# ROUGE Scoring using `files2rouge`
```shell
cd ../files2rouge
./files2rouge --ref ../data/Giga/task1_ref0.txt --summ ../opennmt/pred.txt
```

# Results
| ROUGE-1 | ROUGE-2 | ROUGE-L |
|---------|---------|---------|
|  34.2   |  16.2   |  31.9   |


================================================
FILE: files2rouge/RELEASE-1.5.5/README.txt
================================================
A Brief Introduction of the ROUGE Summary Evaluation Package
by Chin-Yew LIN 
Univeristy of Southern California/Information Sciences Institute
05/26/2005

<<WHAT'S NEW>>

(1) Correct the resampling routine which ignores the last evaluation
    item in the evaluation list. Therefore, the average scores reported
    by ROUGE is only based on the first N-1 evaluation items.
    Thanks Barry Schiffman at Columbia University to report this bug.
    This bug only affects ROUGE-1.5.X. For pre-1.5 ROUGE, it only affects
    the computation of confidence interval (CI) estimation, i.e. CI is only
    estimated by the first N-1 evaluation items, but it *does not* affect
    average scores.
(2) Correct stemming on multi-token BE heads and modifiers.
    Previously, only single token heads and modifiers were assumed.
(3) Change read_text and read_text_LCS functions to read exact words or
    bytes required by users. Previous versions carry out whitespace 
    compression and other string clear up actions before enforce the length
    limit. 
(4) Add the capability to score summaries in Basic Element (BE)
    format by using option "-3", standing for BE triple. There are 6
    different modes in BE scoring. We suggest using *"-3 HMR"* on BEs
    extracted from Minipar parse trees based on our correlation analysis
    of BE-based scoring vs. human judgements on DUC 2002 & 2003 automatic
    summaries.
(5) ROUGE now generates three scores (recall, precision and F-measure)
    for each evaluation. Previously, only one score is generated
    (recall). Precision and F-measure scores are useful when the target
    summary length is not enforced. Only recall scores were necessary since 
    DUC guideline dictated the limit on summary length. For comparison to
    previous DUC results, please use the recall scores. The default alpha
    weighting for computing F-measure is 0.5. Users can specify a
    particular alpha weighting that fits their application scenario using
    option "-p alpha-weight". Where *alpha-weight* is a number between 0
    and 1 inclusively.
(6) Pre-1.5 version of ROUGE used model average to compute the overall
    ROUGE scores when there are multiple references. Starting from v1.5+,
    ROUGE provides an option to use the best matching score among the
    references as the final score. The model average option is specified
    using "-f A" (for Average) and the best model option is specified
    using "-f B" (for the Best). The "-f A" option is better when use
    ROUGE in summarization evaluations; while "-f B" option is better when
    use ROUGE in machine translation (MT) and definition
    question-answering (DQA) evaluations since in a typical MT or DQA
    evaluation scenario matching a single reference translation or
    definition answer is sufficient. However, it is very likely that
    multiple different but equally good summaries exist in summarization
    evaluation.
(7) ROUGE v1.5+ also provides the option to specify whether model unit
    level average will be used (macro-average, i.e. treating every model
    unit equally) or token level average will be used (micro-average,
    i.e. treating every token equally). In summarization evaluation, we
    suggest using model unit level average and this is the default setting
    in ROUGE. To specify other average mode, use "-t 0" (default) for
    model unit level average, "-t 1" for token level average and "-t 2"
    for output raw token counts in models, peers, and matches.
(8) ROUGE now offers the option to use file list as the configuration
    file. The input format of the summary files are specified using the
    "-z INPUT-FORMAT" option. The INPUT-FORMAT can be SEE, SPL, ISI or
    SIMPLE. When "-z" is specified, ROUGE assumed that the ROUGE
    evaluation configuration file is a file list with each evaluation
    instance per line in the following format:

peer_path1 model_path1 model_path2 ... model_pathN
peer_path2 model_path1 model_path2 ... model_pathN
...
peer_pathM model_path1 model_path2 ... model_pathN

  The first file path is the peer summary (system summary) and it
  follows with a list of model summaries (reference summaries) separated
  by white spaces (spaces or tabs).
(9) When stemming is applied, a new WordNet exception database based
    on WordNet 2.0 is used. The new database is included in the data
    directory.

<<USAGE>>

(1) Use "-h" option to see a list of options.
    Summary:
Usage: ROUGE-1.5.4.pl
         [-a (evaluate all systems)] 
         [-c cf]
         [-d (print per evaluation scores)] 
         [-e ROUGE_EVAL_HOME] 
         [-h (usage)] 
         [-b n-bytes|-l n-words] 
         [-m (use Porter stemmer)] 
         [-n max-ngram] 
         [-s (remove stopwords)] 
         [-r number-of-samples (for resampling)] 
         [-2 max-gap-length (if < 0 then no gap length limit)] 
         [-3 <H|HM|HMR|HM1|HMR1|HMR2>] 
         [-u (include unigram in skip-bigram) default no)] 
         [-U (same as -u but also compute regular skip-bigram)] 
         [-w weight (weighting factor for WLCS)] 
         [-v (verbose)] 
         [-x (do not calculate ROUGE-L)] 
         [-f A|B (scoring formula)] 
         [-p alpha (0 <= alpha <=1)] 
         [-t 0|1|2 (count by token instead of sentence)] 
         [-z <SEE|SPL|ISI|SIMPLE>] 
         <ROUGE-eval-config-file> [<systemID>]

  ROUGE-eval-config-file: Specify the evaluation setup. Three files come with the ROUGE 
            evaluation package, i.e. ROUGE-test.xml, verify.xml, and verify-spl.xml are 
            good examples.

  systemID: Specify which system in the ROUGE-eval-config-file to perform the evaluation.
            If '-a' option is used, then all systems are evaluated and users do not need to
            provide this argument.

  Default:
    When running ROUGE without supplying any options (except -a), the following defaults are used:
    (1) ROUGE-L is computed;
    (2) 95% confidence interval;
    (3) No stemming;
    (4) Stopwords are inlcuded in the calculations;
    (5) ROUGE looks for its data directory first through the ROUGE_EVAL_HOME environment variable. If
        it is not set, the current directory is used.
    (6) Use model average scoring formula.
    (7) Assign equal importance of ROUGE recall and precision in computing ROUGE f-measure, i.e. alpha=0.5.
    (8) Compute average ROUGE by averaging sentence (unit) ROUGE scores.
  Options:
    -2: Compute skip bigram (ROGUE-S) co-occurrence, also specify the maximum gap length between two words (skip-bigram)
    -u: Compute skip bigram as -2 but include unigram, i.e. treat unigram as "start-sentence-symbol unigram"; -2 has to be specified.
    -3: Compute BE score.
        H    -> head only scoring (does not applied to Minipar-based BEs).
        HM   -> head and modifier pair scoring.
        HMR  -> head, modifier and relation triple scoring.
        HM1  -> H and HM scoring (same as HM for Minipar-based BEs).
        HMR1 -> HM and HMR scoring (same as HMR for Minipar-based BEs).
        HMR2 -> H, HM and HMR scoring (same as HMR for Minipar-based BEs).
    -a: Evaluate all systems specified in the ROUGE-eval-config-file.
    -c: Specify CF\% (0 <= CF <= 100) confidence interval to compute. The default is 95\% (i.e. CF=95).
    -d: Print per evaluation average score for each system.
    -e: Specify ROUGE_EVAL_HOME directory where the ROUGE data files can be found.
        This will overwrite the ROUGE_EVAL_HOME specified in the environment variable.
    -f: Select scoring formula: 'A' => model average; 'B' => best model
    -h: Print usage information.
    -b: Only use the first n bytes in the system/peer summary for the evaluation.
    -l: Only use the first n words in the system/peer summary for the evaluation.
    -m: Stem both model and system summaries using Porter stemmer before computing various statistics.
    -n: Compute ROUGE-N up to max-ngram length will be computed.
    -p: Relative importance of recall and precision ROUGE scores. Alpha -> 1 favors precision, Alpha -> 0 favors recall.
    -s: Remove stopwords in model and system summaries before computing various statistics.
    -t: Compute average ROUGE by averaging over the whole test corpus instead of sentences (units).
        0: use sentence as counting unit, 1: use token as couting unit, 2: same as 1 but output raw counts
        instead of precision, recall, and f-measure scores. 2 is useful when computation of the final,
        precision, recall, and f-measure scores will be conducted later.
    -r: Specify the number of sampling point in bootstrap resampling (default is 1000).
        Smaller number will speed up the evaluation but less reliable confidence interval.
    -w: Compute ROUGE-W that gives consecutive matches of length L in an LCS a weight of 'L^weight' instead of just 'L' as in LCS.
        Typically this is set to 1.2 or other number greater than 1.
    -v: Print debugging information for diagnositic purpose.
    -x: Do not calculate ROUGE-L.
    -z: ROUGE-eval-config-file is a list of peer-model pair per line in the specified format (SEE|SPL|ISI|SIMPLE).

(2) Please read RELEASE-NOTE.txt for information about updates from previous versions.

(3) The following files coming with this package in the "sample-output"
    directory are the expected output of the evaluation files in the
    "sample-test" directory.
    (a) use "data" as ROUGE_EVAL_HOME, compute 95% confidence interval,
	compute ROUGE-L (longest common subsequence, default),
        compute ROUGE-S* (skip bigram) without gap length limit,
        compute also ROUGE-SU* (skip bigram with unigram),
        run resampling 1000 times,
        compute ROUGE-N (N=1 to 4),
        compute ROUGE-W (weight = 1.2), and
	compute these ROUGE scores for all systems:
    ROUGE-test-c95-2-1-U-r1000-n4-w1.2-a.out        
    > ROUGE-1.5.4.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -a ROUGE-test.xml

    (b) Same as (a) but apply Porter's stemmer on the input:
    ROUGE-test-c95-2-1-U-r1000-n4-w1.2-a-m.out        
    > ROUGE-1.5.4.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -m -a ROUGE-test.xml

    (c) Same as (b) but apply also a stopword list on the input:
    ROUGE-test-c95-2-1-U-r1000-n4-w1.2-a-m-s.out        
    > ROUGE-1.5.4.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -m -s -a ROUGE-test.xml

    (d) Same as (a) but apply a summary length limit of 10 words:
    ROUGE-test-c95-2-1-U-r1000-n4-w1.2-l10-a.out        
    > ROUGE-1.5.4.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -l 10 -a ROUGE-test.xml

    (e) Same as (d) but apply Porter's stemmer on the input:
    ROUGE-test-c95-2-1-U-r1000-n4-w1.2-l10-a-m.out        
    > ROUGE-1.5.4.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -l 10 -m -a ROUGE-test.xml

    (f) Same as (e) but apply also a stopword list on the input:
    ROUGE-test-c95-2-1-U-r1000-n4-w1.2-l10-a-m-s.out        
    > ROUGE-1.5.4.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -l 10 -m -s -a ROUGE-test.xml

    (g) Same as (a) but apply a summary lenght limit of 75 bytes:
    ROUGE-test-c95-2-1-U-r1000-n4-w1.2-b75-a.out        
    > ROUGE-1.5.4.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -b 75 -a ROUGE-test.xml

    (h) Same as (g) but apply Porter's stemmer on the input:
    ROUGE-test-c95-2-1-U-r1000-n4-w1.2-b75-a-m.out        
    > ROUGE-1.5.4.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -b 75 -m -a ROUGE-test.xml

    (i) Same as (h) but apply also a stopword list on the input:
    ROUGE-test-c95-2-1-U-r1000-n4-w1.2-b75-a-m-s.out        
    > ROUGE-1.5.4.pl -e data -c 95 -2 -1 -U -r 1000 -n 4 -w 1.2 -b 75 -m -s -a ROUGE-test.xml

  Sample DUC2002 data (1 system and 1 model only per DUC 2002 topic), their BE and
    ROUGE evaluation configuration file in XML and file list format,
    and their expected output are also included for your reference.

    (a) Use DUC2002-BE-F.in.26.lst, a BE files list, as ROUGE the
        configuration file:
        command> ROUGE-1.5.4.pl -3 HM -z SIMPLE DUC2002-BE-F.in.26.lst 26
	output:  DUC2002-BE-F.in.26.lst.out
    (b) Use DUC2002-BE-F.in.26.simple.xml as ROUGE XML evaluation configuration file:
        command> ROUGE-1.5.4.pl -3 HM DUC2002-BE-F.in.26.simple.xml 26
	output:  DUC2002-BE-F.in.26.simple.out
    (c) Use DUC2002-BE-L.in.26.lst, a BE files list, as ROUGE the
        configuration file:
        command> ROUGE-1.5.4.pl -3 HM -z SIMPLE DUC2002-BE-L.in.26.lst 26
	output:  DUC2002-BE-L.in.26.lst.out
    (d) Use DUC2002-BE-L.in.26.simple.xml as ROUGE XML evaluation configuration file:
        command> ROUGE-1.5.4.pl -3 HM DUC2002-BE-L.in.26.simple.xml 26
	output:  DUC2002-BE-L.in.26.simple.out
    (e) Use DUC2002-ROUGE.in.26.spl.lst, a BE files list, as ROUGE the
        configuration file:
        command> ROUGE-1.5.4.pl -n 4 -z SPL DUC2002-ROUGE.in.26.spl.lst 26
	output:  DUC2002-ROUGE.in.26.spl.lst.out
    (f) Use DUC2002-ROUGE.in.26.spl.xml as ROUGE XML evaluation configuration file:
        command> ROUGE-1.5.4.pl -n 4 DUC2002-ROUGE.in.26.spl.xml 26
	output:  DUC2002-ROUGE.in.26.spl.out

<<INSTALLATION>>

(1) You need to have DB_File installed. If the Perl script complains
    about database version incompatibility, you can create a new
    WordNet-2.0.exc.db by running the buildExceptionDB.pl script in
    the "data/WordNet-2.0-Exceptions" subdirectory.
(2) You also need to install XML::DOM from http://www.cpan.org.
    Direct link: http://www.cpan.org/modules/by-module/XML/XML-DOM-1.43.tar.gz.
    You might need install extra Perl modules that are required by
    XML::DOM.
(3) Setup an environment variable ROUGE_EVAL_HOME that points to the
    "data" subdirectory. For example, if your "data" subdirectory
    located at "/usr/local/ROUGE-1.5.4/data" then you can setup
    the ROUGE_EVAL_HOME as follows:
    (a) Using csh or tcsh:
        $command_prompt>setenv ROUGE_EVAL_HOME /usr/local/ROUGE-1.5.4/data
    (b) Using bash
        $command_prompt>ROUGE_EVAL_HOME=/usr/local/ROUGE-1.5.4/data
	$command_prompt>export ROUGE_EVAL_HOME
(4) Run ROUGE-1.5.4.pl without supplying any arguments will give
    you a description of how to use the ROUGE script.
(5) Please look into the included ROUGE-test.xml, verify.xml. and
    verify-spl.xml evaluation configuration files for preparing your
    own evaluation setup. More detailed description will be provided
    later. ROUGE-test.xml and verify.xml specify the input from
    systems and references are in SEE (Summary Evaluation Environment)
    format (http://www.isi.edu/~cyl/SEE); while verify-spl.xml specify
    inputs are in sentence per line format.

<<DOCUMENTATION>>

(1) Please look into the "docs" directory for more information about
    ROUGE.
(2) ROUGE-Note-v1.4.2.pdf explains how ROUGE works. It was published in
    Proceedings of the Workshop on Text Summarization Branches Out
    (WAS 2004), Bacelona, Spain, 2004.
(3) NAACL2003.pdf presents the initial idea of applying n-gram
    co-occurrence statistics in automatic evaluation of
    summarization. It was publised in Proceedsings of 2003 Language
    Technology Conference (HLT-NAACL 2003), Edmonton, Canada, 2003.
(4) NTCIR2004.pdf discusses the effect of sample size on the
    reliability of automatic evaluation results using data in the past
    Document Understanding Conference (DUC) as examples. It was
    published in Proceedings of the 4th NTCIR Meeting, Tokyo, Japan, 2004.
(5) ACL2004.pdf shows how ROUGE can be applied on automatic evaluation
    of machine translation. It was published in Proceedings of the 42nd
    Annual Meeting of the Association for Computational Linguistics
    (ACL 2004), Barcelona, Spain, 2004.
(6) COLING2004.pdf proposes a new meta-evaluation framework, ORANGE, for
    automatic evaluation of automatic evaluation methods. We showed
    that ROUGE-S and ROUGE-L were significantly better than BLEU,
    NIST, WER, and PER automatic MT evalaution methods under the
    ORANGE framework. It was published in Proceedings of the 20th
    International Conference on Computational Linguistics (COLING 2004),
    Geneva, Switzerland, 2004.
(7) For information about BE, please go to http://www.isi.edu/~cyl/BE.

<<NOTE>>

    Thanks for using the ROUGE evaluation package. If you have any
questions or comments, please send them to cyl@isi.edu. I will do my
best to answer your questions.


================================================
FILE: files2rouge/RELEASE-1.5.5/RELEASE-NOTE.txt
================================================
# Revision Note: 05/26/2005, Chin-Yew LIN
#              1.5.5
#              (1) Correct stemming on multi-token BE heads and modifiers.
#                  Previously, only single token heads and modifiers were assumed.
#              (2) Correct the resampling routine which ignores the last evaluation
#                  item in the evaluation list. Therefore, the average scores reported
#                  by ROUGE is only based on the first N-1 evaluation items.
#                  Thanks Barry Schiffman at Columbia University to report this bug.
#                  This bug only affects ROUGE-1.5.X. For pre-1.5 ROUGE, it only affects
#                  the computation of confidence interval (CI) estimation, i.e. CI is only
#                  estimated by the first N-1 evaluation items, but it *does not* affect
#                  average scores.
#              (3) Change read_text and read_text_LCS functions to read exact words or
#                  bytes required by users. Previous versions carry out whitespace 
#                  compression and other string clear up actions before enforce the length
#                  limit. 
#              1.5.4.1
#              (1) Minor description change about "-t 0" option.
#              1.5.4
#              (1) Add easy evalution mode for single reference evaluations with -z
#                  option.
#              1.5.3
#              (1) Add option to compute ROUGE score based on SIMPLE BE format. Given
#                  a set of peer and model summary file in BE format with appropriate
#                  options, ROUGE will compute matching scores based on BE lexical
#                  matches.
#                  There are 6 options:
#                  1. H    : Head only match. This is similar to unigram match but
#                            only BE Head is used in matching. BEs generated by
#                            Minipar-based breaker do not include head-only BEs,
#                            therefore, the score will always be zero. Use HM or HMR
#                            optiions instead.
#                  2. HM   : Head and modifier match. This is similar to bigram or
#                            skip bigram but it's head-modifier bigram match based on
#                            parse result. Only BE triples with non-NIL modifier are
#                            included in the matching.
#                  3. HMR  : Head, modifier, and relation match. This is similar to
#                            trigram match but it's head-modifier-relation trigram
#                            match based on parse result. Only BE triples with non-NIL
#                            relation are included in the matching.
#                  4. HM1  : This is combination of H and HM. It is similar to unigram +
#                            bigram or skip bigram with unigram match but it's 
#                            head-modifier bigram match based on parse result.
#                            In this case, the modifier field in a BE can be "NIL"
#                  5. HMR1 : This is combination of HM and HMR. It is similar to
#                            trigram match but it's head-modifier-relation trigram
#                            match based on parse result. In this case, the relation
#                            field of the BE can be "NIL".
#                  6. HMR2 : This is combination of H, HM and HMR. It is similar to
#                            trigram match but it's head-modifier-relation trigram
#                            match based on parse result. In this case, the modifier and
#                            relation fields of the BE can both be "NIL".
#              1.5.2
#              (1) Add option to compute ROUGE score by token using the whole corpus
#                  as average unit instead of individual sentences. Previous versions of
#                  ROUGE uses sentence (or unit) boundary to break counting unit and takes
#                  the average score from the counting unit as the final score.
#                  Using the whole corpus as one single counting unit can potentially
#                  improve the reliablity of the final score that treats each token as
#                  equally important; while the previous approach considers each sentence as
#                  equally important that ignores the length effect of each individual
#                  sentences (i.e. long sentences contribute equal weight to the final
#                  score as short sentences.)
#                  +v1.2 provide a choice of these two counting modes that users can
#                  choose the one that fits their scenarios.
#              1.5.1
#              (1) Add precision oriented measure and f-measure to deal with different lengths
#                  in candidates and references. Importance between recall and precision can
#                  be controled by 'alpha' parameter:
#                  alpha -> 0: recall is more important
#                  alpha -> 1: precision is more important
#                  Following Chapter 7 in C.J. van Rijsbergen's "Information Retrieval".
#                  http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html
#                  F = 1/(alpha * (1/P) + (1 - alpha) * (1/R)) ;;; weighted harmonic mean
#              1.4.2
#              (1) Enforce length limit at the time when summary text is read. Previously (before
#                  and including v1.4.1), length limit was enforced at tokenization time.
#              1.4.1
#              (1) Fix potential over counting in ROUGE-L and ROUGE-W
#                  In previous version (i.e. 1.4 and order), LCS hit is computed
#                  by summing union hit over all model sentences. Each model sentence
#                  is compared with all peer sentences and mark the union LCS. The
#                  length of the union LCS is the hit of that model sentence. The
#                  final hit is then sum over all model union LCS hits. This potentially
#                  would over count a peer sentence which already been marked as contributed
#                  to some other model sentence. Therefore, double counting is resulted.
#                  This is seen in evalution where ROUGE-L score is higher than ROUGE-1 and
#                  this is not correct.
#                  ROUGEeval-1.4.1.pl fixes this by add a clip function to prevent
#                  double counting.
#              1.4
#              (1) Remove internal Jackknifing procedure:
#                  Now the ROUGE script will use all the references listed in the
#                  <MODEL></MODEL> section in each <EVAL></EVAL> section and no
#                  automatic Jackknifing is performed.
#                  If Jackknifing procedure is required when comparing human and system
#                  performance, then users have to setup the procedure in the ROUGE
#                  evaluation configuration script as follows:
#                  For example, to evaluate system X with 4 references R1, R2, R3, and R4. 
#                  We do the following computation:
#
#                  for system:            and for comparable human:
#                  s1 = X vs. R1, R2, R3    h1 = R4 vs. R1, R2, R3 
#                  s2 = X vs. R1, R3, R4    h2 = R2 vs. R1, R3, R4
#                  s3 = X vs. R1, R2, R4    h3 = R3 vs. R1, R2, R4
#                  s4 = X vs. R2, R3, R4    h4 = R1 vs. R2, R3, R4
#
#                  Average system score for X = (s1+s2+s3+s4)/4 and for human = (h1+h2+h3+h4)/4
#                  Implementation of this in a ROUGE evaluation configuration script is as follows:
#                  Instead of writing all references in a evaluation section as below:
#                    <EVAL ID="1">
#                    ...
#                    <PEERS>
#                    <P ID="X">systemX</X>
#                    <PEERS>
#                    <MODELS>
#                    <M ID="1">R1</M>
#                    <M ID="2">R2</M>
#                    <M ID="3">R3</M>
#                    <M ID="4">R4</M>
#                    <MODELS>
#                    </EVAL>
#                  we write the following:
#                    <EVAL ID="1-1">
#                    <PEERS>
#                    <P ID="X">systemX</X>
#                    <PEERS>
#                    <MODELS>
#                    <M ID="2">R2</M>
#                    <M ID="3">R3</M>
#                    <M ID="4">R4</M>
#                    <MODELS>
#                    </EVAL>
#                    <EVAL ID="1-2">
#                    <PEERS>
#                    <P ID="X">systemX</X>
#                    <PEERS>
#                    <MODELS>
#                    <M ID="1">R1</M>
#                    <M ID="3">R3</M>
#                    <M ID="4">R4</M>
#                    <MODELS>
#                    </EVAL>
#                    <EVAL ID="1-3">
#                    <PEERS>
#                    <P ID="X">systemX</X>
#                    <PEERS>
#                    <MODELS>
#                    <M ID="1">R1</M>
#                    <M ID="2">R2</M>
#                    <M ID="4">R4</M>
#                    <MODELS>
#                    </EVAL>
#                    <EVAL ID="1-4">
#                    <PEERS>
#                    <P ID="X">systemX</X>
#                    <PEERS>
#                    <MODELS>
#                    <M ID="1">R1</M>
#                    <M ID="2">R2</M>
#                    <M ID="3">R3</M>
#                    <MODELS>
#                    </EVAL>
#                    
#                  In this case, the system and human numbers are comparable.
#                  ROUGE as it is implemented for summarization evaluation is a recall-based metric. 
#                  As we increase the number of references, we are increasing the number of 
#                  count units (n-gram or skip-bigram or LCSes) in the target pool (i.e. 
#                  the number ends up in the denominator of any ROUGE formula is larger). 
#                  Therefore, a candidate summary has more chance to hit but it also has to 
#                  hit more. In the end, this means lower absolute ROUGE scores when more 
#                  references are used and using different sets of rerferences should  not 
#                  be compared to each other. There is no nomalization mechanism in ROUGE 
#                  to properly adjust difference due to different number of references used.
#                    
#                  In the ROUGE implementations before v1.4 when there are N models provided for 
#                  evaluating system X in the ROUGE evaluation script, ROUGE does the 
#                  following:
#                    (1) s1 = X vs. R2, R3, R4, ..., RN
#                    (2) s2 = X vs. R1, R3, R4, ..., RN
#                    (3) s3 = X vs. R1, R2, R4, ..., RN
#                    (4) s4 = X vs. R1, R2, R3, ..., RN
#                    (5) ...
#                    (6) sN= X vs. R1, R2, R3, ..., RN-1
#                  And the final ROUGE score is computed by taking average of (s1, s2, s3, 
#                  s4, ..., sN). When we provide only three references for evaluation of a 
#                  human summarizer, ROUGE does the same thing but using 2 out 3 
#                  references, get three numbers, and then take the average as the final 
#                  score. Now ROUGE (after v1.4) will use all references without this
#                  internal Jackknifing procedure. The speed of the evaluation should improve
#                  a lot, since only one set instead of four sets of computation will be
#                  conducted.
#              1.3
#              (1) Add skip bigram
#              (2) Add an option to specify the number of sampling point (default is 1000)
#              1.2.3
#              (1) Correct the enviroment variable option: -e. Now users can specify evironment
#                  variable ROUGE_EVAL_HOME using the "-e" option; previously this option is
#                  not active. Thanks Zhouyan Li of Concordia University, Canada pointing this
#                  out.
#              1.2.2
#              (1) Correct confidence interval calculation for median, maximum, and minimum.
#                  Line 390.
#              1.2.1
#              (1) Add sentence per line format input format. See files in Verify-SPL for examples.
#              (2) Streamline command line arguments.
#              (3) Use bootstrap resampling to estimate confidence intervals instead of using t-test
#                  or z-test which assume a normal distribution.
#              (4) Add LCS (longest common subsequence) evaluation method.
#              (5) Add WLCS (weighted longest common subsequence) evaluation method.
#              (6) Add length cutoff in bytes.
#              (7) Add an option to specify the longest ngram to compute. The default is 4.
#              1.2
#              (1) Change zero condition check in subroutine &computeNGramScores when
#                  computing $gram1Score from
#                  if($totalGram2Count!=0)  to
#                  if($totalGram1Count!=0)
#                  Thanks Ken Litkowski for this bug report.
#                  This original script will set gram1Score to zero if there is no
#                  bigram matches. This should rarely has significant affect the final score
#                  since (a) there are bigram matches most of time; (b) the computation
#                  of gram1Score is using Jackknifing procedure. However, this definitely
#                  did not compute the correct $gram1Score when there is no bigram matches.
#                  Therefore, users of version 1.1 should definitely upgrade to newer
#                  version of the script that does not contain this bug.
# Note:        To use this script, two additional data files are needed:
#              (1) smart_common_words.txt - contains stopword list from SMART IR engine
#              (2) WordNet-1.6.exc.db - WordNet 1.6 exception inflexion database
#              These two files have to be put in a directory pointed by the environment
#              variable: "ROUGE_EVAL_HOME".
#              If environment variable ROUGE_EVAL_HOME does not exist, this script will
#              will assume it can find these two database files in the current directory.


================================================
FILE: files2rouge/RELEASE-1.5.5/ROUGE-1.5.5.pl
================================================
#!/usr/bin/perl -w
# Add current dir to include
use File::Basename;
use lib dirname (__FILE__);

# Version:     ROUGE v1.5.5
# Date:        05/26/2005,05/19/2005,04/26/2005,04/03/2005,10/28/2004,10/25/2004,10/21/2004
# Author:      Chin-Yew Lin
# Description: Given an evaluation description file, for example: test.xml,
#              this script computes the averages of the average ROUGE scores for 
#              the evaluation pairs listed in the ROUGE evaluation configuration file.
#              For more information, please see:
#              http://www.isi.edu/~cyl/ROUGE
#              For more information about Basic Elements, please see:
#              http://www.isi.edu/~cyl/BE
# Revision Note:
#              1.5.5
#              (1) Correct stemming on multi-token BE heads and modifiers.
#                  Previously, only single token heads and modifiers were assumed.
#              (2) Correct the resampling routine which ignores the last evaluation
#                  item in the evaluation list. Therefore, the average scores reported
#                  by ROUGE is only based on the first N-1 evaluation items.
#                  Thanks Barry Schiffman at Columbia University to report this bug.
#                  This bug only affects ROUGE-1.5.X. For pre-1.5 ROUGE, it only affects
#                  the computation of confidence interval (CI) estimation, i.e. CI is only
#                  estimated by the first N-1 evaluation items, but it *does not* affect
#                  average scores.
#              (3) Change read_text and read_text_LCS functions to read exact words or
#                  bytes required by users. Previous versions carry out whitespace 
#                  compression and other string clear up actions before enforce the length
#                  limit. 
#              1.5.4.1
#              (1) Minor description change about "-t 0" option.
#              1.5.4
#              (1) Add easy evalution mode for single reference evaluations with -z
#                  option.
#              1.5.3
#              (1) Add option to compute ROUGE score based on SIMPLE BE format. Given
#                  a set of peer and model summary file in BE format with appropriate
#                  options, ROUGE will compute matching scores based on BE lexical
#                  matches.
#                  There are 6 options:
#                  1. H    : Head only match. This is similar to unigram match but
#                            only BE Head is used in matching. BEs generated by
#                            Minipar-based breaker do not include head-only BEs,
#                            therefore, the score will always be zero. Use HM or HMR
#                            optiions instead.
#                  2. HM   : Head and modifier match. This is similar to bigram or
#                            skip bigram but it's head-modifier bigram match based on
#                            parse result. Only BE triples with non-NIL modifier are
#                            included in the matching.
#                  3. HMR  : Head, modifier, and relation match. This is similar to
#                            trigram match but it's head-modifier-relation trigram
#                            match based on parse result. Only BE triples with non-NIL
#                            relation are included in the matching.
#                  4. HM1  : This is combination of H and HM. It is similar to unigram +
#                            bigram or skip bigram with unigram match but it's 
#                            head-modifier bigram match based on parse result.
#                            In this case, the modifier field in a BE can be "NIL"
#                  5. HMR1 : This is combination of HM and HMR. It is similar to
#                            trigram match but it's head-modifier-relation trigram
#                            match based on parse result. In this case, the relation
#                            field of the BE can be "NIL".
#                  6. HMR2 : This is combination of H, HM and HMR. It is similar to
#                            trigram match but it's head-modifier-relation trigram
#                            match based on parse result. In this case, the modifier and
#                            relation fields of the BE can both be "NIL".
#              1.5.2
#              (1) Add option to compute ROUGE score by token using the whole corpus
#                  as average unit instead of individual sentences. Previous versions of
#                  ROUGE uses sentence (or unit) boundary to break counting unit and takes
#                  the average score from the counting unit as the final score.
#                  Using the whole corpus as one single counting unit can potentially
#                  improve the reliablity of the final score that treats each token as
#                  equally important; while the previous approach considers each sentence as
#                  equally important that ignores the length effect of each individual
#                  sentences (i.e. long sentences contribute equal weight to the final
#                  score as short sentences.)
#                  +v1.2 provide a choice of these two counting modes that users can
#                  choose the one that fits their scenarios.
#              1.5.1
#              (1) Add precision oriented measure and f-measure to deal with different lengths
#                  in candidates and references. Importance between recall and precision can
#                  be controled by 'alpha' parameter:
#                  alpha -> 0: recall is more important
#                  alpha -> 1: precision is more important
#                  Following Chapter 7 in C.J. van Rijsbergen's "Information Retrieval".
#                  http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html
#                  F = 1/(alpha * (1/P) + (1 - alpha) * (1/R)) ;;; weighted harmonic mean
#              1.4.2
#              (1) Enforce length limit at the time when summary text is read. Previously (before
#                  and including v1.4.1), length limit was enforced at tokenization time.
#              1.4.1
#              (1) Fix potential over counting in ROUGE-L and ROUGE-W
#                  In previous version (i.e. 1.4 and order), LCS hit is computed
#                  by summing union hit over all model sentences. Each model sentence
#                  is compared with all peer sentences and mark the union LCS. The
#                  length of the union LCS is the hit of that model sentence. The
#                  final hit is then sum over all model union LCS hits. This potentially
#                  would over count a peer sentence which already been marked as contributed
#                  to some other model sentence. Therefore, double counting is resulted.
#                  This is seen in evalution where ROUGE-L score is higher than ROUGE-1 and
#                  this is not correct.
#                  ROUGEeval-1.4.1.pl fixes this by add a clip function to prevent
#                  double counting.
#              1.4
#              (1) Remove internal Jackknifing procedure:
#                  Now the ROUGE script will use all the references listed in the
#                  <MODEL></MODEL> section in each <EVAL></EVAL> section and no
#                  automatic Jackknifing is performed. Please see RELEASE-NOTE.txt
#                  for more details.
#              1.3
#              (1) Add skip bigram
#              (2) Add an option to specify the number of sampling point (default is 1000)
#              1.2.3
#              (1) Correct the enviroment variable option: -e. Now users can specify evironment
#                  variable ROUGE_EVAL_HOME using the "-e" option; previously this option is
#                  not active. Thanks Zhouyan Li of Concordia University, Canada pointing this
#                  out.
#              1.2.2
#              (1) Correct confidence interval calculation for median, maximum, and minimum.
#                  Line 390.
#              1.2.1
#              (1) Add sentence per line format input format. See files in Verify-SPL for examples.
#              (2) Streamline command line arguments.
#              (3) Use bootstrap resampling to estimate confidence intervals instead of using t-test
#                  or z-test which assume a normal distribution.
#              (4) Add LCS (longest common subsequence) evaluation method.
#              (5) Add WLCS (weighted longest common subsequence) evaluation method.
#              (6) Add length cutoff in bytes.
#              (7) Add an option to specify the longest ngram to compute. The default is 4.
#              1.2
#              (1) Change zero condition check in subroutine &computeNGramScores when
#                  computing $gram1Score from
#                  if($totalGram2Count!=0)  to
#                  if($totalGram1Count!=0)
#                  Thanks Ken Litkowski for this bug report.
#                  This original script will set gram1Score to zero if there is no
#                  bigram matches. This should rarely has significant affect the final score
#                  since (a) there are bigram matches most of time; (b) the computation
#                  of gram1Score is using Jackknifing procedure. However, this definitely
#                  did not compute the correct $gram1Score when there is no bigram matches.
#                  Therefore, users of version 1.1 should definitely upgrade to newer
#                  version of the script that does not contain this bug.
# Note:        To use this script, two additional data files are needed:
#              (1) smart_common_words.txt - contains stopword list from SMART IR engine
#              (2) WordNet-2.0.exc.db - WordNet 2.0 exception inflexion database
#              These two files have to be put in a directory pointed by the environment
#              variable: "ROUGE_EVAL_HOME".
#              If environment variable ROUGE_EVAL_HOME does not exist, this script will
#              will assume it can find these two database files in the current directory.
# COPYRIGHT (C) UNIVERSITY OF SOUTHERN CALIFORNIA, 2002,2003,2004
# University of Southern California                                           
# Information Sciences Institute                                              
# 4676 Admiralty Way                                                          
# Marina Del Rey, California 90292-6695                                       
#                                                                             
# This software was partially developed under SPAWAR Grant No.
# N66001-00-1-8916 , and  the Government holds license rights under
# DAR 7-104.9(a)(c)(1).  It is  
# transmitted outside of the University of Southern California only under 
# written license agreements or software exchange agreements, and its use   
# is limited by these agreements.  At no time shall any recipient use       
# this software in any manner which conflicts or interferes with the        
# governmental license rights or other provisions of the governing           
# agreement under which it is obtained.  It is supplied "AS IS," without     
# any warranties of any kind.  It is furnished only on the basis that any    
# party who receives it indemnifies and holds harmless the parties who       
# furnish and originate it against any claims, demands or liabilities        
# connected with using it, furnishing it to others or providing it to a      
# third party.  THIS NOTICE MUST NOT BE REMOVED FROM THE SOFTWARE,
# AND IN THE EVENT THAT THE SOFTWARE IS DIVIDED, IT SHOULD BE
# ATTACHED TO EVERY PART.
#
# Contributor to its design is Chin-Yew Lin.

use XML::DOM;
use DB_File;
use Getopt::Std;
#-------------------------------------------------------------------------------------
use vars qw($opt_a $opt_b $opt_c $opt_d $opt_e $opt_f $opt_h $opt_H $opt_m $opt_n $opt_p $opt_s $opt_t $opt_l $opt_v $opt_w $opt_2 $opt_u $opt_x $opt_U $opt_3 $opt_M $opt_z);
my $usageFull="$0\n         [-a (evaluate all systems)] 
         [-c cf]
         [-d (print per evaluation scores)] 
         [-e ROUGE_EVAL_HOME] 
         [-h (usage)] 
         [-H (detailed usage)] 
         [-b n-bytes|-l n-words] 
         [-m (use Porter stemmer)] 
         [-n max-ngram] 
         [-s (remove stopwords)] 
         [-r number-of-samples (for resampling)] 
         [-2 max-gap-length (if < 0 then no gap length limit)] 
         [-3 <H|HM|HMR|HM1|HMR1|HMR2> (for scoring based on BE)] 
         [-u (include unigram in skip-bigram) default no)] 
         [-U (same as -u but also compute regular skip-bigram)] 
         [-w weight (weighting factor for WLCS)] 
         [-v (verbose)] 
         [-x (do not calculate ROUGE-L)] 
         [-f A|B (scoring formula)] 
         [-p alpha (0 <= alpha <=1)] 
         [-t 0|1|2 (count by token instead of sentence)] 
         [-z <SEE|SPL|ISI|SIMPLE>] 
         <ROUGE-eval-config-file> [<systemID>]\n
".
  "ROUGE-eval-config-file: Specify the evaluation setup. Three files come with the ROUGE evaluation package, i.e.\n".
  "          ROUGE-test.xml, verify.xml, and verify-spl.xml are good examples.\n".
  "systemID: Specify which system in the ROUGE-eval-config-file to perform the evaluation.\n".
  "          If '-a' option is used, then all systems are evaluated and users do not need to\n".
  "          provide this argument.\n".
  "Default:\n".
  "  When running ROUGE without supplying any options (except -a), the following defaults are used:\n".
  "  (1) ROUGE-L is computed;\n".
  "  (2) 95% confidence interval;\n".
  "  (3) No stemming;\n".
  "  (4) Stopwords are inlcuded in the calculations;\n".
  "  (5) ROUGE looks for its data directory first through the ROUGE_EVAL_HOME environment variable. If\n".
  "      it is not set, the current directory is used.\n".
  "  (6) Use model average scoring formula.\n".
  "  (7) Assign equal importance of ROUGE recall and precision in computing ROUGE f-measure, i.e. alpha=0.5.\n".
  "  (8) Compute average ROUGE by averaging sentence (unit) ROUGE scores.\n".
  "Options:\n".
  "  -2: Compute skip bigram (ROGUE-S) co-occurrence, also specify the maximum gap length between two words (skip-bigram)\n".
  "  -u: Compute skip bigram as -2 but include unigram, i.e. treat unigram as \"start-sentence-symbol unigram\"; -2 has to be specified.\n".
  "  -3: Compute BE score. Currently only SIMPLE BE triple format is supported.\n".
  "      H    -> head only scoring (does not applied to Minipar-based BEs).\n".
  "      HM   -> head and modifier pair scoring.\n".
  "      HMR  -> head, modifier and relation triple scoring.\n".
  "      HM1  -> H and HM scoring (same as HM for Minipar-based BEs).\n".
  "      HMR1 -> HM and HMR scoring (same as HMR for Minipar-based BEs).\n".
  "      HMR2 -> H, HM and HMR scoring (same as HMR for Minipar-based BEs).\n".
  "  -a: Evaluate all systems specified in the ROUGE-eval-config-file.\n".
  "  -c: Specify CF\% (0 <= CF <= 100) confidence interval to compute. The default is 95\% (i.e. CF=95).\n".
  "  -d: Print per evaluation average score for each system.\n".
  "  -e: Specify ROUGE_EVAL_HOME directory where the ROUGE data files can be found.\n".
  "      This will overwrite the ROUGE_EVAL_HOME specified in the environment variable.\n".
  "  -f: Select scoring formula: 'A' => model average; 'B' => best model\n".
  "  -h: Print usage information.\n".
  "  -H: Print detailed usage information.\n".
  "  -b: Only use the first n bytes in the system/peer summary for the evaluation.\n".
  "  -l: Only use the first n words in the system/peer summary for the evaluation.\n".
  "  -m: Stem both model and system summaries using Porter stemmer before computing various statistics.\n".
  "  -n: Compute ROUGE-N up to max-ngram length will be computed.\n".
  "  -p: Relative importance of recall and precision ROUGE scores. Alpha -> 1 favors precision, Alpha -> 0 favors recall.\n".
  "  -s: Remove stopwords in model and system summaries before computing various statistics.\n".
  "  -t: Compute average ROUGE by averaging over the whole test corpus instead of sentences (units).\n".
  "      0: use sentence as counting unit, 1: use token as couting unit, 2: same as 1 but output raw counts\n".
  "      instead of precision, recall, and f-measure scores. 2 is useful when computation of the final,\n".
  "      precision, recall, and f-measure scores will be conducted later.\n".
  "  -r: Specify the number of sampling point in bootstrap resampling (default is 1000).\n".
  "      Smaller number will speed up the evaluation but less reliable confidence interval.\n".
  "  -w: Compute ROUGE-W that gives consecutive matches of length L in an LCS a weight of 'L^weight' instead of just 'L' as in LCS.\n".
  "      Typically this is set to 1.2 or other number greater than 1.\n".
  "  -v: Print debugging information for diagnositic purpose.\n".
  "  -x: Do not calculate ROUGE-L.\n".
  "  -z: ROUGE-eval-config-file is a list of peer-model pair per line in the specified format (SEE|SPL|ISI|SIMPLE).\n";

my $usage="$0\n         [-a (evaluate all systems)] 
         [-c cf]
         [-d (print per evaluation scores)] 
         [-e ROUGE_EVAL_HOME] 
         [-h (usage)] 
         [-H (detailed usage)] 
         [-b n-bytes|-l n-words] 
         [-m (use Porter stemmer)] 
         [-n max-ngram] 
         [-s (remove stopwords)] 
         [-r number-of-samples (for resampling)] 
         [-2 max-gap-length (if < 0 then no gap length limit)] 
         [-3 <H|HM|HMR|HM1|HMR1|HMR2> (for scoring based on BE)] 
         [-u (include unigram in skip-bigram) default no)] 
         [-U (same as -u but also compute regular skip-bigram)] 
         [-w weight (weighting factor for WLCS)] 
         [-v (verbose)] 
         [-x (do not calculate ROUGE-L)] 
         [-f A|B (scoring formula)] 
         [-p alpha (0 <= alpha <=1)] 
         [-t 0|1|2 (count by token instead of sentence)] 
         [-z <SEE|SPL|ISI|SIMPLE>] 
         <ROUGE-eval-config-file> [<systemID>]
";
getopts('ahHb:c:de:f:l:mMn:p:st:r:2:3:w:uUvxz:');
my $systemID;

die $usageFull if defined($opt_H);
die $usage if defined($opt_h)||@ARGV==0;
die "Please specify the ROUGE configuration file or use option '-h' for help\n" if(@ARGV==0);
if(@ARGV==1&&defined($opt_z)) {
  $systemID="X"; # default system ID
}
elsif(@ARGV==1&&!defined($opt_a)) {
  die "Please specify a system ID to evaluate or use option '-a' to evaluate all systems. For more information, use option '-h'.\n";
}
elsif(@ARGV==2) {
  $systemID=$ARGV[1];
}
if(defined($opt_e)) {
  $stopwords="$opt_e/smart_common_words.txt";
  $wordnetDB="$opt_e/WordNet-2.0.exc.db";
}
else {
  if(exists($ENV{"ROUGE_EVAL_HOME"})) {
    $stopwords="$ENV{\"ROUGE_EVAL_HOME\"}/smart_common_words.txt";
    $wordnetDB="$ENV{\"ROUGE_EVAL_HOME\"}/WordNet-2.0.exc.db";
  }
  elsif(exists($ENV{"RED_EVAL_HOME"})) {
    $stopwords="$ENV{\"RED_EVAL_HOME\"}/smart_common_words.txt";
    $wordnetDB="$ENV{\"RED_EVAL_HOME\"}/WordNet-2.0.exc.db";
  }
  else {
    # if no environment variable exists then assume data files are in the current directory
    $stopwords="smart_common_words.txt";
    $wordnetDB="WordNet-2.0.exc.db";
  }
}

if(defined($opt_s)) {
  $useStopwords=0; # do not use stop words
}
else {
  $useStopwords=1; # use stop words
}

if(defined($opt_l)&&defined($opt_b)) {
  die "Please specify length limit in words or bytes but not both.\n";
}

if(defined($opt_l)) {
  $lengthLimit=$opt_l;
  $byteLimit=0;   # no byte limit
}
elsif(defined($opt_b)) {
  $lengthLimit=0; # no length limit in words
  $byteLimit=$opt_b;
}
else {
  $byteLimit=0;   # no byte limit
  $lengthLimit=0; # no length limit
}

unless(defined($opt_c)) {
  $opt_c=95;
}
else {
  if($opt_c<0||$opt_c>100) {
    die "Confidence interval should be within 0 and 100. Use option -h for more details.\n";
  }
}

if(defined($opt_w)) {
  if($opt_w>0) {
    $weightFactor=$opt_w;
  }
  else {
    die "ROUGE-W weight factor must greater than 0.\n";
  }
}
#unless(defined($opt_n)) {
#    $opt_n=4; # default maximum ngram is 4
#}
if(defined($opt_v)) {
  $debug=1;
}
else {
  $debug=0;
}

if(defined($opt_r)) {
  $numOfResamples=$opt_r;
}
else {
  $numOfResamples=1000;
}

if(defined($opt_2)) {
  $skipDistance=$opt_2;
}

if(defined($opt_3)) {
  $BEMode=$opt_3;
}

if(defined($opt_f)) {
  $scoreMode=$opt_f;
}
else {
  $scoreMode="A"; # default: use model average scoring formula
}

if(defined($opt_p)) {
  $alpha=$opt_p;
  if($alpha<0||
     $alpha>1) {
    die "Relative importance of ROUGE recall and precision has to be between 0 and 1 inclusively.\n";
  }
}
else {
  $alpha=0.5; # default is equal importance of ROUGE recall and precision
}

if(defined($opt_t)) {
  # make $opt_t as undef when appropriate option is given
  # when $opt_t is undef, sentence level average will be used
  if($opt_t==0) {
    $opt_t=undef;
  }
  elsif($opt_t!=1&&
	$opt_t!=2) {
    $opt_t=undef; # other than 1 or 2, let $opt_t to be undef
  }
}

if(defined($opt_z)) {
  # If opt_z is specified, the user has to specify a system ID that
  # is used for identification therefore -a option is not allowed.
  # Here we make it undef.
  $opt_a=undef;
}
#-------------------------------------------------------------------------------------
# Setup ROUGE scoring parameters
%ROUGEParam=();   # ROUGE scoring parameter
if(defined($lengthLimit)) {
  $ROUGEParam{"LENGTH"}=$lengthLimit;
}
else {
  $ROUGEParam{"LENGTH"}=undef;
}
if(defined($byteLimit)) {
  $ROUGEParam{"BYTE"}=$byteLimit;
}
else {
  $ROUGEParam{"BYTE"}=undef;
}
if(defined($opt_n)) { # ngram size
  $ROUGEParam{"NSIZE"}=$opt_n;
}
else {
  $ROUGEParam{"NSIZE"}=undef;
}
if(defined($weightFactor)) {
  $ROUGEParam{"WEIGHT"}=$weightFactor;
}
else {
  $ROUGEParam{"WEIGHT"}=undef;
}
if(defined($skipDistance)) {
  $ROUGEParam{"SD"}=$skipDistance;
}
else {
  $ROUGEParam{"SD"}=undef;
}
if(defined($scoreMode)) {
  $ROUGEParam{"SM"}=$scoreMode;
}
else {
  $ROUGEParam{"SM"}=undef;
}
if(defined($alpha)) {
  $ROUGEParam{"ALPHA"}=$alpha;
}
else {
  $ROUGEParam{"ALPHA"}=undef;
}
if(defined($opt_t)) {
  $ROUGEParam{"AVERAGE"}=$opt_t;
}
else {
  $ROUGEParam{"AVERAGE"}=undef;
}
if(defined($opt_3)) {
  $ROUGEParam{"BEMODE"}=$opt_3;
}
else {
  $ROUGEParam{"BEMODE"}=undef;
}
#-------------------------------------------------------------------------------------
# load stopwords
%stopwords=();
open(STOP,$stopwords)||die "Cannot open $stopwords\n";
while(defined($line=<STOP>)) {
  chomp($line);
  $stopwords{$line}=1;
}
close(STOP);
# load WordNet database
if(-e "$wordnetDB") {
  tie %exceptiondb,'DB_File',"$wordnetDB",O_RDONLY,0440,$DB_HASH or
    die "Cannot open exception db file for reading: $wordnetDB\n";
}
else {
  die "Cannot open exception db file for reading: $wordnetDB\n";
}
#-------------------------------------------------------------------------------------
# Initialize Porter Stemmer
&initialise();
#-------------------------------------------------------------------------------------
# Read and parse the document
my $parser = new XML::DOM::Parser;
my $doc;
unless(defined($opt_z)) {
  $doc=$parser->parsefile($ARGV[0]);
}
else {
  open($doc,$ARGV[0])||die "Cannot open $ARGV[0]\n";
}
%ROUGEEvals=();
@ROUGEEvalIDs=();
%ROUGEPeerIDTable=();
@allPeerIDs=();
%knownMissing=(); # remember missing submission already known
if(defined($doc)) {
  # read evaluation description file
  &readEvals(\%ROUGEEvals,\@ROUGEEvalIDs,\%ROUGEPeerIDTable,$doc,undef);
  # print evaluation configuration
  if(defined($opt_z)) {
    if(defined($ARGV[1])) {
      $systemID=$ARGV[1];
    }
    else {
      $systemID="X"; # default system ID in BE file list evaluation mode
    }
    push(@allPeerIDs,$systemID);
  }
  else {
    unless(defined($opt_a)) {
      $systemID=$ARGV[1];
      push(@allPeerIDs,$systemID);
    }
    else {
      # run evaluation for each peer listed in the description file
      @allPeerIDs=sort (keys %ROUGEPeerIDTable);
    }
  }
  foreach $peerID (@allPeerIDs) {
    %testIDs=();
    #	print "\@PEER($peerID)--------------------------------------------------\n";
    if(defined($opt_n)) {
      # evaluate a specific peer
      # compute ROUGE score up to $opt_n-gram
      for($n=1;$n<=$opt_n;$n++) {
	my (%ROUGEScores,%ROUGEAverages);
	
	%ROUGEScores=();
	foreach $e (@ROUGEEvalIDs) {
	  if($debug) {
	    print "\@Eval ($e)\n";
	  }
	  $ROUGEParam{"NSIZE"}=$n;
	  &computeROUGEX("N",\%ROUGEScores,$e,$ROUGEEvals{$e},$peerID,\%ROUGEParam);
	}
	# compute averages
	%ROUGEAverages=();
	&computeAverages(\%ROUGEScores,\%ROUGEAverages,$opt_t);
	&printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-$n",$opt_c,$opt_t,$opt_d);
      }
    }
    unless(defined($opt_x)||defined($opt_3)) {
      #-----------------------------------------------
      # compute LCS score
      %ROUGEScores=();
      foreach $e (@ROUGEEvalIDs) {
	&computeROUGEX("L",\%ROUGEScores,$e,$ROUGEEvals{$e},$peerID,\%ROUGEParam);
      }
      # compute averages
      %ROUGEAverages=();
      &computeAverages(\%ROUGEScores,\%ROUGEAverages,$opt_t);
      &printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-L",$opt_c,$opt_t,$opt_d);
    }
    if(defined($opt_w)) {
      #-----------------------------------------------
      # compute WLCS score
      %ROUGEScores=();
      foreach $e (@ROUGEEvalIDs) {
	&computeROUGEX("W",\%ROUGEScores,$e,$ROUGEEvals{$e},$peerID,\%ROUGEParam);
      }
      # compute averages
      %ROUGEAverages=();
      &computeAverages(\%ROUGEScores,\%ROUGEAverages,$opt_t);
      &printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-W-$weightFactor",$opt_c,$opt_t,$opt_d);
    }
    if(defined($opt_2)) {
      #-----------------------------------------------
      # compute skip bigram score
      %ROUGEScores=();
      foreach $e (@ROUGEEvalIDs) {
	&computeROUGEX("S",\%ROUGEScores,$e,$ROUGEEvals{$e},$peerID,\%ROUGEParam);
      }
      # compute averages
      %ROUGEAverages=();
      &computeAverages(\%ROUGEScores,\%ROUGEAverages,$opt_t);
      if($skipDistance>=0) {
	if(defined($opt_u)) {
	  &printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-SU$skipDistance",$opt_c,$opt_t,$opt_d);
	}
	elsif(defined($opt_U)) {
	  # print regular skip bigram results
	  &printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-S$skipDistance",$opt_c,$opt_t,$opt_d);
	  #-----------------------------------------------
	  # compute skip bigram with unigram extension score
	  $opt_u=1;
	  %ROUGEScores=();
	  foreach $e (@ROUGEEvalIDs) {
	    &computeROUGEX("S",\%ROUGEScores,$e,$ROUGEEvals{$e},$peerID,\%ROUGEParam);
	  }
	  $opt_u=undef;
	  # compute averages
	  %ROUGEAverages=();
	  &computeAverages(\%ROUGEScores,\%ROUGEAverages,$opt_t);
	  &printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-SU$skipDistance",$opt_c,$opt_t,$opt_d);
	}
	else {
	  &printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-S$skipDistance",$opt_c,$opt_t,$opt_d);
	}
      }
      else {
	if(defined($opt_u)) {
	  &printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-SU*",$opt_c,$opt_t,$opt_d);
	}
	else {
	  &printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-S*",$opt_c,$opt_t,$opt_d);
	  if(defined($opt_U)) {
	    #-----------------------------------------------
	    # compute skip bigram with unigram extension score
	    $opt_u=1;
	    %ROUGEScores=();
	    foreach $e (@ROUGEEvalIDs) {
	      &computeROUGEX("S",\%ROUGEScores,$e,$ROUGEEvals{$e},$peerID,\%ROUGEParam);
	    }
	    $opt_u=undef;
	    # compute averages
	    %ROUGEAverages=();
	    &computeAverages(\%ROUGEScores,\%ROUGEAverages,$opt_t);
	    &printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-SU*",$opt_c,$opt_t,$opt_d);
	  }
	}
      }
    }
    if(defined($opt_3)) {
      #-----------------------------------------------
      # compute Basic Element triple score
      %ROUGEScores=();
      foreach $e (@ROUGEEvalIDs) {
	&computeROUGEX("BE",\%ROUGEScores,$e,$ROUGEEvals{$e},$peerID,\%ROUGEParam);
      }
      # compute averages
      %ROUGEAverages=();
      &computeAverages(\%ROUGEScores,\%ROUGEAverages,$opt_t);
      &printResults($peerID,\%ROUGEAverages,\%ROUGEScores,"ROUGE-BE-$BEMode",$opt_c,$opt_t,$opt_d);
    }
  }
}
else {
  die "Document undefined\n";
}
if(defined($opt_z)) {
  close($doc);
}
untie %exceptiondb;

sub printResults {
  my $peerID=shift;
  my $ROUGEAverages=shift;
  my $ROUGEScores=shift;
  my $methodTag=shift;
  my $opt_c=shift;
  my $opt_t=shift;
  my $opt_d=shift;

  print "---------------------------------------------\n";
  if(!defined($opt_t)||$opt_t==1) {
    print "$peerID $methodTag Average_R: $ROUGEAverages->{'AvgR'} ";
    print "($opt_c\%-conf.int. $ROUGEAverages->{'CIAvgL_R'} - $ROUGEAverages->{'CIAvgU_R'})\n";
    print "$peerID $methodTag Average_P: $ROUGEAverages->{'AvgP'} ";
    print "($opt_c\%-conf.int. $ROUGEAverages->{'CIAvgL_P'} - $ROUGEAverages->{'CIAvgU_P'})\n";
    print "$peerID $methodTag Average_F: $ROUGEAverages->{'AvgF'} ";
    print "($opt_c\%-conf.int. $ROUGEAverages->{'CIAvgL_F'} - $ROUGEAverages->{'CIAvgU_F'})\n";
  }
  else {
    print "$peerID $methodTag M_count: ";
    print int($ROUGEAverages->{'M_cnt'});
    print " P_count: ";
    print int($ROUGEAverages->{'P_cnt'});
    print " H_count: ";
    print int($ROUGEAverages->{'H_cnt'});
    print "\n";
  }
  if(defined($opt_d)) {
    print ".............................................\n";
    &printPerEvalData($ROUGEScores,"$peerID $methodTag Eval");
  }
}

sub bootstrapResampling {
  my $scores=shift;
  my $instances=shift;
  my $seed=shift;
  my $opt_t=shift;
  my $sample;
  my ($i,$ridx);
  
  # Use $seed to seed the random number generator to make sure
  # we have the same random sequence every time, therefore a
  # consistent estimation of confidence interval in different runs.
  # This is not necessary. To ensure a consistent result in reporting
  # results using ROUGE, this is implemented.
  srand($seed);
  for($i=0;$i<@{$instances};$i++) {
    # generate a random index
    $ridx=int(rand(@{$instances}));
    unless(defined($sample)) {
      # setup the resampling array
      $sample=[];
      push(@$sample,$scores->{$instances->[$ridx]}[0]);
      push(@$sample,$scores->{$instances->[$ridx]}[1]);
      push(@$sample,$scores->{$instances->[$ridx]}[2]);
    }
    else {
      # update the resampling array
      $sample->[0]+=$scores->{$instances->[$ridx]}[0];
      $sample->[1]+=$scores->{$instances->[$ridx]}[1];
      $sample->[2]+=$scores->{$instances->[$ridx]}[2];
    }
  }
  # compute the average result for this resampling procedure
  unless(defined($opt_t)) {
    # per instance or sentence average
    if(@{$instances}>0) {
      $sample->[0]/=@{$instances};
      $sample->[1]/=@{$instances};
      $sample->[2]/=@{$instances};
    }
    else {
      $sample->[0]=0;
      $sample->[1]=0;
      $sample->[2]=0;
    }
  }
  else {
    if($opt_t==1) {
      # per token or corpus level average
      # output recall, precision, and f-measure score
      my ($tmpR,$tmpP,$tmpF);
      if($sample->[0]>0) {
	$tmpR=$sample->[2]/$sample->[0]; # recall
      }
      else {
	$tmpR=0;
      }
      if($sample->[1]>0) {
	$tmpP=$sample->[2]/$sample->[1]; # precision
      }
      else {
	$tmpP=0;
      }
      if((1-$alpha)*$tmpP+$alpha*$tmpR>0) {
	$tmpF=($tmpR*$tmpP)/((1-$alpha)*$tmpP+$alpha*$tmpR); # f-measure
      }
      else {
	$tmpF=0;
      }
      $sample->[0]=$tmpR;
      $sample->[1]=$tmpP;
      $sample->[2]=$tmpF;
    }
    else {
      # $opt_t!=1 => output raw model token count, peer token count, and hit count
      # do nothing, just return $sample
    }
  }
  return $sample;
}

sub by_value {
  $a<=>$b;
}

sub printPerEvalData {
  my $ROUGEScores=shift;
  my $tag=shift; # tag to identify each evaluation
  my (@instances,$i,$j);
  
  @instances=sort by_evalID (keys %$ROUGEScores);
  foreach $i (@instances) {
    # print average per evaluation score
    print "$tag $i R:$ROUGEScores->{$i}[0] P:$ROUGEScores->{$i}[1] F:$ROUGEScores->{$i}[2]\n";
  }
}

sub by_evalID {
  my ($a1,$b1);

  if($a=~/^([0-9]+)/o) {
    $a1=$1;
  }
  if($b=~/^([0-9]+)/o) {
    $b1=$1;
  }
  if(defined($a1)&&defined($b1)) {
    return $a1<=>$b1;
  }
  else {
    return $a cmp $b;
  }
}

sub computeAverages {
  my $ROUGEScores=shift;
  my $ROUGEAverages=shift;
  my $opt_t=shift;
  my ($avgAvgROUGE_R,$resampleAvgROUGE_R);
  my ($avgAvgROUGE_P,$resampleAvgROUGE_P);
  my ($avgAvgROUGE_F,$resampleAvgROUGE_F);
  my ($ciU,$ciL);
  my (@instances,$i,$j,@rankedArray_R,@rankedArray_P,@RankedArray_F);
  
  @instances=sort (keys %$ROUGEScores);
  $avgAvgROUGE_R=0;
  $avgAvgROUGE_P=0;
  $avgAvgROUGE_F=0;
  $resampleAvgROUGE_R=0;
  $resampleAvgROUGE_P=0;
  $resampleAvgROUGE_F=0;
  # compute totals
  foreach $i (@instances) {
    $avgAvgROUGE_R+=$ROUGEScores->{$i}[0]; # recall     ; or model token count
    $avgAvgROUGE_P+=$ROUGEScores->{$i}[1]; # precision  ; or peer token count
    $avgAvgROUGE_F+=$ROUGEScores->{$i}[2]; # f1-measure ; or match token count (hit)
  }
  # compute averages
  unless(defined($opt_t)) {
    # per sentence average
    if((scalar @instances)>0) {
      $avgAvgROUGE_R=sprintf("%7.5f",$avgAvgROUGE_R/(scalar @instances));
      $avgAvgROUGE_P=sprintf("%7.5f",$avgAvgROUGE_P/(scalar @instances));
      $avgAvgROUGE_F=sprintf("%7.5f",$avgAvgROUGE_F/(scalar @instances));
    }
    else {
      $avgAvgROUGE_R=sprintf("%7.5f",0);
      $avgAvgROUGE_P=sprintf("%7.5f",0);
      $avgAvgROUGE_F=sprintf("%7.5f",0);
    }
  }
  else {
    if($opt_t==1) {
      # per token average on corpus level
      my ($tmpR,$tmpP,$tmpF);
      if($avgAvgROUGE_R>0) {
	$tmpR=$avgAvgROUGE_F/$avgAvgROUGE_R;
      }
      else {
	$tmpR=0;
      }
      if($avgAvgROUGE_P>0) {
	$tmpP=$avgAvgROUGE_F/$avgAvgROUGE_P;
      }
      else {
	$tmpP=0;
      }
      if((1-$alpha)*$tmpP+$alpha*$tmpR>0) {
	$tmpF=($tmpR+$tmpP)/((1-$alpha)*$tmpP+$alpha*$tmpR);
      }
      else {
	$tmpF=0;
      }
      $avgAvgROUGE_R=sprintf("%7.5f",$tmpR);
      $avgAvgROUGE_P=sprintf("%7.5f",$tmpP);
      $avgAvgROUGE_F=sprintf("%7.5f",$tmpF);
    }
  }
  if(!defined($opt_t)||$opt_t==1) {
    # compute confidence intervals using bootstrap resampling
    @ResamplingArray=();
    for($i=0;$i<$numOfResamples;$i++) {
      my $sample;
      
      $sample=&bootstrapResampling($ROUGEScores,\@instances,$i,$opt_t);
      # sample contains average sum of the sample
      if(@ResamplingArray==0) {
	# setup the resampling array for Avg
	my $s;
	
	$s=[];
	push(@$s,$sample->[0]);
	push(@ResamplingArray,$s);
	$s=[];
	push(@$s,$sample->[1]);
	push(@ResamplingArray,$s);
	$s=[];
	push(@$s,$sample->[2]);
	push(@ResamplingArray,$s);
      }
      else {
	$rsa=$ResamplingArray[0];
	push(@{$rsa},$sample->[0]);
	$rsa=$ResamplingArray[1];
	push(@{$rsa},$sample->[1]);
	$rsa=$ResamplingArray[2];
	push(@{$rsa},$sample->[2]);
      }
    }
    # sort resampling results
    {
      # recall
      @rankedArray_R=sort by_value (@{$ResamplingArray[0]});
      $ResamplingArray[0]=\@rankedArray_R;
      for($x=0;$x<=$#rankedArray_R;$x++) {
	$resampleAvgROUGE_R+=$rankedArray_R[$x];
	#	print "*R ($x): $rankedArray_R[$x]\n";
      }
      $resampleAvgROUGE_R=sprintf("%7.5f",$resampleAvgROUGE_R/(scalar @rankedArray_R));
      # precision
      @rankedArray_P=sort by_value (@{$ResamplingArray[1]});
      $ResamplingArray[1]=\@rankedArray_P;
      for($x=0;$x<=$#rankedArray_P;$x++) {
	$resampleAvgROUGE_P+=$rankedArray_P[$x];
	#	print "*P ($x): $rankedArray_P[$x]\n";
      }
      $resampleAvgROUGE_P=sprintf("%7.5f",$resampleAvgROUGE_P/(scalar @rankedArray_P));
      # f1-measure
      @rankedArray_F=sort by_value (@{$ResamplingArray[2]});
      $ResamplingArray[2]=\@rankedArray_F;
      for($x=0;$x<=$#rankedArray_F;$x++) {
	$resampleAvgROUGE_F+=$rankedArray_F[$x];
	#	print "*F ($x): $rankedArray_F[$x]\n";
      }
      $resampleAvgROUGE_F=sprintf("%7.5f",$resampleAvgROUGE_F/(scalar @rankedArray_F));
    }
    #    $ciU=999-int((100-$opt_c)*10/2); # upper bound index
    #    $ciL=int((100-$opt_c)*10/2);     # lower bound index
    $delta=$numOfResamples*((100-$opt_c)/2.0)/100.0;
    $ciUa=int($numOfResamples-$delta-1); # upper confidence interval lower index
    $ciUb=$ciUa+1;                       # upper confidence interval upper index
    $ciLa=int($delta);                   # lower confidence interval lower index
    $ciLb=$ciLa+1;                       # lower confidence interval upper index
    $ciR=$numOfResamples-$delta-1-$ciUa; # ratio bewteen lower and upper indexes
    #    $ROUGEAverages->{"AvgR"}=$avgAvgROUGE_R;
    #-------
    # recall
    $ROUGEAverages->{"AvgR"}=$resampleAvgROUGE_R;
    # find condifence intervals; take maximum distance from the mean
    $ROUGEAverages->{"CIAvgL_R"}=sprintf("%7.5f",$ResamplingArray[0][$ciLa]+
					 ($ResamplingArray[0][$ciLb]-$ResamplingArray[0][$ciLa])*$ciR);
    $ROUGEAverages->{"CIAvgU_R"}=sprintf("%7.5f",$ResamplingArray[0][$ciUa]+
					 ($ResamplingArray[0][$ciUb]-$ResamplingArray[0][$ciUa])*$ciR);
    #-------
    # precision
    $ROUGEAverages->{"AvgP"}=$resampleAvgROUGE_P;
    # find condifence intervals; take maximum distance from the mean
    $ROUGEAverages->{"CIAvgL_P"}=sprintf("%7.5f",$ResamplingArray[1][$ciLa]+
					 ($ResamplingArray[1][$ciLb]-$ResamplingArray[1][$ciLa])*$ciR);
    $ROUGEAverages->{"CIAvgU_P"}=sprintf("%7.5f",$ResamplingArray[1][$ciUa]+
					 ($ResamplingArray[1][$ciUb]-$ResamplingArray[1][$ciUa])*$ciR);
    #-------
    # f1-measure
    $ROUGEAverages->{"AvgF"}=$resampleAvgROUGE_F;
    # find condifence intervals; take maximum distance from the mean
    $ROUGEAverages->{"CIAvgL_F"}=sprintf("%7.5f",$ResamplingArray[2][$ciLa]+
					 ($ResamplingArray[2][$ciLb]-$ResamplingArray[2][$ciLa])*$ciR);
    $ROUGEAverages->{"CIAvgU_F"}=sprintf("%7.5f",$ResamplingArray[2][$ciUa]+
					 ($ResamplingArray[2][$ciUb]-$ResamplingArray[2][$ciUa])*$ciR);
    $ROUGEAverages->{"M_cnt"}=$avgAvgROUGE_R; # model token count
    $ROUGEAverages->{"P_cnt"}=$avgAvgROUGE_P; # peer token count
    $ROUGEAverages->{"H_cnt"}=$avgAvgROUGE_F; # hit token count
  }
  else {
    # $opt_t==2 => output raw count instead of precision, recall, and f-measure values
    # in this option, no resampling is necessary, just output the raw counts
    $ROUGEAverages->{"M_cnt"}=$avgAvgROUGE_R; # model token count
    $ROUGEAverages->{"P_cnt"}=$avgAvgROUGE_P; # peer token count
    $ROUGEAverages->{"H_cnt"}=$avgAvgROUGE_F; # hit token count
  }
}

sub computeROUGEX {
  my $metric=shift;       # which ROUGE metric to compute?
  my $ROUGEScores=shift;
  my $evalID=shift;
  my $ROUGEEval=shift;    # one particular evaluation pair
  my $peerID=shift;       # a specific peer ID
  my $ROUGEParam=shift;   # ROUGE scoring parameters
  my $lengthLimit;        # lenght limit in words
  my $byteLimit;          # length limit in bytes
  my $NSIZE;              # ngram size for ROUGE-N
  my $weightFactor;       # weight factor for ROUGE-W
  my $skipDistance;       # skip distance for ROUGE-S
  my $scoreMode;          # scoring mode: A = model average; B = best model
  my $alpha;              # relative importance between recall and precision
  my $opt_t;              # ROUGE score counting mode
  my $BEMode;             # Basic Element scoring mode
  my ($c,$cx,@modelPaths,$modelIDs,$modelRoot,$inputFormat);

  $lengthLimit=$ROUGEParam->{"LENGTH"};
  $byteLimit=$ROUGEParam->{"BYTE"};
  $NSIZE=$ROUGEParam->{"NSIZE"};
  $weightFactor=$ROUGEParam->{"WEIGHT"};
  $skipDistance=$ROUGEParam->{"SD"};
  $scoreMode=$ROUGEParam->{"SM"};
  $alpha=$ROUGEParam->{"ALPHA"};
  $opt_t=$ROUGEParam->{"AVERAGE"};
  $BEMode=$ROUGEParam->{"BEMODE"};
  
  # Check to see if this evaluation trial contains this $peerID.
  # Sometimes not every peer provides response for each
  # evaluation trial.
  unless(exists($ROUGEEval->{"Ps"}{$peerID})) {
    unless(exists($knownMissing{$evalID})) {
      $knownMissing{$evalID}={};
    }
    unless(exists($knownMissing{$evalID}{$peerID})) {
      print STDERR "\*ROUGE Warning: test instance for peer $peerID does not exist for evaluation $evalID\n";
      $knownMissing{$evalID}{$peerID}=1;
    }
    return;
  }
  unless(defined($opt_z)) {
    $peerPath=$ROUGEEval->{"PR"}."/".$ROUGEEval->{"Ps"}{$peerID};
  }
  else {
    # if opt_z is set then peerPath is read from a file list that
    # includes the path to the peer.
    $peerPath=$ROUGEEval->{"Ps"}{$peerID};
  }
  if(defined($ROUGEEval->{"MR"})) {
    $modelRoot=$ROUGEEval->{"MR"};
  }
  else {
    # if opt_z is set then modelPath is read from a file list that
    # includes the path to the model.
    $modelRoot="";
  }
  $modelIDs=$ROUGEEval->{"MIDList"};
  $inputFormat=$ROUGEEval->{"IF"};
  # construct combined model
  @modelPaths=(); # reset model paths
  for($cx=0;$cx<=$#{$modelIDs};$cx++) {
    my $modelID;
    $modelID=$modelIDs->[$cx];
    unless(defined($opt_z)) {
      $modelPath="$modelRoot/$ROUGEEval->{\"Ms\"}{$modelID}"; # get full model path
    }
    else {
      # if opt_z is set then modelPath is read from a file list that
      # includes the full path to the model.
      $modelPath="$ROUGEEval->{\"Ms\"}{$modelID}"; # get full model path
    }
    if(-e "$modelPath") {
      #		    print "*$modelPath\n";
    }
    else {
      die "Cannot find model summary: $modelPath\n";
    }
    push(@modelPaths,$modelPath);
  }
  #---------------------------------------------------------------
  # evaluate peer
  {
    my (@results);
    my ($testID,$avgROUGE,$avgROUGE_P,$avgROUGE_F);
    @results=();
    if($metric eq "N") {
      &computeNGramScore(\@modelPaths,$peerPath,\@results,$NSIZE,$lengthLimit,$byteLimit,$inputFormat,$scoreMode,$alpha);
    }
    elsif($metric eq "L") {
      &computeLCSScore(\@modelPaths,$peerPath,\@results,$lengthLimit,$byteLimit,$inputFormat,$scoreMode,$alpha);
    }
    elsif($metric eq "W") {
      &computeWLCSScore(\@modelPaths,$peerPath,\@results,$lengthLimit,$byteLimit,$inputFormat,$weightFactor,$scoreMode,$alpha);
    }
    elsif($metric eq "S") {
      &computeSkipBigramScore(\@modelPaths,$peerPath,\@results,$skipDistance,$lengthLimit,$byteLimit,$inputFormat,$scoreMode,$alpha);
    }
    elsif($metric eq "BE") {
      &computeBEScore(\@modelPaths,$peerPath,\@results,$BEMode,$lengthLimit,$byteLimit,$inputFormat,$scoreMode,$alpha);
    }
    else {
      die "Unknown ROUGE metric ID: $metric, has to be N, L, W, or S\n";
      
    }
    unless(defined($opt_t)) {
      # sentence level average
      $avgROUGE=sprintf("%7.5f",$results[2]);
      $avgROUGE_P=sprintf("%7.5f",$results[4]);
      $avgROUGE_F=sprintf("%7.5f",$results[5]);
    }
    else {
      # corpus level per token average
      $avgROUGE=$results[0]; # total model token count
      $avgROUGE_P=$results[3]; # total peer token count
      $avgROUGE_F=$results[1]; # total match count between model and peer, i.e. hit
    }
    # record ROUGE scores for the current test
    $testID="$evalID\.$peerID";
    if($debug) {
      print "$testID\n";
    }
    unless(exists($testIDs{$testID})) {
      $testIDs{$testID}=1;
    }
    unless(exists($ROUGEScores->{$testID})) {
      $ROUGEScores->{$testID}=[];
      push(@{$ROUGEScores->{$testID}},$avgROUGE);   # average ; or model token count
      push(@{$ROUGEScores->{$testID}},$avgROUGE_P); # average ; or peer token count
      push(@{$ROUGEScores->{$testID}},$avgROUGE_F); # average ; or match token count (hit)
    }
  }
}

# 10/21/2004 add selection of scoring mode
# A: average over all models
# B: take only the best score
sub computeNGramScore {
  my $modelPaths=shift;
  my $peerPath=shift;
  my $results=shift;
  my $NSIZE=shift;
  my $lengthLimit=shift;
  my $byteLimit=shift;
  my $inputFormat=shift;
  my $scoreMode=shift;
  my $alpha=shift;
  my ($modelPath,$modelText,$peerText,$text,@tokens);
  my (%model_grams,%peer_grams);
  my ($gramHit,$gramScore,$gramScoreBest);
  my ($totalGramHit,$totalGramCount);
  my ($gramScoreP,$gramScoreF,$totalGramCountP);
  
  #------------------------------------------------
  # read model file and create model n-gram maps
  $totalGramHit=0;
  $totalGramCount=0;
  $gramScoreBest=-1;
  $gramScoreP=0; # precision
  $gramScoreF=0; # f-measure
  $totalGramCountP=0;
  #------------------------------------------------
  # read peer file and create model n-gram maps
  %peer_grams=();
  $peerText="";
  &readText($peerPath,\$peerText,$inputFormat,$lengthLimit,$byteLimit);
  &createNGram($peerText,\%peer_grams,$NSIZE);
  if($debug) {
    print "***P $peerPath\n";
    if(defined($peerText)) {
      print "$peerText\n";
      print join("|",%peer_grams),"\n";
    }
    else {
      print "---empty text---\n";
    }
  }
  foreach $modelPath (@$modelPaths) {
    %model_grams=();
    $modelText="";
    &readText($modelPath,\$modelText,$inputFormat,$lengthLimit,$byteLimit);
    &createNGram($modelText,\%model_grams,$NSIZE);
    if($debug) {
      if(defined($modelText)) {
	print "$modelText\n";
	print join("|",%model_grams),"\n";
      }
      else {
	print "---empty text---\n";
      }
    }
    #------------------------------------------------
    # compute ngram score
    &ngramScore(\%model_grams,\%peer_grams,\$gramHit,\$gramScore);
    # collect hit and count for each models
    # This will effectively clip hit for each model; therefore would not give extra
    # credit to reducdant information contained in the peer summary.
    if($scoreMode eq "A") {
      $totalGramHit+=$gramHit;
      $totalGramCount+=$model_grams{"_cn_"};
      $totalGramCountP+=$peer_grams{"_cn_"};
    }
    elsif($scoreMode eq "B") {
      if($gramScore>$gramScoreBest) {
	# only take a better score (i.e. better match)
	$gramScoreBest=$gramScore;
	$totalGramHit=$gramHit;
	$totalGramCount=$model_grams{"_cn_"};
	$totalGramCountP=$peer_grams{"_cn_"};
      }
    }
    else {
      # use average mode
      $totalGramHit+=$gramHit;
      $totalGramCount+=$model_grams{"_cn_"};
      $totalGramCountP+=$peer_grams{"_cn_"};
    }
    if($debug) {
      print "***M $modelPath\n";
    }
  }
  # prepare score result for return
  # unigram
  push(@$results,$totalGramCount); # total number of ngrams in models
  push(@$results,$totalGramHit);
  if($totalGramCount!=0) {
    $gramScore=sprintf("%7.5f",$totalGramHit/$totalGramCount);
  }
  else {
    $gramScore=sprintf("%7.5f",0);
  }
  push(@$results,$gramScore);
  push(@$results,$totalGramCountP); # total number of ngrams in peers
  if($totalGramCountP!=0) {
    $gramScoreP=sprintf("%7.5f",$totalGramHit/$totalGramCountP);
  }
  else {
    $gramScoreP=sprintf("%7.5f",0);
  }
  push(@$results,$gramScoreP);      # precision score
  if((1-$alpha)*$gramScoreP+$alpha*$gramScore>0) {
    $gramScoreF=sprintf("%7.5f",($gramScoreP*$gramScore)/((1-$alpha)*$gramScoreP+$alpha*$gramScore));
  }
  else {
    $gramScoreF=sprintf("%7.5f",0);
  }
  push(@$results,$gramScoreF);      # f1-measure score
  if($debug) {
    print "total $NSIZE-gram model count: $totalGramCount\n";
    print "total $NSIZE-gram peer count: $totalGramCountP\n";
    print "total $NSIZE-gram hit: $totalGramHit\n";
    print "total ROUGE-$NSIZE\-R: $gramScore\n";
    print "total ROUGE-$NSIZE\-P: $gramScoreP\n";
    print "total ROUGE-$NSIZE\-F: $gramScoreF\n";
  }
}

sub computeSkipBigramScore {
  my $modelPaths=shift;
  my $peerPath=shift;
  my $results=shift;
  my $skipDistance=shift;
  my $lengthLimit=shift;
  my $byteLimit=shift;
  my $inputFormat=shift;
  my $scoreMode=shift;
  my $alpha=shift;
  my ($modelPath,$modelText,$peerText,$text,@tokens);
  my (%model_grams,%peer_grams);
  my ($gramHit,$gramScore,$gramScoreBest);
  my ($totalGramHitm,$totalGramCount);
  my ($gramScoreP,$gramScoreF,$totalGramCountP);
  
  #------------------------------------------------
  # read model file and create model n-gram maps
  $totalGramHit=0;
  $totalGramCount=0;
  $gramScoreBest=-1;
  $gramScoreP=0; # precision
  $gramScoreF=0; # f-measure
  $totalGramCountP=0;
  #------------------------------------------------
  # read peer file and create model n-gram maps
  %peer_grams=();
  $peerText="";
  &readText($peerPath,\$peerText,$inputFormat,$lengthLimit,$byteLimit);
  &createSkipBigram($peerText,\%peer_grams,$skipDistance);
  if($debug) {
    print "***P $peerPath\n";
    if(defined($peerText)) {
      print "$peerText\n";
      print join("|",%peer_grams),"\n";
    }
    else {
      print "---empty text---\n";
    }
  }
  foreach $modelPath (@$modelPaths) {
    %model_grams=();
    $modelText="";
    &readText($modelPath,\$modelText,$inputFormat,$lengthLimit,$byteLimit);
    if(defined($opt_M)) { # only apply stemming on models
      $opt_m=1;
    }
    &createSkipBigram($modelText,\%model_grams,$skipDistance);
    if(defined($opt_M)) { # only apply stemming on models
      $opt_m=undef;
    }
    if($debug) {
      if(defined($modelText)) {
	print "$modelText\n";
	print join("|",%model_grams),"\n";
      }
      else {
	print "---empty text---\n";
      }
    }
    #------------------------------------------------
    # compute ngram score
    &skipBigramScore(\%model_grams,\%peer_grams,\$gramHit,\$gramScore);
    # collect hit and count for each models
    # This will effectively clip hit for each model; therefore would not give extra
    # credit to reducdant information contained in the peer summary.
    if($scoreMode eq "A") {
      $totalGramHit+=$gramHit;
      $totalGramCount+=$model_grams{"_cn_"};
      $totalGramCountP+=$peer_grams{"_cn_"};
    }
    elsif($scoreMode eq "B") {
      if($gramScore>$gramScoreBest) {
	# only take a better score (i.e. better match)
	$gramScoreBest=$gramScore;
	$totalGramHit=$gramHit;
	$totalGramCount=$model_grams{"_cn_"};
	$totalGramCountP=$peer_grams{"_cn_"};
      }
    }
    else {
      # use average mode
      $totalGramHit+=$gramHit;
      $totalGramCount+=$model_grams{"_cn_"};
      $totalGramCountP+=$peer_grams{"_cn_"};
    }
    if($debug) {
      print "***M $modelPath\n";
    }
  }
  # prepare score result for return
  # unigram
  push(@$results,$totalGramCount); # total number of ngrams
  push(@$results,$totalGramHit);
  if($totalGramCount!=0) {
    $gramScore=sprintf("%7.5f",$totalGramHit/$totalGramCount);
  }
  else {
    $gramScore=sprintf("%7.5f",0);
  }
  push(@$results,$gramScore);
  push(@$results,$totalGramCountP); # total number of ngrams in peers
  if($totalGramCountP!=0) {
    $gramScoreP=sprintf("%7.5f",$totalGramHit/$totalGramCountP);
  }
  else {
    $gramScoreP=sprintf("%7.5f",0);
  }
  push(@$results,$gramScoreP);      # precision score
  if((1-$alpha)*$gramScoreP+$alpha*$gramScore>0) {
    $gramScoreF=sprintf("%7.5f",($gramScoreP*$gramScore)/((1-$alpha)*$gramScoreP+$alpha*$gramScore));
  }
  else {
    $gramScoreF=sprintf("%7.5f",0);
  }
  push(@$results,$gramScoreF);      # f1-measure score
  if($debug) {
    print "total ROUGE-S$skipDistance model count: $totalGramCount\n";
    print "total ROUGE-S$skipDistance peer count: $totalGramCountP\n";
    print "total ROUGE-S$skipDistance hit: $totalGramHit\n";
    print "total ROUGE-S$skipDistance\-R: $gramScore\n";
    print "total ROUGE-S$skipDistance\-P: $gramScore\n";
    print "total ROUGE-S$skipDistance\-F: $gramScore\n";
  }
}

sub computeLCSScore {
  my $modelPaths=shift;
  my $peerPath=shift;
  my $results=shift;
  my $lengthLimit=shift;
  my $byteLimit=shift;
  my $inputFormat=shift;
  my $scoreMode=shift;
  my $alpha=shift;
  my ($modelPath,@modelText,@peerText,$text,@tokens);
  my (@modelTokens,@peerTokens);
  my ($lcsHit,$lcsScore,$lcsBase,$lcsScoreBest);
  my ($totalLCSHitm,$totalLCSCount);
  my (%peer_1grams,%tmp_peer_1grams,%model_1grams,$peerText1,$modelText1);
  my ($lcsScoreP,$lcsScoreF,$totalLCSCountP);
  
  #------------------------------------------------
  $totalLCSHit=0;
  $totalLCSCount=0;
  $lcsScoreBest=-1;
  $lcsScoreP=0;
  $lcsScoreF=0;
  $totalLCSCountP=0;
  #------------------------------------------------
  # read peer file and create peer n-gram maps
  @peerTokens=();
  @peerText=();
  &readText_LCS($peerPath,\@peerText,$inputFormat,$lengthLimit,$byteLimit);
  &tokenizeText_LCS(\@peerText,\@peerTokens);
  #------------------------------------------------
  # create unigram for clipping
  %peer_1grams=();
  &readText($peerPath,\$peerText1,$inputFormat,$lengthLimit,$byteLimit);
  &createNGram($peerText1,\%peer_1grams,1);
  if($debug) {
    my $i;
    print "***P $peerPath\n";
    print join("\n",@peerText),"\n";
    for($i=0;$i<=$#peerText;$i++) {
      print $i,": ",join("|",@{$peerTokens[$i]}),"\n";
    }
  }
  foreach $modelPath (@$modelPaths) {
    %tmp_peer_1grams=%peer_1grams; # renew peer unigram hash, so the peer count can be reset to the orignal number
    @modelTokens=();
    @modelText=();
    &readText_LCS($modelPath,\@modelText,$inputFormat,$lengthLimit,$byteLimit);
    if(defined($opt_M)) {
      $opt_m=1;
      &tokenizeText_LCS(\@modelText,\@modelTokens);
      $opt_m=undef;
    }
    else {
      &tokenizeText_LCS(\@modelText,\@modelTokens);
    }
    #------------------------------------------------
    # create unigram for clipping
    %model_1grams=();
    &readText($modelPath,\$modelText1,$inputFormat,$lengthLimit,$byteLimit);
    if(defined($opt_M)) { # only apply stemming on models
      $opt_m=1;
    }        
    &createNGram($modelText1,\%model_1grams,1);
    if(defined($opt_M)) { # only apply stemming on models
      $opt_m=undef;
    }
    #------------------------------------------------
    # compute LCS score
    &lcs(\@modelTokens,\@peerTokens,\$lcsHit,\$lcsScore,\$lcsBase,\%model_1grams,\%tmp_peer_1grams);
    # collect hit and count for each models
    # This will effectively clip hit for each model; therefore would not give extra
    # credit to reductant information contained in the peer summary.
    # Previous method that lumps model text together and inflates the peer summary
    # the number of references time would reward redundant information
    if($scoreMode eq "A") {
      $totalLCSHit+=$lcsHit;
      $totalLCSCount+=$lcsBase;
      $totalLCSCountP+=$peer_1grams{"_cn_"};
    }
    elsif($scoreMode eq "B") {
      if($lcsScore>$lcsScoreBest) {
	# only take a better score (i.e. better match)
	$lcsScoreBest=$lcsScore;
	$totalLCSHit=$lcsHit;
	$totalLCSCount=$lcsBase;
	$totalLCSCountP=$peer_1grams{"_cn_"};
      }
    }
    else {
      # use average mode
      $totalLCSHit+=$lcsHit;
      $totalLCSCount+=$lcsBase;
      $totalLCSCountP+=$peer_1grams{"_cn_"};
    }
    if($debug) {
      my $i;
      print "***M $modelPath\n";
      print join("\n",@modelText),"\n";
      for($i=0;$i<=$#modelText;$i++) {
	print $i,": ",join("|",@{$modelTokens[$i]}),"\n";
      }
    }
  }
  # prepare score result for return
  push(@$results,$totalLCSCount); # total number of ngrams
  push(@$results,$totalLCSHit);
  if($totalLCSCount!=0) {
    $lcsScore=sprintf("%7.5f",$totalLCSHit/$totalLCSCount);
  }
  else {
    $lcsScore=sprintf("%7.5f",0);
  }
  push(@$results,$lcsScore);
  push(@$results,$totalLCSCountP); # total number of token in peers
  if($totalLCSCountP!=0) {
    $lcsScoreP=sprintf("%7.5f",$totalLCSHit/$totalLCSCountP);
  }
  else {
    $lcsScoreP=sprintf("%7.5f",0);
  }
  push(@$results,$lcsScoreP);
  if((1-$alpha)*$lcsScoreP+$alpha*$lcsScore>0) {
    $lcsScoreF=sprintf("%7.5f",($lcsScoreP*$lcsScore)/((1-$alpha)*$lcsScoreP+$alpha*$lcsScore));
  }
  else {
    $lcsScoreF=sprintf("%7.5f",0);
  }
  push(@$results,$lcsScoreF);
  if($debug) {
    print "total ROUGE-L model count: $totalLCSCount\n";
    print "total ROUGE-L peer count: $totalLCSCountP\n";
    print "total ROUGE-L hit: $totalLCSHit\n";
    print "total ROUGE-L-R score: $lcsScore\n";
    print "total ROUGE-L-P: $lcsScoreP\n";
    print "total ROUGE-L-F: $lcsScoreF\n";
  }
}

sub computeWLCSScore {
  my $modelPaths=shift;
  my $peerPath=shift;
  my $results=shift;
  my $lengthLimit=shift;
  my $byteLimit=shift;
  my $inputFormat=shift;
  my $weightFactor=shift;
  my $scoreMode=shift;
  my $alpha=shift;
  my ($modelPath,@modelText,@peerText,$text,@tokens);
  my (@modelTokens,@peerTokens);
  my ($lcsHit,$lcsScore,$lcsBase,$lcsScoreBest);
  my ($totalLCSHitm,$totalLCSCount);
  my (%peer_1grams,%tmp_peer_1grams,%model_1grams,$peerText1,$modelText1);
  my ($lcsScoreP,$lcsScoreF,$totalLCSCountP);
  
  #------------------------------------------------
  # read model file and create model n-gram maps
  $totalLCSHit=0;
  $totalLCSCount=0;
  $lcsScoreBest=-1;
  $lcsScoreP=0;
  $lcsScoreF=0;
  $totalLCSCountP=0;
  #------------------------------------------------
  # read peer file and create model n-gram maps
  @peerTokens=();
  @peerText=();
  &readText_LCS($peerPath,\@peerText,$inputFormat,$lengthLimit,$byteLimit);
  &tokenizeText_LCS(\@peerText,\@peerTokens);
  #------------------------------------------------
  # create unigram for clipping
  %peer_1grams=();
  &readText($peerPath,\$peerText1,$inputFormat,$lengthLimit,$byteLimit);
  &createNGram($peerText1,\%peer_1grams,1);
  if($debug) {
    my $i;
    print "***P $peerPath\n";
    print join("\n",@peerText),"\n";
    for($i=0;$i<=$#peerText;$i++) {
      print $i,": ",join("|",@{$peerTokens[$i]}),"\n";
    }
  }
  foreach $modelPath (@$modelPaths) {
    %tmp_peer_1grams=%peer_1grams; # renew peer unigram hash, so the peer count can be reset to the orignal number
    @modelTokens=();
    @modelText=();
    &readText_LCS($modelPath,\@modelText,$inputFormat,$lengthLimit,$byteLimit);
    &tokenizeText_LCS(\@modelText,\@modelTokens);
    #------------------------------------------------
    # create unigram for clipping
    %model_1grams=();
    &readText($modelPath,\$modelText1,$inputFormat,$lengthLimit,$byteLimit);
    if(defined($opt_M)) { # only apply stemming on models
      $opt_m=1;
    }
    &createNGram($modelText1,\%model_1grams,1);
    if(defined($opt_M)) { # only apply stemming on models
      $opt_m=undef;
    }
    #------------------------------------------------
    # compute WLCS score
    &wlcs(\@modelTokens,\@peerTokens,\$lcsHit,\$lcsScore,\$lcsBase,$weightFactor,\%model_1grams,\%tmp_peer_1grams);
    # collect hit and count for each models
    # This will effectively clip hit for each model; therefore would not give extra
    # credit to reductant information contained in the peer summary.
    # Previous method that lumps model text together and inflates the peer summary
    # the number of references time would reward redundant information
    if($scoreMode eq "A") {
      $totalLCSHit+=$lcsHit;
      $totalLCSCount+=&wlcsWeight($lcsBase,$weightFactor);
      $totalLCSCountP+=&wlcsWeight($peer_1grams{"_cn_"},$weightFactor);
    }
    elsif($scoreMode eq "B") {
      if($lcsScore>$lcsScoreBest) {
	# only take a better score (i.e. better match)
	$lcsScoreBest=$lcsScore;
	$totalLCSHit=$lcsHit;
	$totalLCSCount=&wlcsWeight($lcsBase,$weightFactor);
	$totalLCSCountP=&wlcsWeight($peer_1grams{"_cn_"},$weightFactor);
      }
    }
    else {
      # use average mode
      $totalLCSHit+=$lcsHit;
      $totalLCSCount+=&wlcsWeight($lcsBase,$weightFactor);
      $totalLCSCountP+=&wlcsWeight($peer_1grams{"_cn_"},$weightFactor);
    }
    if($debug) {
      my $i;
      print "***M $modelPath\n";
      print join("\n",@modelText),"\n";
      for($i=0;$i<=$#modelText;$i++) {
	print $i,": ",join("|",@{$modelTokens[$i]}),"\n";
      }
    }
  }
  # prepare score result for return
  push(@$results,$totalLCSCount); # total number of ngrams
  push(@$results,$totalLCSHit);
  if($totalLCSCount!=0) {
    $lcsScore=sprintf("%7.5f",&wlcsWeightInverse($totalLCSHit/$totalLCSCount,$weightFactor));
  }
  else {
    $lcsScore=sprintf("%7.5f",0);
  }
  push(@$results,$lcsScore);
  push(@$results,$totalLCSCountP); # total number of token in peers
  if($totalLCSCountP!=0) {
    $lcsScoreP=sprintf("%7.5f",&wlcsWeightInverse($totalLCSHit/$totalLCSCountP,$weightFactor));
  }
  else {
    $lcsScoreP=sprintf("%7.5f",0);
  }
  push(@$results,$lcsScoreP);
  if((1-$alpha)*$lcsScoreP+$alpha*$lcsScore>0) {
    $lcsScoreF=sprintf("%7.5f",($lcsScoreP*$lcsScore)/((1-$alpha)*$lcsScoreP+$alpha*$lcsScore));
  }
  else {
    $lcsScoreF=sprintf("%7.5f",0);
  }
  push(@$results,$lcsScoreF);
  if($debug) {
    print "total ROUGE-W-$weightFactor model count: $totalLCSCount\n";
    print "total ROUGE-W-$weightFactor peer count: $totalLCSCountP\n";
    print "total ROUGE-W-$weightFactor hit: $totalLCSHit\n";
    print "total ROUGE-W-$weightFactor-R score: $lcsScore\n";
    print "total ROUGE-W-$weightFactor-P score: $lcsScoreP\n";
    print "total ROUGE-W-$weightFactor-F score: $lcsScoreF\n";
  }
}

sub computeBEScore {
  my $modelPaths=shift;
  my $peerPath=shift;
  my $results=shift;
  my $BEMode=shift;
  my $lengthLimit=shift;
  my $byteLimit=shift;
  my $inputFormat=shift;
  my $scoreMode=shift;
  my $alpha=shift;
  my ($modelPath,@modelBEList,@peerBEList,$text,@tokens);
  my (%model_BEs,%peer_BEs);
  my ($BEHit,$BEScore,$BEScoreBest);
  my ($totalBEHit,$totalBECount);
  my ($BEScoreP,$BEScoreF,$totalBECountP);
  
  #------------------------------------------------
  # read model file and create model BE maps
  $totalBEHit=0;
  $totalBECount=0;
  $BEScoreBest=-1;
  $BEScoreP=0; # precision
  $BEScoreF=0; # f-measure
  $totalBECountP=0;
  #------------------------------------------------
  # read peer file and create model n-BE maps
  %peer_BEs=();
  @peerBEList=();
  &readBE($peerPath,\@peerBEList,$inputFormat);
  &createBE(\@peerBEList,\%peer_BEs,$BEMode);
  if($debug) {
    print "***P $peerPath\n";
    if(scalar @peerBEList > 0) {
#      print join("\n",@peerBEList);
#      print "\n";
      print join("#",%peer_BEs),"\n";
    }
    else {
      print "---empty text---\n";
    }
  }
  foreach $modelPath (@$modelPaths) {
    %model_BEs=();
    @modelBEList=();
    &readBE($modelPath,\@modelBEList,$inputFormat);
    if(defined($opt_M)) { # only apply stemming on models
      $opt_m=1;
    }
    &createBE(\@modelBEList,\%model_BEs,$BEMode);
    if(defined($opt_M)) { # only apply stemming on models
      $opt_m=undef;
    }
    if($debug) {
      if(scalar @modelBEList > 0) {
#	print join("\n",@modelBEList);
#	print "\n";
	print join("#",%model_BEs),"\n";
      }
      else {
	print "---empty text---\n";
      }
    }
    #------------------------------------------------
    # compute BE score
    &getBEScore(\%model_BEs,\%peer_BEs,\$BEHit,\$BEScore);
    # collect hit and count for each models
    # This will effectively clip hit for each model; therefore would not give extra
    # credit to reducdant information contained in the peer summary.
    if($scoreMode eq "A") {
      $totalBEHit+=$BEHit;
      $totalBECount+=$model_BEs{"_cn_"};
      $totalBECountP+=$peer_BEs{"_cn_"};
    }
    elsif($scoreMode eq "B") {
      if($BEScore>$BEScoreBest) {
	# only take a better score (i.e. better match)
	$BEScoreBest=$BEScore;
	$totalBEHit=$BEHit;
	$totalBECount=$model_BEs{"_cn_"};
	$totalBECountP=$peer_BEs{"_cn_"};
      }
    }
    else {
      # use average mode
      $totalBEHit+=$BEHit;
      $totalBECount+=$model_BEs{"_cn_"};
      $totalBECountP+=$peer_BEs{"_cn_"};
    }
    if($debug) {
      print "***M $modelPath\n";
    }
  }
  # prepare score result for return
  # uniBE
  push(@$results,$totalBECount); # total number of nbes in models
  push(@$results,$totalBEHit);
  if($totalBECount!=0) {
    $BEScore=sprintf("%7.5f",$totalBEHit/$totalBECount);
  }
  else {
    $BEScore=sprintf("%7.5f",0);
  }
  push(@$results,$BEScore);
  push(@$results,$totalBECountP); # total number of nBEs in peers
  if($totalBECountP!=0) {
    $BEScoreP=sprintf("%7.5f",$totalBEHit/$totalBECountP);
  }
  else {
    $BEScoreP=sprintf("%7.5f",0);
  }
  push(@$results,$BEScoreP);      # precision score
  if((1-$alpha)*$BEScoreP+$alpha*$BEScore>0) {
    $BEScoreF=sprintf("%7.5f",($BEScoreP*$BEScore)/((1-$alpha)*$BEScoreP+$alpha*$BEScore));
  }
  else {
    $BEScoreF=sprintf("%7.5f",0);
  }
  push(@$results,$BEScoreF);      # f1-measure score
  if($debug) {
    print "total BE-$BEMode model count: $totalBECount\n";
    print "total BE-$BEMode peer count: $totalBECountP\n";
    print "total BE-$BEMode hit: $totalBEHit\n";
    print "total ROUGE-BE-$BEMode\-R: $BEScore\n";
    print "total ROUGE-BE-$BEMode\-P: $BEScoreP\n";
    print "total ROUGE-BE-$BEMode\-F: $BEScoreF\n";
  }
}

sub readTextOld {
  my $inPath=shift;
  my $tokenizedText=shift;
  my $type=shift;
  my $lengthLimit=shift;
  my $byteLimit=shift;
  my ($text,$bsize,$wsize,@words,$done);
  
  $$tokenizedText=undef;
  $bsize=0;
  $wsize=0;
  $done=0;
  open(TEXT,$inPath)||die "Cannot open $inPath\n";
  if($type=~/^SEE$/oi) {
    while(defined($line=<TEXT>)) { # SEE abstract format
      if($line=~/^<a (size=\"[0-9]+\" )?name=\"[0-9]+\">\[([0-9]+)\]<\/a>\s+<a href=\"\#[0-9]+\" id=[0-9]+>([^<]+)/o) {
	$text=$3;
	$text=~tr/A-Z/a-z/;
	&checkSummarySize($tokenizedText,\$text,\$wsize,\$bsize,\$done,$lengthLimit,$byteLimit);
      }
    }
  }
  elsif($type=~/^ISI$/oi) { # ISI standard sentence by sentence format
    while(defined($line=<TEXT>)) {
      if($line=~/^<S SNTNO=\"[0-9a-z,]+\">([^<]+)<\/S>/o) {
	$text=$1;
	$text=~tr/A-Z/a-z/;
	&checkSummarySize($tokenizedText,\$text,\$wsize,\$bsize,\$done,$lengthLimit,$byteLimit);
      }
    }
  }
  elsif($type=~/^SPL$/oi) { # SPL one Sentence Per Line format
    while(defined($line=<TEXT>)) {
      chomp($line);
      $line=~s/^\s+//;
      $line=~s/\s+$//;
      if(defined($line)&&length($line)>0) {
	$text=$line;
	$text=~tr/A-Z/a-z/;
	&checkSummarySize($tokenizedText,\$text,\$wsize,\$bsize,\$done,$lengthLimit,$byteLimit);
      }
    }
  }
  else {
    close(TEXT);
    die "Unknown input format: $type\n";
  }
  close(TEXT);
  if(defined($$tokenizedText)) {
    $$tokenizedText=~s/\-/ \- /g;
    $$tokenizedText=~s/[^A-Za-z0-9\-]/ /g;
    $$tokenizedText=~s/^\s+//;
    $$tokenizedText=~s/\s+$//;
    $$tokenizedText=~s/\s+/ /g;
  }
  else {
    print STDERR "readText: $inPath -> empty text\n";
  }
  #    print "($$tokenizedText)\n\n";
}

# enforce length cutoff at the file level
# convert different input format into SPL format then put them into
# tokenizedText
sub readText {
  my $inPath=shift;
  my $tokenizedText=shift;
  my $type=shift;
  my $lengthLimit=shift;
  my $byteLimit=shift;
  my ($text,$bsize,$wsize,@words,$done,@sntList);
  
  $$tokenizedText=undef;
  $bsize=0;
  $wsize=0;
  $done=0;
  @sntList=();
  open(TEXT,$inPath)||die "Cannot open $inPath\n";
  if($type=~/^SEE$/oi) {
    while(defined($line=<TEXT>)) { # SEE abstract format
      if($line=~/^<a size=\"[0-9]+\" name=\"[0-9]+\">\[([0-9]+)\]<\/a>\s+<a href=\"\#[0-9]+\" id=[0-9]+>([^<]+)/o||
	 $line=~/^<a name=\"[0-9]+\">\[([0-9]+)\]<\/a>\s+<a href=\"\#[0-9]+\" id=[0-9]+>([^<]+)/o) {
	$text=$2;
	$text=~tr/A-Z/a-z/;
	push(@sntList,$text);
      }
    }
  }
  elsif($type=~/^ISI$/oi) { # ISI standard sentence by sentence format
    while(defined($line=<TEXT>)) {
      if($line=~/^<S SNTNO=\"[0-9a-z,]+\">([^<]+)<\/S>/o) {
	$text=$1;
	$text=~tr/A-Z/a-z/;
	push(@sntList,$text);
      }
    }
  }
  elsif($type=~/^SPL$/oi) { # SPL one Sentence Per Line format
    while(defined($line=<TEXT>)) {
      chomp($line);
      if(defined($line)&&length($line)>0) {
	$text=$line;
	$text=~tr/A-Z/a-z/;
	push(@sntList,$text);
      }
    }
  }
  else {
    close(TEXT);
    die "Unknown input format: $type\n";
  }
  close(TEXT);
  if($lengthLimit==0&&$byteLimit==0) {
    $$tokenizedText=join(" ",@sntList);
  }
  elsif($lengthLimit!=0) {
    my ($tmpText);
    $tmpText="";
    $tmpTextLen=0;
    foreach $s (@sntList) {
      my ($sLen,@tokens);
      @tokens=split(/\s+/,$s);
      $sLen=scalar @tokens;
      if($tmpTextLen+$sLen<$lengthLimit) {
	if($tmpTextLen!=0) {
	  $tmpText.=" $s";
	}
	else {
	  $tmpText.="$s";
	}
	$tmpTextLen+=$sLen;
      }
      else {
	if($tmpTextLen>0) {
	  $tmpText.=" ";
	}
	$tmpText.=join(" ",@tokens[0..$lengthLimit-$tmpTextLen-1]);
	last;
      }
    }
    if(length($tmpText)>0) {
      $$tokenizedText=$tmpText;
    }
  }
  elsif($byteLimit!=0) {
    my ($tmpText);
    $tmpText="";
    $tmpTextLen=0;
    foreach $s (@sntList) {
      my ($sLen);
      $sLen=length($s);
      if($tmpTextLen+$sLen<$byteLimit) {
	if($tmpTextLen!=0) {
	  $tmpText.=" $s";
	}
	else {
	  $tmpText.="$s";
	}
	$tmpTextLen+=$sLen;
      }
      else {
	if($tmpTextLen>0) {
	  $tmpText.=" ";
	}
	$tmpText.=substr($s,0,$byteLimit-$tmpTextLen);
	last;
      }
    }
    if(length($tmpText)>0) {
      $$tokenizedText=$tmpText;
    }
  }
  if(defined($$tokenizedText)) {
    $$tokenizedText=~s/\-/ \- /g;
    $$tokenizedText=~s/[^A-Za-z0-9\-]/ /g;
    $$tokenizedText=~s/^\s+//;
    $$tokenizedText=~s/\s+$//;
    $$tokenizedText=~s/\s+/ /g;
  }
  else {
    print STDERR "readText: $inPath -> empty text\n";
  }
  #    print "($$tokenizedText)\n\n";
}

sub readBE {
  my $inPath=shift;
  my $BEList=shift;
  my $type=shift;
  my ($line);
  
  open(TEXT,$inPath)||die "Cannot open $inPath\n";
  if(defined($opt_v)) {
    print STDERR "$inPath\n";
  }
  if($type=~/^SIMPLE$/oi) {
    while(defined($line=<TEXT>)) { # Simple BE triple format
      chomp($line);
      push(@{$BEList},$line);
    }
  }
  elsif($type=~/^ISI$/oi) { # ISI standard BE format
    while(defined($line=<TEXT>)) {
      # place holder
    }
  }
  else {
    close(TEXT);
    die "Unknown input format: $type\n";
  }
  close(TEXT);
  if(scalar @{$BEList} ==0) {
    print STDERR "readBE: $inPath -> empty text\n";
  }
}

sub checkSummarySize {
  my $tokenizedText=shift;
  my $text=shift;
  my $wsize=shift;
  my $bsize=shift;
  my $done=shift;
  my $lenghtLimit=shift;
  my $byteLimit=shift;
  my (@words);
  
  @words=split(/\s+/,$$text);
  if(($lengthLimit==0&&$byteLimit==0)||
     ($lengthLimit!=0&&(scalar @words)+$$wsize<=$lengthLimit)||
     ($byteLimit!=0&&length($$text)+$$bsize<=$byteLimit)) {
    if(defined($$tokenizedText)) {
      $$tokenizedText.=" $$text";
    }
    else {
      $$tokenizedText=$$text;
    }
    $$bsize+=length($$text);
    $$wsize+=(scalar @words);
  }
  elsif($lengthLimit!=0&&(scalar @words)+$$wsize>$lengthLimit) {
    if($$done==0) {
      if(defined($$tokenizedText)) {
	$$tokenizedText.=" ";
	$$tokenizedText.=join(" ",@words[0..$lengthLimit-$$wsize-1]);
      }
      else {
	$$tokenizedText=join(" ",@words[0..$lengthLimit-$$wsize-1]);
      }
      $$done=1;
    }
  }
  elsif($byteLimit!=0&&length($$text)+$$bsize>$byteLimit) {
    if($$done==0) {
      if(defined($$tokenizedText)) {
	$$tokenizedText.=" ";
	$$tokenizedText.=substr($$text,0,$byteLimit-$$bsize);
      }
      else {
	$$tokenizedText=substr($$text,0,$byteLimit-$$bsize);
	
      }
      $$done=1;
    }
  }
}

# LCS computing is based on unit and cannot lump all the text together
# as in computing ngram co-occurrences
sub readText_LCS {
  my $inPath=shift;
  my $tokenizedText=shift;
  my $type=shift;
  my $lengthLimit=shift;
  my $byteLimit=shift;
  my ($text,$t,$bsize,$wsize,$done,@sntList);
  
  @{$tokenizedText}=();
  $bsize=0;
  $wsize=0;
  $done=0;
  @sntList=();
  open(TEXT,$inPath)||die "Cannot open $inPath\n";
  if($type=~/^SEE$/oi) {
    while(defined($line=<TEXT>)) { # SEE abstract format
      if($line=~/^<a size=\"[0-9]+\" name=\"[0-9]+\">\[([0-9]+)\]<\/a>\s+<a href=\"\#[0-9]+\" id=[0-9]+>([^<]+)/o||
	 $line=~/^<a name=\"[0-9]+\">\[([0-9]+)\]<\/a>\s+<a href=\"\#[0-9]+\" id=[0-9]+>([^<]+)/o) {
	$text=$2;
	$text=~tr/A-Z/a-z/;
	push(@sntList,$text);
      }
    }
  }
  elsif($type=~/^ISI$/oi) { # ISI standard sentence by sentence format
    while(defined($line=<TEXT>)) {
      if($line=~/^<S SNTNO=\"[0-9a-z,]+\">([^<]+)<\/S>/o) {
	$text=$1;
	$text=~tr/A-Z/a-z/;
	push(@sntList,$text);
      }
    }
  }
  elsif($type=~/^SPL$/oi) { # SPL one Sentence Per Line format
    while(defined($line=<TEXT>)) {
      chomp($line);
      if(defined($line)&&length($line)>0) {
	$text=$line;
	$text=~tr/A-Z/a-z/;
	push(@sntList,$text);
      }
    }
  }
  else {
    close(TEXT);
    die "Unknown input format: $type\n";
  }
  close(TEXT);
  if($lengthLimit==0&&$byteLimit==0) {
    @{$tokenizedText}=@sntList;
  }
  elsif($lengthLimit!=0) {
    my ($tmpText);
    $tmpText="";
    $tmpTextLen=0;
    foreach $s (@sntList) {
      my ($sLen,@tokens);
      @tokens=split(/\s+/,$s);
      $sLen=scalar @tokens;
      if($tmpTextLen+$sLen<$lengthLimit) {
	$tmpTextLen+=$sLen;
	push(@{$tokenizedText},$s);
      }
      else {
	push(@{$tokenizedText},join(" ",@tokens[0..$lengthLimit-$tmpTextLen-1]));
	last;
      }
    }
  }
  elsif($byteLimit!=0) {
    my ($tmpText);
    $tmpText="";
    $tmpTextLen=0;
    foreach $s (@sntList) {
      my ($sLen);
      $sLen=length($s);
      if($tmpTextLen+$sLen<$byteLimit) {
	push(@{$tokenizedText},$s);
      }
      else {
	push(@{$tokenizedText},substr($s,0,$byteLimit-$tmpTextLen));
	last;
      }
    }
  }
  if(defined(@{$tokenizedText}>0)) {
    for($t=0;$t<@{$tokenizedText};$t++) {
      $tokenizedText->[$t]=~s/\-/ \- /g;
      $tokenizedText->[$t]=~s/[^A-Za-z0-9\-]/ /g;
      $tokenizedText->[$t]=~s/^\s+//;
      $tokenizedText->[$t]=~s/\s+$//;
      $tokenizedText->[$t]=~s/\s+/ /g;
    }
  }
  else {
    print STDERR "readText_LCS: $inPath -> empty text\n";
  }
}

# LCS computing is based on unit and cannot lump all the text together
# as in computing ngram co-occurrences
sub readText_LCS_old {
  my $inPath=shift;
  my $tokenizedText=shift;
  my $type=shift;
  my $lengthLimit=shift;
  my $byteLimit=shift;
  my ($text,$t,$bsize,$wsize,$done);
  
  @{$tokenizedText}=();
  $bsize=0;
  $wsize=0;
  $done=0;
  open(TEXT,$inPath)||die "Cannot open $inPath\n";
  if($type=~/^SEE$/oi) {
    while(defined($line=<TEXT>)) { # SEE abstract format
      if($line=~/^<a (size=\"[0-9]+\" )?name=\"[0-9]+\">\[([0-9]+)\]<\/a>\s+<a href=\"\#[0-9]+\" id=[0-9]+>([^<]+)/o) {
	$text=$3;
	$text=~tr/A-Z/a-z/;
	&checkSummarySize_LCS($tokenizedText,\$text,\$wsize,\$bsize,\$done,$lengthLimit,$byteLimit);
      }
    }
  }
  elsif($type=~/^ISI$/oi) { # ISI standard sentence by sentence format
    while(defined($line=<TEXT>)) {
      if($line=~/^<S SNTNO=\"[0-9a-z,]+\">([^<]+)<\/S>/o) {
	$text=$1;
	$text=~tr/A-Z/a-z/;
	&checkSummarySize_LCS($tokenizedText,\$text,\$wsize,\$bsize,\$done,$lengthLimit,$byteLimit);
      }
    }
  }
  elsif($type=~/^SPL$/oi) { # SPL one Sentence Per Line format
    while(defined($line=<TEXT>)) {
      chomp($line);
      $line=~s/^\s+//;
      $line=~s/\s+$//;
      if(defined($line)&&length($line)>0) {
	$text=$line;
	$text=~tr/A-Z/a-z/;
	&checkSummarySize_LCS($tokenizedText,\$text,\$wsize,\$bsize,\$done,$lengthLimit,$byteLimit);
      }
    }
  }
  else {
    close(TEXT);
    die "Unknown input format: $type\n";
  }
  close(TEXT);
  if(defined(@{$tokenizedText}>0)) {
    for($t=0;$t<@{$tokenizedText};$t++) {
      $tokenizedText->[$t]=~s/\-/ \- /g;
      $tokenizedText->[$t]=~s/[^A-Za-z0-9\-]/ /g;
      $tokenizedText->[$t]=~s/^\s+//;
      $tokenizedText->[$t]=~s/\s+$//;
      $tokenizedText->[$t]=~s/\s+/ /g;
    }
  }
  else {
    print STDERR "readText_LCS: $inPath -> empty text\n";
  }
}

sub checkSummarySize_LCS {
  my $tokenizedText=shift;
  my $text=shift;
  my $wsize=shift;
  my $bsize=shift;
  my $done=shift;
  my $lenghtLimit=shift;
  my $byteLimit=shift;
  my (@words);
  
  @words=split(/\s+/,$$text);
  if(($lengthLimit==0&&$byteLimit==0)||
     ($lengthLimit!=0&&(scalar @words)+$$wsize<=$lengthLimit)||
     ($byteLimit!=0&&length($$text)+$$bsize<=$byteLimit)) {
    push(@{$tokenizedText},$$text);
    $$bsize+=length($$text);
    $$wsize+=(scalar @words);
  }
  elsif($lengthLimit!=0&&(scalar @words)+$$wsize>$lengthLimit) {
    if($$done==0) {
      push(@{$tokenizedText},$$text);
      $$done=1;
    }
  }
  elsif($byteLimit!=0&&length($$text)+$$bsize>$byteLimit) {
    if($$done==0) {
      push(@{$tokenizedText},$$text);
      $$done=1;
    }
  }
}

sub ngramScore {
  my $model_grams=shift;
  my $peer_grams=shift;
  my $hit=shift;
  my $score=shift;
  my ($s,$t,@tokens);
  
  $$hit=0;
  @tokens=keys (%$model_grams);
  foreach $t (@tokens) {
    if($t ne "_cn_") {
      my $h;
      $h=0;
      if(exists($peer_grams->{$t})) {
	$h=$peer_grams->{$t}<=$model_grams->{$t}?
	  $peer_grams->{$t}:$model_grams->{$t}; # clip
	$$hit+=$h;
      }
    }
  }
  if($model_grams->{"_cn_"}!=0) {
    $$score=sprintf("%07.5f",$$hit/$model_grams->{"_cn_"});
  }
  else {
    # no instance of n-gram at this length
    $$score=0;
    #	die "model n-grams has zero instance\n";
  }
}

sub skipBigramScore {
  my $model_grams=shift;
  my $peer_grams=shift;
  my $hit=shift;
  my $score=shift;
  my ($s,$t,@tokens);
  
  $$hit=0;
  @tokens=keys (%$model_grams);
  foreach $t (@tokens) {
    if($t ne "_cn_") {
      my $h;
      $h=0;
      if(exists($peer_grams->{$t})) {
	$h=$peer_grams->{$t}<=$model_grams->{$t}?
	  $peer_grams->{$t}:$model_grams->{$t}; # clip
	$$hit+=$h;
      }
    }
  }
  if($model_grams->{"_cn_"}!=0) {
    $$score=sprintf("%07.5f",$$hit/$model_grams->{"_cn_"});
  }
  else {
    # no instance of n-gram at this length
    $$score=0;
    #	die "model n-grams has zero instance\n";
  }
}

sub lcs {
  my $model=shift;
  my $peer=shift;
  my $hit=shift;
  my $score=shift;
  my $base=shift;
  my $model_1grams=shift;
  my $peer_1grams=shift;
  my ($i,$j,@hitMask,@LCS);
  
  $$hit=0;
  $$base=0;
  # compute LCS length for each model/peer pair
  for($i=0;$i<@{$model};$i++) {
    # use @hitMask to make sure multiple peer hit won't be counted as multiple hits
    @hitMask=();
    for($j=0;$j<@{$model->[$i]};$j++) {
      push(@hitMask,0); # initialize hit mask
    }
    $$base+=scalar @{$model->[$i]}; # add model length
    for($j=0;$j<@{$peer};$j++) {
      &lcs_inner($model->[$i],$peer->[$j],\@hitMask);
    }
    @LCS=();
    for($j=0;$j<@{$model->[$i]};$j++) {
      if($hitMask[$j]==1) {
	if(exists($model_1grams->{$model->[$i][$j]})&&
	   exists($peer_1grams->{$model->[$i][$j]})&&
	   $model_1grams->{$model->[$i][$j]}>0&&
	   $peer_1grams->{$model->[$i][$j]}>0) {
	  $$hit++;
	  #---------------------------------------------
	  # bookkeeping to clip over counting
	  # everytime a hit is found it is deducted
	  # from both model and peer unigram count
	  # if a unigram count already involve in
	  # one LCS match then it will not be counted
	  # if it match another token in the model
	  # unit. This will make sure LCS score
	  # is always lower than unigram score
	  $model_1grams->{$model->[$i][$j]}--;
	  $peer_1grams->{$model->[$i][$j]}--;
	  push(@LCS,$model->[$i][$j]);
	}
      }
    }
    if($debug) {
      print "LCS: ";
      if(@LCS) {
	print join(" ",@LCS),"\n";
      }
      else {
	print "-\n";
      }
    }
  }
  if($$base>0) {
    $$score=$$hit/$$base;
  }
  else {
    $$score=0;
  }
}

sub lcs_inner {
  my $model=shift;
  my $peer=shift;
  my $hitMask=shift;
  my $m=scalar @$model; # length of model
  my $n=scalar @$peer; # length of peer
  my ($i,$j);
  my (@c,@b);
  
  if(@{$model}==0) {
    return;
  }
  @c=();
  @b=();
  # initialize boundary condition and
  # the DP array
  for($i=0;$i<=$m;$i++) {
    push(@c,[]);
    push(@b,[]);
    for($j=0;$j<=$n;$j++) {
      push(@{$c[$i]},0);
      push(@{$b[$i]},0);
    }
  }
  for($i=1;$i<=$m;$i++) {
    for($j=1;$j<=$n;$j++) {
      if($model->[$i-1] eq $peer->[$j-1]) {
	# recursively solve the i-1 subproblem
	$c[$i][$j]=$c[$i-1][$j-1]+1;
	$b[$i][$j]="\\"; # go diagonal
      }
      elsif($c[$i-1][$j]>=$c[$i][$j-1]) {
	$c[$i][$j]=$c[$i-1][$j];
	$b[$i][$j]="^"; # go up
      }
      else {
	$c[$i][$j]=$c[$i][$j-1];
	$b[$i][$j]="<"; # go left
      }
    }
  }
  &markLCS($hitMask,\@b,$m,$n);
}

sub wlcs {
  my $model=shift;
  my $peer=shift;
  my $hit=shift;
  my $score=shift;
  my $base=shift;
  my $weightFactor=shift;
  my $model_1grams=shift;
  my $peer_1grams=shift;
  my ($i,$j,@hitMask,@LCS,$hitLen);
  
  $$hit=0;
  $$base=0;
  # compute LCS length for each model/peer pair
  for($i=0;$i<@{$model};$i++) {
    # use @hitMask to make sure multiple peer hit won't be counted as multiple hits
    @hitMask=();
    for($j=0;$j<@{$model->[$i]};$j++) {
      push(@hitMask,0); # initialize hit mask
    }
    $$base+=&wlcsWeight(scalar @{$model->[$i]},$weightFactor); # add model length
    for($j=0;$j<@{$peer};$j++) {
      &wlcs_inner($model->[$i],$peer->[$j],\@hitMask,$weightFactor);
    }
    @LCS=();
    $hitLen=0;
    for($j=0;$j<@{$model->[$i]};$j++) {
      if($hitMask[$j]==1) {
	if(exists($model_1grams->{$model->[$i][$j]})&&
	   exists($peer_1grams->{$model->[$i][$j]})&&
	   $model_1grams->{$model->[$i][$j]}>0&&
	   $peer_1grams->{$model->[$i][$j]}>0) {
	  $hitLen++;
	  if($j+1<@{$model->[$i]}&&$hitMask[$j+1]==0) {
	    $$hit+=&wlcsWeight($hitLen,$weightFactor);
	    $hitLen=0; # reset hit length
	  }
	  elsif($j+1==@{$model->[$i]}) {
	    # end of sentence
	    $$hit+=&wlcsWeight($hitLen,$weightFactor);
	    $hitLen=0; # reset hit length
	  }
	  #---------------------------------------------
	  # bookkeeping to clip over counting
	  # everytime a hit is found it is deducted
	  # from both model and peer unigram count
	  # if a unigram count already involve in
	  # one LCS match then it will not be counted
	  # if it match another token in the model
	  # unit. This will make sure LCS score
	  # is always lower than unigram score
	  $model_1grams->{$model->[$i][$j]}--;
	  $peer_1grams->{$model->[$i][$j]}--;
	  push(@LCS,$model->[$i][$j]);
	}
      }
    }
    if($debug) {
      print "ROUGE-W: ";
      if(@LCS) {
	print join(" ",@LCS),"\n";
      }
      else {
	print "-\n";
      }
    }
  }
  $$score=wlcsWeightInverse($$hit/$$base,$weightFactor);
}

sub wlcsWeight {
  my $r=shift;
  my $power=shift;
  
  return $r**$power;
}

sub wlcsWeightInverse {
  my $r=shift;
  my $power=shift;
  
  return $r**(1/$power);
}

sub wlcs_inner {
  my $model=shift;
  my $peer=shift;
  my $hitMask=shift;
  my $weightFactor=shift;
  my $m=scalar @$model; # length of model
  my $n=scalar @$peer; # length of peer
  my ($i,$j);
  my (@c,@b,@l);
  
  if(@{$model}==0) {
    return;
  }
  @c=();
  @b=();
  @l=(); # the length of consecutive matches so far
  # initialize boundary condition and
  # the DP array
  for($i=0;$i<=$m;$i++) {
    push(@c,[]);
    push(@b,[]);
    push(@l,[]);
    for($j=0;$j<=$n;$j++) {
      push(@{$c[$i]},0);
      push(@{$b[$i]},0);
      push(@{$l[$i]},0);
    }
  }
  for($i=1;$i<=$m;$i++) {
    for($j=1;$j<=$n;$j++) {
      if($model->[$i-1] eq $peer->[$j-1]) {
	# recursively solve the i-1 subproblem
	$k=$l[$i-1][$j-1];
	$c[$i][$j]=$c[$i-1][$j-1]+&wlcsWeight($k+1,$weightFactor)-&wlcsWeight($k,$weightFactor);
	$b[$i][$j]="\\"; # go diagonal
	$l[$i][$j]=$k+1; # extend the consecutive matching sequence
      }
      elsif($c[$i-1][$j]>=$c[$i][$j-1]) {
	$c[$i][$j]=$c[$i-1][$j];
	$b[$i][$j]="^"; # go up
	$l[$i][$j]=0; # no match at this position
      }
      else {
	$c[$i][$j]=$c[$i][$j-1];
	$b[$i][$j]="<"; # go left
	$l[$i][$j]=0; # no match at this position
      }
    }
  }
  &markLCS($hitMask,\@b,$m,$n);
}

sub markLCS {
  my $hitMask=shift;
  my $b=shift;
  my $i=shift;
  my $j=shift;
  
  while($i!=0&&$j!=0) {
    if($b->[$i][$j] eq "\\") {
      $i--;
      $j--;
      $hitMask->[$i]=1; # mark current model position as a hit
    }
    elsif($b->[$i][$j] eq "^") {
      $i--;
    }
    elsif($b->[$i][$j] eq "<") {
      $j--;
    }
    else {
      die "Illegal move in markLCS: ($i,$j): \"$b->[$i][$j]\".\n";
    }
  }
}

# currently only support simple lexical matching
sub getBEScore {
  my $modelBEs=shift;
  my $peerBEs=shift;
  my $hit=shift;
  my $score=shift;
  my ($s,$t,@tokens);
  
  $$hit=0;
  @tokens=keys (%$modelBEs);
  foreach $t (@tokens) {
    if($t ne "_cn_") {
      my $h;
      $h=0;
      if(exists($peerBEs->{$t})) {
	$h=$peerBEs->{$t}<=$modelBEs->{$t}?
	  $peerBEs->{$t}:$modelBEs->{$t}; # clip
	$$hit+=$h;
	if(defined($opt_v)) {
	  print "* Match: $t\n";
	}
      }
    }
  }
  if($modelBEs->{"_cn_"}!=0) {
    $$score=sprintf("%07.5f",$$hit/$modelBEs->{"_cn_"});
  }
  else {
    # no instance of BE at this length
    $$score=0;
    #	die "model BE has zero instance\n";
  }
}

sub MorphStem {
  my $token=shift;
  my ($os,$ltoken);
  
  if(!defined($token)||length($token)==0) {
    return undef;
  }
  
  $ltoken=$token;
  $ltoken=~tr/A-Z/a-z/;
  if(exists($exceptiondb{$ltoken})) {
    return $exceptiondb{$ltoken};
  }
  $os=$ltoken;
  return stem($os);
}

sub createNGram {
  my $text=shift;
  my $g=shift;
  my $NSIZE=shift;
  my @mx_tokens=();
  my @m_tokens=();
  my ($i,$j);
  my ($gram);
  my ($count);
  my ($byteSize);
  
  # remove stopwords
  if($useStopwords) {
    %stopwords=(); # consider stop words
  }
  unless(defined($text)) {
    $g->{"_cn_"}=0;
    return;
  }
  @mx_tokens=split(/\s+/,$text);
  $byteSize=0;
  for($i=0;$i<=$#mx_tokens;$i++) {
    unless(exists($stopwords{$mx_tokens[$i]})) {
      $byteSize+=length($mx_tokens[$i])+1; # the length of words in bytes so far + 1 space 
      if($mx_tokens[$i]=~/^[a-z0-9\$]/o) {
	if(defined($opt_m)) {
	  # use stemmer
	  # only consider words starting with these characters
	  # use Porter stemmer
	  my $stem;
	  $stem=$mx_tokens[$i];
	  if(length($stem)>3) {
	    push(@m_tokens,&MorphStem($stem));
	  }
	  else { # no stemmer as default
	    push(@m_tokens,$mx_tokens[$i]);
	  }
	}
	else { # no stemmer
	  push(@m_tokens,$mx_tokens[$i]);
	}
      }
    }
  }
  #-------------------------------------
  # create ngram
  $count=0;
  for($i=0;$i<=$#m_tokens-$NSIZE+1;$i++) {
    $gram=$m_tokens[$i];
    for($j=$i+1;$j<=$i+$NSIZE-1;$j++) {
      $gram.=" $m_tokens[$j]";
    }
    $count++;
    unless(exists($g->{$gram})) {
      $g->{$gram}=1;
    }
    else {
      $g->{$gram}++;
    }
  }
  # save total number of tokens
  $g->{"_cn_"}=$count;
}

sub createSkipBigram {
  my $text=shift;
  my $g=shift;
  my $skipDistance=shift;
  my @mx_tokens=();
  my @m_tokens=();
  my ($i,$j);
  my ($gram);
  my ($count);
  my ($byteSize);
  
  # remove stopwords
  if($useStopwords) {
    %stopwords=(); # consider stop words
  }
  unless(defined($text)) {
    $g->{"_cn_"}=0;
    return;
  }
  @mx_tokens=split(/\s+/,$text);
  $byteSize=0;
  for($i=0;$i<=$#mx_tokens;$i++) {
    unless(exists($stopwords{$mx_tokens[$i]})) {
      $byteSize+=length($mx_tokens[$i])+1; # the length of words in bytes so far + 1 space 
      if($mx_tokens[$i]=~/^[a-z0-9\$]/o) {
	if(defined($opt_m)) {
	  # use stemmer
	  # only consider words starting with these characters
	  # use Porter stemmer
	  my $stem;
	  $stem=$mx_tokens[$i];
	  if(length($stem)>3) {
	    push(@m_tokens,&MorphStem($stem));
	  }
	  else { # no stemmer as default
	    push(@m_tokens,$mx_tokens[$i]);
	  }
	}
	else { # no stemmer
	  push(@m_tokens,$mx_tokens[$i]);
	}
      }
    }
  }
  #-------------------------------------
  # create ngram
  $count=0;
  for($i=0;$i<$#m_tokens;$i++) {
    if(defined($opt_u)) {
      # add unigram count
      $gram=$m_tokens[$i];
      $count++;
      unless(exists($g->{$gram})) {
	$g->{$gram}=1;
      }
      else {
	$g->{$gram}++;
      }
    }
    for($j=$i+1;
	$j<=$#m_tokens&&($skipDistance<0||$j<=$i+$skipDistance+1);
	$j++) {
      $gram=$m_tokens[$i];
      $gram.=" $m_tokens[$j]";
      $count++;
      unless(exists($g->{$gram})) {
	$g->{$gram}=1;
      }
      else {
	$g->{$gram}++;
      }
    }
  }
  # save total number of tokens
  $g->{"_cn_"}=$count;
}

sub createBE {
  my $BEList=shift;
  my $BEMap=shift;
  my $BEMode=shift;
  my ($i);
  
  $BEMap->{"_cn_"}=0;
  unless(scalar @{$BEList} > 0) {
    return;
  }
  for($i=0;$i<=$#{$BEList};$i++) {
    my (@fds);
    my ($be,$stemH,$stemM);
    $be=$BEList->[$i];
    $be=~tr/A-Z/a-z/;
    @fds=split(/\|/,$be);
    if(@fds!=3) {
      print STDERR "Basic Element (BE) input file is invalid: *$be*\n";
      print STDERR "A BE file has to be in this format per line: HEAD|MODIFIER|RELATION\n";
      die "For more infomation about BE, go to: http://www.isi.edu/~cyl/BE\n";
    }
    $stemH=$fds[0];
    $stemM=$fds[1];
    if(defined($opt_m)) {
      # use stemmer
      # only consider words starting with these characters
      # use Porter stemmer
      if(length($stemH)>3) {
	$stemH=&MorphStemMulti($stemH);
      }
      if($stemM ne "NIL"&&
	 length($stemM)>3) {
	$stemM=&MorphStemMulti($stemM);
      }
    }
    if($BEMode eq "H"&&
      $stemM eq "nil") {
      unless(exists($BEMap->{$stemH})) {
	$BEMap->{$stemH}=0;
      }
      $BEMap->{$stemH}++;
      $BEMap->{"_cn_"}++;
    }
    elsif($BEMode eq "HM"&&
	  $stemM ne "nil") {
      my $pair="$stemH|$stemM";
      unless(exists($BEMap->{$pair})) {
	$BEMap->{$pair}=0;
      }
      $BEMap->{$pair}++;
      $BEMap->{"_cn_"}++;
    }
    elsif($BEMode eq "HMR"&&
	  $fds[2] ne "nil") {
      my $triple="$stemH|$stemM|$fds[2]";
      unless(exists($BEMap->{$triple})) {
	$BEMap->{$triple}=0;
      }
      $BEMap->{$triple}++;
      $BEMap->{"_cn_"}++;
    }
    elsif($BEMode eq "HM1") {
      my $pair="$stemH|$stemM";
      unless(exists($BEMap->{$pair})) {
	$BEMap->{$pair}=0;
      }
      $BEMap->{$pair}++;
      $BEMap->{"_cn_"}++;
    }
    elsif($BEMode eq "HMR1"&&
	  $fds[1] ne "nil") { 
      # relation can be "NIL" but modifier has to have value
      my $triple="$stemH|$stemM|$fds[2]";
      unless(exists($BEMap->{$triple})) {
	$BEMap->{$triple}=0;
      }
      $BEMap->{$triple}++;
      $BEMap->{"_cn_"}++;
    }
    elsif($BEMode eq "HMR2") {
      # modifier and relation can be "NIL"
      my $triple="$stemH|$stemM|$fds[2]";
      unless(exists($BEMap->{$triple})) {
	$BEMap->{$triple}=0;
      }
      $BEMap->{$triple}++;
      $BEMap->{"_cn_"}++;
    }
  }
}

sub MorphStemMulti {
  my $string=shift;
  my (@tokens,@stems,$t,$i);
  
  @tokens=split(/\s+/,$string);
  foreach $t (@tokens) {
    if($t=~/[A-Za-z0-9]/o&&
       $t!~/(-LRB-|-RRB-|-LSB-|-RSB-|-LCB-|-RCB-)/o) {
      my $s;
      if(defined($s=&MorphStem($t))) {
	$t=$s;
      }
      push(@stems,$t);
    }
    else {
      push(@stems,$t);
    }
  }
  return join(" ",@stems);
}

sub tokenizeText {
  my $text=shift;
  my $tokenizedText=shift;
  my @mx_tokens=();
  my ($i,$byteSize);
  
  # remove stopwords
  if($useStopwords) {
    %stopwords=(); # consider stop words
  }
  unless(defined($text)) {
    return;
  }
  @mx_tokens=split(/\s+/,$text);
  $byteSize=0;
  @{$tokenizedText}=();
  for($i=0;$i<=$#mx_tokens;$i++) {
    unless(exists($stopwords{$mx_tokens[$i]})) {
      $byteSize+=length($mx_tokens[$i])+1; # the length of words in bytes so far + 1 space 
      if($mx_tokens[$i]=~/^[a-z0-9\$]/o) {
	if(defined($opt_m)) {
	  # use stemmer
	  # only consider words starting with these characters
	  # use Porter stemmer
	  my $stem;
	  $stem=$mx_tokens[$i];
	  if(length($stem)>3) {
	    push(@{$tokenizedText},&MorphStem($stem));
	  }
	  else { # no stemmer as default
	    push(@{$tokenizedText},$mx_tokens[$i]);
	  }
	}
	else { # no stemmer
	  push(@{$tokenizedText},$mx_tokens[$i]);
	}
      }
    }
  }
}

sub tokenizeText_LCS {
  my $text=shift;
  my $tokenizedText=shift;
  my $lengthLimit=shift;
  my $byteLimit=shift;
  my @mx_tokens=();
  my ($i,$byteSize,$t,$done);
  
  # remove stopwords
  if($useStopwords) {
    %stopwords=(); # consider stop words
  }
  if(@{$text}==0) {
    return;
  }
  $byteSize=0;
  @{$tokenizedText}=();
  $done=0;
  for($t=0;$t<@{$text}&&$done==0;$t++) {
    @mx_tokens=split(/\s+/,$text->[$t]);
    # tokenized array for each separate unit (for example, sentence)
    push(@{$tokenizedText},[]);
    for($i=0;$i<=$#mx_tokens;$i++) {
      unless(exists($stopwords{$mx_tokens[$i]})) {
	$byteSize+=length($mx_tokens[$i])+1; # the length of words in bytes so far + 1 space 
	if($mx_tokens[$i]=~/^[a-z0-9\$]/o) {
	  if(defined($opt_m)) {
	    # use stemmer
	    # only consider words starting with these characters
	    # use Porter stemmer
	    my $stem;
	    $stem=$mx_tokens[$i];
	    if(length($stem)>3) {
	      push(@{$tokenizedText->[$t]},&MorphStem($stem));
	    }
	    else { # no stemmer as default
	      push(@{$tokenizedText->[$t]},$mx_tokens[$i]);
	    }
	  }
	  else { # no stemmer
	    push(@{$tokenizedText->[$t]},$mx_tokens[$i]);
	  }
	}
      }
    }
  }
}

# Input file configuration is a list of peer/model pair for each evaluation
# instance. Each evaluation pair is in a line separated by white spaces
# characters.
sub readFileList {
  my ($ROUGEEvals)=shift;
  my ($ROUGEEvalIDs)=shift;
  my ($ROUGEPeerIDTable)=shift;
  my ($doc)=shift;
  my ($evalID,$pair);
  my ($inputFormat,$peerFile,$modelFile,$peerID,$modelID);
  my (@files);

  $evalID=1;  # automatically generated evaluation ID starting from 1
  $peerID=$systemID;
  $modelID="M";
  unless(exists($ROUGEPeerIDTable->{$peerID})) {
    $ROUGEPeerIDTable->{$peerID}=1;
  }
  while(defined($pair=<$doc>)) {
    my ($peerPath,$modelPath);
    if($pair!~/^\#/o&&
       $pair!~/^\s*$/o) { # Lines start with '#' is a comment line
      chomp($pair);
      $pair=~s/^\s+//;
      $pair=~s/\s+$//;
      @files=split(/\s+/,$pair);
      if(scalar @files < 2) {
	die "File list has to have at least 2 filenames per line (peer model1 model2 ... modelN)\n";
      }
      $peerFile=$files[0];
      unless(exists($ROUGEEvals->{$evalID})) {
	$ROUGEEvals->{$evalID}={};
	push(@{$ROUGEEvalIDs},$evalID);
	$ROUGEEvals->{$evalID}{"IF"}=$opt_z;
      }
      unless(exists($ROUGEPeerIDTable->{$peerID})) {
	$ROUGEPeerIDTable->{$peerID}=1; # save peer ID for reference
      }
      if(exists($ROUGEEvals->{$evalID})) {
	unless(exists($ROUGEEvals->{$evalID}{"Ps"})) {
	  $ROUGEEvals->{$evalID}{"Ps"}={};
	  $ROUGEEvals->{$evalID}{"PIDList"}=[];
	}
	push(@{$ROUGEEvals->{$evalID}{"PIDList"}},$peerID); # save peer IDs
      }
      else {
	die "(PEERS) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
      }
      # remove leading and trailing newlines and
      # spaces
      if(exists($ROUGEEvals->{$evalID}{"Ps"})) {
	$ROUGEEvals->{$evalID}{"Ps"}{$peerID}=$peerFile; # save peer filename
      }
      else {
	die "(P) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
      }
      for($mid=1;$mid<=$#files;$mid++) {
	$modelFile=$files[$mid];
	if(exists($ROUGEEvals->{$evalID})) {
	  unless(exists($ROUGEEvals->{$evalID}{"Ms"})) {
	    $ROUGEEvals->{$evalID}{"Ms"}={};
	    $ROUGEEvals->{$evalID}{"MIDList"}=[];
	  }
	  push(@{$ROUGEEvals->{$evalID}{"MIDList"}},"$modelID.$mid"); # save model IDs
	}
	else {
	  die "(MODELS) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
	}
	# remove leading and trailing newlines and
	# spaces
	if(exists($ROUGEEvals->{$evalID}{"Ms"})) {
	  $ROUGEEvals->{$evalID}{"Ms"}{"$modelID.$mid"}=$modelFile; # save peer filename
	}
	else {
	  die "(M) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
	}
      }
      $evalID++;
    }
  }
}

# read and parse ROUGE evaluation file
sub readEvals {
  my ($ROUGEEvals)=shift;
  my ($ROUGEEvalIDs)=shift;
  my ($ROUGEPeerIDTable)=shift;
  my ($node)=shift;
  my ($evalID)=shift;
  my ($inputFormat,$peerRoot,$modelRoot,$peerFile,$modelFile,$peerID,$modelID);
  
  if(defined($opt_z)) {
    # Input file configuration is a list of peer/model pair for each evaluation
    # instance. Each evaluation pair is in a line separated by white spaces
    # characters.
    &readFileList($ROUGEEvals,$ROUGEEvalIDs,$ROUGEPeerIDTable,$node);
    return;
  }
  # Otherwise, the input file is the standard ROUGE XML evaluation configuration
  # file.
  if($node->getNodeType==ELEMENT_NODE||
     $node->getNodeType==DOCUMENT_NODE) {
    if($node->getNodeType==ELEMENT_NODE) {
      $nodeName=$node->getNodeName;
      if($nodeName=~/^EVAL$/oi) {
	$evalID=$node->getAttributeNode("ID")->getValue;
	unless(exists($ROUGEEvals->{$evalID})) {
	  $ROUGEEvals->{$evalID}={};
	  push(@{$ROUGEEvalIDs},$evalID);
	}
	foreach my $child ($node->getChildNodes()) {
	  &readEvals($ROUGEEvals,$ROUGEEvalIDs,$ROUGEPeerIDTable,$child,$evalID);
	}
      }
      elsif($nodeName=~/^INPUT-FORMAT$/oi) {
	$inputFormat=$node->getAttributeNode("TYPE")->getValue;
	if($inputFormat=~/^(SEE|ISI|SPL|SIMPLE)$/oi) { # SPL: one sentence per line
	  if(exists($ROUGEEvals->{$evalID})) {
	    $ROUGEEvals->{$evalID}{"IF"}=$inputFormat;
	  }
	  else {
	    die "(INPUT-FORMAT) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
	  }
	}
	else {
	  die "Unknown input type: $inputFormat\n";
	}
      }
      elsif($nodeName=~/^PEER-ROOT$/oi) {
	foreach my $child ($node->getChildNodes()) {
	  if($child->getNodeType==TEXT_NODE) {
	    $peerRoot=$child->getData;
	    # remove leading and trailing newlines and
	    # spaces
	    $peerRoot=~s/^[\n\s]+//;
	    $peerRoot=~s/[\n\s]+$//;
	    if(exists($ROUGEEvals->{$evalID})) {
	      $ROUGEEvals->{$evalID}{"PR"}=$peerRoot;
	    }
	    else {
	      die "(PEER-ROOT) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
	    }
	  }
	}
      }
      elsif($nodeName=~/^MODEL-ROOT$/oi) {
	foreach my $child ($node->getChildNodes()) {
	  if($child->getNodeType==TEXT_NODE) {
	    $modelRoot=$child->getData;
	    # remove leading and trailing newlines and
	    # spaces
	    $modelRoot=~s/^[\n\s]+//;
	    $modelRoot=~s/[\n\s]+$//;
	    if(exists($ROUGEEvals->{$evalID})) {
	      $ROUGEEvals->{$evalID}{"MR"}=$modelRoot;
	    }
	    else {
	      die "(MODEL-ROOT) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
	    }
	  }
	}
      }
      elsif($nodeName=~/^PEERS$/oi) {
	foreach my $child ($node->getChildNodes()) {
	  if($child->getNodeType==ELEMENT_NODE&&
	     $child->getNodeName=~/^P$/oi) {
	    $peerID=$child->getAttributeNode("ID")->getValue;
	    unless(exists($ROUGEPeerIDTable->{$peerID})) {
	      $ROUGEPeerIDTable->{$peerID}=1; # save peer ID for reference
	    }
	    if(exists($ROUGEEvals->{$evalID})) {
	      unless(exists($ROUGEEvals->{$evalID}{"Ps"})) {
		$ROUGEEvals->{$evalID}{"Ps"}={};
		$ROUGEEvals->{$evalID}{"PIDList"}=[];
	      }
	      push(@{$ROUGEEvals->{$evalID}{"PIDList"}},$peerID); # save peer IDs
	    }
	    else {
	      die "(PEERS) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
	    }
	    foreach my $grandchild ($child->getChildNodes()) {
	      if($grandchild->getNodeType==TEXT_NODE) {
		$peerFile=$grandchild->getData;
		# remove leading and trailing newlines and
		# spaces
		$peerFile=~s/^[\n\s]+//;
		$peerFile=~s/[\n\s]+$//;
		if(exists($ROUGEEvals->{$evalID}{"Ps"})) {
		  $ROUGEEvals->{$evalID}{"Ps"}{$peerID}=$peerFile; # save peer filename
		}
		else {
		  die "(P) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
		}
	      }
	    }
	  }
	}
      }
      elsif($nodeName=~/^MODELS$/oi) {
	foreach my $child ($node->getChildNodes()) {
	  if($child->getNodeType==ELEMENT_NODE&&
	     $child->getNodeName=~/^M$/oi) {
	    $modelID=$child->getAttributeNode("ID")->getValue;
	    if(exists($ROUGEEvals->{$evalID})) {
	      unless(exists($ROUGEEvals->{$evalID}{"Ms"})) {
		$ROUGEEvals->{$evalID}{"Ms"}={};
		$ROUGEEvals->{$evalID}{"MIDList"}=[];
	      }
	      push(@{$ROUGEEvals->{$evalID}{"MIDList"}},$modelID); # save model IDs
	    }
	    else {
	      die "(MODELS) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
	    }
	    foreach my $grandchild ($child->getChildNodes()) {
	      if($grandchild->getNodeType==TEXT_NODE) {
		$modelFile=$grandchild->getData;
		# remove leading and trailing newlines and
		# spaces
		$modelFile=~s/^[\n\s]+//;
		$modelFile=~s/[\n\s]+$//;
		if(exists($ROUGEEvals->{$evalID}{"Ms"})) {
		  $ROUGEEvals->{$evalID}{"Ms"}{$modelID}=$modelFile; # save peer filename
		}
		else {
		  die "(M) Evaluation database does not contain entry for this evaluation ID: $evalID\n";
		}
	      }
	    }
	  }
	}
      }
      else {
	foreach my $child ($node->getChildNodes()) {
	  &readEvals($ROUGEEvals,$ROUGEEvalIDs,$ROUGEPeerIDTable,$child,$evalID);
	}
      }
    }
    else {
      foreach my $child ($node->getChildNodes()) {
	&readEvals($ROUGEEvals,$ROUGEEvalIDs,$ROUGEPeerIDTable,$child,$evalID);
      }
    }
  }
  else {
    if(defined($node->getChildNodes())) {
      foreach my $child ($node->getChildNodes()) {
	&readEvals($ROUGEEvals,$ROUGEEvalIDs,$ROUGEPeerIDTable,$child,$evalID);
      }
    }
  }
}

# Porter stemmer in Perl. Few comments, but it's easy to follow against the rules in the original
# paper, in
#
#   Porter, 1980, An algorithm for suffix stripping, Program, Vol. 14,
#   no. 3, pp 130-137,
#
# see also http://www.tartarus.org/~martin/PorterStemmer

# Release 1

local %step2list;
local %step3list;
local ($c, $v, $C, $V, $mgr0, $meq1, $mgr1, $_v);


sub stem
  {  my ($stem, $suffix, $firstch);
     my $w = shift;
     if (length($w) < 3) { return $w; } # length at least 3
     # now map initial y to Y so that the patterns never treat it as vowel:
     $w =~ /^./; $firstch = $&;
     if ($firstch =~ /^y/) { $w = ucfirst $w; }
     
     # Step 1a
     if ($w =~ /(ss|i)es$/) { $w=$`.$1; }
     elsif ($w =~ /([^s])s$/) { $w=$`.$1; }
     # Step 1b
     if ($w =~ /eed$/) { if ($` =~ /$mgr0/o) { chop($w); } }
     elsif ($w =~ /(ed|ing)$/)
       {  $stem = $`;
	  if ($stem =~ /$_v/o)
	    {  $w = $stem;
	       if ($w =~ /(at|bl|iz)$/) { $w .= "e"; }
	       elsif ($w =~ /([^aeiouylsz])\1$/) { chop($w); }
	       elsif ($w =~ /^${C}${v}[^aeiouwxy]$/o) { $w .= "e"; }
   }
}
# Step 1c
  if ($w =~ /y$/) { $stem = $`; if ($stem =~ /$_v/o) { $w = $stem."i"; } }

# Step 2
if ($w =~ /(ational|tional|enci|anci|izer|bli|alli|entli|eli|ousli|ization|ation|ator|alism|iveness|fulness|ousness|aliti|iviti|biliti|logi)$/)
  { $stem = $`; $suffix = $1;
    if ($stem =~ /$mgr0/o) { $w = $stem . $step2list{$suffix}; }
  }

# Step 3

if ($w =~ /(icate|ative|alize|iciti|ical|ful|ness)$/)
  { $stem = $`; $suffix = $1;
    if ($stem =~ /$mgr0/o) { $w = $stem . $step3list{$suffix}; }
  }

# Step 4

   # CYL: Modified 02/14/2004, a word ended in -ement will not try the rules "-ment" and "-ent"
#   if ($w =~ /(al|ance|ence|er|ic|able|ible|ant|ement|ment|ent|ou|ism|ate|iti|ous|ive|ize)$/)
#   elsif ($w =~ /(s|t)(ion)$/)
#   { $stem = $` . $1; if ($stem =~ /$mgr1/o) { $w = $stem; } }
   if ($w =~ /(al|ance|ence|er|ic|able|ible|ant|ement|ou|ism|ate|iti|ous|ive|ize)$/)
   { $stem = $`; if ($stem =~ /$mgr1/o) { $w = $stem; } }
   if ($w =~ /ment$/)
   { $stem = $`; if ($stem =~ /$mgr1/o) { $w = $stem; } }
   if ($w =~ /ent$/)
   { $stem = $`; if ($stem =~ /$mgr1/o) { $w = $stem; } }
   elsif ($w =~ /(s|t)(ion)$/)
   { $stem = $` . $1; if ($stem =~ /$mgr1/o) { $w = $stem; } }

#  Step 5

if ($w =~ /e$/)
  { $stem = $`;
    if ($stem =~ /$mgr1/o or
	($stem =~ /$meq1/o and not $stem =~ /^${C}${v}[^aeiouwxy]$/o))
{ $w = $stem; }
}
if ($w =~ /ll$/ and $w =~ /$mgr1/o) { chop($w); }

# and turn initial Y back to y
if ($firstch =~ /^y/) { $w = lcfirst $w; }
return $w;
}

  sub initialise {
    
    %step2list =
      ( 'ational'=>'ate', 'tional'=>'tion', 'enci'=>'ence', 'anci'=>'ance', 'izer'=>'ize', 'bli'=>'ble',
	'alli'=>'al', 'entli'=>'ent', 'eli'=>'e', 'ousli'=>'ous', 'ization'=>'ize', 'ation'=>'ate',
	'ator'=>'ate', 'alism'=>'al', 'iveness'=>'ive', 'fulness'=>'ful', 'ousness'=>'ous', 'aliti'=>'al',
	'iviti'=>'ive', 'biliti'=>'ble', 'logi'=>'log');
    
    %step3list =
      ('icate'=>'ic', 'ative'=>'', 'alize'=>'al', 'iciti'=>'ic', 'ical'=>'ic', 'ful'=>'', 'ness'=>'');
    
    
    $c =    "[^aeiou]";          # consonant
    $v =    "[aeiouy]";          # vowel
    $C =    "${c}[^aeiouy]*";    # consonant sequence
    $V =    "${v}[aeiou]*";      # vowel sequence
    
    $mgr0 = "^(${C})?${V}${C}";               # [C]VC... is m>0
    $meq1 = "^(${C})?${V}${C}(${V})?" . '$';  # [C]VC[V] is m=1
   $mgr1 = "^(${C})?${V}${C}${V}${C}";       # [C]VCVC... is m>1
   $_v   = "^(${C})?${v}";                   # vowel in stem

}



================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/AttDef.pod
================================================
=head1 NAME

XML::DOM::AttDef - A single XML attribute definition in an ATTLIST in XML::DOM 

=head1 DESCRIPTION

XML::DOM::AttDef extends L<XML::DOM::Node>, but is not part of the DOM Level 1
specification.

Each object of this class represents one attribute definition in an AttlistDecl.

=head2 METHODS

=over 4

=item getName

Returns the attribute name.

=item getDefault

Returns the default value, or undef.

=item isFixed

Whether the attribute value is fixed (see #FIXED keyword.)

=item isRequired

Whether the attribute value is required (see #REQUIRED keyword.)

=item isImplied

Whether the attribute value is implied (see #IMPLIED keyword.)

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/AttlistDecl.pod
================================================
=head1 NAME

XML::DOM::AttlistDecl - An XML ATTLIST declaration in XML::DOM

=head1 DESCRIPTION

XML::DOM::AttlistDecl extends L<XML::DOM::Node> but is not part of the 
DOM Level 1 specification.

This node represents an ATTLIST declaration, e.g.

 <!ATTLIST person
   sex      (male|female)  #REQUIRED
   hair     CDATA          "bold"
   eyes     (none|one|two) "two"
   species  (human)        #FIXED "human"> 

Each attribute definition is stored a separate AttDef node. The AttDef nodes can
be retrieved with getAttDef and added with addAttDef.
(The AttDef nodes are stored in a NamedNodeMap internally.)

=head2 METHODS

=over 4

=item getName

Returns the Element tagName.

=item getAttDef (attrName)

Returns the AttDef node for the attribute with the specified name.

=item addAttDef (attrName, type, default, [ fixed ])

Adds a AttDef node for the attribute with the specified name.

Parameters:
 I<attrName> the attribute name.
 I<type>     the attribute type (e.g. "CDATA" or "(male|female)".)
 I<default>  the default value enclosed in quotes (!), the string #IMPLIED or 
             the string #REQUIRED.
 I<fixed>    whether the attribute is '#FIXED' (default is 0.)

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/Attr.pod
================================================
=head1 NAME

XML::DOM::Attr - An XML attribute in XML::DOM

=head1 DESCRIPTION

XML::DOM::Attr extends L<XML::DOM::Node>.

The Attr nodes built by the XML::DOM::Parser always have one child node
which is a Text node containing the expanded string value (i.e. EntityReferences
are always expanded.) EntityReferences may be added when modifying or creating
a new Document.

The Attr interface represents an attribute in an Element object.
Typically the allowable values for the attribute are defined in a
document type definition.

Attr objects inherit the Node interface, but since they are not
actually child nodes of the element they describe, the DOM does not
consider them part of the document tree. Thus, the Node attributes
parentNode, previousSibling, and nextSibling have a undef value for Attr
objects. The DOM takes the view that attributes are properties of
elements rather than having a separate identity from the elements they
are associated with; this should make it more efficient to implement
such features as default attributes associated with all elements of a
given type. Furthermore, Attr nodes may not be immediate children of a
DocumentFragment. However, they can be associated with Element nodes
contained within a DocumentFragment. In short, users and implementors
of the DOM need to be aware that Attr nodes have some things in common
with other objects inheriting the Node interface, but they also are
quite distinct.

The attribute's effective value is determined as follows: if this
attribute has been explicitly assigned any value, that value is the
attribute's effective value; otherwise, if there is a declaration for
this attribute, and that declaration includes a default value, then
that default value is the attribute's effective value; otherwise, the
attribute does not exist on this element in the structure model until
it has been explicitly added. Note that the nodeValue attribute on the
Attr instance can also be used to retrieve the string version of the
attribute's value(s).

In XML, where the value of an attribute can contain entity references,
the child nodes of the Attr node provide a representation in which
entity references are not expanded. These child nodes may be either
Text or EntityReference nodes. Because the attribute type may be
unknown, there are no tokenized attribute values.

=head2 METHODS

=over 4

=item getValue

On retrieval, the value of the attribute is returned as a string. 
Character and general entity references are replaced with their values.

=item setValue (str)

DOM Spec: On setting, this creates a Text node with the unparsed contents of the 
string.

=item getName

Returns the name of this attribute.

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/CDATASection.pod
================================================
=head1 NAME

XML::DOM::CDATASection - Escaping XML text blocks in XML::DOM

=head1 DESCRIPTION

XML::DOM::CDATASection extends L<XML::DOM::CharacterData> which extends
L<XML::DOM::Node>.

CDATA sections are used to escape blocks of text containing characters
that would otherwise be regarded as markup. The only delimiter that is
recognized in a CDATA section is the "]]>" string that ends the CDATA
section. CDATA sections can not be nested. The primary purpose is for
including material such as XML fragments, without needing to escape all
the delimiters.

The DOMString attribute of the Text node holds the text that is
contained by the CDATA section. Note that this may contain characters
that need to be escaped outside of CDATA sections and that, depending
on the character encoding ("charset") chosen for serialization, it may
be impossible to write out some characters as part of a CDATA section.

The CDATASection interface inherits the CharacterData interface through
the Text interface. Adjacent CDATASections nodes are not merged by use
of the Element.normalize() method.

B<NOTE:> XML::DOM::Parser and XML::DOM::ValParser convert all CDATASections 
to regular text by default.
To preserve CDATASections, set the parser option KeepCDATA to 1.




================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/CharacterData.pod
================================================
=head1 NAME

XML::DOM::CharacterData - Common interface for Text, CDATASections and Comments

=head1 DESCRIPTION

XML::DOM::CharacterData extends L<XML::DOM::Node>

The CharacterData interface extends Node with a set of attributes and
methods for accessing character data in the DOM. For clarity this set
is defined here rather than on each object that uses these attributes
and methods. No DOM objects correspond directly to CharacterData,
though Text, Comment and CDATASection do inherit the interface from it. 
All offsets in this interface start from 0.

=head2 METHODS

=over 4

=item getData and setData (data)

The character data of the node that implements this
interface. The DOM implementation may not put arbitrary
limits on the amount of data that may be stored in a
CharacterData node. However, implementation limits may mean
that the entirety of a node's data may not fit into a single
DOMString. In such cases, the user may call substringData to
retrieve the data in appropriately sized pieces.

=item getLength

The number of characters that are available through data and
the substringData method below. This may have the value zero,
i.e., CharacterData nodes may be empty.

=item substringData (offset, count)

Extracts a range of data from the node.

Parameters:
 I<offset>  Start offset of substring to extract.
 I<count>   The number of characters to extract.

Return Value: The specified substring. If the sum of offset and count
exceeds the length, then all characters to the end of
the data are returned.

=item appendData (str)

Appends the string to the end of the character data of the
node. Upon success, data provides access to the concatenation
of data and the DOMString specified.

=item insertData (offset, arg)

Inserts a string at the specified character offset.

Parameters:
 I<offset>  The character offset at which to insert.
 I<arg>     The DOMString to insert.

=item deleteData (offset, count)

Removes a range of characters from the node. 
Upon success, data and length reflect the change.
If the sum of offset and count exceeds length then all characters 
from offset to the end of the data are deleted.

Parameters: 
 I<offset>  The offset from which to remove characters. 
 I<count>   The number of characters to delete. 

=item replaceData (offset, count, arg)

Replaces the characters starting at the specified character
offset with the specified string.

Parameters:
 I<offset>  The offset from which to start replacing.
 I<count>   The number of characters to replace. 
 I<arg>     The DOMString with which the range must be replaced.

If the sum of offset and count exceeds length, then all characters to the end of
the data are replaced (i.e., the effect is the same as a remove method call with 
the same range, followed by an append method invocation).

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/Comment.pod
================================================
=head1 NAME

XML::DOM::Comment - An XML comment in XML::DOM

=head1 DESCRIPTION

XML::DOM::Comment extends L<XML::DOM::CharacterData> which extends 
L<XML::DOM::Node>.

This node represents the content of a comment, i.e., all the characters
between the starting '<!--' and ending '-->'. Note that this is the
definition of a comment in XML, and, in practice, HTML, although some
HTML tools may implement the full SGML comment structure.



================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/DOMException.pm
================================================
######################################################################
package XML::DOM::DOMException;
######################################################################

use Exporter;

use overload '""' => \&stringify;
use vars qw ( @ISA @EXPORT @ErrorNames );

BEGIN
{
  @ISA = qw( Exporter );
  @EXPORT = qw( INDEX_SIZE_ERR
		DOMSTRING_SIZE_ERR
		HIERARCHY_REQUEST_ERR
		WRONG_DOCUMENT_ERR
		INVALID_CHARACTER_ERR
		NO_DATA_ALLOWED_ERR
		NO_MODIFICATION_ALLOWED_ERR
		NOT_FOUND_ERR
		NOT_SUPPORTED_ERR
		INUSE_ATTRIBUTE_ERR
	      );
}

sub UNKNOWN_ERR			() {0;}	# not in the DOM Spec!
sub INDEX_SIZE_ERR		() {1;}
sub DOMSTRING_SIZE_ERR		() {2;}
sub HIERARCHY_REQUEST_ERR	() {3;}
sub WRONG_DOCUMENT_ERR		() {4;}
sub INVALID_CHARACTER_ERR	() {5;}
sub NO_DATA_ALLOWED_ERR		() {6;}
sub NO_MODIFICATION_ALLOWED_ERR	() {7;}
sub NOT_FOUND_ERR		() {8;}
sub NOT_SUPPORTED_ERR		() {9;}
sub INUSE_ATTRIBUTE_ERR		() {10;}

@ErrorNames = (
	       "UNKNOWN_ERR",
	       "INDEX_SIZE_ERR",
	       "DOMSTRING_SIZE_ERR",
	       "HIERARCHY_REQUEST_ERR",
	       "WRONG_DOCUMENT_ERR",
	       "INVALID_CHARACTER_ERR",
	       "NO_DATA_ALLOWED_ERR",
	       "NO_MODIFICATION_ALLOWED_ERR",
	       "NOT_FOUND_ERR",
	       "NOT_SUPPORTED_ERR",
	       "INUSE_ATTRIBUTE_ERR"
	      );
sub new
{
    my ($type, $code, $msg) = @_;
    my $self = bless {Code => $code}, $type;

    $self->{Message} = $msg if defined $msg;

#    print "=> Exception: " . $self->stringify . "\n"; 
    $self;
}

sub getCode
{
    $_[0]->{Code};
}

#------------------------------------------------------------
# Extra method implementations

sub getName
{
    $ErrorNames[$_[0]->{Code}];
}

sub getMessage
{
    $_[0]->{Message};
}

sub stringify
{
    my $self = shift;

    "XML::DOM::DOMException(Code=" . $self->getCode . ", Name=" .
	$self->getName . ", Message=" . $self->getMessage . ")";
}

1; # package return code


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/DOMImplementation.pod
================================================
=head1 NAME

XML::DOM::DOMImplementation - Information about XML::DOM implementation

=head1 DESCRIPTION

The DOMImplementation interface provides a number of methods for
performing operations that are independent of any particular instance
of the document object model.

The DOM Level 1 does not specify a way of creating a document instance,
and hence document creation is an operation specific to an
implementation. Future Levels of the DOM specification are expected to
provide methods for creating documents directly.

=head2 METHODS

=over 4

=item hasFeature (feature, version)

Returns 1 if and only if feature equals "XML" and version equals "1.0".

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/Document.pod
================================================
=head1 NAME

XML::DOM::Document - An XML document node in XML::DOM

=head1 DESCRIPTION

XML::DOM::Document extends L<XML::DOM::Node>.

It is the main root of the XML document structure as returned by 
XML::DOM::Parser::parse and XML::DOM::Parser::parsefile.

Since elements, text nodes, comments, processing instructions, etc.
cannot exist outside the context of a Document, the Document interface
also contains the factory methods needed to create these objects. The
Node objects created have a getOwnerDocument method which associates
them with the Document within whose context they were created.

=head2 METHODS

=over 4

=item getDocumentElement

This is a convenience method that allows direct access to
the child node that is the root Element of the document.

=item getDoctype

The Document Type Declaration (see DocumentType) associated
with this document. For HTML documents as well as XML
documents without a document type declaration this returns
undef. The DOM Level 1 does not support editing the Document
Type Declaration.

B<Not In DOM Spec>: This implementation allows editing the doctype. 
See I<XML::DOM::ignoreReadOnly> for details.

=item getImplementation

The DOMImplementation object that handles this document. A
DOM application may use objects from multiple implementations.

=item createElement (tagName)

Creates an element of the type specified. Note that the
instance returned implements the Element interface, so
attributes can be specified directly on the returned object.

DOMExceptions:

=over 4

=item * INVALID_CHARACTER_ERR

Raised if the tagName does not conform to the XML spec.

=back

=item createTextNode (data)

Creates a Text node given the specified string.

=item createComment (data)

Creates a Comment node given the specified string.

=item createCDATASection (data)

Creates a CDATASection node given the specified string.

=item createAttribute (name [, value [, specified ]])

Creates an Attr of the given name. Note that the Attr
instance can then be set on an Element using the setAttribute method.

B<Not In DOM Spec>: The DOM Spec does not allow passing the value or the 
specified property in this method. In this implementation they are optional.

Parameters:
 I<value>     The attribute's value. See Attr::setValue for details.
              If the value is not supplied, the specified property is set to 0.
 I<specified> Whether the attribute value was specified or whether the default
              value was used. If not supplied, it's assumed to be 1.

DOMExceptions:

=over 4

=item * INVALID_CHARACTER_ERR

Raised if the name does not conform to the XML spec.

=back

=item createProcessingInstruction (target, data)

Creates a ProcessingInstruction node given the specified name and data strings.

Parameters:
 I<target>  The target part of the processing instruction.
 I<data>    The data for the node.

DOMExceptions:

=over 4

=item * INVALID_CHARACTER_ERR

Raised if the target does not conform to the XML spec.

=back

=item createDocumentFragment

Creates an empty DocumentFragment object.

=item createEntityReference (name)

Creates an EntityReference object.

=back

=head2 Additional methods not in the DOM Spec

=over 4

=item getXMLDecl and setXMLDecl (xmlDecl)

Returns the XMLDecl for this Document or undef if none was specified.
Note that XMLDecl is not part of the list of child nodes.

=item setDoctype (doctype)

Sets or replaces the DocumentType. 
B<NOTE>: Don't use appendChild or insertBefore to set the DocumentType.
Even though doctype will be part of the list of child nodes, it is handled
specially.

=item getDefaultAttrValue (elem, attr)

Returns the default attribute value as a string or undef, if none is available.

Parameters:
 I<elem>    The element tagName.
 I<attr>    The attribute name.

=item getEntity (name)

Returns the Entity with the specified name.

=item createXMLDecl (version, encoding, standalone)

Creates an XMLDecl object. All parameters may be undefined.

=item createDocumentType (name, sysId, pubId)

Creates a DocumentType object. SysId and pubId may be undefined.

=item createNotation (name, base, sysId, pubId)

Creates a new Notation object. Consider using 
XML::DOM::DocumentType::addNotation!

=item createEntity (parameter, notationName, value, sysId, pubId, ndata)

Creates an Entity object. Consider using XML::DOM::DocumentType::addEntity!

=item createElementDecl (name, model)

Creates an ElementDecl object.

DOMExceptions:

=over 4

=item * INVALID_CHARACTER_ERR

Raised if the element name (tagName) does not conform to the XML spec.

=back

=item createAttlistDecl (name)

Creates an AttlistDecl object.

DOMExceptions:

=over 4

=item * INVALID_CHARACTER_ERR

Raised if the element name (tagName) does not conform to the XML spec.

=back

=item expandEntity (entity [, parameter])

Expands the specified entity or parameter entity (if parameter=1) and returns
its value as a string, or undef if the entity does not exist.
(The entity name should not contain the '%', '&' or ';' delimiters.)

=item check ( [$checker] )

Uses the specified L<XML::Checker> to validate the document.
If no XML::Checker is supplied, a new XML::Checker is created.
See L<XML::Checker> for details.

=item check_sax ( [$checker] )

Similar to check() except it uses the SAX interface to XML::Checker instead of 
the expat interface. This method may disappear or replace check() at some time.

=item createChecker ()

Creates an XML::Checker based on the document's DTD.
The $checker can be reused to check any elements within the document.
Create a new L<XML::Checker> whenever the DOCTYPE section of the document 
is altered!

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/DocumentFragment.pod
================================================
=head1 NAME

XML::DOM::DocumentFragment - Facilitates cut & paste in XML::DOM documents

=head1 DESCRIPTION

XML::DOM::DocumentFragment extends L<XML::DOM::Node>

DocumentFragment is a "lightweight" or "minimal" Document object. It is
very common to want to be able to extract a portion of a document's
tree or to create a new fragment of a document. Imagine implementing a
user command like cut or rearranging a document by moving fragments
around. It is desirable to have an object which can hold such fragments
and it is quite natural to use a Node for this purpose. While it is
true that a Document object could fulfil this role, a Document object
can potentially be a heavyweight object, depending on the underlying
implementation. What is really needed for this is a very lightweight
object. DocumentFragment is such an object.

Furthermore, various operations -- such as inserting nodes as children
of another Node -- may take DocumentFragment objects as arguments; this
results in all the child nodes of the DocumentFragment being moved to
the child list of this node.

The children of a DocumentFragment node are zero or more nodes
representing the tops of any sub-trees defining the structure of the
document. DocumentFragment nodes do not need to be well-formed XML
documents (although they do need to follow the rules imposed upon
well-formed XML parsed entities, which can have multiple top nodes).
For example, a DocumentFragment might have only one child and that
child node could be a Text node. Such a structure model represents
neither an HTML document nor a well-formed XML document.

When a DocumentFragment is inserted into a Document (or indeed any
other Node that may take children) the children of the DocumentFragment
and not the DocumentFragment itself are inserted into the Node. This
makes the DocumentFragment very useful when the user wishes to create
nodes that are siblings; the DocumentFragment acts as the parent of
these nodes so that the user can use the standard methods from the Node
interface, such as insertBefore() and appendChild().


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/DocumentType.pod
================================================
=head1 NAME

XML::DOM::DocumentType - An XML document type (DTD) in XML::DOM

=head1 DESCRIPTION

XML::DOM::DocumentType extends L<XML::DOM::Node>.

Each Document has a doctype attribute whose value is either null or a
DocumentType object. The DocumentType interface in the DOM Level 1 Core
provides an interface to the list of entities that are defined for the
document, and little else because the effect of namespaces and the
various XML scheme efforts on DTD representation are not clearly
understood as of this writing. 
The DOM Level 1 doesn't support editing DocumentType nodes.

B<Not In DOM Spec>: This implementation has added a lot of extra 
functionality to the DOM Level 1 interface. 
To allow editing of the DocumentType nodes, see XML::DOM::ignoreReadOnly.

=head2 METHODS

=over 4

=item getName

Returns the name of the DTD, i.e. the name immediately following the
DOCTYPE keyword.

=item getEntities

A NamedNodeMap containing the general entities, both external
and internal, declared in the DTD. Duplicates are discarded.
For example in:

 <!DOCTYPE ex SYSTEM "ex.dtd" [
  <!ENTITY foo "foo">
  <!ENTITY bar "bar">
  <!ENTITY % baz "baz">
 ]>
 <ex/>

the interface provides access to foo and bar but not baz.
Every node in this map also implements the Entity interface.

The DOM Level 1 does not support editing entities, therefore
entities cannot be altered in any way.

B<Not In DOM Spec>: See XML::DOM::ignoreReadOnly to edit the DocumentType etc.

=item getNotations

A NamedNodeMap containing the notations declared in the DTD.
Duplicates are discarded. Every node in this map also
implements the Notation interface.

The DOM Level 1 does not support editing notations, therefore
notations cannot be altered in any way.

B<Not In DOM Spec>: See XML::DOM::ignoreReadOnly to edit the DocumentType etc.

=head2 Additional methods not in the DOM Spec

=item Creating and setting the DocumentType

A new DocumentType can be created with:

	$doctype = $doc->createDocumentType ($name, $sysId, $pubId, $internal);

To set (or replace) the DocumentType for a particular document, use:

	$doc->setDocType ($doctype);

=item getSysId and setSysId (sysId)

Returns or sets the system id.

=item getPubId and setPubId (pudId)

Returns or sets the public id.

=item setName (name)

Sets the name of the DTD, i.e. the name immediately following the
DOCTYPE keyword. Note that this should always be the same as the element
tag name of the root element.

=item getAttlistDecl (elemName)

Returns the AttlistDecl for the Element with the specified name, or undef.

=item getElementDecl (elemName)

Returns the ElementDecl for the Element with the specified name, or undef.

=item getEntity (entityName)

Returns the Entity with the specified name, or undef.

=item addAttlistDecl (elemName)

Adds a new AttDecl node with the specified elemName if one doesn't exist yet.
Returns the AttlistDecl (new or existing) node.

=item addElementDecl (elemName, model)

Adds a new ElementDecl node with the specified elemName and model if one doesn't 
exist yet.
Returns the AttlistDecl (new or existing) node. The model is ignored if one
already existed.

=item addEntity (notationName, value, sysId, pubId, ndata, parameter)

Adds a new Entity node. Don't use createEntity and appendChild, because it should
be added to the internal NamedNodeMap containing the entities.

Parameters:
 I<notationName> the entity name.
 I<value>        the entity value.
 I<sysId>        the system id (if any.)
 I<pubId>        the public id (if any.)
 I<ndata>        the NDATA declaration (if any, for general unparsed entities.)
 I<parameter>	 whether it is a parameter entity (%ent;) or not (&ent;).

SysId, pubId and ndata may be undefined.

DOMExceptions:

=over 4

=item * INVALID_CHARACTER_ERR

Raised if the notationName does not conform to the XML spec.

=back

=item addNotation (name, base, sysId, pubId)

Adds a new Notation object. 

Parameters:
 I<name>   the notation name.
 I<base>   the base to be used for resolving a relative URI.
 I<sysId>  the system id.
 I<pubId>  the public id.

Base, sysId, and pubId may all be undefined.
(These parameters are passed by the XML::Parser Notation handler.)

DOMExceptions:

=over 4

=item * INVALID_CHARACTER_ERR

Raised if the notationName does not conform to the XML spec.

=back

=item addAttDef (elemName, attrName, type, default, fixed)

Adds a new attribute definition. It will add the AttDef node to the AttlistDecl
if it exists. If an AttDef with the specified attrName already exists for the
given elemName, this function only generates a warning.

See XML::DOM::AttDef::new for the other parameters.

=item getDefaultAttrValue (elem, attr)

Returns the default attribute value as a string or undef, if none is available.

Parameters:
 I<elem>    The element tagName.
 I<attr>    The attribute name.

=item expandEntity (entity [, parameter])

Expands the specified entity or parameter entity (if parameter=1) and returns
its value as a string, or undef if the entity does not exist.
(The entity name should not contain the '%', '&' or ';' delimiters.)

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/Element.pod
================================================
=head1 NAME

XML::DOM::Element - An XML element node in XML::DOM

=head1 DESCRIPTION

XML::DOM::Element extends L<XML::DOM::Node>.

By far the vast majority of objects (apart from text) that authors
encounter when traversing a document are Element nodes. Assume the
following XML document:

     <elementExample id="demo">
       <subelement1/>
       <subelement2><subsubelement/></subelement2>
     </elementExample>

When represented using DOM, the top node is an Element node for
"elementExample", which contains two child Element nodes, one for
"subelement1" and one for "subelement2". "subelement1" contains no
child nodes.

Elements may have attributes associated with them; since the Element
interface inherits from Node, the generic Node interface method
getAttributes may be used to retrieve the set of all attributes for an
element. There are methods on the Element interface to retrieve either
an Attr object by name or an attribute value by name. In XML, where an
attribute value may contain entity references, an Attr object should be
retrieved to examine the possibly fairly complex sub-tree representing
the attribute value. On the other hand, in HTML, where all attributes
have simple string values, methods to directly access an attribute
value can safely be used as a convenience.

=head2 METHODS

=over 4

=item getTagName

The name of the element. For example, in:

               <elementExample id="demo">
                       ...
               </elementExample>

tagName has the value "elementExample". Note that this is
case-preserving in XML, as are all of the operations of the
DOM.

=item getAttribute (name)

Retrieves an attribute value by name.

Return Value: The Attr value as a string, or the empty string if that
attribute does not have a specified or default value.

=item setAttribute (name, value)

Adds a new attribute. If an attribute with that name is
already present in the element, its value is changed to be
that of the value parameter. This value is a simple string,
it is not parsed as it is being set. So any markup (such as
syntax to be recognized as an entity reference) is treated as
literal text, and needs to be appropriately escaped by the
implementation when it is written out. In order to assign an
attribute value that contains entity references, the user
must create an Attr node plus any Text and EntityReference
nodes, build the appropriate subtree, and use
setAttributeNode to assign it as the value of an attribute.


DOMExceptions:

=over 4

=item * INVALID_CHARACTER_ERR

Raised if the specified name contains an invalid character.

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=back

=item removeAttribute (name)

Removes an attribute by name. If the removed attribute has a
default value it is immediately replaced.

DOMExceptions:

=over 4

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=back

=item getAttributeNode

Retrieves an Attr node by name.

Return Value: The Attr node with the specified attribute name or undef
if there is no such attribute.

=item setAttributeNode (attr)

Adds a new attribute. If an attribute with that name is
already present in the element, it is replaced by the new one.

Return Value: If the newAttr attribute replaces an existing attribute
with the same name, the previously existing Attr node is
returned, otherwise undef is returned.

DOMExceptions:

=over 4

=item * WRONG_DOCUMENT_ERR

Raised if newAttr was created from a different document than the one that created
the element.

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=item * INUSE_ATTRIBUTE_ERR

Raised if newAttr is already an attribute of another Element object. The DOM
user must explicitly clone Attr nodes to re-use them in other elements.

=back

=item removeAttributeNode (oldAttr)

Removes the specified attribute. If the removed Attr has a default value it is
immediately replaced. If the Attr already is the default value, nothing happens
and nothing is returned.

Parameters:
 I<oldAttr>  The Attr node to remove from the attribute list. 

Return Value: The Attr node that was removed.

DOMExceptions:

=over 4

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=item * NOT_FOUND_ERR

Raised if oldAttr is not an attribute of the element.

=back

=head2 Additional methods not in the DOM Spec

=over 4

=item setTagName (newTagName)

Sets the tag name of the Element. Note that this method is not portable
between DOM implementations.

DOMExceptions:

=over 4

=item * INVALID_CHARACTER_ERR

Raised if the specified name contains an invalid character.

=back

=item check ($checker)

Uses the specified L<XML::Checker> to validate the document.
NOTE: an XML::Checker must be supplied. The checker can be created in
different ways, e.g. when parsing a document with XML::DOM::ValParser,
or with XML::DOM::Document::createChecker().
See L<XML::Checker> for more info.

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/ElementDecl.pod
================================================
=head1 NAME

XML::DOM::ElementDecl - An XML ELEMENT declaration in XML::DOM

=head1 DESCRIPTION

XML::DOM::ElementDecl extends L<XML::DOM::Node> but is not part of the 
DOM Level 1 specification.

This node represents an Element declaration, e.g.

 <!ELEMENT address (street+, city, state, zip, country?)>

=head2 METHODS

=over 4

=item getName

Returns the Element tagName.

=item getModel and setModel (model)

Returns and sets the model as a string, e.g. 
"(street+, city, state, zip, country?)" in the above example.

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/Entity.pod
================================================
=head1 NAME

XML::DOM::Entity - An XML ENTITY in XML::DOM

=head1 DESCRIPTION

XML::DOM::Entity extends L<XML::DOM::Node>.

This node represents an Entity declaration, e.g.

 <!ENTITY % draft 'INCLUDE'>

 <!ENTITY hatch-pic SYSTEM "../grafix/OpenHatch.gif" NDATA gif>

The first one is called a parameter entity and is referenced like this: %draft;
The 2nd is a (regular) entity and is referenced like this: &hatch-pic;

=head2 METHODS

=over 4

=item getNotationName

Returns the name of the notation for the entity.

I<Not Implemented> The DOM Spec says: For unparsed entities, the name of the 
notation for the entity. For parsed entities, this is null.
(This implementation does not support unparsed entities.)

=item getSysId

Returns the system id, or undef.

=item getPubId

Returns the public id, or undef.

=back

=head2 Additional methods not in the DOM Spec

=over 4

=item isParameterEntity

Whether it is a parameter entity (%ent;) or not (&ent;)

=item getValue

Returns the entity value.

=item getNdata

Returns the NDATA declaration (for general unparsed entities), or undef.

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/EntityReference.pod
================================================
=head1 NAME

XML::DOM::EntityReference - An XML ENTITY reference in XML::DOM

=head1 DESCRIPTION

XML::DOM::EntityReference extends L<XML::DOM::Node>.

EntityReference objects may be inserted into the structure model when
an entity reference is in the source document, or when the user wishes
to insert an entity reference. Note that character references and
references to predefined entities are considered to be expanded by the
HTML or XML processor so that characters are represented by their
Unicode equivalent rather than by an entity reference. Moreover, the
XML processor may completely expand references to entities while
building the structure model, instead of providing EntityReference
objects. If it does provide such objects, then for a given
EntityReference node, it may be that there is no Entity node
representing the referenced entity; but if such an Entity exists, then
the child list of the EntityReference node is the same as that of the
Entity node. As with the Entity node, all descendants of the
EntityReference are readonly.

The resolution of the children of the EntityReference (the replacement
value of the referenced Entity) may be lazily evaluated; actions by the
user (such as calling the childNodes method on the EntityReference
node) are assumed to trigger the evaluation.


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/NamedNodeMap.pm
================================================
######################################################################
package XML::DOM::NamedNodeMap;
######################################################################

use strict;

use Carp;
use XML::DOM::DOMException;
use XML::DOM::NodeList;

use vars qw( $Special );

# Constant definition:
# Note: a real Name should have at least 1 char, so nobody else should use this
$Special = "";

sub new 
{
    my ($class, %args) = @_;

    $args{Values} = new XML::DOM::NodeList;

    # Store all NamedNodeMap properties in element $Special
    bless { $Special => \%args}, $class;
}

sub getNamedItem 
{
    # Don't return the $Special item!
    ($_[1] eq $Special) ? undef : $_[0]->{$_[1]};
}

sub setNamedItem 
{
    my ($self, $node) = @_;
    my $prop = $self->{$Special};

    my $name = $node->getNodeName;

    if ($XML::DOM::SafeMode)
    {
	croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR)
	    if $self->isReadOnly;

	croak new XML::DOM::DOMException (WRONG_DOCUMENT_ERR)
	    if $node->[XML::DOM::Node::_Doc] != $prop->{Doc};

	croak new XML::DOM::DOMException (INUSE_ATTRIBUTE_ERR)
	    if defined ($node->[XML::DOM::Node::_UsedIn]);

	croak new XML::DOM::DOMException (INVALID_CHARACTER_ERR,
		      "can't add name with NodeName [$name] to NamedNodeMap")
	    if $name eq $Special;
    }

    my $values = $prop->{Values};
    my $index = -1;

    my $prev = $self->{$name};
    if (defined $prev)
    {
	# decouple previous node
	$prev->decoupleUsedIn;

	# find index of $prev
	$index = 0;
	for my $val (@{$values})
	{
	    last if ($val == $prev);
	    $index++;
	}
    }

    $self->{$name} = $node;    
    $node->[XML::DOM::Node::_UsedIn] = $self;

    if ($index == -1)
    {
	push (@{$values}, $node);
    }
    else	# replace previous node with new node
    {
	splice (@{$values}, $index, 1, $node);
    }
    
    $prev;
}

sub removeNamedItem 
{
    my ($self, $name) = @_;

    # Be careful that user doesn't delete $Special node!
    croak new XML::DOM::DOMException (NOT_FOUND_ERR)
        if $name eq $Special;

    my $node = $self->{$name};

    croak new XML::DOM::DOMException (NOT_FOUND_ERR)
        unless defined $node;

    # The DOM Spec doesn't mention this Exception - I think it's an oversight
    croak new XML::DOM::DOMException (NO_MODIFICATION_ALLOWED_ERR)
	if $self->isReadOnly;

    $node->decoupleUsedIn;
    delete $self->{$name};

    # remove node from Values list
    my $values = $self->getValues;
    my $index = 0;
    for my $val (@{$values})
    {
	if ($val == $node)
	{
	    splice (@{$values}, $index, 1, ());
	    last;
	}
	$index++;
    }
    $node;
}

# The following 2 are really bogus. DOM should use an iterator instead (Clark)

sub item 
{
    my ($self, $item) = @_;
    $self->{$Special}->{Values}->[$item];
}

sub getLength 
{
    my ($self) = @_;
    my $vals = $self->{$Special}->{Values};
    int (@$vals);
}

#------------------------------------------------------------
# Extra method implementations

sub isReadOnly
{
    return 0 if $XML::DOM::IgnoreReadOnly;

    my $used = $_[0]->{$Special}->{UsedIn};
    defined $used ? $used->isReadOnly : 0;
}

sub cloneNode
{
    my ($self, $deep) = @_;
    my $prop = $self->{$Special};

    my $map = new XML::DOM::NamedNodeMap (Doc => $prop->{Doc});
    # Not copying Parent property on purpose! 

    local $XML::DOM::IgnoreReadOnly = 1;	# temporarily...

    for my $val (@{$prop->{Values}})
    {
	my $key = $val->getNodeName;

	my $newNode = $val->cloneNode ($deep);
	$newNode->[XML::DOM::Node::_UsedIn] = $map;
	$map->{$key} = $newNode;
	push (@{$map->{$Special}->{Values}}, $newNode);
    }

    $map;
}

sub setOwnerDocument
{
    my ($self, $doc) = @_;
    my $special = $self->{$Special};

    $special->{Doc} = $doc;
    for my $kid (@{$special->{Values}})
    {
	$kid->setOwnerDocument ($doc);
    }
}

sub getChildIndex
{
    my ($self, $attr) = @_;
    my $i = 0;
    for my $kid (@{$self->{$Special}->{Values}})
    {
	return $i if $kid == $attr;
	$i++;
    }
    -1;	# not found
}

sub getValues
{
    wantarray ? @{ $_[0]->{$Special}->{Values} } : $_[0]->{$Special}->{Values};
}

# Remove circular dependencies. The NamedNodeMap and its values should
# not be used afterwards.
sub dispose
{
    my $self = shift;

    for my $kid (@{$self->getValues})
    {
	undef $kid->[XML::DOM::Node::_UsedIn]; # was delete
	$kid->dispose;
    }

    delete $self->{$Special}->{Doc};
    delete $self->{$Special}->{Parent};
    delete $self->{$Special}->{Values};

    for my $key (keys %$self)
    {
	delete $self->{$key};
    }
}

sub setParentNode
{
    $_[0]->{$Special}->{Parent} = $_[1];
}

sub getProperty
{
    $_[0]->{$Special}->{$_[1]};
}

#?? remove after debugging
sub toString
{
    my ($self) = @_;
    my $str = "NamedNodeMap[";
    while (my ($key, $val) = each %$self)
    {
	if ($key eq $Special)
	{
	    $str .= "##Special (";
	    while (my ($k, $v) = each %$val)
	    {
		if ($k eq "Values")
		{
		    $str .= $k . " => [";
		    for my $a (@$v)
		    {
#			$str .= $a->getNodeName . "=" . $a . ",";
			$str .= $a->toString . ",";
		    }
		    $str .= "], ";
		}
		else
		{
		    $str .= $k . " => " . $v . ", ";
		}
	    }
	    $str .= "), ";
	}
	else
	{
	    $str .= $key . " => " . $val . ", ";
	}
    }
    $str . "]";
}

1; # package return code


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/NamedNodeMap.pod
================================================
=head1 NAME

XML::DOM::NamedNodeMap - A hash table interface for XML::DOM

=head1 DESCRIPTION

Objects implementing the NamedNodeMap interface are used to represent
collections of nodes that can be accessed by name. Note that
NamedNodeMap does not inherit from NodeList; NamedNodeMaps are not
maintained in any particular order. Objects contained in an object
implementing NamedNodeMap may also be accessed by an ordinal index, but
this is simply to allow convenient enumeration of the contents of a
NamedNodeMap, and does not imply that the DOM specifies an order to
these Nodes.

Note that in this implementation, the objects added to a NamedNodeMap
are kept in order.

=head2 METHODS

=over 4

=item getNamedItem (name)

Retrieves a node specified by name.

Return Value: A Node (of any type) with the specified name, or undef if
the specified name did not identify any node in the map.

=item setNamedItem (arg)

Adds a node using its nodeName attribute.

As the nodeName attribute is used to derive the name which
the node must be stored under, multiple nodes of certain
types (those that have a "special" string value) cannot be
stored as the names would clash. This is seen as preferable
to allowing nodes to be aliased.

Parameters:
 I<arg>  A node to store in a named node map. 

The node will later be accessible using the value of the nodeName
attribute of the node. If a node with that name is
already present in the map, it is replaced by the new one.

Return Value: If the new Node replaces an existing node with the same
name the previously existing Node is returned, otherwise undef is returned.

DOMExceptions:

=over 4

=item * WRONG_DOCUMENT_ERR

Raised if arg was created from a different document than the one that 
created the NamedNodeMap.

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this NamedNodeMap is readonly.

=item * INUSE_ATTRIBUTE_ERR

Raised if arg is an Attr that is already an attribute of another Element object.
The DOM user must explicitly clone Attr nodes to re-use them in other elements.

=back

=item removeNamedItem (name)

Removes a node specified by name. If the removed node is an
Attr with a default value it is immediately replaced.

Return Value: The node removed from the map or undef if no node with
such a name exists.

DOMException:

=over 4

=item * NOT_FOUND_ERR

Raised if there is no node named name in the map.

=back

=item item (index)

Returns the indexth item in the map. If index is greater than
or equal to the number of nodes in the map, this returns undef.

Return Value: The node at the indexth position in the NamedNodeMap, or
undef if that is not a valid index.

=item getLength

Returns the number of nodes in the map. The range of valid child node
indices is 0 to length-1 inclusive.

=back

=head2 Additional methods not in the DOM Spec

=over 4

=item getValues

Returns a NodeList with the nodes contained in the NamedNodeMap.
The NodeList is "live", in that it reflects changes made to the NamedNodeMap.

When this method is called in a list context, it returns a regular perl list
containing the values. Note that this list is not "live". E.g.

 @list = $map->getValues;	 # returns a perl list
 $nodelist = $map->getValues;    # returns a NodeList (object ref.)
 for my $val ($map->getValues)   # iterate over the values

=item getChildIndex (node)

Returns the index of the node in the NodeList as returned by getValues, or -1
if the node is not in the NamedNodeMap.

=item dispose

Removes all circular references in this NamedNodeMap and its descendants so the 
objects can be claimed for garbage collection. The objects should not be used
afterwards.

=back


================================================
FILE: files2rouge/RELEASE-1.5.5/XML/DOM/Node.pod
================================================
=head1 NAME

XML::DOM::Node - Super class of all nodes in XML::DOM

=head1 DESCRIPTION

XML::DOM::Node is the super class of all nodes in an XML::DOM document.
This means that all nodes that subclass XML::DOM::Node also inherit all
the methods that XML::DOM::Node implements.

=head2 GLOBAL VARIABLES

=over 4

=item @NodeNames

The variable @XML::DOM::Node::NodeNames maps the node type constants to strings.
It is used by XML::DOM::Node::getNodeTypeName.

=back

=head2 METHODS

=over 4

=item getNodeType

Return an integer indicating the node type. See XML::DOM constants.

=item getNodeName

Return a property or a hardcoded string, depending on the node type.
Here are the corresponding functions or values:

 Attr			getName
 AttDef			getName
 AttlistDecl		getName
 CDATASection		"#cdata-section"
 Comment		"#comment"
 Document		"#document"
 DocumentType		getNodeName
 DocumentFragment	"#document-fragment"
 Element		getTagName
 ElementDecl		getName
 EntityReference	getEntityName
 Entity			getNotationName
 Notation		getName
 ProcessingInstruction	getTarget
 Text			"#text"
 XMLDecl		"#xml-declaration"

B<Not In DOM Spec>: AttDef, AttlistDecl, ElementDecl and XMLDecl were added for
completeness.

=item getNodeValue and setNodeValue (value)

Returns a string or undef, depending on the node type. This method is provided 
for completeness. In other languages it saves the programmer an upcast.
The value is either available thru some other method defined in the subclass, or
else undef is returned. Here are the corresponding methods: 
Attr::getValue, Text::getData, CDATASection::getData, Comment::getData, 
ProcessingInstruction::getData.

=item getParentNode and setParentNode (parentNode)

The parent of this node. All nodes, except Document,
DocumentFragment, and Attr may have a parent. However, if a
node has just been created and not yet added to the tree, or
if it has been removed from the tree, this is undef.

=item getChildNodes

A NodeList that contains all children of this node. If there
are no children, this is a NodeList containing no nodes. The
content of the returned NodeList is "live" in the sense that,
for instance, changes to the children of the node object that
it was created from are immediately reflected in the nodes
returned by the NodeList accessors; it is not a static
snapshot of the content of the node. This is true for every
NodeList, including the ones returned by the
getElementsByTagName method.

NOTE: this implementation does not return a "live" NodeList for
getElementsByTagName. See L<CAVEATS>.

When this method is called in a list context, it returns a regular perl list
containing the child nodes. Note that this list is not "live". E.g.

 @list = $node->getChildNodes;	      # returns a perl list
 $nodelist = $node->getChildNodes;    # returns a NodeList (object reference)
 for my $kid ($node->getChildNodes)   # iterate over the children of $node

=item getFirstChild

The first child of this node. If there is no such node, this returns undef.

=item getLastChild

The last child of this node. If there is no such node, this returns undef.

=item getPreviousSibling

The node immediately preceding this node. If there is no such 
node, this returns undef.

=item getNextSibling

The node immediately following this node. If there is no such node, this returns 
undef.

=item getAttributes

A NamedNodeMap containing the attributes (Attr nodes) of this node 
(if it is an Element) or undef otherwise.
Note that adding/removing attributes from the returned object, also adds/removes
attributes from the Element node that the NamedNodeMap came from.

=item getOwnerDocument

The Document object associated with this node. This is also
the Document object used to create new nodes. When this node
is a Document this is undef.

=item insertBefore (newChild, refChild)

Inserts the node newChild before the existing child node
refChild. If refChild is undef, insert newChild at the end of
the list of children.

If newChild is a DocumentFragment object, all of its children
are inserted, in the same order, before refChild. If the
newChild is already in the tree, it is first removed.

Return Value: The node being inserted.

DOMExceptions:

=over 4

=item * HIERARCHY_REQUEST_ERR

Raised if this node is of a type that does not allow children of the type of
the newChild node, or if the node to insert is one of this node's ancestors.

=item * WRONG_DOCUMENT_ERR

Raised if newChild was created from a different document than the one that 
created this node.

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=item * NOT_FOUND_ERR

Raised if refChild is not a child of this node.

=back

=item replaceChild (newChild, oldChild)

Replaces the child node oldChild with newChild in the list of
children, and returns the oldChild node. If the newChild is
already in the tree, it is first removed.

Return Value: The node replaced.

DOMExceptions:

=over 4

=item * HIERARCHY_REQUEST_ERR

Raised if this node is of a type that does not allow children of the type of
the newChild node, or it the node to put in is one of this node's ancestors.

=item * WRONG_DOCUMENT_ERR

Raised if newChild was created from a different document than the one that 
created this node.

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=item * NOT_FOUND_ERR

Raised if oldChild is not a child of this node.

=back

=item removeChild (oldChild)

Removes the child node indicated by oldChild from the list of
children, and returns it.

Return Value: The node removed.

DOMExceptions:

=over 4

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=item * NOT_FOUND_ERR

Raised if oldChild is not a child of this node.

=back

=item appendChild (newChild)

Adds the node newChild to the end of the list of children of
this node. If the newChild is already in the tree, it is
first removed. If it is a DocumentFragment object, the entire contents of 
the document fragment are moved into the child list of this node

Return Value: The node added.

DOMExceptions:

=over 4

=item * HIERARCHY_REQUEST_ERR

Raised if this node is of a type that does not allow children of the type of
the newChild node, or if the node to append is one of this node's ancestors.

=item * WRONG_DOCUMENT_ERR

Raised if newChild was created from a different document than the one that 
created this node.

=item * NO_MODIFICATION_ALLOWED_ERR

Raised if this node is readonly.

=back

=item hasChildNodes

This is a convenience method to allow easy determination of
whether a node has any children.

Return Value: 1 if the node has any children, 0 otherwise.

=item cloneNode (deep)

Returns a duplicate of this node, i.e., serves as a generic
copy constructor for nodes. The duplicate node has no parent
(parentNode returns undef.).

Cloning an Element copies all attributes and their values,
including those generated by the XML processor to represent
defaulted attributes, but this method does not copy any text
it contains unless it is a deep clone, since the text is
contained in a child Text node. Cloning any other type of
node simply returns a copy of this node.

Parameters: 
 I<deep>   If true, recursively clone the subtree under the specified node.
If false, clone only the node itself (and its attributes, if it is an Element).

Return Value: The duplicate node.

=item normalize

Puts all Text nodes in the full depth of the sub-tree
underneath this Element into a "normal" form where only
markup (e.g., tags, comments, processing instructions, CDATA
sections, and entity references) separates Text nodes, i.e.,
there are no adjacent Text nodes. This can be used to ensure
that the DOM view of a document is the same as if it were
saved and re-loaded, and is useful when operations (such as
XPointer lookups) that depend on a particular document tree
structure are to be used.

B<Not In DOM Spec>: In the DOM Spec this method is defined in the Element and 
Document class interfaces only, but it doesn't hurt to have it here...

=item getElementsByTagName (name [, recurse])

Returns a NodeList of all descendant elements with a given
tag name, in the order in which they would be encountered in
a preorder traversal of the Element tree.

Parameters:
 I<name>  The name of the tag to match on. The special value "*" matches all tags.
 I<recurse>  Whether it should return only direct child nodes (0) or any descendant that matches the tag name (1). This argument is optional and defaults to 1. It is not part of the DOM spec.

Return Value: A list of matching Element nodes.

NOTE: this implementation does not return a "live" NodeList for
getElementsByTagName. See L<CAVEATS>.

When this method is called in a list context, it returns a regular perl list
containing the result nodes. E.g.

 @list = $node->getElementsByTagName("tag");       # returns a perl list
 $nodelist = $node->getElementsByTagName("tag");   # returns a NodeList (object ref.)
 for my $elem ($node->getElementsByTagName("tag")) # iterate over the result nodes

=back

=head2 Additional methods not in the DOM Spec

=over 4

=item getNodeTypeName

Return the string describing the node type. 
E.g. returns "ELEMENT_NODE" if getNodeType returns ELEMENT_NODE.
It uses @XML::DOM::Node::NodeNames.

=item toString

Returns the entire subtree as a string.

=item printToFile (filename)

Prints the entire subtree to the file with the specified filename.

Croaks: if the file could not be opened for writing.

=item printToFileHandle (handle)

Prints the entire subtree to the file handle.
E.g. to print to STDOUT:

 $node->printToFileHandle (\*STDOUT);

=item print (obj)

Prints the entire subtree using the object's print method. E.g to print to a
FileHandle object:

 $f = new FileHandle ("file.out", "w");
 $node->print ($f);

=item getChildIndex (child)

Returns the index of the child node in the list returned by getChildNodes.

Return Value: the index or -1 if the node is not found.

=item getChildAtIndex (index)

Returns the child node at the specifed index or undef.

=item addText (text)

Appends the specified string to the last child if it is a Text node, or else 
appends a new Text node (with the specified text.)

Return Value: the last child if it was a Text node or else the new Text node.

=item dispose

Removes all circular references in this node and its descendants so the 
objects can be claimed for garbage collection. The objects should not be used
afterwards.

=item setOwnerDocument (doc)

Sets the ownerDocument property of this node and all its children (and 
attributes etc.) to the specified document.
This allows the user to cut and paste document subtrees between different
XML::DOM::Documents. The node should be removed from the original document
first, before calling setOwnerDocument.

This method does nothing when called on a Document node.

=item isAncestor (parent)

Returns 1 if parent is an ancestor of this node or if it is this node itself.

=item expandEntityRefs (str)

Expands all the entity references in the string and returns the result.
The entity references can be character references (e.g. "&#123;" or "&#x1fc2"),
default entity references ("&quot;", "&gt;", "&lt;", "&apos;" and "&amp;") or
entity references defined in Entity objects as part of the DocumentType of
the owning Document. Character references are expanded into UTF-8.
Parameter entity references (e.g. %ent;) are not expanded.

=item to_sax ( %HANDLERS )

E.g.

 $node->to_sax (DocumentHandler => $my_handler, 
		Handler => $handler2 );

%HANDLERS may contain the following handlers:

=over 4

=item * DocumentHandler

=item * DTDHandler

=item * EntityResolver

=item * Handler 

Default handler when one of the above is not specified

=back

Each XML::DOM::Node generates the appropriate SAX callbacks (for the
appropria
Download .txt
gitextract_z1nyx_hr/

├── .gitignore
├── Dockerfile
├── LICENSE
├── MANIFEST.in
├── README.md
├── experiments/
│   └── openNMT.0.md
├── files2rouge/
│   ├── RELEASE-1.5.5/
│   │   ├── README.txt
│   │   ├── RELEASE-NOTE.txt
│   │   ├── ROUGE-1.5.5.pl
│   │   ├── XML/
│   │   │   ├── DOM/
│   │   │   │   ├── AttDef.pod
│   │   │   │   ├── AttlistDecl.pod
│   │   │   │   ├── Attr.pod
│   │   │   │   ├── CDATASection.pod
│   │   │   │   ├── CharacterData.pod
│   │   │   │   ├── Comment.pod
│   │   │   │   ├── DOMException.pm
│   │   │   │   ├── DOMImplementation.pod
│   │   │   │   ├── Document.pod
│   │   │   │   ├── DocumentFragment.pod
│   │   │   │   ├── DocumentType.pod
│   │   │   │   ├── Element.pod
│   │   │   │   ├── ElementDecl.pod
│   │   │   │   ├── Entity.pod
│   │   │   │   ├── EntityReference.pod
│   │   │   │   ├── NamedNodeMap.pm
│   │   │   │   ├── NamedNodeMap.pod
│   │   │   │   ├── Node.pod
│   │   │   │   ├── NodeList.pm
│   │   │   │   ├── NodeList.pod
│   │   │   │   ├── Notation.pod
│   │   │   │   ├── Parser.pod
│   │   │   │   ├── PerlSAX.pm
│   │   │   │   ├── ProcessingInstruction.pod
│   │   │   │   ├── Text.pod
│   │   │   │   └── XMLDecl.pod
│   │   │   ├── DOM.pm
│   │   │   ├── Handler/
│   │   │   │   └── BuildDOM.pm
│   │   │   └── RegExp.pm
│   │   ├── data/
│   │   │   ├── WordNet-1.6-Exceptions/
│   │   │   │   ├── adj.exc
│   │   │   │   ├── adv.exc
│   │   │   │   ├── buildExeptionDB.pl
│   │   │   │   ├── noun.exc
│   │   │   │   └── verb.exc
│   │   │   ├── WordNet-2.0-Exceptions/
│   │   │   │   ├── adj.exc
│   │   │   │   ├── adv.exc
│   │   │   │   ├── buildExeptionDB.pl
│   │   │   │   ├── noun.exc
│   │   │   │   └── verb.exc
│   │   │   └── smart_common_words.txt
│   │   └── runROUGE-test.pl
│   ├── __init__.py
│   ├── files2rouge.py
│   ├── settings.py
│   └── utils.py
├── setup.py
└── setup_rouge.py
Download .txt
SYMBOL INDEX (14 symbols across 4 files)

FILE: files2rouge/files2rouge.py
  function run (line 29) | def run(summ_path,
  function main (line 101) | def main():

FILE: files2rouge/settings.py
  function _default_path (line 8) | def _default_path():
  class Settings (line 13) | class Settings:
    method __init__ (line 15) | def __init__(self, path=None):
    method _load (line 18) | def _load(self):
    method _generate (line 29) | def _generate(self, data):
    method set_data (line 34) | def set_data(self, data):

FILE: files2rouge/utils.py
  function mkdir (line 6) | def mkdir(path):
  function mkdirs (line 10) | def mkdirs(paths):
  function line_count (line 15) | def line_count(path):
  function tee (line 23) | def tee(saveto, *args, **kwargs):
  function split_files (line 31) | def split_files(model_path, system_path, model_dir, system_dir,

FILE: setup_rouge.py
  function copy_rouge (line 14) | def copy_rouge():
Condensed preview — 56 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (706K chars).
[
  {
    "path": ".gitignore",
    "chars": 1045,
    "preview": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packagi"
  },
  {
    "path": "Dockerfile",
    "chars": 385,
    "preview": "FROM python:3.6-stretch\n\nMAINTAINER cgebe\n\nRUN apt-get update && \\\n    apt-get install -y cpanminus && \\\n    cpanm --for"
  },
  {
    "path": "LICENSE",
    "chars": 1057,
    "preview": "MIT License\n\nCopyright (c) 2017 \n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this s"
  },
  {
    "path": "MANIFEST.in",
    "chars": 34,
    "preview": "include files2rouge/settings.json\n"
  },
  {
    "path": "README.md",
    "chars": 3676,
    "preview": "# Files2ROUGE\n## Motivations\nGiven two files with the same number of lines, `files2rouge` calculates the average ROUGE s"
  },
  {
    "path": "experiments/openNMT.0.md",
    "chars": 2112,
    "preview": "\n# Motivations\n\n* Replicate results for Text Summarization task on Gigaword (see 'About')\n* Getting started with Text Su"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/README.txt",
    "chars": 16359,
    "preview": "A Brief Introduction of the ROUGE Summary Evaluation Package\nby Chin-Yew LIN \nUniveristy of Southern California/Informat"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/RELEASE-NOTE.txt",
    "chars": 14426,
    "preview": "# Revision Note: 05/26/2005, Chin-Yew LIN\n#              1.5.5\n#              (1) Correct stemming on multi-token BE hea"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/ROUGE-1.5.5.pl",
    "chars": 103888,
    "preview": "#!/usr/bin/perl -w\n# Add current dir to include\nuse File::Basename;\nuse lib dirname (__FILE__);\n\n# Version:     ROUGE v1"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/AttDef.pod",
    "chars": 662,
    "preview": "=head1 NAME\n\nXML::DOM::AttDef - A single XML attribute definition in an ATTLIST in XML::DOM \n\n=head1 DESCRIPTION\n\nXML::D"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/AttlistDecl.pod",
    "chars": 1190,
    "preview": "=head1 NAME\n\nXML::DOM::AttlistDecl - An XML ATTLIST declaration in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::AttlistDecl e"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/Attr.pod",
    "chars": 2693,
    "preview": "=head1 NAME\n\nXML::DOM::Attr - An XML attribute in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::Attr extends L<XML::DOM::Node>"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/CDATASection.pod",
    "chars": 1257,
    "preview": "=head1 NAME\n\nXML::DOM::CDATASection - Escaping XML text blocks in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::CDATASection e"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/CharacterData.pod",
    "chars": 2812,
    "preview": "=head1 NAME\n\nXML::DOM::CharacterData - Common interface for Text, CDATASections and Comments\n\n=head1 DESCRIPTION\n\nXML::D"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/Comment.pod",
    "chars": 438,
    "preview": "=head1 NAME\n\nXML::DOM::Comment - An XML comment in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::Comment extends L<XML::DOM::C"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/DOMException.pm",
    "chars": 1892,
    "preview": "######################################################################\npackage XML::DOM::DOMException;\n#################"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/DOMImplementation.pod",
    "chars": 665,
    "preview": "=head1 NAME\n\nXML::DOM::DOMImplementation - Information about XML::DOM implementation\n\n=head1 DESCRIPTION\n\nThe DOMImpleme"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/Document.pod",
    "chars": 5651,
    "preview": "=head1 NAME\n\nXML::DOM::Document - An XML document node in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::Document extends L<XML"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/DocumentFragment.pod",
    "chars": 2075,
    "preview": "=head1 NAME\n\nXML::DOM::DocumentFragment - Facilitates cut & paste in XML::DOM documents\n\n=head1 DESCRIPTION\n\nXML::DOM::D"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/DocumentType.pod",
    "chars": 5110,
    "preview": "=head1 NAME\n\nXML::DOM::DocumentType - An XML document type (DTD) in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::DocumentType"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/Element.pod",
    "chars": 4949,
    "preview": "=head1 NAME\n\nXML::DOM::Element - An XML element node in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::Element extends L<XML::D"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/ElementDecl.pod",
    "chars": 529,
    "preview": "=head1 NAME\n\nXML::DOM::ElementDecl - An XML ELEMENT declaration in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::ElementDecl e"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/Entity.pod",
    "chars": 1100,
    "preview": "=head1 NAME\n\nXML::DOM::Entity - An XML ENTITY in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::Entity extends L<XML::DOM::Node"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/EntityReference.pod",
    "chars": 1305,
    "preview": "=head1 NAME\n\nXML::DOM::EntityReference - An XML ENTITY reference in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::EntityRefere"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/NamedNodeMap.pm",
    "chars": 5311,
    "preview": "######################################################################\npackage XML::DOM::NamedNodeMap;\n#################"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/NamedNodeMap.pod",
    "chars": 3642,
    "preview": "=head1 NAME\n\nXML::DOM::NamedNodeMap - A hash table interface for XML::DOM\n\n=head1 DESCRIPTION\n\nObjects implementing the "
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/Node.pod",
    "chars": 13079,
    "preview": "=head1 NAME\n\nXML::DOM::Node - Super class of all nodes in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::Node is the super clas"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/NodeList.pm",
    "chars": 695,
    "preview": "######################################################################\npackage XML::DOM::NodeList;\n#####################"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/NodeList.pod",
    "chars": 1095,
    "preview": "=head1 NAME\n\nXML::DOM::NodeList - A node list as used by XML::DOM\n\n=head1 DESCRIPTION\n\nThe NodeList interface provides t"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/Notation.pod",
    "chars": 874,
    "preview": "=head1 NAME\n\nXML::DOM::Notation - An XML NOTATION in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::Notation extends L<XML::DOM"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/Parser.pod",
    "chars": 2716,
    "preview": "=head1 NAME\n\nXML::DOM::Parser - An XML::Parser that builds XML::DOM document structures\n\n=head1 SYNOPSIS\n\n use XML::DOM;"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/PerlSAX.pm",
    "chars": 940,
    "preview": "package XML::DOM::PerlSAX;\nuse strict;\n\nBEGIN\n{\n    if ($^W)\n    {\n\twarn \"XML::DOM::PerlSAX has been renamed to XML::Han"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/ProcessingInstruction.pod",
    "chars": 802,
    "preview": "=head1 NAME\n\nXML::DOM::ProcessingInstruction - An XML processing instruction in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/Text.pod",
    "chars": 1886,
    "preview": "=head1 NAME\n\nXML::DOM::Text - A piece of XML text in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::Text extends L<XML::DOM::Ch"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM/XMLDecl.pod",
    "chars": 686,
    "preview": "=head1 NAME\n\nXML::DOM::XMLDecl - XML declaration in XML::DOM\n\n=head1 DESCRIPTION\n\nXML::DOM::XMLDecl extends L<XML::DOM::"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/DOM.pm",
    "chars": 110674,
    "preview": "################################################################################\n#\n# Perl module: XML::DOM\n#\n# By Enno D"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/Handler/BuildDOM.pm",
    "chars": 7526,
    "preview": "package XML::Handler::BuildDOM;\nuse strict;\nuse XML::DOM;\n\n#\n# TODO:\n# - add support for parameter entity references\n# -"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/XML/RegExp.pm",
    "chars": 6065,
    "preview": "package XML::RegExp;\n\nuse vars qw( $BaseChar $Ideographic $Letter $Digit $Extender \n\t     $CombiningChar $NameChar \n\t   "
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/WordNet-1.6-Exceptions/adj.exc",
    "chars": 20308,
    "preview": "after after\nairier airy\nairiest airy\nangrier angry\nangriest angry\nartier arty\nartiest arty\nashier ashy\nashiest ashy\nbagg"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/WordNet-1.6-Exceptions/adv.exc",
    "chars": 85,
    "preview": "best well\nbetter well\ndeeper deeply\nfarther far\nfurther far\nharder hard\nhardest hard\n"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/WordNet-1.6-Exceptions/buildExeptionDB.pl",
    "chars": 679,
    "preview": "#!/usr/bin/perl -w\nuse DB_File;\n@ARGV!=3&&die \"Usage: buildExceptionDB.pl WordNet-exception-file-directory exception-fil"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/WordNet-1.6-Exceptions/noun.exc",
    "chars": 111135,
    "preview": "aardwolves aardwolf\nabaci abacus\nabacuses abacus\nabbacies abbacy\nabhenries abhenry\nabilities ability\nabnormalities abnor"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/WordNet-1.6-Exceptions/verb.exc",
    "chars": 83167,
    "preview": "abets abet\nabetted abet\nabetting abet\nabhorred abhor\nabhorring abhor\nabhors abhor\nabided abide\nabides abide\nabiding abid"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/WordNet-2.0-Exceptions/adj.exc",
    "chars": 23019,
    "preview": "acer acer\nafter after\nairier airy\nairiest airy\nall-arounder all-arounder\nangrier angry\nangriest angry\narcher archer\narti"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/WordNet-2.0-Exceptions/adv.exc",
    "chars": 85,
    "preview": "best well\nbetter well\ndeeper deeply\nfarther far\nfurther far\nharder hard\nhardest hard\n"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/WordNet-2.0-Exceptions/buildExeptionDB.pl",
    "chars": 679,
    "preview": "#!/usr/bin/perl -w\nuse DB_File;\n@ARGV!=3&&die \"Usage: buildExceptionDB.pl WordNet-exception-file-directory exception-fil"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/WordNet-2.0-Exceptions/noun.exc",
    "chars": 38063,
    "preview": "aardwolves aardwolf\nabaci abacus\naboideaux aboideau\naboiteaux aboiteau\nabscissae abscissa\nacanthi acanthus\nacari acarus\n"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/WordNet-2.0-Exceptions/verb.exc",
    "chars": 38038,
    "preview": "abetted abet\nabetting abet\nabhorred abhor\nabhorring abhor\nabode abide\nabought aby\nabout-shipped about-ship\nabout-shippin"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/data/smart_common_words.txt",
    "chars": 3703,
    "preview": "reuters\nap\njan\nfeb\nmar\napr\nmay\njun\njul\naug\nsep\noct\nnov\ndec\ntech\nnews\nindex\nmon\ntue\nwed\nthu\nfri\nsat\n's\na\na's\nable\nabout\na"
  },
  {
    "path": "files2rouge/RELEASE-1.5.5/runROUGE-test.pl",
    "chars": 2616,
    "preview": "#!/usr/bin/perl -w\nuse Cwd;\n$curdir=getcwd;\n$ROUGE=\"../ROUGE-1.5.5.pl\";\nchdir(\"sample-test\");\n$cmd=\"$ROUGE -e ../data -c"
  },
  {
    "path": "files2rouge/__init__.py",
    "chars": 234,
    "preview": "from __future__ import absolute_import\nfrom files2rouge.files2rouge import run\nfrom files2rouge.files2rouge import main\n"
  },
  {
    "path": "files2rouge/files2rouge.py",
    "chars": 5124,
    "preview": "#!/usr/bin/env python\n# -*- coding: utf-8 -*-\n\"\"\"\n    ROUGE scoring for each lines from `ref_path` and `summ_path`\n    i"
  },
  {
    "path": "files2rouge/settings.py",
    "chars": 1228,
    "preview": "import os\nimport json\n\nPATHS = ['ROUGE_path', 'ROUGE_data']\nPARAMS = PATHS + []\n\n\ndef _default_path():\n    _dir, _filena"
  },
  {
    "path": "files2rouge/utils.py",
    "chars": 2441,
    "preview": "#!/usr/bin/env python\nfrom __future__ import print_function\nimport os\n\n\ndef mkdir(path):\n    os.mkdir(path)\n\n\ndef mkdirs"
  },
  {
    "path": "setup.py",
    "chars": 964,
    "preview": "from setuptools import setup, find_packages\n\nversion = \"2.1.0\"\nsetup(\n    name=\"files2rouge\",\n    version=version,\n    d"
  },
  {
    "path": "setup_rouge.py",
    "chars": 1066,
    "preview": "#!/usr/bin/env python\n\"\"\"\n    Utility to copy ROUGE script.\n    It has to be run before `setup.py`\n\n\"\"\"\nimport os\nimport"
  }
]

About this extraction

This page contains the full source code of the pltrdy/files2rouge GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 56 files (648.4 KB), approximately 207.7k tokens, and a symbol index with 14 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Copied to clipboard!