[
  {
    "path": "README.md",
    "content": "# Text-Summarizer-Pytorch\nCombining [A Deep Reinforced Model for Abstractive Summarization](https://arxiv.org/pdf/1705.04304.pdf) and [Get To The Point: Summarization with Pointer-Generator Networks](https://arxiv.org/pdf/1704.04368.pdf)\n\n## Model Description\n* LSTM based Sequence-to-Sequence model for Abstractive Summarization\n* Pointer mechanism for handling Out of Vocabulary (OOV) words [See et al. (2017)](https://arxiv.org/pdf/1704.04368.pdf)\n* Intra-temporal and Intra-decoder attention for handling repeated words [Paulus et al. (2018)](https://arxiv.org/pdf/1705.04304.pdf)\n* Self-critic policy gradient training along with MLE training [Paulus et al. (2018)](https://arxiv.org/pdf/1705.04304.pdf)\n\n## Prerequisites\n* Pytorch\n* Tensorflow\n* Python 2 & 3\n* [rouge](https://github.com/pltrdy/rouge) \n\n## Data\n* Download train and valid pairs (article, title) of OpenNMT provided Gigaword dataset from [here](https://github.com/harvardnlp/sent-summary)\n* Copy files ```train.article.txt```, ```train.title.txt```, ```valid.article.filter.txt```and ```valid.title.filter.txt``` to ```data/unfinished``` folder\n* Files are already preprcessed\n\n## Creating ```.bin``` files and vocab file\n* The model accepts data in the form of ```.bin``` files.\n* To convert ```.txt``` file into ```.bin``` file and chunk them further, run (requires Python 2 & Tensorflow):\n```\npython make_data_files.py\n```\n* You will find the data in ```data/chunked``` folder and vocab file in ```data``` folder\n\n## Training\n* As suggested in [Paulus et al. (2018)](https://arxiv.org/pdf/1705.04304.pdf), first pretrain the seq-to-seq model using MLE (with Python 3):\n```\npython train.py --train_mle=yes --train_rl=no --mle_weight=1.0\n```\n* Next, find the best saved model on validation data by running (with Python 3):\n```\npython eval.py --task=validate --start_from=0005000.tar\n```\n* After finding the best model (lets say ```0100000.tar```) with high rouge-l f score, load it and run (with Python 3):\n```\npython train.py --train_mle=yes --train_rl=yes --mle_weight=0.25 --load_model=0100000.tar --new_lr=0.0001\n```\nfor MLE + RL training (or)\n```\npython train.py --train_mle=no --train_rl=yes --mle_weight=0.0 --load_model=0100000.tar --new_lr=0.0001\n```\nfor RL training\n\n## Validation\n* To perform validation of RL training, run (with Python 3):\n```\npython eval.py --task=validate --start_from=0100000.tar\n```\n## Testing\n* After finding the best model of RL training (lets say ```0200000.tar```), evaluate it on test data & get all rouge metrics by running (with Python 3):\n```\npython eval.py --task=test --load_model=0200000.tar\n```\n\n## Results\n* Rouge scores obtained by using best MLE trained model on test set:  \nscores: {  \n```'rouge-1':``` {'f': 0.4412018559893622, 'p': 0.4814799494024485, 'r': 0.4232331027817015},  \n```'rouge-2':``` {'f': 0.23238981595683728, 'p': 0.2531296070596062, 'r': 0.22407861554997008},  \n```'rouge-l':``` {'f': 0.40477682528278364, 'p': 0.4584684491434479, 'r': 0.40351107200202596}  \n}\n\n* Rouge scores obtained by using best MLE + RL trained model on test set:  \nscores: {  \n```'rouge-1':``` {'f': 0.4499047033247696, 'p': 0.4853756369556345, 'r': 0.43544461386607497},  \n```'rouge-2':``` {'f': 0.24037014314625643, 'p': 0.25903387205387235, 'r': 0.23362662645146298},  \n```'rouge-l':``` {'f': 0.41320241732946406, 'p': 0.4616655167980162, 'r': 0.4144419466382236}  \n}\n\n* Training log file is included in the repository\n\n# Examples\n```article:``` russia 's lower house of parliament was scheduled friday to debate an appeal to the prime minister that challenged the right of u.s.-funded radio liberty to operate in russia following its introduction of broadcasts targeting chechnya .  \n```ref:``` russia 's lower house of parliament mulls challenge to radio liberty  \n```dec:``` russian parliament to debate on banning radio liberty  \n\n```article:``` continued dialogue with the democratic people 's republic of korea is important although australia 's plan to open its embassy in pyongyang has been shelved because of the crisis over the dprk 's nuclear weapons program , australian foreign minister alexander downer said on friday .  \n```ref:``` dialogue with dprk important says australian foreign minister  \n```dec:``` australian fm says dialogue with dprk important  \n\n```article:``` water levels in the zambezi river are rising due to heavy rains in its catchment area , prompting zimbabwe 's civil protection unit -lrb- cpu -rrb- to issue a flood alert for people living in the zambezi valley , the herald reported on friday .  \n```ref:``` floods loom in zambezi valley  \n```dec:``` water levels rising in zambezi river  \n\n```article:``` tens of thousands of people have fled samarra , about ## miles north of baghdad , in recent weeks , expecting a showdown between u.s. troops and heavily armed groups within the city , according to u.s. and iraqi sources .  \n```ref:``` thousands flee samarra fearing battle  \n```dec:``` tens of thousands flee samarra expecting showdown with u.s. troops  \n\n```article:``` the #### tung blossom festival will kick off saturday with a fun-filled ceremony at the west lake resort in the northern taiwan county of miaoli , a hakka stronghold , the council of hakka affairs -lrb- cha -rrb- announced tuesday .  \n```ref:``` #### tung blossom festival to kick off saturday  \n```dec:``` #### tung blossom festival to kick off in miaoli  \n\n## References\n* [pytorch implementation of \"Get To The Point: Summarization with Pointer-Generator Networks\"](https://github.com/atulkum/pointer_summarizer)\n"
  },
  {
    "path": "beam_search.py",
    "content": "import numpy as np\nimport torch as T\nfrom data_util import config, data\nfrom train_util import get_cuda\n\n\nclass Beam(object):\n    def __init__(self, start_id, end_id, unk_id, hidden_state, context):\n        h,c = hidden_state                                              #(n_hid,)\n        self.tokens = T.LongTensor(config.beam_size,1).fill_(start_id)  #(beam, t) after t time steps\n        self.scores = T.FloatTensor(config.beam_size,1).fill_(-30)      #beam,1; Initial score of beams = -30\n        self.tokens, self.scores = get_cuda(self.tokens), get_cuda(self.scores)\n        self.scores[0][0] = 0                                           #At time step t=0, all beams should extend from a single beam. So, I am giving high initial score to 1st beam\n        self.hid_h = h.unsqueeze(0).repeat(config.beam_size, 1)         #beam, n_hid\n        self.hid_c = c.unsqueeze(0).repeat(config.beam_size, 1)         #beam, n_hid\n        self.context = context.unsqueeze(0).repeat(config.beam_size, 1) #beam, 2*n_hid\n        self.sum_temporal_srcs = None\n        self.prev_s = None\n        self.done = False\n        self.end_id = end_id\n        self.unk_id = unk_id\n\n    def get_current_state(self):\n        tokens = self.tokens[:,-1].clone()\n        for i in range(len(tokens)):\n            if tokens[i].item() >= config.vocab_size:\n                tokens[i] = self.unk_id\n        return tokens\n\n\n    def advance(self, prob_dist, hidden_state, context, sum_temporal_srcs, prev_s):\n        '''Perform beam search: Considering the probabilites of given n_beam x n_extended_vocab words, select first n_beam words that give high total scores\n        :param prob_dist: (beam, n_extended_vocab)\n        :param hidden_state: Tuple of (beam, n_hid) tensors\n        :param context:   (beam, 2*n_hidden)\n        :param sum_temporal_srcs:   (beam, n_seq)\n        :param prev_s:  (beam, t, n_hid)\n        '''\n        n_extended_vocab = prob_dist.size(1)\n        h, c = hidden_state\n        log_probs = T.log(prob_dist+config.eps)                         #beam, n_extended_vocab\n\n        scores = log_probs + self.scores                                #beam, n_extended_vocab\n        scores = scores.view(-1,1)                                      #beam*n_extended_vocab, 1\n        best_scores, best_scores_id = T.topk(input=scores, k=config.beam_size, dim=0)   #will be sorted in descending order of scores\n        self.scores = best_scores                                       #(beam,1); sorted\n        beams_order = best_scores_id.squeeze(1)/n_extended_vocab        #(beam,); sorted\n        best_words = best_scores_id%n_extended_vocab                    #(beam,1); sorted\n        self.hid_h = h[beams_order]                                     #(beam, n_hid); sorted\n        self.hid_c = c[beams_order]                                     #(beam, n_hid); sorted\n        self.context = context[beams_order]\n        if sum_temporal_srcs is not None:\n            self.sum_temporal_srcs = sum_temporal_srcs[beams_order]     #(beam, n_seq); sorted\n        if prev_s is not None:\n            self.prev_s = prev_s[beams_order]                           #(beam, t, n_hid); sorted\n        self.tokens = self.tokens[beams_order]                          #(beam, t); sorted\n        self.tokens = T.cat([self.tokens, best_words], dim=1)           #(beam, t+1); sorted\n\n        #End condition is when top-of-beam is EOS.\n        if best_words[0][0] == self.end_id:\n            self.done = True\n\n    def get_best(self):\n        best_token = self.tokens[0].cpu().numpy().tolist()              #Since beams are always in sorted (descending) order, 1st beam is the best beam\n        try:\n            end_idx = best_token.index(self.end_id)\n        except ValueError:\n            end_idx = len(best_token)\n        best_token = best_token[1:end_idx]\n        return best_token\n\n    def get_all(self):\n        all_tokens = []\n        for i in range(len(self.tokens)):\n            all_tokens.append(self.tokens[i].cpu().numpy())\n        return all_tokens\n\n\ndef beam_search(enc_hid, enc_out, enc_padding_mask, ct_e, extra_zeros, enc_batch_extend_vocab, model, start_id, end_id, unk_id):\n\n    batch_size = len(enc_hid[0])\n    beam_idx = T.LongTensor(list(range(batch_size)))\n    beams = [Beam(start_id, end_id, unk_id, (enc_hid[0][i], enc_hid[1][i]), ct_e[i]) for i in range(batch_size)]   #For each example in batch, create Beam object\n    n_rem = batch_size                                                  #Index of beams that are active, i.e: didn't generate [STOP] yet\n    sum_temporal_srcs = None                                            #Number of examples in batch that didn't generate [STOP] yet\n    prev_s = None\n\n    for t in range(config.max_dec_steps):\n        x_t = T.stack(\n            [beam.get_current_state() for beam in beams if beam.done == False]      #remaining(rem),beam\n        ).contiguous().view(-1)                                                     #(rem*beam,)\n        x_t = model.embeds(x_t)                                                 #rem*beam, n_emb\n\n        dec_h = T.stack(\n            [beam.hid_h for beam in beams if beam.done == False]                    #rem*beam,n_hid\n        ).contiguous().view(-1,config.hidden_dim)\n        dec_c = T.stack(\n            [beam.hid_c for beam in beams if beam.done == False]                    #rem,beam,n_hid\n        ).contiguous().view(-1,config.hidden_dim)                                   #rem*beam,n_hid\n\n        ct_e = T.stack(\n            [beam.context for beam in beams if beam.done == False]                  #rem,beam,n_hid\n        ).contiguous().view(-1,2*config.hidden_dim)                                 #rem,beam,n_hid\n\n        if sum_temporal_srcs is not None:\n            sum_temporal_srcs = T.stack(\n                [beam.sum_temporal_srcs for beam in beams if beam.done == False]\n            ).contiguous().view(-1, enc_out.size(1))                                #rem*beam, n_seq\n\n        if prev_s is not None:\n            prev_s = T.stack(\n                [beam.prev_s for beam in beams if beam.done == False]\n            ).contiguous().view(-1, t, config.hidden_dim)                           #rem*beam, t-1, n_hid\n\n\n        s_t = (dec_h, dec_c)\n        enc_out_beam = enc_out[beam_idx].view(n_rem,-1).repeat(1, config.beam_size).view(-1, enc_out.size(1), enc_out.size(2))\n        enc_pad_mask_beam = enc_padding_mask[beam_idx].repeat(1, config.beam_size).view(-1, enc_padding_mask.size(1))\n\n        extra_zeros_beam = None\n        if extra_zeros is not None:\n            extra_zeros_beam = extra_zeros[beam_idx].repeat(1, config.beam_size).view(-1, extra_zeros.size(1))\n        enc_extend_vocab_beam = enc_batch_extend_vocab[beam_idx].repeat(1, config.beam_size).view(-1, enc_batch_extend_vocab.size(1))\n\n        final_dist, (dec_h, dec_c), ct_e, sum_temporal_srcs, prev_s = model.decoder(x_t, s_t, enc_out_beam, enc_pad_mask_beam, ct_e, extra_zeros_beam, enc_extend_vocab_beam, sum_temporal_srcs, prev_s)              #final_dist: rem*beam, n_extended_vocab\n\n        final_dist = final_dist.view(n_rem, config.beam_size, -1)                   #final_dist: rem, beam, n_extended_vocab\n        dec_h = dec_h.view(n_rem, config.beam_size, -1)                             #rem, beam, n_hid\n        dec_c = dec_c.view(n_rem, config.beam_size, -1)                             #rem, beam, n_hid\n        ct_e = ct_e.view(n_rem, config.beam_size, -1)                             #rem, beam, 2*n_hid\n\n        if sum_temporal_srcs is not None:\n            sum_temporal_srcs = sum_temporal_srcs.view(n_rem, config.beam_size, -1) #rem, beam, n_seq\n\n        if prev_s is not None:\n            prev_s = prev_s.view(n_rem, config.beam_size, -1, config.hidden_dim)    #rem, beam, t\n\n        # For all the active beams, perform beam search\n        active = []         #indices of active beams after beam search\n\n        for i in range(n_rem):\n            b = beam_idx[i].item()\n            beam = beams[b]\n            if beam.done:\n                continue\n\n            sum_temporal_srcs_i = prev_s_i = None\n            if sum_temporal_srcs is not None:\n                sum_temporal_srcs_i = sum_temporal_srcs[i]                              #beam, n_seq\n            if prev_s is not None:\n                prev_s_i = prev_s[i]                                                #beam, t, n_hid\n            beam.advance(final_dist[i], (dec_h[i], dec_c[i]), ct_e[i], sum_temporal_srcs_i, prev_s_i)\n            if beam.done == False:\n                active.append(b)\n\n        if len(active) == 0:\n            break\n\n        beam_idx = T.LongTensor(active)\n        n_rem = len(beam_idx)\n\n    predicted_words = []\n    for beam in beams:\n        predicted_words.append(beam.get_best())\n\n    return predicted_words\n\n\n\n\n\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "data/saved_models/.gitignore",
    "content": ""
  },
  {
    "path": "data/unfinished/.gitignore",
    "content": ""
  },
  {
    "path": "data_util/batcher.py",
    "content": "#Most of this file is copied form https://github.com/abisee/pointer-generator/blob/master/batcher.py\n\nimport queue as Queue\nimport time\nfrom random import shuffle\nfrom threading import Thread\n\nimport numpy as np\nimport tensorflow as tf\n\nfrom . import config\nfrom . import data\n\nimport random\nrandom.seed(1234)\n\n\nclass Example(object):\n\n  def __init__(self, article, abstract_sentences, vocab):\n    # Get ids of special tokens\n    start_decoding = vocab.word2id(data.START_DECODING)\n    stop_decoding = vocab.word2id(data.STOP_DECODING)\n\n    # Process the article\n    article_words = article.split()\n    if len(article_words) > config.max_enc_steps:\n      article_words = article_words[:config.max_enc_steps]\n    self.enc_len = len(article_words) # store the length after truncation but before padding\n    self.enc_input = [vocab.word2id(w) for w in article_words] # list of word ids; OOVs are represented by the id for UNK token\n\n    # Process the abstract\n    abstract = ' '.join(abstract_sentences) # string\n    abstract_words = abstract.split() # list of strings\n    abs_ids = [vocab.word2id(w) for w in abstract_words] # list of word ids; OOVs are represented by the id for UNK token\n\n    # Get the decoder input sequence and target sequence\n    self.dec_input, _ = self.get_dec_inp_targ_seqs(abs_ids, config.max_dec_steps, start_decoding, stop_decoding)\n    self.dec_len = len(self.dec_input)\n\n    # If using pointer-generator mode, we need to store some extra info\n    # Store a version of the enc_input where in-article OOVs are represented by their temporary OOV id; also store the in-article OOVs words themselves\n    self.enc_input_extend_vocab, self.article_oovs = data.article2ids(article_words, vocab)\n\n    # Get a verison of the reference summary where in-article OOVs are represented by their temporary article OOV id\n    abs_ids_extend_vocab = data.abstract2ids(abstract_words, vocab, self.article_oovs)\n\n    # Get decoder target sequence\n    _, self.target = self.get_dec_inp_targ_seqs(abs_ids_extend_vocab, config.max_dec_steps, start_decoding, stop_decoding)\n\n    # Store the original strings\n    self.original_article = article\n    self.original_abstract = abstract\n    self.original_abstract_sents = abstract_sentences\n\n\n\n  def get_dec_inp_targ_seqs(self, sequence, max_len, start_id, stop_id):\n    inp = [start_id] + sequence[:]\n    target = sequence[:]\n    if len(inp) > max_len: # truncate\n      inp = inp[:max_len]\n      target = target[:max_len] # no end_token\n    else: # no truncation\n      target.append(stop_id) # end token\n    assert len(inp) == len(target)\n    return inp, target\n\n\n  def pad_decoder_inp_targ(self, max_len, pad_id):\n    while len(self.dec_input) < max_len:\n      self.dec_input.append(pad_id)\n    while len(self.target) < max_len:\n      self.target.append(pad_id)\n\n\n  def pad_encoder_input(self, max_len, pad_id):\n    while len(self.enc_input) < max_len:\n      self.enc_input.append(pad_id)\n    while len(self.enc_input_extend_vocab) < max_len:\n      self.enc_input_extend_vocab.append(pad_id)\n\n\nclass Batch(object):\n  def __init__(self, example_list, vocab, batch_size):\n    self.batch_size = batch_size\n    self.pad_id = vocab.word2id(data.PAD_TOKEN) # id of the PAD token used to pad sequences\n    self.init_encoder_seq(example_list) # initialize the input to the encoder\n    self.init_decoder_seq(example_list) # initialize the input and targets for the decoder\n    self.store_orig_strings(example_list) # store the original strings\n\n\n  def init_encoder_seq(self, example_list):\n    # Determine the maximum length of the encoder input sequence in this batch\n    max_enc_seq_len = max([ex.enc_len for ex in example_list])\n\n    # Pad the encoder input sequences up to the length of the longest sequence\n    for ex in example_list:\n      ex.pad_encoder_input(max_enc_seq_len, self.pad_id)\n\n    # Initialize the numpy arrays\n    # Note: our enc_batch can have different length (second dimension) for each batch because we use dynamic_rnn for the encoder.\n    self.enc_batch = np.zeros((self.batch_size, max_enc_seq_len), dtype=np.int32)\n    self.enc_lens = np.zeros((self.batch_size), dtype=np.int32)\n    self.enc_padding_mask = np.zeros((self.batch_size, max_enc_seq_len), dtype=np.float32)\n\n    # Fill in the numpy arrays\n    for i, ex in enumerate(example_list):\n      self.enc_batch[i, :] = ex.enc_input[:]\n      self.enc_lens[i] = ex.enc_len\n      for j in range(ex.enc_len):\n        self.enc_padding_mask[i][j] = 1\n\n    # For pointer-generator mode, need to store some extra info\n    # Determine the max number of in-article OOVs in this batch\n    self.max_art_oovs = max([len(ex.article_oovs) for ex in example_list])\n    # Store the in-article OOVs themselves\n    self.art_oovs = [ex.article_oovs for ex in example_list]\n    # Store the version of the enc_batch that uses the article OOV ids\n    self.enc_batch_extend_vocab = np.zeros((self.batch_size, max_enc_seq_len), dtype=np.int32)\n    for i, ex in enumerate(example_list):\n      self.enc_batch_extend_vocab[i, :] = ex.enc_input_extend_vocab[:]\n\n  def init_decoder_seq(self, example_list):\n    # Pad the inputs and targets\n    for ex in example_list:\n      ex.pad_decoder_inp_targ(config.max_dec_steps, self.pad_id)\n\n    # Initialize the numpy arrays.\n    self.dec_batch = np.zeros((self.batch_size, config.max_dec_steps), dtype=np.int32)\n    self.target_batch = np.zeros((self.batch_size, config.max_dec_steps), dtype=np.int32)\n    # self.dec_padding_mask = np.zeros((self.batch_size, config.max_dec_steps), dtype=np.float32)\n    self.dec_lens = np.zeros((self.batch_size), dtype=np.int32)\n\n    # Fill in the numpy arrays\n    for i, ex in enumerate(example_list):\n      self.dec_batch[i, :] = ex.dec_input[:]\n      self.target_batch[i, :] = ex.target[:]\n      self.dec_lens[i] = ex.dec_len\n      # for j in range(ex.dec_len):\n      #   self.dec_padding_mask[i][j] = 1\n\n  def store_orig_strings(self, example_list):\n    self.original_articles = [ex.original_article for ex in example_list] # list of lists\n    self.original_abstracts = [ex.original_abstract for ex in example_list] # list of lists\n    self.original_abstracts_sents = [ex.original_abstract_sents for ex in example_list] # list of list of lists\n\n\nclass Batcher(object):\n  BATCH_QUEUE_MAX = 1000 # max number of batches the batch_queue can hold\n\n  def __init__(self, data_path, vocab, mode, batch_size, single_pass):\n    self._data_path = data_path\n    self._vocab = vocab\n    self._single_pass = single_pass\n    self.mode = mode\n    self.batch_size = batch_size\n    # Initialize a queue of Batches waiting to be used, and a queue of Examples waiting to be batched\n    self._batch_queue = Queue.Queue(self.BATCH_QUEUE_MAX)\n    self._example_queue = Queue.Queue(self.BATCH_QUEUE_MAX * self.batch_size)\n\n    # Different settings depending on whether we're in single_pass mode or not\n    if single_pass:\n      self._num_example_q_threads = 1 # just one thread, so we read through the dataset just once\n      self._num_batch_q_threads = 1  # just one thread to batch examples\n      self._bucketing_cache_size = 1 # only load one batch's worth of examples before bucketing; this essentially means no bucketing\n      self._finished_reading = False # this will tell us when we're finished reading the dataset\n    else:\n      self._num_example_q_threads = 1 #16 # num threads to fill example queue\n      self._num_batch_q_threads = 1 #4  # num threads to fill batch queue\n      self._bucketing_cache_size = 1 #100 # how many batches-worth of examples to load into cache before bucketing\n\n    # Start the threads that load the queues\n    self._example_q_threads = []\n    for _ in range(self._num_example_q_threads):\n      self._example_q_threads.append(Thread(target=self.fill_example_queue))\n      self._example_q_threads[-1].daemon = True\n      self._example_q_threads[-1].start()\n    self._batch_q_threads = []\n    for _ in range(self._num_batch_q_threads):\n      self._batch_q_threads.append(Thread(target=self.fill_batch_queue))\n      self._batch_q_threads[-1].daemon = True\n      self._batch_q_threads[-1].start()\n\n    # Start a thread that watches the other threads and restarts them if they're dead\n    if not single_pass: # We don't want a watcher in single_pass mode because the threads shouldn't run forever\n      self._watch_thread = Thread(target=self.watch_threads)\n      self._watch_thread.daemon = True\n      self._watch_thread.start()\n\n  def next_batch(self):\n    # If the batch queue is empty, print a warning\n    if self._batch_queue.qsize() == 0:\n      # tf.logging.warning('Bucket input queue is empty when calling next_batch. Bucket queue size: %i, Input queue size: %i', self._batch_queue.qsize(), self._example_queue.qsize())\n      if self._single_pass and self._finished_reading:\n        tf.logging.info(\"Finished reading dataset in single_pass mode.\")\n        return None\n\n    batch = self._batch_queue.get() # get the next Batch\n    return batch\n\n  def fill_example_queue(self):\n    input_gen = self.text_generator(data.example_generator(self._data_path, self._single_pass))\n\n    while True:\n      try:\n        (article, abstract) = next(input_gen) # read the next example from file. article and abstract are both strings.\n      except StopIteration: # if there are no more examples:\n        tf.logging.info(\"The example generator for this example queue filling thread has exhausted data.\")\n        if self._single_pass:\n          tf.logging.info(\"single_pass mode is on, so we've finished reading dataset. This thread is stopping.\")\n          self._finished_reading = True\n          break\n        else:\n          raise Exception(\"single_pass mode is off but the example generator is out of data; error.\")\n\n      # abstract_sentences = [sent.strip() for sent in data.abstract2sents(abstract)] # Use the <s> and </s> tags in abstract to get a list of sentences.\n      abstract_sentences = [abstract.strip()]\n      example = Example(article, abstract_sentences, self._vocab) # Process into an Example.\n      self._example_queue.put(example) # place the Example in the example queue.\n\n  def fill_batch_queue(self):\n    while True:\n      if self.mode == 'decode':\n        # beam search decode mode single example repeated in the batch\n        ex = self._example_queue.get()\n        b = [ex for _ in range(self.batch_size)]\n        self._batch_queue.put(Batch(b, self._vocab, self.batch_size))\n      else:\n        # Get bucketing_cache_size-many batches of Examples into a list, then sort\n        inputs = []\n        for _ in range(self.batch_size * self._bucketing_cache_size):\n          inputs.append(self._example_queue.get())\n        inputs = sorted(inputs, key=lambda inp: inp.enc_len, reverse=True) # sort by length of encoder sequence\n\n        # Group the sorted Examples into batches, optionally shuffle the batches, and place in the batch queue.\n        batches = []\n        for i in range(0, len(inputs), self.batch_size):\n          batches.append(inputs[i:i + self.batch_size])\n        if not self._single_pass:\n          shuffle(batches)\n        for b in batches:  # each b is a list of Example objects\n          self._batch_queue.put(Batch(b, self._vocab, self.batch_size))\n\n  def watch_threads(self):\n    while True:\n      tf.logging.info(\n        'Bucket queue size: %i, Input queue size: %i',\n        self._batch_queue.qsize(), self._example_queue.qsize())\n\n      time.sleep(60)\n      for idx,t in enumerate(self._example_q_threads):\n        if not t.is_alive(): # if the thread is dead\n          tf.logging.error('Found example queue thread dead. Restarting.')\n          new_t = Thread(target=self.fill_example_queue)\n          self._example_q_threads[idx] = new_t\n          new_t.daemon = True\n          new_t.start()\n      for idx,t in enumerate(self._batch_q_threads):\n        if not t.is_alive(): # if the thread is dead\n          tf.logging.error('Found batch queue thread dead. Restarting.')\n          new_t = Thread(target=self.fill_batch_queue)\n          self._batch_q_threads[idx] = new_t\n          new_t.daemon = True\n          new_t.start()\n\n\n  def text_generator(self, example_generator):\n    while True:\n      e = next(example_generator) # e is a tf.Example\n      try:\n        article_text = e.features.feature['article'].bytes_list.value[0] # the article text was saved under the key 'article' in the data files\n        abstract_text = e.features.feature['abstract'].bytes_list.value[0] # the abstract text was saved under the key 'abstract' in the data files\n        article_text = article_text.decode()\n        abstract_text = abstract_text.decode()\n      except ValueError:\n        tf.logging.error('Failed to get article or abstract from example')\n        continue\n      if len(article_text)==0: # See https://github.com/abisee/pointer-generator/issues/1\n        #tf.logging.warning('Found an example with empty article text. Skipping it.')\n        continue\n      else:\n        yield (article_text, abstract_text)\n"
  },
  {
    "path": "data_util/config.py",
    "content": "train_data_path = \t\"data/chunked/train/train_*\"\nvalid_data_path = \t\"data/chunked/valid/valid_*\"\ntest_data_path = \t\"data/chunked/test/test_*\"\nvocab_path = \t\t\"data/vocab\"\n\n\n# Hyperparameters\nhidden_dim = 512\nemb_dim = 256\nbatch_size = 200\nmax_enc_steps = 55\t\t#99% of the articles are within length 55\nmax_dec_steps = 15\t\t#99% of the titles are within length 15\nbeam_size = 4\nmin_dec_steps= 3\nvocab_size = 50000\n\nlr = 0.001\nrand_unif_init_mag = 0.02\ntrunc_norm_init_std = 1e-4\n\neps = 1e-12\nmax_iterations = 500000\n\n\nsave_model_path = \"data/saved_models\"\n\nintra_encoder = True\nintra_decoder = True"
  },
  {
    "path": "data_util/data.py",
    "content": "#Most of this file is copied form https://github.com/abisee/pointer-generator/blob/master/data.py\n\nimport glob\nimport random\nimport struct\nimport csv\nfrom tensorflow.core.example import example_pb2\n\n# <s> and </s> are used in the data files to segment the abstracts into sentences. They don't receive vocab ids.\nSENTENCE_START = '<s>'\nSENTENCE_END = '</s>'\n\nPAD_TOKEN = '[PAD]' # This has a vocab id, which is used to pad the encoder input, decoder input and target sequence\nUNKNOWN_TOKEN = '[UNK]' # This has a vocab id, which is used to represent out-of-vocabulary words\nSTART_DECODING = '[START]' # This has a vocab id, which is used at the start of every decoder input sequence\nSTOP_DECODING = '[STOP]' # This has a vocab id, which is used at the end of untruncated target sequences\n\n# Note: none of <s>, </s>, [PAD], [UNK], [START], [STOP] should appear in the vocab file.\n\n\nclass Vocab(object):\n\n  def __init__(self, vocab_file, max_size):\n    self._word_to_id = {}\n    self._id_to_word = {}\n    self._count = 0 # keeps track of total number of words in the Vocab\n\n    # [UNK], [PAD], [START] and [STOP] get the ids 0,1,2,3.\n    for w in [UNKNOWN_TOKEN, PAD_TOKEN, START_DECODING, STOP_DECODING]:\n      self._word_to_id[w] = self._count\n      self._id_to_word[self._count] = w\n      self._count += 1\n\n    # Read the vocab file and add words up to max_size\n    with open(vocab_file, 'r') as vocab_f:\n      for line in vocab_f:\n        pieces = line.split()\n        if len(pieces) != 2:\n          # print ('Warning: incorrectly formatted line in vocabulary file: %s\\n' % line)\n          continue\n        w = pieces[0]\n        if w in [SENTENCE_START, SENTENCE_END, UNKNOWN_TOKEN, PAD_TOKEN, START_DECODING, STOP_DECODING]:\n          raise Exception('<s>, </s>, [UNK], [PAD], [START] and [STOP] shouldn\\'t be in the vocab file, but %s is' % w)\n        if w in self._word_to_id:\n          raise Exception('Duplicated word in vocabulary file: %s' % w)\n        self._word_to_id[w] = self._count\n        self._id_to_word[self._count] = w\n        self._count += 1\n        if max_size != 0 and self._count >= max_size:\n          # print (\"max_size of vocab was specified as %i; we now have %i words. Stopping reading.\" % (max_size, self._count))\n          break\n\n    # print (\"Finished constructing vocabulary of %i total words. Last word added: %s\" % (self._count, self._id_to_word[self._count-1]))\n\n  def word2id(self, word):\n    if word not in self._word_to_id:\n      return self._word_to_id[UNKNOWN_TOKEN]\n    return self._word_to_id[word]\n\n  def id2word(self, word_id):\n    if word_id not in self._id_to_word:\n      raise ValueError('Id not found in vocab: %d' % word_id)\n    return self._id_to_word[word_id]\n\n  def size(self):\n    return self._count\n\n  def write_metadata(self, fpath):\n    print (\"Writing word embedding metadata file to %s...\" % (fpath))\n    with open(fpath, \"w\") as f:\n      fieldnames = ['word']\n      writer = csv.DictWriter(f, delimiter=\"\\t\", fieldnames=fieldnames)\n      for i in xrange(self.size()):\n        writer.writerow({\"word\": self._id_to_word[i]})\n\n\ndef example_generator(data_path, single_pass):\n  while True:\n    filelist = glob.glob(data_path) # get the list of datafiles\n    assert filelist, ('Error: Empty filelist at %s' % data_path) # check filelist isn't empty\n    if single_pass:\n      filelist = sorted(filelist)\n    else:\n      random.shuffle(filelist)\n    for f in filelist:\n      reader = open(f, 'rb')\n      while True:\n        len_bytes = reader.read(8)\n        if not len_bytes: break # finished reading this file\n        str_len = struct.unpack('q', len_bytes)[0]\n        example_str = struct.unpack('%ds' % str_len, reader.read(str_len))[0]\n        yield example_pb2.Example.FromString(example_str)\n    if single_pass:\n      # print (\"example_generator completed reading all datafiles. No more data.\")\n      break\n\n\ndef article2ids(article_words, vocab):\n  ids = []\n  oovs = []\n  unk_id = vocab.word2id(UNKNOWN_TOKEN)\n  for w in article_words:\n    i = vocab.word2id(w)\n    if i == unk_id: # If w is OOV\n      if w not in oovs: # Add to list of OOVs\n        oovs.append(w)\n      oov_num = oovs.index(w) # This is 0 for the first article OOV, 1 for the second article OOV...\n      ids.append(vocab.size() + oov_num) # This is e.g. 50000 for the first article OOV, 50001 for the second...\n    else:\n      ids.append(i)\n  return ids, oovs\n\n\ndef abstract2ids(abstract_words, vocab, article_oovs):\n  ids = []\n  unk_id = vocab.word2id(UNKNOWN_TOKEN)\n  for w in abstract_words:\n    i = vocab.word2id(w)\n    if i == unk_id: # If w is an OOV word\n      if w in article_oovs: # If w is an in-article OOV\n        vocab_idx = vocab.size() + article_oovs.index(w) # Map to its temporary article OOV number\n        ids.append(vocab_idx)\n      else: # If w is an out-of-article OOV\n        ids.append(unk_id) # Map to the UNK token id\n    else:\n      ids.append(i)\n  return ids\n\n\ndef outputids2words(id_list, vocab, article_oovs):\n  words = []\n  for i in id_list:\n    try:\n      w = vocab.id2word(i) # might be [UNK]\n    except ValueError as e: # w is OOV\n      assert article_oovs is not None, \"Error: model produced a word ID that isn't in the vocabulary. This should not happen in baseline (no pointer-generator) mode\"\n      article_oov_idx = i - vocab.size()\n      try:\n        w = article_oovs[article_oov_idx]\n      except ValueError as e: # i doesn't correspond to an article oov\n        raise ValueError('Error: model produced word ID %i which corresponds to article OOV %i but this example only has %i article OOVs' % (i, article_oov_idx, len(article_oovs)))\n    words.append(w)\n  return words\n\n\ndef abstract2sents(abstract):\n  cur = 0\n  sents = []\n  while True:\n    try:\n      start_p = abstract.index(SENTENCE_START, cur)\n      end_p = abstract.index(SENTENCE_END, start_p + 1)\n      cur = end_p + len(SENTENCE_END)\n      sents.append(abstract[start_p+len(SENTENCE_START):end_p])\n    except ValueError as e: # no more sentences\n      return sents\n\n\ndef show_art_oovs(article, vocab):\n  unk_token = vocab.word2id(UNKNOWN_TOKEN)\n  words = article.split(' ')\n  words = [(\"__%s__\" % w) if vocab.word2id(w)==unk_token else w for w in words]\n  out_str = ' '.join(words)\n  return out_str\n\n\ndef show_abs_oovs(abstract, vocab, article_oovs):\n  unk_token = vocab.word2id(UNKNOWN_TOKEN)\n  words = abstract.split(' ')\n  new_words = []\n  for w in words:\n    if vocab.word2id(w) == unk_token: # w is oov\n      if article_oovs is None: # baseline mode\n        new_words.append(\"__%s__\" % w)\n      else: # pointer-generator mode\n        if w in article_oovs:\n          new_words.append(\"__%s__\" % w)\n        else:\n          new_words.append(\"!!__%s__!!\" % w)\n    else: # w is in-vocab word\n      new_words.append(w)\n  out_str = ' '.join(new_words)\n  return out_str\n"
  },
  {
    "path": "eval.py",
    "content": "import os\nos.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\nimport time\n\nimport torch as T\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom model import Model\n\nfrom data_util import config, data\nfrom data_util.batcher import Batcher\nfrom data_util.data import Vocab\nfrom train_util import *\nfrom beam_search import *\nfrom rouge import Rouge\nimport argparse\n\ndef get_cuda(tensor):\n    if T.cuda.is_available():\n        tensor = tensor.cuda()\n    return tensor\n\nclass Evaluate(object):\n    def __init__(self, data_path, opt, batch_size = config.batch_size):\n        self.vocab = Vocab(config.vocab_path, config.vocab_size)\n        self.batcher = Batcher(data_path, self.vocab, mode='eval',\n                               batch_size=batch_size, single_pass=True)\n        self.opt = opt\n        time.sleep(5)\n\n    def setup_valid(self):\n        self.model = Model()\n        self.model = get_cuda(self.model)\n        checkpoint = T.load(os.path.join(config.save_model_path, self.opt.load_model))\n        self.model.load_state_dict(checkpoint[\"model_dict\"])\n\n\n    def print_original_predicted(self, decoded_sents, ref_sents, article_sents, loadfile):\n        filename = \"test_\"+loadfile.split(\".\")[0]+\".txt\"\n    \n        with open(os.path.join(\"data\",filename), \"w\") as f:\n            for i in range(len(decoded_sents)):\n                f.write(\"article: \"+article_sents[i] + \"\\n\")\n                f.write(\"ref: \" + ref_sents[i] + \"\\n\")\n                f.write(\"dec: \" + decoded_sents[i] + \"\\n\\n\")\n\n    def evaluate_batch(self, print_sents = False):\n\n        self.setup_valid()\n        batch = self.batcher.next_batch()\n        start_id = self.vocab.word2id(data.START_DECODING)\n        end_id = self.vocab.word2id(data.STOP_DECODING)\n        unk_id = self.vocab.word2id(data.UNKNOWN_TOKEN)\n        decoded_sents = []\n        ref_sents = []\n        article_sents = []\n        rouge = Rouge()\n        while batch is not None:\n            enc_batch, enc_lens, enc_padding_mask, enc_batch_extend_vocab, extra_zeros, ct_e = get_enc_data(batch)\n\n            with T.autograd.no_grad():\n                enc_batch = self.model.embeds(enc_batch)\n                enc_out, enc_hidden = self.model.encoder(enc_batch, enc_lens)\n\n            #-----------------------Summarization----------------------------------------------------\n            with T.autograd.no_grad():\n                pred_ids = beam_search(enc_hidden, enc_out, enc_padding_mask, ct_e, extra_zeros, enc_batch_extend_vocab, self.model, start_id, end_id, unk_id)\n\n            for i in range(len(pred_ids)):\n                decoded_words = data.outputids2words(pred_ids[i], self.vocab, batch.art_oovs[i])\n                if len(decoded_words) < 2:\n                    decoded_words = \"xxx\"\n                else:\n                    decoded_words = \" \".join(decoded_words)\n                decoded_sents.append(decoded_words)\n                abstract = batch.original_abstracts[i]\n                article = batch.original_articles[i]\n                ref_sents.append(abstract)\n                article_sents.append(article)\n\n            batch = self.batcher.next_batch()\n\n        load_file = self.opt.load_model\n\n        if print_sents:\n            self.print_original_predicted(decoded_sents, ref_sents, article_sents, load_file)\n\n        scores = rouge.get_scores(decoded_sents, ref_sents, avg = True)\n        if self.opt.task == \"test\":\n            print(load_file, \"scores:\", scores)\n        else:\n            rouge_l = scores[\"rouge-l\"][\"f\"]\n            print(load_file, \"rouge_l:\", \"%.4f\" % rouge_l)\n\n\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument(\"--task\", type=str, default=\"validate\", choices=[\"validate\",\"test\"])\n    parser.add_argument(\"--start_from\", type=str, default=\"0020000.tar\")\n    parser.add_argument(\"--load_model\", type=str, default=None)\n    opt = parser.parse_args()\n\n    if opt.task == \"validate\":\n        saved_models = os.listdir(config.save_model_path)\n        saved_models.sort()\n        file_idx = saved_models.index(opt.start_from)\n        saved_models = saved_models[file_idx:]\n        for f in saved_models:\n            opt.load_model = f\n            eval_processor = Evaluate(config.valid_data_path, opt)\n            eval_processor.evaluate_batch()\n    else:   #test\n        eval_processor = Evaluate(config.test_data_path, opt)\n        eval_processor.evaluate_batch()\n"
  },
  {
    "path": "make_data_files.py",
    "content": "import os\nimport shutil\nimport collections\nimport tqdm\nfrom tensorflow.core.example import example_pb2\nimport struct\nimport random\nimport shutil\n\nfinished_path = \"data/finished\"\nunfinished_path = \"data/unfinished\"\nchunk_path = \"data/chunked\"\n\nvocab_path = \"data/vocab\"\nVOCAB_SIZE = 200000\n\nCHUNK_SIZE = 15000 # num examples per chunk, for the chunked data\ntrain_bin_path = os.path.join(finished_path, \"train.bin\")\nvalid_bin_path = os.path.join(finished_path, \"valid.bin\")\n\ndef make_folder(folder_path):\n    if not os.path.exists(folder_path):\n        os.makedirs(folder_path)\n\ndef delete_folder(folder_path):\n    if os.path.exists(folder_path):\n        shutil.rmtree(folder_path)\n\ndef shuffle_text_data(unshuffled_art, unshuffled_abs, shuffled_art, shuffled_abs):\n    article_itr = open(os.path.join(unfinished_path, unshuffled_art), \"r\")\n    abstract_itr = open(os.path.join(unfinished_path, unshuffled_abs), \"r\")\n    list_of_pairs = []\n    for article in article_itr:\n        article = article.strip()\n        abstract = next(abstract_itr).strip()\n        list_of_pairs.append((article, abstract))\n    article_itr.close()\n    abstract_itr.close()\n    random.shuffle(list_of_pairs)\n    article_itr = open(os.path.join(unfinished_path, shuffled_art), \"w\")\n    abstract_itr = open(os.path.join(unfinished_path, shuffled_abs), \"w\")\n    for pair in list_of_pairs:\n        article_itr.write(pair[0]+\"\\n\")\n        abstract_itr.write(pair[1]+\"\\n\")\n    article_itr.close()\n    abstract_itr.close()\n\ndef write_to_bin(article_path, abstract_path, out_file, vocab_counter = None):\n\n    with open(out_file, 'wb') as writer:\n\n        article_itr = open(article_path, 'r')\n        abstract_itr = open(abstract_path, 'r')\n        for article in tqdm.tqdm(article_itr):\n            article = article.strip()\n            abstract = next(abstract_itr).strip()\n\n            tf_example = example_pb2.Example()\n            tf_example.features.feature['article'].bytes_list.value.extend([article])\n            tf_example.features.feature['abstract'].bytes_list.value.extend([abstract])\n            tf_example_str = tf_example.SerializeToString()\n            str_len = len(tf_example_str)\n            writer.write(struct.pack('q', str_len))\n            writer.write(struct.pack('%ds' % str_len, tf_example_str))\n\n            if vocab_counter is not None:\n                art_tokens = article.split(' ')\n                abs_tokens = abstract.split(' ')\n                # abs_tokens = [t for t in abs_tokens if\n                #               t not in [SENTENCE_START, SENTENCE_END]]  # remove these tags from vocab\n                tokens = art_tokens + abs_tokens\n                tokens = [t.strip() for t in tokens]  # strip\n                tokens = [t for t in tokens if t != \"\"]  # remove empty\n                vocab_counter.update(tokens)\n\n    if vocab_counter is not None:\n        with open(vocab_path, 'w') as writer:\n            for word, count in vocab_counter.most_common(VOCAB_SIZE):\n                writer.write(word + ' ' + str(count) + '\\n')\n\n\ndef creating_finished_data():\n    make_folder(finished_path)\n\n    vocab_counter = collections.Counter()\n\n    write_to_bin(os.path.join(unfinished_path, \"train.art.shuf.txt\"), os.path.join(unfinished_path, \"train.abs.shuf.txt\"), train_bin_path, vocab_counter)\n    write_to_bin(os.path.join(unfinished_path, \"valid.art.shuf.txt\"), os.path.join(unfinished_path, \"valid.abs.shuf.txt\"), valid_bin_path)\n\n\ndef chunk_file(set_name, chunks_dir, bin_file):\n    make_folder(chunks_dir)\n    reader = open(bin_file, \"rb\")\n    chunk = 0\n    finished = False\n    while not finished:\n        chunk_fname = os.path.join(chunks_dir, '%s_%04d.bin' % (set_name, chunk)) # new chunk\n        with open(chunk_fname, 'wb') as writer:\n            for _ in range(CHUNK_SIZE):\n                len_bytes = reader.read(8)\n                if not len_bytes:\n                    finished = True\n                    break\n                str_len = struct.unpack('q', len_bytes)[0]\n                example_str = struct.unpack('%ds' % str_len, reader.read(str_len))[0]\n                writer.write(struct.pack('q', str_len))\n                writer.write(struct.pack('%ds' % str_len, example_str))\n            chunk += 1\n\n\nif __name__ == \"__main__\":\n    shuffle_text_data(\"train.article.txt\", \"train.title.txt\", \"train.art.shuf.txt\", \"train.abs.shuf.txt\")\n    shuffle_text_data(\"valid.article.filter.txt\", \"valid.title.filter.txt\", \"valid.art.shuf.txt\", \"valid.abs.shuf.txt\")\n    print(\"Completed shuffling train & valid text files\")\n    delete_folder(finished_path)\n    creating_finished_data()        #create bin files\n    print(\"Completed creating bin file for train & valid\")\n    delete_folder(chunk_path)\n    chunk_file(\"train\", os.path.join(chunk_path, \"train\"), train_bin_path)\n    chunk_file(\"valid\", os.path.join(chunk_path, \"main_valid\"), valid_bin_path)\n    print(\"Completed chunking main bin files into smaller ones\")\n    #Performing rouge evaluation on 1.9 lakh sentences takes lot of time. So, create mini validation set & test set by borrowing 15k samples each from these 1.9 lakh sentences\n    make_folder(os.path.join(chunk_path, \"valid\"))\n    make_folder(os.path.join(chunk_path, \"test\"))\n    bin_chunks = os.listdir(os.path.join(chunk_path, \"main_valid\"))\n    bin_chunks.sort()\n    samples = random.sample(set(bin_chunks[:-1]), 2)      #Exclude last bin file; contains only 9k sentences\n    valid_chunk, test_chunk = samples[0], samples[1]\n    shutil.copyfile(os.path.join(chunk_path, \"main_valid\", valid_chunk), os.path.join(chunk_path, \"valid\", \"valid_00.bin\"))\n    shutil.copyfile(os.path.join(chunk_path, \"main_valid\", test_chunk), os.path.join(chunk_path, \"test\", \"test_00.bin\"))\n\n    # delete_folder(finished)\n    # delete_folder(os.path.join(chunk_path, \"main_valid\"))\n\n\n\n\n"
  },
  {
    "path": "model.py",
    "content": "import torch as T\nimport torch.nn as nn\nfrom torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence\nfrom data_util import config\nimport torch.nn.functional as F\nfrom train_util import get_cuda\n\ndef init_lstm_wt(lstm):\n    for name, _ in lstm.named_parameters():\n        if 'weight' in name:\n            wt = getattr(lstm, name)\n            wt.data.uniform_(-config.rand_unif_init_mag, config.rand_unif_init_mag)\n        elif 'bias' in name:\n            # set forget bias to 1\n            bias = getattr(lstm, name)\n            n = bias.size(0)\n            start, end = n // 4, n // 2\n            bias.data.fill_(0.)\n            bias.data[start:end].fill_(1.)\n\ndef init_linear_wt(linear):\n    linear.weight.data.normal_(std=config.trunc_norm_init_std)\n    if linear.bias is not None:\n        linear.bias.data.normal_(std=config.trunc_norm_init_std)\n\ndef init_wt_normal(wt):\n    wt.data.normal_(std=config.trunc_norm_init_std)\n\n\nclass Encoder(nn.Module):\n    def __init__(self):\n        super(Encoder, self).__init__()\n\n        self.lstm = nn.LSTM(config.emb_dim, config.hidden_dim, num_layers=1, batch_first=True, bidirectional=True)\n        init_lstm_wt(self.lstm)\n\n        self.reduce_h = nn.Linear(config.hidden_dim * 2, config.hidden_dim)\n        init_linear_wt(self.reduce_h)\n        self.reduce_c = nn.Linear(config.hidden_dim * 2, config.hidden_dim)\n        init_linear_wt(self.reduce_c)\n\n    def forward(self, x, seq_lens):\n        packed = pack_padded_sequence(x, seq_lens, batch_first=True)\n        enc_out, enc_hid = self.lstm(packed)\n        enc_out,_ = pad_packed_sequence(enc_out, batch_first=True)\n        enc_out = enc_out.contiguous()                              #bs, n_seq, 2*n_hid\n        h, c = enc_hid                                              #shape of h: 2, bs, n_hid\n        h = T.cat(list(h), dim=1)                                   #bs, 2*n_hid\n        c = T.cat(list(c), dim=1)\n        h_reduced = F.relu(self.reduce_h(h))                        #bs,n_hid\n        c_reduced = F.relu(self.reduce_c(c))\n        return enc_out, (h_reduced, c_reduced)\n\n\nclass encoder_attention(nn.Module):\n\n    def __init__(self):\n        super(encoder_attention, self).__init__()\n        self.W_h = nn.Linear(config.hidden_dim * 2, config.hidden_dim * 2, bias=False)\n        self.W_s = nn.Linear(config.hidden_dim * 2, config.hidden_dim * 2)\n        self.v = nn.Linear(config.hidden_dim * 2, 1, bias=False)\n\n\n    def forward(self, st_hat, h, enc_padding_mask, sum_temporal_srcs):\n        ''' Perform attention over encoder hidden states\n        :param st_hat: decoder hidden state at current time step\n        :param h: encoder hidden states\n        :param enc_padding_mask:\n        :param sum_temporal_srcs: if using intra-temporal attention, contains summation of attention weights from previous decoder time steps\n        '''\n\n        # Standard attention technique (eq 1 in https://arxiv.org/pdf/1704.04368.pdf)\n        et = self.W_h(h)                        # bs,n_seq,2*n_hid\n        dec_fea = self.W_s(st_hat).unsqueeze(1) # bs,1,2*n_hid\n        et = et + dec_fea\n        et = T.tanh(et)                         # bs,n_seq,2*n_hid\n        et = self.v(et).squeeze(2)              # bs,n_seq\n\n        # intra-temporal attention     (eq 3 in https://arxiv.org/pdf/1705.04304.pdf)\n        if config.intra_encoder:\n            exp_et = T.exp(et)\n            if sum_temporal_srcs is None:\n                et1 = exp_et\n                sum_temporal_srcs  = get_cuda(T.FloatTensor(et.size()).fill_(1e-10)) + exp_et\n            else:\n                et1 = exp_et/sum_temporal_srcs  #bs, n_seq\n                sum_temporal_srcs = sum_temporal_srcs + exp_et\n        else:\n            et1 = F.softmax(et, dim=1)\n\n        # assign 0 probability for padded elements\n        at = et1 * enc_padding_mask\n        normalization_factor = at.sum(1, keepdim=True)\n        at = at / normalization_factor\n\n        at = at.unsqueeze(1)                    #bs,1,n_seq\n        # Compute encoder context vector\n        ct_e = T.bmm(at, h)                     #bs, 1, 2*n_hid\n        ct_e = ct_e.squeeze(1)\n        at = at.squeeze(1)\n\n        return ct_e, at, sum_temporal_srcs\n\nclass decoder_attention(nn.Module):\n    def __init__(self):\n        super(decoder_attention, self).__init__()\n        if config.intra_decoder:\n            self.W_prev = nn.Linear(config.hidden_dim, config.hidden_dim, bias=False)\n            self.W_s = nn.Linear(config.hidden_dim, config.hidden_dim)\n            self.v = nn.Linear(config.hidden_dim, 1, bias=False)\n\n    def forward(self, s_t, prev_s):\n        '''Perform intra_decoder attention\n        Args\n        :param s_t: hidden state of decoder at current time step\n        :param prev_s: If intra_decoder attention, contains list of previous decoder hidden states\n        '''\n        if config.intra_decoder is False:\n            ct_d = get_cuda(T.zeros(s_t.size()))\n        elif prev_s is None:\n            ct_d = get_cuda(T.zeros(s_t.size()))\n            prev_s = s_t.unsqueeze(1)               #bs, 1, n_hid\n        else:\n            # Standard attention technique (eq 1 in https://arxiv.org/pdf/1704.04368.pdf)\n            et = self.W_prev(prev_s)                # bs,t-1,n_hid\n            dec_fea = self.W_s(s_t).unsqueeze(1)    # bs,1,n_hid\n            et = et + dec_fea\n            et = T.tanh(et)                         # bs,t-1,n_hid\n            et = self.v(et).squeeze(2)              # bs,t-1\n            # intra-decoder attention     (eq 7 & 8 in https://arxiv.org/pdf/1705.04304.pdf)\n            at = F.softmax(et, dim=1).unsqueeze(1)  #bs, 1, t-1\n            ct_d = T.bmm(at, prev_s).squeeze(1)     #bs, n_hid\n            prev_s = T.cat([prev_s, s_t.unsqueeze(1)], dim=1)    #bs, t, n_hid\n\n        return ct_d, prev_s\n\n\nclass Decoder(nn.Module):\n    def __init__(self):\n        super(Decoder, self).__init__()\n        self.enc_attention = encoder_attention()\n        self.dec_attention = decoder_attention()\n        self.x_context = nn.Linear(config.hidden_dim*2 + config.emb_dim, config.emb_dim)\n\n        self.lstm = nn.LSTMCell(config.emb_dim, config.hidden_dim)\n        init_lstm_wt(self.lstm)\n\n        self.p_gen_linear = nn.Linear(config.hidden_dim * 5 + config.emb_dim, 1)\n\n        #p_vocab\n        self.V = nn.Linear(config.hidden_dim*4, config.hidden_dim)\n        self.V1 = nn.Linear(config.hidden_dim, config.vocab_size)\n        init_linear_wt(self.V1)\n\n    def forward(self, x_t, s_t, enc_out, enc_padding_mask, ct_e, extra_zeros, enc_batch_extend_vocab, sum_temporal_srcs, prev_s):\n        x = self.x_context(T.cat([x_t, ct_e], dim=1))\n        s_t = self.lstm(x, s_t)\n\n        dec_h, dec_c = s_t\n        st_hat = T.cat([dec_h, dec_c], dim=1)\n        ct_e, attn_dist, sum_temporal_srcs = self.enc_attention(st_hat, enc_out, enc_padding_mask, sum_temporal_srcs)\n\n        ct_d, prev_s = self.dec_attention(dec_h, prev_s)        #intra-decoder attention\n\n        p_gen = T.cat([ct_e, ct_d, st_hat, x], 1)\n        p_gen = self.p_gen_linear(p_gen)            # bs,1\n        p_gen = T.sigmoid(p_gen)                    # bs,1\n\n        out = T.cat([dec_h, ct_e, ct_d], dim=1)     # bs, 4*n_hid\n        out = self.V(out)                           # bs,n_hid\n        out = self.V1(out)                          # bs, n_vocab\n        vocab_dist = F.softmax(out, dim=1)\n        vocab_dist = p_gen * vocab_dist\n        attn_dist_ = (1 - p_gen) * attn_dist\n\n        # pointer mechanism (as suggested in eq 9 https://arxiv.org/pdf/1704.04368.pdf)\n        if extra_zeros is not None:\n            vocab_dist = T.cat([vocab_dist, extra_zeros], dim=1)\n        final_dist = vocab_dist.scatter_add(1, enc_batch_extend_vocab, attn_dist_)\n\n        return final_dist, s_t, ct_e, sum_temporal_srcs, prev_s\n\n\n\nclass Model(nn.Module):\n    def __init__(self):\n        super(Model, self).__init__()\n        self.encoder = Encoder()\n        self.decoder = Decoder()\n        self.embeds = nn.Embedding(config.vocab_size, config.emb_dim)\n        init_wt_normal(self.embeds.weight)\n\n        self.encoder = get_cuda(self.encoder)\n        self.decoder = get_cuda(self.decoder)\n        self.embeds = get_cuda(self.embeds)\n\n\n\n"
  },
  {
    "path": "train.py",
    "content": "import os\nos.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"    #Set cuda device\n\nimport time\n\nimport torch as T\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom model import Model\n\nfrom data_util import config, data\nfrom data_util.batcher import Batcher\nfrom data_util.data import Vocab\nfrom train_util import *\nfrom torch.distributions import Categorical\nfrom rouge import Rouge\nfrom numpy import random\nimport argparse\n\nrandom.seed(123)\nT.manual_seed(123)\nif T.cuda.is_available():\n    T.cuda.manual_seed_all(123)\n\nclass Train(object):\n    def __init__(self, opt):\n        self.vocab = Vocab(config.vocab_path, config.vocab_size)\n        self.batcher = Batcher(config.train_data_path, self.vocab, mode='train',\n                               batch_size=config.batch_size, single_pass=False)\n        self.opt = opt\n        self.start_id = self.vocab.word2id(data.START_DECODING)\n        self.end_id = self.vocab.word2id(data.STOP_DECODING)\n        self.pad_id = self.vocab.word2id(data.PAD_TOKEN)\n        self.unk_id = self.vocab.word2id(data.UNKNOWN_TOKEN)\n        time.sleep(5)\n\n    def save_model(self, iter):\n        save_path = config.save_model_path + \"/%07d.tar\" % iter\n        T.save({\n            \"iter\": iter + 1,\n            \"model_dict\": self.model.state_dict(),\n            \"trainer_dict\": self.trainer.state_dict()\n        }, save_path)\n\n    def setup_train(self):\n        self.model = Model()\n        self.model = get_cuda(self.model)\n        self.trainer = T.optim.Adam(self.model.parameters(), lr=config.lr)\n        start_iter = 0\n        if self.opt.load_model is not None:\n            load_model_path = os.path.join(config.save_model_path, self.opt.load_model)\n            checkpoint = T.load(load_model_path)\n            start_iter = checkpoint[\"iter\"]\n            self.model.load_state_dict(checkpoint[\"model_dict\"])\n            self.trainer.load_state_dict(checkpoint[\"trainer_dict\"])\n            print(\"Loaded model at \" + load_model_path)\n        if self.opt.new_lr is not None:\n            self.trainer = T.optim.Adam(self.model.parameters(), lr=self.opt.new_lr)\n        return start_iter\n\n    def train_batch_MLE(self, enc_out, enc_hidden, enc_padding_mask, ct_e, extra_zeros, enc_batch_extend_vocab, batch):\n        ''' Calculate Negative Log Likelihood Loss for the given batch. In order to reduce exposure bias,\n                pass the previous generated token as input with a probability of 0.25 instead of ground truth label\n        Args:\n        :param enc_out: Outputs of the encoder for all time steps (batch_size, length_input_sequence, 2*hidden_size)\n        :param enc_hidden: Tuple containing final hidden state & cell state of encoder. Shape of h & c: (batch_size, hidden_size)\n        :param enc_padding_mask: Mask for encoder input; Tensor of size (batch_size, length_input_sequence) with values of 0 for pad tokens & 1 for others\n        :param ct_e: encoder context vector for time_step=0 (eq 5 in https://arxiv.org/pdf/1705.04304.pdf)\n        :param extra_zeros: Tensor used to extend vocab distribution for pointer mechanism\n        :param enc_batch_extend_vocab: Input batch that stores OOV ids\n        :param batch: batch object\n        '''\n        dec_batch, max_dec_len, dec_lens, target_batch = get_dec_data(batch)                        #Get input and target batchs for training decoder\n        step_losses = []\n        s_t = (enc_hidden[0], enc_hidden[1])                                                        #Decoder hidden states\n        x_t = get_cuda(T.LongTensor(len(enc_out)).fill_(self.start_id))                             #Input to the decoder\n        prev_s = None                                                                               #Used for intra-decoder attention (section 2.2 in https://arxiv.org/pdf/1705.04304.pdf)\n        sum_temporal_srcs = None                                                                    #Used for intra-temporal attention (section 2.1 in https://arxiv.org/pdf/1705.04304.pdf)\n        for t in range(min(max_dec_len, config.max_dec_steps)):\n            use_gound_truth = get_cuda((T.rand(len(enc_out)) > 0.25)).long()                        #Probabilities indicating whether to use ground truth labels instead of previous decoded tokens\n            x_t = use_gound_truth * dec_batch[:, t] + (1 - use_gound_truth) * x_t                   #Select decoder input based on use_ground_truth probabilities\n            x_t = self.model.embeds(x_t)\n            final_dist, s_t, ct_e, sum_temporal_srcs, prev_s = self.model.decoder(x_t, s_t, enc_out, enc_padding_mask, ct_e, extra_zeros, enc_batch_extend_vocab, sum_temporal_srcs, prev_s)\n            target = target_batch[:, t]\n            log_probs = T.log(final_dist + config.eps)\n            step_loss = F.nll_loss(log_probs, target, reduction=\"none\", ignore_index=self.pad_id)\n            step_losses.append(step_loss)\n            x_t = T.multinomial(final_dist, 1).squeeze()                                            #Sample words from final distribution which can be used as input in next time step\n            is_oov = (x_t >= config.vocab_size).long()                                              #Mask indicating whether sampled word is OOV\n            x_t = (1 - is_oov) * x_t.detach() + (is_oov) * self.unk_id                              #Replace OOVs with [UNK] token\n\n        losses = T.sum(T.stack(step_losses, 1), 1)                                                  #unnormalized losses for each example in the batch; (batch_size)\n        batch_avg_loss = losses / dec_lens                                                          #Normalized losses; (batch_size)\n        mle_loss = T.mean(batch_avg_loss)                                                           #Average batch loss\n        return mle_loss\n\n    def train_batch_RL(self, enc_out, enc_hidden, enc_padding_mask, ct_e, extra_zeros, enc_batch_extend_vocab, article_oovs, greedy):\n        '''Generate sentences from decoder entirely using sampled tokens as input. These sentences are used for ROUGE evaluation\n        Args\n        :param enc_out: Outputs of the encoder for all time steps (batch_size, length_input_sequence, 2*hidden_size)\n        :param enc_hidden: Tuple containing final hidden state & cell state of encoder. Shape of h & c: (batch_size, hidden_size)\n        :param enc_padding_mask: Mask for encoder input; Tensor of size (batch_size, length_input_sequence) with values of 0 for pad tokens & 1 for others\n        :param ct_e: encoder context vector for time_step=0 (eq 5 in https://arxiv.org/pdf/1705.04304.pdf)\n        :param extra_zeros: Tensor used to extend vocab distribution for pointer mechanism\n        :param enc_batch_extend_vocab: Input batch that stores OOV ids\n        :param article_oovs: Batch containing list of OOVs in each example\n        :param greedy: If true, performs greedy based sampling, else performs multinomial sampling\n        Returns:\n        :decoded_strs: List of decoded sentences\n        :log_probs: Log probabilities of sampled words\n        '''\n        s_t = enc_hidden                                                                            #Decoder hidden states\n        x_t = get_cuda(T.LongTensor(len(enc_out)).fill_(self.start_id))                             #Input to the decoder\n        prev_s = None                                                                               #Used for intra-decoder attention (section 2.2 in https://arxiv.org/pdf/1705.04304.pdf)\n        sum_temporal_srcs = None                                                                    #Used for intra-temporal attention (section 2.1 in https://arxiv.org/pdf/1705.04304.pdf)\n        inds = []                                                                                   #Stores sampled indices for each time step\n        decoder_padding_mask = []                                                                   #Stores padding masks of generated samples\n        log_probs = []                                                                              #Stores log probabilites of generated samples\n        mask = get_cuda(T.LongTensor(len(enc_out)).fill_(1))                                        #Values that indicate whether [STOP] token has already been encountered; 1 => Not encountered, 0 otherwise\n\n        for t in range(config.max_dec_steps):\n            x_t = self.model.embeds(x_t)\n            probs, s_t, ct_e, sum_temporal_srcs, prev_s = self.model.decoder(x_t, s_t, enc_out, enc_padding_mask, ct_e, extra_zeros, enc_batch_extend_vocab, sum_temporal_srcs, prev_s)\n            if greedy is False:\n                multi_dist = Categorical(probs)\n                x_t = multi_dist.sample()                                                           #perform multinomial sampling\n                log_prob = multi_dist.log_prob(x_t)\n                log_probs.append(log_prob)\n            else:\n                _, x_t = T.max(probs, dim=1)                                                        #perform greedy sampling\n            x_t = x_t.detach()\n            inds.append(x_t)\n            mask_t = get_cuda(T.zeros(len(enc_out)))                                                #Padding mask of batch for current time step\n            mask_t[mask == 1] = 1                                                                   #If [STOP] is not encountered till previous time step, mask_t = 1 else mask_t = 0\n            mask[(mask == 1) + (x_t == self.end_id) == 2] = 0                                       #If [STOP] is not encountered till previous time step and current word is [STOP], make mask = 0\n            decoder_padding_mask.append(mask_t)\n            is_oov = (x_t>=config.vocab_size).long()                                                #Mask indicating whether sampled word is OOV\n            x_t = (1-is_oov)*x_t + (is_oov)*self.unk_id                                             #Replace OOVs with [UNK] token\n\n        inds = T.stack(inds, dim=1)\n        decoder_padding_mask = T.stack(decoder_padding_mask, dim=1)\n        if greedy is False:                                                                         #If multinomial based sampling, compute log probabilites of sampled words\n            log_probs = T.stack(log_probs, dim=1)\n            log_probs = log_probs * decoder_padding_mask                                            #Not considering sampled words with padding mask = 0\n            lens = T.sum(decoder_padding_mask, dim=1)                                               #Length of sampled sentence\n            log_probs = T.sum(log_probs, dim=1) / lens  # (bs,)                                     #compute normalizied log probability of a sentence\n        decoded_strs = []\n        for i in range(len(enc_out)):\n            id_list = inds[i].cpu().numpy()\n            oovs = article_oovs[i]\n            S = data.outputids2words(id_list, self.vocab, oovs)                                     #Generate sentence corresponding to sampled words\n            try:\n                end_idx = S.index(data.STOP_DECODING)\n                S = S[:end_idx]\n            except ValueError:\n                S = S\n            if len(S) < 2:                                                                           #If length of sentence is less than 2 words, replace it with \"xxx\"; Avoids setences like \".\" which throws error while calculating ROUGE\n                S = [\"xxx\"]\n            S = \" \".join(S)\n            decoded_strs.append(S)\n\n        return decoded_strs, log_probs\n\n    def reward_function(self, decoded_sents, original_sents):\n        rouge = Rouge()\n        try:\n            scores = rouge.get_scores(decoded_sents, original_sents)\n        except Exception:\n            print(\"Rouge failed for multi sentence evaluation.. Finding exact pair\")\n            scores = []\n            for i in range(len(decoded_sents)):\n                try:\n                    score = rouge.get_scores(decoded_sents[i], original_sents[i])\n                except Exception:\n                    print(\"Error occured at:\")\n                    print(\"decoded_sents:\", decoded_sents[i])\n                    print(\"original_sents:\", original_sents[i])\n                    score = [{\"rouge-l\":{\"f\":0.0}}]\n                scores.append(score[0])\n        rouge_l_f1 = [score[\"rouge-l\"][\"f\"] for score in scores]\n        rouge_l_f1 = get_cuda(T.FloatTensor(rouge_l_f1))\n        return rouge_l_f1\n\n    # def write_to_file(self, decoded, max, original, sample_r, baseline_r, iter):\n    #     with open(\"temp.txt\", \"w\") as f:\n    #         f.write(\"iter:\"+str(iter)+\"\\n\")\n    #         for i in range(len(original)):\n    #             f.write(\"dec: \"+decoded[i]+\"\\n\")\n    #             f.write(\"max: \"+max[i]+\"\\n\")\n    #             f.write(\"org: \"+original[i]+\"\\n\")\n    #             f.write(\"Sample_R: %.4f, Baseline_R: %.4f\\n\\n\"%(sample_r[i].item(), baseline_r[i].item()))\n\n\n    def train_one_batch(self, batch, iter):\n        enc_batch, enc_lens, enc_padding_mask, enc_batch_extend_vocab, extra_zeros, context = get_enc_data(batch)\n\n        enc_batch = self.model.embeds(enc_batch)                                                    #Get embeddings for encoder input\n        enc_out, enc_hidden = self.model.encoder(enc_batch, enc_lens)\n\n        # -------------------------------Summarization-----------------------\n        if self.opt.train_mle == \"yes\":                                                             #perform MLE training\n            mle_loss = self.train_batch_MLE(enc_out, enc_hidden, enc_padding_mask, context, extra_zeros, enc_batch_extend_vocab, batch)\n        else:\n            mle_loss = get_cuda(T.FloatTensor([0]))\n        # --------------RL training-----------------------------------------------------\n        if self.opt.train_rl == \"yes\":                                                              #perform reinforcement learning training\n            # multinomial sampling\n            sample_sents, RL_log_probs = self.train_batch_RL(enc_out, enc_hidden, enc_padding_mask, context, extra_zeros, enc_batch_extend_vocab, batch.art_oovs, greedy=False)\n            with T.autograd.no_grad():\n                # greedy sampling\n                greedy_sents, _ = self.train_batch_RL(enc_out, enc_hidden, enc_padding_mask, context, extra_zeros, enc_batch_extend_vocab, batch.art_oovs, greedy=True)\n\n            sample_reward = self.reward_function(sample_sents, batch.original_abstracts)\n            baseline_reward = self.reward_function(greedy_sents, batch.original_abstracts)\n            # if iter%200 == 0:\n            #     self.write_to_file(sample_sents, greedy_sents, batch.original_abstracts, sample_reward, baseline_reward, iter)\n            rl_loss = -(sample_reward - baseline_reward) * RL_log_probs                             #Self-critic policy gradient training (eq 15 in https://arxiv.org/pdf/1705.04304.pdf)\n            rl_loss = T.mean(rl_loss)\n\n            batch_reward = T.mean(sample_reward).item()\n        else:\n            rl_loss = get_cuda(T.FloatTensor([0]))\n            batch_reward = 0\n\n    # ------------------------------------------------------------------------------------\n        self.trainer.zero_grad()\n        (self.opt.mle_weight * mle_loss + self.opt.rl_weight * rl_loss).backward()\n        self.trainer.step()\n\n        return mle_loss.item(), batch_reward\n\n    def trainIters(self):\n        iter = self.setup_train()\n        count = mle_total = r_total = 0\n        while iter <= config.max_iterations:\n            batch = self.batcher.next_batch()\n            try:\n                mle_loss, r = self.train_one_batch(batch, iter)\n            except KeyboardInterrupt:\n                print(\"-------------------Keyboard Interrupt------------------\")\n                exit(0)\n\n            mle_total += mle_loss\n            r_total += r\n            count += 1\n            iter += 1\n\n            if iter % 1000 == 0:\n                mle_avg = mle_total / count\n                r_avg = r_total / count\n                print(\"iter:\", iter, \"mle_loss:\", \"%.3f\" % mle_avg, \"reward:\", \"%.4f\" % r_avg)\n                count = mle_total = r_total = 0\n\n            if iter % 5000 == 0:\n                self.save_model(iter)\n\nif __name__ == \"__main__\":\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--train_mle', type=str, default=\"yes\")\n    parser.add_argument('--train_rl', type=str, default=\"no\")\n    parser.add_argument('--mle_weight', type=float, default=1.0)\n    parser.add_argument('--load_model', type=str, default=None)\n    parser.add_argument('--new_lr', type=float, default=None)\n    opt = parser.parse_args()\n    opt.rl_weight = 1 - opt.mle_weight\n    print(\"Training mle: %s, Training rl: %s, mle weight: %.2f, rl weight: %.2f\"%(opt.train_mle, opt.train_rl, opt.mle_weight, opt.rl_weight))\n    print(\"intra_encoder:\", config.intra_encoder, \"intra_decoder:\", config.intra_decoder)\n\n    train_processor = Train(opt)\n    train_processor.trainIters()\n"
  },
  {
    "path": "train_util.py",
    "content": "import numpy as np\nimport torch as T\nfrom data_util import config\n\ndef get_cuda(tensor):\n    if T.cuda.is_available():\n        tensor = tensor.cuda()\n    return tensor\n\ndef get_enc_data(batch):\n    batch_size = len(batch.enc_lens)\n    enc_batch = T.from_numpy(batch.enc_batch).long()\n    enc_padding_mask = T.from_numpy(batch.enc_padding_mask).float()\n\n    enc_lens = batch.enc_lens\n\n    ct_e = T.zeros(batch_size, 2*config.hidden_dim)\n\n    enc_batch = get_cuda(enc_batch)\n    enc_padding_mask = get_cuda(enc_padding_mask)\n\n    ct_e = get_cuda(ct_e)\n\n    enc_batch_extend_vocab = None\n    if batch.enc_batch_extend_vocab is not None:\n        enc_batch_extend_vocab = T.from_numpy(batch.enc_batch_extend_vocab).long()\n        enc_batch_extend_vocab = get_cuda(enc_batch_extend_vocab)\n\n    extra_zeros = None\n    if batch.max_art_oovs > 0:\n        extra_zeros = T.zeros(batch_size, batch.max_art_oovs)\n        extra_zeros = get_cuda(extra_zeros)\n\n\n    return enc_batch, enc_lens, enc_padding_mask, enc_batch_extend_vocab, extra_zeros, ct_e\n\ndef get_dec_data(batch):\n    dec_batch = T.from_numpy(batch.dec_batch).long()\n    dec_lens = batch.dec_lens\n    max_dec_len = np.max(dec_lens)\n    dec_lens = T.from_numpy(batch.dec_lens).float()\n\n    target_batch = T.from_numpy(batch.target_batch).long()\n\n    dec_batch = get_cuda(dec_batch)\n    dec_lens = get_cuda(dec_lens)\n    target_batch = get_cuda(target_batch)\n\n    return dec_batch, max_dec_len, dec_lens, target_batch\n"
  },
  {
    "path": "training_log.txt",
    "content": "--------MLE Training------------\n\n$ python train.py\nTraining mle: yes, Training rl: no, mle weight: 1.00, rl weight: 0.00\nintra_encoder: True intra_decoder: True\niter: 1000 mle_loss: 4.652 reward: 0.0000\niter: 2000 mle_loss: 3.942 reward: 0.0000\niter: 3000 mle_loss: 3.699 reward: 0.0000\niter: 4000 mle_loss: 3.555 reward: 0.0000\niter: 5000 mle_loss: 3.447 reward: 0.0000\niter: 6000 mle_loss: 3.378 reward: 0.0000\niter: 7000 mle_loss: 3.321 reward: 0.0000\niter: 8000 mle_loss: 3.282 reward: 0.0000\niter: 9000 mle_loss: 3.242 reward: 0.0000\niter: 10000 mle_loss: 3.206 reward: 0.0000\niter: 11000 mle_loss: 3.183 reward: 0.0000\niter: 12000 mle_loss: 3.154 reward: 0.0000\niter: 13000 mle_loss: 3.137 reward: 0.0000\niter: 14000 mle_loss: 3.122 reward: 0.0000\niter: 15000 mle_loss: 3.081 reward: 0.0000\niter: 16000 mle_loss: 3.026 reward: 0.0000\niter: 17000 mle_loss: 3.014 reward: 0.0000\niter: 18000 mle_loss: 2.999 reward: 0.0000\niter: 19000 mle_loss: 2.992 reward: 0.0000\niter: 20000 mle_loss: 2.989 reward: 0.0000\niter: 21000 mle_loss: 2.971 reward: 0.0000\niter: 22000 mle_loss: 2.983 reward: 0.0000\niter: 23000 mle_loss: 2.966 reward: 0.0000\niter: 24000 mle_loss: 2.957 reward: 0.0000\niter: 25000 mle_loss: 2.946 reward: 0.0000\niter: 26000 mle_loss: 2.942 reward: 0.0000\niter: 27000 mle_loss: 2.941 reward: 0.0000\niter: 28000 mle_loss: 2.930 reward: 0.0000\niter: 29000 mle_loss: 2.923 reward: 0.0000\niter: 30000 mle_loss: 2.906 reward: 0.0000\niter: 31000 mle_loss: 2.818 reward: 0.0000\niter: 32000 mle_loss: 2.809 reward: 0.0000\niter: 33000 mle_loss: 2.822 reward: 0.0000\niter: 34000 mle_loss: 2.807 reward: 0.0000\niter: 35000 mle_loss: 2.833 reward: 0.0000\niter: 36000 mle_loss: 2.815 reward: 0.0000\niter: 37000 mle_loss: 2.829 reward: 0.0000\niter: 38000 mle_loss: 2.830 reward: 0.0000\niter: 39000 mle_loss: 2.822 reward: 0.0000\niter: 40000 mle_loss: 2.833 reward: 0.0000\niter: 41000 mle_loss: 2.817 reward: 0.0000\niter: 42000 mle_loss: 2.815 reward: 0.0000\niter: 43000 mle_loss: 2.816 reward: 0.0000\niter: 44000 mle_loss: 2.812 reward: 0.0000\niter: 45000 mle_loss: 2.757 reward: 0.0000\niter: 46000 mle_loss: 2.698 reward: 0.0000\niter: 47000 mle_loss: 2.701 reward: 0.0000\niter: 48000 mle_loss: 2.710 reward: 0.0000\niter: 49000 mle_loss: 2.728 reward: 0.0000\niter: 50000 mle_loss: 2.711 reward: 0.0000\niter: 51000 mle_loss: 2.718 reward: 0.0000\niter: 52000 mle_loss: 2.728 reward: 0.0000\niter: 53000 mle_loss: 2.725 reward: 0.0000\niter: 54000 mle_loss: 2.722 reward: 0.0000\niter: 55000 mle_loss: 2.728 reward: 0.0000\niter: 56000 mle_loss: 2.729 reward: 0.0000\niter: 57000 mle_loss: 2.731 reward: 0.0000\niter: 58000 mle_loss: 2.741 reward: 0.0000\niter: 59000 mle_loss: 2.731 reward: 0.0000\niter: 60000 mle_loss: 2.645 reward: 0.0000\niter: 61000 mle_loss: 2.600 reward: 0.0000\niter: 62000 mle_loss: 2.600 reward: 0.0000\niter: 63000 mle_loss: 2.612 reward: 0.0000\niter: 64000 mle_loss: 2.626 reward: 0.0000\niter: 65000 mle_loss: 2.637 reward: 0.0000\niter: 66000 mle_loss: 2.641 reward: 0.0000\niter: 67000 mle_loss: 2.652 reward: 0.0000\niter: 68000 mle_loss: 2.651 reward: 0.0000\niter: 69000 mle_loss: 2.643 reward: 0.0000\niter: 70000 mle_loss: 2.661 reward: 0.0000\niter: 71000 mle_loss: 2.668 reward: 0.0000\niter: 72000 mle_loss: 2.668 reward: 0.0000\niter: 73000 mle_loss: 2.679 reward: 0.0000\niter: 74000 mle_loss: 2.670 reward: 0.0000\niter: 75000 mle_loss: 2.567 reward: 0.0000\niter: 76000 mle_loss: 2.524 reward: 0.0000\niter: 77000 mle_loss: 2.549 reward: 0.0000\niter: 78000 mle_loss: 2.535 reward: 0.0000\niter: 79000 mle_loss: 2.552 reward: 0.0000\niter: 80000 mle_loss: 2.568 reward: 0.0000\niter: 81000 mle_loss: 2.581 reward: 0.0000\niter: 82000 mle_loss: 2.595 reward: 0.0000\niter: 83000 mle_loss: 2.600 reward: 0.0000\niter: 84000 mle_loss: 2.595 reward: 0.0000\niter: 85000 mle_loss: 2.593 reward: 0.0000\niter: 86000 mle_loss: 2.615 reward: 0.0000\niter: 87000 mle_loss: 2.608 reward: 0.0000\niter: 88000 mle_loss: 2.604 reward: 0.0000\niter: 89000 mle_loss: 2.618 reward: 0.0000\niter: 90000 mle_loss: 2.483 reward: 0.0000\niter: 91000 mle_loss: 2.483 reward: 0.0000\niter: 92000 mle_loss: 2.479 reward: 0.0000\niter: 93000 mle_loss: 2.490 reward: 0.0000\niter: 94000 mle_loss: 2.520 reward: 0.0000\niter: 95000 mle_loss: 2.527 reward: 0.0000\niter: 96000 mle_loss: 2.525 reward: 0.0000\niter: 97000 mle_loss: 2.532 reward: 0.0000\niter: 98000 mle_loss: 2.546 reward: 0.0000\niter: 99000 mle_loss: 2.537 reward: 0.0000\niter: 100000 mle_loss: 2.546 reward: 0.0000\niter: 101000 mle_loss: 2.551 reward: 0.0000\niter: 102000 mle_loss: 2.562 reward: 0.0000\niter: 103000 mle_loss: 2.566 reward: 0.0000\niter: 104000 mle_loss: 2.577 reward: 0.0000\niter: 105000 mle_loss: 2.370 reward: 0.0000\niter: 106000 mle_loss: 2.433 reward: 0.0000\niter: 107000 mle_loss: 2.435 reward: 0.0000\niter: 108000 mle_loss: 2.454 reward: 0.0000\niter: 109000 mle_loss: 2.461 reward: 0.0000\niter: 110000 mle_loss: 2.479 reward: 0.0000\niter: 111000 mle_loss: 2.486 reward: 0.0000\niter: 112000 mle_loss: 2.499 reward: 0.0000\niter: 113000 mle_loss: 2.503 reward: 0.0000\niter: 114000 mle_loss: 2.503 reward: 0.0000\niter: 115000 mle_loss: 2.518 reward: 0.0000\niter: 116000 mle_loss: 2.515 reward: 0.0000\niter: 117000 mle_loss: 2.523 reward: 0.0000\niter: 118000 mle_loss: 2.532 reward: 0.0000\niter: 119000 mle_loss: 2.511 reward: 0.0000\niter: 120000 mle_loss: 2.373 reward: 0.0000\niter: 121000 mle_loss: 2.386 reward: 0.0000\niter: 122000 mle_loss: 2.386 reward: 0.0000\niter: 123000 mle_loss: 2.419 reward: 0.0000\niter: 124000 mle_loss: 2.419 reward: 0.0000\niter: 125000 mle_loss: 2.440 reward: 0.0000\niter: 126000 mle_loss: 2.455 reward: 0.0000\niter: 127000 mle_loss: 2.463 reward: 0.0000\niter: 128000 mle_loss: 2.472 reward: 0.0000\niter: 129000 mle_loss: 2.474 reward: 0.0000\niter: 130000 mle_loss: 2.479 reward: 0.0000\niter: 131000 mle_loss: 2.487 reward: 0.0000\niter: 132000 mle_loss: 2.486 reward: 0.0000\niter: 133000 mle_loss: 2.488 reward: 0.0000\niter: 134000 mle_loss: 2.423 reward: 0.0000\niter: 135000 mle_loss: 2.300 reward: 0.0000\niter: 136000 mle_loss: 2.368 reward: 0.0000\niter: 137000 mle_loss: 2.381 reward: 0.0000\niter: 138000 mle_loss: 2.367 reward: 0.0000\niter: 139000 mle_loss: 2.408 reward: 0.0000\niter: 140000 mle_loss: 2.404 reward: 0.0000\niter: 141000 mle_loss: 2.412 reward: 0.0000\niter: 142000 mle_loss: 2.439 reward: 0.0000\niter: 143000 mle_loss: 2.433 reward: 0.0000\niter: 144000 mle_loss: 2.448 reward: 0.0000\niter: 145000 mle_loss: 2.445 reward: 0.0000\niter: 146000 mle_loss: 2.462 reward: 0.0000\niter: 147000 mle_loss: 2.456 reward: 0.0000\niter: 148000 mle_loss: 2.468 reward: 0.0000\niter: 149000 mle_loss: 2.399 reward: 0.0000\niter: 150000 mle_loss: 2.308 reward: 0.0000\niter: 151000 mle_loss: 2.330 reward: 0.0000\niter: 152000 mle_loss: 2.371 reward: 0.0000\niter: 153000 mle_loss: 2.368 reward: 0.0000\niter: 154000 mle_loss: 2.363 reward: 0.0000\niter: 155000 mle_loss: 2.378 reward: 0.0000\niter: 156000 mle_loss: 2.398 reward: 0.0000\niter: 157000 mle_loss: 2.405 reward: 0.0000\niter: 158000 mle_loss: 2.408 reward: 0.0000\n\n\n-------------MLE Validation---------------\n\n$ python eval.py --task=validate --start_from=0005000.tar\n0005000.tar rouge_l: 0.3818\n0010000.tar rouge_l: 0.3921\n0015000.tar rouge_l: 0.3988\n0020000.tar rouge_l: 0.4030\n0025000.tar rouge_l: 0.4047\n0030000.tar rouge_l: 0.4037\n0035000.tar rouge_l: 0.4063\n0040000.tar rouge_l: 0.4078\n0045000.tar rouge_l: 0.4088\n0050000.tar rouge_l: 0.4077\n0055000.tar rouge_l: 0.4075\n0060000.tar rouge_l: 0.4079\n0065000.tar rouge_l: 0.4114\t\t#best\n0070000.tar rouge_l: 0.4074\n0075000.tar rouge_l: 0.4080\n0080000.tar rouge_l: 0.4090\n0085000.tar rouge_l: 0.4060\n0090000.tar rouge_l: 0.4079\n0095000.tar rouge_l: 0.4086\n0100000.tar rouge_l: 0.4076\n0105000.tar rouge_l: 0.4053\n0110000.tar rouge_l: 0.4062\n0115000.tar rouge_l: 0.4056\n0120000.tar rouge_l: 0.4022\n0125000.tar rouge_l: 0.4042\n0130000.tar rouge_l: 0.4067\n0135000.tar rouge_l: 0.4012\n0140000.tar rouge_l: 0.4046\n0145000.tar rouge_l: 0.4026\n0150000.tar rouge_l: 0.4026\n0155000.tar rouge_l: 0.4018\n\n-----------------MLE + RL Training--------------------\n\n$ python train.py --train_mle=yes --train_rl=yes --mle_weight=0.25 --load_model=0065000.tar --new_lr=0.0001\nTraining mle: yes, Training rl: yes, mle weight: 0.25, rl weight: 0.75\nintra_encoder: True intra_decoder: True\nLoaded model at data/saved_models/0065000.tar\niter: 66000 mle_loss: 2.555 reward: 0.3088\niter: 67000 mle_loss: 2.570 reward: 0.3097\niter: 68000 mle_loss: 2.496 reward: 0.3177\niter: 69000 mle_loss: 2.568 reward: 0.3101\niter: 70000 mle_loss: 2.437 reward: 0.3231\niter: 71000 mle_loss: 2.474 reward: 0.3209\niter: 72000 mle_loss: 2.471 reward: 0.3204\niter: 73000 mle_loss: 2.474 reward: 0.3204\niter: 74000 mle_loss: 2.451 reward: 0.3226\niter: 75000 mle_loss: 2.477 reward: 0.3204\niter: 76000 mle_loss: 2.470 reward: 0.3204\niter: 77000 mle_loss: 2.503 reward: 0.3182\niter: 78000 mle_loss: 2.523 reward: 0.3148\niter: 79000 mle_loss: 2.385 reward: 0.3286\niter: 80000 mle_loss: 2.488 reward: 0.3200\niter: 81000 mle_loss: 2.396 reward: 0.3271\niter: 82000 mle_loss: 2.459 reward: 0.3215\niter: 83000 mle_loss: 2.371 reward: 0.3301\niter: 84000 mle_loss: 2.433 reward: 0.3253\niter: 85000 mle_loss: 2.475 reward: 0.3207\niter: 86000 mle_loss: 2.504 reward: 0.3178\niter: 87000 mle_loss: 2.441 reward: 0.3241\niter: 88000 mle_loss: 2.424 reward: 0.3266\niter: 89000 mle_loss: 2.399 reward: 0.3285\niter: 90000 mle_loss: 2.405 reward: 0.3274\niter: 91000 mle_loss: 2.425 reward: 0.3262\niter: 92000 mle_loss: 2.424 reward: 0.3264\niter: 93000 mle_loss: 2.433 reward: 0.3252\niter: 94000 mle_loss: 2.414 reward: 0.3278\niter: 95000 mle_loss: 2.444 reward: 0.3241\niter: 96000 mle_loss: 2.395 reward: 0.3288\niter: 97000 mle_loss: 2.425 reward: 0.3256\niter: 98000 mle_loss: 2.378 reward: 0.3305\niter: 99000 mle_loss: 2.415 reward: 0.3268\niter: 100000 mle_loss: 2.412 reward: 0.3277\niter: 101000 mle_loss: 2.387 reward: 0.3296\niter: 102000 mle_loss: 2.370 reward: 0.3316\niter: 103000 mle_loss: 2.420 reward: 0.3268\niter: 104000 mle_loss: 2.408 reward: 0.3285\niter: 105000 mle_loss: 2.415 reward: 0.3276\niter: 106000 mle_loss: 2.401 reward: 0.3295\niter: 107000 mle_loss: 2.467 reward: 0.3233\n\n\n----------------------MLE + RL Validation--------------------------\n\n$ python eval.py --task=validate --start_from=0070000.tar\n0070000.tar rouge_l: 0.4169\n0075000.tar rouge_l: 0.4174\n0080000.tar rouge_l: 0.4184\n0085000.tar rouge_l: 0.4186\t\t#best\n0090000.tar rouge_l: 0.4165\n0095000.tar rouge_l: 0.4173\n0100000.tar rouge_l: 0.4164\n0105000.tar rouge_l: 0.4163\n\n----------------------MLE Testing------------------------------------\n\n$ python eval.py --task=test --load_model=0065000.tar\n0065000.tar scores: {'rouge-1': {'f': 0.4412018559893622, 'p': 0.4814799494024485, 'r': 0.4232331027817015}, 'rouge-2': {'f': 0.23238981595683728, 'p': 0.2531296070596062, 'r': 0.22407861554997008}, 'rouge-l': {'f': 0.40477682528278364, 'p': 0.4584684491434479, 'r': 0.40351107200202596}}\n\n----------------------MLE + RL Testing-------------------------------\n\n$ python eval.py --task=test --load_model=0085000.tar\n0085000.tar scores: {'rouge-1': {'f': 0.4499047033247696, 'p': 0.4853756369556345, 'r': 0.43544461386607497}, 'rouge-2': {'f': 0.24037014314625643, 'p': 0.25903387205387235, 'r': 0.23362662645146298}, 'rouge-l': {'f': 0.41320241732946406, 'p': 0.4616655167980162, 'r': 0.4144419466382236}}"
  }
]