Repository: liaoxy169/Aggregation-Cross-Entropy
Branch: master
Commit: a31ccec38d6e
Files: 12
Total size: 13.9 KB
Directory structure:
gitextract_44y1fogf/
├── .gitignore
├── README.md
├── log/
│ ├── log/
│ │ └── .gitkeep
│ └── snapshot/
│ └── .gitkeep
└── source/
├── main.py
├── models/
│ ├── __init__.py
│ ├── seq_module.py
│ └── solver.py
├── train.sh
└── utils/
├── __init__.py
├── basic.py
└── data_loader.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
data/
*.txt
*.pkl
*.pyc
================================================
FILE: README.md
================================================
# Aggregation Cross-Entropy for Sequence Recognition
This repository contains the code for the paper **Aggregation Cross-Entropy for Sequence Recognition** by Zecheng Xie, Yaoxiong Huang, Yuanzhi Zhu, Lianwen Jin, Yuliang Liu and Lele Xie, CVPR 2019. [\[Paper\]](https://arxiv.org/abs/1904.08364)
Connectionist temporal classification (CTC) and the attention mechanism are the most popular methods for sequence-learning problems. However, CTC relies on a sophisticated forward-backward algorithm for transcription, which prevents it from addressing two-dimensional (2D) prediction problems, whereas the attention mechanism depends on a complex attention module to fulfill its functionality, resulting in additional network parameters and runtime.
In this paper, we propose a novel method, aggregation cross-entropy (ACE), for sequence recognition from a brand-new perspective. The ACE loss function exhibits performance competitive with CTC and the attention mechanism, with a much simpler implementation (it involves only four fundamental formulas), faster inference/back-propagation (approximately *O(1)* in parallel), lower storage requirements (no parameters and negligible runtime memory), and convenient deployment (simply replace CTC with ACE). Furthermore, the ACE loss function exhibits two noteworthy properties: (1) it can be applied directly to 2D prediction by flattening the 2D prediction into a 1D prediction as input, and (2) it requires only the characters and their counts in the sequence annotation for supervision, which allows it to advance beyond sequence recognition to, e.g., counting problems.

Figure 1: Illustration of the proposed ACE loss function. Generally, the 1D and 2D predictions are generated by an integrated CNN-LSTM model and an FCN model, respectively. For the ACE loss function, the 2D prediction is further flattened into a 1D prediction. During aggregation, the 1D predictions at all time-steps are accumulated for each class independently. After normalization, the prediction, together with the ground truth, is used for loss estimation based on cross-entropy.

Figure 2: Toy example showing the advantage of the ACE loss function. A ResNet-50 trained with the ACE loss function is able to recognize shuffled characters in the images. For each sub-image, the right column shows the recognition model's 2D prediction for the text image. It is noteworthy that they have similar character distributions in the 2D space.
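The four-step pipeline above (flatten, aggregate, normalize, cross-entropy) can be sketched numerically. The following is an illustrative NumPy version for a single sample, not the repository's batched PyTorch implementation (see `source/models/seq_module.py`); the function name `ace_loss` and the toy shapes are ours:

```python
import numpy as np

def ace_loss(pred_2d, label):
    # pred_2d: (H, W, C) softmax probabilities per position, class 0 = blank
    # label:   (C,) per-class character counts; label[0] holds the word length,
    #          as produced by source/utils/data_loader.py
    H, W, C = pred_2d.shape
    T = H * W
    flat = pred_2d.reshape(T, C) + 1e-10       # flatten 2D -> 1D; avoid log(0)
    label = label.astype(np.float64).copy()
    label[0] = T - label[0]                    # blanks fill the remaining time-steps
    agg = flat.sum(axis=0) / T                 # aggregate over time-steps, normalize
    norm_label = label / T                     # normalize ground-truth counts
    return -(np.log(agg) * norm_label).sum()   # cross-entropy against the counts
```

When the aggregated prediction matches the normalized character counts exactly, the loss approaches its minimum; any mismatch between predicted and annotated counts increases it.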
## Requirements
- [Python 2.7](https://www.python.org/)
- [PyTorch >= 0.4.1](https://pytorch.org/)
- [TorchVision](https://pypi.org/project/torchvision/)
- [OpenCV](https://opencv.org/)
## Data Preparation
Extract the dataset archive so that the `data/` folder sits in the repository root:
```bash
tar -xzvf data.tar.gz
```
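The loader in `source/utils/data_loader.py` reads annotation files (`train.txt`, `test.txt`) whose lines have the form `image_path word`, with lowercase `a`-`z` words. As a minimal sketch of how one annotation becomes an ACE label vector (mirroring the loader's logic; the helper name `make_ace_label` is ours):

```python
import numpy as np

def make_ace_label(word, class_num=26):
    # label[k] counts occurrences of the k-th letter (index 1 = 'a', ... 26 = 'z');
    # label[0] stores the word length, as in source/utils/data_loader.py
    label = np.zeros(class_num + 1, dtype='float32')
    for ch in word:
        label[ord(ch) - 97 + 1] += 1
    label[0] = len(word)
    return label
```

For example, `make_ace_label('abb')` yields a vector with `label[0] = 3`, `label[1] = 1` and `label[2] = 2`: only the characters and their counts are kept, with no ordering information.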
## Training and Testing
Start training (run from the `source/` folder):
```bash
sh train.sh
```
- The training process should take **about 10s** for 100 iterations on a 1080Ti.
## Citation
```
@inproceedings{xie2019ace,
  title     = {Aggregation Cross-Entropy for Sequence Recognition},
  author    = {Xie, Zecheng and Huang, Yaoxiong and Zhu, Yuanzhi and Jin, Lianwen and Liu, Yuliang and Xie, Lele},
  booktitle = {CVPR},
  year      = {2019},
}
```
## Attention
This project is free for academic research purposes only.
================================================
FILE: log/log/.gitkeep
================================================
# Ignore everything in this directory
*
# Except this file
!.gitkeep
================================================
FILE: log/snapshot/.gitkeep
================================================
# Ignore everything in this directory
*
# Except this file
!.gitkeep
================================================
FILE: source/main.py
================================================
# -*- coding: utf-8 -*-
from __future__ import print_function, division
import torch
import argparse
import numpy as np
import torch.nn as nn
from torch import optim
import torch.nn.functional as F
from models.seq_module import ACE
from torch.autograd import Variable
from models.solver import seq_solver
from utils.basic import timeSince
from torch.utils.data import DataLoader
from utils.data_loader import ImageDataset

parser = argparse.ArgumentParser()
parser.add_argument('--model_path', type=str, default='../log/snapshot/model-{:0>2d}.pkl')
parser.add_argument('--total_epoch', type=int, default=50, help='total epoch number')
parser.add_argument('--train_path', type=str, default='../data/train.txt')
parser.add_argument('--test_path', type=str, default='../data/test.txt')
parser.add_argument('--train_batch_size', type=int, default=50, help='training batch size')
parser.add_argument('--test_batch_size', type=int, default=50, help='testing batch size')
parser.add_argument('--last_epoch', type=int, default=0, help='last epoch')
parser.add_argument('--class_num', type=int, default=26, help='class number')
parser.add_argument('--dict', type=str, default='_abcdefghijklmnopqrstuvwxyz')
opt = parser.parse_args()
print(opt)

import torchvision.models as models


class ResnetEncoderDecoder(nn.Module):
    def __init__(self, loss_layer):
        super(ResnetEncoderDecoder, self).__init__()
        self.bn = nn.BatchNorm2d(64)
        resnet = models.resnet18(pretrained=True)
        self.conv = nn.Conv2d(1, 64, kernel_size=(3, 3), padding=(1, 1), stride=(1, 1))
        self.cnn = nn.Sequential(*list(resnet.children())[4:-2])
        self.out = nn.Linear(512, opt.class_num + 1)
        self.loss_layer = loss_layer(opt.dict)

    def forward(self, input, labels):
        input = F.relu(self.bn(self.conv(input)), True)
        input = F.max_pool2d(input, kernel_size=(2, 2), stride=(2, 2))
        input = self.cnn(input)
        input = input.permute(0, 2, 3, 1)
        input = F.softmax(self.out(input), dim=-1)
        labels = labels.cuda()
        return self.loss_layer(input, labels)


if __name__ == "__main__":
    model = ResnetEncoderDecoder(ACE).cuda()
    print(model)
    optimizer = optim.Adadelta(model.parameters())
    if opt.last_epoch != 0:
        check_point = torch.load(opt.model_path.format(opt.last_epoch))
        model.load_state_dict(check_point['state_dict'])
        optimizer.load_state_dict(check_point['optimizer'])
        scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[opt.total_epoch], gamma=0.1, last_epoch=opt.last_epoch)
    else:
        scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[opt.total_epoch], gamma=0.1)
    train_set = ImageDataset(file_name=opt.train_path, length=5000, class_num=opt.class_num)
    lmdb_train = DataLoader(train_set, batch_size=opt.train_batch_size, shuffle=True, num_workers=0)
    test_set = ImageDataset(file_name=opt.test_path, length=1000, class_num=opt.class_num)
    lmdb_test = DataLoader(test_set, batch_size=opt.test_batch_size, shuffle=False, num_workers=0)
    the_solver = seq_solver(model=model,
                            lmdb=[lmdb_train, lmdb_test],
                            optimizer=optimizer,
                            scheduler=scheduler,
                            total_epoch=opt.total_epoch,
                            model_path=opt.model_path,
                            last_epoch=opt.last_epoch)
    the_solver.forward()
================================================
FILE: source/models/__init__.py
================================================
================================================
FILE: source/models/seq_module.py
================================================
# -*- coding: utf-8 -*-
import math
import torch
import random
import itertools
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


class Sequence(nn.Module):
    def __init__(self):
        super(Sequence, self).__init__()

    def result_analysis(self, iteration):
        pass


class ACE(Sequence):
    def __init__(self, dictionary):
        super(ACE, self).__init__()
        self.softmax = None
        self.label = None
        self.dict = dictionary

    def forward(self, input, label):
        self.bs, self.h, self.w, _ = input.size()
        T_ = self.h * self.w
        input = input.view(self.bs, T_, -1)
        input = input + 1e-10
        self.softmax = input
        label[:, 0] = T_ - label[:, 0]  # class 0 (blank) fills the remaining time-steps
        self.label = label
        # ACE implementation (four fundamental formulas):
        input = torch.sum(input, 1)  # 1. aggregate predictions over all time-steps
        input = input / T_           # 2. normalize predictions
        label = label / T_           # 3. normalize ground-truth counts
        loss = (-torch.sum(torch.log(input) * label)) / self.bs  # 4. cross-entropy
        return loss

    def decode_batch(self):
        out_best = torch.max(self.softmax, 2)[1].data.cpu().numpy()
        pre_result = [0] * self.bs
        for j in range(self.bs):
            pre_result[j] = out_best[j][out_best[j] != 0]
        return pre_result

    def vis(self, iteration):
        sn = random.randint(0, self.bs - 1)
        print('Test image %4d:' % (iteration * 50 + sn))
        pred = torch.max(self.softmax, 2)[1].data.cpu().numpy()
        pred = pred[sn].tolist()  # one randomly chosen sample
        pred_string = ''.join(['%2s' % self.dict[pn] for pn in pred])
        pred_string_set = [pred_string[i:i + self.w * 2] for i in range(0, len(pred_string), self.w * 2)]
        print('Prediction: ')
        for pre_str in pred_string_set:
            print(pre_str)
        label = ''.join(['%2s:%2d' % (self.dict[idx], pn) for idx, pn in enumerate(self.label[sn]) if idx != 0 and pn != 0])
        label = 'Label: ' + label
        print(label)

    def result_analysis(self, iteration):
        prediction = self.decode_batch()
        correct_count = 0
        pre_total = 0
        len_total = self.label[:, 1:].sum()
        label_data = self.label.data.cpu().numpy()
        for idx, pre_list in enumerate(prediction):
            for pw in pre_list:
                if label_data[idx][pw] > 0:
                    correct_count = correct_count + 1
                    label_data[idx][pw] -= 1
            pre_total += len(pre_list)
        if not self.training and random.random() < 0.05:
            self.vis(iteration)
        return correct_count, len_total, pre_total
================================================
FILE: source/models/solver.py
================================================
import time
import torch
import numpy as np
from torch.autograd import Variable
from utils.basic import timeSince


class solver():
    def __init__(self, model, lmdb, optimizer, scheduler, total_epoch, model_path, last_epoch):
        self.model = model
        print(self.model)
        self.lmdb_train, self.lmdb_test = lmdb
        self.optimizer = optimizer
        self.scheduler = scheduler
        self.total_epoch = total_epoch
        self.model_path = model_path
        self.last_epoch = last_epoch
        self.start = time.time()

    def train_one_epoch(self, ep):
        pass

    def test_one_epoch(self, ep):
        pass

    def forward(self):
        for ep in range(self.total_epoch - self.last_epoch):
            ep = ep + self.last_epoch
            self.train_one_epoch(ep)
            self.test_one_epoch(ep)


class seq_solver(solver):
    def train_one_epoch(self, ep):
        self.model.train()
        loss_aver = 0
        if self.scheduler is not None:
            self.scheduler.step()
            print('learning_rate: ', self.scheduler.get_lr())
        for it, sample_batched in enumerate(self.lmdb_train):
            inputs = sample_batched['image'].squeeze(0)
            labels = sample_batched['label'].squeeze(0)
            inputs = Variable(inputs.cuda())
            loss = self.model(inputs, labels)
            self.optimizer.zero_grad()
            loss.backward()
            loss = loss.data.item()
            l2_norm = torch.nn.utils.clip_grad_norm_(self.model.parameters(), 10)
            if not np.isnan(l2_norm):
                self.optimizer.step()
            else:
                print('l2_norm: ', l2_norm)
                l2_norm = 0
            if it == 0:
                loss_aver = loss
            loss_aver = 0.9 * loss_aver + 0.1 * loss
            if it == len(self.lmdb_train) - 1:
                correct_count, len_total, pre_total = self.model.loss_layer.result_analysis(it)
                recall = float(correct_count) / len_total
                precision = correct_count / (pre_total + 0.000001)
                print('Train: %10s Epoch: %3d it: %6d, loss: %.4f, l2_norm: %.4f, recall: %.4f, precision: %.4f' %
                      (timeSince(self.start), ep, it, loss_aver, l2_norm, recall, precision))
        torch.save({
            'epoch': ep,
            'state_dict': self.model.state_dict(),
            'optimizer': self.optimizer.state_dict(),
        }, self.model_path.format(ep))

    def test_one_epoch(self, ep):
        self.model.eval()
        loss_aver = 0
        for it, sample_batched in enumerate(self.lmdb_test):
            inputs = sample_batched['image'].squeeze(0)
            labels = sample_batched['label'].squeeze(0)
            inputs = Variable(inputs.cuda())
            loss = self.model(inputs, labels)
            correct_count, len_total, pre_total = self.model.loss_layer.result_analysis(it)
            loss = loss.data.item()
            if it == 0:
                loss_aver = loss
            loss_aver = 0.9 * loss_aver + 0.1 * loss
            if it == len(self.lmdb_test) - 1:
                recall = float(correct_count) / len_total
                precision = correct_count / (pre_total + 0.000001)
                print('Test : %10s Epoch: %3d it: %6d, loss: %.4f, len : %4d, recall: %.4f, precision: %.4f' %
                      (timeSince(self.start), ep, it, loss_aver, len_total, recall, precision))
================================================
FILE: source/train.sh
================================================
#!/usr/bin/env bash
filename="../log/log/log_`date +%y_%m_%d_%H_%M_%S`.txt"
CUDA_VISIBLE_DEVICES=0 python -u main.py \
2>&1 | tee $filename
================================================
FILE: source/utils/__init__.py
================================================
================================================
FILE: source/utils/basic.py
================================================
import time
import math


def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return '%dm %ds' % (m, s)


def timeSince(since):
    now = time.time()
    s = now - since
    return '%s' % (asMinutes(s))
================================================
FILE: source/utils/data_loader.py
================================================
import cv2
import torch
import numpy as np
from torch.utils.data import Dataset, DataLoader


class ImageDataset(Dataset):
    """Text-image dataset with count-based (ACE) labels."""

    def __init__(self, file_name, length, class_num, transform=None):
        """
        Args:
            file_name (string): path to the annotation file listing images and their labels.
            length (int): number of images.
            class_num (int): number of character classes.
        """
        with open(file_name) as fh:
            self.img_and_label = fh.readlines()
        self.length = length
        self.transform = transform
        self.class_num = class_num

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        img_and_label = self.img_and_label[idx].strip()
        pth, word = img_and_label.split(' ')  # image path and its annotation
        image = cv2.imread(pth, 0)
        image = cv2.pyrDown(image).astype('float32')  # downsample to 100*100
        word = [ord(var) - 97 for var in word]  # 'a' -> 0
        label = np.zeros((self.class_num + 1)).astype('float32')
        for ln in word:
            label[int(ln + 1)] += 1  # label construction for ACE: per-class counts
        label[0] = len(word)  # slot 0 stores the sequence length
        sample = {'image': torch.from_numpy(image).unsqueeze(0), 'label': torch.from_numpy(label)}
        if self.transform:
            sample = self.transform(sample)
        return sample