Repository: yuanyuanli85/FashionAI_KeyPoint_Detection_Challenge_Keras Branch: master Commit: 0b3bd8cdee32 Files: 22 Total size: 85.2 KB Directory structure: gitextract_r2c2s0cn/ ├── .gitignore ├── LICENSE ├── README.md ├── data/ │ └── placeholder.txt ├── src/ │ ├── data_gen/ │ │ ├── data_generator.py │ │ ├── data_process.py │ │ ├── dataset.py │ │ ├── kpAnno.py │ │ ├── ohem.py │ │ └── utils.py │ ├── eval/ │ │ ├── eval_callback.py │ │ ├── evaluation.py │ │ └── post_process.py │ ├── top/ │ │ ├── demo.py │ │ ├── test.py │ │ └── train.py │ └── unet/ │ ├── fashion_net.py │ ├── refinenet.py │ ├── refinenet_mask_v3.py │ └── resnet101.py ├── submission/ │ └── placeholder.txt └── trained_models/ └── placeholder.txt ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ .idea *.pyc *.pkl ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2018 VictorLi Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================ # AiFashion - Author: VictorLi, yuanyuan.li85@gmail.com - Code for FashionAI Global Challenge—Key Points Detection of Apparel [2018 TianChi](https://tianchi.aliyun.com/competition/introduction.htm?spm=5176.100068.5678.1.4ccc289bCzDJXu&raceId=231648&_lang=en_US) - Rank 45/2322 at 1st round competition, score 0.61 - Rank 46 at 2nd round competition, score 0.477 ## Images with detected keypoints ### Dress ![Dress](./images/dress.jpg) ### Blouse ![Blouse](./images/blouse.jpg) ### Outwear ![Outwear](./images/outwear.jpg) ### Skirt ![Skirt](./images/skirt.jpg) ### Trousers ![Trousers](./images/trousers.jpg) ## Basic idea - The key idea comes from the paper [Cascaded Pyramid Network for Multi-Person Pose Estimation](https://arxiv.org/abs/1711.07319). We use a two-stage network made of a global net and a refine net, both U-Net-like. The network is trained to predict heatmaps of clothing keypoints. The backbone network is resnet101. - To avoid negative interference between categories, an `input_mask` is introduced to zero out invalid keypoints. For example, skirt has 4 valid keypoints: `waistband_left`, `waistband_right`, `hemline_left` and `hemline_right`. In `input_mask`, only those 4 valid channels are 1.0, while the other 20 channels are set to zero.
- Online hard example mining (OHEM): at the last stage of refinenet, only the top keypoint losses are taken into account, while the easy ones (small loss) are ignored ## Dependency - Keras 2.0 - TensorFlow - OpenCV/NumPy/Pandas - Pre-trained resnet101 model weights ## Folder Structure - `data`: folder to store training and testing images and annotations - `trained_models`: folder to store trained models and logs - `submission`: folder to store generated submissions for evaluation. - `src`: folder for all source code. `src/data_gen`: data generator code, including data augmentation and pre-processing `src/eval`: evaluation code, including inference and post-processing. `src/unet`: CNN model definition, including train, fine-tune, loss and optimizer definition. `src/top`: top-level code for train, test and demo. ## How to train network - Download the dataset from the competition webpage and put it under `data`. `data/train`: data used for training. `data/test`: data used for testing - Download the [resnet101](https://gist.github.com/flyyufelix/65018873f8cb2bbe95f429c474aa1294) model and save it as `data/resnet101_weights_tf.h5`. Note: all the models here use the channels_last dim order. - Train the all-in-one network from scratch ``` python train.py --category all --epochs 30 --network v11 --batchSize 3 --gpuID 2 ``` - The trained model and log will be put under `trained_models/all/xxxx`, e.g. `trained_models/all/2018_05_23_15_18_07/` - The evaluation will run after each epoch and details are saved to `val.log` - Resume training from a specific model.
``` python train.py --gpuID 2 --category all --epochs 30 --network v11 --batchSize 3 --resume True --resumeModel /path/to/model/start/with --initEpoch 6 ``` ## How to test and generate submission - Run test and generate the submission. The command below searches `modelpath` for the best-scoring model and uses it to generate the submission ``` python test.py --gpuID 2 --modelpath ../../trained_models/all/xxx --outpath ../../submission/2018_04_19/ --augment True ``` The submission will be saved as `submission.csv` ## How to run demo - Download the pre-trained weights from [BaiduDisk](https://pan.baidu.com/s/1t7fB5wnRfW1Vny0gw7xUDQ) (password `1ae2`) or [GoogleDrive](https://drive.google.com/open?id=1VY-AO2F1XMQLBjEZjy6CrOSIPWWaHUGr) - Save them somewhere, e.g. `trained_models/all/fashion_ai_keypoint_weights_epoch28.hdf5` - Or use your own trained model. - Run the demo and the cloth image with detected keypoints marked will be displayed. ``` python demo.py --gpuID 2 --modelfile ../../trained_models/all/fashion_ai_keypoint_weights_epoch28.hdf5 ``` ## Reference - Resnet 101 Keras : https://github.com/statech/resnet ================================================ FILE: data/placeholder.txt ================================================ ================================================ FILE: src/data_gen/data_generator.py ================================================ import os import cv2 import pandas as pd import numpy as np import random from kpAnno import KpAnno from dataset import getKpNum, getKpKeys, getFlipMapID, generate_input_mask from utils import make_gaussian, load_annotation_from_df from data_process import pad_image, resize_image, normalize_image, rotate_image, \ rotate_image_float, rotate_mask, crop_image from ohem import generate_topk_mask_ohem class DataGenerator(object): def __init__(self, category, annfile): self.category = category self.annfile = annfile self._initialize() def get_dim_order(self): # default tensorflow dim order return "channels_last" def get_dataset_size(self): return
len(self.annDataFrame) def generator_with_mask_ohem(self, graph, kerasModel, batchSize=16, inputSize=(512, 512), flipFlag=False, cropFlag=False, shuffle=True, rotateFlag=True, nStackNum=1): ''' Input: batch_size * Height (512) * Width (512) * Channel (3) Input: batch_size * 256 * 256 * Channel (N+1). Mask for each category. 1.0 for valid parts in category. 0.0 for invalid parts Output: batch_size * Height/2 (256) * Width/2 (256) * Channel (N+1) ''' xdf = self.annDataFrame targetHeight, targetWidth = inputSize # train_input: npfloat, height, width, channels # train_gthmap: npfloat, N heatmap + 1 background heatmap, train_input = np.zeros((batchSize, targetHeight, targetWidth, 3), dtype=np.float) train_mask = np.zeros((batchSize, targetHeight / 2, targetWidth / 2, getKpNum(self.category) ), dtype=np.float) train_gthmap = np.zeros((batchSize, targetHeight / 2, targetWidth / 2, getKpNum(self.category) ), dtype=np.float) train_ohem_mask = np.zeros((batchSize, targetHeight / 2, targetWidth / 2, getKpNum(self.category) ), dtype=np.float) train_ohem_gthmap = np.zeros((batchSize, targetHeight / 2, targetWidth / 2, getKpNum(self.category) ), dtype=np.float) ## generator need to be infinite loop while 1: # random shuffle at first if shuffle: xdf = xdf.sample(frac=1) count = 0 for _index, _row in xdf.iterrows(): xindex = count % batchSize xinput, xhmap = self._prcoess_img(_row, inputSize, rotateFlag, flipFlag, cropFlag, nobgFlag=True) xmask = generate_input_mask(_row['image_category'], (targetHeight, targetWidth, getKpNum(self.category))) xohem_mask, xohem_gthmap = generate_topk_mask_ohem([xinput, xmask], xhmap, kerasModel, graph, 8, _row['image_category'], dynamicFlag=False) train_input[xindex, :, :, :] = xinput train_mask[xindex, :, :, :] = xmask train_gthmap[xindex, :, :, :] = xhmap train_ohem_mask[xindex, :, :, :] = xohem_mask train_ohem_gthmap[xindex, :, :, :] = xohem_gthmap # if refinenet enable, refinenet has two outputs, globalnet and refinenet if xindex == 0 and count 
!= 0: gthamplst = list() for i in range(nStackNum): gthamplst.append(train_gthmap) # last stack will use ohem gthmap gthamplst.append(train_ohem_gthmap) yield [train_input, train_mask, train_ohem_mask], gthamplst count += 1 def _initialize(self): self._load_anno() def _load_anno(self): ''' Load annotations from train.csv ''' # Todo: check if category legal self.train_img_path = "../../data/train" # read into dataframe xpd = pd.read_csv(self.annfile) xpd = load_annotation_from_df(xpd, self.category) self.annDataFrame = xpd def _prcoess_img(self, dfrow, inputSize, rotateFlag, flipFlag, cropFlag, nobgFlag): mlist = dfrow[getKpKeys(self.category)] imgName, kpStr = mlist[0], mlist[1:] # read kp annotation from csv file kpAnnlst = list() for _kpstr in kpStr: _kpAn = KpAnno.readFromStr(_kpstr) kpAnnlst.append(_kpAn) assert (len(kpAnnlst) == getKpNum(self.category)), str(len(kpAnnlst))+" is not the same as "+str(getKpNum(self.category)) xcvmat = cv2.imread(os.path.join(self.train_img_path, imgName)) if xcvmat is None: return None, None #flip as first operation. 
# flip image if random.choice([0, 1]) and flipFlag: xcvmat, kpAnnlst = self.flip_image(xcvmat, kpAnnlst) #if cropFlag: # xcvmat, kpAnnlst = crop_image(xcvmat, kpAnnlst, 0.8, 0.95) # pad image to 512x512 paddedImg, kpAnnlst = pad_image(xcvmat, kpAnnlst, inputSize[0], inputSize[1]) assert (len(kpAnnlst) == getKpNum(self.category)), str(len(kpAnnlst)) + " is not the same as " + str( getKpNum(self.category)) # output ground truth heatmap is 256x256 trainGtHmap = self.__generate_hmap(paddedImg, kpAnnlst) if random.choice([0,1]) and rotateFlag: rAngle = np.random.randint(-1*40, 40) rotatedImage, _ = rotate_image(paddedImg, list(), rAngle) rotatedGtHmap = rotate_mask(trainGtHmap, rAngle) else: rotatedImage = paddedImg rotatedGtHmap = trainGtHmap # resize image resizedImg = cv2.resize(rotatedImage, inputSize) resizedGtHmap = cv2.resize(rotatedGtHmap, (inputSize[0]//2, inputSize[1]//2)) return normalize_image(resizedImg), resizedGtHmap def __generate_hmap(self, cvmat, kpAnnolst): # kpnum + background gthmp = np.zeros((cvmat.shape[0], cvmat.shape[1], getKpNum(self.category)), dtype=np.float) for i, _kpAnn in enumerate(kpAnnolst): if _kpAnn.visibility == -1: continue radius = 100 gaussMask = make_gaussian(radius, radius, 20, None) # avoid out of boundary top_x, top_y = max(0, _kpAnn.x - radius/2), max(0, _kpAnn.y - radius/2) bottom_x, bottom_y = min(cvmat.shape[1], _kpAnn.x + radius/2), min(cvmat.shape[0], _kpAnn.y + radius/2) top_x_offset = top_x - (_kpAnn.x - radius/2) top_y_offset = top_y - (_kpAnn.y - radius/2) gthmp[ top_y:bottom_y, top_x:bottom_x, i] = gaussMask[top_y_offset:top_y_offset + bottom_y-top_y, top_x_offset:top_x_offset + bottom_x-top_x] return gthmp def flip_image(self, orgimg, orgKpAnolst): flipImg = cv2.flip(orgimg, flipCode=1) flipannlst = self.flip_annlst(orgKpAnolst, orgimg.shape) return flipImg, flipannlst def flip_annlst(self, kpannlst, imgshape): height, width, channels = imgshape # flip first flipAnnlst = list() for _kp in kpannlst: flip_x = width - 
_kp.x flipAnnlst.append(KpAnno(flip_x, _kp.y, _kp.visibility)) # exchange location of flip keypoints, left->right outAnnlst = flipAnnlst[:] for i, _kp in enumerate(flipAnnlst): mapId = getFlipMapID('all', i) outAnnlst[mapId] = _kp return outAnnlst ================================================ FILE: src/data_gen/data_process.py ================================================ import pandas as pd import numpy as np import cv2 import os from kpAnno import KpAnno def normalize_image(cvmat): assert (cvmat.dtype == np.uint8) , " only support normalize np.uint8 to float -0.5 ~ 0.5'" cvmat = cvmat.astype(np.float) cvmat = (cvmat - 128.0) / 256.0 return cvmat def resize_image(cvmat, targetWidth, targetHeight): assert (cvmat.dtype == np.uint8) , " only support normalize np.uint8 in resize_image'" # get scale srcHeight, srcWidth, channles = cvmat.shape minScale = min( targetHeight*1.0/srcHeight, targetWidth*1.0/srcWidth) # resize resizedMat = cv2.resize(cvmat, None, fx=minScale, fy=minScale) reHeight, reWidth, channles = resizedMat.shape # pad to targetWidth or targetHeight outmat = np.zeros((targetHeight, targetWidth, 3), dtype=cvmat.dtype) + 128 if targetHeight == reHeight and targetWidth == reWidth: outmat = resizedMat elif targetWidth != reWidth and targetHeight == reHeight: # add pad to width outmat[:, 0:reWidth, :] = resizedMat elif targetHeight != reHeight and targetWidth == reWidth: # add padding to height outmat[0:reHeight, :, :] = resizedMat else: assert(0), "after resize either width or height same as target width or target height" return (outmat, minScale) def pad_image(cvmat, kpAnno, targetWidth, targetHeight): ''' :param cvmat: input mat :param targetWidth: width to pad :param targetHeight: height to pad :return: ''' assert (cvmat.dtype == np.uint8) , " only support normalize np.uint8 in pad_image'" + str(cvmat.dtype) srcHeight, srcWidth, channles = cvmat.shape outmat = np.zeros((targetHeight, targetWidth, 3), dtype=cvmat.dtype) + 128 if targetHeight == 
srcHeight and targetWidth == srcWidth: outmat = cvmat outkpAnno = kpAnno elif targetWidth != srcWidth and targetHeight == srcHeight: # add pad to width outmat[:, 0:srcWidth, :] = cvmat outkpAnno = kpAnno elif targetHeight != srcHeight and targetWidth == srcWidth: # add padding to height outmat[0:srcHeight, :, :] = cvmat outkpAnno = kpAnno else: # resize at first, then pad outmat, scale = resize_image(cvmat, targetWidth, targetHeight) outkpAnno = list() for _kpAnno in kpAnno: _nkp = KpAnno.applyScale(_kpAnno, scale) outkpAnno.append(_nkp) return (outmat, outkpAnno) def pad_image_inference(cvmat, targetWidth, targetHeight): ''' :param cvmat: input mat :param targetWidth: width to pad :param targetHeight: height to pad :return: ''' assert (cvmat.dtype == np.uint8), " only support normalize np.uint8 in pad_image'" + str(cvmat.dtype) srcHeight, srcWidth, channles = cvmat.shape outmat = np.zeros((targetHeight, targetWidth, 3), dtype=cvmat.dtype) + 128 if targetHeight == srcHeight and targetWidth == srcWidth: outmat = cvmat scale = 1.0 elif targetWidth > srcWidth and targetHeight == srcHeight: # add pad to width outmat[:, 0:srcWidth, :] = cvmat scale = 1.0 elif targetHeight > srcHeight and targetWidth == srcWidth: # add padding to height outmat[0:srcHeight, :, :] = cvmat scale = 1.0 else: # resize at first, then pad outmat, scale = resize_image(cvmat, targetWidth, targetHeight) return (outmat, scale) def rotate_image(cvmat, kpAnnLst, rotateAngle): assert (cvmat.dtype == np.uint8) , " only support normalize np.uint8 in rotate_image'" ##Make sure cvmat is square? 
height, width, channel = cvmat.shape center = ( width//2, height//2) rotateMatrix = cv2.getRotationMatrix2D(center, rotateAngle, 1.0) cos, sin = np.abs(rotateMatrix[0,0]), np.abs(rotateMatrix[0, 1]) newW = int((height*sin)+(width*cos)) newH = int((height*cos)+(width*sin)) rotateMatrix[0,2] += (newW/2) - center[0] #x rotateMatrix[1,2] += (newH/2) - center[1] #y # rotate image; cv2.warpAffine takes dsize as (width, height) outMat = cv2.warpAffine(cvmat, rotateMatrix, (newW, newH), borderValue=(128, 128, 128)) # rotate annotations nKpLst = list() for _kp in kpAnnLst: _newkp = KpAnno.applyRotate(_kp, rotateMatrix) nKpLst.append(_newkp) return (outMat, nKpLst) def rotate_image_with_invrmat(cvmat, rotateAngle): assert (cvmat.dtype == np.uint8) , " only support normalize np.uint8 in rotate_image_with_invrmat'" ##Make sure cvmat is square? height, width, channel = cvmat.shape center = ( width//2, height//2) rotateMatrix = cv2.getRotationMatrix2D(center, rotateAngle, 1.0) cos, sin = np.abs(rotateMatrix[0,0]), np.abs(rotateMatrix[0, 1]) newW = int((height*sin)+(width*cos)) newH = int((height*cos)+(width*sin)) rotateMatrix[0,2] += (newW/2) - center[0] #x rotateMatrix[1,2] += (newH/2) - center[1] #y # rotate image; cv2.warpAffine takes dsize as (width, height) outMat = cv2.warpAffine(cvmat, rotateMatrix, (newW, newH), borderValue=(128, 128, 128)) # generate inv rotate matrix invRotateMatrix = cv2.invertAffineTransform(rotateMatrix) return (outMat, invRotateMatrix, (width, height)) def rotate_mask(mask, rotateAngle): outmask = rotate_image_float(mask, rotateAngle) return outmask def rotate_image_float(cvmat, rotateAngle, borderValue=(0.0, 0.0, 0.0)): assert (cvmat.dtype == np.float) , " only support normalize np.float in rotate_image_float'" ##Make sure cvmat is square?
height, width, channels = cvmat.shape center = ( width//2, height//2) rotateMatrix = cv2.getRotationMatrix2D(center, rotateAngle, 1.0) cos, sin = np.abs(rotateMatrix[0,0]), np.abs(rotateMatrix[0, 1]) newW = int((height*sin)+(width*cos)) newH = int((height*cos)+(width*sin)) rotateMatrix[0,2] += (newW/2) - center[0] #x rotateMatrix[1,2] += (newH/2) - center[1] #y # rotate image; cv2.warpAffine takes dsize as (width, height) outMat = cv2.warpAffine(cvmat, rotateMatrix, (newW, newH), borderValue=borderValue) return outMat def crop_image(cvmat, kpAnnLst, lowLimitRatio, upLimitRatio): import random assert(lowLimitRatio < 1.0), 'lowLimitRatio should be less than 1.0' assert(upLimitRatio < 1.0), 'upLimitRatio should be less than 1.0' height, width, channels = cvmat.shape cropHeight = random.randrange(int(lowLimitRatio*height), int(upLimitRatio*height)) cropWidth = random.randrange(int(lowLimitRatio*width), int(upLimitRatio*width)) top_x = random.randrange(0, width - cropWidth) top_y = random.randrange(0, height - cropHeight) # apply offset for keypoints nKpLst = list() for _kp in kpAnnLst: if _kp.visibility == -1: _newkp = _kp else: _newkp = KpAnno.applyOffset(_kp, (top_x, top_y)) if _newkp.x <=0 or _newkp.y <=0: # negative location, return original image return cvmat, kpAnnLst if _newkp.x >= cropWidth or _newkp.y >= cropHeight: # keypoints are cropped out return cvmat, kpAnnLst nKpLst.append(_newkp) return cvmat[top_y:top_y+cropHeight, top_x:top_x+cropWidth], nKpLst if __name__ == "__main__": pass ================================================ FILE: src/data_gen/dataset.py ================================================ def getKpNum(category): # remove one column 'image_id' return len(getKpKeys(category)) - 1 TROUSERS_PART_KYES=['waistband_left', 'waistband_right', 'crotch', 'bottom_left_in', 'bottom_left_out', 'bottom_right_in', 'bottom_right_out'] TROUSERS_PART_FLIP_KYES=['waistband_right', 'waistband_left', 'crotch', 'bottom_right_in', 'bottom_right_out', 'bottom_left_in', 'bottom_left_out']
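The paired key lists above encode, for every keypoint, the name it swaps with under a horizontal flip; `getFlipMapID` further down in this file resolves that into an index mapping. A minimal self-contained sketch of that lookup (editor's illustration using a copy of the trousers lists; `KEYS`, `FLIP_KEYS` and `flip_map_id` are illustrative names, not part of the module):

```python
# Editor's sketch: map a keypoint index to the index of its mirror
# partner after a horizontal flip. KEYS/FLIP_KEYS copy the trousers lists.
KEYS = ['waistband_left', 'waistband_right', 'crotch',
        'bottom_left_in', 'bottom_left_out', 'bottom_right_in', 'bottom_right_out']
FLIP_KEYS = ['waistband_right', 'waistband_left', 'crotch',
             'bottom_right_in', 'bottom_right_out', 'bottom_left_in', 'bottom_left_out']

def flip_map_id(partid):
    # keypoint `partid` ends up at the index of its mirrored name
    return KEYS.index(FLIP_KEYS[partid])

# flip_map_id(0) -> 1 ('waistband_left' becomes 'waistband_right')
# flip_map_id(2) -> 2 ('crotch' maps to itself)
```

This mirrors the lookup done by `getFlipMapID('trousers', partid)` via `getFlipKeys`.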
SKIRT_PART_KEYS=['waistband_left', 'waistband_right', 'hemline_left', 'hemline_right'] SKIRT_PART_FLIP_KEYS=['waistband_right', 'waistband_left', 'hemline_right', 'hemline_left'] DRESS_PART_KEYS= ['neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 'center_front', 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'hemline_left', 'hemline_right'] DRESS_PART_FLIP_KEYS=['neckline_right', 'neckline_left', 'shoulder_right', 'shoulder_left', 'center_front', 'armpit_right', 'armpit_left', 'waistline_right', 'waistline_left', 'cuff_right_in', 'cuff_right_out', 'cuff_left_in', 'cuff_left_out', 'hemline_right', 'hemline_left'] BLOUSE_PART_KEYS=['neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 'center_front', 'armpit_left', 'armpit_right', 'top_hem_left', 'top_hem_right', 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out'] BLOUSE_PART_FLIP_KEYS=['neckline_right', 'neckline_left', 'shoulder_right', 'shoulder_left', 'center_front', 'armpit_right', 'armpit_left', 'top_hem_right', 'top_hem_left', 'cuff_right_in', 'cuff_right_out', 'cuff_left_in', 'cuff_left_out'] OUTWEAR_PART_KEYS=['neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'top_hem_left', 'top_hem_right'] OUTWEAR_PART_FLIP_KEYS = ['neckline_right', 'neckline_left', 'shoulder_right', 'shoulder_left', 'armpit_right', 'armpit_left', 'waistline_right', 'waistline_left', 'cuff_right_in', 'cuff_right_out', 'cuff_left_in', 'cuff_left_out', 'top_hem_right', 'top_hem_left'] ALL_PART_KEYS = ['neckline_left', 'neckline_right', 'center_front', 'shoulder_left', 'shoulder_right', 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'top_hem_left', 
'top_hem_right', 'waistband_left', 'waistband_right', 'hemline_left', 'hemline_right', 'crotch', 'bottom_left_in', 'bottom_left_out', 'bottom_right_in', 'bottom_right_out'] ALL_PART_FLIP_KEYS = [ 'neckline_right', 'neckline_left', 'center_front', 'shoulder_right', 'shoulder_left', 'armpit_right', 'armpit_left', 'waistline_right', 'waistline_left', 'cuff_right_in', 'cuff_right_out', 'cuff_left_in', 'cuff_left_out', 'top_hem_right', 'top_hem_left', 'waistband_right','waistband_left', 'hemline_right', 'hemline_left', 'crotch', 'bottom_right_in', 'bottom_right_out', 'bottom_left_in', 'bottom_left_out'] def getFlipKeys(category): if category == 'skirt': keys, mapkeys = SKIRT_PART_KEYS, SKIRT_PART_FLIP_KEYS elif category == 'dress': keys, mapkeys = DRESS_PART_KEYS, DRESS_PART_FLIP_KEYS elif category == 'trousers': keys, mapkeys = TROUSERS_PART_KYES, TROUSERS_PART_FLIP_KYES elif category == 'blouse': keys, mapkeys = BLOUSE_PART_KEYS, BLOUSE_PART_FLIP_KEYS elif category == 'outwear': keys, mapkeys = OUTWEAR_PART_KEYS, OUTWEAR_PART_FLIP_KEYS elif category == 'all': keys, mapkeys = ALL_PART_KEYS, ALL_PART_FLIP_KEYS else: assert (0), category + " not supported" xdict = dict() for i in range(len(keys)): xdict[keys[i]] = mapkeys[i] return keys, xdict def getFlipMapID(category, partid): keys, mapDict = getFlipKeys(category) mapKey = mapDict[keys[partid]] mapID = keys.index(mapKey) return mapID def getKpKeys(category): ''' :param category: :return: get the keypoint keys in annotation csv ''' SKIRT_KP_KEYS = ['image_id', 'waistband_left', 'waistband_right', 'hemline_left', 'hemline_right'] DRESS_KP_KEYS = ['image_id', 'neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 'center_front', 'armpit_left', 'armpit_right' , 'waistline_left' , 'waistline_right', 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'hemline_left', 'hemline_right'] TROUSERS_KP_KEYS=['image_id', 'waistband_left', 'waistband_right', 'crotch', 'bottom_left_in', 
'bottom_left_out', 'bottom_right_in', 'bottom_right_out'] BLOUSE_KP_KEYS = [ 'image_id', 'neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 'center_front', 'armpit_left', 'armpit_right', 'top_hem_left', 'top_hem_right', 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out'] OUTWEAR_KP_KEYS= ['image_id', 'neckline_left', 'neckline_right', 'shoulder_left', 'shoulder_right', 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'top_hem_left', 'top_hem_right'] ALL_KP_KESY = ['image_id','neckline_left', 'neckline_right', 'center_front', 'shoulder_left', 'shoulder_right', 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'top_hem_left', 'top_hem_right', 'waistband_left', 'waistband_right', 'hemline_left', 'hemline_right' , 'crotch', 'bottom_left_in' , 'bottom_left_out', 'bottom_right_in' ,'bottom_right_out'] if category == 'skirt': return SKIRT_KP_KEYS elif category == 'dress': return DRESS_KP_KEYS elif category == 'trousers': return TROUSERS_KP_KEYS elif category == 'blouse': return BLOUSE_KP_KEYS elif category == 'outwear': return OUTWEAR_KP_KEYS elif category == 'all': return ALL_KP_KESY else: assert(0), category + ' not supported' def fill_dataframe(kplst, category, dfrow): keys = getKpKeys(category)[1:] # fill category dfrow['image_category'] = category assert (len(keys) == len(kplst)), str(len(kplst)) + ' must be the same as ' + str(len(keys)) for i, _key in enumerate(keys): kpann = kplst[i] outstr = str(int(kpann.x))+"_"+str(int(kpann.y))+"_"+str(1) dfrow[_key] = outstr def get_kp_index_from_allkeys(kpname): ALL_KP_KEYS = ['neckline_left', 'neckline_right', 'center_front', 'shoulder_left', 'shoulder_right', 'armpit_left', 'armpit_right', 'waistline_left', 'waistline_right', 'cuff_left_in', 'cuff_left_out', 'cuff_right_in', 'cuff_right_out', 'top_hem_left', 
'top_hem_right', 'waistband_left', 'waistband_right', 'hemline_left', 'hemline_right', 'crotch', 'bottom_left_in', 'bottom_left_out', 'bottom_right_in', 'bottom_right_out'] return ALL_KP_KEYS.index(kpname) def generate_input_mask(image_category, shape, nobgFlag=True): import numpy as np # 0.0 for invalid key points for each category # 1.0 for valid key points for each category h, w, c = shape mask = np.zeros((h // 2, w // 2, c), dtype=np.float) for key in getKpKeys(image_category)[1:]: index = get_kp_index_from_allkeys(key) mask[:, :, index] = 1.0 # for last channel, background if nobgFlag: mask[:, :, -1] = 0.0 else: mask[:, :, -1] = 1.0 return mask ================================================ FILE: src/data_gen/kpAnno.py ================================================ import numpy as np class KpAnno(object): ''' Convert string to x, y, visibility ''' def __init__(self, x, y, visibility): self.x = int(x) self.y = int(y) self.visibility = visibility @classmethod def readFromStr(cls, xstr): xarray = xstr.split('_') x = int(xarray[0]) y = int(xarray[1]) visibility = int(xarray[2]) return cls(x,y, visibility) @classmethod def applyScale(cls, kpAnno, scale): x = int(kpAnno.x*scale) y = int(kpAnno.y*scale) v = kpAnno.visibility return cls(x, y, v) @classmethod def applyRotate(cls, kpAnno, rotateMatrix): vector = [kpAnno.x, kpAnno.y, 1] rotatedV = np.dot(rotateMatrix, vector) return cls( int(rotatedV[0]), int(rotatedV[1]), kpAnno.visibility) @classmethod def applyOffset(cls, kpAnno, offset): x = kpAnno.x - offset[0] y = kpAnno.y - offset[1] v = kpAnno.visibility return cls(x, y, v) @staticmethod def calcDistance(kpA, kpB): distance = (kpA.x - kpB.x)**2 + (kpA.y - kpB.y)**2 return np.sqrt(distance) ================================================ FILE: src/data_gen/ohem.py ================================================ import sys sys.path.insert(0, "../unet/") from keras.models import * from keras.layers import * from utils import np_euclidean_l2 from dataset import 
getKpNum def generate_topk_mask_ohem(input_data, gthmap, keras_model, graph, topK, image_category, dynamicFlag=False): ''' :param input_data: input :param gthmap: ground truth :param keras_model: keras model :param graph: tf graph, to work around a threading issue :param topK: number of kp selected :return: ''' # do inference, and calculate loss of each channel mimg, mmask = input_data ximg = mimg[np.newaxis,:,:,:] xmask = mmask[np.newaxis,:,:,:] if len(keras_model.input_layers) == 3: # use original mask as ohem_mask inputs = [ximg, xmask, xmask] else: inputs = [ximg, xmask] with graph.as_default(): keras_output = keras_model.predict(inputs) # heatmap of last stage outhmap = keras_output[-1] channel_num = gthmap.shape[-1] # calculate loss mloss = list() for i in range(channel_num): _dtmap = outhmap[0, :, :, i] _gtmap = gthmap[:, :, i] loss = np_euclidean_l2(_dtmap, _gtmap) mloss.append(loss) # refill input_mask, set topk as 1.0 and fill 0.0 for rest # fixme: topK may need to differ between categories if dynamicFlag: topK = getKpNum(image_category)//2 ohem_mask = adjust_mask(mloss, mmask, topK) ohem_gthmap = ohem_mask * gthmap return ohem_mask, ohem_gthmap def adjust_mask(loss, input_mask, topk): # pick topk loss from losses # fill topk with 1.0 and fill the rest as 0.0 assert (len(loss) == input_mask.shape[-1]), \ "shapes should be the same " + str(len(loss)) + " vs " + str(input_mask.shape) outmask = np.zeros(input_mask.shape, dtype=np.float) topk_index = sorted(range(len(loss)), key=lambda i:loss[i])[-topk:] for i in range(len(loss)): if i in topk_index: outmask[:,:,i] = 1.0 return outmask ================================================ FILE: src/data_gen/utils.py ================================================ import numpy as np import pandas as pd import os def make_gaussian(width, height, sigma=3, center=None): ''' generate 2d gaussian heatmap :return: ''' x = np.arange(0, width, 1, float) y = np.arange(0, height, 1, float)[:, np.newaxis] if center is None: x0 = width // 2 y0 = height // 2
else: x0 = center[0] y0 = center[1] return np.exp( -4*np.log(2)*((x-x0)**2 + (y-y0)**2)/sigma**2) def split_csv_train_val(allcsv, traincsv, valcsv, ratio=0.8): xdf = pd.read_csv(allcsv) # random shuffle xdf = xdf.sample(frac=1) # random sampling msk = np.random.rand(len(xdf)) < ratio trainDf = xdf[msk] valDf = xdf[~msk] print "total", len(xdf), "split into train ", len(trainDf), ' val', len(valDf) #save to file trainDf.to_csv(traincsv, index=False) valDf.to_csv(valcsv, index=False) def np_euclidean_l2(x, y): assert (x.shape == y.shape), "shape mismatched " + str(x.shape) + " : " + str(y.shape) loss = np.sum((x - y)**2) loss = np.sqrt(loss) return loss def load_annotation_from_df(df, category): if category == 'all': return df else: return df[df['image_category'] == category] ================================================ FILE: src/eval/eval_callback.py ================================================ import keras import os import datetime from evaluation import Evaluation from time import time class NormalizedErrorCallBack(keras.callbacks.Callback): def __init__(self, foldpath, category, multiOut=False, resumeFolder=None): self.parentFoldPath = foldpath self.category = category if resumeFolder is None: self.foldPath = os.path.join(self.parentFoldPath, self.category, datetime.datetime.now().strftime('%Y_%m_%d_%H_%M_%S')) if not os.path.exists(self.foldPath): os.mkdir(self.foldPath) else: self.foldPath = resumeFolder self.valLog = os.path.join(self.foldPath, 'val.log') self.multiOut = multiOut def get_folder_path(self): return self.foldPath def on_epoch_end(self, epoch, logs=None): modelName = os.path.join(self.foldPath, self.category+"_weights_"+str(epoch)+".hdf5") keras.models.save_model(self.model, modelName) print "Saving model to ", modelName print "Running evaluation ........."
xEval = Evaluation(self.category, None) xEval.init_from_model(self.model) start = time() neScore, categoryDict = xEval.eval(self.multiOut, details=True) end = time() print "Evaluation Done", str(neScore), " cost ", end - start, " seconds!" for key in categoryDict.keys(): scores = categoryDict[key] print key, ' score ', sum(scores)/len(scores) with open(self.valLog, 'a+') as xfile: xfile.write(modelName + ", Score "+ str(neScore)+"\n") for key in categoryDict.keys(): scores = categoryDict[key] xfile.write(key + ": " + str(sum(scores)/len(scores)) + "\n") xfile.close() ================================================ FILE: src/eval/evaluation.py ================================================ import sys sys.path.insert(0, "../data_gen/") sys.path.insert(0, "../unet/") import pandas as pd from dataset import getKpKeys, getKpNum, getFlipMapID, get_kp_index_from_allkeys, generate_input_mask from kpAnno import KpAnno from post_process import post_process_heatmap from keras.models import load_model import os from refinenet_mask_v3 import euclidean_loss import numpy as np import cv2 from resnet101 import Scale from utils import load_annotation_from_df from collections import defaultdict import copy from data_process import pad_image_inference class Evaluation(object): def __init__(self, category, modelFile): self.category = category self.train_img_path = "../../data/train" if modelFile is not None: self._initialize(modelFile) def init_from_model(self, model): self._load_anno() self.net = model def eval(self, multiOut=False, details=False, flip=True): xdf = self.annDataFrame scores = list() xdict = dict() xcategoryDict = defaultdict(list) for _index, _row in xdf.iterrows(): imgId = _row['image_id'] category = _row['image_category'] imgFile = os.path.join(self.train_img_path, imgId) gtKpAnno = self._get_groundtruth_kpAnno(_row) if flip: predKpAnno = self.predict_kp_with_flip(imgFile, category) else: predKpAnno = self.predict_kp(imgFile, category, multiOut) neScore =
Evaluation.calc_ne_score(category, predKpAnno, gtKpAnno) scores.extend(neScore) if details: xcategoryDict[category].extend(neScore) if details: return sum(scores)/len(scores), xcategoryDict else: return sum(scores)/len(scores) def _initialize(self, modelFile): self._load_anno() self._initialize_network(modelFile) def _initialize_network(self, modelFile): self.net = load_model(modelFile, custom_objects={'euclidean_loss': euclidean_loss, 'Scale': Scale}) def _load_anno(self): ''' Load annotations from val_split.csv ''' self.annfile = os.path.join("../../data/train/Annotations", "val_split.csv") # read into dataframe xpd = pd.read_csv(self.annfile) xpd = load_annotation_from_df(xpd, self.category) self.annDataFrame = xpd def _get_groundtruth_kpAnno(self, dfrow): mlist = dfrow[getKpKeys(self.category)] imgName, kpStr = mlist[0], mlist[1:] # read kp annotation from csv file kpAnnlst = [KpAnno.readFromStr(_kpstr) for _kpstr in kpStr] return kpAnnlst def _net_inference_with_mask(self, imgFile, imgCategory): import cv2 from data_process import normalize_image, pad_image_inference assert (len(self.net.input_layers) > 1), "model expects more than one input layer" # load image and preprocess img = cv2.imread(imgFile) img, scale = pad_image_inference(img, 512, 512) img = normalize_image(img) input_img = img[np.newaxis, :, :, :] input_mask = generate_input_mask(imgCategory, (512, 512, getKpNum(self.category))) input_mask = input_mask[np.newaxis, :, :, :] # inference heatmap = self.net.predict([input_img, input_mask, input_mask]) return (heatmap, scale) def _heatmap_sum(self, heatmaplst): outheatmap = np.copy(heatmaplst[0]) for i in range(1, len(heatmaplst), 1): outheatmap += heatmaplst[i] return outheatmap def predict_kp(self, imgFile, imgCategory, multiOutput=False): xnetout, scale = self._net_inference_with_mask(imgFile, imgCategory) if multiOutput: #todo: fixme, it is tricky that the previous stage has better performance than the last stage's output.
#todo: here, we are using the sum of multiple stages' outputs. heatmap = self._heatmap_sum(xnetout) else: heatmap = xnetout detectedKps = post_process_heatmap(heatmap, kpConfidenceTh=0.2) # scale to padded resolution 256X256 -> 512X512 scaleTo512 = 2.0 # apply scale to original resolution detectedKps = [KpAnno(_kp.x*scaleTo512/scale, _kp.y*scaleTo512/scale, _kp.visibility) for _kp in detectedKps] return detectedKps def predict_kp_with_flip(self, imgFile, imgCategory): # inference with flip and original image heatmap, scale = self._net_inference_flip(imgFile, imgCategory) detectedKps = post_process_heatmap(heatmap, kpConfidenceTh=0.2) # scale to padded resolution 256X256 -> 512X512 scaleTo512 = 2.0 # apply scale to original resolution detectedKps = [KpAnno(_kp.x * scaleTo512 / scale, _kp.y * scaleTo512 / scale, _kp.visibility) for _kp in detectedKps] return detectedKps def _net_inference_flip(self, imgFile, imgCategory): import cv2 from data_process import normalize_image, pad_image_inference assert (len(self.net.input_layers) > 1), "model expects more than one input layer" batch_size = 2 input_img = np.zeros(shape=(batch_size, 512, 512, 3), dtype=np.float) input_mask = np.zeros(shape=(batch_size, 256, 256, getKpNum(self.category)), dtype=np.float) # load image and preprocess orgimage = cv2.imread(imgFile) padimg, scale = pad_image_inference(orgimage, 512, 512) flipimg = cv2.flip(padimg, flipCode=1) input_img[0,:,:,:] = normalize_image(padimg) input_img[1,:,:,:] = normalize_image(flipimg) mask = generate_input_mask(imgCategory, (512, 512, getKpNum(self.category))) input_mask[0,:,:,:] = mask input_mask[1,:,:,:] = mask # inference if len(self.net.input_layers) == 2: heatmap = self.net.predict([input_img, input_mask]) elif len(self.net.input_layers) == 3: heatmap = self.net.predict([input_img, input_mask, input_mask]) else: assert (0), str(len(self.net.input_layers)) + " input layers, should be 2 or 3" # sum heatmap avgheatmap = self._heatmap_sum(heatmap) orgheatmap = avgheatmap[0,:,:,:] # convert
to the same keypoint order as the original heatmap flipheatmap = avgheatmap[1,:,:,:] flipheatmap = self._flip_out_heatmap(flipheatmap) # sum original and flipped heatmaps outheatmap = flipheatmap + orgheatmap outheatmap = outheatmap[np.newaxis, :, :, :] return (outheatmap, scale) def predict_kp_with_rotate(self, imgFile, imgCategory): # inference with rotated image rotateheatmap = self._net_inference_rotate(imgFile, imgCategory) rotateheatmap = rotateheatmap[np.newaxis, :, :, :] # original image and flip image orgflipmap, scale = self._net_inference_flip(imgFile, imgCategory) mflipmap = cv2.resize(orgflipmap[0,:,:,:], None, fx=2.0/scale, fy=2.0/scale) # add mflipmap and rotateheatmap avgheatmap = mflipmap[np.newaxis, :, :, :] b, h, w, c = rotateheatmap.shape avgheatmap[:, 0:h, 0:w, :] += rotateheatmap # generate key point locations detectedKps = post_process_heatmap(avgheatmap, kpConfidenceTh=0.2) return detectedKps def _net_inference_rotate(self, imgFile, imgCategory): from data_process import normalize_image, pad_image_inference, rotate_image_with_invrmat # load image and preprocess orgimage = cv2.imread(imgFile) anglelst = [-20, -10, 10, 20] input_img = np.zeros(shape=(len(anglelst), 512, 512, 3), dtype=np.float) input_mask = np.zeros(shape=(len(anglelst), 256, 256, getKpNum(self.category)), dtype=np.float) mlist = list() for i, angle in enumerate(anglelst): rotateimg, invRotMatrix, orgImgSize = rotate_image_with_invrmat(orgimage, angle) padimg, scale = pad_image_inference(rotateimg, 512, 512) _img = normalize_image(padimg) input_img[i, :, :, :] = _img mlist.append((scale, invRotMatrix)) mask = generate_input_mask(imgCategory, (512, 512, getKpNum(self.category))) for i, angle in enumerate(anglelst): input_mask[i, :,:,:] = mask # inference heatmap = self.net.predict([input_img, input_mask, input_mask]) heatmap = self._heatmap_sum(heatmap) # rotate back to original resolution sumheatmap = np.zeros(shape=(orgimage.shape[0], orgimage.shape[1], getKpNum(self.category)),
dtype=np.float) for i, item in enumerate(mlist): _heatmap = heatmap[i, :, :, :] _scale, _invRotMatrix = item _heatmap = cv2.resize(_heatmap, None, fx=2.0 / _scale, fy=2.0 / _scale) _invheatmap = cv2.warpAffine(_heatmap, _invRotMatrix, (orgimage.shape[1], orgimage.shape[0])) sumheatmap += _invheatmap return sumheatmap def _flip_out_heatmap(self, flipout): outmap = np.zeros(flipout.shape, dtype=np.float) for i in range(flipout.shape[-1]): flipid = getFlipMapID(self.category, i) mask = np.copy(flipout[:, :, i]) outmap[:, :, flipid] = cv2.flip(mask, flipCode=1) return outmap @staticmethod def get_normized_distance(category, gtKp): ''' :param category: cloth category :param gtKp: ground truth keypoint list :return: normalization distance; a big number 1e6 if either reference keypoint is missing ''' if category in ['skirt', 'trousers']: ##waistband left and right waistband_left_index = get_kp_index_from_allkeys('waistband_left') waistband_right_index = get_kp_index_from_allkeys('waistband_right') if gtKp[waistband_left_index].visibility != -1 and gtKp[waistband_right_index].visibility != -1: distance = KpAnno.calcDistance(gtKp[waistband_left_index], gtKp[waistband_right_index]) else: distance = 1e6 return distance elif category in ['blouse', 'dress', 'outwear']: armpit_left_index = get_kp_index_from_allkeys('armpit_left') armpit_right_index = get_kp_index_from_allkeys('armpit_right') ##armpit_left and armpit_right if gtKp[armpit_left_index].visibility != -1 and gtKp[armpit_right_index].visibility != -1: distance = KpAnno.calcDistance(gtKp[armpit_left_index], gtKp[armpit_right_index]) else: distance = 1e6 return distance else: assert (0), category + " not implemented in get_normized_distance" @staticmethod def calc_ne_score(category, dtKp, gtKp): assert (len(dtKp) == len(gtKp)), "predicted keypoint number should be the same as ground truth keypoints: " + \ str(dtKp) + " vs " + str(gtKp) # calculate normalized error as score normalizedDistance = Evaluation.get_normized_distance(category, gtKp) mlist = list() for i in
range(len(gtKp)): if gtKp[i].visibility == 1: dk = KpAnno.calcDistance(dtKp[i], gtKp[i]) mlist.append(dk/normalizedDistance) return mlist ================================================ FILE: src/eval/post_process.py ================================================ import cv2 import numpy as np from scipy.ndimage import gaussian_filter, maximum_filter from keras.layers import * from kpAnno import KpAnno def post_process_heatmap(heatMap, kpConfidenceTh=0.2): kplst = list() for i in range(heatMap.shape[-1]): # iterate over all keypoint channels _map = heatMap[0, :, :, i] _map = gaussian_filter(_map, sigma=0.5) _nmsPeaks = non_max_suppression(_map, windowSize=3, threshold=1e-6) y, x = np.where(_nmsPeaks == _nmsPeaks.max()) confidence = np.amax(_nmsPeaks) if confidence > kpConfidenceTh: kplst.append(KpAnno(x[0], y[0], 1)) else: kplst.append(KpAnno(x[0], y[0], -1)) return kplst def non_max_suppression(plain, windowSize=3, threshold=1e-6): # clear values less than threshold under_th_indices = plain < threshold plain[under_th_indices] = 0 return plain * (plain == maximum_filter(plain, footprint=np.ones((windowSize, windowSize)))) ================================================ FILE: src/top/demo.py ================================================ import sys sys.path.insert(0, "../data_gen/") sys.path.insert(0, "../eval/") sys.path.insert(0, "../unet/") import argparse import os import pandas as pd import cv2 from evaluation import Evaluation from dataset import getKpKeys, get_kp_index_from_allkeys def visualize_keypoint(imageName, category, dtkp): cvmat = cv2.imread(imageName) for key in getKpKeys(category)[1:]: index = get_kp_index_from_allkeys(key) _kp = dtkp[index] cv2.circle(cvmat, center=(_kp.x, _kp.y), radius=7, color=(1.0, 0.0, 0.0), thickness=2) cv2.imshow('demo', cvmat) cv2.waitKey() def demo(modelfile): # load network xEval = Evaluation('all', modelfile) # load images and run prediction testfile = os.path.join("../../data/test/", 'test.csv') xdf =
pd.read_csv(testfile) xdf = xdf.sample(frac=1.0) for _index, _row in xdf.iterrows(): _image_id = _row['image_id'] _category = _row['image_category'] imageName = os.path.join("../../data/test", _image_id) print _image_id, _category dtkp = xEval.predict_kp_with_rotate(imageName, _category) visualize_keypoint(imageName, _category, dtkp) if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("--gpuID", default=0, type=int, help='gpu id') parser.add_argument("--modelfile", help="file of model") args = parser.parse_args() print args os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpuID) demo(args.modelfile) ================================================ FILE: src/top/test.py ================================================ import sys sys.path.insert(0, "../data_gen/") sys.path.insert(0, "../eval/") sys.path.insert(0, "../unet/") import argparse import os from fashion_net import FashionNet from dataset import getKpNum, getKpKeys import pandas as pd from evaluation import Evaluation import pickle import numpy as np def get_best_single_model(valfile): ''' :param valfile: the log file with validation score for each snapshot :return: model file and score ''' def get_key(item): return item[1] with open(valfile) as xval: lines = xval.readlines() xlist = list() for linenum, xline in enumerate(lines): if 'hdf5' in xline and 'Socre' in xline: modelname = xline.strip().split(',')[0] overallscore = xline.strip().split(',')[1] xlist.append((modelname, overallscore)) bestmodel = sorted(xlist, key=get_key)[0] return bestmodel def fill_dataframe(kplst, keys, dfrow, image_category): # fill category dfrow['image_category'] = image_category assert (len(keys) == len(kplst)), str(len(kplst)) + ' must be the same as ' + str(len(keys)) for i, _key in enumerate(keys): kpann = kplst[i] outstr = str(int(kpann.x))+"_"+str(int(kpann.y))+"_"+str(1) dfrow[_key] = outstr def get_kp_from_dict(mdict, image_category, image_id): if 
image_category in mdict.keys(): xdict = mdict[image_category] else: xdict = mdict['all'] return xdict[image_id] def submission(pklpath): xdf = pd.read_csv("../../data/train/Annotations/train.csv") trainKeys = xdf.keys() testdf = pd.read_csv("../../data/test/test.csv") print len(testdf), " samples in test.csv" mdict = dict() for xfile in os.listdir(pklpath): if xfile.endswith('.pkl'): category = xfile.strip().split('.')[0] with open(os.path.join(pklpath, xfile), 'rb') as pkl: mdict[category] = pickle.load(pkl) print testdf.keys() print mdict.keys() submissionDf = pd.DataFrame(columns=trainKeys, index=np.arange(testdf.shape[0])) submissionDf = submissionDf.fillna(value='-1_-1_-1') submissionDf['image_id'] = testdf['image_id'] submissionDf['image_category'] = testdf['image_category'] for _index, _row in submissionDf.iterrows(): image_id = _row['image_id'] image_category = _row['image_category'] kplst = get_kp_from_dict(mdict, image_category, image_id) fill_dataframe(kplst, getKpKeys('all')[1:], _row, image_category) print len(submissionDf), "saved to ", os.path.join(pklpath, 'submission.csv') submissionDf.to_csv(os.path.join(pklpath, 'submission.csv'), index=False) def load_image_names(annfile, category): # read into dataframe xdf = pd.read_csv(annfile) xdf = xdf[xdf['image_category'] == category] return xdf def main_test(savepath, modelpath, augmentFlag): valfile = os.path.join(modelpath, 'val.log') bestmodels = get_best_single_model(valfile) print bestmodels, augmentFlag xEval = Evaluation('all', bestmodels[0]) # load images and run prediction testfile = os.path.join("../../data/test/", 'test.csv') for category in ['skirt', 'blouse', 'trousers', 'outwear', 'dress']: xdict = dict() xdf = load_image_names(testfile, category) print len(xdf), " images to process ", category count = 0 for _index, _row in xdf.iterrows(): count += 1 if count % 1000 == 0: print count, "images have been processed" _image_id = _row['image_id'] imageName = os.path.join("../../data/test", _image_id) if
augmentFlag: dtkp = xEval.predict_kp_with_rotate(imageName, _row['image_category']) else: dtkp = xEval.predict_kp(imageName, _row['image_category'], multiOutput=True) xdict[_image_id] = dtkp savefile = os.path.join(savepath, category+'.pkl') with open(savefile, 'wb') as xfile: pickle.dump(xdict, xfile) print "prediction saved to ", savefile if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("--gpuID", default=0, type=int, help='gpu id') parser.add_argument("--modelpath", help="path of trained model") parser.add_argument("--outpath", help="path to save predicted keypoints") parser.add_argument("--augment", default=False, type=bool, help="augment or not") args = parser.parse_args() print args os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpuID) main_test(args.outpath, args.modelpath, args.augment) submission(args.outpath) ================================================ FILE: src/top/train.py ================================================ import sys sys.path.insert(0, "../data_gen/") sys.path.insert(0, "../unet/") import argparse import os from fashion_net import FashionNet from dataset import getKpNum import tensorflow as tf from keras import backend as k if __name__ == "__main__": parser = argparse.ArgumentParser() parser.add_argument("--gpuID", default=0, type=int, help='gpu id') parser.add_argument("--category", help="specify cloth category") parser.add_argument("--network", help="specify network arch") parser.add_argument("--batchSize", default=8, type=int, help='batch size for training') parser.add_argument("--epochs", default=20, type=int, help="number of training epochs") parser.add_argument("--resume", default=False, type=bool, help="resume training or not") parser.add_argument("--lrdecay", default=False, type=bool, help="lr decay or not") parser.add_argument("--resumeModel", help="start point to retrain") parser.add_argument("--initEpoch", type=int, help="epoch to resume") args =
parser.parse_args() os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID" os.environ["CUDA_VISIBLE_DEVICES"] = str(args.gpuID) # TensorFlow wizardry config = tf.ConfigProto() # Don't pre-allocate memory; allocate as-needed config.gpu_options.allow_growth = True # Allow the process to use up to the full GPU memory config.gpu_options.per_process_gpu_memory_fraction = 1.0 # Create a session with the above options specified. k.tensorflow_backend.set_session(tf.Session(config=config)) if not args.resume: xnet = FashionNet(512, 512, getKpNum(args.category)) xnet.build_model(modelName=args.network, show=True) xnet.train(args.category, epochs=args.epochs, batchSize=args.batchSize, lrschedule=args.lrdecay) else: xnet = FashionNet(512, 512, getKpNum(args.category)) xnet.resume_train(args.category, args.resumeModel, args.network, args.initEpoch, epochs=args.epochs, batchSize=args.batchSize) ================================================ FILE: src/unet/fashion_net.py ================================================ import sys sys.path.insert(0, "../data_gen/") sys.path.insert(0, "../eval/") from data_generator import DataGenerator from keras.callbacks import ModelCheckpoint, CSVLogger from keras.models import load_model from data_process import pad_image, normalize_image import os import cv2 import numpy as np import datetime from eval_callback import NormalizedErrorCallBack from refinenet_mask_v3 import Res101RefineNetMaskV3, euclidean_loss from resnet101 import Scale import tensorflow as tf class FashionNet(object): def __init__(self, inputHeight, inputWidth, nClasses): self.inputWidth = inputWidth self.inputHeight = inputHeight self.nClass = nClasses def build_model(self, modelName='v2', show=False): self.modelName = modelName self.model = Res101RefineNetMaskV3(self.nClass, self.inputHeight, self.inputWidth, nStackNum=2) self.nStackNum = 2 # show model summary and layer name if show: self.model.summary() for layer in self.model.layers: print layer.name, layer.trainable
def train(self, category, batchSize=8, epochs=20, lrschedule=False): trainDt = DataGenerator(category, os.path.join("../../data/train/Annotations", "train_split.csv")) trainGen = trainDt.generator_with_mask_ohem( graph=tf.get_default_graph(), kerasModel=self.model, batchSize= batchSize, inputSize=(self.inputHeight, self.inputWidth), nStackNum=self.nStackNum, flipFlag=False, cropFlag=False) normalizedErrorCallBack = NormalizedErrorCallBack("../../trained_models/", category, True) csvlogger = CSVLogger( os.path.join(normalizedErrorCallBack.get_folder_path(), "csv_train_"+self.modelName+"_"+str(datetime.datetime.now().strftime('%H:%M'))+".csv")) xcallbacks = [normalizedErrorCallBack, csvlogger] self.model.fit_generator(generator=trainGen, steps_per_epoch=trainDt.get_dataset_size()//batchSize, epochs=epochs, callbacks=xcallbacks) def load_model(self, netWeightFile): self.model = load_model(netWeightFile, custom_objects={'euclidean_loss': euclidean_loss, 'Scale': Scale}) def resume_train(self, category, pretrainModel, modelName, initEpoch, batchSize=8, epochs=20): self.modelName = modelName self.load_model(pretrainModel) refineNetflag = True self.nStackNum = 2 modelPath = os.path.dirname(pretrainModel) trainDt = DataGenerator(category, os.path.join("../../data/train/Annotations", "train_split.csv")) trainGen = trainDt.generator_with_mask_ohem(graph=tf.get_default_graph(), kerasModel=self.model, batchSize=batchSize, inputSize=(self.inputHeight, self.inputWidth), nStackNum=self.nStackNum, flipFlag=False, cropFlag=False) normalizedErrorCallBack = NormalizedErrorCallBack("../../trained_models/", category, refineNetflag, resumeFolder=modelPath) csvlogger = CSVLogger(os.path.join(normalizedErrorCallBack.get_folder_path(), "csv_train_" + self.modelName + "_" + str( datetime.datetime.now().strftime('%H:%M')) + ".csv")) self.model.fit_generator(initial_epoch=initEpoch, generator=trainGen, steps_per_epoch=trainDt.get_dataset_size() // batchSize, epochs=epochs, 
callbacks=[normalizedErrorCallBack, csvlogger]) def predict_image(self, imgfile): # load image and preprocess img = cv2.imread(imgfile) img, _ = pad_image(img, list(), 512, 512) img = normalize_image(img) input = img[np.newaxis,:,:,:] # inference heatmap = self.model.predict(input) return heatmap def predict(self, input): # inference heatmap = self.model.predict(input) return heatmap ================================================ FILE: src/unet/refinenet.py ================================================ from keras.models import * from keras.layers import * from keras.optimizers import Adam, SGD from keras import backend as K from keras.applications.resnet50 import ResNet50 IMAGE_ORDERING = 'channels_last' def Res101RefineNetDilated(n_classes, inputHeight, inputWidth): model = build_network_resnet101(inputHeight, inputWidth, n_classes, dilated=True) return model def Res101RefineNetStacked(n_classes, inputHeight, inputWidth, nStackNum): model = build_network_resnet101_stack(inputHeight, inputWidth, n_classes, nStackNum) return model def euclidean_loss(x, y): return K.sqrt(K.sum(K.square(x - y))) def create_global_net(lowlevelFeatures, n_classes): lf2x, lf4x, lf8x, lf16x = lowlevelFeatures o = lf16x o = (Conv2D(256, (3, 3), activation='relu', padding='same', name='up16x_conv', data_format=IMAGE_ORDERING))(o) o = (BatchNormalization())(o) o = (Conv2DTranspose(256, kernel_size=(3, 3), strides=(2, 2), name='upsample_16x', activation='relu', padding='same', data_format=IMAGE_ORDERING))(o) o = (concatenate([o, lf8x], axis=-1)) o = (Conv2D(128, (3, 3), activation='relu', padding='same', name='up8x_conv', data_format=IMAGE_ORDERING))(o) o = (BatchNormalization())(o) fup8x = o o = (Conv2DTranspose(128, kernel_size=(3, 3), strides=(2, 2), name='upsample_8x', padding='same', activation='relu', data_format=IMAGE_ORDERING))(o) o = (concatenate([o, lf4x], axis=-1)) o = (Conv2D(64, (3, 3), activation='relu', padding='same', name='up4x_conv', data_format=IMAGE_ORDERING))(o) o = 
(BatchNormalization())(o) fup4x = o o = (Conv2DTranspose(64, kernel_size=(3, 3), strides=(2, 2), name='upsample_4x', padding='same', activation='relu', data_format=IMAGE_ORDERING))(o) o = (concatenate([o, lf2x], axis=-1)) o = (Conv2D(64, (3, 3), activation='relu', padding='same', name='up2x_conv', data_format=IMAGE_ORDERING))(o) o = (BatchNormalization())(o) fup2x = o out2x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out2x', data_format=IMAGE_ORDERING)(fup2x) out4x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out4x', data_format=IMAGE_ORDERING)(fup4x) out8x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out8x', data_format=IMAGE_ORDERING)(fup8x) x4x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(out8x) eadd4x = Add(name='global4x')([x4x, out4x]) x2x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(eadd4x) eadd2x = Add(name='global2x')([x2x, out2x]) return (fup8x, eadd4x, eadd2x) def create_refine_net(inputFeatures, n_classes): f8x, f4x, f2x = inputFeatures # 2 Conv2DTranspose f8x -> fup8x fup8x = (Conv2DTranspose(128, kernel_size=(3, 3), strides=(2, 2), name='refine8x_deconv_1', padding='same', activation='relu', data_format=IMAGE_ORDERING))(f8x) fup8x = (BatchNormalization())(fup8x) fup8x = (Conv2DTranspose(128, kernel_size=(3, 3), strides=(2, 2), name='refine8x_deconv_2', padding='same', activation='relu', data_format=IMAGE_ORDERING))(fup8x) fup8x = (BatchNormalization())(fup8x) # 1 Conv2DTranspose f4x -> fup4x fup4x = (Conv2DTranspose(128, kernel_size=(3, 3), strides=(2, 2), name='refine4x_deconv', padding='same', activation='relu', data_format=IMAGE_ORDERING))(f4x) fup4x = (BatchNormalization())(fup4x) # 1 conv f2x -> fup2x fup2x = (Conv2D(128, (3, 3), activation='relu', padding='same', name='refine2x_conv', data_format=IMAGE_ORDERING))(f2x) fup2x = (BatchNormalization())(fup2x) # concat f2x, fup8x, fup4x fconcat = (concatenate([fup8x, fup4x, fup2x], axis=-1, 
name='refine_concat')) # 1x1 to map to required feature map out2x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='refine2x', data_format=IMAGE_ORDERING)(fconcat) return out2x def create_refine_net_bottleneck(inputFeatures, n_classes): f8x, f4x, f2x = inputFeatures # 2 Conv2DTranspose f8x -> fup8x fup8x = (Conv2D(256, kernel_size=(1, 1), name='refine8x_1', padding='same', activation='relu', data_format=IMAGE_ORDERING))(f8x) fup8x = (BatchNormalization())(fup8x) fup8x = (Conv2D(128, kernel_size=(1, 1), name='refine8x_2', padding='same', activation='relu', data_format=IMAGE_ORDERING))(fup8x) fup8x = (BatchNormalization())(fup8x) fup8x = UpSampling2D((4, 4), data_format=IMAGE_ORDERING)(fup8x) # 1 Conv2DTranspose f4x -> fup4x fup4x = (Conv2D(128, kernel_size=(1, 1), name='refine4x', padding='same', activation='relu', data_format=IMAGE_ORDERING))(f4x) fup4x = (BatchNormalization())(fup4x) fup4x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(fup4x) # 1 conv f2x -> fup2x fup2x = (Conv2D(128, (1, 1), activation='relu', padding='same', name='refine2x_conv', data_format=IMAGE_ORDERING))(f2x) fup2x = (BatchNormalization())(fup2x) # concat f2x, fup8x, fup4x fconcat = (concatenate([fup8x, fup4x, fup2x], axis=-1, name='refine_concat')) # 1x1 to map to required feature map out2x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='refine2x', data_format=IMAGE_ORDERING)(fconcat) return out2x def create_stack_refinenet(inputFeatures, n_classes, layerName): f8x, f4x, f2x = inputFeatures # 2 Conv2DTranspose f8x -> fup8x fup8x = (Conv2D(256, kernel_size=(1, 1), name=layerName+'_refine8x_1', padding='same', activation='relu'))(f8x) fup8x = (BatchNormalization())(fup8x) fup8x = (Conv2D(128, kernel_size=(1, 1), name=layerName+'refine8x_2', padding='same', activation='relu'))(fup8x) fup8x = (BatchNormalization())(fup8x) out8x = fup8x fup8x = UpSampling2D((4, 4), data_format=IMAGE_ORDERING)(fup8x) # 1 Conv2DTranspose f4x -> fup4x fup4x = 
(Conv2D(128, kernel_size=(1, 1), name=layerName+'refine4x', padding='same', activation='relu'))(f4x) fup4x = (BatchNormalization())(fup4x) out4x = fup4x fup4x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(fup4x) # 1 conv f2x -> fup2x fup2x = (Conv2D(128, (1, 1), activation='relu', padding='same', name=layerName+'refine2x_conv'))(f2x) fup2x = (BatchNormalization())(fup2x) # concat f2x, fup8x, fup4x fconcat = (concatenate([fup8x, fup4x, fup2x], axis=-1, name=layerName+'refine_concat')) # 1x1 to map to required feature map out2x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name=layerName+'refine2x')(fconcat) return out8x, out4x, out2x def create_global_net_dilated(lowlevelFeatures, n_classes): lf2x, lf4x, lf8x, lf16x = lowlevelFeatures o = lf16x o = (Conv2D(256, (3, 3), dilation_rate=(2, 2), activation='relu', padding='same', name='up16x_conv', data_format=IMAGE_ORDERING))(o) o = (BatchNormalization())(o) o = (Conv2DTranspose(256, kernel_size=(3, 3), strides=(2, 2), name='upsample_16x', activation='relu', padding='same', data_format=IMAGE_ORDERING))(o) o = (concatenate([o, lf8x], axis=-1)) o = (Conv2D(128, (3, 3), dilation_rate=(2, 2), activation='relu', padding='same', name='up8x_conv', data_format=IMAGE_ORDERING))(o) o = (BatchNormalization())(o) fup8x = o o = (Conv2DTranspose(128, kernel_size=(3, 3), strides=(2, 2), name='upsample_8x', padding='same', activation='relu', data_format=IMAGE_ORDERING))(o) o = (concatenate([o, lf4x], axis=-1)) o = (Conv2D(64, (3, 3), dilation_rate=(2, 2), activation='relu', padding='same', name='up4x_conv', data_format=IMAGE_ORDERING))(o) o = (BatchNormalization())(o) fup4x = o o = (Conv2DTranspose(64, kernel_size=(3, 3), strides=(2, 2), name='upsample_4x', padding='same', activation='relu', data_format=IMAGE_ORDERING))(o) o = (concatenate([o, lf2x], axis=-1)) o = (Conv2D(64, (3, 3), dilation_rate=(2, 2), activation='relu', padding='same', name='up2x_conv', data_format=IMAGE_ORDERING))(o) o = 
(BatchNormalization())(o) fup2x = o out2x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out2x', data_format=IMAGE_ORDERING)(fup2x) out4x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out4x', data_format=IMAGE_ORDERING)(fup4x) out8x = Conv2D(n_classes, (1, 1), activation='linear', padding='same', name='out8x', data_format=IMAGE_ORDERING)(fup8x) x4x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(out8x) eadd4x = Add(name='global4x')([x4x, out4x]) x2x = UpSampling2D((2, 2), data_format=IMAGE_ORDERING)(eadd4x) eadd2x = Add(name='global2x')([x2x, out2x]) return (fup8x, eadd4x, eadd2x) def build_network_resnet101(inputHeight, inputWidth, n_classes, frozenlayers=True, dilated=False): input, lf2x, lf4x, lf8x, lf16x = load_backbone_res101net(inputHeight, inputWidth) # global net 8x, 4x, and 2x if dilated: g8x, g4x, g2x = create_global_net_dilated((lf2x, lf4x, lf8x, lf16x), n_classes) else: g8x, g4x, g2x = create_global_net((lf2x, lf4x, lf8x, lf16x), n_classes) # refine net, only 2x as output refine2x = create_refine_net_bottleneck((g8x, g4x, g2x), n_classes) model = Model(inputs=input, outputs=[g2x, refine2x]) adam = Adam(lr=1e-4) model.compile(optimizer=adam, loss=euclidean_loss, metrics=["accuracy"]) return model def build_network_resnet101_stack(inputHeight, inputWidth, n_classes, nStack): # backbone network input, lf2x,lf4x, lf8x, lf16x = load_backbone_res101net(inputHeight, inputWidth) # global net g8x, g4x, g2x = create_global_net_dilated((lf2x, lf4x, lf8x, lf16x), n_classes) s8x, s4x, s2x = g8x, g4x, g2x outputs = [g2x] for i in range(nStack): s8x, s4x, s2x = create_stack_refinenet((s8x, s4x, s2x), n_classes, 'stack_'+str(i)) outputs.append(s2x) model = Model(inputs=input, outputs=outputs) adam = Adam(lr=1e-4) model.compile(optimizer=adam, loss=euclidean_loss, metrics=["accuracy"]) return model def load_backbone_res101net(inputHeight, inputWidth): from resnet101 import ResNet101 xresnet = 
ResNet101(weights='imagenet', include_top=False, input_shape=(inputHeight, inputWidth, 3)) xresnet.load_weights("../../data/resnet101_weights_tf.h5", by_name=True) lf16x = xresnet.get_layer('res4b22_relu').output lf8x = xresnet.get_layer('res3b2_relu').output lf4x = xresnet.get_layer('res2c_relu').output lf2x = xresnet.get_layer('conv1_relu').output # add one padding for lf4x whose shape is 127x127 lf4xp = ZeroPadding2D(padding=((0, 1), (0, 1)))(lf4x) return (xresnet.input, lf2x, lf4xp, lf8x, lf16x) ================================================ FILE: src/unet/refinenet_mask_v3.py ================================================ from refinenet import load_backbone_res101net, create_global_net_dilated, create_stack_refinenet from keras.models import * from keras.layers import * from keras.optimizers import Adam, SGD from keras import backend as K import keras def Res101RefineNetMaskV3(n_classes, inputHeight, inputWidth, nStackNum): model = build_resnet101_stack_mask_v3(inputHeight, inputWidth, n_classes, nStackNum) return model def euclidean_loss(x, y): return K.sqrt(K.sum(K.square(x - y))) def apply_mask_to_output(output, mask): output_with_mask = keras.layers.multiply([output, mask]) return output_with_mask def build_resnet101_stack_mask_v3(inputHeight, inputWidth, n_classes, nStack): input_mask = Input(shape=(inputHeight//2, inputWidth//2, n_classes), name='mask') input_ohem_mask = Input(shape=(inputHeight//2, inputWidth//2, n_classes), name='ohem_mask') # backbone network input_image, lf2x, lf4x, lf8x, lf16x = load_backbone_res101net(inputHeight, inputWidth) # global net g8x, g4x, g2x = create_global_net_dilated((lf2x, lf4x, lf8x, lf16x), n_classes) s8x, s4x, s2x = g8x, g4x, g2x g2x_mask = apply_mask_to_output(g2x, input_mask) outputs = [g2x_mask] for i in range(nStack): s8x, s4x, s2x = create_stack_refinenet((s8x, s4x, s2x), n_classes, 'stack_'+str(i)) if i == (nStack-1): # last stack with ohem_mask s2x_mask = apply_mask_to_output(s2x, input_ohem_mask) else:
s2x_mask = apply_mask_to_output(s2x, input_mask) outputs.append(s2x_mask) model = Model(inputs=[input_image, input_mask, input_ohem_mask], outputs=outputs) adam = Adam(lr=1e-4) model.compile(optimizer=adam, loss=euclidean_loss, metrics=["accuracy"]) return model ================================================ FILE: src/unet/resnet101.py ================================================ # -*- coding: utf-8 -*- """ResNet-101 model for Keras. # Reference: - [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385) Slightly modified Felix Yu's (https://github.com/flyyufelix) implementation of ResNet-101 to have consistent API as those pre-trained models within `keras.applications`. The original implementation is found here https://gist.github.com/flyyufelix/65018873f8cb2bbe95f429c474aa1294#file-resnet-101_keras-py Implementation is based on Keras 2.0 """ from keras.layers import ( Input, Dense, Conv2D, MaxPooling2D, AveragePooling2D, ZeroPadding2D, Flatten, Activation, GlobalAveragePooling2D, GlobalMaxPooling2D, add) from keras.layers.normalization import BatchNormalization from keras.models import Model from keras import initializers from keras.engine import Layer, InputSpec from keras.engine.topology import get_source_inputs from keras import backend as K from keras.applications.imagenet_utils import _obtain_input_shape from keras.utils.data_utils import get_file import warnings import sys sys.setrecursionlimit(3000) WEIGHTS_PATH_TH = 'https://dl.dropboxusercontent.com/s/rrp56zm347fbrdn/resnet101_weights_th.h5?dl=0' WEIGHTS_PATH_TF = 'https://dl.dropboxusercontent.com/s/a21lyqwgf88nz9b/resnet101_weights_tf.h5?dl=0' MD5_HASH_TH = '3d2e9a49d05192ce6e22200324b7defe' MD5_HASH_TF = '867a922efc475e9966d0f3f7b884dc15' class Scale(Layer): '''Learns a set of weights and biases used for scaling the input data. 
    The output is simply an element-wise multiplication of the input by a set
    of weights, plus a set of biases:

        out = in * gamma + beta,

    where 'gamma' and 'beta' are the weights and biases learned.

    # Arguments
        axis: integer, axis along which to normalize in mode 0. For instance,
            if your input tensor has shape (samples, channels, rows, cols),
            set axis to 1 to normalize per feature map (channels axis).
        momentum: momentum in the computation of the exponential average
            of the mean and standard deviation of the data, for
            feature-wise normalization.
        weights: Initialization weights.
            List of 2 Numpy arrays, with shapes:
            `[(input_shape,), (input_shape,)]`
        beta_init: name of initialization function for shift parameter
            (see [initializers](../initializers.md)), or alternatively,
            Theano/TensorFlow function to use for weights initialization.
            This parameter is only relevant if you don't pass a `weights`
            argument.
        gamma_init: name of initialization function for scale parameter
            (see [initializers](../initializers.md)), or alternatively,
            Theano/TensorFlow function to use for weights initialization.
            This parameter is only relevant if you don't pass a `weights`
            argument.
    '''

    def __init__(self, weights=None, axis=-1, momentum=0.9,
                 beta_init='zero', gamma_init='one', **kwargs):
        self.momentum = momentum
        self.axis = axis
        self.beta_init = initializers.get(beta_init)
        self.gamma_init = initializers.get(gamma_init)
        self.initial_weights = weights
        super(Scale, self).__init__(**kwargs)

    def build(self, input_shape):
        self.input_spec = [InputSpec(shape=input_shape)]
        shape = (int(input_shape[self.axis]),)

        self.gamma = K.variable(
            self.gamma_init(shape), name='{}_gamma'.format(self.name))
        self.beta = K.variable(
            self.beta_init(shape), name='{}_beta'.format(self.name))
        self.trainable_weights = [self.gamma, self.beta]

        if self.initial_weights is not None:
            self.set_weights(self.initial_weights)
            del self.initial_weights

    def call(self, x, mask=None):
        input_shape = self.input_spec[0].shape
        broadcast_shape = [1] * len(input_shape)
        broadcast_shape[self.axis] = input_shape[self.axis]

        out = K.reshape(
            self.gamma, broadcast_shape) * x + K.reshape(self.beta, broadcast_shape)
        return out

    def get_config(self):
        config = {"momentum": self.momentum, "axis": self.axis}
        base_config = super(Scale, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))


def identity_block(input_tensor, kernel_size, filters, stage, block):
    '''The identity_block is the block that has no conv layer at shortcut

    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of the middle conv layer of the main path
        filters: list of integers, the nb_filters of the 3 conv layers of the main path
        stage: integer, current stage label, used for generating layer names
        block: 'a', 'b', ..., current block label, used for generating layer names
    '''
    eps = 1.1e-5

    if K.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1

    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    scale_name_base = 'scale' + str(stage) + block + '_branch'

    x = Conv2D(nb_filter1, (1, 1),
               name=conv_name_base + '2a', use_bias=False)(input_tensor)
    x = BatchNormalization(epsilon=eps, axis=bn_axis, name=bn_name_base + '2a')(x)
    x = Scale(axis=bn_axis, name=scale_name_base + '2a')(x)
    x = Activation('relu', name=conv_name_base + '2a_relu')(x)

    x = ZeroPadding2D((1, 1), name=conv_name_base + '2b_zeropadding')(x)
    x = Conv2D(nb_filter2, (kernel_size, kernel_size),
               name=conv_name_base + '2b', use_bias=False)(x)
    x = BatchNormalization(epsilon=eps, axis=bn_axis, name=bn_name_base + '2b')(x)
    x = Scale(axis=bn_axis, name=scale_name_base + '2b')(x)
    x = Activation('relu', name=conv_name_base + '2b_relu')(x)

    x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c', use_bias=False)(x)
    x = BatchNormalization(epsilon=eps, axis=bn_axis, name=bn_name_base + '2c')(x)
    x = Scale(axis=bn_axis, name=scale_name_base + '2c')(x)

    x = add([x, input_tensor], name='res' + str(stage) + block)
    x = Activation('relu', name='res' + str(stage) + block + '_relu')(x)
    return x


def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
    '''conv_block is the block that has a conv layer at shortcut

    # Arguments
        input_tensor: input tensor
        kernel_size: default 3, the kernel size of the middle conv layer of the main path
        filters: list of integers, the nb_filters of the 3 conv layers of the main path
        stage: integer, current stage label, used for generating layer names
        block: 'a', 'b', ..., current block label, used for generating layer names

    Note that from stage 3, the first conv layer of the main path has strides=(2, 2).
    And the shortcut should have strides=(2, 2) as well.
    '''
    eps = 1.1e-5

    if K.image_data_format() == 'channels_last':
        bn_axis = 3
    else:
        bn_axis = 1

    nb_filter1, nb_filter2, nb_filter3 = filters
    conv_name_base = 'res' + str(stage) + block + '_branch'
    bn_name_base = 'bn' + str(stage) + block + '_branch'
    scale_name_base = 'scale' + str(stage) + block + '_branch'

    x = Conv2D(nb_filter1, (1, 1), strides=strides,
               name=conv_name_base + '2a', use_bias=False)(input_tensor)
    x = BatchNormalization(epsilon=eps, axis=bn_axis, name=bn_name_base + '2a')(x)
    x = Scale(axis=bn_axis, name=scale_name_base + '2a')(x)
    x = Activation('relu', name=conv_name_base + '2a_relu')(x)

    x = ZeroPadding2D((1, 1), name=conv_name_base + '2b_zeropadding')(x)
    x = Conv2D(nb_filter2, (kernel_size, kernel_size),
               name=conv_name_base + '2b', use_bias=False)(x)
    x = BatchNormalization(epsilon=eps, axis=bn_axis, name=bn_name_base + '2b')(x)
    x = Scale(axis=bn_axis, name=scale_name_base + '2b')(x)
    x = Activation('relu', name=conv_name_base + '2b_relu')(x)

    x = Conv2D(nb_filter3, (1, 1), name=conv_name_base + '2c', use_bias=False)(x)
    x = BatchNormalization(epsilon=eps, axis=bn_axis, name=bn_name_base + '2c')(x)
    x = Scale(axis=bn_axis, name=scale_name_base + '2c')(x)

    shortcut = Conv2D(nb_filter3, (1, 1), strides=strides,
                      name=conv_name_base + '1', use_bias=False)(input_tensor)
    shortcut = BatchNormalization(epsilon=eps, axis=bn_axis, name=bn_name_base + '1')(shortcut)
    shortcut = Scale(axis=bn_axis, name=scale_name_base + '1')(shortcut)

    x = add([x, shortcut], name='res' + str(stage) + block)
    x = Activation('relu', name='res' + str(stage) + block + '_relu')(x)
    return x


def ResNet101(include_top=True, weights='imagenet',
              input_tensor=None, input_shape=None,
              pooling=None, classes=1000):
    """Instantiates the ResNet-101 architecture.

    Optionally loads weights pre-trained on ImageNet. Note that when using
    TensorFlow, for best performance you should set
    `image_data_format='channels_last'` in your Keras config
    at ~/.keras/keras.json.
    The model and the weights are compatible with both TensorFlow and Theano.
    The data format convention used by the model is the one specified in your
    Keras config file.

    Parameters
    ----------
    include_top: whether to include the fully-connected layer at the
        top of the network.
    weights: one of `None` (random initialization) or 'imagenet'
        (pre-training on ImageNet).
    input_tensor: optional Keras tensor (i.e. output of `layers.Input()`)
        to use as image input for the model.
    input_shape: optional shape tuple, only to be specified if `include_top`
        is False (otherwise the input shape has to be `(224, 224, 3)`
        (with `channels_last` data format) or `(3, 224, 224)`
        (with `channels_first` data format). It should have exactly
        3 input channels, and width and height should be no smaller
        than 197. E.g. `(200, 200, 3)` would be one valid value.
    pooling: Optional pooling mode for feature extraction when
        `include_top` is `False`.
        - `None` means that the output of the model will be the 4D
          tensor output of the last convolutional layer.
        - `avg` means that global average pooling will be applied to
          the output of the last convolutional layer, and thus the
          output of the model will be a 2D tensor.
        - `max` means that global max pooling will be applied.
    classes: optional number of classes to classify images into, only
        to be specified if `include_top` is True, and if no `weights`
        argument is specified.

    Returns
    -------
    A Keras model instance.

    Raises
    ------
    ValueError: in case of invalid argument for `weights`,
        or invalid input shape.
""" if weights not in {'imagenet', None}: raise ValueError('The `weights` argument should be either ' '`None` (random initialization) or `imagenet` ' '(pre-training on ImageNet).') if weights == 'imagenet' and include_top and classes != 1000: raise ValueError('If using `weights` as imagenet with `include_top`' ' as true, `classes` should be 1000') # Determine proper input shape input_shape = _obtain_input_shape(input_shape, default_size=224, min_size=197, data_format=K.image_data_format(), require_flatten=include_top, weights=weights) if input_tensor is None: img_input = Input(shape=input_shape, name='data') else: if not K.is_keras_tensor(input_tensor): img_input = Input( tensor=input_tensor, shape=input_shape, name='data') else: img_input = input_tensor if K.image_data_format() == 'channels_last': bn_axis = 3 else: bn_axis = 1 eps = 1.1e-5 x = ZeroPadding2D((3, 3), name='conv1_zeropadding')(img_input) x = Conv2D(64, (7, 7), strides=(2, 2), name='conv1', use_bias=False)(x) x = BatchNormalization(epsilon=eps, axis=bn_axis, name='bn_conv1')(x) x = Scale(axis=bn_axis, name='scale_conv1')(x) x = Activation('relu', name='conv1_relu')(x) x = MaxPooling2D((3, 3), strides=(2, 2), name='pool1')(x) x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1)) x = identity_block(x, 3, [64, 64, 256], stage=2, block='b') x = identity_block(x, 3, [64, 64, 256], stage=2, block='c') x = conv_block(x, 3, [128, 128, 512], stage=3, block='a') for i in range(1, 3): x = identity_block(x, 3, [128, 128, 512], stage=3, block='b' + str(i)) x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a') for i in range(1, 23): x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b' + str(i)) x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a') x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b') x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c') x = AveragePooling2D((7, 7), name='avg_pool')(x) if include_top: x = Flatten()(x) x = Dense(classes, 
                  activation='softmax', name='mmfc1000')(x)
    else:
        if pooling == 'avg':
            x = GlobalAveragePooling2D()(x)
        elif pooling == 'max':
            x = GlobalMaxPooling2D()(x)

    # Ensure that the model takes into account
    # any potential predecessors of `input_tensor`.
    if input_tensor is not None:
        inputs = get_source_inputs(input_tensor)
    else:
        inputs = img_input

    # Create model.
    model = Model(inputs, x, name='resnet101')

    '''
    # load weights
    if weights == 'imagenet':
        filename = 'resnet101_weights_{}.h5'.format(K.image_dim_ordering())
        if K.backend() == 'theano':
            path = WEIGHTS_PATH_TH
            md5_hash = MD5_HASH_TH
        else:
            path = WEIGHTS_PATH_TF
            md5_hash = MD5_HASH_TF

        weights_path = get_file(
            fname=filename, origin=path, cache_subdir='models',
            md5_hash=md5_hash, hash_algorithm='md5')
        model.load_weights(weights_path, by_name=True)

        if K.image_data_format() == 'channels_first' and K.backend() == 'tensorflow':
            warnings.warn('You are using the TensorFlow backend, yet you '
                          'are using the Theano '
                          'image data format convention '
                          '(`image_data_format="channels_first"`). '
                          'For best performance, set '
                          '`image_data_format="channels_last"` in '
                          'your Keras config '
                          'at ~/.keras/keras.json.')
    '''
    return model


================================================
FILE: submission/placeholder.txt
================================================



================================================
FILE: trained_models/placeholder.txt
================================================
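For reference, the masking idea behind `apply_mask_to_output` and `euclidean_loss` in `src/unet/refinenet_mask_v3.py` (zeroing the heatmap channels of keypoints that do not belong to the current category, so they contribute nothing to the loss) can be sketched with plain NumPy. This is a minimal illustration, not part of the repository: the heatmap size and the skirt channel indices below are hypothetical placeholders.

```python
import numpy as np

# Hypothetical sizes: 24 keypoint channels on a 256x256 heatmap
# (i.e. inputHeight // 2 for a 512x512 input).
n_classes, h, w = 24, 256, 256

# Skirt has 4 valid keypoints; the mask is 1.0 on those channels and
# 0.0 everywhere else, mirroring `input_mask`. Channel indices here
# are illustrative only.
valid_channels = [15, 16, 22, 23]
mask = np.zeros((h, w, n_classes), dtype=np.float32)
mask[..., valid_channels] = 1.0

# apply_mask_to_output is an element-wise multiply: on invalid channels
# both prediction and target become zero, so their difference vanishes.
pred = np.random.rand(h, w, n_classes).astype(np.float32)
target = np.random.rand(h, w, n_classes).astype(np.float32)
masked_pred = pred * mask
masked_target = target * mask

# euclidean_loss(x, y) = sqrt(sum((x - y)^2)), taken over the whole batch
# of masked heatmaps; only the valid channels contribute.
loss = np.sqrt(np.sum(np.square(masked_pred - masked_target)))
```

Because `(pred - target) * mask` is zero outside the valid channels, the loss computed over all 24 channels equals the loss computed over the 4 valid ones, which is exactly why the invalid keypoints of other categories stop polluting the gradient.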