Repository: NVIDIA-AI-IOT/trt_pose_hand
Branch: main
Commit: bb7536dcdc67
Files: 17
Total size: 91.8 KB

Directory structure:
trt_pose_hand/
├── LICENSE.md
├── README.md
├── cursor_control_live_demo.ipynb
├── data_collection/
│   └── gesture_data_collection.ipynb
├── dataloader.py
├── gesture_classification_live_demo.ipynb
├── gesture_classifier.py
├── gesture_data_collection_pose.ipynb
├── gesture_training/
│   ├── realold_svmmodel.sav
│   └── train_gesture_classification.ipynb
├── live_hand_pose.ipynb
├── mini_paint_live_demo.ipynb
├── model/
│   └── README.md
├── preprocess/
│   ├── gesture.json
│   └── hand_pose.json
├── preprocessdata.py
└── svmmodel.sav

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE.md
================================================
Copyright (c) 2020, NVIDIA CORPORATION. All rights reserved.

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: README.md
================================================
# Hand Pose Estimation And Classification

This project is an extension of TRT Pose for hand pose detection. The project includes

- Pretrained models for hand pose estimation capable of running in real time on Jetson Xavier NX.
- Scripts for applications of hand pose estimation
  - Hand gesture recognition (hand pose classification)
  - Cursor control
  - Mini-Paint type of application
- Pretrained model for gesture recognition

## Getting Started

### Step 1 - Install trt_pose and its dependencies

Make sure to follow all the instructions from trt_pose and install all of its dependencies. Follow step 1 and step 2 from https://github.com/NVIDIA-AI-IOT/trt_pose.

### Step 2 - Install dependencies for hand pose

    pip install traitlets

### Step 3 - Download the model weights

| Model | Weight |
|-------|---------|
| hand_pose_resnet18_baseline_att_224x224_A | [download model](https://drive.google.com/file/d/1NCVo0FiooWccDzY7hCc5MAKaoUpts3mo/view?usp=sharing) |

1. Download the model weights using the link above.
2. Place the downloaded weights in the [model](model/) directory.

### Step 4 - Run hand pose and its applications

A) Hand Pose demo

- Open and follow the live_hand_pose.ipynb notebook.

![](images/live_hand_demo.gif)

B) Hand gesture recognition (hand pose classification)

- Install dependencies
  - scikit-learn
    - pip install -U scikit-learn
    - or install it from source

The current gesture classification model supports six classes (fist, pan, stop, fine, peace, no hand). More gestures can be added by the simple process of creating your own dataset and training an SVM model on it. An SVM model weight (svmmodel.sav) is provided for inference.

![](images/gesture_classification.gif)

To build your own hand gesture classification on top of the hand pose estimation, follow these steps:

- Create your own dataset using the gesture_data_collection.ipynb or gesture_data_collection_pose.ipynb notebook. This lets you define the gestures you want to classify (e.g., thumbs up, fist, etc.). The notebook automatically creates a dataset with images and labels that is ready to be trained for gesture classification.
- Train using the train_gesture_classification.ipynb notebook. It uses an SVM from scikit-learn; other types of models can also be experimented with. A condensed sketch of this training step is shown below.
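Under the hood, training comes down to fitting the scikit-learn pipeline that the notebooks build on vectors of pairwise joint distances. The following is a minimal sketch of that step, not the notebook verbatim: `features_train`, `features_test`, `labels_train`, and `labels_test` are hypothetical arrays assumed to have been produced already by the preprocessing (`preprocessdata.find_distance`) described above.

```python
import pickle

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The same pipeline the notebooks build: standardize the joint-distance
# features, then classify with an RBF-kernel SVM.
clf = make_pipeline(StandardScaler(), SVC(gamma='auto', kernel='rbf'))

# features_*/labels_* are hypothetical placeholders for the pairwise
# joint-distance vectors and gesture labels from the data-collection step.
clf.fit(features_train, labels_train)
print('test accuracy: %.3f' % clf.score(features_test, labels_test))

# Persist the trained pipeline the same way the notebooks do, so the
# live demos can reload it with pickle.load().
with open('svmmodel.sav', 'wb') as f:
    pickle.dump(clf, f)
```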
C) Cursor control application

- Install dependencies
  - pyautogui
    - python3 -m pip install pyautogui
    - On Jetson, install it from source
- Open and follow the cursor_control_live_demo.ipynb notebook.
- This will allow you to control the mouse cursor on your desktop. It uses the hand gesture classification: when your hand gesture is pan you can move the cursor, and when it is click a left click is performed.

| Buying a tuna sandwich :) | Navigating a map |
|-------|---------|
|![](images/subway_buy.gif) | ![](images/subway_map.gif)|

D) Mini-Paint

A mini paint app that lets you draw, erase and clear on your camera screen.

-------------------------------------------------------------------------------------------------------------------------------------

The model was trained using the training script in trt_pose and hand pose data collected at NVIDIA. Model details: resnet18

-------------------------------------------------------------------------------------------------------------------------------------

## See also

- [trt_pose](https://github.com/NVIDIA-AI-IOT/trt_pose) - Real-time pose estimation accelerated with NVIDIA TensorRT
- [deepstream_pose_estimation](https://github.com/NVIDIA-AI-IOT/deepstream_pose_estimation) - [trt_pose](https://github.com/NVIDIA-AI-IOT/trt_pose) DeepStream integration
- [ros2_trt_pose](https://github.com/NVIDIA-AI-IOT/ros2_trt_pose) - ROS 2 package for "trt_pose": real-time human pose estimation on the NVIDIA Jetson platform
- [torch2trt](http://github.com/NVIDIA-AI-IOT/torch2trt) - An easy-to-use PyTorch to TensorRT converter

## References

Cao, Zhe, et al. "Realtime multi-person 2D pose estimation using part affinity fields." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.

Xiao, Bin, Haiping Wu, and Yichen Wei. "Simple baselines for human pose estimation and tracking." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
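For orientation before reading the notebooks, all of the live demos share the same core pipeline: load the hand-pose topology from preprocess/hand_pose.json, build (on first run) or reload a torch2trt-optimized model, normalize each camera frame, and parse the network's confidence maps and part-affinity fields into keypoints. Below is a condensed single-frame sketch of that shared pipeline, assuming trt_pose and torch2trt are installed, the optimized weights already exist under model/, and 'frame.jpg' is a placeholder for a 224x224 camera frame.

```python
import json
import cv2
import torch
import torchvision.transforms as transforms
import PIL.Image
import trt_pose.coco
from torch2trt import TRTModule
from trt_pose.parse_objects import ParseObjects

with open('preprocess/hand_pose.json', 'r') as f:
    hand_pose = json.load(f)

topology = trt_pose.coco.coco_category_to_topology(hand_pose)
parse_objects = ParseObjects(topology, cmap_threshold=0.15, link_threshold=0.15)

# Reload the torch2trt-optimized model produced on the first run of the notebooks.
model_trt = TRTModule()
model_trt.load_state_dict(torch.load('model/hand_pose_resnet18_att_244_244_trt.pth'))

mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()
std = torch.Tensor([0.229, 0.224, 0.225]).cuda()

def preprocess(image):
    # BGR8/HWC camera frame -> normalized CHW float tensor on the GPU.
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = PIL.Image.fromarray(image)
    image = transforms.functional.to_tensor(image).cuda()
    image.sub_(mean[:, None, None]).div_(std[:, None, None])
    return image[None, ...]

# 'frame.jpg' stands in for a live 224x224 camera frame.
image = cv2.imread('frame.jpg')
cmap, paf = model_trt(preprocess(image))
cmap, paf = cmap.detach().cpu(), paf.detach().cpu()
# counts/objects/peaks describe the detected hand keypoints.
counts, objects, peaks = parse_objects(cmap, paf)
```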
================================================
FILE: cursor_control_live_demo.ipynb
================================================
{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Matplotlib created a temporary config/cache directory at /tmp/matplotlib-q__yfjpz because the default path (/home/mikyas/.cache/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.\n" ] } ], "source": [ "import json\n", "import cv2\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg \n", "import trt_pose.coco\n", "import math\n", "import os\n", "import numpy as np\n", "import traitlets\n", "import pickle \n", "import pyautogui\n", "import time " ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "<All keys matched successfully>" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open('preprocess/hand_pose.json', 'r') as f:\n", " hand_pose = json.load(f)\n", "\n", "topology = trt_pose.coco.coco_category_to_topology(hand_pose)\n", "import trt_pose.models\n", "\n", "num_parts = len(hand_pose['keypoints'])\n", "num_links = len(hand_pose['skeleton'])\n", "\n", "model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()\n", "import torch\n", "\n", "\n", "WIDTH = 224\n", "HEIGHT = 224\n", "data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()\n", "\n", "if not os.path.exists('model/hand_pose_resnet18_att_244_244_trt.pth'):\n", " MODEL_WEIGHTS = 'model/hand_pose_resnet18_att_244_244.pth'\n", " model.load_state_dict(torch.load(MODEL_WEIGHTS))\n", " import torch2trt\n", " model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)\n", " OPTIMIZED_MODEL = 'model/hand_pose_resnet18_att_244_244_trt.pth'\n", " torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)\n", "\n", "\n", "OPTIMIZED_MODEL = 'model/hand_pose_resnet18_att_244_244_trt.pth'\n", "from torch2trt import TRTModule\n", "\n", "model_trt = TRTModule()\n", "model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from trt_pose.draw_objects import DrawObjects\n", "from trt_pose.parse_objects import ParseObjects\n", "\n", "parse_objects = ParseObjects(topology,cmap_threshold=0.15, link_threshold=0.15)\n", "draw_objects = DrawObjects(topology)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "\n", "import torchvision.transforms as transforms\n", "import PIL.Image\n", "\n", "mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()\n", "std = torch.Tensor([0.229, 0.224, 0.225]).cuda()\n", "device = torch.device('cuda')\n", "\n", "def preprocess(image):\n", " global device\n", " device = torch.device('cuda')\n", " image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n", " image = PIL.Image.fromarray(image)\n", " image = transforms.functional.to_tensor(image).to(device)\n", " image.sub_(mean[:, None, None]).div_(std[:, None, None])\n", " return image[None, ...]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.pipeline import make_pipeline\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.svm import SVC\n", "clf = make_pipeline(StandardScaler(), SVC(gamma='auto', kernel='rbf'))" ] }, { "cell_type": 
"code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from preprocessdata import preprocessdata\n", "preprocessdata = preprocessdata(topology, num_parts)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "svm_train = False\n", "if svm_train:\n", " clf, predicted = preprocessdata.trainsvm(clf, joints_train, joints_test, labels_train, hand.labels_test)\n", " filename = 'svmmodel.sav'\n", " pickle.dump(clf, open(filename, 'wb'))\n", "else:\n", " filename = 'svmmodel.sav'\n", " clf = pickle.load(open(filename, 'rb'))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from jetcam.usb_camera import USBCamera\n", "from jetcam.csi_camera import CSICamera\n", "from jetcam.utils import bgr8_to_jpeg\n", "\n", "camera = USBCamera(width=WIDTH, height=HEIGHT, capture_fps=30, capture_device=1)\n", "#camera = CSICamera(width=WIDTH, height=HEIGHT, capture_fps=30)\n", "\n", "camera.running = True" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "scrolled": false }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ae138f658a94450a9f45d6f81506aac0", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'', format='jpeg', height='256', width='256')" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import ipywidgets\n", "from IPython.display import display\n", "\n", "\n", "image_w = ipywidgets.Image(format='jpeg', width=224, height=224)\n", "display(image_w)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def draw_joints(image, joints):\n", " count = 0\n", " for i in joints:\n", " if i==[0,0]:\n", " count+=1\n", " if count>= 19:\n", " return \n", " for i in joints:\n", " cv2.circle(image, (i[0],i[1]), 2, (0,0,255), 1)\n", " cv2.circle(image, (joints[0][0],joints[0][1]), 2, (255,0,255), 1)\n", " for i in hand_pose['skeleton']:\n", " if joints[i[0]-1][0]==0 or joints[i[1]-1][0] == 0:\n", " break\n", " cv2.line(image, (joints[i[0]-1][0],joints[i[0]-1][1]), (joints[i[1]-1][0],joints[i[1]-1][1]), (0,255,0), 1)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "screenWidth, screenHeight = pyautogui.size()\n", "p_text = 'none'\n", "p_sc = 0\n", "cur_x, cur_y = pyautogui.position()\n", "fixed_x, fixed_y = pyautogui.position()\n", "pyautogui.FAILSAFE = False\n", "t0 = time.time()\n", "def control_cursor(text, joints):\n", " global p_text\n", " global p_sc\n", " global t0\n", " global cur_x\n", " global cur_y\n", " global fixed_x, fixed_y\n", " cursor_joint = 6\n", " if p_text!=\"pan\":\n", " #pyautogui.position()\n", " fixed_x = joints[cursor_joint][0]\n", " fixed_y = joints[cursor_joint][1] \n", " if p_text!=\"click\" and text==\"click\":\n", " pyautogui.mouseUp(((joints[cursor_joint][0])*screenWidth)/256, ((joints[cursor_joint][1])*screenHeight)/256, button= 'left')\n", " pyautogui.click()\n", " if text == \"pan\":\n", " if joints[cursor_joint]!=[0,0]:\n", " pyautogui.mouseUp(((joints[cursor_joint][0])*screenWidth)/256, ((joints[cursor_joint][1])*screenHeight)/256, button= 'left')\n", "\n", " pyautogui.moveTo(((joints[cursor_joint][0])*screenWidth)/256, ((joints[cursor_joint][1])*screenHeight)/256)\n", " if text == \"scroll\":\n", " \n", " if joints[cursor_joint]!=[0,0] and joints[0]!=[0,0]:\n", " pyautogui.mouseUp(((joints[cursor_joint][0])*screenWidth)/256, ((joints[cursor_joint][1])*screenHeight)/256, button= 'left')#to_scroll = 
(joints[8][1]-joints[0][1])/10\n", " to_scroll = (p_sc-joints[cursor_joint][1])\n", " if to_scroll>0:\n", " to_scroll = 1\n", " else:\n", " to_scroll = -1\n", " pyautogui.scroll(int(to_scroll),x=(joints[cursor_joint][0]*screenWidth)/256, y=(joints[cursor_joint][1]*screenHeight)/256)\n", " if text == \"zoom\":\n", " \n", " \n", " pyautogui.keyDown('ctrl')\n", " if joints[cursor_joint]!=[0,0] and joints[0]!=[0,0]:\n", " pyautogui.mouseUp(((joints[cursor_joint][0])*screenWidth)/256, ((joints[cursor_joint][1])*screenHeight)/256, button= 'left')\n", " \n", " to_scroll = (p_sc-joints[cursor_joint][1])\n", " if to_scroll>0:\n", " to_scroll = 1\n", " else:\n", " to_scroll = -1\n", " t1 = time.time()\n", " #print(t1-t0)\n", " if t1-t0>1: \n", " pyautogui.scroll(int(to_scroll),x=(joints[cursor_joint][0]*screenWidth)/256, y=(joints[cursor_joint][1]*screenHeight)/256)\n", " t0 = time.time()\n", " pyautogui.keyUp('ctrl')\n", " \n", " \n", " if text == \"drag\":\n", " \n", " if joints[cursor_joint]!=[0,0]:\n", " pyautogui.mouseDown(((joints[cursor_joint][0])*screenWidth)/256, ((joints[cursor_joint][1])*screenHeight)/256, button= 'left')\n", " \n", " \n", " p_text = text\n", " p_sc = joints[cursor_joint][1]" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def execute(change):\n", " image = change['new']\n", " data = preprocess(image)\n", " cmap, paf = model_trt(data)\n", " cmap, paf = cmap.detach().cpu(), paf.detach().cpu()\n", " counts, objects, peaks = parse_objects(cmap, paf)#, cmap_threshold=0.15, link_threshold=0.15)\n", " #draw_objects(image, counts, objects, peaks)\n", " joints = preprocessdata.joints_inference(image, counts, objects, peaks)\n", " \n", " dist_bn_joints = preprocessdata.find_distance(joints)\n", " gesture = clf.predict([dist_bn_joints,[0]*num_parts*num_parts])\n", " gesture_joints = gesture[0]\n", " preprocessdata.prev_queue.append(gesture_joints)\n", " preprocessdata.prev_queue.pop(0)\n", " preprocessdata.print_label(image, preprocessdata.prev_queue)\n", " #draw_joints(image, joints)\n", " control_cursor(preprocessdata.text, joints)\n", " image_w.value = bgr8_to_jpeg(image)\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "execute({'new': camera.value})" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "camera.observe(execute, names='value')" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "camera.unobserve_all()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#camera.running = False" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: data_collection/gesture_data_collection.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": 
1, "metadata": {}, "outputs": [], "source": [ "import os\n", "import json" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook creates a dataset (images and labels as a json file). The dataset created can be used for pose classification. \n", "In order to create a new dataset for gesture recoginition specify the following parameters \n", "\n", "**no_of_classes** - Number of classes to be created. i.e. For hand pose the number of hand gestures to be created.\n", "\n", "**path_dir** - Path to the directory to be created\n", "\n", "**dataset_name** - The name of the dataset to be created\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "def create_directories_for_classes(no_of_classes, path_dir, dataset_name):\n", " dir_ = os.path.join(path_dir, dataset_name)\n", " for i in range(no_of_classes):\n", " dir_to_create = os.path.join(dir_,\"%s\" % (i+1))\n", " try:\n", " os.makedirs(dir_to_create)\n", " except FileExistsError:\n", " print(os.path.join(\"The following directory was not created because it already exsists\", dir_ , ))\n" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "dir_datasets = '/home/mikyas/mike_dataset/jj/'\n", "dataset_name = \"hand_dataset\"\n", "no_of_classes = 5\n", "create_directories_for_classes(no_of_classes, dir_datasets, dataset_name )" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import ipywidgets.widgets as widgets\n", "dir_ = os.path.join(dir_datasets, dataset_name)\n", "curr_class_no = 1\n", "button_layout = widgets.Layout(width='128px', height='32px')\n", "curr_dir = os.path.join(dir_,'%s'%curr_class_no )\n", "collecting_button = widgets.Button(description= 'Collect Class ' + str(curr_class_no), button_style='success', layout=button_layout)\n", "prev_button = widgets.Button(description='Previous Class', button_style='primary', layout=button_layout)\n", "nxt_button = widgets.Button(description='Next Class', button_style='info', layout=button_layout)\n", "\n", "dir_count = widgets.IntText(layout=button_layout, value=len(os.listdir(curr_dir)))\n", "dir_count.continuous_update" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [], "source": [ "from uuid import uuid1\n", "def save_snapshot(directory):\n", " image_path = os.path.join(directory, str(uuid1()) + '.jpg')\n", " with open(image_path, 'wb') as f:\n", " f.write(image_w.value)\n", "def save_dir():\n", " global curr_dir, dir_count\n", " save_snapshot(curr_dir)\n", " dir_count.value = len(os.listdir(curr_dir))\n", "def prev_dir():\n", " global curr_class_no, curr_dir, no_of_classes\n", " if curr_class_no>1:\n", " curr_class_no-=1\n", " curr_dir = os.path.join(dir_,'%s'%curr_class_no )\n", " collecting_button.description = 'Collect Class ' + str(curr_class_no)\n", " dir_count.value = len(os.listdir(curr_dir))\n", " dir_count.continuous_update\n", "def nxt_dir():\n", " global curr_class_no, curr_dir, no_of_classes\n", " if curr_class_no" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open('preprocess/hand_pose.json', 'r') as f:\n", " hand_pose = json.load(f)\n", "\n", "topology = trt_pose.coco.coco_category_to_topology(hand_pose)\n", "import trt_pose.models\n", "\n", "num_parts = len(hand_pose['keypoints'])\n", "num_links = len(hand_pose['skeleton'])\n", "\n", 
"model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()\n", "import torch\n", "\n", "\n", "WIDTH = 224\n", "HEIGHT = 224\n", "data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()\n", "\n", "if not os.path.exists('model/hand_pose_resnet18_att_244_244_trt.pth'):\n", " MODEL_WEIGHTS = 'model/hand_pose_resnet18_att_244_244.pth'\n", " model.load_state_dict(torch.load(MODEL_WEIGHTS))\n", " import torch2trt\n", " model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)\n", " OPTIMIZED_MODEL = 'model/hand_pose_resnet18_att_244_244_trt.pth'\n", " torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)\n", "\n", "\n", "OPTIMIZED_MODEL = 'model/hand_pose_resnet18_att_244_244_trt.pth'\n", "from torch2trt import TRTModule\n", "\n", "model_trt = TRTModule()\n", "model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from trt_pose.draw_objects import DrawObjects\n", "from trt_pose.parse_objects import ParseObjects\n", "\n", "parse_objects = ParseObjects(topology,cmap_threshold=0.12, link_threshold=0.15)\n", "draw_objects = DrawObjects(topology)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "\n", "import torchvision.transforms as transforms\n", "import PIL.Image\n", "\n", "mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()\n", "std = torch.Tensor([0.229, 0.224, 0.225]).cuda()\n", "device = torch.device('cuda')\n", "\n", "def preprocess(image):\n", " global device\n", " device = torch.device('cuda')\n", " image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n", " image = PIL.Image.fromarray(image)\n", " image = transforms.functional.to_tensor(image).to(device)\n", " image.sub_(mean[:, None, None]).div_(std[:, None, None])\n", " return image[None, ...]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.pipeline import make_pipeline\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.svm import SVC\n", "clf = make_pipeline(StandardScaler(), SVC(gamma='auto', kernel='rbf'))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from preprocessdata import preprocessdata\n", "preprocessdata = preprocessdata(topology, num_parts)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "svm_train = False\n", "if svm_train:\n", " clf, predicted = preprocessdata.trainsvm(clf, joints_train, joints_test, hand.labels_train, hand.labels_test)\n", " filename = 'svmmodel.sav'\n", " pickle.dump(clf, open(filename, 'wb'))\n", "else:\n", " filename = 'svmmodel.sav'\n", " clf = pickle.load(open(filename, 'rb'))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "with open('preprocess/gesture.json', 'r') as f:\n", " gesture = json.load(f)\n", "gesture_type = gesture[\"classes\"]\n" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from jetcam.usb_camera import USBCamera\n", "from jetcam.csi_camera import CSICamera\n", "from jetcam.utils import bgr8_to_jpeg\n", "\n", "camera = USBCamera(width=WIDTH, height=HEIGHT, capture_fps=30, capture_device=1)\n", "#camera = CSICamera(width=WIDTH, height=HEIGHT, capture_fps=30)\n", "\n", "camera.running = True" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def draw_joints(image, joints):\n", " count = 0\n", " for i in 
joints:\n", " if i==[0,0]:\n", " count+=1\n", " if count>= 3:\n", " return \n", " for i in joints:\n", " cv2.circle(image, (i[0],i[1]), 2, (0,0,255), 1)\n", " cv2.circle(image, (joints[0][0],joints[0][1]), 2, (255,0,255), 1)\n", " for i in hand_pose['skeleton']:\n", " if joints[i[0]-1][0]==0 or joints[i[1]-1][0] == 0:\n", " break\n", " cv2.line(image, (joints[i[0]-1][0],joints[i[0]-1][1]), (joints[i[1]-1][0],joints[i[1]-1][1]), (0,255,0), 1)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "33e9829c8753413d8dbc0bc40221d92c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'', format='jpeg', height='256', width='256')" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import ipywidgets\n", "from IPython.display import display\n", "\n", "\n", "image_w = ipywidgets.Image(format='jpeg', width=224, height=224)\n", "display(image_w)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "def execute(change):\n", " image = change['new']\n", " data = preprocess(image)\n", " cmap, paf = model_trt(data)\n", " cmap, paf = cmap.detach().cpu(), paf.detach().cpu()\n", " counts, objects, peaks = parse_objects(cmap, paf)\n", " joints = preprocessdata.joints_inference(image, counts, objects, peaks)\n", " draw_joints(image, joints)\n", " #draw_objects(image, counts, objects, peaks)\n", " dist_bn_joints = preprocessdata.find_distance(joints)\n", " gesture = clf.predict([dist_bn_joints,[0]*num_parts*num_parts])\n", " gesture_joints = gesture[0]\n", " preprocessdata.prev_queue.append(gesture_joints)\n", " preprocessdata.prev_queue.pop(0)\n", " preprocessdata.print_label(image, preprocessdata.prev_queue, gesture_type)\n", " image_w.value = bgr8_to_jpeg(image)\n" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "execute({'new': camera.value})" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "camera.observe(execute, names='value')" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "#camera.unobserve_all()" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "#camera.running = False" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: gesture_classifier.py ================================================ from sklearn.pipeline import make_pipeline from sklearn.preprocessing import StandardScaler class gesture_classifier: def __init__(self): pass def svm_accuracy(self, test_predicted, labels_test): predicted = [] for i in range(len(labels_test)): if labels_test[i]==test_predicted[i]: predicted.append(0) else: predicted.append(1) accuracy = 1 - sum(predicted)/len(labels_test) return accuracy def 
trainsvm(self, clf, train_data, test_data, labels_train, labels_test): clf.fit(train_data,labels_train) predicted_test = clf.predict(test_data) return clf, predicted_test ================================================ FILE: gesture_data_collection_pose.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Matplotlib created a temporary config/cache directory at /tmp/matplotlib-bhsk56tr because the default path (/home/mikyas/.cache/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.\n" ] } ], "source": [ "import json\n", "import cv2\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg \n", "import trt_pose.coco\n", "import math\n", "import os\n", "import numpy as np\n", "import traitlets\n", "import sys\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open('preprocess/hand_pose.json', 'r') as f:\n", " hand_pose = json.load(f)\n", "\n", "topology = trt_pose.coco.coco_category_to_topology(hand_pose)\n", "import trt_pose.models\n", "\n", "num_parts = len(hand_pose['keypoints'])\n", "num_links = len(hand_pose['skeleton'])\n", "\n", "model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()\n", "import torch\n", "\n", "\n", "WIDTH = 224\n", "HEIGHT = 224\n", "data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()\n", "\n", "if not os.path.exists('model/hand_pose_resnet18_att_244_244_trt.pth'):\n", " MODEL_WEIGHTS = 'model/hand_pose_resnet18_att_244_244.pth'\n", " model.load_state_dict(torch.load(MODEL_WEIGHTS))\n", " import torch2trt\n", " model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)\n", " OPTIMIZED_MODEL = 'model/hand_pose_resnet18_att_244_244_trt.pth'\n", " torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)\n", "\n", "\n", "OPTIMIZED_MODEL = 'model/hand_pose_resnet18_att_244_244_trt.pth'\n", "from torch2trt import TRTModule\n", "\n", "model_trt = TRTModule()\n", "model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from trt_pose.draw_objects import DrawObjects\n", "from trt_pose.parse_objects import ParseObjects\n", "\n", "parse_objects = ParseObjects(topology,cmap_threshold=0.15, link_threshold=0.15)\n", "draw_objects = DrawObjects(topology)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def create_directories_for_classes(no_of_classes, path_dir, dataset_name):\n", " dir_ = os.path.join(path_dir, dataset_name)\n", " for i in range(no_of_classes):\n", " dir_to_create = os.path.join(dir_,\"%s\" % (i+1))\n", " try:\n", " os.makedirs(dir_to_create)\n", " except FileExistsError:\n", " print(os.path.join(\"The following directory was not created because it already exsists\", dir_ , ))\n" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/mikyas/data_collection/hand_dataset_train\n", "/home/mikyas/data_collection/hand_dataset_train\n", "/home/mikyas/data_collection/hand_dataset_train\n", 
"/home/mikyas/data_collection/hand_dataset_train\n", "/home/mikyas/data_collection/hand_dataset_train\n", "/home/mikyas/data_collection/hand_dataset_train\n" ] } ], "source": [ "dir_datasets = '/home/mikyas/data_collection/' #give the path to where you want to save you collected data\n", "dataset_name = \"hand_dataset_train\" #change this to hand_dataset_test when you are collecting data for test\n", "no_of_classes = 6\n", "create_directories_for_classes(no_of_classes, dir_datasets, dataset_name )" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import ipywidgets.widgets as widgets\n", "dir_ = os.path.join(dir_datasets, dataset_name)\n", "curr_class_no = 1\n", "button_layout = widgets.Layout(width='128px', height='32px')\n", "curr_dir = os.path.join(dir_,'%s'%curr_class_no )\n", "collecting_button = widgets.Button(description= 'Collect Class ' + str(curr_class_no), button_style='success', layout=button_layout)\n", "prev_button = widgets.Button(description='Previous Class', button_style='primary', layout=button_layout)\n", "nxt_button = widgets.Button(description='Next Class', button_style='info', layout=button_layout)\n", "\n", "dir_count = widgets.IntText(layout=button_layout, value=len(os.listdir(curr_dir)))\n", "dir_count.continuous_update" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "from uuid import uuid1\n", "def save_snapshot(directory):\n", " image_path = os.path.join(directory, str(uuid1()) + '.jpg')\n", " with open(image_path, 'wb') as f:\n", " f.write(image_s.value)\n", "def save_dir():\n", " global curr_dir, dir_count\n", " save_snapshot(curr_dir)\n", " dir_count.value = len(os.listdir(curr_dir))\n", "def prev_dir():\n", " global curr_class_no, curr_dir, no_of_classes\n", " if curr_class_no>1:\n", " curr_class_no-=1\n", " curr_dir = os.path.join(dir_,'%s'%curr_class_no )\n", " collecting_button.description = 'Collect Class ' + str(curr_class_no)\n", " dir_count.value = len(os.listdir(curr_dir))\n", " dir_count.continuous_update\n", "def nxt_dir():\n", " global curr_class_no, curr_dir, no_of_classes\n", " if curr_class_no= 3:\n", " return \n", " for i in joints:\n", " cv2.circle(image, (i[0],i[1]), 2, (0,0,255), 1)\n", " cv2.circle(image, (joints[0][0],joints[0][1]), 2, (255,0,255), 1)\n", " for i in hand_pose['skeleton']:\n", " if joints[i[0]-1][0]==0 or joints[i[1]-1][0] == 0:\n", " break\n", " cv2.line(image, (joints[i[0]-1][0],joints[i[0]-1][1]), (joints[i[1]-1][0],joints[i[1]-1][1]), (0,255,0), 1)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "from jetcam.usb_camera import USBCamera\n", "from jetcam.csi_camera import CSICamera\n", "from jetcam.utils import bgr8_to_jpeg\n", "\n", "camera = USBCamera(width=WIDTH, height=HEIGHT, capture_fps=30, capture_device=1)\n", "#camera = CSICamera(width=WIDTH, height=HEIGHT, capture_fps=30)\n", "\n", "camera.running = True" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "import ipywidgets\n", "from IPython.display import display\n", "\n", "\n", "image_w = ipywidgets.Image(format='jpeg', width=224, height=224)\n", "image_s = ipywidgets.Image(format='jpeg', width=224, height=224)\n" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": 
"375b9e6d61384dc1beb1b7cc25a0666b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'', format='jpeg', height='224', width='224')" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "380b01ad84b340c1a9342ccea6ad1b46", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntText(value=0, layout=Layout(height='32px', width='128px')), Button(button_style='success', d…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d954768e10cf46fb963d60c733baf892", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(Button(button_style='info', description='Next Class', layout=Layout(height='32px', width='128px…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "13412c28b4ca46a7b41ab27decca0214", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(Button(button_style='primary', description='Previous Class', layout=Layout(height='32px', width…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(image_w)\n", "display(widgets.HBox([dir_count, collecting_button]))\n", "display(widgets.HBox([ nxt_button]))\n", "display(widgets.HBox([ prev_button]))" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "def execute(change):\n", " image = change['new']\n", " image_s.value = bgr8_to_jpeg(image[:, ::-1, :])\n", " data = preprocess(image)\n", " cmap, paf = model_trt(data)\n", " cmap, paf = cmap.detach().cpu(), paf.detach().cpu()\n", " counts, objects, peaks = parse_objects(cmap, paf)\n", " joints = preprocessdata.joints_inference(image, counts, objects, peaks)\n", " draw_joints(image, joints)\n", " #draw_objects(image, counts, objects, peaks)# try this for multiple hand pose prediction \n", " image_w.value = bgr8_to_jpeg(image[:, ::-1, :])" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "execute({'new': camera.value})" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "camera.observe(execute, names='value')" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [], "source": [ "camera.unobserve_all()" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "#camera.running = False" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "def generate_labels(dir_, dataset_name):\n", " labels = []\n", " starting_label = 1\n", " for i in range(len(os.listdir(dir_))):\n", " dir_to_check = os.path.join(dir_,\"%s\" % (i+1))\n", " for j in range(len(os.listdir(dir_to_check))):\n", " labels.append(starting_label)\n", " starting_label+=1\n", " labels_to_dict = {\"labels\": labels}\n", " with open((dir_+'.json'), 'w') as outfile:\n", " json.dump(labels_to_dict, outfile)\n", " return labels " ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [], "source": [ "def rename_images(dir_):\n", " overall_count = 0\n", " #dir_ = dir_+dataset_name\n", " for i in range(len(os.listdir(dir_))):\n", " dir_to_check = os.path.join(dir_,\"%s\" % (i+1))\n", " dir_to_check+='/'\n", " for count, filename in enumerate(os.listdir(dir_to_check)):\n", " dst = \"%08d.jpg\"% overall_count\n", " src = dir_to_check+filename\n", " dst = dir_to_check+dst \n", " 
os.rename(src, dst)\n", " overall_count+=1" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [], "source": [ "generate_labels(dir_, dataset_name)\n", "rename_images(dir_)" ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'/home/mikyas/chitoku/data_collection/hand_dataset_test/hand_dataset_test.json'" ] }, "execution_count": 41, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import shutil\n", "dir_training = dir_datasets +'/training/'#change this to /test/ when you are collecting data for test\n", "try:\n", " os.makedirs(dir_training)\n", "except FileExistsError:\n", " print(os.path.join(\"The following directory was not created because it already exsists\", dir_ , ))\n", "for i in range(len(os.listdir(dir_))):\n", " dir_to_check = os.path.join(dir_,\"%s\" % (i+1))+'/'\n", " for count, filename in enumerate(os.listdir(dir_to_check)):\n", " src = dir_to_check+filename\n", " shutil.move(src,dir_training)\n", " os.rmdir(dir_to_check)\n", "shutil.move(dir_training,dir_)\n", "shutil.move(dir_+'.json',dir_)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: gesture_training/train_gesture_classification.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Matplotlib created a temporary config/cache directory at /tmp/matplotlib-mn8hww0h because the default path (/home/mikyas/.cache/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.\n" ] } ], "source": [ "import json\n", "import cv2\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg \n", "import trt_pose.coco\n", "import math\n", "import os\n", "import numpy as np\n", "import traitlets\n", "import pickle \n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open('hand_pose.json', 'r') as f:\n", " hand_pose = json.load(f)\n", "\n", "topology = trt_pose.coco.coco_category_to_topology(hand_pose)\n", "import trt_pose.models\n", "\n", "num_parts = len(hand_pose['keypoints'])\n", "num_links = len(hand_pose['skeleton'])\n", "\n", "model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()\n", "import torch\n", "\n", "\n", "WIDTH = 256\n", "HEIGHT = 256\n", "data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()\n", "\n", "if not os.path.exists('resnet18_244x224_epoch_4150_trt.pth'):\n", " MODEL_WEIGHTS = 'resnet18_244x224_epoch_4150.pth'\n", " model.load_state_dict(torch.load(MODEL_WEIGHTS))\n", " import torch2trt\n", " model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)\n", " OPTIMIZED_MODEL = 
'resnet18_244x224_epoch_4150_trt.pth'\n", " torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)\n", "\n", "\n", "OPTIMIZED_MODEL = 'resnet18_244x224_epoch_4150_trt.pth'\n", "from torch2trt import TRTModule\n", "\n", "model_trt = TRTModule()\n", "model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from trt_pose.draw_objects import DrawObjects\n", "from trt_pose.parse_objects import ParseObjects\n", "\n", "parse_objects = ParseObjects(topology,cmap_threshold=0.12, link_threshold=0.15)\n", "draw_objects = DrawObjects(topology)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "\n", "import torchvision.transforms as transforms\n", "import PIL.Image\n", "\n", "mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()\n", "std = torch.Tensor([0.229, 0.224, 0.225]).cuda()\n", "device = torch.device('cuda')\n", "\n", "def preprocess(image):\n", " global device\n", " device = torch.device('cuda')\n", " image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n", " image = PIL.Image.fromarray(image)\n", " image = transforms.functional.to_tensor(image).to(device)\n", " image.sub_(mean[:, None, None]).div_(std[:, None, None])\n", " return image[None, ...]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.pipeline import make_pipeline\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.svm import SVC\n", "clf = make_pipeline(StandardScaler(), SVC(gamma='auto', kernel='rbf'))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from preprocessdata import preprocessdata\n", "preprocessdata = preprocessdata(topology, num_parts)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from dataloader import dataloader\n", "path = \"/home/mikyas/mike_dataset/gestures/hand_dataset/\"\n", "label_file = \"hand_dataset.json\"\n", "test_label = \"hand_dataset_test.json\"\n", "hand = dataloader(path, label_file, test_label)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def data_preprocess(images):\n", " dist_bn_joints_all_data = []\n", " for im in images:\n", " im = im[:, ::-1, :]\n", " data_im = preprocess(im)\n", " cmap, paf = model_trt(data_im)\n", " cmap, paf = cmap.detach().cpu(), paf.detach().cpu()\n", " counts, objects, peaks = parse_objects(cmap, paf)\n", " joints = preprocessdata.joints_inference(im, counts, objects, peaks)\n", " dist_bn_joints = preprocessdata.find_distance(joints)\n", " dist_bn_joints_all_data.append(dist_bn_joints)\n", " return dist_bn_joints_all_data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def smaller_dataset(dataset, no_samples_per_class, no_of_classes):\n", " total_samples_per_class =100\n", " start = 0\n", " end = no_samples_per_class\n", " new_dataset = []\n", " labels = []\n", " for i in range(no_of_classes):\n", " new_data = dataset[start:end]\n", " start = start+total_samples_per_class\n", " end = start+no_samples_per_class\n", " new_dataset.extend(new_data)\n", " labels.extend([i+1]*no_samples_per_class)\n", " return new_dataset, labels" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_images, labels_train = hand.smaller_dataset(hand.train_images,100,6)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": 
[], "source": [ "joints_train = data_preprocess(hand.train_images)\n", "joints_test = data_preprocess(hand.test_images)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "svm_train = False\n", "if svm_train:\n", " clf, predicted = preprocessdata.trainsvm(clf, joints_train, joints_test, hand.labels_train, hand.labels_test)\n", " filename = 'svmmodel_new.sav'\n", " pickle.dump(clf, open(filename, 'wb'))\n", "else:\n", " filename = 'svmmodel.sav'\n", " clf = pickle.load(open(filename, 'rb'))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "preprocessdata.svm_accuracy(clf.predict(joints_test), hand.labels_test)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "clf.predict([joints_test[40],[0]*num_parts*num_parts])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "clf.predict(joints_test)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from jetcam.usb_camera import USBCamera\n", "from jetcam.csi_camera import CSICamera\n", "from jetcam.utils import bgr8_to_jpeg\n", "\n", "camera = USBCamera(width=WIDTH, height=HEIGHT, capture_fps=30, capture_device=1)\n", "#camera = CSICamera(width=WIDTH, height=HEIGHT, capture_fps=30)\n", "\n", "camera.running = True" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0eb636d637824f2596b9f26ee5c970c1", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'', format='jpeg', height='256', width='256')" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import ipywidgets\n", "from IPython.display import display\n", "\n", "\n", "image_w = ipywidgets.Image(format='jpeg', width=256, height=256)\n", "display(image_w)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "def execute(change):\n", " image = change['new']\n", " data = preprocess(image)\n", " cmap, paf = model_trt(data)\n", " cmap, paf = cmap.detach().cpu(), paf.detach().cpu()\n", " counts, objects, peaks = parse_objects(cmap, paf)#, cmap_threshold=0.15, link_threshold=0.15)\n", " draw_objects(image, counts, objects, peaks)\n", " joints = preprocessdata.joints_inference(image, counts, objects, peaks)\n", " dist_bn_joints = preprocessdata.find_distance(joints)\n", " gesture = clf.predict([dist_bn_joints,[0]*num_parts*num_parts])\n", " gesture_joints = gesture[0]\n", " preprocessdata.prev_queue.append(gesture_joints)\n", " preprocessdata.prev_queue.pop(0)\n", " preprocessdata.print_label(image, preprocessdata.prev_queue)\n", " image_w.value = bgr8_to_jpeg(image)\n" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "execute({'new': camera.value})" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "camera.observe(execute, names='value')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "camera.unobserve_all()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#camera.running = False" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: live_hand_pose.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Matplotlib created a temporary config/cache directory at /tmp/matplotlib-73_p9iw7 because the default path (/home/mikyas/.cache/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.\n" ] } ], "source": [ "import json\n", "import cv2\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg \n", "import trt_pose.coco\n", "import math\n", "import os\n", "import numpy as np\n", "import traitlets\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open('preprocess/hand_pose.json', 'r') as f:\n", " hand_pose = json.load(f)\n", "\n", "topology = trt_pose.coco.coco_category_to_topology(hand_pose)\n", "import trt_pose.models\n", "\n", "num_parts = len(hand_pose['keypoints'])\n", "num_links = len(hand_pose['skeleton'])\n", "\n", "model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()\n", "import torch\n", "\n", "\n", "WIDTH = 224\n", "HEIGHT = 224\n", "data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()\n", "\n", "if not os.path.exists('model/hand_pose_resnet18_att_244_244_trt.pth'):\n", " MODEL_WEIGHTS = 'model/hand_pose_resnet18_att_244_244.pth'\n", " model.load_state_dict(torch.load(MODEL_WEIGHTS))\n", " import torch2trt\n", " model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)\n", " OPTIMIZED_MODEL = 'model/hand_pose_resnet18_att_244_244_trt.pth'\n", " torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)\n", "\n", "\n", "OPTIMIZED_MODEL = 'model/hand_pose_resnet18_att_244_244_trt.pth'\n", "from torch2trt import TRTModule\n", "\n", "model_trt = TRTModule()\n", "model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from trt_pose.draw_objects import DrawObjects\n", "from trt_pose.parse_objects import ParseObjects\n", "\n", "parse_objects = ParseObjects(topology,cmap_threshold=0.15, link_threshold=0.15)\n", "draw_objects = DrawObjects(topology)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "\n", "import torchvision.transforms as transforms\n", "import PIL.Image\n", "\n", "mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()\n", "std = torch.Tensor([0.229, 0.224, 0.225]).cuda()\n", "device = torch.device('cuda')\n", "\n", "def preprocess(image):\n", " global device\n", " device = torch.device('cuda')\n", " image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n", " image = PIL.Image.fromarray(image)\n", " image = transforms.functional.to_tensor(image).to(device)\n", " image.sub_(mean[:, 
None, None]).div_(std[:, None, None])\n", " return image[None, ...]" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from preprocessdata import preprocessdata\n", "preprocessdata = preprocessdata(topology, num_parts)\n", "from gesture_classifier import gesture_classifier\n", "gesture_classifier = gesture_classifier()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, let's define a function that will preprocess the image, which is originally in BGR8 / HWC format." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def draw_joints(image, joints):\n", " count = 0\n", " for i in joints:\n", " if i==[0,0]:\n", " count+=1\n", " if count>= 3:\n", " return \n", " for i in joints:\n", " cv2.circle(image, (i[0],i[1]), 2, (0,0,255), 1)\n", " cv2.circle(image, (joints[0][0],joints[0][1]), 2, (255,0,255), 1)\n", " for i in hand_pose['skeleton']:\n", " if joints[i[0]-1][0]==0 or joints[i[1]-1][0] == 0:\n", " break\n", " cv2.line(image, (joints[i[0]-1][0],joints[i[0]-1][1]), (joints[i[1]-1][0],joints[i[1]-1][1]), (0,255,0), 1)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "from jetcam.usb_camera import USBCamera\n", "from jetcam.csi_camera import CSICamera\n", "from jetcam.utils import bgr8_to_jpeg\n", "\n", "camera = USBCamera(width=WIDTH, height=HEIGHT, capture_fps=30, capture_device=1)\n", "#camera = CSICamera(width=WIDTH, height=HEIGHT, capture_fps=30)\n", "\n", "camera.running = True" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "f0833455fca64ca1be4515b072fd6941", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'', format='jpeg', height='224', width='224')" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import ipywidgets\n", "from IPython.display import display\n", "\n", "\n", "image_w = ipywidgets.Image(format='jpeg', width=224, height=224)\n", "display(image_w)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def execute(change):\n", " image = change['new']\n", " data = preprocess(image)\n", " cmap, paf = model_trt(data)\n", " cmap, paf = cmap.detach().cpu(), paf.detach().cpu()\n", " counts, objects, peaks = parse_objects(cmap, paf)\n", " joints = preprocessdata.joints_inference(image, counts, objects, peaks)\n", " draw_joints(image, joints)\n", " #draw_objects(image, counts, objects, peaks)# try this for multiple hand pose prediction \n", " image_w.value = bgr8_to_jpeg(image[:, ::-1, :])" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "execute({'new': camera.value})" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "camera.observe(execute, names='value')" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "camera.unobserve_all()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#camera.running = False" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", 
"name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 } ================================================ FILE: mini_paint_live_demo.ipynb ================================================ { "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Matplotlib created a temporary config/cache directory at /tmp/matplotlib-ljzfvzxa because the default path (/home/mikyas/.cache/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.\n" ] } ], "source": [ "import json\n", "import cv2\n", "import matplotlib.pyplot as plt\n", "import matplotlib.image as mpimg \n", "import trt_pose.coco\n", "import math\n", "import os\n", "import numpy as np\n", "import traitlets\n", "import pickle \n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "with open('preprocess/hand_pose.json', 'r') as f:\n", " hand_pose = json.load(f)\n", "\n", "topology = trt_pose.coco.coco_category_to_topology(hand_pose)\n", "import trt_pose.models\n", "\n", "num_parts = len(hand_pose['keypoints'])\n", "num_links = len(hand_pose['skeleton'])\n", "\n", "model = trt_pose.models.resnet18_baseline_att(num_parts, 2 * num_links).cuda().eval()\n", "import torch\n", "\n", "\n", "WIDTH = 224\n", "HEIGHT = 224\n", "data = torch.zeros((1, 3, HEIGHT, WIDTH)).cuda()\n", "\n", "if not os.path.exists('model/hand_pose_resnet18_att_244_244_trt.pth'):\n", " MODEL_WEIGHTS = 'model/hand_pose_resnet18_att_244_244.pth'\n", " model.load_state_dict(torch.load(MODEL_WEIGHTS))\n", " import torch2trt\n", " model_trt = torch2trt.torch2trt(model, [data], fp16_mode=True, max_workspace_size=1<<25)\n", " OPTIMIZED_MODEL = 'model/hand_pose_resnet18_att_244_244_trt.pth'\n", " torch.save(model_trt.state_dict(), OPTIMIZED_MODEL)\n", "\n", "\n", "OPTIMIZED_MODEL = 'model/hand_pose_resnet18_att_244_244_trt.pth'\n", "from torch2trt import TRTModule\n", "\n", "model_trt = TRTModule()\n", "model_trt.load_state_dict(torch.load(OPTIMIZED_MODEL))" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from trt_pose.draw_objects import DrawObjects\n", "from trt_pose.parse_objects import ParseObjects\n", "\n", "parse_objects = ParseObjects(topology,cmap_threshold=0.12, link_threshold=0.15)\n", "draw_objects = DrawObjects(topology)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "\n", "import torchvision.transforms as transforms\n", "import PIL.Image\n", "\n", "mean = torch.Tensor([0.485, 0.456, 0.406]).cuda()\n", "std = torch.Tensor([0.229, 0.224, 0.225]).cuda()\n", "device = torch.device('cuda')\n", "\n", "def preprocess(image):\n", " global device\n", " device = torch.device('cuda')\n", " image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n", " image = PIL.Image.fromarray(image)\n", " image = transforms.functional.to_tensor(image).to(device)\n", " image.sub_(mean[:, None, None]).div_(std[:, None, None])\n", " return image[None, ...]" ] }, { "cell_type": "code", 
"execution_count": 5, "metadata": {}, "outputs": [], "source": [ "from sklearn.pipeline import make_pipeline\n", "from sklearn.preprocessing import StandardScaler\n", "from sklearn.svm import SVC\n", "clf = make_pipeline(StandardScaler(), SVC(gamma='auto', kernel='rbf'))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from preprocessdata import preprocessdata\n", "preprocessdata = preprocessdata(topology, num_parts)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "svm_train = False\n", "if svm_train:\n", " clf, predicted = preprocessdata.trainsvm(clf, joints_train, joints_test, hand.labels_train, hand.labels_test)\n", " filename = 'svmmodel.sav'\n", " pickle.dump(clf, open(filename, 'wb'))\n", "else:\n", " filename = 'svmmodel.sav'\n", " clf = pickle.load(open(filename, 'rb'))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "from jetcam.usb_camera import USBCamera\n", "from jetcam.csi_camera import CSICamera\n", "from jetcam.utils import bgr8_to_jpeg\n", "\n", "camera = USBCamera(width=WIDTH, height=HEIGHT, capture_fps=30, capture_device=1)\n", "#camera = CSICamera(width=WIDTH, height=HEIGHT, capture_fps=30)\n", "\n", "camera.running = True" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "def draw_joints(image, joints):\n", " count = 0\n", " for i in joints:\n", " if i==[0,0]:\n", " count+=1\n", " if count>= 7:\n", " return \n", " for i in joints:\n", " cv2.circle(image, (i[0],i[1]), 2, (0,0,255), 1)\n", " cv2.circle(image, (joints[0][0],joints[0][1]), 2, (255,0,255), 1)\n", " for i in hand_pose['skeleton']:\n", " if joints[i[0]-1][0]==0 or joints[i[1]-1][0] == 0:\n", " break\n", " cv2.line(image, (joints[i[0]-1][0],joints[i[0]-1][1]), (joints[i[1]-1][0],joints[i[1]-1][1]), (0,255,0), 1)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "with open('preprocess/gesture.json', 'r') as f:\n", " gesture = json.load(f)\n", "gesture_type = gesture[\"paint\"]" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "pen = []\n", "rectangle = []\n", "def draw(image, joints):\n", " global pen\n", " global rectangle\n", " if preprocessdata.text==\"draw\":\n", " pen.append((joints[6][0], joints[6][1]))\n", " for i in range(len(pen)):\n", " cv2.circle(image, pen[i], 1,(0,0,0), 2)\n", " if preprocessdata.text==\"line\":\n", " if joints[5]!=[0,0]:\n", " rectangle.append((joints[6][0], joints[6][1]))\n", " for i in range(len(rectangle)):\n", " if i > 0:\n", " if rectangle[i]!=[0,0]:\n", " cv2.line(image,rectangle[i-1], rectangle[i], (0,0,0), 2)\n", " if preprocessdata.text==\"erase\":\n", " to_be_erased = []\n", " for i in range(10):\n", " for j in range(10):\n", " \n", " x = (joints[6][0]+i, joints[6][1]+j)\n", " if x[0]>=0 or x[1]>=0:\n", " to_be_erased.append(x)\n", " for i in to_be_erased:\n", " if i in pen:\n", " pen.remove(i) \n", " \n", " if preprocessdata.text==\"clear\":\n", " pen.clear()\n", " rectangle.clear()" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7e1c0d5b318f4e998bbe4312c5d829f2", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'', format='jpeg', height='256', width='256')" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import ipywidgets\n", "from 
{ "cell_type": "code", "execution_count": 12, "metadata": { "scrolled": true }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7e1c0d5b318f4e998bbe4312c5d829f2", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Image(value=b'', format='jpeg', height='256', width='256')" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import ipywidgets\n", "from IPython.display import display\n", "\n", "image_w = ipywidgets.Image(format='jpeg', width=224, height=224)\n", "display(image_w)" ] },
{ "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "def execute(change):\n", "    image = change['new']\n", "    data = preprocess(image)\n", "    cmap, paf = model_trt(data)\n", "    cmap, paf = cmap.detach().cpu(), paf.detach().cpu()\n", "    counts, objects, peaks = parse_objects(cmap, paf)\n", "    joints = preprocessdata.joints_inference(image, counts, objects, peaks)\n", "    #draw_objects(image, counts, objects, peaks)\n", "    draw_joints(image, joints)\n", "    dist_bn_joints = preprocessdata.find_distance(joints)\n", "    gesture = clf.predict([dist_bn_joints])  # a single 441-dimensional sample\n", "    gesture_joints = gesture[0]\n", "    preprocessdata.prev_queue.append(gesture_joints)\n", "    preprocessdata.prev_queue.pop(0)\n", "    preprocessdata.print_label(image, preprocessdata.prev_queue, gesture_type)\n", "    draw(image, joints)\n", "    #image = image[:, ::-1, :]\n", "    image_w.value = bgr8_to_jpeg(image)\n" ] },
{ "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "execute({'new': camera.value})" ] },
{ "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "camera.observe(execute, names='value')" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "camera.unobserve_all()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#camera.running = False" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }
], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.9" } }, "nbformat": 4, "nbformat_minor": 4 }
================================================ FILE: model/README.md ================================================
# Download and put your model here
================================================ FILE: preprocess/gesture.json ================================================
{"paint": ["clear", "draw", "click", "line", "erase", "no hand", "no hand"], "mouse": ["drag", "pan", "click", "zoom", "scroll", "no hand", "no hand"], "classes": ["fist", "pan", "stop", "peace", "ok", "no hand", "no hand"]}
================================================ FILE: preprocess/hand_pose.json ================================================
{"supercategory": "hand", "id": 1, "name": "hand", "keypoints": ["palm", "thumb_1", "thumb_2", "thumb_3", "thumb_4", "index_finger_1", "index_finger_2", "index_finger_3", "index_finger_4", "middle_finger_1", "middle_finger_2", "middle_finger_3", "middle_finger_4", "ring_finger_1", "ring_finger_2", "ring_finger_3", "ring_finger_4", "baby_finger_1", "baby_finger_2", "baby_finger_3", "baby_finger_4"], "skeleton": [[1, 5], [1, 9], [1, 13], [1, 17], [1, 21], [2, 3], [3, 4], [4, 5], [6, 7], [7, 8], [8, 9], [10, 11], [11, 12], [12, 13], [14, 15], [15, 16], [16, 17], [18, 19], [19, 20], [20, 21]]}
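hand_pose.json defines 21 keypoints (a palm point plus four joints per finger) and a 20-link skeleton whose endpoints are 1-based keypoint indices; this is why the notebooks' drawing code subtracts 1 before indexing into the joints list. A short sketch of reading that topology, assuming it is run from the repository root:

import json

with open('preprocess/hand_pose.json', 'r') as f:
    hand_pose = json.load(f)

print(len(hand_pose['keypoints']), 'keypoints')   # 21: palm + 4 joints per finger
for a, b in hand_pose['skeleton'][:5]:
    # the first five links join the palm (index 1) to each fingertip
    print(hand_pose['keypoints'][a - 1], '->', hand_pose['keypoints'][b - 1])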
================================================ FILE: preprocessdata.py ================================================
import math
import pickle
import cv2


class preprocessdata:
    def __init__(self, topology, num_parts):
        self.joints = []
        self.dist_bn_joints = []
        self.topology = topology
        self.num_parts = num_parts
        self.text = "no hand"
        self.num_frames = 4
        self.prev_queue = [self.num_frames] * self.num_frames

    def svm_accuracy(self, test_predicted, labels_test):
        """
        This method calculates the accuracy of the model
        Input: test_predicted - predicted test classes
               labels_test - ground-truth test labels
        Output: accuracy of the model
        """
        predicted = []
        for i in range(len(labels_test)):
            if labels_test[i] == test_predicted[i]:
                predicted.append(0)
            else:
                predicted.append(1)
        accuracy = 1 - sum(predicted) / len(labels_test)
        return accuracy

    def trainsvm(self, clf, train_data, test_data, labels_train, labels_test):
        """
        This method trains the different gestures
        Input: clf - scikit-learn model pipeline to train; you can choose an SVM, linear regression, etc.
               train_data - preprocessed training image data, in this case the distances between the joints
               test_data - preprocessed testing image data, in this case the distances between the joints
               labels_train - labels for the training images
               labels_test - labels for the testing images
        Output: trained model, predicted test classes
        """
        clf.fit(train_data, labels_train)
        predicted_test = clf.predict(test_data)
        return clf, predicted_test

    #def loadsvmweights():

    def joints_inference(self, image, counts, objects, peaks):
        """
        This method returns predicted joints from an image/frame
        Input: image, counts, objects, peaks
        Output: predicted joints
        """
        joints_t = []
        height = image.shape[0]
        width = image.shape[1]
        K = self.topology.shape[0]
        count = int(counts[0])
        for i in range(count):
            obj = objects[0][i]
            C = obj.shape[0]
            for j in range(C):
                k = int(obj[j])
                picked_peaks = peaks[0][j][k]
                joints_t.append([round(float(picked_peaks[1]) * width), round(float(picked_peaks[0]) * height)])
        joints_pt = joints_t[:self.num_parts]
        rest_of_joints_t = joints_t[self.num_parts:]
        # when a joint is not predicted in the first association, try to find it in a different association
        for i in range(len(rest_of_joints_t)):
            l = i % self.num_parts
            if joints_pt[l] == [0, 0]:
                joints_pt[l] = rest_of_joints_t[i]
        # if nothing is predicted
        if count == 0:
            joints_pt = [[0, 0]] * self.num_parts
        return joints_pt

    def find_distance(self, joints):
        """
        This method finds the distance between each pair of joints
        Input: a list that contains the [x, y] positions of the 21 joints
        Output: a list that contains the distances between the joints
        """
        joints_features = []
        for i in joints:
            for j in joints:
                dist_between_i_j = math.sqrt((i[0] - j[0]) ** 2 + (i[1] - j[1]) ** 2)
                joints_features.append(dist_between_i_j)
        return joints_features

    def print_label(self, image, gesture_joints, gesture_type):
        """
        This method prints the gesture class detected.
        Example: in the cursor control application it shows whether your
        gesture is a click or another type of gesture
        """
        font = cv2.FONT_HERSHEY_SIMPLEX
        color = (255, 0, 0)
        org = (50, 50)
        thickness = 2
        fontScale = 0.5
        if self.prev_queue == [1] * self.num_frames:
            self.text = gesture_type[0]
        elif self.prev_queue == [2] * self.num_frames:
            self.text = gesture_type[1]
        elif self.prev_queue == [3] * self.num_frames:
            self.text = gesture_type[2]
        elif self.prev_queue == [4] * self.num_frames:
            self.text = gesture_type[3]
        elif self.prev_queue == [5] * self.num_frames:
            self.text = gesture_type[4]
        elif self.prev_queue == [6] * self.num_frames:
            self.text = gesture_type[5]
        elif self.prev_queue == [7] * self.num_frames:
            self.text = gesture_type[6]
        image = cv2.putText(image, self.text, org, font, fontScale, color, thickness, cv2.LINE_AA)
        return image
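find_distance turns the 21 predicted joints into a 21 x 21 = 441-dimensional vector of pairwise distances, which is the feature representation the SVM classifies, and print_label only commits to a gesture once the same class has filled prev_queue for num_frames consecutive frames. Below is a minimal sketch of that flow; the per-frame class values are made up, standing in for the pickled SVM's output.

import math

def find_distance(joints):
    # 21 joints -> 441 pairwise distances, mirroring preprocessdata.find_distance
    return [math.sqrt((i[0] - j[0]) ** 2 + (i[1] - j[1]) ** 2)
            for i in joints for j in joints]

joints = [[x, 2 * x] for x in range(21)]      # dummy [x, y] joint positions
features = find_distance(joints)
assert len(features) == 21 * 21               # the SVM's input dimensionality

num_frames = 4
prev_queue = [num_frames] * num_frames        # initialized as in __init__
for predicted_class in [2, 2, 2, 2]:          # per-frame classifier output
    prev_queue.append(predicted_class)
    prev_queue.pop(0)
print(prev_queue == [2] * num_frames)         # True: gesture 2 is now stable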