Repository: arefmalek/airdraw Branch: main Commit: dac401536421 Files: 9 Total size: 60.5 KB Directory structure: gitextract_vay903ib/ ├── .gitignore ├── LICENSE.md ├── README.md ├── airdraw.py ├── canvas.py ├── data.py ├── hands.py ├── requirements.txt └── util.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ # vscode .vscode/ venv/ # mac stuff b/c I wrote a video file .DS_Store # video files I'm playing w lol *.mp4 # Byte-compiled / optimized / DLL files __pycache__/ *.py[cod] *$py.class # C extensions *.so # Distribution / packaging .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ pip-wheel-metadata/ share/python-wheels/ *.egg-info/ .installed.cfg *.egg MANIFEST # PyInstaller # Usually these files are written by a python script from a template # before PyInstaller builds the exe, so as to inject date/other infos into it. *.manifest *.spec # Installer logs pip-log.txt pip-delete-this-directory.txt # Unit test / coverage reports htmlcov/ .tox/ .nox/ .coverage .coverage.* .cache nosetests.xml coverage.xml *.cover *.py,cover .hypothesis/ .pytest_cache/ # Translations *.mo *.pot # Django stuff: *.log local_settings.py db.sqlite3 db.sqlite3-journal # Flask stuff: instance/ .webassets-cache # Scrapy stuff: .scrapy # Sphinx documentation docs/_build/ # PyBuilder target/ # Jupyter Notebook .ipynb_checkpoints # IPython profile_default/ ipython_config.py # pyenv .python-version # pipenv # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. # However, in case of collaboration, if having platform-specific dependencies or dependencies # having no cross-platform support, pipenv may install dependencies that don't work, or not # install all needed dependencies. #Pipfile.lock # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow __pypackages__/ # Celery stuff celerybeat-schedule celerybeat.pid # SageMath parsed files *.sage.py # Environments .env .venv env/ venv/ ENV/ env.bak/ venv.bak/ # Spyder project settings .spyderproject .spyproject # Rope project settings .ropeproject # mkdocs documentation /site # mypy .mypy_cache/ .dmypy.json dmypy.json # Pyre type checker .pyre/ ================================================ FILE: LICENSE.md ================================================ Mozilla Public License Version 2.0 ================================== 1. Definitions -------------- 1.1. "Contributor" means each individual or legal entity that creates, contributes to the creation of, or owns Covered Software. 1.2. "Contributor Version" means the combination of the Contributions of others (if any) used by a Contributor and that particular Contributor's Contribution. 1.3. "Contribution" means Covered Software of a particular Contributor. 1.4. "Covered Software" means Source Code Form to which the initial Contributor has attached the notice in Exhibit A, the Executable Form of such Source Code Form, and Modifications of such Source Code Form, in each case including portions thereof. 1.5. "Incompatible With Secondary Licenses" means (a) that the initial Contributor has attached the notice described in Exhibit B to the Covered Software; or (b) that the Covered Software was made available under the terms of version 1.1 or earlier of the License, but not also under the terms of a Secondary License. 1.6. "Executable Form" means any form of the work other than Source Code Form. 1.7. "Larger Work" means a work that combines Covered Software with other material, in a separate file or files, that is not Covered Software. 1.8. "License" means this document. 1.9. "Licensable" means having the right to grant, to the maximum extent possible, whether at the time of the initial grant or subsequently, any and all of the rights conveyed by this License. 1.10. 
"Modifications" means any of the following: (a) any file in Source Code Form that results from an addition to, deletion from, or modification of the contents of Covered Software; or (b) any new file in Source Code Form that contains any Covered Software. 1.11. "Patent Claims" of a Contributor means any patent claim(s), including without limitation, method, process, and apparatus claims, in any patent Licensable by such Contributor that would be infringed, but for the grant of the License, by the making, using, selling, offering for sale, having made, import, or transfer of either its Contributions or its Contributor Version. 1.12. "Secondary License" means either the GNU General Public License, Version 2.0, the GNU Lesser General Public License, Version 2.1, the GNU Affero General Public License, Version 3.0, or any later versions of those licenses. 1.13. "Source Code Form" means the form of the work preferred for making modifications. 1.14. "You" (or "Your") means an individual or a legal entity exercising rights under this License. For legal entities, "You" includes any entity that controls, is controlled by, or is under common control with You. For purposes of this definition, "control" means (a) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (b) ownership of more than fifty percent (50%) of the outstanding shares or beneficial ownership of such entity. 2. License Grants and Conditions -------------------------------- 2.1. 
Grants Each Contributor hereby grants You a world-wide, royalty-free, non-exclusive license: (a) under intellectual property rights (other than patent or trademark) Licensable by such Contributor to use, reproduce, make available, modify, display, perform, distribute, and otherwise exploit its Contributions, either on an unmodified basis, with Modifications, or as part of a Larger Work; and (b) under Patent Claims of such Contributor to make, use, sell, offer for sale, have made, import, and otherwise transfer either its Contributions or its Contributor Version. 2.2. Effective Date The licenses granted in Section 2.1 with respect to any Contribution become effective for each Contribution on the date the Contributor first distributes such Contribution. 2.3. Limitations on Grant Scope The licenses granted in this Section 2 are the only rights granted under this License. No additional rights or licenses will be implied from the distribution or licensing of Covered Software under this License. Notwithstanding Section 2.1(b) above, no patent license is granted by a Contributor: (a) for any code that a Contributor has removed from Covered Software; or (b) for infringements caused by: (i) Your and any other third party's modifications of Covered Software, or (ii) the combination of its Contributions with other software (except as part of its Contributor Version); or (c) under Patent Claims infringed by Covered Software in the absence of its Contributions. This License does not grant any rights in the trademarks, service marks, or logos of any Contributor (except as may be necessary to comply with the notice requirements in Section 3.4). 2.4. Subsequent Licenses No Contributor makes additional grants as a result of Your choice to distribute the Covered Software under a subsequent version of this License (see Section 10.2) or under the terms of a Secondary License (if permitted under the terms of Section 3.3). 2.5. 
Representation Each Contributor represents that the Contributor believes its Contributions are its original creation(s) or it has sufficient rights to grant the rights to its Contributions conveyed by this License. 2.6. Fair Use This License is not intended to limit any rights You have under applicable copyright doctrines of fair use, fair dealing, or other equivalents. 2.7. Conditions Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted in Section 2.1. 3. Responsibilities ------------------- 3.1. Distribution of Source Form All distribution of Covered Software in Source Code Form, including any Modifications that You create or to which You contribute, must be under the terms of this License. You must inform recipients that the Source Code Form of the Covered Software is governed by the terms of this License, and how they can obtain a copy of this License. You may not attempt to alter or restrict the recipients' rights in the Source Code Form. 3.2. Distribution of Executable Form If You distribute Covered Software in Executable Form then: (a) such Covered Software must also be made available in Source Code Form, as described in Section 3.1, and You must inform recipients of the Executable Form how they can obtain a copy of such Source Code Form by reasonable means in a timely manner, at a charge no more than the cost of distribution to the recipient; and (b) You may distribute such Executable Form under the terms of this License, or sublicense it under different terms, provided that the license for the Executable Form does not attempt to limit or alter the recipients' rights in the Source Code Form under this License. 3.3. Distribution of a Larger Work You may create and distribute a Larger Work under terms of Your choice, provided that You also comply with the requirements of this License for the Covered Software. 
If the Larger Work is a combination of Covered Software with a work governed by one or more Secondary Licenses, and the Covered Software is not Incompatible With Secondary Licenses, this License permits You to additionally distribute such Covered Software under the terms of such Secondary License(s), so that the recipient of the Larger Work may, at their option, further distribute the Covered Software under the terms of either this License or such Secondary License(s). 3.4. Notices You may not remove or alter the substance of any license notices (including copyright notices, patent notices, disclaimers of warranty, or limitations of liability) contained within the Source Code Form of the Covered Software, except that You may alter any license notices to the extent required to remedy known factual inaccuracies. 3.5. Application of Additional Terms You may choose to offer, and to charge a fee for, warranty, support, indemnity or liability obligations to one or more recipients of Covered Software. However, You may do so only on Your own behalf, and not on behalf of any Contributor. You must make it absolutely clear that any such warranty, support, indemnity, or liability obligation is offered by You alone, and You hereby agree to indemnify every Contributor for any liability incurred by such Contributor as a result of warranty, support, indemnity or liability terms You offer. You may include additional disclaimers of warranty and limitations of liability specific to any jurisdiction. 4. Inability to Comply Due to Statute or Regulation --------------------------------------------------- If it is impossible for You to comply with any of the terms of this License with respect to some or all of the Covered Software due to statute, judicial order, or regulation then You must: (a) comply with the terms of this License to the maximum extent possible; and (b) describe the limitations and the code they affect. 
Such description must be placed in a text file included with all distributions of the Covered Software under this License. Except to the extent prohibited by statute or regulation, such description must be sufficiently detailed for a recipient of ordinary skill to be able to understand it. 5. Termination -------------- 5.1. The rights granted under this License will terminate automatically if You fail to comply with any of its terms. However, if You become compliant, then the rights granted under this License from a particular Contributor are reinstated (a) provisionally, unless and until such Contributor explicitly and finally terminates Your grants, and (b) on an ongoing basis, if such Contributor fails to notify You of the non-compliance by some reasonable means prior to 60 days after You have come back into compliance. Moreover, Your grants from a particular Contributor are reinstated on an ongoing basis if such Contributor notifies You of the non-compliance by some reasonable means, this is the first time You have received notice of non-compliance with this License from such Contributor, and You become compliant prior to 30 days after Your receipt of the notice. 5.2. If You initiate litigation against any entity by asserting a patent infringement claim (excluding declaratory judgment actions, counter-claims, and cross-claims) alleging that a Contributor Version directly or indirectly infringes any patent, then the rights granted to You by any and all Contributors for the Covered Software under Section 2.1 of this License shall terminate. 5.3. In the event of termination under Sections 5.1 or 5.2 above, all end user license agreements (excluding distributors and resellers) which have been validly granted by You or Your distributors under this License prior to termination shall survive termination. ************************************************************************ * * * 6. 
Disclaimer of Warranty * * ------------------------- * * * * Covered Software is provided under this License on an "as is" * * basis, without warranty of any kind, either expressed, implied, or * * statutory, including, without limitation, warranties that the * * Covered Software is free of defects, merchantable, fit for a * * particular purpose or non-infringing. The entire risk as to the * * quality and performance of the Covered Software is with You. * * Should any Covered Software prove defective in any respect, You * * (not any Contributor) assume the cost of any necessary servicing, * * repair, or correction. This disclaimer of warranty constitutes an * * essential part of this License. No use of any Covered Software is * * authorized under this License except under this disclaimer. * * * ************************************************************************ ************************************************************************ * * * 7. Limitation of Liability * * -------------------------- * * * * Under no circumstances and under no legal theory, whether tort * * (including negligence), contract, or otherwise, shall any * * Contributor, or anyone who distributes Covered Software as * * permitted above, be liable to You for any direct, indirect, * * special, incidental, or consequential damages of any character * * including, without limitation, damages for lost profits, loss of * * goodwill, work stoppage, computer failure or malfunction, or any * * and all other commercial damages or losses, even if such party * * shall have been informed of the possibility of such damages. This * * limitation of liability shall not apply to liability for death or * * personal injury resulting from such party's negligence to the * * extent applicable law prohibits such limitation. Some * * jurisdictions do not allow the exclusion or limitation of * * incidental or consequential damages, so this exclusion and * * limitation may not apply to You. 
* * * ************************************************************************ 8. Litigation ------------- Any litigation relating to this License may be brought only in the courts of a jurisdiction where the defendant maintains its principal place of business and such litigation shall be governed by laws of that jurisdiction, without reference to its conflict-of-law provisions. Nothing in this Section shall prevent a party's ability to bring cross-claims or counter-claims. 9. Miscellaneous ---------------- This License represents the complete agreement concerning the subject matter hereof. If any provision of this License is held to be unenforceable, such provision shall be reformed only to the extent necessary to make it enforceable. Any law or regulation which provides that the language of a contract shall be construed against the drafter shall not be used to construe this License against a Contributor. 10. Versions of the License --------------------------- 10.1. New Versions Mozilla Foundation is the license steward. Except as provided in Section 10.3, no one other than the license steward has the right to modify or publish new versions of this License. Each version will be given a distinguishing version number. 10.2. Effect of New Versions You may distribute the Covered Software under the terms of the version of the License under which You originally received the Covered Software, or under the terms of any subsequent version published by the license steward. 10.3. Modified Versions If you create software not governed by this License, and you want to create a new license for such software, you may create and use a modified version of this License if you rename the license and remove any references to the name of the license steward (except to note that such modified license differs from this License). 10.4. 
Distributing Source Code Form that is Incompatible With Secondary Licenses If You choose to distribute Source Code Form that is Incompatible With Secondary Licenses under the terms of this version of the License, the notice described in Exhibit B of this License must be attached. Exhibit A - Source Code Form License Notice ------------------------------------------- This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/. If it is not possible or desirable to put the notice in a particular file, then You may include the notice in a location (such as a LICENSE file in a relevant directory) where a recipient would be likely to look for such a notice. You may add additional accurate notices of copyright ownership. Exhibit B - "Incompatible With Secondary Licenses" Notice --------------------------------------------------------- This Source Code Form is "Incompatible With Secondary Licenses", as defined by the Mozilla Public License, v. 2.0. ================================================ FILE: README.md ================================================ # Air Draw This example is sped up just to show functionality, real-time examples shown below: ## Demo of Functionality ![Demo of all functionality: Draw, Hover, Erase, and Translate](./demo_gifs/demo.gif) ## Setup NOTE This setup is just for what I use (Ubuntu 20.04). While I am willing to bet this will work for windows and unix, just be safe! 
### Virtual environment
`python3 -m venv venv`

### Install Dependencies
`source ./venv/bin/activate`

`pip3 install -r requirements.txt`

### Run program
`python3 airdraw.py`

## Available Gestures

### Drawing
![Draw: drawing directly on screen](./demo_gifs/drawing.gif)

### Hovering
![Hover: Move across the screen passively](./demo_gifs/hovering.gif)

### Erasing
![Erase: Remove all drawings within radius](./demo_gifs/erasing.gif)

### Translation
![Translation: Move shapes around the screen](./demo_gifs/translating.gif)

## Why?

I've seen plenty of attempts at this sort of thing that rely on HSV masks and dyeing your fingers a particular color, and while that approach is more true to the classical image processing that OpenCV caters to, I didn't want to let our built-in styluses [go to waste](https://money.cnn.com/2015/09/10/technology/apple-pencil-steve-jobs-stylus/index.html). Once I found out about [mediapipe](https://google.github.io/mediapipe/), I decided to give the idea a shot! What you see is my attempt at materializing it; there is a more detailed [writeup](https://arefmalek.github.io/blog/Airdraw/) on my blog.

## How?

Like I mentioned before, the ML workhorse here is definitely mediapipe. Its hand-tracking solution lets us gather landmark data for the hand quickly and use it in real time. Beyond that, I leaned on OpenCV for image manipulation and NumPy for some basic dot products (and because OpenCV represents images as NumPy arrays). The conversion from hand data to lines and gestures is done with plain Python, basic linear algebra, and OpenCV. I'll leave the rest to the blog post.

## What's next?

I definitely want to make this more available to everyone, so an upcoming goal is to rewrite this as a webapp, hopefully within the next month or so. I'll keep everyone posted :).
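The mediapipe-to-OpenCV handoff described in "How?" mostly comes down to scaling mediapipe's normalized landmark coordinates (x and y in [0, 1]) into the frame's pixel grid. Here is a minimal sketch of that conversion; the helper name `landmark_to_px` is mine for illustration, not something defined in this repo:

```python
def landmark_to_px(norm_x, norm_y, frame_width, frame_height):
    """Map a mediapipe landmark (x, y normalized to [0, 1]) to integer
    pixel (row, col) coordinates, clamped to the frame bounds."""
    col = min(int(norm_x * frame_width), frame_width - 1)
    row = min(int(norm_y * frame_height), frame_height - 1)
    return row, col

# e.g. the center of a 640x480 frame:
# landmark_to_px(0.5, 0.5, 640, 480) -> (240, 320)
```

Note the (row, col) ordering on the way out: NumPy indexes images as (row, col) while OpenCV's drawing calls take (x, y) points, which is exactly the kind of convention mismatch the FIXME at the top of `canvas.py` calls out.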
Thanks for reading :)

================================================
FILE: airdraw.py
================================================
import numpy as np
import cv2 as cv

from hands import HandDetector
from canvas import Canvas


def replay(fname):
    print("replaying", fname)
    cap = cv.VideoCapture(fname)

    # Use whatever width and height possible
    frame_width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT))
    # Canvas expects (rows, columns), i.e. (height, width)
    canvas = Canvas(frame_height, frame_width)

    if not cap.isOpened():
        print("Error opening video file")
        return

    detector = HandDetector()
    while cap.isOpened() and (cv.waitKey(0) & 0xFF != ord('q')):
        ret, img = cap.read()
        # replay is completed when the video capture no longer has any
        # frames to read.
        if ret:
            gesture_metadata = detector.get_gesture_metadata(img)
            img = canvas.update_and_draw(img, gesture_metadata)
            detector.draw_landmarks(img)
            cv.imshow('Camera', img)
        else:
            break

    cap.release()
    cv.destroyAllWindows()
    print("replay complete", fname)


def main():
    # Loading the default webcam of PC.
    cap = cv.VideoCapture(0)

    # width and height for 2-D grid
    width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH) + 0.5)
    height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT) + 0.5)

    # initialize the canvas element and hand-detector program
    canvas = Canvas(height, width)
    detector = HandDetector()
    print(width, height)

    # Keep looping
    while True:
        # Reading the frame from the camera
        ret, frame = cap.read()
        if not ret:
            break
        frame = cv.flip(frame, 1)

        gesture_metadata = detector.get_gesture_metadata(frame)
        frame = canvas.update_and_draw(frame, gesture_metadata)
        detector.draw_landmarks(frame)

        cv.imshow("Airdraw", frame)

        stroke = cv.waitKey(1) & 0xff
        if stroke == ord('b'):  # press 'b' to switch backgrounds (camera/black)
            canvas.switch_background()
        if stroke == ord('q') or stroke == 27:  # press 'q' or 'esc' to quit
            break

    cap.release()
    cv.destroyAllWindows()


if __name__ == '__main__':
    main()

================================================
FILE: canvas.py
================================================
import cv2 as cv
import numpy as np

from hands import Gesture, HandDetector
from util import xy_euclidean_dist
from enum import Enum

# FIXME:
# use a good spatial query system (is just iterating over literally every
# point the best we can do?) Gauging this would need the following:
#   How many data points could i realistically collect over a 2-minute episode?
#   How much does it cost to iterate and compare versus
#     1. storing waypoints in a grid, and then searching every pixel in the grid
#     2. Storing all points in some sort of query system (intuition screaming quadtree).
# have consistent usage of row, col convention between mediapipe, canvas, and opencv.
# keep all data initialized at startup and only completely transform in function


class Color(Enum):
    """Please remember these are in BGR coordinates!"""
    GRAY = (122, 122, 122)
    WHITE = (255, 255, 255)
    BLUE = (255, 0, 0)
    GREEN = (0, 255, 0)
    RED = (0, 0, 255)
    PURPLE = (255, 0, 255)
    YELLOW = (0, 255, 255)


class Shape(Enum):
    CIRCLE = Color.BLUE
    SQUARE = Color.GRAY
    LINE = Color.GREEN


class Canvas():
    """
    This class is responsible for "drawing" all state onto the screen.
    This includes the actual dashboard hands interact with as well as
    lines, backgrounds, etc.

    This component is intended to take
    (frame, hands_state) -> (update state) -> image to render
    """

    def __init__(self, rows, columns):
        # FIXME: just make this deterministic via list
        self.colors = [Color.BLUE, Color.GREEN, Color.RED]
        self.shapes = [Shape.LINE, Shape.CIRCLE, Shape.SQUARE]

        self.rows = rows
        self.columns = columns

        self.color = Color.BLUE  # only really used to initialize lines
        self.shape = Shape.LINE

        self.lines = {}    # whole list of points
        self.circles = []  # whole list of points
        self.squares = []  # whole list of squares

        # placeholders for the shapes we're actively adding to
        # (argument order matches push_point below: Line(color, origin))
        self.currLine = Line(self.color, None)
        self.currLine.active = False
        self.currCircle = Circle((-1, -1), -1, self.color)
        self.currCircle.active = False
        self.currSquare = Square((-1, -1), (-1, -1), self.color)
        self.currSquare.active = False

        self.blackout_background = False

    def switch_background(self):
        self.blackout_background = not self.blackout_background

    def get_buttons_coords(self, frame_shape):
        """
        Returns coordinates of the buttons (and colors) to draw on the UI,
        used to save space later on. Should be useful for detecting overlap
        between fingers and buttons.

        Args:
            frame_shape: tuple describing frame shape
        Return:
            List with elements holding the following schema:
            (button name, button BGR colors, top-left coordinate,
             bottom-right coordinate)
            Ordering of the elements is as follows:
                1. Clear all button
                2. Color buttons
                3. Shape buttons
        """
        # Obtains the proportionally correct buttons for the frame shape given.
        frame_height, frame_width, _ = frame_shape
        coords = []

        # add clear_button
        # Clear button is manually sized; all other buttons are sized
        # relative to it.
        clear_button_width = int(frame_width * .2)
        clear_button_height = int(frame_height * .15)
        clear_button_width_border = int(clear_button_width * .05)
        clear_button_height_border = int(clear_button_height * .05)
        coords.append((
            "Clear all",
            Color.GRAY.value,
            (clear_button_width_border, clear_button_height_border),
            (clear_button_width - clear_button_width_border,
             clear_button_height - clear_button_height_border)
        ))

        num_colors = len(self.colors)
        remaining_width = frame_width - clear_button_width
        color_button_width = remaining_width // num_colors
        color_button_height = int(clear_button_height * 0.7)
        color_button_border_width = int(color_button_width * 0.05)
        color_button_border_height = int(color_button_height * 0.05)

        curr_button_offset_width = clear_button_width
        # FIXME: use color.name instead?
        for color in self.colors:
            coords.append((
                color.name,
                color.value,
                (curr_button_offset_width + color_button_border_width,
                 color_button_border_height),
                (curr_button_offset_width + color_button_width - color_button_border_width,
                 color_button_height - color_button_border_height)
            ))
            curr_button_offset_width += color_button_width

        num_shapes = len(Shape)
        remaining_height = frame_height - clear_button_height
        shape_button_height = (remaining_height // num_shapes)
        shape_button_width = int(clear_button_width * 0.7)
        shape_button_border_height = int(shape_button_height * 0.05)
        shape_button_border_width = int(shape_button_width * 0.05)

        curr_button_offset_height = clear_button_height
        for shape in Shape:
            coords.append((
                shape.name,
                shape.value.value,
                (shape_button_border_width,
                 curr_button_offset_height + shape_button_border_height),
                (shape_button_width - shape_button_border_width,
                 curr_button_offset_height + shape_button_height - shape_button_border_height)
            ))
            curr_button_offset_height += shape_button_height

        return coords

    def buttons_overlap(self, buttons_coords, fingertip_point):
        leftCoord, topCoord = buttons_coords[0]
        rightCoord, bottomCoord = buttons_coords[1]
        r, c = fingertip_point
        return leftCoord <= c <= rightCoord and topCoord <= r <= bottomCoord

    def update_state(self, frame_shape, data={}):
        """
        This function should take in state updates from our hands,
        and update internal state of the game.
        """
        buttons_coord = self.get_buttons_coords(frame_shape)
        clear_button = buttons_coord[0]
        button_offset = 1
        color_buttons = buttons_coord[button_offset:button_offset + len(self.colors)]
        button_offset += len(self.colors)
        shape_buttons = buttons_coord[button_offset:button_offset + len(Shape)]

        gesture = data.get("gesture", Gesture.HOVER)
        gesture_finger_points = [v for k, v in data.items() if k.endswith("_tip")]

        # check if any of the active vector points overlap with our buttons
        # coordinates

        # overlap with clear button
        for coord in gesture_finger_points:
            if self.buttons_overlap(clear_button[2:], coord):
                # Clear state.
                self.end_drawing()
                self.lines = {}
                self.circles = []
                self.squares = []
                break

        # overlap with color button
        for color_button_metadata in color_buttons:
            button_color_str = color_button_metadata[0]
            for coord in gesture_finger_points:
                if self.buttons_overlap(color_button_metadata[2:], coord):
                    new_color = [color for color in self.colors
                                 if color.name == button_color_str][0]
                    if gesture == Gesture.DRAW:
                        self.end_drawing()
                    # assign the color value to our metadata
                    self.color = new_color

        # overlap with shape button
        for shape_button_metadata in shape_buttons:
            shape_str = shape_button_metadata[0]
            for coord in gesture_finger_points:
                if self.buttons_overlap(shape_button_metadata[2:], coord):
                    new_shape = [shape for shape in self.shapes
                                 if shape.name == shape_str][0]
                    if new_shape != self.shape and gesture == Gesture.DRAW:
                        self.end_drawing()
                    self.shape = new_shape

        if gesture == Gesture.DRAW:
            midpoint_r, midpoint_c = data.get('origin')
            radius = int(data.get('radius'))  # varying sizes
            if self.shape == Shape.LINE:
                self.push_point((midpoint_r, midpoint_c))
            if self.shape == Shape.CIRCLE:
                self.update_circle((midpoint_r, midpoint_c))
            if self.shape == Shape.SQUARE:
                self.update_square((midpoint_r, midpoint_c))
        elif gesture == Gesture.HOVER:
            self.end_drawing()
        elif gesture == Gesture.ERASE:
            midpoint_r, midpoint_c = data.get('origin')
            radius = int(data.get('radius'))
            self.erase_mode((midpoint_r, midpoint_c), radius)
        elif gesture == Gesture.TRANSLATE:
            self.end_drawing()
            midpoint_r, midpoint_c = data.get('origin')
            radius = int(data.get('radius'))
            shift = data.get('shift')
            shift = int(shift[0]), int(shift[1])
            self.translate_mode((midpoint_r, midpoint_c), radius, shift)

    def draw_canvas(self, frame, data):
        """ Renders dashboard onto screen """
        if self.blackout_background:
            frame = np.zeros_like(frame)

        buttons_coord = self.get_buttons_coords(frame.shape)
        for button_metadata in buttons_coord:
            button_str = button_metadata[0]
            button_color = button_metadata[1]  # BGR
            button_left, button_top = button_metadata[2]
            button_right, button_bottom = button_metadata[3]

            frame = cv.rectangle(frame,
                                 (button_left, button_top),
                                 (button_right, button_bottom),
                                 button_color, -1)
            button_width = button_right - button_left
            button_height = button_bottom - button_top
            cv.putText(frame, button_str,
                       (button_left + int(button_width * .3),
                        int(button_top + button_height * .5)),
                       cv.FONT_HERSHEY_SIMPLEX, .5,
                       Color.WHITE.value, 2, cv.LINE_AA)

            # highlight selected color
            if button_str == self.color.name or button_str == self.shape.name:
                frame = cv.rectangle(frame,
                                     (button_left, button_top),
                                     (button_right, button_bottom),
                                     Color.WHITE.value, 2)

        gesture = data.get('gesture')
        if gesture == Gesture.DRAW:
            midpoint_r, midpoint_c = data['origin']
            radius = data['radius']
            img = frame.copy()
            # purple cuz im royal
            cv.circle(img, (midpoint_c, midpoint_r), int(radius),
                      Color.PURPLE.value, -1)
            alpha = 0.4
            frame = cv.addWeighted(frame, alpha, img, 1 - alpha, 0)

        # draw the ring if we're in the eraser mode
        if gesture == Gesture.ERASE:
            # get middle finger and radius of circle to draw
            midpoint_r, midpoint_c = data['origin']
            radius = data['radius']
            # put circle on the map, and add some opacity
            img = frame.copy()
            cv.circle(img, (midpoint_c, midpoint_r), int(radius),
                      Color.YELLOW.value, -1)
            alpha = 0.4
            frame = cv.addWeighted(frame, alpha, img, 1 - alpha, 0)
        elif gesture == Gesture.TRANSLATE:
            midpoint_r, midpoint_c = data['origin']
            radius = data['radius']
            # put circle on the map, and add some opacity
            img = frame.copy()
            cv.circle(img, (midpoint_c, midpoint_r), int(radius),
                      Color.WHITE.value, -1)
            alpha = 0.4
            frame = cv.addWeighted(frame, alpha, img, 1 - alpha, 0)

        frame = self.draw_lines(frame)
        frame = self.draw_circles(frame)
        frame = self.draw_squares(frame)
        return frame

    def update_and_draw(self, frame, data={}):
        self.update_state(frame.shape, data)
        frame = self.draw_canvas(frame, data)
        return frame

    def update_circle(self, new_point):
        """
        Maintain state of the currently drawn circle. If it doesn't exist,
        initialize it and pass pointer to self.circles
        """
        point_row, point_col = new_point
        if not (0 <= point_row < self.rows and 0 <= point_col < self.columns):
            return

        if self.currCircle.active == False:
            self.currCircle = Circle((point_row, point_col), 5, self.color)
            self.circles.append(self.currCircle)
        else:
            dist = int(xy_euclidean_dist(self.currCircle.origin, new_point))
            self.currCircle.radius = dist

    def update_square(self, new_point):
        """Updates state of the currently drawn square (resizing it).
        If it doesn't exist, initialize it and pass pointer to self.squares"""
        point_row, point_col = new_point
        if not (0 <= point_row < self.rows and 0 <= point_col < self.columns):
            return

        if self.currSquare.active == False:
            # just initialize with some size
            self.currSquare = Square(new_point,
                                     (point_row + 5, point_col + 5),
                                     self.color)
            self.squares.append(self.currSquare)
        else:
            self.currSquare.opposite = new_point

    def push_point(self, point):
        """
        adds a point to draw later on

        Arguments:
            point: (r, c) pair describing new coordinate of the line
        """
        row, col = point
        if not 0 <= row < self.rows or not 0 <= col < self.columns:
            return

        # TODO: replace hashmap approach with just generic list
        # if there isn't an active line being drawn, start one
        if self.currLine.active == False:
            # we need to initialize a line
            line = Line(self.color, point)  # start a line with a new color
            self.currLine = line
            self.lines[point] = self.currLine  # store origin in the lines
        else:
            # get the current line, add the new point to the linked list
            self.currLine.points.append(point)

    def end_drawing(self):
        """Ends active drawing"""
        self.currLine.active = False
        self.currCircle.active = False
        self.currSquare.active = False

    def draw_lines(self, frame):
        """
        Draws all of the lines we have generated so far by looping through
        line objects

        Args:
            - frame: The image straight from camera
        Returns:
            Image with all the different lines drawn on top of it
        """
        # self.lines = [{"color": "BLUE",
        #                "points": [(1, 2), (5, 9), ...]},
        #               {"color": "RED",
        #                "points": [(6, 0), (5, 8), ...]},
        for line in self.lines.values():
            for i, point in enumerate(line.points):
                if i == 0:
                    continue
                prev_r, prev_c = line.points[i - 1]
                r, c = point
                cv.line(frame, (prev_c, prev_r), (c, r), line.color.value, 5)
        return frame

    def draw_circles(self, frame):
        for circle in self.circles:
            orig_row, orig_col = circle.origin
            cv.circle(frame, (orig_col, orig_row), circle.radius,
                      circle.color.value, 3)
        return frame

    def draw_squares(self, frame):
        for square in self.squares:
            topRow, leftCol, bottomRow, rightCol = square.get_coords()
            frame = cv.rectangle(frame, (leftCol, topRow),
                                 (rightCol, bottomRow),
                                 square.color.value, 5)
        return frame

    def translate_mode(self, position, radius, shift):
        """
        Works as following:
            1. gather all lines in the radius
            2. for each line: shift each point in the line by the shift variable
        """
        # FIXME: introducing extra lines unnecessarily into the program
        r, c = position
        if shift == (0, 0):
            return

        # we should be able to collect all unique origin points
        uniqueLines = set()
        for origin, line in self.lines.items():
            for p in line.points:
                if xy_euclidean_dist(p, position) <= radius:
                    uniqueLines.add(origin)
                    break

        # debugging line
        sortedLines = sorted(list(uniqueLines))

        # for each origin point in the circle
        for og_point in sortedLines:
            # Transform original points
            line = self.lines[og_point]
            translation = []
            for r, c in line.points:
                trans_r, trans_c = r + shift[0], c + shift[1]
                if (0 <= trans_r < self.rows) and (0 <= trans_c < self.columns):
                    translation.append((trans_r, trans_c))
                else:
                    break

            # Check if transformation is valid
            if len(translation) == len(line.points):
                self.lines.pop(og_point)
                line.points = translation
                new_origin = line.get_origin()
                assert(og_point != new_origin)
                # put the value back in the lines
                self.lines[line.get_origin()] = line

        for i, circle in enumerate(self.circles):
            if circle.overlaps_circle(position, radius):
                new_origin = (circle.origin[0] + shift[0],
                              circle.origin[1] + shift[1])
                circle.origin = new_origin

        for i, square in enumerate(self.squares):
            if square.overlaps_circle(position, radius):
                new_anchor = square.anchor[0] + shift[0], square.anchor[1] + shift[1]
                new_opposite = square.opposite[0] + shift[0], square.opposite[1] + shift[1]
                square.anchor = new_anchor
                square.opposite = new_opposite

    # start of erase mode code
    def erase_mode(self, position, radius):
        """
        Interprets the position of the pointer, deletes lines if they
        overlap with the pointer

        Arguments:
            position: (x, y)
coordinates of the position radius: the radius (in pixels) of our eraser """ origin_points = [] for origin, lines in self.lines.items(): for point in lines.points: if xy_euclidean_dist(point, position) <= radius: origin_points.append(origin) break for origins in origin_points: self.lines.pop(origins) circles_to_keep = [] for circle in self.circles: if circle.overlaps_circle(position, radius): continue else: circles_to_keep.append(circle) self.circles = circles_to_keep squares_to_keep = [] for square in self.squares: if square.overlaps_circle(position, radius): continue else: squares_to_keep.append(square) self.squares = squares_to_keep class Line(): """ Helper class to represent the lines put on the screen """ def __init__(self, color: Color, origin): self.color = color self.points = [origin] self.active = True def get_origin(self): return self.points[0] def __repr__(self): return f"\ncolor({self.color}) \ \n\tactive({self.active}) \ \n\tpoints({self.points})" class Circle(): """Helper class to place circles on screen""" def __init__(self, origin, radius: int, color: Color): self.origin = origin self.radius = radius self.color = color self.active = True def get_radius(self): return self.radius def overlaps_circle(self, point, other_radius) -> bool: dist = xy_euclidean_dist(self.origin, point) return max(self.radius - other_radius, 0) <= dist <= self.radius + other_radius def __repr__(self): return f"Origin:{self.origin}\tRadius:{self.radius}\tColor:{self.color}" class Square(): def __init__(self, anchor, opposite, color: Color): self.anchor = anchor self.opposite = opposite self.color = color self.active = True def get_coords(self): topRow = min(self.anchor[0], self.opposite[0]) bottomRow = max(self.anchor[0], self.opposite[0]) leftCol = min(self.anchor[1], self.opposite[1]) rightCol = max(self.anchor[1], self.opposite[1]) return (topRow, leftCol, bottomRow, rightCol) def get_height(self): topRow, leftCol, bottomRow, rightCol = self.get_coords() return (bottomRow - 
topRow) def get_width(self): topRow, leftCol, bottomRow, rightCol = self.get_coords() return (rightCol - leftCol) def overlaps_circle(self, point, radius) -> bool: """ Returns true if the border of our square overlaps with the circle. Args point: (row, col) of the query point Math here - https://stackoverflow.com/a/402010 """ point_r, point_c = point topRow, leftCol, bottomRow, rightCol = self.get_coords() square_center_row = (topRow + bottomRow) // 2 square_center_col = (leftCol + rightCol) // 2 point_dist_r = abs(point_r - square_center_row) # compare against height point_dist_c = abs(point_c - square_center_col) # compare against width half_height = self.get_height() // 2 half_width = self.get_width() // 2 square_border_row_dist = abs(point_dist_r - half_height) square_border_col_dist = abs(point_dist_c - half_width) # Too far from the rectangle if (point_dist_r > (half_height + radius)): return False if (point_dist_c > (half_width + radius)): return False # Too close to the origin if (point_dist_r < (half_height - radius) and point_dist_c < (half_width - radius)): return False # If this code does what I think, it means that # the row is in [half_height - radius, half_height + radius] # the col is in [half_width - radius, half_width + radius] assert(half_width - radius <= point_dist_c <= half_width + radius or half_height - radius <= point_dist_r <= half_height + radius) # Point is within if (point_dist_r > half_width and point_dist_c > half_height): cornerDist = (square_border_col_dist) ** 2 + (square_border_row_dist) ** 2 return cornerDist <= radius**2 return True def __repr__(self): topRow, leftCol, bottomRow, rightCol = self.get_coords() return f"topLeft: {(topRow, leftCol)}\tbottomRight:{(bottomRow, rightCol)}\tcolor:{self.color}" def replay(fname): print("replaying", fname) cap = cv.VideoCapture(fname) # Use whatever width and height possible frame_width = int(cap.get(cv.CAP_PROP_FRAME_WIDTH)) frame_height = int(cap.get(cv.CAP_PROP_FRAME_HEIGHT)) canvas = 
Canvas(frame_height, frame_width)
    if (not cap.isOpened()):
        print("Error opening video file")
        return

    detector = HandDetector()
    while cap.isOpened() and (cv.waitKey(0) & 0xFF != ord('q')):
        ret, img = cap.read()
        # replay is completed when the video capture no longer has any
        # frames to read.
        if ret:
            gesture_metadata = detector.get_gesture_metadata(img)
            img = canvas.update_and_draw(img, gesture_metadata)
            detector.draw_landmarks(img)
            cv.imshow('Camera', img)
        else:
            break

    cap.release()
    cv.destroyAllWindows()
    print("replay complete", fname)


def main():
    canvas = Canvas(100, 200)
    line = Line("BLUE", (1, 1))
    line.points.append((10, 5))
    print(line)


if __name__ == '__main__':
    # replay("./hands_basic_gestures.mp4")
    # replay("./buttons_overlap.mp4")
    # replay("./translation_debug.mp4")
    replay("./hands_drawing_ui.mp4")
    # replay("./eraser_debug.mp4")
    # main()


================================================
FILE: data.py
================================================
import cv2 as cv
import argparse


def record(fname):
    print("recording ", fname)
    cam = cv.VideoCapture(0)

    # Use whatever width and height possible
    frame_width = int(cam.get(cv.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cam.get(cv.CAP_PROP_FRAME_HEIGHT))

    fourcc = cv.VideoWriter_fourcc(*'mp4v')
    out = cv.VideoWriter(fname, fourcc, 60.0, (frame_width, frame_height))

    while True:
        _, img = cam.read()
        img = cv.flip(img, 1)
        out.write(img)
        cv.imshow('Recording', img)
        if cv.waitKey(1) & 0xFF == ord('q'):
            break

    out.release()
    cam.release()
    cv.destroyAllWindows()
    print("recording complete. shutting down.")


def replay(fname):
    print("replaying", fname)
    cap = cv.VideoCapture(fname)
    print("captured")
    if (not cap.isOpened()):
        print("Error opening video file")
        return

    print("waiting to open")
    while cap.isOpened():  # and (cv.waitKey(0) & 0xFF != ord('q')):
        ret, img = cap.read()
        # replay is completed when the video capture no longer has any
        # frames to read.
        if ret:
            cv.imshow('Camera', img)
        else:
            break
        print("img", img.size)

    cap.release()
    cv.destroyAllWindows()
    print("replay complete", fname)


def main():
    parser = argparse.ArgumentParser(
        prog='data.py',
        description='data collection tools'
    )
    parser.add_argument("-m", "--mode")
    parser.add_argument("-f", "--filename")
    args = parser.parse_args()

    if not args.filename.endswith(".mp4"):
        print(f"filename({args.filename}) must end with .mp4")
        return False

    if args.mode == 'replay':
        replay(args.filename)
    elif args.mode == "record":
        record(args.filename)
    else:
        print(f"data mode must fall into ['replay', 'record'], provided {args.mode}")
        return False


if __name__ == "__main__":
    main()


================================================
FILE: hands.py
================================================
import cv2 as cv
import mediapipe as mp
import numpy as np

from enum import Enum
from collections import deque

from util import xy_euclidean_dist, vectorize, cos_angle


class Gesture(Enum):
    DRAW = 'DRAW'
    HOVER = 'HOVER'
    ERASE = 'ERASE'
    TRANSLATE = 'TRANSLATE'


class LandmarkBuffer():
    """Helper RingBuffer class to abstract away averaging logic"""
    def __init__(self, max_size):
        self.buf = deque([], maxlen=max_size)

    def push_landmark(self, element):
        self.buf.append(element)

    def average_landmarks(self):
        assert(len(self.buf) > 0)
        res = [[0] * 3 for i in range(21)]
        num_points = len(self.buf)
        for landmark in self.buf:
            for i, vec in enumerate(landmark):
                res[i][0] += vec[0]
                res[i][1] += vec[1]
                res[i][2] += vec[2]
        for i, vec in enumerate(res):
            res[i][0] /= num_points
            res[i][1] /= num_points
            res[i][2] /= num_points
        return res

    def displacement(self):
        """Calculates the residual from the last two landmarks"""
        res = [[0] * 3 for i in range(21)]
        num_points = len(self.buf)
        if num_points < 2 or any([len(landmark) != 21 for landmark in self.buf]):
            return res
        for i in range(21):
            for j in range(3):
                res[i][j] = self.buf[-1][i][j] - self.buf[-2][i][j]
        return res


class HandDetector():
    """
    This class defines the
    interaction the program will have with Mediapipe. It is essentially a
    wrapper layer around MP.

    This class will define how Airdraw will be passing information to and
    receiving information from Mediapipe. Successful implementation of this
    class should involve no image rendering, but rather just state
    transformation of hands, gestures, and other metadata used from
    Mediapipe.
    """
    def __init__(self, mode=False, max_hands=1):
        # setup
        self.max_hands = max_hands
        self.mode = mode

        # hand drawing stuff
        self.hands = mp.solutions.hands.Hands(self.mode, self.max_hands)
        self.drawing = mp.solutions.drawing_utils
        self.hand_connections = mp.solutions.hands.HAND_CONNECTIONS

        # will be used for translation
        self.translation_buffer = LandmarkBuffer(5)
        # we have 0 velocity to start translation

    def detect_landmarks(self, frame):
        """
        Noting all the points of one's hand in the image.

        args:
            - frame: np array representing image input. used to resize the
              prediction against mediapipe (will just use the builtin api
              soon though).
        returns:
            - list of landmarks on the hand in order of size and position
        """
        img_rgb = cv.cvtColor(frame, cv.COLOR_BGR2RGB)  # I think we need RGB
        self.results = self.hands.process(img_rgb)

        landmarks = []
        if self.results.multi_hand_landmarks:
            my_hand = self.results.multi_hand_landmarks[0]  # should only be one
            for idx, landmark in enumerate(my_hand.landmark):
                height, width, _ = frame.shape
                x, y = int(landmark.x * width), int(landmark.y * height)
                landmarks.append((idx, x, y))
        return landmarks

    def draw_landmarks(self, img):
        """
        Draws hand landmarks on image.

        Breaks rules of class being only "img"->hand current state, but I
        think this looks the best so I'm keeping it this way.
        """
        if self.results.multi_hand_landmarks:
            for hand_landmark in self.results.multi_hand_landmarks:
                self.drawing.draw_landmarks(img, hand_landmark,
                                            mp.solutions.hands.HAND_CONNECTIONS)

    def detect_gesture(self, landmarks, threshhold=0.70, debug=False):
        """
        This function determines which "mode" we are in, signified by the
        hand-signs someone indicates when we are drawing

        Arguments:
            landmarks: finger points
            threshhold: value we need in order to change 'modes'
        returns:
            String that matches the gesture we have
        """
        # adding all vectors
        # palm vectors
        palm_index_vector = vectorize(landmarks[0], landmarks[5])
        palm_mid_vector = vectorize(landmarks[0], landmarks[9])
        palm_ring_vector = vectorize(landmarks[0], landmarks[13])
        palm_pinky_vector = vectorize(landmarks[0], landmarks[17])

        # index vectors, each start from first knuckle of the hand
        index_vector = vectorize(landmarks[6], landmarks[8])
        middle_vector = vectorize(landmarks[10], landmarks[12])
        ring_vector = vectorize(landmarks[14], landmarks[16])
        pinky_vector = vectorize(landmarks[18], landmarks[20])

        # really just to debug
        if debug:
            return cos_angle(index_vector, palm_index_vector)

        # index finger pointing out,
        # middle/ring/pinky finger tucked
        if cos_angle(palm_index_vector, index_vector) > threshhold and \
           cos_angle(index_vector, middle_vector) < 0 and \
           cos_angle(index_vector, ring_vector) < 0 and \
           cos_angle(index_vector, pinky_vector) < 0:
            return Gesture.HOVER

        # index/middle finger pointing out,
        # ring/pinky finger tucked
        if cos_angle(palm_index_vector, index_vector) > threshhold and \
           cos_angle(palm_mid_vector, middle_vector) > threshhold and \
           cos_angle(index_vector, ring_vector) < 0 and \
           cos_angle(index_vector, pinky_vector) < 0:
            return Gesture.DRAW

        # index/middle/ring finger pointing out
        # pinky finger tucked
        if cos_angle(palm_index_vector, index_vector) > threshhold and \
           cos_angle(index_vector, middle_vector) > 0.90 and \
           cos_angle(index_vector, ring_vector) > 0.90 and \
           cos_angle(palm_pinky_vector, pinky_vector) < 0:
            return Gesture.ERASE

        # add the stuff relative to knuckles
        if cos_angle(palm_index_vector, index_vector) > threshhold and \
           cos_angle(palm_pinky_vector, pinky_vector) > threshhold and \
           cos_angle(index_vector, middle_vector) < 0 and \
           cos_angle(index_vector, ring_vector) < 0:
            return Gesture.TRANSLATE

        # otherwise hover
        return Gesture.HOVER

    def get_gesture_metadata(self, frame):
        """
        Calls MP on frame and returns metadata about gesture determined.

        Args:
            - frame: np array defining our image.
        Returns:
            - returns a dict defining gesture as well as metadata to draw
              output with.
        """
        landmark_list = self.detect_landmarks(frame)
        if len(landmark_list) == 0 or np.sum(landmark_list) == 0:
            return {}

        self.translation_buffer.push_landmark(landmark_list)
        average_landmark_list = self.translation_buffer.average_landmarks()
        gesture = self.detect_gesture(average_landmark_list)

        # only extract the row, col before sending it literally anywhere else
        _, index_c, index_r = average_landmark_list[8]
        _, mid_c, mid_r = average_landmark_list[12]
        _, ring_c, ring_r = average_landmark_list[16]
        _, pinky_c, pinky_r = average_landmark_list[20]

        # just writing in finger info
        index_fing_tip = (index_r, index_c)  # coordinates of tip of index fing
        mid_fing_tip = (mid_r, mid_c)
        ring_fing_tip = (ring_r, ring_c)
        pinky_fing_tip = (pinky_r, pinky_c)

        # data sent to canvas:
        # formatted in row, column format because I index the internal grid
        # that way.
        post = {
            'gesture': gesture,
            'idx_fing_tip': index_fing_tip,
            'mid_fing_tip': mid_fing_tip,
            'ring_fing_tip': ring_fing_tip,
            'pinky_fing_tip': pinky_fing_tip,
            'origin': None,
            'radius': None,
            'shift': None,
        }

        # Add additional info based off of the gesture we got
        if gesture == Gesture.DRAW:
            distance = xy_euclidean_dist(index_fing_tip, mid_fing_tip)
            index_r, index_c = index_fing_tip
            mid_r, mid_c = mid_fing_tip
            midpoint_r, midpoint_c = int((index_r + mid_r) * 0.5), int((index_c + mid_c) * 0.5)
            post['origin'] = (midpoint_r, midpoint_c)
            post['radius'] = distance * 0.5
        elif gesture == Gesture.ERASE:
            distance = xy_euclidean_dist(index_fing_tip, ring_fing_tip)
            index_r, index_c = index_fing_tip
            ring_r, ring_c = ring_fing_tip
            midpoint_r, midpoint_c = int((index_r + ring_r) * 0.5), int((index_c + ring_c) * 0.5)
            post['origin'] = (midpoint_r, midpoint_c)
            post['radius'] = distance * 0.5
        elif gesture == Gesture.TRANSLATE:
            distance = xy_euclidean_dist(index_fing_tip, pinky_fing_tip)
            index_r, index_c = index_fing_tip
            pinky_r, pinky_c = pinky_fing_tip
            midpoint_r, midpoint_c = int((index_r + pinky_r) * 0.5), int((index_c + pinky_c) * 0.5)
            post['origin'] = (midpoint_r, midpoint_c)
            post['radius'] = distance * 0.5

            # Calculate and store the shift
            displacement = self.translation_buffer.displacement()
            index_displacement = displacement[8]
            _, index_c_displacement, index_r_displacement = index_displacement
            post['shift'] = (index_r_displacement, index_c_displacement)
        elif gesture == Gesture.HOVER:
            index_r, index_c = index_fing_tip
            midpoint_r, midpoint_c = int(index_r), int(index_c)
            # Update previous position with current point

        return post


def replay(fname):
    print("replaying", fname)
    cap = cv.VideoCapture(fname)
    detector = HandDetector()
    if (not cap.isOpened()):
        print("Error opening video file")
        return

    while cap.isOpened() and (cv.waitKey(0) & 0xFF != ord('q')):
        ret, img = cap.read()
        # replay is completed when the video capture no longer has any
        # frames to read.
        if ret:
            landmark_list = detector.detect_landmarks(img)
            detector.draw_landmarks(img)
            if len(landmark_list) != 0:
                val = detector.detect_gesture(landmark_list, threshhold=0.9)
                cv.putText(img, f"Mode: {val.value}", (50, 50),
                           cv.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2,
                           cv.LINE_AA)
            cv.imshow('Camera', img)
        else:
            break

    cap.release()
    cv.destroyAllWindows()
    print("replay complete", fname)


def live_demo():
    cap = cv.VideoCapture(0)
    detector = HandDetector()
    while True:
        _, img = cap.read()
        img = cv.flip(img, 1)

        landmark_list = detector.detect_landmarks(img)
        detector.draw_landmarks(img)
        if len(landmark_list) != 0:
            val = detector.detect_gesture(landmark_list, threshhold=0.9)
            cv.putText(img, f"Mode: {val.value}", (50, 50),
                       cv.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 2, cv.LINE_AA)

        cv.imshow('Camera', img)
        if cv.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv.destroyAllWindows()


if __name__ == "__main__":
    replay('hands_basic_gestures.mp4')


================================================
FILE: requirements.txt
================================================
absl-py==0.15.0
attrs==21.2.0
cycler==0.11.0
kiwisolver==1.3.2
matplotlib==3.4.3
mediapipe==0.8.9
numpy==1.21.4
opencv-contrib-python==4.5.4.58
opencv-python==4.5.4.58
Pillow==8.4.0
protobuf==3.19.1
pyparsing==3.0.5
python-dateutil==2.8.2
six==1.16.0


================================================
FILE: util.py
================================================
import numpy as np


def xy_euclidean_dist(a1, a2):
    return ((a1[0] - a2[0]) ** 2 + (a1[1] - a2[1]) ** 2) ** 0.5


def clamp(value, lower_bound, upper_bound):
    return min(upper_bound, max(value, lower_bound))


def vectorize(u, v):
    assert(len(u) == len(v))  # can't vectorize unequal lengths
    return [v[i] - u[i] for i in range(len(v))]


def vector_magnitude(vector):
    return sum([dim ** 2 for dim in vector]) ** 0.5


def cos_angle(u, v):
    u_mag = vector_magnitude(u)
    v_mag = vector_magnitude(v)
    if (u_mag == 0 or v_mag == 0):
        return 0
    return np.dot(u, v) / (u_mag * v_mag)
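

A minimal standalone sketch (not part of the repo) of how `detect_gesture` in hands.py uses the cosine similarity from util.py: an extended finger's knuckle-to-tip vector roughly aligns with its palm vector (cosine near 1, clearing the 0.70 threshold), while a tucked finger points back toward the palm (negative cosine). The 2-D vectors below are made-up stand-ins for MediaPipe landmark differences.

```python
import numpy as np


def vector_magnitude(vector):
    # mirrors util.py's vector_magnitude
    return sum(dim ** 2 for dim in vector) ** 0.5


def cos_angle(u, v):
    # mirrors util.py's cos_angle, guarding against zero-length vectors
    u_mag = vector_magnitude(u)
    v_mag = vector_magnitude(v)
    if u_mag == 0 or v_mag == 0:
        return 0
    return np.dot(u, v) / (u_mag * v_mag)


# palm vector pointing "up" the hand vs. a finger vector pointing almost
# the same way: cosine close to 1, so the finger counts as extended
extended = cos_angle([0, -10], [1, -9])

# same palm vector vs. a finger vector folded back toward the wrist:
# cosine is negative, so the finger counts as tucked
tucked = cos_angle([0, -10], [0, 8])

assert extended > 0.70
assert tucked < 0
```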
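The smoothing in `LandmarkBuffer` can be illustrated with a toy version (an assumption for clarity: 1-D scalar readings instead of the repo's 21x3 landmark lists): a fixed-size `deque` drops the oldest reading, the average damps jitter, and the displacement is just the difference of the last two entries, as in `displacement()`.

```python
from collections import deque

# ring buffer of the last 3 readings, like LandmarkBuffer(5) but smaller
buf = deque([], maxlen=3)
for reading in [10, 12, 14, 16]:  # the oldest reading (10) falls out
    buf.append(reading)

average = sum(buf) / len(buf)     # smoothed value: (12 + 14 + 16) / 3
displacement = buf[-1] - buf[-2]  # residual between the last two readings

assert list(buf) == [12, 14, 16]
assert average == 14
assert displacement == 2
```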