Repository: googlecreativelab/quickdraw-dataset
Branch: master
Commit: 5fe6c0a910b3
Files: 9
Total size: 29.1 KB

Directory structure:
gitextract_0rv1hnhi/

├── LICENSE
├── README.md
├── categories.txt
└── examples/
    ├── binary_file_parser.py
    └── nodejs/
        ├── .gitignore
        ├── binary-parser.js
        ├── ndjson.md
        ├── package.json
        └── simplified-parser.js

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE
================================================
This data made available by Google, Inc. under the Creative Commons Attribution 4.0 International license.
https://creativecommons.org/licenses/by/4.0/


================================================
FILE: README.md
================================================
# The Quick, Draw! Dataset
![preview](preview.jpg)

The Quick Draw Dataset is a collection of 50 million drawings across [345 categories](categories.txt), contributed by players of the game [Quick, Draw!](https://quickdraw.withgoogle.com). The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located. You can browse the recognized drawings on [quickdraw.withgoogle.com/data](https://quickdraw.withgoogle.com/data).

We're sharing them here for developers, researchers, and artists to explore, study, and learn from. If you create something with this dataset, please let us know [by e-mail](mailto:quickdraw-support@google.com) or at [A.I. Experiments](https://aiexperiments.withgoogle.com/submit).

We have also released a tutorial and model for training your own drawing classifier on [tensorflow.org](https://github.com/tensorflow/docs/blob/master/site/en/r1/tutorials/sequences/recurrent_quickdraw.md).

Please keep in mind that while this collection of drawings was individually moderated, it may still contain inappropriate content.

## Content
- [The raw moderated dataset](#the-raw-moderated-dataset)
- [Preprocessed dataset](#preprocessed-dataset)
- [Get the data](#get-the-data)
- [Projects using the dataset](#projects-using-the-dataset)
- [Changes](#changes)
- [License](#license)


## The raw moderated dataset
The raw data is available as [`ndjson`](https://github.com/ndjson) files separated by category, in the following format:

| Key          | Type                   | Description                                  |
| ------------ | -----------------------| -------------------------------------------- |
| key_id       | 64-bit unsigned integer| A unique identifier across all drawings.     |
| word         | string                 | Category the player was prompted to draw.    |
| recognized   | boolean                | Whether the word was recognized by the game. |
| timestamp    | datetime               | When the drawing was created.                |
| countrycode  | string                 | A two letter country code ([ISO 3166-1 alpha-2](https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2)) of where the player was located. |
| drawing      | string                 | A JSON array representing the vector drawing |  


Each line contains one drawing. Here's an example of a single drawing:

```javascript
  { 
    "key_id":"5891796615823360",
    "word":"nose",
    "countrycode":"AE",
    "timestamp":"2017-03-01 20:41:36.70725 UTC",
    "recognized":true,
    "drawing":[[[129,128,129,129,130,130,131,132,132,133,133,133,133,...]]]
  }
```

The format of the drawing array is as follows:
 
```javascript
[ 
  [  // First stroke 
    [x0, x1, x2, x3, ...],
    [y0, y1, y2, y3, ...],
    [t0, t1, t2, t3, ...]
  ],
  [  // Second stroke
    [x0, x1, x2, x3, ...],
    [y0, y1, y2, y3, ...],
    [t0, t1, t2, t3, ...]
  ],
  ... // Additional strokes
]
```

Where `x` and `y` are the pixel coordinates, and `t` is the time in milliseconds since the first point. `x` and `y` are real-valued while `t` is an integer. The raw drawings can have vastly different bounding boxes and number of points due to the different devices used for display and input.
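
As a rough illustration of the layout above, the following Python sketch reads one line of a raw file using only the standard library and turns each stroke into a list of `(x, y, t)` points. The helper name `parse_raw_line` and the sample line are made up for this example; they are not part of the dataset tooling.

```python
import json


def parse_raw_line(line):
    """Parse one line of a raw .ndjson file into a dict with per-point strokes.

    Hypothetical helper for illustration, not part of the dataset tooling.
    """
    record = json.loads(line)
    strokes = []
    for xs, ys, ts in record["drawing"]:
        # Each stroke stores three parallel arrays: x, y and time in ms.
        strokes.append(list(zip(xs, ys, ts)))
    return {
        "key_id": int(record["key_id"]),  # stored as a string in the JSON
        "word": record["word"],
        "countrycode": record["countrycode"],
        "recognized": record["recognized"],
        "timestamp": record["timestamp"],
        "strokes": strokes,
    }


# Made-up sample line in the raw format (two short strokes).
sample = (
    '{"key_id":"5891796615823360","word":"nose","countrycode":"AE",'
    '"timestamp":"2017-03-01 20:41:36.70725 UTC","recognized":true,'
    '"drawing":[[[129,128.5],[50,52],[0,16]],[[10,12],[20,22],[40,55]]]}'
)
drawing = parse_raw_line(sample)
print(drawing["word"], len(drawing["strokes"]), drawing["strokes"][0][1])
```

With a real file, the same function can be applied to each line of `open("nose.ndjson")` in turn, which avoids loading the whole file into memory.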

## Preprocessed dataset
We've preprocessed and split the dataset into different files and formats to make it faster and easier to download and explore.

#### Simplified Drawing files (`.ndjson`)
We've simplified the vectors, removed the timing information, and positioned and scaled the data into a 256x256 region. The data is exported in [`ndjson`](https://github.com/ndjson) format with the same metadata as the raw format. The simplification process was:

1. Align the drawing to the top-left corner, to have minimum values of 0.
2. Uniformly scale the drawing, to have a maximum value of 255. 
3. Resample all strokes with a 1 pixel spacing.
4. Simplify all strokes using the [Ramer–Douglas–Peucker algorithm](https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm) with an epsilon value of 2.0.
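
The steps above can be sketched in plain Python. This is not the code used to produce the dataset: it is a minimal illustration that covers steps 1, 2 and 4 and leaves out the resampling of step 3, and all function names are made up for the example. Strokes are assumed to be `(xs, ys)` pairs of equal-length lists.

```python
import math


def align_and_scale(strokes, size=255.0):
    """Steps 1-2: shift the drawing to the origin and scale it uniformly."""
    min_x = min(min(xs) for xs, _ in strokes)
    min_y = min(min(ys) for _, ys in strokes)
    max_x = max(max(xs) for xs, _ in strokes)
    max_y = max(max(ys) for _, ys in strokes)
    extent = max(max_x - min_x, max_y - min_y)
    scale = size / extent if extent > 0 else 1.0
    return [
        ([(x - min_x) * scale for x in xs], [(y - min_y) * scale for y in ys])
        for xs, ys in strokes
    ]


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (py - ay) - dy * (px - ax)) / length


def rdp(points, epsilon=2.0):
    """Step 4: Ramer-Douglas-Peucker simplification of a list of (x, y) points."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    # Find the interior point farthest from the chord first-last.
    index, farthest = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > farthest:
            index, farthest = i, d
    if farthest <= epsilon:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = rdp(points[: index + 1], epsilon)
    right = rdp(points[index:], epsilon)
    return left[:-1] + right


def simplify(strokes, epsilon=2.0):
    """Apply steps 1, 2 and 4 to a drawing (step 3, resampling, is omitted)."""
    result = []
    for xs, ys in align_and_scale(strokes):
        kept = rdp(list(zip(xs, ys)), epsilon)
        result.append(([x for x, _ in kept], [y for _, y in kept]))
    return result


# Made-up drawing: one nearly straight stroke and one corner-shaped stroke.
raw = [([100, 150, 200], [100, 100.5, 100]), ([100, 100, 200], [100, 200, 200])]
simplified = simplify(raw)
print([len(xs) for xs, _ in simplified])  # the straight stroke loses its middle point
```

Aligning and scaling first means the epsilon of 2.0 is measured in the 0 to 255 coordinate space, so the same tolerance applies to every drawing regardless of the device it was drawn on.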

There is an example in [examples/nodejs/simplified-parser.js](examples/nodejs/simplified-parser.js) showing how to read ndjson files in NodeJS.  
Additionally, the [examples/nodejs/ndjson.md](examples/nodejs/ndjson.md) document details a set of command-line tools that can help explore subsets of these quite large files.

#### Binary files (`.bin`)
The simplified drawings and metadata are also available in a custom binary format for efficient compression and loading.

There is an example in [examples/binary_file_parser.py](examples/binary_file_parser.py) showing how to load the binary files in Python.  
There is also an example in [examples/nodejs/binary-parser.js](examples/nodejs/binary-parser.js) showing how to read the binary files in NodeJS.
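
The record layout implied by those two parsers can be summarized with a short round trip. The sketch below is an illustration only: `pack_drawing` and `read_drawing` are hypothetical helpers (the repository ships no writer), the values are made up, and little-endian byte order is assumed because the NodeJS parser declares it.

```python
import io
import struct

# Header: key_id (uint64), country code (2 bytes), recognized (int8),
# timestamp (uint32), number of strokes (uint16). '<' means little-endian
# with no padding, so the header is 8 + 2 + 1 + 4 + 2 = 17 bytes.
HEADER = "<Q2sbIH"


def pack_drawing(key_id, country_code, recognized, timestamp, strokes):
    """Serialize one drawing; hypothetical helper for illustration."""
    out = struct.pack(HEADER, key_id, country_code, recognized, timestamp, len(strokes))
    for xs, ys in strokes:
        out += struct.pack("<H", len(xs))  # number of points in this stroke
        out += bytes(xs) + bytes(ys)       # one unsigned byte per coordinate
    return out


def read_drawing(handle):
    """Read one drawing back from a binary file-like object."""
    key_id, country_code, recognized, timestamp, n_strokes = struct.unpack(
        HEADER, handle.read(struct.calcsize(HEADER))
    )
    strokes = []
    for _ in range(n_strokes):
        (n_points,) = struct.unpack("<H", handle.read(2))
        xs = list(handle.read(n_points))
        ys = list(handle.read(n_points))
        strokes.append((xs, ys))
    return key_id, country_code, bool(recognized), timestamp, strokes


record = pack_drawing(5891796615823360, b"AE", 1, 1488400896, [([0, 128, 255], [0, 64, 255])])
print(len(record))  # 17-byte header + 2-byte point count + 3 x bytes + 3 y bytes = 25
print(read_drawing(io.BytesIO(record)))
```

Because every coordinate fits in a single byte, this format only works for the simplified drawings, whose coordinates lie in the 0 to 255 range.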

#### Numpy bitmaps (`.npy`)
All the simplified drawings have been rendered into a 28x28 grayscale bitmap in numpy `.npy` format. The files can be loaded with [`np.load()`](https://docs.scipy.org/doc/numpy-1.12.0/reference/generated/numpy.load.html). These images were generated from the simplified data, but are aligned to the center of the drawing's bounding box rather than the top-left corner. [See here for code snippet used for generation](https://github.com/googlecreativelab/quickdraw-dataset/issues/19#issuecomment-402247262).

## Get the data
The dataset is available on Google Cloud Storage as [`ndjson`](https://github.com/ndjson) files separated by category. See the list of files in [Cloud Console](https://console.cloud.google.com/storage/browser/quickdraw_dataset/), or read more about [accessing public datasets](https://cloud.google.com/storage/docs/access-public-data) using other methods. For example, one way to download all simplified drawings is to run the command `gsutil -m cp 'gs://quickdraw_dataset/full/simplified/*.ndjson' .`
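
The per-category files can also be fetched over plain HTTPS. The sketch below only builds the URL and performs no download. It assumes that public objects in the `quickdraw_dataset` bucket are served from `storage.googleapis.com` and that each file is named after its category, as the `gsutil` pattern above suggests. `category_url` and the `EXTENSIONS` table are illustrative names, not part of any official client.

```python
from urllib.parse import quote

# Assumed public endpoint for the bucket; the folder names match the
# "full" directory of the bucket (raw, simplified, binary, numpy_bitmap).
BASE_URL = "https://storage.googleapis.com/quickdraw_dataset/full"
EXTENSIONS = {
    "raw": "ndjson",
    "simplified": "ndjson",
    "binary": "bin",
    "numpy_bitmap": "npy",
}


def category_url(category, variant="simplified"):
    """Build the download URL for one category file (hypothetical helper)."""
    extension = EXTENSIONS[variant]
    # Category names such as "alarm clock" contain spaces and need escaping.
    return "{}/{}/{}.{}".format(BASE_URL, variant, quote(category), extension)


print(category_url("cat"))
print(category_url("alarm clock", "binary"))
```

The resulting URL can be passed to any HTTP client, for example `urllib.request.urlretrieve`, to download a single category without installing `gsutil`.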

#### Full dataset separated by categories
- [Raw files](https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/raw) (`.ndjson`)
- [Simplified drawings files](https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/simplified) (`.ndjson`)
- [Binary files](https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/binary) (`.bin`)
- [Numpy bitmap files](https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/numpy_bitmap) (`.npy`)

#### Sketch-RNN QuickDraw Dataset
This data is also used for training the [Sketch-RNN](https://arxiv.org/abs/1704.03477) model.  An open source, TensorFlow implementation of this model is available in the [Magenta Project](https://magenta.tensorflow.org/sketch_rnn), (link to GitHub [repo](https://github.com/tensorflow/magenta/tree/master/magenta/models/sketch_rnn)).  You can also read more about this model in this Google Research [blog post](https://research.googleblog.com/2017/04/teaching-machines-to-draw.html).  The data is stored in compressed `.npz` files, in a format suitable for inputs into a recurrent neural network.

In this dataset, 75K samples (70K training, 2.5K validation, 2.5K test) have been randomly selected from each category and processed with [RDP](https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm) line simplification with an `epsilon` parameter of 2.0.  Each category is stored in its own `.npz` file, for example, `cat.npz`.

We have also provided the full data for each category, if you want to use more than 70K training examples.  These are stored with the `.full.npz` extensions.

- [Numpy .npz files](https://console.cloud.google.com/storage/browser/quickdraw_dataset/sketchrnn)

*Note:* For Python 3, load the `npz` files using `np.load(data_filepath, encoding='latin1', allow_pickle=True)`.

Instructions for converting raw `ndjson` files to this `npz` format are available in this [notebook](https://github.com/hardmaru/quickdraw-ndjson-to-npz).

## Projects using the dataset
Here are some projects and experiments that are using or featuring the dataset in interesting ways. Got something to add? [Let us know!](mailto:quickdraw-support@google.com)

*Creative and artistic projects*

- [Letter collages](http://frauzufall.de/en/2017/google-quick-draw/) by [Deborah Schmidt](http://frauzufall.de/)
- [Face tracking experiment](https://www.instagram.com/p/BUU8TuQD6_v/) by [Neil Mendoza](http://www.neilmendoza.com/)
- [Faces of Humanity](http://project.laboiteatortue.com/facesofhumanity/) by [Tortue](http://www.laboiteatortue.com)
- [Infinite QuickDraw](https://kynd.github.io/infinite_quickdraw/) by [kynd.info](http://kynd.info)
- [Misfire.io](http://misfire.io/) by Matthew Collyer
- [Draw This](http://danmacnish.com/2018/07/01/draw-this/) by [Dan Macnish](http://danmacnish.com/)
- [Scribbling Speech](http://xinyue.de/scribbling-speech.html) by [Xinyue Yang](http://xinyue.de/)
- illustrAItion by [Ling Chen](https://github.com/lingchen42/illustrAItion)
- [Dreaming of Electric Sheep](https://medium.com/@libreai/dreaming-of-electric-sheep-d1aca32545dc) by [Dr. Ernesto Diaz-Aviles](http://ernesto.diazaviles.com/)

*Data analyses*

- [How do you draw a circle?](https://qz.com/994486/the-way-you-draw-circles-says-a-lot-about-you/) by [Quartz](https://qz.com/)
- [Forma Fluens](http://formafluens.io/) by [Mauro Martino](http://www.mamartino.com/), [Hendrik Strobelt](http://hendrik.strobelt.com/) and [Owen Cornec](http://www.byowen.com/)
- [How Long Does it Take to (Quick) Draw a Dog?](http://vallandingham.me/quickdraw/) by [Jim Vallandingham](http://vallandingham.me/)
- [Finding bad flamingo drawings with recurrent neural networks](http://colinmorris.github.io/blog/bad_flamingos) by [Colin Morris](http://colinmorris.github.io/)
- [Facets Dive x Quick, Draw!](https://pair-code.github.io/facets/quickdraw.html) by [People + AI Research Initiative (PAIR), Google](https://ai.google/pair)
- [Exploring and Visualizing an Open Global Dataset](https://research.googleblog.com/2017/08/exploring-and-visualizing-open-global.html) by Google Research
- [Machine Learning for Visualization](https://medium.com/@enjalot/machine-learning-for-visualization-927a9dff1cab) - Talk / article by Ian Johnson

*Papers*
- [A Neural Representation of Sketch Drawings](https://arxiv.org/pdf/1704.03477.pdf) by [David Ha](https://scholar.google.com/citations?user=J1j92GsxVUMC&hl=en), [Douglas Eck](https://scholar.google.com/citations?user=bLb3VdIAAAAJ&hl=en), ICLR 2018. [code](https://github.com/tensorflow/magenta/tree/master/magenta/models/sketch_rnn)
- [Sketchmate: Deep hashing for million-scale human sketch retrieval](http://openaccess.thecvf.com/content_cvpr_2018/papers/Xu_SketchMate_Deep_Hashing_CVPR_2018_paper.pdf) by [Peng Xu](http://www.pengxu.net/) et al., CVPR 2018.
- [Multi-graph transformer for free-hand sketch recognition](https://arxiv.org/pdf/1912.11258.pdf) by [Peng Xu](http://www.pengxu.net/), [Chaitanya K Joshi](https://chaitjo.github.io/), [Xavier Bresson](https://www.ntu.edu.sg/home/xbresson/), ArXiv 2019. [code](https://github.com/PengBoXiangShang/multigraph_transformer)
- [Deep Self-Supervised Representation Learning for Free-Hand Sketch](https://arxiv.org/pdf/2002.00867.pdf) by [Peng Xu](http://www.pengxu.net/) et al., ArXiv 2020. [code](https://github.com/zzz1515151/self-supervised_learning_sketch)
- [SketchTransfer: A Challenging New Task for Exploring Detail-Invariance and the Abstractions Learned by Deep Networks](https://arxiv.org/pdf/1912.11570.pdf) by [Alex Lamb](https://sites.google.com/view/alexmlamb), [Sherjil Ozair](https://sherjilozair.github.io/), [Vikas Verma](https://scholar.google.com/citations?user=wo_M4uQAAAAJ&hl=en), [David Ha](https://scholar.google.com/citations?user=J1j92GsxVUMC&hl=en), WACV 2020.
- [Deep Learning for Free-Hand Sketch: A Survey](https://arxiv.org/pdf/2001.02600.pdf) by [Peng Xu](http://www.pengxu.net/), ArXiv 2020.
- [A Novel Sketch Recognition Model based on Convolutional Neural Networks](https://ieeexplore.ieee.org/document/9152911) by [Abdullah Talha Kabakus](https://www.linkedin.com/in/talhakabakus), 2nd International Congress on Human-Computer Interaction, Optimization and Robotic Applications, pp. 101-106, 2020.

*Guides & Tutorials*
- [TensorFlow tutorial for drawing classification](https://github.com/tensorflow/docs/blob/master/site/en/r1/tutorials/sequences/recurrent_quickdraw.md)
- [Train a model in tf.keras with Colab, and run it in the browser with TensorFlow.js](https://medium.com/tensorflow/train-on-google-colab-and-run-on-the-browser-a-case-study-8a45f9b1474e) by Zaid Alyafeai

*Code and tools*
- [Quick, Draw! Polymer Component & Data API](https://github.com/googlecreativelab/quickdraw-component) by Nick Jonas
- [Quick, Draw for Processing](https://github.com/codybenlewis/Quick-Draw-for-Processing) by [Cody Ben Lewis](https://twitter.com/CodyBenLewis)
- [Quick, Draw! prediction model](https://github.com/keisukeirie/quickdraw_prediction_model) by Keisuke Irie 
- [Random sample tool](http://learning.statistics-is-awesome.org/draw/) by [Learning statistics is awesome](http://learning.statistics-is-awesome.org/)
- [SVG rendering in d3.js example](https://bl.ocks.org/enjalot/a2b28f0ed18b891f9fb70910f1b8886d) by [Ian Johnson](http://enja.org/) (read more about the process [here](https://gist.github.com/enjalot/54c4342eb7527ea523884dbfa52d174b))
- [Sketch-RNN Classification](https://github.com/payalbajaj/sketch_rnn_classification) by Payal Bajaj
- [quickdraw.js](https://github.com/wagenaartje/quickdraw.js) by Thomas Wagenaar
- [~ Doodler ~](https://github.com/krishnasriSomepalli/cs50-project/) by [Krishna Sri Somepalli](https://krishnasrisomepalli.github.io/)
- [quickdraw Python API](http://quickdraw.readthedocs.io) by [Martin O'Hanlon](https://github.com/martinohanlon)
- [RealTime QuickDraw](https://github.com/akshaybahadur21/QuickDraw) by [Akshay Bahadur](http://akshaybahadur.com/)
- [DataFlow processing](https://github.com/gxercavins/dataflow-samples/tree/master/quick-draw) by Guillem Xercavins 
- [QuickDrawGH Rhino Plugin](https://www.food4rhino.com/app/quickdrawgh) by [James Dalessandro](https://github.com/DalessandroJ)
- [QuickDrawBattle](https://andri.io/quickdrawbattle/) by [Andri Soone](https://github.com/ndri)


## Changes

May 25, 2017: Updated Sketch-RNN QuickDraw dataset, created `.full.npz` complementary sets.

## License
This data made available by Google, Inc. under the [Creative Commons Attribution 4.0 International license.](https://creativecommons.org/licenses/by/4.0/)

## Dataset Metadata
The following table is necessary for this dataset to be indexed by search
engines such as <a href="https://g.co/datasetsearch">Google Dataset Search</a>.
<div itemscope itemtype="http://schema.org/Dataset">
<table>
  <tr>
    <th>property</th>
    <th>value</th>
  </tr>
  <tr>
    <td>name</td>
    <td><code itemprop="name">The Quick, Draw! Dataset</code></td>
  </tr>
  <tr>
    <td>alternateName</td>
    <td><code itemprop="alternateName">Quick Draw Dataset</code></td>
  </tr>
  <tr>
    <td>alternateName</td>
    <td><code itemprop="alternateName">quickdraw-dataset</code></td>
  </tr>
  <tr>
    <td>url</td>
    <td><code itemprop="url">https://github.com/googlecreativelab/quickdraw-dataset</code></td>
  </tr>
  <tr>
    <td>sameAs</td>
    <td><code itemprop="sameAs">https://github.com/googlecreativelab/quickdraw-dataset</code></td>
  </tr>
  <tr>
    <td>description</td>
    <td><code itemprop="description">The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game "Quick, Draw!". The drawings were captured as timestamped vectors, tagged with metadata including what the player was asked to draw and in which country the player was located.\n
\n
Example drawings:
![preview](https://raw.githubusercontent.com/googlecreativelab/quickdraw-dataset/master/preview.jpg)</code></td>
  </tr>
  <tr>
    <td>provider</td>
    <td>
      <div itemscope itemtype="http://schema.org/Organization" itemprop="provider">
        <table>
          <tr>
            <th>property</th>
            <th>value</th>
          </tr>
          <tr>
            <td>name</td>
            <td><code itemprop="name">Google</code></td>
          </tr>
          <tr>
            <td>sameAs</td>
            <td><code itemprop="sameAs">https://en.wikipedia.org/wiki/Google</code></td>
          </tr>
        </table>
      </div>
    </td>
  </tr>
  <tr>
    <td>license</td>
    <td>
      <div itemscope itemtype="http://schema.org/CreativeWork" itemprop="license">
        <table>
          <tr>
            <th>property</th>
            <th>value</th>
          </tr>
          <tr>
            <td>name</td>
            <td><code itemprop="name">CC BY 4.0</code></td>
          </tr>
          <tr>
            <td>url</td>
            <td><code itemprop="url">https://creativecommons.org/licenses/by/4.0/</code></td>
          </tr>
        </table>
      </div>
    </td>
  </tr>
</table>
</div>


================================================
FILE: categories.txt
================================================
aircraft carrier
airplane
alarm clock
ambulance
angel
animal migration
ant
anvil
apple
arm
asparagus
axe
backpack
banana
bandage
barn
baseball
baseball bat
basket
basketball
bat
bathtub
beach
bear
beard
bed
bee
belt
bench
bicycle
binoculars
bird
birthday cake
blackberry
blueberry
book
boomerang
bottlecap
bowtie
bracelet
brain
bread
bridge
broccoli
broom
bucket
bulldozer
bus
bush
butterfly
cactus
cake
calculator
calendar
camel
camera
camouflage
campfire
candle
cannon
canoe
car
carrot
castle
cat
ceiling fan
cello
cell phone
chair
chandelier
church
circle
clarinet
clock
cloud
coffee cup
compass
computer
cookie
cooler
couch
cow
crab
crayon
crocodile
crown
cruise ship
cup
diamond
dishwasher
diving board
dog
dolphin
donut
door
dragon
dresser
drill
drums
duck
dumbbell
ear
elbow
elephant
envelope
eraser
eye
eyeglasses
face
fan
feather
fence
finger
fire hydrant
fireplace
firetruck
fish
flamingo
flashlight
flip flops
floor lamp
flower
flying saucer
foot
fork
frog
frying pan
garden
garden hose
giraffe
goatee
golf club
grapes
grass
guitar
hamburger
hammer
hand
harp
hat
headphones
hedgehog
helicopter
helmet
hexagon
hockey puck
hockey stick
horse
hospital
hot air balloon
hot dog
hot tub
hourglass
house
house plant
hurricane
ice cream
jacket
jail
kangaroo
key
keyboard
knee
knife
ladder
lantern
laptop
leaf
leg
light bulb
lighter
lighthouse
lightning
line
lion
lipstick
lobster
lollipop
mailbox
map
marker
matches
megaphone
mermaid
microphone
microwave
monkey
moon
mosquito
motorbike
mountain
mouse
moustache
mouth
mug
mushroom
nail
necklace
nose
ocean
octagon
octopus
onion
oven
owl
paintbrush
paint can
palm tree
panda
pants
paper clip
parachute
parrot
passport
peanut
pear
peas
pencil
penguin
piano
pickup truck
picture frame
pig
pillow
pineapple
pizza
pliers
police car
pond
pool
popsicle
postcard
potato
power outlet
purse
rabbit
raccoon
radio
rain
rainbow
rake
remote control
rhinoceros
rifle
river
roller coaster
rollerskates
sailboat
sandwich
saw
saxophone
school bus
scissors
scorpion
screwdriver
sea turtle
see saw
shark
sheep
shoe
shorts
shovel
sink
skateboard
skull
skyscraper
sleeping bag
smiley face
snail
snake
snorkel
snowflake
snowman
soccer ball
sock
speedboat
spider
spoon
spreadsheet
square
squiggle
squirrel
stairs
star
steak
stereo
stethoscope
stitches
stop sign
stove
strawberry
streetlight
string bean
submarine
suitcase
sun
swan
sweater
swing set
sword
syringe
table
teapot
teddy-bear
telephone
television
tennis racquet
tent
The Eiffel Tower
The Great Wall of China
The Mona Lisa
tiger
toaster
toe
toilet
tooth
toothbrush
toothpaste
tornado
tractor
traffic light
train
tree
triangle
trombone
truck
trumpet
t-shirt
umbrella
underwear
van
vase
violin
washing machine
watermelon
waterslide
whale
wheel
windmill
wine bottle
wine glass
wristwatch
yoga
zebra
zigzag


================================================
FILE: examples/binary_file_parser.py
================================================
# Copyright 2017 Google Inc.
# 
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# 
# https://www.apache.org/licenses/LICENSE-2.0
# 
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import struct
from struct import unpack


def unpack_drawing(file_handle):
    # Read one drawing record. The '<' prefix makes every field explicitly
    # little-endian with no padding, matching the NodeJS parser.
    key_id, = unpack('<Q', file_handle.read(8))         # 64-bit unsigned id
    country_code, = unpack('<2s', file_handle.read(2))  # two ASCII bytes
    recognized, = unpack('<b', file_handle.read(1))     # 1 if recognized
    timestamp, = unpack('<I', file_handle.read(4))      # Unix time in seconds
    n_strokes, = unpack('<H', file_handle.read(2))
    image = []
    for i in range(n_strokes):
        n_points, = unpack('<H', file_handle.read(2))
        # Each stroke stores n_points x bytes followed by n_points y bytes.
        fmt = str(n_points) + 'B'
        x = unpack(fmt, file_handle.read(n_points))
        y = unpack(fmt, file_handle.read(n_points))
        image.append((x, y))

    return {
        'key_id': key_id,
        'country_code': country_code,
        'recognized': recognized,
        'timestamp': timestamp,
        'image': image
    }


def unpack_drawings(filename):
    # Yield drawings one at a time until the end of the file is reached.
    with open(filename, 'rb') as f:
        while True:
            try:
                yield unpack_drawing(f)
            except struct.error:
                # A short read at end of file makes unpack() fail.
                break


if __name__ == '__main__':
    for drawing in unpack_drawings('nose.bin'):
        # do something with the drawing
        print(drawing['country_code'])


================================================
FILE: examples/nodejs/.gitignore
================================================
node_modules


================================================
FILE: examples/nodejs/binary-parser.js
================================================
/*
Copyright 2017 Google Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
  Demonstration of parsing binary files from the Quick, Draw! dataset with NodeJS.

  https://github.com/googlecreativelab/quickdraw-dataset
  https://quickdraw.withgoogle.com/data

  This demo assumes you've put the file "face.bin" into a folder called "data"
  in the same directory as this script.
*/
var fs = require('fs');
var Parser = require('binary-parser').Parser;
var BigInteger = require('javascript-biginteger').BigInteger;

var Drawing = Parser.start()
  .endianess('little')
  .array('key_id', {
      type: 'uint8',
      length: 8
  })
  .string('countrycode', { length: 2, encoding: 'ascii' })
  // .uint8('recognized')
  .bit1('recognized')
  .uint32le('timestamp') // unix timestamp in seconds
  .uint16le('n_strokes')
  .array('strokes', {
    type: Parser.start()
      .uint16le('n_points')
      .array('x', {
        type: 'uint8',
        length: 'n_points'
      })
      .array('y', {
        type: 'uint8',
        length: 'n_points'
      }),
    length: 'n_strokes'
  });

function parseBinaryDrawings(fileName, callback) {
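  fs.readFile(fileName, function(err, buffer) {
    // Report read errors (e.g. a missing file) instead of parsing undefined.
    if (err) return callback(err);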
    var unpacked = Parser.start()
      .array('drawings', {
          type: Drawing,
          // length: 2
          readUntil: 'eof'
      }).parse(buffer);
    // console.log("unpacked", unpacked)
    var drawings = unpacked.drawings.map(function(d) {
      var ka = d.key_id;
      // the key is a long integer so we have to parse it specially
      var key = BigInteger(0);
      for (var i = 7; i >= 0; i--) {
        key = key.multiply(256);
        key = key.add(ka[i]);
      }
      var strokes = d.strokes.map(function(d,i) { return [ d.x, d.y ] });
      return {
        'key_id': key.toString(),
        'countrycode': d.countrycode,
        'recognized': !!d.recognized, //convert to boolean
        'timestamp': d.timestamp * 1000, // turn it into milliseconds
        'drawing': strokes
      }
    })
    callback(null, drawings);
  })
}

parseBinaryDrawings("data/face.bin", function(err, drawings) {
  if(err) return console.error(err);
  drawings.forEach(function(d) {
    // Do something with the drawing
    console.log(d.key_id, d.countrycode)
  })
  console.log("# of drawings:", drawings.length)
})


================================================
FILE: examples/nodejs/ndjson.md
================================================
# Quick, Draw! ndjson data

The [Quick, Draw! dataset](https://github.com/googlecreativelab/quickdraw-dataset) uses
[ndjson](https://github.com/maxogden/ndjson) as one of the formats to store its millions of drawings.

We can use the [ndjson-cli](https://github.com/mbostock/ndjson-cli) utility to quickly create interesting subsets of this dataset.

The drawings (stroke data and associated metadata) are stored as one JSON object per line. e.g.:
```js
{
  "key_id":"5891796615823360",
  "word":"nose",
  "countrycode":"AE",
  "timestamp":"2017-03-01 20:41:36.70725 UTC",
  "recognized":true,
  "drawing":[[[129,128,129,129,130,130,131,132,132,133,133,133,133,...]]]
}
```

Each file represents all of the drawings for a given word. So, you can download the one you want.
For this exploration we will focus on the [simplified drawings](https://console.cloud.google.com/storage/browser/quickdraw_dataset/full/simplified)
because the files are about 10x smaller and the drawings look just as good.
We do lose timing information available in the raw data, so feel free to explore that when you are comfortable navigating the data (the format is pretty much exactly the same besides the added timing array and more points in the stroke data.)

# Let's explore the `face` collection!

One nice thing that you can do with `.ndjson` files is to quickly peek at the data using some simple Unix commands:

```bash
# look at the first 5 lines
cat face.ndjson | head -n 5
# look at the last 5 lines
cat face.ndjson | tail -n 5
```

## Filtering

Now let's take our first subset of the data by filtering:
```bash
# let's filter down to only the recognized drawings
cat face.ndjson | ndjson-filter 'd.recognized == true' | head -n 5
# How many recognized drawings are there?
cat face.ndjson | ndjson-filter 'd.recognized == true' | wc -l
# How about unrecognized?
cat face.ndjson | ndjson-filter 'd.recognized == false' | wc -l

# We can also filter down to a country we are interested in
cat face.ndjson | ndjson-filter 'd.recognized == true && d.countrycode == "CA"' | wc -l
```
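
For readers who prefer not to install `ndjson-cli`, the same kind of filter can be approximated in Python with the standard library. This is only a sketch: `count_matching` is a made-up helper, and the three sample lines stand in for the real `face.ndjson`.

```python
import io
import json


def count_matching(lines, predicate):
    """Count the drawings in an iterable of ndjson lines that satisfy predicate."""
    return sum(1 for line in lines if line.strip() and predicate(json.loads(line)))


# Made-up stand-in for face.ndjson; with the real file, pass open("face.ndjson").
sample = io.StringIO(
    '{"word":"face","countrycode":"CA","recognized":true,"drawing":[[[0,1],[0,1]]]}\n'
    '{"word":"face","countrycode":"US","recognized":true,"drawing":[[[0,1],[0,1]]]}\n'
    '{"word":"face","countrycode":"CA","recognized":false,"drawing":[[[0,1],[0,1]]]}\n'
)
canadian = count_matching(sample, lambda d: d["recognized"] and d["countrycode"] == "CA")
print(canadian)  # 1
```

Reading line by line keeps memory use flat, which matters because a single category file can hold well over a hundred thousand drawings.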

## Sorting

For sorting, you can make things easier by including d3. This means you'll need to `npm install d3` in the directory from which you are calling these commands.
```bash
# sort by when the drawing was created
cat face.ndjson | ndjson-sort -r d3 'd3.ascending(a.timestamp, b.timestamp)' | head -n 5

# sort from the most complex drawings to the simplest (judged by how many strokes they use to draw)
cat face.ndjson | ndjson-sort -r d3 'd3.descending(a.drawing.length, b.drawing.length)' | head -n 5
```

## Saving to JSON
If you want to save out a subset as a regular JSON file, you can use `ndjson-reduce`:
```bash
# save to the file "canadian-faces.json"
cat face.ndjson | ndjson-filter 'd.recognized == true && d.countrycode == "CA"' | ndjson-reduce > canadian-faces.json

# You can combine these utilities to further filter down your data
cat face.ndjson | ndjson-filter 'd.recognized == true && d.countrycode == "CA"' | head -n 1000 | ndjson-reduce > canadian-faces.json

cat face.ndjson | ndjson-filter 'd.recognized == true && d.countrycode == "CA"' | ndjson-sort -r d3 'd3.descending(a.drawing.length, b.drawing.length)' | head -n 100 | ndjson-reduce > complex-faces.json
```
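
The last pipeline (filter, sort by stroke count, keep the first rows, write one JSON array) can be approximated in Python as well. The sketch below is an illustration under the same assumptions as the commands above; `top_complex` and the sample records are made up for the example.

```python
import heapq
import io
import json


def top_complex(lines, country, limit):
    """Return the `limit` recognized drawings from `country` with the most strokes."""
    drawings = (json.loads(line) for line in lines if line.strip())
    matching = (d for d in drawings if d["recognized"] and d["countrycode"] == country)
    # nlargest keeps at most `limit` records in memory while scanning the input.
    return heapq.nlargest(limit, matching, key=lambda d: len(d["drawing"]))


# Made-up stand-in for face.ndjson; stroke contents are shortened to empty lists.
sample = io.StringIO(
    '{"key_id":"a","countrycode":"CA","recognized":true,"drawing":[[],[]]}\n'
    '{"key_id":"b","countrycode":"CA","recognized":true,"drawing":[[],[],[]]}\n'
    '{"key_id":"c","countrycode":"US","recognized":true,"drawing":[[],[],[],[],[]]}\n'
    '{"key_id":"d","countrycode":"CA","recognized":false,"drawing":[[],[],[],[]]}\n'
)
result = top_complex(sample, "CA", 2)
# json.dumps produces a regular JSON array, like ndjson-reduce does.
print(json.dumps(result))
```

To save the subset, the final string can be written to a file such as `complex-faces.json` with `open(..., "w")`.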


================================================
FILE: examples/nodejs/package.json
================================================
{
  "name": "quickdraw-node-demos",
  "version": "0.0.1",
  "description": "Sample code for parsing Quick, Draw! dataset in NodeJS",
  "main": "simplified-parser.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Ian Johnson (enjalot@google.com)",
  "license": "Apache-2.0",
  "dependencies": {
    "binary-parser": "^1.1.5",
    "javascript-biginteger": "^0.9.2",
    "ndjson": "^1.5.0"
  },
  "devDependencies": {
    "d3": "^4.9.1",
    "ndjson-cli": "^0.3.0"
  }
}


================================================
FILE: examples/nodejs/simplified-parser.js
================================================
/*
Copyright 2017 Google Inc.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
/*
  Demonstration of parsing simplified ndjson files from the Quick, Draw! dataset with node.js.
  Read in all of the simplified drawings into memory and log out some properties.

  https://github.com/googlecreativelab/quickdraw-dataset
  https://quickdraw.withgoogle.com/data

  This demo assumes you've put the file "face-simple.ndjson" into a folder called "data"
  in the same directory as this script.
*/
var fs = require('fs');
var ndjson = require('ndjson'); // npm install ndjson

function parseSimplifiedDrawings(fileName, callback) {
  var drawings = [];
  var fileStream = fs.createReadStream(fileName);
  fileStream
    .on("error", callback) // read errors (e.g. a missing file) are not forwarded by pipe()
    .pipe(ndjson.parse())
    .on('data', function(obj) {
      drawings.push(obj)
    })
    .on("error", callback)
    .on("end", function() {
      callback(null, drawings)
    });
}

parseSimplifiedDrawings("data/face-simple.ndjson", function(err, drawings) {
  if(err) return console.error(err);
  drawings.forEach(function(d) {
    // Do something with the drawing
    console.log(d.key_id, d.countrycode);
  })
  console.log("# of drawings:", drawings.length);
})
