[
  {
    "path": ".gitignore",
    "content": "# Logs\nlogs\n*.log\nnpm-debug.log*\nyarn-debug.log*\nyarn-error.log*\nlerna-debug.log*\n\n# Diagnostic reports (https://nodejs.org/api/report.html)\nreport.[0-9]*.[0-9]*.[0-9]*.[0-9]*.json\n\n# Runtime data\npids\n*.pid\n*.seed\n*.pid.lock\n\n# Directory for instrumented libs generated by jscoverage/JSCover\nlib-cov\n\n# Coverage directory used by tools like istanbul\ncoverage\n*.lcov\n\n# nyc test coverage\n.nyc_output\n\n# Grunt intermediate storage (https://gruntjs.com/creating-plugins#storing-task-files)\n.grunt\n\n# Bower dependency directory (https://bower.io/)\nbower_components\n\n# node-waf configuration\n.lock-wscript\n\n# Compiled binary addons (https://nodejs.org/api/addons.html)\nbuild/Release\n\n# Dependency directories\nnode_modules/\njspm_packages/\n\n# TypeScript v1 declaration files\ntypings/\n\n# TypeScript cache\n*.tsbuildinfo\n\n# Optional npm cache directory\n.npm\n\n# Optional eslint cache\n.eslintcache\n\n# Microbundle cache\n.rpt2_cache/\n.rts2_cache_cjs/\n.rts2_cache_es/\n.rts2_cache_umd/\n\n# Optional REPL history\n.node_repl_history\n\n# Output of 'npm pack'\n*.tgz\n\n# Yarn Integrity file\n.yarn-integrity\n\n# dotenv environment variables file\n.env\n.env.test\n\n# parcel-bundler cache (https://parceljs.org/)\n.cache\n\n# Next.js build output\n.next\n\n# Nuxt.js build / generate output\n.nuxt\ndist\n\n# Gatsby files\n.cache/\n# Comment in the public line in if your project uses Gatsby and *not* Next.js\n# https://nextjs.org/blog/next-9-1#public-directory-support\n# public\n\n# vuepress build output\n.vuepress/dist\n\n# Serverless directories\n.serverless/\n\n# FuseBox cache\n.fusebox/\n\n# DynamoDB Local files\n.dynamodb/\n\n# TernJS port file\n.tern-port\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2020 Filipe Deschamps\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# Parse Google Docs JSON\n\nThis **Node.js** module authenticates with **Google API** and parse **Google Docs** to human-readable **JSON** or **Markdown** without the need to use cumbersome methods like exporting it in HTML via **Google Drive API** and then parse it back to other formats.\n\n# Why\n\nWhen you use **Google Docs API V1**, the [body](https://developers.google.com/docs/api/reference/rest/v1/documents#Body) that comes with the `documents.get` method is completely fragmented. It's a JSON that you need to recursively parse to get the document into human-readable format. For my luck, there's a Gatsby plugin that internally has this implementation already: [gatsby-source-google-docs](https://github.com/cedricdelpoux/gatsby-source-google-docs). So I've extracted this implementation into this module and exposed it with a **Service Authentication**. For more information about this type of authentication, follow this tutorial: [How to authenticate to any Google API](https://flaviocopes.com/google-api-authentication)\n\n# Warning\n\nThis module works like a charm, but it's for personal use, primarily. It will follow semantic version best practices, but will not have any automated tests in the short term.\n\n# How to use\n\n```js\nconst parseGoogleDocsJson = require(\"parse-google-docs-json\");\n\nasync function start() {\n  const parsed = await parseGoogleDocsJson({\n    documentId: \"1ymKw2OGcMfc02XdEEWdy22a_zUAlCxyN3P5Ab4c\",\n    clientEmail: \"service@iam.gserviceaccount.com\",\n    privateKey: \"-----BEGIN PRIVATE KEY...\",\n  });\n\n  console.log(parsed.toJson());\n  console.log(parsed.toMarkdown());\n}\n\nstart();\n```\n\n# Environment variables\n\n```\nclientEmail = process.env.PARSE_GOOGLE_DOCS_CLIENT_EMAIL\nprivateKey = process.env.PARSE_GOOGLE_DOCS_PRIVATE_KEY\n```\n"
  },
  {
    "path": "index.d.ts",
    "content": "declare module 'parse-google-docs-json' {\n  interface Configuration {\n    clientEmail?:string,\n    privateKey?:string,\n    documentId?: string\n  }\n\n  namespace parseGoogleDocs {}\n\n  function parseGoogleDocs( configuration?: Configuration): {\n    toJson:()=>{\n      cover: {\n        image: string\n        title: string\n        alt: string\n      }\n      content: any[]\n      metadata: {\n        title: string\n      }\n    },\n    toMarkdown:()=> string\n  }\n\n  export = parseGoogleDocs\n}"
  },
  {
    "path": "package.json",
    "content": "{\n  \"name\": \"parse-google-docs-json\",\n  \"version\": \"4.0.0\",\n  \"description\": \"This Node.js module authenticates with Google API and parse Google Docs to human-readable JSON or Markdown.\",\n  \"main\": \"source/index.js\",\n  \"scripts\": {\n    \"test\": \"echo \\\"Error: no test specified\\\" && exit 1\"\n  },\n  \"repository\": {\n    \"type\": \"git\",\n    \"url\": \"git+https://github.com/filipedeschamps/parse-google-docs-json.git\"\n  },\n  \"keywords\": [\n    \"google\",\n    \"api\",\n    \"google\",\n    \"docs\",\n    \"html\",\n    \"markdown\"\n  ],\n  \"author\": \"Filipe Deschamps\",\n  \"license\": \"MIT\",\n  \"bugs\": {\n    \"url\": \"https://github.com/filipedeschamps/parse-google-docs-json/issues\"\n  },\n  \"homepage\": \"https://github.com/filipedeschamps/parse-google-docs-json#readme\",\n  \"typings\": \"index.d.ts\",\n  \"dependencies\": {\n    \"googleapis\": \"148.0.0\",\n    \"json2md\": \"1.7.1\",\n    \"lodash.get\": \"4.4.2\",\n    \"lodash.last\": \"3.0.0\",\n    \"lodash.repeat\": \"4.1.0\",\n    \"yamljs\": \"0.3.0\"\n  }\n}\n"
  },
  {
    "path": "source/index.js",
    "content": "const { google } = require(\"googleapis\");\n\nconst {\n  convertGoogleDocumentToJson,\n  convertJsonToMarkdown,\n} = require(\"./parser.js\");\n\nasync function parseGoogleDocs(configuration = {}) {\n  const clientEmail =\n    configuration.clientEmail || process.env.PARSE_GOOGLE_DOCS_CLIENT_EMAIL;\n  const privateKey =\n    configuration.privateKey || process.env.PARSE_GOOGLE_DOCS_PRIVATE_KEY;\n  const documentId = configuration.documentId;\n  const scopes = [\"https://www.googleapis.com/auth/documents.readonly\"];\n\n  if (!clientEmail) {\n    throw new Error('Please, provide \"clientEmail\" in the constructor');\n  }\n\n  if (!privateKey) {\n    throw new Error('Please, provide \"privateKey\" in the constructor');\n  }\n\n  if (!documentId) {\n    throw new Error('Please, provide \"documentId\" in the constructor');\n  }\n\n  const auth = new google.auth.JWT({\n    email: clientEmail,\n    key: privateKey,\n    scopes: scopes,\n  });\n\n  const docs = google.docs({ version: \"v1\", auth });\n\n  const docsResponse = await docs.documents.get({\n    documentId: documentId,\n  });\n\n  function toJson() {\n    const jsonDocument = convertGoogleDocumentToJson(docsResponse.data);\n\n    return {\n      metadata: { title: docsResponse.data.title },\n      ...jsonDocument,\n    };\n  }\n\n  function toMarkdown() {\n    const documentInJson = convertGoogleDocumentToJson(docsResponse.data);\n    return convertJsonToMarkdown(documentInJson);\n  }\n\n  return {\n    toJson,\n    toMarkdown,\n  };\n}\n\nmodule.exports = parseGoogleDocs;\n"
  },
  {
    "path": "source/parser.js",
    "content": "const json2md = require(\"json2md\");\nconst YAML = require(\"yamljs\");\nconst _last = require(\"lodash.last\");\nconst _get = require(\"lodash.get\");\nconst _repeat = require(\"lodash.repeat\");\n\nfunction getParagraphTag(p) {\n  const tags = {\n    NORMAL_TEXT: \"p\",\n    SUBTITLE: \"blockquote\",\n    HEADING_1: \"h1\",\n    HEADING_2: \"h2\",\n    HEADING_3: \"h3\",\n    HEADING_4: \"h4\",\n    HEADING_5: \"h5\",\n  };\n\n  return tags[p.paragraphStyle.namedStyleType];\n}\n\nfunction getListTag(list) {\n  const glyphType = _get(list, [\n    \"listProperties\",\n    \"nestingLevels\",\n    0,\n    \"glyphType\",\n  ]);\n  return glyphType !== undefined ? \"ol\" : \"ul\";\n}\n\nfunction cleanText(text) {\n  return text.replace(/\\n/g, \"\").trim();\n}\n\nfunction getNestedListIndent(level, listTag) {\n  const indentType = listTag === \"ol\" ? \"1.\" : \"-\";\n  return `${_repeat(\"  \", level)}${indentType} `;\n}\n\nfunction getTextFromParagraph(p) {\n  return p.elements\n    ? p.elements\n        .filter((el) => el.textRun && el.textRun.content !== \"\\n\")\n        .map((el) => (el.textRun ? 
getText(el) : \"\"))\n        .join(\"\")\n    : \"\";\n}\n\nfunction getTableCellContent(content) {\n  // Empty or missing cells render as an empty string\n  if (!content || content.length === 0) return \"\";\n  return content\n    .map(({ paragraph }) => cleanText(getTextFromParagraph(paragraph)))\n    .join(\"\");\n}\n\nfunction getImage(document, element) {\n  const { inlineObjects } = document;\n\n  if (!inlineObjects || !element.inlineObjectElement) {\n    return null;\n  }\n\n  const inlineObject =\n    inlineObjects[element.inlineObjectElement.inlineObjectId];\n  const embeddedObject = inlineObject.inlineObjectProperties.embeddedObject;\n\n  if (embeddedObject && embeddedObject.imageProperties) {\n    return {\n      source: embeddedObject.imageProperties.contentUri,\n      title: embeddedObject.title || \"\",\n      alt: embeddedObject.description || \"\",\n    };\n  }\n\n  return null;\n}\n\nfunction getBulletContent(document, element) {\n  if (element.inlineObjectElement) {\n    const image = getImage(document, element);\n    return `![${image.alt}](${image.source} \"${image.title}\")`;\n  }\n\n  return getText(element);\n}\n\nfunction getText(element, { isHeader = false } = {}) {\n  let text = cleanText(element.textRun.content);\n  const {\n    link,\n    underline,\n    strikethrough,\n    bold,\n    italic,\n  } = element.textRun.textStyle;\n\n  text = text.replace(/\\*/g, \"\\\\*\");\n  text = text.replace(/_/g, \"\\\\_\");\n\n  if (underline) {\n    // Underline isn't supported in markdown so we'll use emphasis\n    text = `_${text}_`;\n  }\n\n  if (italic) {\n    text = `_${text}_`;\n  }\n\n  // Set bold unless it's a header\n  if (bold && !isHeader) {\n    text = `**${text}**`;\n  }\n\n  if (strikethrough) {\n    text = `~~${text}~~`;\n  }\n\n  if (link) {\n    return `[${text}](${link.url})`;\n  }\n\n  return text;\n}\n\nfunction getCover(document) {\n  const { headers, documentStyle } = document;\n\n  if (\n    !documentStyle ||\n    !documentStyle.firstPageHeaderId ||\n    
!headers[documentStyle.firstPageHeaderId]\n  ) {\n    return null;\n  }\n\n  const headerElement = _get(headers[documentStyle.firstPageHeaderId], [\n    \"content\",\n    0,\n    \"paragraph\",\n    \"elements\",\n    0,\n  ]);\n\n  const image = getImage(document, headerElement);\n\n  return image\n    ? {\n        image: image.source,\n        title: image.title,\n        alt: image.alt,\n      }\n    : null;\n}\n\nfunction convertGoogleDocumentToJson(document) {\n  const { body, footnotes = {} } = document;\n  const cover = getCover(document);\n\n  const content = [];\n  const footnoteIDs = {};\n\n  body.content.forEach(({ paragraph, table }, i) => {\n    // Paragraphs\n    if (paragraph) {\n      const tag = getParagraphTag(paragraph);\n\n      // Lists\n      if (paragraph.bullet) {\n        const listId = paragraph.bullet.listId;\n        const list = document.lists[listId];\n        const listTag = getListTag(list);\n\n        const bulletContent = paragraph.elements\n          .map((el) => getBulletContent(document, el))\n          .join(\" \")\n          .replace(\" .\", \".\")\n          .replace(\" ,\", \",\");\n\n        const prev = body.content[i - 1];\n        const prevListId = _get(prev, \"paragraph.bullet.listId\");\n\n        if (prevListId === listId) {\n          const list = _last(content)[listTag];\n          const { nestingLevel } = paragraph.bullet;\n\n          if (nestingLevel !== undefined) {\n            // mimic nested lists\n            const lastIndex = list.length - 1;\n            const indent = getNestedListIndent(nestingLevel, listTag);\n\n            list[lastIndex] += `\\n${indent} ${bulletContent}`;\n          } else {\n            list.push(bulletContent);\n          }\n        } else {\n          content.push({\n            [listTag]: [bulletContent],\n          });\n        }\n      }\n\n      // Headings, Images, Texts\n      else if (tag) {\n        let tagContent = [];\n\n        paragraph.elements.forEach((el) => {\n    
      // EmbeddedObject\n          if (el.inlineObjectElement) {\n            const image = getImage(document, el);\n\n            if (image) {\n              tagContent.push({\n                img: image,\n              });\n            }\n          }\n\n          // Headings, Texts\n          else if (el.textRun && el.textRun.content !== \"\\n\") {\n            tagContent.push({\n              [tag]: getText(el, {\n                isHeader: tag !== \"p\",\n              }),\n            });\n          }\n\n          // Footnotes\n          else if (el.footnoteReference) {\n            tagContent.push({\n              [tag]: `[^${el.footnoteReference.footnoteNumber}]`,\n            });\n            footnoteIDs[el.footnoteReference.footnoteId] =\n              el.footnoteReference.footnoteNumber;\n          }\n        });\n\n        if (tagContent.every((el) => el[tag] !== undefined)) {\n          content.push({\n            [tag]: tagContent\n              .map((el) => el[tag])\n              .join(\" \")\n              .replace(\" .\", \".\")\n              .replace(\" ,\", \",\"),\n          });\n        } else {\n          content.push(...tagContent);\n        }\n      }\n    }\n\n    // Table\n    else if (table && table.tableRows.length > 0) {\n      const [thead, ...tbody] = table.tableRows;\n      content.push({\n        table: {\n          headers: thead.tableCells.map(({ content }) =>\n            getTableCellContent(content)\n          ),\n          rows: tbody.map((row) =>\n            row.tableCells.map(({ content }) => getTableCellContent(content))\n          ),\n        },\n      });\n    }\n  });\n\n  // Footnotes reference section (end of document)\n  let formatedFootnotes = [];\n  Object.entries(footnotes).forEach(([, value]) => {\n    // Concatenate all content\n    const text_items = value.content[0].paragraph.elements.map((element) =>\n      getText(element)\n    );\n    const text = text_items.join(\" \").replace(\" .\", \".\").replace(\" ,\", 
\",\");\n\n    formatedFootnotes.push({\n      footnote: { number: footnoteIDs[value.footnoteId], text: text },\n    });\n  });\n  formatedFootnotes.sort(\n    (item1, item2) =>\n      parseInt(item1.footnote.number) - parseInt(item2.footnote.number)\n  );\n  content.push(...formatedFootnotes);\n  return {\n    cover,\n    content,\n  };\n}\n\n// Add extra converter for footnotes\njson2md.converters.footnote = function (footnote) {\n  return `[^${footnote.number}]: ${footnote.text}`;\n};\n\nfunction convertJsonToMarkdown({ content, metadata }) {\n  // Do NOT move the formatting of the following lines\n  // to prevent markdown parsing errors\n  return `---\n${YAML.stringify(metadata)}\n---\n\n${json2md(content)}`;\n}\n\nmodule.exports = { convertGoogleDocumentToJson, convertJsonToMarkdown };\n"
  }
]