[
  {
    "path": ".github/workflows/publish-npm.yml",
    "content": "name: NPM publish\n\non:\n  push:\n    tags: \n      - 'v*'\n\njobs:\n  publish-npm:\n    runs-on: ubuntu-latest\n    steps:\n      - uses: actions/checkout@v1\n      - uses: actions/setup-node@v1\n        with:\n          node-version: 12\n          registry-url: https://registry.npmjs.org/\n      - run: yarn\n      - run: yarn build\n      - run: yarn test\n      - run: npm publish\n        env:\n          NODE_AUTH_TOKEN: ${{secrets.npm_token}}\n"
  },
  {
    "path": ".github/workflows/test.yml",
    "content": "name: Test\n\non: [push, pull_request]\n\njobs:\n  build:\n\n    runs-on: ubuntu-latest\n\n    strategy:\n      matrix:\n        node-version: [12.x]\n\n    steps:\n    - uses: actions/checkout@v1\n    - name: Use Node.js ${{ matrix.node-version }}\n      uses: actions/setup-node@v1\n      with:\n        node-version: ${{ matrix.node-version }}\n    - name: install, build, and test\n      run: |\n        yarn\n        yarn build\n        yarn test\n      env:\n        CI: true\n"
  },
  {
    "path": ".gitignore",
    "content": "node_modules\nlib\n"
  },
  {
    "path": ".npmignore",
    "content": "node_modules\nfixtures\nexamples\n.vscode\n"
  },
  {
    "path": ".prettierrc",
    "content": "{\n  \"trailingComma\": \"all\",\n  \"tabWidth\": 2,\n  \"semi\": false,\n  \"singleQuote\": true,\n  \"printWidth\": 100,\n  \"arrowParens\": \"always\"\n}\n"
  },
  {
    "path": ".vscode/settings.json",
    "content": "{\n  \"debug.node.autoAttach\": \"on\"\n}"
  },
  {
    "path": "CHANGELOG.md",
    "content": "# Changelog\n\nAll notable changes to this project will be documented in this file. See [standard-version](https://github.com/conventional-changelog/standard-version) for commit guidelines.\n\n## [2.0.4](https://github.com/gaoxiaoliangz/epub-parser/compare/v2.0.3...v2.0.4) (2021-07-26)\n\n\n\n## [2.0.3](https://github.com/gaoxiaoliangz/epub-parser/compare/v2.0.2...v2.0.3) (2020-07-02)\n\n\n### Bug Fixes\n\n* fix issue [#18](https://github.com/gaoxiaoliangz/epub-parser/issues/18) ([d979ff4](https://github.com/gaoxiaoliangz/epub-parser/commit/d979ff46b4dee8247af2e363f646690316505e43))\n\n\n\n# Changelog\n\n## v2.0.2 (2019-11-22)\n\n- fixed <https://github.com/gaoxiaoliangz/epub-parser/issues/13>\n"
  },
  {
    "path": "README.md",
    "content": "# 📖 epub-parser\n\n> A powerful yet easy-to-use epub parser\n\n[![npm version](https://badge.fury.io/js/%40gxl%2Fepub-parser.svg)](https://badge.fury.io/js/%40gxl%2Fepub-parser)\n![Test](https://github.com/gaoxiaoliangz/epub-parser/workflows/Test/badge.svg)\n\nThe package exports a simple parser function which use epub file as input and output JavaScript object.\n\nAs it is written in TypeScript, types are already included in the package.\n\n## Install\n\n```bash\nnpm install @gxl/epub-parser --save\n```\n\nor if you prefer yarn\n\n```bash\nyarn add @gxl/epub-parser\n```\n\n## Usage\n\n```js\nimport { parseEpub } from '@gxl/epub-parser'\n\nconst epubObj = await parseEpub('/path/to/file.epub', {\n  type: 'path',\n})\n\nconsole.log('epub content:', epubObj)\n```\n\n### parseEpub(target: string | buffer, options?: object): EpubObject\n\n#### target\n\ntype: `string` or `buffer`\n\nIt can be the path to the file or file's binary string or buffer\n\n#### options\n\ntype: `object`\n\n##### type(optional): 'binaryString' | 'path' | 'buffer'\n\nIt forces the parser to treat supplied target as the defined type, if not defined the parser itself will decide how to treat the file (useful when you are not sure if the path is valid).\n\n#### EpubObject\n\nThe output is an object which contains `structure`, `sections`, `info`(private property names start with `_`. I don't recommend using them, since they are subscribed to change).\n\n`structure` is the parsed `toc` of epub file, they contain information about how the book is constructed.\n\n`sections` is an array of chapters or sections under chapters, they are referred in `structure`. Each section object contains the raw html string and a few handy methods.\n\n- `Section.prototype.toMarkdown`: convert to markdown object.\n- `Section.prototype.toHtmlObjects`: convert to html object. And a note about `src` and `href`, the `src` and `href` in raw html stay untouched, but the `toHtmlObjects` method resolves `src` to base64 string, and alters `href` so that they make sense in the parsed epub. And the parsed `href` is something like `#{sectionId},{hash}`.\n\n## How to contribute\n\n- Raise an issue in the issue section.\n- PRs are the best.\n\n❤️\n"
  },
  {
    "path": "_config.yml",
    "content": "theme: jekyll-theme-merlot"
  },
  {
    "path": "examples/simple/main.js",
    "content": "// @ts-check\nconst { parseEpub } = require('../../lib')\n\nparseEpub('../../fixtures/zhihu.epub').then(result => {\n  console.log('result object has keys: ', Object.keys(result))\n  console.log('book info', result.info)\n  console.log('book structure', result.structure)\n  console.log('the book has', result.sections.length, 'sections')\n  console.log('here is first section')\n\n  const showSection = idx => {\n    console.log(`-------- section index ${idx} --------`)\n    console.log(result.sections[idx])\n    console.log('toMarkdown')\n    console.log(result.sections[idx].toMarkdown())\n    console.log('toHtmlObjects')\n    const htmlObjects = result.sections[idx].toHtmlObjects()\n    console.log(htmlObjects)\n  }\n\n  showSection(2)\n\n  // this section contains images which are converted to base64\n  // showSection(4)\n})\n"
  },
  {
    "path": "examples/simple/package.json",
    "content": "{\n  \"name\": \"simple\",\n  \"version\": \"1.0.0\",\n  \"description\": \"\",\n  \"scripts\": {\n    \"start\": \"node main.js\",\n    \"debug\": \"node --inspect main.js\"\n  },\n  \"author\": \"gaoxiaoliangz\",\n  \"license\": \"ISC\"\n}\n"
  },
  {
    "path": "fixtures/file-e.epub",
    "content": "pretend to be one\nhehe"
  },
  {
    "path": "jest.config.js",
    "content": "module.exports = {\n  preset: 'ts-jest',\n  testEnvironment: 'node',\n  modulePathIgnorePatterns: ['lib'],\n}\n"
  },
  {
    "path": "package.json",
    "content": "{\n  \"name\": \"@gxl/epub-parser\",\n  \"version\": \"2.0.4\",\n  \"description\": \"A powerful yet easy-to-use epub parser\",\n  \"main\": \"lib/index.js\",\n  \"scripts\": {\n    \"prebuild\": \"yarn clean\",\n    \"build\": \"tsc\",\n    \"watch\": \"tsc --watch\",\n    \"clean\": \"rimraf lib\",\n    \"format\": \"prettier --write \\\"src/**/*.{js,jsx,ts,tsx,json,md,css,scss}\\\"\",\n    \"test\": \"jest\",\n    \"test-debug\": \"node --inspect-brk -r ts-node/register node_modules/.bin/jest --runInBand\",\n    \"v\": \"standard-version --preset angular\",\n    \"postv\": \"git push --follow-tags origin master\",\n    \"prepare\": \"yarn build\"\n  },\n  \"repository\": {\n    \"type\": \"git\",\n    \"url\": \"git+https://github.com/gaoxiaoliangz/epub-parser.git\"\n  },\n  \"keywords\": [\n    \"epub-parser\",\n    \"parser\",\n    \"epub\",\n    \"easy\",\n    \"book\",\n    \"file\"\n  ],\n  \"author\": \"gaoxiaoliangz\",\n  \"license\": \"MIT\",\n  \"bugs\": {\n    \"url\": \"https://github.com/gaoxiaoliangz/epub-parser/issues\"\n  },\n  \"homepage\": \"https://github.com/gaoxiaoliangz/epub-parser#readme\",\n  \"dependencies\": {\n    \"jsdom\": \"^15.1.1\",\n    \"lodash\": \"^4.17.15\",\n    \"node-zip\": \"^1.1.1\",\n    \"to-markdown\": \"^3.1.1\",\n    \"xml2js\": \"^0.4.19\"\n  },\n  \"devDependencies\": {\n    \"@types/express\": \"^4.17.1\",\n    \"@types/jest\": \"^24.0.18\",\n    \"@types/jsdom\": \"^12.2.4\",\n    \"@types/lodash\": \"^4.14.137\",\n    \"@types/node\": \"^12.7.2\",\n    \"@types/xml2js\": \"^0.4.4\",\n    \"cross-env\": \"^5.2.0\",\n    \"dotenv\": \"^8.1.0\",\n    \"express\": \"^4.17.1\",\n    \"jest\": \"^24.9.0\",\n    \"prettier\": \"^2.0.5\",\n    \"rimraf\": \"^3.0.0\",\n    \"source-map-support\": \"^0.5.13\",\n    \"standard-version\": \"^8.0.0\",\n    \"ts-jest\": \"^24.0.2\",\n    \"ts-node\": \"^8.3.0\",\n    \"tslint\": \"^5.19.0\",\n    \"typescript\": \"^3.9.6\",\n    \"vrsource-tslint-rules\": \"^6.0.0\"\n  }\n}\n"
  },
  {
    "path": "src/index.ts",
    "content": "import parseEpub from './parseEpub'\nimport parseLink from './parseLink'\nimport parseHTML from './parseHTML'\n\nexport { parseLink, parseHTML, parseEpub }\n"
  },
  {
    "path": "src/mdConverters.ts",
    "content": "import parseLink from './parseLink'\n\nexport const resolveInlineNavHref = (href: string) => {\n  if (href && href.indexOf('http://') === -1) {\n    const parsed = parseLink(href)\n\n    if (parsed.hash) {\n      return `#${parsed.name}$${parsed.hash}`\n    }\n\n    return `#${parsed.name}`\n  }\n\n  return href\n}\n\nexport const h = {\n  filter: ['h1', 'h2', 'h3', 'h4', 'h5', 'h6'],\n\n  replacement: function (innerHTML: string, node: HTMLElement) {\n    let hLevel = node.tagName.charAt(1) as any\n    let hPrefix = ''\n\n    for (let i = 0; i < hLevel; i++) {\n      hPrefix += '#'\n    }\n\n    // return `\\n${hPrefix} ${innerHTML.trim()}\\n\\n`\n    const hTag = node.tagName.toLowerCase()\n    const id = node.getAttribute('id')\n\n    if (!id) {\n      return `\\n${hPrefix} ${innerHTML}\\n\\n`\n    }\n\n    // 块级元素若保留原标签需添加换行符，否则临近元素渲染会出现问题\n    return `\\n<${hTag} id=\"${id}\">${innerHTML.trim().split('\\n').join(' ')}</${hTag}>\\n\\n`\n  },\n}\n\nexport const span = {\n  filter: ['span'],\n\n  replacement: function (innerHTML: string) {\n    return innerHTML\n  },\n}\n\nexport const a = {\n  filter: ['a'],\n\n  replacement: function (innerHTML: string, node: HTMLEmbedElement) {\n    const href = node.getAttribute('href')\n    return `\\n[${innerHTML}](${resolveInlineNavHref(href!)})\\n\\n`\n  },\n}\n\nexport const div = {\n  filter: ['div'],\n\n  replacement: function (innerHTML: string) {\n    return `\\n${innerHTML}\\n\\n`\n  },\n}\n\nexport const img = {\n  filter: ['img'],\n\n  replacement: function (innerHTML: string) {\n    return `\\n[PIC]\\n\\n`\n  },\n}\n"
  },
  {
    "path": "src/parseEpub.spec.ts",
    "content": "import parser from './parseEpub'\nimport _ from 'lodash'\nimport * as path from 'path'\n\nconst baseDir = process.cwd()\nconst filesToBeTested = ['file-1', 'file-2', 'file-3', 'file-4', 'file-1-no-toc', 'wells']\n\nconst testFile = (filename: string) => {\n  describe(`parser 测试 ${filename}.epub`, () => {\n    const fileContent = parser(path.join(baseDir, `fixtures/${filename}.epub`), {\n      type: 'path',\n      expand: true,\n    })\n\n    test('Result should have keys', async () => {\n      const keys = _.keys(await fileContent)\n      expect(keys.length).not.toBe(0)\n    })\n\n    test('toc', async () => {\n      const result = await fileContent\n      if (filename === 'file-1-no-toc') {\n        expect(result.structure).toBe(undefined)\n      } else {\n        expect(fileContent && typeof fileContent).toBe('object')\n      }\n    })\n\n    // it('key 分别为: flesh, nav, meta', done => {\n    //   const expectedKeys = ['flesh', 'nav', 'meta']\n\n    //   fileContent.then(result => {\n    //     const keys = _.keys(result)\n    //     keys.forEach(key => {\n    //       expect(expectedKeys.indexOf(key)).to.not.be(-1)\n    //     })\n    //     done()\n    //   })\n    // })\n  })\n}\n\nfilesToBeTested.forEach((filename) => {\n  testFile(filename)\n})\n"
  },
  {
    "path": "src/parseEpub.ts",
    "content": "import fs from 'fs'\nimport xml2js from 'xml2js'\nimport _ from 'lodash'\n// @ts-ignore\nimport nodeZip from 'node-zip'\nimport parseLink from './parseLink'\nimport parseSection, { Section } from './parseSection'\nimport { GeneralObject } from './types'\n\nconst xmlParser = new xml2js.Parser()\n\nconst xmlToJs = (xml: string) => {\n  return new Promise<any>((resolve, reject) => {\n    xmlParser.parseString(xml, (err: Error, object: GeneralObject) => {\n      if (err) {\n        reject(err)\n      } else {\n        resolve(object)\n      }\n    })\n  })\n}\n\nconst determineRoot = (opfPath: string) => {\n  let root = ''\n  // set the opsRoot for resolving paths\n  if (opfPath.match(/\\//)) {\n    // not at top level\n    root = opfPath.replace(/\\/([^\\/]+)\\.opf/i, '')\n    if (!root.match(/\\/$/)) {\n      // 以 '/' 结尾，下面的 zip 路径写法会简单很多\n      root += '/'\n    }\n    if (root.match(/^\\//)) {\n      root = root.replace(/^\\//, '')\n    }\n  }\n  return root\n}\n\nconst parseMetadata = (metadata: GeneralObject[]) => {\n  const title = _.get(metadata[0], ['dc:title', 0]) as string\n  let author = _.get(metadata[0], ['dc:creator', 0]) as string\n\n  if (typeof author === 'object') {\n    author = _.get(author, ['_']) as string\n  }\n\n  const publisher = _.get(metadata[0], ['dc:publisher', 0]) as string\n  const meta = {\n    title,\n    author,\n    publisher,\n  }\n  return meta\n}\n\nexport class Epub {\n  private _zip: any // nodeZip instance\n  private _opfPath?: string\n  private _root?: string\n  private _content?: GeneralObject\n  private _manifest?: any[]\n  private _spine?: string[] // array of ids defined in manifest\n  private _toc?: GeneralObject\n  private _metadata?: GeneralObject\n  structure?: GeneralObject\n  info?: {\n    title: string\n    author: string\n    publisher: string\n  }\n  sections?: Section[]\n\n  constructor(buffer: Buffer) {\n    this._zip = new nodeZip(buffer, { binary: true, base64: false, checkCRC32: true })\n  }\n\n  resolve(\n    path: string,\n  ): {\n    asText: () => string\n  } {\n    let _path\n    if (path[0] === '/') {\n      // use absolute path, root is zip root\n      _path = path.substr(1)\n    } else {\n      _path = this._root + path\n    }\n    const file = this._zip.file(decodeURI(_path))\n    if (file) {\n      return file\n    } else {\n      throw new Error(`${_path} not found!`)\n    }\n  }\n\n  async _resolveXMLAsJsObject(path: string) {\n    const xml = this.resolve(path).asText()\n    return xmlToJs(xml)\n  }\n\n  private async _getOpfPath() {\n    const container = await this._resolveXMLAsJsObject('/META-INF/container.xml')\n    const opfPath = container.container.rootfiles[0].rootfile[0]['$']['full-path']\n    return opfPath\n  }\n\n  _getManifest(content: GeneralObject) {\n    return _.get(content, ['package', 'manifest', 0, 'item'], []).map(\n      (item: any) => item.$,\n    ) as any[]\n  }\n\n  _resolveIdFromLink(href: string) {\n    const { name: tarName } = parseLink(href)\n    const tarItem = _.find(this._manifest, (item) => {\n      const { name } = parseLink(item.href)\n      return name === tarName\n    })\n    return _.get(tarItem, 'id')\n  }\n\n  _getSpine() {\n    return _.get(this._content, ['package', 'spine', 0, 'itemref'], []).map(\n      (item: GeneralObject) => {\n        return item.$.idref\n      },\n    )\n  }\n\n  _genStructureForHTML(tocObj: GeneralObject) {\n    const tocRoot = tocObj.html.body[0].nav[0]['ol'][0].li;\n    let runningIndex = 1;\n\n    const parseHTMLNavPoints = (navPoint: GeneralObject) => {\n      const element = navPoint.a[0] || {};\n      const path = element['$'].href;\n      let name = element['_'];\n      const prefix = element.span;\n      if (prefix) {\n        name = `${prefix.map((p: GeneralObject) => p['_']).join('')}${name}`;\n      }\n      const sectionId = this._resolveIdFromLink(path);\n      const { hash: nodeId } = parseLink(path)\n      const playOrder = runningIndex;\n\n      let children = navPoint?.ol?.[0]?.li;\n\n      if (children) {\n        children = parseOuterHTML(children);\n      }\n\n      runningIndex++;\n\n      return {\n        name,\n        sectionId,\n        nodeId,\n        path,\n        playOrder,\n        children,\n      };\n    };\n\n    const parseOuterHTML = (collection: GeneralObject[]) => {\n      return collection.map((point) => {\n        return parseHTMLNavPoints(point);\n      });\n    }\n\n    return parseOuterHTML(tocRoot);\n  }\n\n  _genStructure(tocObj: GeneralObject, resolveNodeId = false) {\n    if (tocObj.html) {\n      return this._genStructureForHTML(tocObj);\n    }\n\n    const rootNavPoints = _.get(tocObj, ['ncx', 'navMap', '0', 'navPoint'], [])\n\n    const parseNavPoint = (navPoint: GeneralObject) => {\n      // link to section\n      const path = _.get(navPoint, ['content', '0', '$', 'src'], '')\n      const name = _.get(navPoint, ['navLabel', '0', 'text', '0'])\n      const playOrder = _.get(navPoint, ['$', 'playOrder']) as string\n      const { hash: nodeId } = parseLink(path)\n      let children = navPoint.navPoint\n\n      if (children) {\n        // tslint:disable-next-line:no-use-before-declare\n        children = parseNavPoints(children)\n      }\n\n      const sectionId = this._resolveIdFromLink(path)\n\n      return {\n        name,\n        sectionId,\n        nodeId,\n        path,\n        playOrder,\n        children,\n      }\n    }\n\n    const parseNavPoints = (navPoints: GeneralObject[]) => {\n      return navPoints.map((point) => {\n        return parseNavPoint(point)\n      })\n    }\n\n    return parseNavPoints(rootNavPoints)\n  }\n\n  _resolveSectionsFromSpine(expand = false) {\n    // no chain\n    return _.map(_.union(this._spine), (id) => {\n      const path = _.find(this._manifest, { id }).href\n      const html = this.resolve(path).asText()\n\n      return parseSection({\n        id,\n        htmlString: html,\n        resourceResolver: this.resolve.bind(this),\n        idResolver: this._resolveIdFromLink.bind(this),\n        expand,\n      })\n    })\n  }\n\n  async parse(expand = false) {\n    const opfPath = await this._getOpfPath()\n    this._root = determineRoot(opfPath)\n\n    const content = await this._resolveXMLAsJsObject('/' + opfPath)\n    const manifest = this._getManifest(content)\n    const metadata = _.get(content, ['package', 'metadata'], [])\n    const tocID = _.get(content, ['package', 'spine', 0, '$', 'toc'], 'toc.xhtml');\n    // https://github.com/gaoxiaoliangz/epub-parser/issues/13\n    // https://www.w3.org/publishing/epub32/epub-packages.html#sec-spine-elem\n\n    const tocPath = (_.find(manifest, { id: tocID }) || {}).href\n    if (tocPath) {\n      const toc = await this._resolveXMLAsJsObject(tocPath)\n      this._toc = toc\n      this.structure = this._genStructure(toc)\n    }\n\n    this._manifest = manifest\n    this._content = content\n    this._opfPath = opfPath\n    this._spine = this._getSpine()\n    this._metadata = metadata\n    this.info = parseMetadata(metadata)\n    this.sections = this._resolveSectionsFromSpine(expand)\n\n    return this\n  }\n}\n\nexport interface ParserOptions {\n  type?: 'binaryString' | 'path' | 'buffer'\n  expand?: boolean\n}\nexport default function parserWrapper(target: string | Buffer, options: ParserOptions = {}) {\n  // seems 260 is the length limit of old windows standard\n  // so path length is not used to determine whether it's path or binary string\n  // the downside here is that if the filepath is incorrect, it will be treated as binary string by default\n  // but it can use options to define the target type\n  const { type, expand } = options\n  let _target = target\n  if (type === 'path' || (typeof target === 'string' && fs.existsSync(target))) {\n    _target = fs.readFileSync(target as string, 'binary')\n  }\n  return new Epub(_target as Buffer).parse(expand)\n}\n"
  },
  {
    "path": "src/parseHTML.spec.ts",
    "content": "import parseHTML from './parseHTML'\nimport _ from 'lodash'\n\ndescribe('parseHTML1', () => {\n  it('unwrap tag in unwrap tag situation', () => {\n    const result = parseHTML(`\n      <p class=\"calibre8\"><span class=\"blue1\">李剑波</span><sup class=\"calibre10\"><a id=\"note21\" href=\"../Text/part0006_split_001.html#note21n\">[21]</a></sup><span class=\"calibre9\" style=\"text-decoration:underline\">用他的创业经历告诉你：<span class=\"skycolor\">你的创业方向离不开你决定创业那一刻之前的人生积累，尤其是你的职业生涯的积累。</span></span></p>\n      <p class=\"calibre8\">如果你的积累是工程师，我觉得你选择从解决问题的角度去创业是比较合适的。这个问题也应该是你自己本身需要解决的。更重要的是，你要多跟那些已经在创业的、创业小有所成的、创业失败的人去聊天。聊他们的项目，他们的产品，他们从0到1是怎么过来的。我创业之前聊过的朋友有：做手机做到上亿规模的，代理火控雷达做到千万规模的，做互联网品牌做到百万规模的，做二维码的，做电子商务做失败的，也有做到一年几十万规模的，还有做传统生意的。如果你足够有悟性，相信你能够从中找到你的创业方向的。</p>\n    `)\n    expect(JSON.stringify(result)).toBe(\n      `[{\"tag\":\"p\",\"type\":1,\"children\":[{\"type\":3,\"text\":\"李剑波\"},{\"tag\":\"sup\",\"type\":1,\"children\":[{\"tag\":\"a\",\"type\":1,\"children\":[{\"type\":3,\"text\":\"[21]\"}],\"attrs\":{\"href\":\"../Text/part0006_split_001.html#note21n\",\"id\":\"note21\"}}],\"attrs\":{}},{\"type\":3,\"text\":\"用他的创业经历告诉你：\"},{\"type\":3,\"text\":\"你的创业方向离不开你决定创业那一刻之前的人生积累，尤其是你的职业生涯的积累。\"}],\"attrs\":{}},{\"tag\":\"p\",\"type\":1,\"children\":[{\"type\":3,\"text\":\"如果你的积累是工程师，我觉得你选择从解决问题的角度去创业是比较合适的。这个问题也应该是你自己本身需要解决的。更重要的是，你要多跟那些已经在创业的、创业小有所成的、创业失败的人去聊天。聊他们的项目，他们的产品，他们从0到1是怎么过来的。我创业之前聊过的朋友有：做手机做到上亿规模的，代理火控雷达做到千万规模的，做互联网品牌做到百万规模的，做二维码的，做电子商务做失败的，也有做到一年几十万规模的，还有做传统生意的。如果你足够有悟性，相信你能够从中找到你的创业方向的。\"}],\"attrs\":{}}]`,\n    )\n  })\n})\n"
  },
  {
    "path": "src/parseHTML.ts",
    "content": "import { JSDOM } from 'jsdom'\nimport _ from 'lodash'\nimport { traverseNestedObject } from './utils'\nimport { HtmlNodeObject, GeneralObject } from './types'\n\nconst OMITTED_TAGS = ['head', 'input', 'textarea', 'script', 'style', 'svg']\nconst UNWRAP_TAGS = ['body', 'html', 'div', 'span']\nconst PICKED_ATTRS = ['href', 'src', 'id']\n\n/**\n * recursivelyReadParent\n * @param node\n * @param callback invoke every time a parent node is read, return truthy value to stop the reading process\n * @param final callback when reaching the root\n */\nconst recursivelyReadParent = (\n  node: GeneralObject,\n  callback: (node: GeneralObject) => GeneralObject | null,\n  final?: () => GeneralObject,\n) => {\n  const _read = (_node: GeneralObject): GeneralObject => {\n    const parent = _node.parentNode\n    if (parent) {\n      const newNode = callback(parent)\n      if (!newNode) {\n        return _read(parent)\n      }\n      return newNode\n    } else {\n      if (final) {\n        return final()\n      }\n      return node\n    }\n  }\n  return _read(node)\n}\n\nexport interface ParseHTMLConfig {\n  resolveSrc?: (src: string) => string\n  resolveHref?: (href: string) => string\n}\nconst parseHTML = (HTMLString: string, config: ParseHTMLConfig = {}) => {\n  const rootNode = new JSDOM(HTMLString).window.document.documentElement\n  const { resolveHref, resolveSrc } = config\n\n  // initial parse\n  return traverseNestedObject(rootNode, {\n    childrenKey: 'childNodes',\n    preFilter(node) {\n      return node.nodeType === 1 || node.nodeType === 3\n    },\n    transformer(node, children) {\n      if (node.nodeType === 1) {\n        const tag = node.tagName.toLowerCase()\n        const attrs: GeneralObject = {}\n\n        if (OMITTED_TAGS.indexOf(tag) !== -1) {\n          return null\n        }\n\n        if (UNWRAP_TAGS.indexOf(tag) !== -1 && children) {\n          return children.length === 1 ? children[0] : children\n        }\n\n        PICKED_ATTRS.forEach((attr) => {\n          let attrVal = node.getAttribute(attr) || undefined\n          if (attrVal && attr === 'href' && resolveHref) {\n            attrVal = resolveHref(attrVal)\n          }\n          if (attrVal && attr === 'src' && resolveSrc) {\n            attrVal = resolveSrc(attrVal)\n          }\n          attrs[attr] = attrVal\n        })\n\n        return { tag, type: 1, children, attrs }\n      } else {\n        const text = node.textContent.trim()\n        if (!text) {\n          return null\n        }\n\n        const makeTextObject = () => {\n          return {\n            type: 3,\n            text,\n          }\n        }\n\n        // find the closest parent which is not in UNWRAP_TAGS\n        // if failed then wrap with p tag\n        return recursivelyReadParent(\n          node,\n          (parent) => {\n            const tag = parent.tagName && parent.tagName.toLowerCase()\n            if (!tag || UNWRAP_TAGS.indexOf(tag) !== -1) {\n              return null\n            }\n            return makeTextObject()\n          },\n          () => {\n            return {\n              tag: 'p',\n              children: [makeTextObject()],\n            }\n          },\n        )\n      }\n    },\n    postFilter(node) {\n      return !_.isEmpty(node)\n    },\n  }) as HtmlNodeObject[]\n}\n\nexport default parseHTML\n"
  },
  {
    "path": "src/parseLink.ts",
    "content": "import _ from 'lodash'\n\nexport default function parseHref(href: string) {\n  const hash = href.split('#')[1]\n  const url = href.split('#')[0]\n  const prefix = url.split('/').slice(0, -1).join('/')\n  const filename = _.last(url.split('/')) as string\n  const name = filename.split('.').slice(0, -1).join('.')\n  let ext = _.last(filename.split('.'))\n\n  if (filename.indexOf('.') === -1) {\n    ext = ''\n  }\n\n  return { hash, name, ext, prefix, url }\n}\n"
  },
  {
    "path": "src/parseSection.ts",
    "content": "import path from 'path'\n// @ts-ignore\nimport toMarkdown from 'to-markdown'\nimport parseLink from './parseLink'\nimport parseHTML from './parseHTML'\nimport * as mdConverters from './mdConverters'\nimport { HtmlNodeObject } from './types'\n\nconst isInternalUri = (uri: string) => {\n  return uri.indexOf('http://') === -1 && uri.indexOf('https://') === -1\n}\n\nexport type ParseSectionConfig = {\n  id: string\n  htmlString: string\n  resourceResolver: (path: string) => any\n  idResolver: (link: string) => string\n  expand: boolean\n}\n\nexport class Section {\n  id: string\n  htmlString: string\n  htmlObjects?: HtmlNodeObject[]\n  private _resourceResolver?: (path: string) => any\n  private _idResolver?: (link: string) => string\n\n  constructor({ id, htmlString, resourceResolver, idResolver, expand }: ParseSectionConfig) {\n    this.id = id\n    this.htmlString = htmlString\n    this._resourceResolver = resourceResolver\n    this._idResolver = idResolver\n    if (expand) {\n      this.htmlObjects = this.toHtmlObjects?.()\n    }\n  }\n\n  toMarkdown?() {\n    return toMarkdown(this.htmlString, {\n      converters: [\n        mdConverters.h,\n        mdConverters.span,\n        mdConverters.div,\n        mdConverters.img,\n        mdConverters.a,\n      ],\n    })\n  }\n\n  toHtmlObjects?() {\n    return parseHTML(this.htmlString, {\n      resolveHref: (href) => {\n        if (isInternalUri(href)) {\n          const { hash } = parseLink(href)\n          // todo: what if a link only contains hash part?\n          const sectionId = this._idResolver?.(href)\n          if (hash) {\n            return `#${sectionId},${hash}`\n          }\n          return `#${sectionId}`\n        }\n        return href\n      },\n      resolveSrc: (src) => {\n        if (isInternalUri(src)) {\n          // todo: may have bugs\n          const absolutePath = path.resolve('/', src).substr(1)\n          const buffer = this._resourceResolver?.(absolutePath)?.asNodeBuffer()\n          const base64 = buffer.toString('base64')\n          return `data:image/png;base64,${base64}`\n        }\n        return src\n      },\n    })\n  }\n}\n\nconst parseSection = (config: ParseSectionConfig) => {\n  return new Section(config)\n}\n\nexport default parseSection\n"
  },
  {
    "path": "src/types.ts",
    "content": "export interface GeneralObject {\n  [key: string]: any\n}\n\nexport interface HtmlNodeObject {\n  tag?: string\n  type: 1 | 3\n  text?: string\n  children?: HtmlNodeObject[]\n  attrs: {\n    id: string\n    href: string\n    src: string\n  }\n}\n"
  },
  {
    "path": "src/utils.ts",
    "content": "import _ from 'lodash'\nimport { GeneralObject } from './types'\n\nexport interface TraverseNestedObject {\n  preFilter?: (node: GeneralObject) => boolean\n  postFilter?: (node: GeneralObject) => boolean\n\n  // children must be returned from transformer\n  // or it may not work as expected\n  transformer?: (node: GeneralObject, children?: GeneralObject[]) => any\n  finalTransformer?: (node: GeneralObject) => any\n\n  childrenKey: string\n}\n\n/**\n * traverseNestedObject\n * a note about config.transformer\n * `children` is a recursively transformed object and should be returned for transformer to take effect\n * objects without `children` will be transformed by finalTransformer\n * @param _rootObject\n * @param config\n */\nexport const traverseNestedObject = (\n  _rootObject: Object | Object[],\n  config: TraverseNestedObject,\n) => {\n  const { childrenKey, transformer, preFilter, postFilter, finalTransformer } = config\n\n  if (!_rootObject) {\n    return []\n  }\n\n  const traverse = (rootObject: any | any[]): any[] => {\n    const makeArray = () => {\n      if (\n        Array.isArray(rootObject) ||\n        _.isArrayLikeObject(rootObject) ||\n        _.isArrayLike(rootObject)\n      ) {\n        return rootObject\n      }\n      return [rootObject]\n    }\n    const rootArray = makeArray()\n\n    let result = rootArray\n\n    if (preFilter) {\n      result = _.filter(result, preFilter)\n    }\n\n    result = _.map(result, (object, index) => {\n      if (object[childrenKey]) {\n        const transformedChildren = traverse(object[childrenKey])\n        // in parseHTML, if a tag is in unwrap list, like <span>aaa<span>bbb</span></span>\n        // the result needs to be flatten\n        const children = _.isEmpty(transformedChildren)\n          ? undefined\n          : _.flattenDeep(transformedChildren)\n        if (transformer) {\n          return transformer(object, children)\n        }\n        return {\n          ...object,\n          ...{\n            [childrenKey]: children,\n          },\n        }\n      }\n\n      if (finalTransformer) {\n        return finalTransformer(object)\n      }\n      return object\n    })\n\n    if (postFilter) {\n      result = _.filter(result, postFilter)\n    }\n\n    return result\n  }\n\n  return _.flattenDeep(traverse(_rootObject))\n}\n"
  },
  {
    "path": "tsconfig.json",
    "content": "{\n  \"compilerOptions\": {\n    \"target\": \"es5\",\n    \"lib\": [\n      \"es6\",\n      \"dom\"\n    ],\n    \"module\": \"commonjs\",\n    \"moduleResolution\": \"node\",\n    \"experimentalDecorators\": true,\n    \"emitDecoratorMetadata\": true,\n    \"outDir\": \"lib\",\n    \"sourceMap\": true,\n    \"declaration\": true,\n    \"allowJs\": false,\n    \"jsx\": \"react\",\n    \"allowSyntheticDefaultImports\": true,\n    \"esModuleInterop\": true,\n    \"preserveWatchOutput\": true,\n    \"strict\": true\n  },\n  \"include\": [\n    \"src\"\n  ]\n}\n"
  },
  {
    "path": "tslint.json",
    "content": "{\n  \"rulesDirectory\": [\"node_modules/vrsource-tslint-rules/rules\"],\n  \"rules\": {\n    \"class-name\": false,\n    \"comment-format\": [\n      true,\n      \"check-space\"\n    ],\n    \"indent\": [\n      true,\n      \"spaces\"\n    ],\n    \"no-duplicate-variable\": true,\n    \"no-eval\": true,\n    \"no-internal-module\": true,\n    \"no-trailing-whitespace\": false,\n    \"no-var-keyword\": true,\n    \"one-line\": [\n      true,\n      \"check-open-brace\",\n      \"check-whitespace\"\n    ],\n    \"quotemark\": [\n      true,\n      \"single\",\n      \"jsx-double\"\n    ],\n    \"semicolon\": [\n      true,\n      \"never\"\n    ],\n    \"triple-equals\": [\n      true,\n      \"allow-null-check\"\n    ],\n    \"typedef-whitespace\": [\n      true,\n      {\n        \"call-signature\": \"nospace\",\n        \"index-signature\": \"nospace\",\n        \"parameter\": \"nospace\",\n        \"property-declaration\": \"nospace\",\n        \"variable-declaration\": \"nospace\"\n      }\n    ],\n    \"variable-name\": [\n      true,\n      \"ban-keywords\"\n    ],\n    \"whitespace\": [\n      true,\n      \"check-branch\",\n      \"check-decl\",\n      \"check-operator\",\n      \"check-separator\",\n      \"check-type\"\n    ],\n    \"no-shadowed-variable\": true,\n    \"no-unused-expression\": true,\n    \"no-use-before-declare\": true,\n    \"no-unused-variable\": [\n      true,\n      {\n        \"ignore-pattern\": [\"^_|React\"]\n      }\n    ],\n    \"one-variable-per-declaration\": [true, \"ignore-for-loop\"],\n    \"no-console\": [true, \"log\"],\n    \n    // from plugin\n    \"no-param-reassign\": true\n  }\n}"
  }
]