Pricing

A Link

We are the 04/02/2017

\n\t\tPhone USA: (912) 148-456

Phone FR: +332 38 30 37 90\n\t\tEmail: lspurcell@suddenlink.net\n\t\n\n\n```\n\n#### $( selector ).scrape( frame , {options})\n\n`selector` is defined in [Cheerio's documentation](https://github.com/cheeriojs/cheerio#-selector-context-root-)\n\n`frame` is a JSON or JavaScript object\n\n`{options}` are detailed [later in their own section](#options)\n\n```js\nlet frame = {\n\t\"title\": \"h2\" // CSS selector\n}\n```\n\nWe then pass the frame to the function:\n\n```js\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n//=> {\"title\": \"Pricing\"}\n```\n\n### Frame\n\n#### Inline Selector\nThe most common form: declare the selector inline, with nothing more than the data property name as the key and the selector as its value.\n\n```js\n...\nlet frame = { \"title\": \"h2\" }\n\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n\n/* output =>\n\t{ \"title\": \"Pricing\" }\n*/\n...\n```\n\n#### New : Inline attribute / extractor / parser\nYou can now declare everything inline. 
You should just be careful to always use them in the following order when combining them: `@ (attribute), < (extractor), | (filter), || (parse)`.\n\n_See examples for each of them below._\n\n#### Attribute\n`_a: \"attributeName\"` allows you to retrieve `any attribute data` \n`@` inside the selector `_s` allows you to do it inline\n\n```js\n...\nlet frame = {\n\t\"proPrice\": \".planName:contains('Pro') + span@price\"\n}\n\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n\n/* output =>\n\t{ \"proPrice\": \"39.00\" }\n*/\n...\n```\n\n\n#### Extractor\n`<` inside the selector `_s` allows you to do it inline\n\nIt currently supports `email` (also `mail`), `telephone` (also `phone`), `date`, `fullName` (or `firstName`, `lastName`, `initials`, `suffix`, `salutation`) and `html` (to get the inner html). By default (no declaration), we get the `inner text`.\n\n```js\n...\nlet frame = {\n\t\"email\": \"[itemprop=email] < email\",\n\t\"frphone\": \"[itemprop=frphone] < phone\"\n}\n\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n\n/* output =>\n\t{\n\t\t\"email\": \"example@google.net\",\n\t\t\"frphone\": \"33238303790\"\n\t}\n*/\n...\n```\n\n#### Filter\n`|` inside the selector `_s` allows you to do it inline\n\nIt currently supports `trim` (remove spaces at beginning and end), `lowercase or lcase`, `uppercase or ucase`, `capitalize or cap`, `words or w`, `noescapchar or nec`, `compact or cmp` and `number or nb`.\n\n```js\n...\nlet frame = {\n\t\"email1\": \"[itemprop=email] < email | uppercase\",\n\t\"email2\": \"[itemprop=email] < email | capitalize\"\n}\n\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n\n/* output =>\n\t{\n\t\t\"email1\": \"EXAMPLE@GOOGLE.NET\",\n\t\t\"email2\": \"Example Google Net\"\n\t}\n*/\n...\n```\n\n#### Parse / Regex\n`||` inside the selector `_s` allows you to use regexes inline\n`_p: /regex/` allows you to extract data based on **regular expressions** 
\n\n\n```js\n...\nlet frame = {\n\t\"date\": \".date || \\\\d{1,2}/\\\\d{1,2}/\\\\d{2,4}\"\n}\n\n// or use the longer version for proper regex entry\n\nlet frame = {\n\t\"date\": {\n\t\t_s: \".date\",\n\t\t_p: /\\d{1,2}\\/\\d{1,2}\\/\\d{2,4}/ // n[n]/n[n]/nn[nn] format here\n\t}\n}\n\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n\n/* output =>\n\t{\n\t\t\"date\": \"04/02/2017\"\n\t}\n*/\n...\n```\n\n#### List / Array\n`_d: [{ }]` allows you to get an `array / list of data` \n`_d: [\"selector\"]` retrieves a list based on the selector between the quotes. \n`_d: [\"firstSelector\", \"secondSelector\"]` works too and merges the results into one array\n\nYou can shorten it even more by listing right from the selector as follows:\n`\"selectorName\": [\".selector\"]` which returns an array of strings\n\n```js\n...\nlet frame = {\n\t\"pricing\": {\n\t\t_s: \"#pricing .item\",\n\t\t_d: [{\n\t\t\t\"name\": \".planName\",\n\t\t\t\"price\": \".planPrice\"\n\t\t}]\n\t}\n}\n\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n\n/* output =>\n\t{\n\t\t\"pricing\": [\n\t\t\t{\n\t\t\t\t\"name\": \"Hacker\",\n\t\t\t\t\"price\": \"Free\"\n\t\t\t},\n\t\t\t{\n\t\t\t\t\"name\": \"Pro\",\n\t\t\t\t\"price\": \"$39\"\n\t\t\t}\n\t\t]\n\t}\n*/\n\n// Or a shorter way which works for simple string arrays\n\nlet frame = {\n\t\"pricingNames\": [\"#pricing .item .planName\"]\n}\n\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n\n/* output =>\n\t{\n\t\t\"pricingNames\": [\"Hacker\", \"Pro\"]\n\t}\n*/\n...\n```\n#### Grouped\n`\"_g\": { _s: \"\", _d: {} }` allows you to group some data selectors by a parent selector without naming the parent. You can also extend the group property name to add some meaning or simply to have several groups at the same level. \nA group property name must be `_g` or `_group`, optionally followed by `_` and whatever string you want. 
\nex: `_g_head : {}` or `_g_body : {}`\n\n```js\n...\nlet frame = {\n\t_g: {\n\t\t_s: \"#pricing .item\",\n\t\t_d: {\n\t\t\t\"name\": \".planName\",\n\t\t\t\"price\": \".planPrice\"\n\t\t}\n\t},\n\t_g_second: {\n\t\t_s: \"#pricing .item\",\n\t\t_d: {\n\t\t\t\"secondName\": \".planName\",\n\t\t\t\"secondPrice\": \".planPrice\"\n\t\t}\n\t}\n}\n\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n\n/* output =>\n\t{\n\t\t\"name\": \"Hacker\",\n\t\t\"price\": \"Free\",\n\t\t\"secondName\": \"Hacker\",\n\t\t\"secondPrice\": \"Free\"\n\t}\n*/\n...\n```\n\n\n#### Nested\n`\"parent\": { _s: \"parentSelector\", _d: {} }` allows you to segment your data by `setting a parent section` from which the child data will be scraped. \n\nYou can also use `\"parent\": { }` when you only want to nest data into objects without setting a parent selector.\n\n```js\n...\nlet frame = {\n\t\"pricing\": {\n\t\t_s: \"#pricing .item\",\n\t\t_d: {\n\t\t\t\"name\": \".planName\",\n\t\t\t\"price\": \".planPrice\"\n\t\t}\n\t}\n}\n\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n\n/* output =>\n\t{\n\t\t\"pricing\":{\n\t\t\t\"name\": \"Hacker\",\n\t\t\t\"price\": \"Free\"\n\t\t}\n\t}\n*/\n...\n```\n\n> Note here that we get the first returned result (#pricing .item).\n\n#### Example\nSee how you can properly `structure your data`, ready for the output!\n\n```js\n...\nlet frame = {\n\t\"pricing\": {\n\t\t_s: \"#pricing .item\",\n\t\t_d: [{\n\t\t\t\"name\": \".planName\",\n\t\t\t\"price\": \".planPrice @ price\",\n\t\t\t\"image\": {\n\t\t\t\t\"url\": \"img @ src\",\n\t\t\t\t\"link\": \"a @ href\"\n\t\t\t}\n\t\t}]\n\t}\n}\n\nlet result = $('body').scrape(frame, { string: true })\nconsole.log( result )\n\n/* output =>\n\t{\n\t\t\"pricing\":[\n\t\t\t{\n\t\t\t\t\"name\": \"Hacker\",\n\t\t\t\t\"price\": \"0\",\n\t\t\t\t\"image\": {\n\t\t\t\t\t\"url\": \"./img/hacker.png\",\n\t\t\t\t\t\"link\": 
\"/hacker\"\n\t\t\t\t}\n\t\t\t},\n\t\t\t{\n\t\t\t\t\"name\": \"Pro\",\n\t\t\t\t\"price\": \"39.00\",\n\t\t\t\t\"image\": {\n\t\t\t\t\t\"url\": \"./img/pro.png\",\n\t\t\t\t\t\"link\": \"/pro\"\n\t\t\t\t}\n\t\t\t}\n\t\t]\n\t}\n*/\n...\n```\n\n> Note here that `_d: [{ }]` returns one object per node matched by the parent selector (#pricing .item).\n\n\n### Options\n\n`.scrape()` accepts two options, both `false` by default: `string` returns a stringified output right away, and `timestats` measures the time spent on each node.\n\n```js\n...\nlet frame = {\n\t\"proPrice\": {\n\t\t_s: \".planName:contains('Pro') + span\",\n\t\t_a: \"price\"\n\t}\n}\n\nlet result = $('body')\n\t.scrape(frame, {\n\t\t\ttimestats: true, // default: false\n\t\t\tstring: true // default: false\n\t\t})\nconsole.log(result)\n\n/* output =>\n\t{\n\t\t\"proPrice\": {\n\t\t\t\"value\":\"39.00\",\n\t\t\t\"_timestats\": \"1\" // ms\n\t\t}\n\n\t}\n*/\n...\n```\n\n## Tests\n\nOne-shot tests\n```bash\nnpm run test\n```\n\nWatch tests on updates\n```bash\nnpm run test-watch\n```\n\n## Changelog\n\n⚠ Careful if you've been using **jsonframe** since **version 1.x.x**: some things changed to make it more **flexible**, **faster to use (inline parameters)** and **more meaningful in the syntax**.\n\n**2.0.52** (28/02/2017)\n- Update the email regex\n- Update the website regex\n- Fix array into array results\n- Improving script efficiency when getting data from node(s)\n- Fix date extractor when there is no date to extract\n\n**2.0.51** (27/02/2017)\n- Fix a fatal error (argh) which was just a typo in the new chained extractors\n\n**2.0.50** (27/02/2017)\n- Extractors chaining is now possible. 
For ex: `.selector < html email` would work\n\n**2.0.49** (27/02/2017)\n- Fixing issue when attribute doesn't exist (@ attributeName)\n- Improving array of objects management (still need to find a way to avoid empty objects)\n\n**2.0.48** (27/02/2017)\n- Add Filter `split(char)` to split string based on character (default to whitespace)\n- Add Extractor `numbers or nb` (potentially returns an array)\n- Update Filter `numbers or nb` (simply filter the string to output only numbers)\n- Add Filter `between(string1&&string2)` to filter data by starting and finishing string\n- Add Filter `before(string)` to get data before a string\n- Add Filter `after(string)` to get data after a string\n- Add array support to Filter `left(nb)` and `right(nb)` (slice the array elements)\n- Add Filter `fromto(startNb,endNb)` to either slice an array or a string from index to index\n- Add Filter `get(nb)` to extract either an array item or a character from a string\n\n**2.0.46** (26/02/2017)\n- Rebuild of the Unstructured scraper with breaks (_b) - Works like a charm now!\n\n**2.0.45** (25/02/2017)\n- Fix weird fullName parsing in some cases\n- Update regex handling - it is now able to capture a group with a regex\n\n**2.0.44** (24/02/2017)\n- Inline array for extractors like `\"mails\": [\".parentSelector < email\"]`\n- Adds French words: `prenom` and `nom` to humanname extractor\n- Add filters: `right(number)`, `left(number)`\n- Set a stricter regex for email extractor `/([a-zA-Z0-9._-]{0,30}@[a-zA-Z0-9._-]{0,15}\\.[a-zA-Z0-9._-]{0,15})/gmi`\n\n**2.0.3** (23/02/2017)\n- Possibility to scrape unstructured data with breaks (`_b`). More about this soooon in the readme.\n- New filters: `words or w`, `noescapchar or nec` and `compact or cmp`\n- Multi-filters are available now. Ex: `.selector | words compact`. 
Simply separated by spaces.\n- Disabling google libphonenumber for now\n\n**2.0.2** (15/02/2017)\n- String option to get a stringified output right away\n- Multi-groups possibility at same level (several _g wouldn't work as same property name) in frame like _g_head and _g_body for example\n- Joined arrays/lists with [\"firstlist.selector\", \"secondlist.selector\", \"...\"] when inline\n- Better handling of img node - automatic src attribute is output (if nothing else set)\n\n**2.0.1** (14/02/2017)\n- Fixed the non-passing tests and added all the new ones for 2.x.x updates\n- Refactoring the way data is processed for future multiple occurrences\n\n**2.0.0** (12/02/2017)\n- ⚠ Changing ~~`Type`~~ for `Extractor` with shortcode `<` instead of `|`\n- ⚠ `filters` with the shortcode `|`\n- Inline parameters support for `\"attribute\"`, `\"extractor\"` and `\"parse\"`\n- Simple string arrays from inline selector\n- Group property to group data selectors without naming the group (children take the place of the group property `\"_g\"` or `\"_group\"` )\n\n\n**1.1.1** (05/02/2017)\n- Short & functional parameters ( `_s`, `_t`, `_a`) instead of `\"selector\"`, `\"extractor\"`, `\"attr\"`. The idea is to easily differentiate the **retrieved data name** from **functional data**.\n- Automatic handler for `img` selected element (automatically retrieve the img src link)\n- `_parent_` selector to target the **parent content**\n- A **regex parser** with the functional parameter **parse**: `_p` (`_parse` works too)\n- **Extractor** `_t: \"html\"` feature to get back **inner html of a selector**\n- Added **timestats** to measure time spent on each node via `.scrape(frame, {timestats: true})`\n- Refactoring of the whole code to make it easier to evolve (DRY)\n- Update of the test cases accordingly\n\n\n**1.0.0** (27/01/2017)\n- Stable version release with basic features \n\n## Contributing 🤝\n> Feel free to follow the procedure to make it even more awesome!\n\n1. 
Create an `issue` so we `get the discussion started`\n2. Fork it!\n3. Create your feature branch: `git checkout -b my-new-feature`\n4. Commit your changes: `git commit -am 'Add some feature'`\n5. Push to the branch: `git push origin my-new-feature`\n6. Submit a pull request :D\n\n\n## License\n[Gabin Desserprit](mailto:gabin@datascraper.pro) - [datascraper.pro](datascraper.pro) \nReleased under MIT License\n" }, { "path": "index.js", "content": "'use strict'\n\nconst _ = require('lodash')\nconst chrono = require('chrono-node')\nconst humanname = require('humanname')\nconst addressit = require('addressit')\n// const phoneUtil = require('google-libphonenumber').PhoneNumberUtil.getInstance()\n\n\nlet parseData = function (data, regex, {\n\tmultiple = false\n} = {}) {\n\tlet result = data\n\tlet extracted\n\tif (regex) {\n\t\ttry {\n\t\t\tlet rgx = regex\n\t\t\tif (_.isString(regex)) {\n\t\t\t\trgx = new RegExp(regex, 'gim')\n\t\t\t}\n\t\t\textracted = rgx.exec(data)\n\t\t\tif (multiple) {\n\t\t\t\tresult = extracted\n\t\t\t\t// result = data.match(rgx)\n\t\t\t} else {\n\t\t\t\tif (extracted[1]) {\n\t\t\t\t\tresult = extracted[1]\n\t\t\t\t} else {\n\t\t\t\t\tresult = extracted[0]\n\t\t\t\t}\n\t\t\t\t// result = data.match(rgx)[0]\n\t\t\t}\n\t\t} catch (error) {\n\t\t\t// console.log(\"Regex error: \", error)\n\t\t}\n\t}\n\treturn result\n}\n\nlet filterData = function (data, filter) {\n\n\tlet paranthethisRegex = /(?:\\()(.+)(?:\\))/gim\n\n\tlet result = data\n\tif ([\"raw\"].includes(filter)) {\n\t\t// let the raw data\n\n\t} else if (filter && filter.includes(\"split\")) {\n\t\tlet splitValue = paranthethisRegex.exec(filter)\n\t\tif (splitValue && splitValue[1]) {\n\t\t\tresult = result.split(splitValue[1])\n\t\t} else {\n\t\t\tresult = result.split(\" \")\n\t\t}\n\t\tresult = result.filter(function (x) {\n\t\t\treturn x !== \"\"\n\t\t})\n\t\tresult = result.map(function (x) {\n\t\t\treturn x.trim()\n\t\t})\n\t} else if (filter && filter.includes(\"between\")) 
{\n\t\tlet betweenValues = paranthethisRegex.exec(filter)\n\t\tif (betweenValues && betweenValues[1]) {\n\t\t\tbetweenValues = betweenValues[1].split(\"&&\")\n\t\t\tif (betweenValues.length > 1) {\n\t\t\t\tresult = result.split(betweenValues[0].replace(/_/gm, \" \").trim()).pop().split(betweenValues[1].replace(/_/gm, \" \").trim()).shift().trim() || \"\"\n\t\t\t}\n\t\t}\n\t} else if (filter && filter.includes(\"after\")) {\n\t\tlet afterValue = paranthethisRegex.exec(filter)\n\t\tif (afterValue && afterValue[1]) {\n\t\t\tresult = result.split(afterValue[1].replace(/_/gm, \" \").trim()).pop().trim() || \"\"\n\t\t}\n\t} else if (filter && filter.includes(\"before\")) {\n\t\tlet beforeValue = paranthethisRegex.exec(filter)\n\t\tif (beforeValue && beforeValue[1]) {\n\t\t\tresult = result.split(beforeValue[1].replace(/_/gm, \" \").trim()).shift().trim() || \"\"\n\t\t}\n\t} else if (filter && filter.includes(\"css\")) {\n\t\t// let cssValue = paranthethisRegex.exec(filter)\n\t\t// if(cssValue && cssValue[1]){\n\t\t// \tresult = result.split(cssValue[1].trim()).pop().split(\",\",1).shift().trim() || \"\"\n\t\t// }\t\n\t} else if ([\"trim\"].includes(filter)) {\n\t\tresult = result.trim()\n\t} else if (filter && filter.includes(\"join\") && _.isArray(result)) {\n\t\tlet joinChar = paranthethisRegex.exec(filter)\n\t\tif (joinChar && joinChar[1]) {\n\t\t\tresult = result.join(joinChar[1].replace(/_/gm, \" \"))\n\t\t} else {\n\t\t\tresult = result.join(\" \")\n\t\t}\n\t} else if ([\"lowercase\", \"lcase\"].includes(filter)) {\n\t\tresult = result.toLowerCase()\n\t} else if ([\"uppercase\", \"ucase\"].includes(filter)) {\n\t\tresult = result.toUpperCase()\n\t} else if ([\"capitalize\", \"cap\"].includes(filter)) {\n\t\tresult = _.startCase(result)\n\t} else if ([\"number\", \"nb\"].includes(filter)) {\n\t\tresult = result.match(/\\d+/gm)\n\t\tresult = result.join(\" \")\n\t} else if ([\"words\", \"w\"].includes(filter)) {\n\t\tresult = result.replace(/\\W/gm, \" \")\n\t} else 
if ([\"noescapchar\", \"nec\"].includes(filter)) {\n\t\tresult = result.replace(/\\t+|\\n+|\\r+/gm, \" \")\n\n\t} else if (filter && filter.includes(\"right\")) {\n\t\tlet regexified = filter.match(/\\d+/g)\n\t\tif (regexified && regexified[0]) {\n\t\t\tlet nb = regexified[0]\n\t\t\tif (_.isArray(result)) {\n\t\t\t\tresult = result.slice(result.length - nb, result.length)\n\t\t\t} else {\n\t\t\t\tresult = result.substr(result.length - nb)\n\t\t\t}\n\t\t}\n\t} else if (filter && filter.includes(\"left\")) {\n\t\tlet regexified = filter.match(/\\d+/g)\n\t\tif (regexified && regexified[0]) {\n\t\t\tlet nb = regexified[0]\n\t\t\tif (_.isArray(result)) {\n\t\t\t\tresult = result.slice(0, nb)\n\t\t\t} else {\n\t\t\t\tresult = result.substr(0, nb)\n\t\t\t}\n\t\t}\n\t} else if (filter && filter.includes(\"fromto\")) {\n\t\tlet regexified = paranthethisRegex.exec(filter)\n\t\tif (regexified && regexified[1]) {\n\t\t\tlet nbs = regexified[1].split(/[,-]/gim)\n\t\t\tlet start, end\n\t\t\tif (nbs.length > 1) {\n\t\t\t\tstart = parseInt(nbs[0].trim())\n\t\t\t\tend = parseInt(nbs[1].trim())\n\t\t\t\tif (_.isArray(result)) {\n\t\t\t\t\tresult = result.slice(start, end + 1)\n\t\t\t\t} else {\n\t\t\t\t\tresult = result.substr(start, end)\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\n\t} else if (filter && filter.includes(\"get\")) {\n\t\tlet regexified = filter.match(/\\d+/g)\n\t\tif (regexified && regexified[0]) {\n\t\t\tlet nb = regexified[0]\n\t\t\tif (_.isArray(result)) {\n\t\t\t\tresult = result[nb]\n\t\t\t} else {\n\t\t\t\tresult = result.charAt(nb)\n\t\t\t}\n\t\t}\n\t\t//default\n\t} else if ([\"compact\", \"cmp\"].includes(filter) || !filter) {\n\t\tresult = result.replace(/\\s+/gm, \" \").trim()\n\t}\n\treturn result\n}\n\nlet extractByExtractor = function (data, extractor, {\n\tmultiple = false\n} = {}) {\n\tlet result = data\n\tlet emailRegex = /([a-zA-Z0-9._-]{1,30}@[a-zA-Z0-9._-]{2,15}\\.[a-zA-Z0-9._-]{2,15})/gmi\n\tlet phoneRegex = /\\+?\\(?\\d*\\)? ?\\(?\\d+\\)?\\d*([\\s./-]\\d{2,})+/gmi\n\tlet websiteRegex = /(?:[\\s\\W])((https?:\\/\\/)?(www\\.)?[-a-zA-Z0-9:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b[-a-zA-Z0-9@:%_\\+.~#?&/=]*)/gmi\n\n\tif ([\"phone\", \"telephone\"].includes(extractor)) {\n\t\tif (multiple) {\n\t\t\tresult = data.match(phoneRegex) || \"\"\n\t\t} else {\n\t\t\tresult = data.match(phoneRegex) !== null ? data.match(phoneRegex)[0] : \"\"\n\t\t}\n\t} else if ([\"numbers\", \"nb\"].includes(extractor)) {\n\t\tif (multiple) {\n\t\t\tresult = result.match(/\\d+/gm) || \"\"\n\t\t} else {\n\t\t\tresult = result.match(/\\d+/gm) !== null ? result.match(/\\d+/gm)[0] : \"\"\n\t\t}\n\t} else if ([\"website\"].includes(extractor)) {\n\n\t\tlet websites = data.match(websiteRegex)\n\n\t\tif (websites && websites.length > 0) {\n\t\t\twebsites = websites.map(function (x) {\n\t\t\t\treturn x.substr(1, x.length) // remove first character\n\t\t\t})\n\n\t\t\tif (multiple) {\n\t\t\t\tresult = websites || \"\"\n\t\t\t} else {\n\t\t\t\tresult = websites !== null ? websites[0] : \"\"\n\t\t\t}\n\t\t}\n\n\t} else if ([\"address\", \"add\"].includes(extractor)) {\n\t\tresult = addressit(data)\n\t} else if ([\"email\", \"mail\", \"@\"].includes(extractor)) {\n\t\tif (multiple) {\n\t\t\tresult = data.match(emailRegex) || data\n\t\t\tif (_.isArray(result) && result.length === 1) {\n\t\t\t\tresult = result[0]\n\t\t\t}\n\t\t} else {\n\t\t\tresult = data.match(emailRegex) !== null ? 
data.match(emailRegex)[0] : \"\"\n\t\t}\n\t} else if ([\"date\", \"d\"].includes(extractor)) {\n\t\tlet date = chrono.casual.parseDate(data)\n\t\tif (date) {\n\t\t\tresult = date.toString()\n\t\t} else {\n\t\t\tresult = \"\"\n\t\t}\n\t} else if ([\"fullName\", \"prenom\", \"firstName\", \"nom\", \"lastName\", \"initials\", \"suffix\", \"salutation\"].includes(extractor)) {\n\t\t// compact data before to parse it\n\t\tresult = humanname.parse(filterData(data, \"cmp\"))\n\t\tif (\"fullName\".includes(extractor)) {\n\t\t\t// return the object\n\t\t} else if ([\"firstName\", \"prenom\"].includes(extractor)) {\n\t\t\tresult = result.firstName\n\t\t} else if ([\"lastName\", \"nom\"].includes(extractor)) {\n\t\t\tresult = result.lastName\n\t\t} else if (\"initials\".includes(extractor)) {\n\t\t\tresult = result.initials\n\t\t} else if (\"suffix\".includes(extractor)) {\n\t\t\tresult = result.suffix\n\t\t} else if (\"salutation\".includes(extractor)) {\n\t\t\tresult = result.salutation\n\t\t}\n\t}\n\n\treturn result\n}\n\nlet isAGroupKey = function (groupKey) {\n\tlet groupProperties = ['_g', '_group', '_groupe']\n\tlet isAGroup = false\n\tgroupProperties.forEach(function (value) {\n\t\tif (value === groupKey || groupKey.startsWith(value + '_')) {\n\t\t\tisAGroup = true\n\t\t\treturn\n\t\t}\n\t})\n\treturn isAGroup\n}\n\nlet getPropertyFromObj = function (obj, propertyName) {\n\tlet properties = {\n\t\t'selector': ['_s', '_selector', '_selecteur', 'selector'],\n\t\t'attribute': ['_a', '_attr', '_attribut', 'attr', 'attribute'],\n\t\t'filter': ['_filter', '_f', '_filtre', 'filter'],\n\t\t'extractor': ['_e', '_extracteur', 'extractor', 'type', '_t'], //keep temporary old types\n\t\t'data': ['_d', '_data', '_donnee', 'data'],\n\t\t'parser': ['_p', '_parser', '_parseur', 'parser'],\n\t\t'break': ['_b', '_break', '_cassure']\n\t}\n\n\tlet ob = this\n\tlet res = null\n\tif (properties[propertyName]) {\n\t\tproperties[propertyName].forEach(function (property, i) {\n\t\t\tif 
(obj[property]) {\n\t\t\t\tres = obj[property]\n\t\t\t\treturn\n\t\t\t}\n\t\t})\n\t}\n\treturn res\n}\n\nlet timeSpent = function (lastTime) {\n\treturn new Date().getTime() - lastTime\n}\n\nString.prototype.oneSplitFromEnd = function (char) {\n\tlet arr = this.split(char),\n\t\tres = []\n\n\tres[1] = arr[arr.length - 1]\n\tarr.pop()\n\tres[0] = arr.join(char)\n\treturn res\n}\n\nmodule.exports = function ($) {\n\n\n\tlet getNodesFromSmartSelector = function (node, selector) {\n\t\tif (selector === \"_parent_\") {\n\t\t\treturn node\n\t\t} else {\n\t\t\treturn $(node).find(selector)\n\t\t}\n\t}\n\n\tlet getFunctionalParameters = function (obj) {\n\t\tlet result = {\n\t\t\tselector: getPropertyFromObj(obj, 'selector'),\n\t\t\tattribute: getPropertyFromObj(obj, 'attribute'),\n\t\t\tfilter: getPropertyFromObj(obj, 'filter'),\n\t\t\textractor: getPropertyFromObj(obj, 'extractor'),\n\t\t\tdata: getPropertyFromObj(obj, 'data'),\n\t\t\tparser: getPropertyFromObj(obj, 'parser'),\n\t\t\tbreak: getPropertyFromObj(obj, 'break')\n\t\t}\n\n\t\treturn result\n\t}\n\n\tlet updateFunctionalParametersFromSelector = function (g, selector, node) {\n\n\t\tlet gUpdate = extractSmartSelector({\n\t\t\tselector: selector,\n\t\t\tnode: $(node)\n\t\t})\n\n\t\tg.selector = gUpdate.selector\n\t\tg.parser = g.parser ? g.parser : gUpdate.parser\n\t\tg.filter = g.filter ? g.filter : gUpdate.filter\n\t\tg.attribute = g.attribute ? g.attribute : gUpdate.attribute\n\t\tg.extractor = g.extractor ? 
g.extractor : gUpdate.extractor\n\n\t\treturn g\n\t}\n\n\tlet getDataFromNodes = function (nodes, g, {\n\t\ttimestats = false,\n\t\tmultiple = true\n\t} = {}) {\n\n\t\tlet result = []\n\n\t\tif (timestats) {\n\t\t\tresult = {}\n\t\t\tresult['_value'] = []\n\t\t}\n\n\t\t// Getting data\n\t\t$(nodes).each(function (i, n) {\n\t\t\tlet r = getTheRightData($(n), {\n\t\t\t\textractor: g.extractor,\n\t\t\t\tfilter: g.filter,\n\t\t\t\tattr: g.attribute,\n\t\t\t\tparser: g.parser,\n\t\t\t\tmultiple: multiple\n\t\t\t})\n\n\t\t\tif (_.isArray(r) && r.length === 1) {\n\t\t\t\tr = r[0]\n\t\t\t}\n\n\t\t\tif (r) {\n\t\t\t\tif (result['_value']) {\n\t\t\t\t\tif (_.isArray(r) && r.length > 1) {\n\t\t\t\t\t\tresult['_value'] = r\n\t\t\t\t\t} else {\n\t\t\t\t\t\tresult['_value'].push(r)\n\t\t\t\t\t}\n\t\t\t\t} else {\n\t\t\t\t\tif (_.isArray(r) && r.length > 1) {\n\t\t\t\t\t\tresult = r\n\t\t\t\t\t} else {\n\t\t\t\t\t\tresult.push(r)\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t}\n\n\t\t\t// not multiple wanted, stop at the first one\n\t\t\tif (!multiple) {\n\t\t\t\treturn false\n\t\t\t}\n\t\t})\n\n\t\tif (result['_value']) {\n\t\t\tresult['_timestat'] = timeSpent(gTime)\n\t\t}\n\n\t\t// avoid listing\n\t\tif ((!g.filter || !g.filter.join(\"\").includes(\"split\")) && !multiple && result[0]) {\n\t\t\tresult = result[0]\n\t\t}\n\n\t\tif (g.filter && g.filter.join(\"\").includes(\"join\") && result.length === 1) {\n\t\t\tresult = result[0]\n\t\t}\n\n\t\tif (result.length === 0) {\n\t\t\tresult = null\n\t\t}\n\n\t\treturn result\n\t}\n\n\tlet extractSmartSelector = function ({\n\t\tselector,\n\t\tnode = null,\n\t\tattribute = null,\n\t\tfilter = null,\n\t\textractor = null,\n\t\tparser = null\n\t}) {\n\t\tlet res = {\n\t\t\t\"selector\": selector,\n\t\t\t\"attribute\": attribute,\n\t\t\t\"filter\": filter,\n\t\t\t\"extractor\": extractor,\n\t\t\t\"parser\": parser\n\t\t}\n\n\t\tif (res.selector.includes('||')) {\n\t\t\tres.parser = res.selector.oneSplitFromEnd('||')[1].trim()\n\t\t\tres.selector = 
res.selector.oneSplitFromEnd('||')[0].trim()\n\t\t}\n\n\t\tif (res.selector.includes('|')) {\n\t\t\tres.filter = res.selector.oneSplitFromEnd('|')[1].trim()\n\t\t\tres.filter = res.filter.split(/\\s+/)\n\t\t\tres.selector = res.selector.oneSplitFromEnd('|')[0].trim()\n\t\t}\n\n\t\tif (res.selector.includes('<')) {\n\t\t\tres.extractor = res.selector.oneSplitFromEnd('<')[1].trim()\n\t\t\tres.extractor = res.extractor.split(/\\s+/)\n\t\t\tres.selector = res.selector.oneSplitFromEnd('<')[0].trim()\n\t\t}\n\n\t\tif (res.selector.includes('@')) {\n\t\t\tres.attribute = res.selector.oneSplitFromEnd('@')[1].trim()\n\t\t\tres.selector = res.selector.oneSplitFromEnd('@')[0].trim()\n\t\t}\n\n\t\tif (!res.extractor && !res.attribute && $(node).find(res.selector)['0'] && $(node).find(res.selector)['0'].name.toLowerCase() === \"img\") {\n\t\t\tres.attribute = \"src\"\n\t\t}\n\n\t\treturn res\n\t}\n\n\tlet getTheRightData = function (node, {\n\t\tattr = null,\n\t\textractor = null,\n\t\tfilter = null,\n\t\tparser = null,\n\t\tmultiple = false\n\t} = {}) {\n\n\t\t//assuming we handle only one node from getDataFromNodes\n\n\t\tlet result = null\n\t\tlet localNode = node[0] || node // in case of many, shouldn't happen\n\n\t\tif (attr) {\n\t\t\tresult = $(localNode).attr(attr) || \"\"\n\t\t} else {\n\t\t\tresult = $(localNode).text()\n\t\t}\n\n\t\tlet extractors = []\n\n\t\t// build an array of extractors anyway\n\t\tif (!_.isArray(extractor)) {\n\t\t\textractors.push(extractor)\n\t\t} else {\n\t\t\textractors = extractor\n\t\t}\n\n\t\tif (extractors[0] && extractors[0] === \"html\") {\n\t\t\tresult = $(localNode).html()\n\t\t}\n\n\t\tif (_.isObject(result)) {\n\t\t\t_.forOwn(result, function (value, key) {\n\t\t\t\textractors.forEach(function (ext, index) {\n\t\t\t\t\tresult[key] = extractByExtractor(result[key], ext, {\n\t\t\t\t\t\tmultiple\n\t\t\t\t\t})\n\t\t\t\t})\n\t\t\t})\n\t\t} else {\n\t\t\textractors.forEach(function (ext, index) {\n\t\t\t\tresult = 
extractByExtractor(result, ext, {\n\t\t\t\t\tmultiple\n\t\t\t\t})\n\t\t\t})\n\t\t}\n\n\t\tif (_.isObject(result)) {\n\t\t\t_.forOwn(result, function (value, key) {\n\t\t\t\tif (_.isArray(filter)) {\n\t\t\t\t\tfilter.forEach(function (f, index) {\n\t\t\t\t\t\tresult[key] = filterData(result[key], f)\n\t\t\t\t\t})\n\t\t\t\t} else {\n\t\t\t\t\t// handle type of child\n\t\t\t\t\tif (_.isString(result[key])) {\n\t\t\t\t\t\tresult[key] = filterData(result[key], filter)\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t})\n\t\t} else {\n\t\t\tif (_.isArray(filter)) {\n\t\t\t\tfilter.forEach(function (f, index) {\n\t\t\t\t\tresult = filterData(result, f)\n\t\t\t\t})\n\t\t\t} else {\n\t\t\t\tresult = filterData(result, filter)\n\t\t\t}\n\t\t}\n\n\t\tif (parser) {\n\t\t\tresult = parseData(result, parser, {\n\t\t\t\tmultiple\n\t\t\t})\n\t\t}\n\n\t\t// if(!multiple && _.isArray(result)){\n\t\t// \tresult = result[0]\n\t\t// }\n\n\t\treturn result\n\n\t}\n\n\n\t// real prototype\n\t$.prototype.scrape = function (frame, {\n\t\tdebug = false,\n\t\ttimestats = false,\n\t\tstring = false\n\t} = {}) {\n\n\t\tlet output = {}\n\t\tlet mainNode = $(this)\n\n\t\tlet iterateThrough = function (obj, elem, node) {\n\n\t\t\tlet gTime = new Date().getTime()\n\n\t\t\tif (_.isObject(obj)) {\n\n\t\t\t\t_.forOwn(obj, function (currentValue, key) {\n\n\t\t\t\t\t// Security for jsonpath in \"_to\" > \"_frame\"\n\t\t\t\t\tif (key === \"_frame\" || key === \"_from\") {\n\t\t\t\t\t\telem[key] = currentValue\n\n\t\t\t\t\t\t// If it's a group key\n\t\t\t\t\t} else if (isAGroupKey(key)) {\n\n\t\t\t\t\t\tlet selector = getPropertyFromObj(currentValue, 'selector')\n\t\t\t\t\t\tlet data = getPropertyFromObj(currentValue, 'data')\n\t\t\t\t\t\tlet n = getNodesFromSmartSelector($(node), selector)\n\t\t\t\t\t\titerateThrough(data, elem, $(n))\n\n\t\t\t\t\t} else {\n\n\t\t\t\t\t\ttry {\n\n\t\t\t\t\t\t\tlet g = {}\n\n\t\t\t\t\t\t\tif (_.isObject(currentValue) && !_.isArray(currentValue)) {\n\t\t\t\t\t\t\t\tg = 
getFunctionalParameters(currentValue)\n\n\n\t\t\t\t\t\t\t\tif (g.selector && _.isString(g.selector)) {\n\t\t\t\t\t\t\t\t\tg = updateFunctionalParametersFromSelector(g, g.selector, $(node))\n\n\t\t\t\t\t\t\t\t\tif (g.data && _.isObject(g.data)) {\n\n\t\t\t\t\t\t\t\t\t\tif (_.isArray(g.data)) {\n\n\t\t\t\t\t\t\t\t\t\t\t// Check if break included\n\t\t\t\t\t\t\t\t\t\t\tif (g.break && _.isString(g.break)) {\n\n\t\t\t\t\t\t\t\t\t\t\t\tlet parent = getNodesFromSmartSelector($(node), g.selector)\n\t\t\t\t\t\t\t\t\t\t\t\t// Clone the parent to leave the initial DOM in place :)\n\t\t\t\t\t\t\t\t\t\t\t\tlet tempParent = $(parent).clone()\n\t\t\t\t\t\t\t\t\t\t\t\t// Get the number of blocks to create\n\t\t\t\t\t\t\t\t\t\t\t\tlet l = $(tempParent).children(g.break).length\n\t\t\t\t\t\t\t\t\t\t\t\t// Random name to set the list\n\t\t\t\t\t\t\t\t\t\t\t\tvar breaklist = \"#breaklist1234\"\n\t\t\t\t\t\t\t\t\t\t\t\t// Add the list after the parent in the DOM\n\t\t\t\t\t\t\t\t\t\t\t\t$(parent).after('
<div id=\"breaklist1234\"></div>
')\n\n\t\t\t\t\t\t\t\t\t\t\t\t// Moving the dom elements to blocks\n\t\t\t\t\t\t\t\t\t\t\t\tfor (var index = 0; index < l; index++) {\n\n\t\t\t\t\t\t\t\t\t\t\t\t\t$(breaklist).append('
<div class=\"break\"></div>
')\n\t\t\t\t\t\t\t\t\t\t\t\t\t// console.log(\"Appending: \",$(parent).children(g.break).first().text())\n\n\t\t\t\t\t\t\t\t\t\t\t\t\t// Move the break element to the .break block\n\t\t\t\t\t\t\t\t\t\t\t\t\t$(breaklist).children().last().append($(tempParent).children(g.break).first())\n\n\t\t\t\t\t\t\t\t\t\t\t\t\t// Move the next blocks to the .break block\n\t\t\t\t\t\t\t\t\t\t\t\t\t$(tempParent).children().first().nextUntil(g.break).each(function (i, e) {\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t// console.log(\"nextItem\", $(e).text());\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t$(breaklist).children().last().append($(e))\n\t\t\t\t\t\t\t\t\t\t\t\t\t})\n\n\t\t\t\t\t\t\t\t\t\t\t\t}\n\n\t\t\t\t\t\t\t\t\t\t\t\telem[key] = []\n\n\t\t\t\t\t\t\t\t\t\t\t\t// Iterating in this list\n\t\t\t\t\t\t\t\t\t\t\t\t$(breaklist).children(\".break\").each(function (i, e) {\n\t\t\t\t\t\t\t\t\t\t\t\t\telem[key][i] = {}\n\t\t\t\t\t\t\t\t\t\t\t\t\titerateThrough(g.data[0], elem[key][i], $(e))\n\t\t\t\t\t\t\t\t\t\t\t\t})\n\n\t\t\t\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t\t\t\t\t// Check if object in array\n\t\t\t\t\t\t\t\t\t\t\telse if (_.isObject(g.data[0]) && _.size(g.data[0]) > 0) {\n\n\t\t\t\t\t\t\t\t\t\t\t\telem[key] = []\n\t\t\t\t\t\t\t\t\t\t\t\tlet nn = getNodesFromSmartSelector($(node), g.selector)\n\n\t\t\t\t\t\t\t\t\t\t\t\tif ($(nn).length > 0) {\n\t\t\t\t\t\t\t\t\t\t\t\t\t$(nn).each(function (i, n) {\n\t\t\t\t\t\t\t\t\t\t\t\t\t\telem[key][i] = {}\n\t\t\t\t\t\t\t\t\t\t\t\t\t\titerateThrough(g.data[0], elem[key][i], $(n))\n\t\t\t\t\t\t\t\t\t\t\t\t\t})\n\t\t\t\t\t\t\t\t\t\t\t\t}\n\n\n\t\t\t\t\t\t\t\t\t\t\t\t// If no object, taking the single string\n\t\t\t\t\t\t\t\t\t\t\t} else if (_.isString(g.data[0])) {\n\n\t\t\t\t\t\t\t\t\t\t\t\tlet n = getNodesFromSmartSelector($(node), g.selector)\n\t\t\t\t\t\t\t\t\t\t\t\tlet dataResp = getDataFromNodes($(n), g)\n\t\t\t\t\t\t\t\t\t\t\t\tif (dataResp) {\n\t\t\t\t\t\t\t\t\t\t\t\t\telem[key] = 
dataResp\n\t\t\t\t\t\t\t\t\t\t\t\t}\n\n\t\t\t\t\t\t\t\t\t\t\t}\n\n\t\t\t\t\t\t\t\t\t\t\t// Simple data object to use parent selector as base\n\t\t\t\t\t\t\t\t\t\t} else {\n\n\t\t\t\t\t\t\t\t\t\t\tif (_.size(g.data) > 0) {\n\t\t\t\t\t\t\t\t\t\t\t\telem[key] = {}\n\t\t\t\t\t\t\t\t\t\t\t\tlet n = $(node).find(g.selector).first()\n\t\t\t\t\t\t\t\t\t\t\t\titerateThrough(g.data, elem[key], $(n))\n\t\t\t\t\t\t\t\t\t\t\t}\n\n\t\t\t\t\t\t\t\t\t\t}\n\n\t\t\t\t\t\t\t\t\t} else {\n\n\t\t\t\t\t\t\t\t\t\tlet n = getNodesFromSmartSelector($(node), g.selector)\n\t\t\t\t\t\t\t\t\t\tlet dataResp = getDataFromNodes($(n), g, {\n\t\t\t\t\t\t\t\t\t\t\tmultiple: false\n\t\t\t\t\t\t\t\t\t\t})\n\t\t\t\t\t\t\t\t\t\tif (dataResp) {\n\t\t\t\t\t\t\t\t\t\t\t// push data as unit of array\n\t\t\t\t\t\t\t\t\t\t\telem[key] = dataResp\n\t\t\t\t\t\t\t\t\t\t}\n\n\t\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t\t}\n\n\t\t\t\t\t\t\t\t// There is no Selector but still an Object for organization\n\t\t\t\t\t\t\t\telse {\n\t\t\t\t\t\t\t\t\telem[key] = {}\n\t\t\t\t\t\t\t\t\titerateThrough(currentValue, elem[key], node)\n\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t} else if (_.isArray(currentValue)) {\n\n\t\t\t\t\t\t\t\telem[key] = []\n\t\t\t\t\t\t\t\t// For each unique string\n\t\t\t\t\t\t\t\tcurrentValue.forEach(function (arrSelector, h) {\n\t\t\t\t\t\t\t\t\tif (_.isString(arrSelector)) {\n\n\t\t\t\t\t\t\t\t\t\tg = updateFunctionalParametersFromSelector(g, arrSelector, $(node))\n\t\t\t\t\t\t\t\t\t\tlet n = getNodesFromSmartSelector($(node), g.selector)\n\t\t\t\t\t\t\t\t\t\tlet dataResp = getDataFromNodes($(n), g)\n\t\t\t\t\t\t\t\t\t\tif (dataResp) {\n\t\t\t\t\t\t\t\t\t\t\t// push data as unit of array\n\t\t\t\t\t\t\t\t\t\t\telem[key].push(...dataResp)\n\t\t\t\t\t\t\t\t\t\t}\n\n\t\t\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t\t})\n\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t\t// The Parameter is a single string === selector > directly scraped\n\t\t\t\t\t\t\telse {\n\n\t\t\t\t\t\t\t\tg = updateFunctionalParametersFromSelector(g, currentValue, 
$(node))\n\t\t\t\t\t\t\t\tlet n = getNodesFromSmartSelector($(node), g.selector)\n\t\t\t\t\t\t\t\tlet dataResp = getDataFromNodes($(n), g, {\n\t\t\t\t\t\t\t\t\tmultiple: false\n\t\t\t\t\t\t\t\t})\n\n\t\t\t\t\t\t\t\tif (dataResp) {\n\t\t\t\t\t\t\t\t\t// push data as unit of array\n\t\t\t\t\t\t\t\t\telem[key] = dataResp\n\t\t\t\t\t\t\t\t}\n\n\t\t\t\t\t\t\t}\n\n\t\t\t\t\t\t} catch (error) {\n\t\t\t\t\t\t\t// console.log(\"obj key\", key);\n\t\t\t\t\t\t\tconsole.log(error)\n\t\t\t\t\t\t}\n\n\t\t\t\t\t}\n\n\t\t\t\t})\n\t\t\t}\n\n\t\t}\n\n\t\titerateThrough(frame, output, mainNode)\n\n\t\tif (string) {\n\t\t\toutput = JSON.stringify(output, null, 2)\n\t\t}\n\n\t\treturn output\n\t}\n\n\n}" }, { "path": "package.json", "content": "{\n \"name\": \"jsonframe-cheerio\",\n \"version\": \"2.0.52\",\n \"description\": \"simple multi-level scraper json input/output\",\n \"main\": \"index.js\",\n \"scripts\": {\n \"test\": \"mocha tests/**/*.test.js\",\n \"test-watch\": \"nodemon --exec \\\"npm run test\\\"\"\n },\n \"author\": {\n \"name\": \"Gabin Desserprit\",\n \"email\": \"gabin@datascraper.pro\",\n \"url\": \"http://datascraper.pro\"\n },\n \"keywords\": [\n \"cheerio\",\n \"scraper\",\n \"frame\",\n \"json\",\n \"parser\",\n \"template\"\n ],\n \"repository\": {\n \"type\": \"git\",\n \"url\": \"https://github.com/gahabeen/jsonframe-cheerio\"\n },\n \"bugs\": {\n \"url\": \"https://github.com/gahabeen/jsonframe-cheerio/issues\"\n },\n \"license\": \"ISC\",\n \"devDependencies\": {\n \"cheerio\": \"^0.22.0\",\n \"expect\": \"^1.20.2\",\n \"lodash\": \"^4.17.4\",\n \"mocha\": \"^3.2.0\",\n \"nodemon\": \"^1.11.0\",\n \"unfluff\": \"^1.1.0\",\n \"xmldom\": \"^0.1.27\",\n \"xpath\": \"0.0.23\"\n },\n \"dependencies\": {\n \"addressit\": \"^1.4.0\",\n \"chrono-node\": \"^1.2.5\",\n \"google-libphonenumber\": \"^2.0.9\",\n \"humanname\": \"^0.2.2\",\n \"lodash\": \"^4.17.4\"\n }\n}\n" }, { "path": "tests/index.test.js", "content": "const expect = require('expect')\nconst cheerio 
= require('cheerio')\nlet _ = require('lodash')\n\nlet jsonframe = require('./../index.js')\n\nlet html = `\n\n\n\n

Pricing

\n\t\t

\n\t\tA Link\n\t\t We are the 04/02/2017\n\t\t\n \n\t

\n\t\tPhone USA: (912) 148-456

\n\t\tPhone FR: +332 38 30 37 90\n\t\tEmail: lspurcell@suddenlink.net\n\t\n\n\n`\n\nlet $ = cheerio.load(html)\n\njsonframe($)\n\ndescribe('JsonFrame Tests', () => {\n\n\tdescribe('Get Data from Inline Selector', () => {\n\n\t\tit('should get simple text', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"title\": \"h2\"\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"title\": \"Pricing\"\n\t\t\t\t})\n\n\t\t})\n\n\n\t\tit('should get img src automatically', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"picture\": \".picture\" // even without mentioning the img tag\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"picture\": \"somepath/to/image.png\"\n\t\t\t\t})\n\n\t\t})\n\n\t})\n\n\tdescribe('Get Attribute Data from Object {selector, attribute}', () => {\n\n\t\tit('should get the price attribute value', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"proPrice\": {\n\t\t\t\t\t_s: \".planName:contains('Pro') + span\",\n\t\t\t\t\t_a: \"price\"\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"proPrice\": \"39.00\"\n\t\t\t\t})\n\n\t\t})\n\n\t\tit('should get the price attribute value (inline)', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"proPrice\": \".planName:contains('Pro') + span @ price\"\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"proPrice\": \"39.00\"\n\t\t\t\t})\n\n\t\t})\n\n\t\tit('should get the link (href) attribute value', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"link\": {\n\t\t\t\t\t_s: \".mainLink\",\n\t\t\t\t\t_a: \"href\"\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"link\": \"some/url/to/somewhere\"\n\t\t\t\t})\n\n\t\t})\n\n\t\tit('should get the link (href) attribute value (inline)', () => {\n\n\t\t\tlet frame = 
{\n\t\t\t\t\"link\": \".mainLink @ href\"\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"link\": \"some/url/to/somewhere\"\n\t\t\t\t})\n\n\t\t})\n\n\t})\n\n\n\tdescribe('Get Data with Type {selector, type[, attribute,]}', () => {\n\n\t\tit('should get the USA telephone value', () => {\n\t\t\tlet frame = {\n\t\t\t\t\"telephone\": {\n\t\t\t\t\t_s: \"[itemprop=usaphone]\",\n\t\t\t\t\t_t: \"telephone\"\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"telephone\": \"(912) 148-456\"\n\t\t\t\t})\n\t\t})\n\n\t\tit('should get the USA telephone value (inline)', () => {\n\t\t\tlet frame = {\n\t\t\t\t\"telephone\": \"[itemprop=usaphone] < telephone\"\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"telephone\": \"(912) 148-456\"\n\t\t\t\t})\n\t\t})\n\n\t\tit('should get the FR telephone value', () => {\n\t\t\tlet frame = {\n\t\t\t\t\"telephone\": {\n\t\t\t\t\t_s: \"[itemprop=frphone]\",\n\t\t\t\t\t_t: \"telephone\"\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"telephone\": \"+332 38 30 37 90\"\n\t\t\t\t})\n\t\t})\n\n\t\tit('should get the FR telephone value (inline)', () => {\n\t\t\tlet frame = {\n\t\t\t\t\"telephone\": \"[itemprop=frphone] < telephone\"\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"telephone\": \"+332 38 30 37 90\"\n\t\t\t\t})\n\t\t})\n\n\t\tit('should get the email value', () => {\n\t\t\tlet frame = {\n\t\t\t\t\"email\": {\n\t\t\t\t\t_s: \"[itemprop=email]\",\n\t\t\t\t\t_t: \"email\"\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"email\": \"lspurcell@suddenlink.net\"\n\t\t\t\t})\n\t\t})\n\n\t\tit('should get the email value 
(inline)', () => {\n\t\t\tlet frame = {\n\t\t\t\t\"email\": \"[itemprop=email] < email\"\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"email\": \"lspurcell@suddenlink.net\"\n\t\t\t\t})\n\t\t})\n\n\t\tit('should get the inner html value', () => {\n\t\t\tlet frame = {\n\t\t\t\t\"inner\": {\n\t\t\t\t\t_s: \".popup\",\n\t\t\t\t\t_t: \"html\"\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"inner\": \"Some inner content\"\n\t\t\t\t})\n\t\t})\n\n\t\tit('should get the inner html value (inline)', () => {\n\t\t\tlet frame = {\n\t\t\t\t\"inner\": \".popup < html\"\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"inner\": \"Some inner content\"\n\t\t\t\t})\n\t\t})\n\n\n\t})\n\n\tdescribe('Get Parsed Data thanks to Regex {selector, parse[, type, attribute]}', () => {\n\n\t\tit('should get the parsed date dd/mm/yyyy from regex', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"data\": {\n\t\t\t\t\t_s: \".date\",\n\t\t\t\t\t_p: /\\d{1,2}\\/\\d{1,2}\\/\\d{2,4}/\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"data\": \"04/02/2017\"\n\t\t\t\t})\n\t\t})\n\n\t\tit('should get the parsed date dd/mm/yyyy from regex (inline)', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"data\": \".date || \\\\d{1,2}/\\\\d{1,2}/\\\\d{2,4}\"\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"data\": \"04/02/2017\"\n\t\t\t\t})\n\t\t})\n\n\t})\n\n\tdescribe('Get Child Obj Data {selector, data: {}}', () => {\n\n\t\tit('should get json object with parent > child', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"pricing\": {\n\t\t\t\t\t_s: \"#pricing .item\",\n\t\t\t\t\t_d: {\n\t\t\t\t\t\t\"name\": \".planName\",\n\t\t\t\t\t\t\"price\": 
\".planPrice\"\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"pricing\": {\n\t\t\t\t\t\t\"name\": \"Hacker\",\n\t\t\t\t\t\t\"price\": \"Free\"\n\t\t\t\t\t}\n\t\t\t\t})\n\n\t\t})\n\n\t})\n\n\tdescribe('Get Array / List of Data {selector, data: [{}]}', () => {\n\n\t\tit('should get json object with parent > childs []', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"pricing\": {\n\t\t\t\t\t_s: \"#pricing .item\",\n\t\t\t\t\t_d: [{\n\t\t\t\t\t\t\"name\": \".planName\",\n\t\t\t\t\t\t\"price\": \".planPrice\"\n\t\t\t\t\t}]\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"pricing\": [{\n\t\t\t\t\t\t\t\"name\": \"Hacker\",\n\t\t\t\t\t\t\t\"price\": \"Free\"\n\t\t\t\t\t\t},\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\t\"name\": \"Pro\",\n\t\t\t\t\t\t\t\"price\": \"$39\"\n\t\t\t\t\t\t}\n\t\t\t\t\t]\n\t\t\t\t})\n\n\t\t})\n\n\t})\n\n\tdescribe('Get child elements grouped by a selector with _g', () => {\n\n\t\tit('should get the data within the first li item', () => {\n\t\t\tlet frame = {\n\t\t\t\t_g: {\n\t\t\t\t\t_s: \"#pricing .item\",\n\t\t\t\t\t_d: {\n\t\t\t\t\t\t\"name\": \".planName\",\n\t\t\t\t\t\t\"price\": \".planPrice @ price\",\n\t\t\t\t\t\t\"image\": {\n\t\t\t\t\t\t\t\"url\": \"img\",\n\t\t\t\t\t\t\t\"link\": \"a @ href\"\n\t\t\t\t\t\t}\n\t\t\t\t\t}\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"name\": \"Hacker\",\n\t\t\t\t\t\"price\": \"0\",\n\t\t\t\t\t\"image\": {\n\t\t\t\t\t\t\"url\": \"./img/hacker.png\",\n\t\t\t\t\t\t\"link\": \"/hacker\"\n\t\t\t\t\t}\n\t\t\t\t})\n\t\t})\n\n\t})\n\n\tdescribe('Full examples', () => {\n\n\t\tit('should get the pricing list + details', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"pricing\": {\n\t\t\t\t\t_s: \"#pricing .item\",\n\t\t\t\t\t_d: [{\n\t\t\t\t\t\t\"name\": \".planName\",\n\t\t\t\t\t\t\"price\": 
{\n\t\t\t\t\t\t\t_s: \".planPrice\",\n\t\t\t\t\t\t\t_a: \"price\"\n\t\t\t\t\t\t},\n\t\t\t\t\t\t\"image\": {\n\t\t\t\t\t\t\t\"url\": {\n\t\t\t\t\t\t\t\t_s: \"img\",\n\t\t\t\t\t\t\t\t_a: \"src\"\n\t\t\t\t\t\t\t},\n\t\t\t\t\t\t\t\"link\": {\n\t\t\t\t\t\t\t\t_s: \"a\",\n\t\t\t\t\t\t\t\t_a: \"href\"\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t}]\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"pricing\": [{\n\t\t\t\t\t\t\t\"name\": \"Hacker\",\n\t\t\t\t\t\t\t\"price\": \"0\",\n\t\t\t\t\t\t\t\"image\": {\n\t\t\t\t\t\t\t\t\"url\": \"./img/hacker.png\",\n\t\t\t\t\t\t\t\t\"link\": \"/hacker\"\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t},\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\t\"name\": \"Pro\",\n\t\t\t\t\t\t\t\"price\": \"39.00\",\n\t\t\t\t\t\t\t\"image\": {\n\t\t\t\t\t\t\t\t\"url\": \"./img/pro.png\",\n\t\t\t\t\t\t\t\t\"link\": \"/pro\"\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t]\n\t\t\t\t})\n\n\t\t})\n\n\t\tit('should get the pricing list + details (inline)', () => {\n\n\t\t\tlet frame = {\n\t\t\t\t\"pricing\": {\n\t\t\t\t\t_s: \"#pricing .item\",\n\t\t\t\t\t_d: [{\n\t\t\t\t\t\t\"name\": \".planName\",\n\t\t\t\t\t\t\"price\": \".planPrice @ price\",\n\t\t\t\t\t\t\"image\": {\n\t\t\t\t\t\t\t\"url\": \"img\",\n\t\t\t\t\t\t\t\"link\": \"a @ href\"\n\t\t\t\t\t\t}\n\t\t\t\t\t}]\n\t\t\t\t}\n\t\t\t}\n\n\t\t\tlet output = $('body').scrape(frame)\n\n\t\t\texpect(output)\n\t\t\t\t.toContain({\n\t\t\t\t\t\"pricing\": [{\n\t\t\t\t\t\t\t\"name\": \"Hacker\",\n\t\t\t\t\t\t\t\"price\": \"0\",\n\t\t\t\t\t\t\t\"image\": {\n\t\t\t\t\t\t\t\t\"url\": \"./img/hacker.png\",\n\t\t\t\t\t\t\t\t\"link\": \"/hacker\"\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t},\n\t\t\t\t\t\t{\n\t\t\t\t\t\t\t\"name\": \"Pro\",\n\t\t\t\t\t\t\t\"price\": \"39.00\",\n\t\t\t\t\t\t\t\"image\": {\n\t\t\t\t\t\t\t\t\"url\": \"./img/pro.png\",\n\t\t\t\t\t\t\t\t\"link\": \"/pro\"\n\t\t\t\t\t\t\t}\n\t\t\t\t\t\t}\n\t\t\t\t\t]\n\t\t\t\t})\n\n\t\t})\n\n\t})\n\n})" }, { "path": 
"tests/playground/html/company.html", "content": "\n\n\n\n\t\n\t\n\t\n\tCompany\n\n\n\n\n\n\t\n\n\t

\n\t\t

Bonjour

\n\t\t

Lorem ipsum dolor sit amet, consectetur adipisicing elit. Iusto, inventore, nihil! Itaque aspernatur tenetur repellendus ipsam iste non accusamus similique, ab minus. Sed saepe nesciunt debitis, sit asperiores optio corporis. \n\n\n\n\t\t

\n\t

\n\t\n\t

\n\t\t

\n\t\t\t The company name \n\t\t\t 815 684 9704 \n\t\t\t test@google.com \n\t\t\t\n\t\t

\n\t\t

\n\t\t\t A company \n\t\t\t+18749284 \n\t\t\t allo@google.com \n\t\t

\n\t\t

\n\t\t\t My Company \n\t\t\t 104 78794 15 \n\t\t\t coucou@google.com \n\t\t

\n\t\t

\n\t\t\t Hire Me \n\t\t\t 849 0445 667 \n\t\t\t naaan@google.com \n\t\t

\n\t

\n\n\t\n\n\n\n" } ]

jsonframe

simple multi-level scraper json input/output

I love jsonframe!

Pricing

Pricing

Bonjour

Bonsoir
