Repository: basilioss/obsidian-scrapers Branch: main Commit: 2d25fb906cd0 Files: 18 Total size: 52.0 KB Directory structure: gitextract_uz74s91k/ ├── .github/ │ └── FUNDING.yml ├── LICENSE ├── README.md ├── scripts/ │ ├── goodreads.js │ ├── imdb.js │ ├── letterboxd.js │ ├── odysee.js │ ├── website.js │ ├── wikipedia.js │ └── youtube.js └── templates/ ├── goodreads.md ├── imdb.md ├── letterboxd.md ├── odysee.md ├── scraper.md ├── website.md ├── wikipedia.md └── youtube.md ================================================ FILE CONTENTS ================================================ ================================================ FILE: .github/FUNDING.yml ================================================ # These are supported funding model platforms github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2] patreon: # Replace with a single Patreon username open_collective: # Replace with a single Open Collective username ko_fi: basilioss tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry liberapay: # Replace with a single Liberapay username issuehunt: # Replace with a single IssueHunt username lfx_crowdfunding: # Replace with a single LFX Crowdfunding project-name e.g., cloud-foundry polar: # Replace with a single Polar username buy_me_a_coffee: # Replace with a single Buy Me a Coffee username thanks_dev: # Replace with a single thanks.dev username custom: # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2'] ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2022 Vasyl Tyshchuk Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to 
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: README.md
================================================
# Obsidian scrapers

A collection of [Templater](https://github.com/SilentVoid13/Templater) scripts for [Obsidian](https://obsidian.md/) that you can drop into your templates to pull information from various sites using a copied link.

https://user-images.githubusercontent.com/71596800/193448137-3a4d4489-cbc6-4108-905c-9eb3165e6ee1.mp4

## Installation

1. [Download](https://github.com/basilioss/obsidian-scrapers/archive/refs/heads/main.zip) and unzip the archive.
2. Copy the `scripts` and `templates` folders to your vault (notes) folder.
3. Install Templater via the Community Plugins tab within Obsidian, then open the Templater options in settings under the `Community plugins` section.
4. Set `Template folder location` to the downloaded `templates` folder.
5. Set `Script files folder location` to the downloaded `scripts` folder.
6. Optionally, add a hotkey for the `scraper` template, which automatically inserts the correct template depending on the link domain.
7. Copy the URL and open the insert template modal (by default `Alt + E`).
Choose `scraper` or youtube/goodreads/imdb etc., depending on the link. 8. Customize the downloaded templates to your liking. Refer to [Templater documentation](https://silentvoid13.github.io/Templater/) for more info. ## Available functions ### Any website | Function | Description | | ------------------------------------------ | ---------------------- | | `<% tp.user.website('title', tp) %>` | Get title | | `<% tp.user.website('description', tp) %>` | Get description | | `<% tp.user.website('url', tp) %>` | Get url | | `<% tp.user.website('image', tp) %>` | Get image preview link | ### [YouTube](https://www.youtube.com/) | Function | Description | | ---------------------------------------------- | ------------------------------------------------------------------------------------- | | `<% tp.user.youtube('title', tp) %>` | Get title | | `<% tp.user.youtube('channel', tp) %>` | Get channel name | | `<% tp.user.youtube('published', tp) %>` | Get publish date | | `<% tp.user.youtube('url', tp) %>` | Get url | | `<% tp.user.youtube('thumbnail', tp) %>` | Get thumbnail link | | `<% tp.user.youtube('keywords', tp) %>` | Get keywords (alternative formats: `keywordsList`, `keywordsQuotes`, `keywordsLinks`) | | `<% tp.user.youtube('duration', tp) %>` | Get duration | | `<% tp.user.youtube('description', tp) %>` | Get short description | | `<% tp.user.youtube('descriptionFull', tp) %>` | Get full description | | `<% tp.user.youtube('id', tp) %>` | Get ID (can be used in embeds) | ### [Goodreads](https://www.goodreads.com/) | Function | Description | | -------------------------------------------- | --------------------------------------------------------------------------------- | | `<% tp.user.goodreads('url', tp) %>` | Get url | | `<% tp.user.goodreads('title', tp) %>` | Get title | | `<% tp.user.goodreads('authors', tp) %>` | Get authors (alternative formats: `authorsList`, `authorsQuotes`, `authorsLinks`) | | `<% tp.user.goodreads('isbn', tp) %>` | Get ISBN | | `<% 
tp.user.goodreads('published', tp) %>` | Get publish date | | `<% tp.user.goodreads('genres', tp) %>` | Get genres (alternative formats: `genresList`, `genresQuotes`, `genresLinks`) | | `<% tp.user.goodreads('cover', tp) %>` | Get cover link | | `<% tp.user.goodreads('pageCount', tp) %>` | Get number of pages | | `<% tp.user.goodreads('description', tp) %>` | Get description | | `<% tp.user.goodreads('rating', tp) %>` | Get rating | ### [IMDb](https://www.imdb.com/) | Function | Description | | ----------------------------------------- | ----------------------------------------------------------------------------------------- | | `<% tp.user.imdb('title', tp) %>` | Get title | | `<% tp.user.imdb('image', tp) %>` | Get poster link | | `<% tp.user.imdb('published', tp) %>` | Get publish date | | `<% tp.user.imdb('keywords', tp) %>` | Get keywords (alternative formats: `keywordsList`, `keywordsQuotes`, `keywordsLinks`) | | `<% tp.user.imdb('directors', tp) %>` | Get directors (alternative formats: `directorsList`, `directorsQuotes`, `directorsLinks`) | | `<% tp.user.imdb('creators', tp) %>` | Get creators (alternative formats: `creatorsList`, `creatorsQuotes`, `creatorsLinks`) | | `<% tp.user.imdb('duration', tp) %>` | Get duration | | `<% tp.user.imdb('description', tp) %>` | Get description | | `<% tp.user.imdb('type', tp) %>` | Get type (movie/series) | | `<% tp.user.imdb('contentRating', tp) %>` | Get content rating | | `<% tp.user.imdb('genres', tp) %>` | Get genres (alternative formats: `genresList`, `genresQuotes`, `genresLinks`) | | `<% tp.user.imdb('stars', tp) %>` | Get cast (alternative formats: `starsList`, `starsQuotes`, `starsLinks`) | | `<% tp.user.imdb('imdbRating', tp) %>` | Get IMDb rating | | `<% tp.user.imdb('countries', tp) %>` | Get countries (alternative formats: `countriesList`, `countriesQuotes`, `countriesLinks`) | | `<% tp.user.imdb('url', tp) %>` | Get url | ### [Letterboxd](https://letterboxd.com/) | Function | Description | | 
---------------------------------------------- | ---------------------------------------------------------------------------------------------- | | `<% tp.user.letterboxd('image', tp) %>` | Get image link | | `<% tp.user.letterboxd('directors', tp) %>` | Get directors (alternative formats: `directorsList`, `directorsQuotes`, `directorsLinks`) | | `<% tp.user.letterboxd('studios', tp) %>` | Get studios (alternative formats: `studiosList`, `studiosQuotes`, `studiosLinks`) | | `<% tp.user.letterboxd('published', tp) %>` | Get publish date | | `<% tp.user.letterboxd('url', tp) %>` | Get url | | `<% tp.user.letterboxd('cast', tp) %>` | Get cast (alternative formats: `castList`, `castQuotes`, `castLinks`) | | `<% tp.user.letterboxd('castShort', tp) %>` | Get cast shortlist (alternative formats: `castShortList`, `castShortQuotes`, `castShortLinks`) | | `<% tp.user.letterboxd('title', tp) %>` | Get title | | `<% tp.user.letterboxd('genres', tp) %>` | Get genres (alternative formats: `genresList`, `genresQuotes`, `genresLinks`) | | `<% tp.user.letterboxd('countries', tp) %>` | Get countries (alternative formats: `countriesList`, `countriesQuotes`, `countriesLinks`) | | `<% tp.user.letterboxd('rating', tp) %>` | Get rating | | `<% tp.user.letterboxd('description', tp) %>` | Get description | | `<% tp.user.letterboxd('imdbUrl', tp) %>` | Get IMDb link | | `<% tp.user.letterboxd('tmdbUrl', tp) %>` | Get TMDB link | | `<% tp.user.letterboxd('languages', tp) %>` | Get languages (alternative formats: `languagesList`, `languagesQuotes`, `languagesLinks`) | | `<% tp.user.letterboxd('writers', tp) %>` | Get writers (alternative formats: `writersList`, `writersQuotes`, `writersLinks`) | | `<% tp.user.letterboxd('runtime', tp) %>` | Get duration | | `<% tp.user.letterboxd('altTitle', tp) %>` | Get alternative title | | `<% tp.user.letterboxd('altTitleUTF8', tp) %>` | Get alternative title if it includes UTF8 only (e.g. 
no Chinese characters) |

### [Wikipedia](https://www.wikipedia.org/)

| Function                                  | Description           |
| ----------------------------------------- | --------------------- |
| `<% tp.user.wikipedia('title', tp) %>`    | Get title             |
| `<% tp.user.wikipedia('url', tp) %>`      | Get url               |
| `<% tp.user.wikipedia('image', tp) %>`    | Get image link        |
| `<% tp.user.wikipedia('headline', tp) %>` | Get short description |

### [Odysee](https://odysee.com/)

| Function                                  | Description                                                                           |
| ----------------------------------------- | ------------------------------------------------------------------------------------- |
| `<% tp.user.odysee('title', tp) %>`       | Get title                                                                             |
| `<% tp.user.odysee('channel', tp) %>`     | Get channel name                                                                      |
| `<% tp.user.odysee('description', tp) %>` | Get description                                                                       |
| `<% tp.user.odysee('published', tp) %>`   | Get publish date                                                                      |
| `<% tp.user.odysee('thumbnail', tp) %>`   | Get thumbnail link                                                                    |
| `<% tp.user.odysee('duration', tp) %>`    | Get duration                                                                          |
| `<% tp.user.odysee('url', tp) %>`         | Get Odysee url                                                                        |
| `<% tp.user.odysee('contentUrl', tp) %>`  | Get direct link to a video                                                            |
| `<% tp.user.odysee('embedUrl', tp) %>`    | Get embed url                                                                         |
| `<% tp.user.odysee('keywords', tp) %>`    | Get keywords (alternative formats: `keywordsList`, `keywordsQuotes`, `keywordsLinks`) |

================================================
FILE: scripts/goodreads.js
================================================
// source: https://github.com/basilioss/obsidian-scrapers

async function goodreads(value, tp, doc) {
  let url = await tp.system.clipboard();

  if (!isValidHttpUrl(url)) {
    console.error("Invalid URL for " + value);
    return "";
  }

  if (!doc) {
    let page = await tp.obsidian.request({ url });
    let p = new DOMParser();
    doc = p.parseFromString(page, "text/html");
  }

  const data = extractData(doc);
  const $ = (selector) => doc.querySelector(selector);

  switch (value) {
    case "url":
      return safeReturn(getUrl(doc), "url");
    case "title":
      return safeReturn(getTitle(doc), "title");
    case "authors":
      return safeReturn(getAuthors(data), "authors");
    case "authorsQ":
    case "authorsQuotes":
      return formatQuote(getAuthors(data), "authors");
    case "authorsL":
    case "authorsList":
      return formatList(getAuthors(data), "authors");
    case "authorsW":
    case "authorsLinks":
      return formatLink(getAuthors(data), "authors");
    case "isbn":
      return safeReturn(data?.isbn, "isbn");
    case "published":
      return safeReturn(getPublished(doc), "published");
    case "genres":
      return safeReturn(getGenres(doc), "genres");
    case "genresQ":
    case "genresQuotes":
      return formatQuote(getGenres(doc), "genres");
    case "genresL":
    case "genresList":
      return formatList(getGenres(doc), "genres");
    case "genresW":
    case "genresLinks":
      return formatLink(getGenres(doc), "genres");
    case "cover":
      return safeReturn(getCover(data), "cover");
    case "pageCount":
      return safeReturn(data?.numberOfPages, "pageCount");
    case "description":
      return safeReturn(getDescription(doc), "description");
    case "rating":
      return safeReturn(data?.aggregateRating?.ratingValue, "rating");
    default:
      new Notice("Incorrect parameter: " + value, 5000);
      return "";
  }
}

// --- Data extractors ---

function extractData(doc) {
  const scriptTag = doc.querySelector('script[type="application/ld+json"]');
  if (!scriptTag) return {};
  try {
    return JSON.parse(scriptTag.textContent);
  } catch (_) {
    // Malformed JSON-LD: fall back to an empty object so DOM-based fields still work
    return {};
  }
}

function getUrl(doc) {
  return doc.querySelector("link[rel='canonical']")?.href || "";
}

function getTitle(doc) {
  const title = doc.querySelector(".BookPageTitleSection .Text__title1")?.innerText || "";
  // Decode HTML entities that may remain in the title text
  return title
    .trim()
    .replace(/&amp;/g, "&")
    .replace(/&(?:#39|#x27|apos);/g, "'");
}

function getAuthors(data) {
  const authors = data?.author;
  return !authors || authors.length === 0
    ? ""
    : Array.from(authors, (a) => a.name.trim().replace(/ +(?= )/g, "")).join(", ");
}

function getPublished(doc) {
  const pubInfo = doc.querySelector('p[data-testid="publicationInfo"]');
  if (!pubInfo) return "";
  const match = pubInfo.innerHTML.match(/First published\s(.*)/);
  return match ?
    match[1].trim() : "";
}

function getGenres(doc) {
  const genreElements = doc.querySelectorAll('.BookPageMetadataSection__genreButton .Button__labelItem');
  if (!genreElements || genreElements.length === 0) return "";
  const genres = Array.from(genreElements, (el) => el.textContent.trim());
  return [...new Set(genres)].join(", ");
}

function getCover(data) {
  return (data?.image || "").replace(/\?.*$/g, "");
}

function getDescription(doc) {
  const desc = doc.querySelector('.BookPageMetadataSection__description .Formatted');
  return desc ? desc.textContent.trim() : "";
}

// --- Helpers ---

function safeReturn(result, name) {
  if (!result) logParsingError(name);
  return result || "";
}

function formatQuote(value, name) {
  if (!value) logParsingError(name);
  return value ? `"${value.replace(/, /g, '", "')}"` : "";
}

function formatList(value, name) {
  if (!value) logParsingError(name);
  return value ? `\n- ${value.replace(/, /g, "\n- ")}` : "";
}

function formatLink(value, name) {
  if (!value) logParsingError(name);
  return value ? `[[${value.replace(/, /g, "]], [[")}]]` : "";
}

function isValidHttpUrl(string) {
  try {
    const url = new URL(string);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch (_) {
    return false;
  }
}

function logParsingError(variable) {
  console.error(`Parsing Error: Couldn't get ${variable}.`);
}

module.exports = goodreads;

================================================
FILE: scripts/imdb.js
================================================
// source: https://github.com/basilioss/obsidian-scrapers

async function imdb(value, tp, doc) {
  let url = await tp.system.clipboard();

  if (!isValidHttpUrl(url)) {
    console.error("Invalid URL for " + value);
    return "";
  }

  if (doc === undefined) {
    let page = await tp.obsidian.request({ url });
    let p = new DOMParser();
    doc = p.parseFromString(page, "text/html");
  }

  const $ = (selector) => doc.querySelector(selector);

  let json;
  try {
    let script = $("script[type='application/ld+json']")?.innerText ?? "";
    json = JSON.parse(script);
  } catch (err) {
    console.warn("Warning: Failed to parse JSON-LD metadata. Proceeding without JSON data.");
    json = null; // allow fallback functions to run
  }

  switch (value) {
    case "title":
      return safeReturn(getTitle(json), "title");
    case "image":
      return safeReturn(json?.image, "image");
    case "published":
      return safeReturn(getPublished(json, doc), "published");
    case "keywords":
      return safeReturn(getKeywords(json), "keywords");
    case "keywordsQ":
    case "keywordsQuotes":
      return formatQuote(getKeywords(json), "keywords");
    case "keywordsL":
    case "keywordsList":
      return formatList(getKeywords(json), "keywords");
    case "keywordsW":
    case "keywordsLinks":
      return formatLink(getKeywords(json), "keywords");
    case "directors":
      return safeReturn(getDirectors(json), "directors");
    case "directorsQ":
    case "directorsQuotes":
      return formatQuote(getDirectors(json), "directors");
    case "directorsL":
    case "directorsList":
      return formatList(getDirectors(json), "directors");
    case "directorsW":
    case "directorsLinks":
      return formatLink(getDirectors(json), "directors");
    case "creators":
      return safeReturn(getCreators(json), "creators");
    case "creatorsQ":
    case "creatorsQuotes":
      return formatQuote(getCreators(json), "creators");
    case "creatorsL":
    case "creatorsList":
      return formatList(getCreators(json), "creators");
    case "creatorsW":
    case "creatorsLinks":
      return formatLink(getCreators(json), "creators");
    case "duration":
      return safeReturn(getDuration(json), "duration");
    case "description":
      return safeReturn(getDescription(doc), "description");
    case "type":
      return safeReturn(getType(json), "type");
    case "contentRating":
      return safeReturn(json?.contentRating, "contentRating");
    case "genres":
      return safeReturn(getGenres(json), "genres");
    case "genresQ":
    case "genresQuotes":
      return formatQuote(getGenres(json), "genres");
    case "genresL":
    case "genresList":
      return formatList(getGenres(json), "genres");
    case "genresW":
    case "genresLinks":
      return formatLink(getGenres(json), "genres");
    case "stars":
      return safeReturn(getStars(json), "stars");
    case "starsQ":
    case "starsQuotes":
      return formatQuote(getStars(json), "stars");
    case "starsL":
    case "starsList":
      return formatList(getStars(json), "stars");
    case "starsW":
    case "starsLinks":
      return formatLink(getStars(json), "stars");
    case "imdbRating":
      return safeReturn(json?.aggregateRating?.ratingValue, "imdbRating");
    case "countries":
      return safeReturn(getCountries(doc), "countries");
    case "countriesQ":
    case "countriesQuotes":
      return formatQuote(getCountries(doc), "countries");
    case "countriesL":
    case "countriesList":
      return formatList(getCountries(doc), "countries");
    case "countriesW":
    case "countriesLinks":
      return formatLink(getCountries(doc), "countries");
    case "url":
      return safeReturn(getUrl(json), "url");
    default:
      new Notice("Incorrect parameter: " + value, 5000);
      return "";
  }
}

// --- Data extractors ---

function getTitle(json) {
  let title = json?.alternateName ?? json?.name ?? "";
  // Decode apostrophe entities that may appear in the JSON-LD strings
  return title.replace(/&(?:apos|#39|#x27);/g, "'");
}

function getPublished(json, doc) {
  if (json?.datePublished) {
    return json.datePublished.substring(0, 4);
  }
  return doc.querySelector("a[href*='releaseinfo']")?.innerText || "";
}

function getKeywords(json) {
  if (!json?.keywords) return "";
  return json.keywords.toLowerCase().replace(/,/g, ", ");
}

function getDirectors(json) {
  if (!json?.director) return "";
  return json.director.map((d) => d.name).filter(Boolean).join(", ");
}

function getCreators(json) {
  if (!json?.creator) return "";
  return json.creator.map((c) => c.name).filter(Boolean).join(", ");
}

function getDuration(json) {
  let duration = "";
  if (json?.duration != null) {
    duration = JSON.stringify(json.duration).toLowerCase();
    duration = duration
      .replace(/"pt/, "")
      .replace(/h/, "h ")
      .replace(/m"/, "m");
  }
  return duration;
}

function getType(json) {
  return json?.["@type"]?.toLowerCase().replace(/tv/i, "") || "";
}

function getGenres(json) {
  if (!json?.genre) return "";
  return JSON.stringify(json.genre)
    .toLowerCase()
.replace(/","/g, ", ") .replace(/\["/g, "") .replace(/\"]/, ""); } function getStars(json) { if (!json?.actor) return ""; return json.actor.map((a) => a.name).join(", "); } function getCountries(doc) { let countries = doc.querySelectorAll("a[href*='country_of_origin']"); return Array.from(countries, (countries) => countries.textContent).join(", "); } function getDescription(doc) { return doc.querySelector("span[data-testid='plot-xl']")?.innerText || ""; } function getUrl(json) { if (!json?.url) return ""; return json.url.startsWith("http") ? json.url : "https://www.imdb.com" + json.url; } // --- Helpers --- function safeReturn(result, name) { if (!result) logParsingError(name); return result || ""; } function formatQuote(value, name) { if (!value) logParsingError(name); return value ? `"${value.replace(/, /g, '", "')}"` : ""; } function formatList(value, name) { if (!value) logParsingError(name); return value ? `\n- ${value.replace(/, /g, "\n- ")}` : ""; } function formatLink(value, name) { if (!value) logParsingError(name); return value ? 
`[[${value.replace(/, /g, "]], [[")}]]` : ""; } function logParsingError(variable) { console.error(`Parsing Error: Couldn't get ${variable}.`); } function isValidHttpUrl(string) { try { let url = new URL(string); return url.protocol === "http:" || url.protocol === "https:"; } catch (_) { return false; } } module.exports = imdb; ================================================ FILE: scripts/letterboxd.js ================================================ // source: https://github.com/basilioss/obsidian-scrapers async function letterboxd(value, tp, doc) { let url = await tp.system.clipboard(); if (!isValidHttpUrl(url)) { console.error("Invalid URL for " + value); return ""; } if (doc === undefined) { let page = await tp.obsidian.request({ url }); let p = new DOMParser(); doc = p.parseFromString(page, "text/html"); } const $ = (selector) => doc.querySelector(selector); let json; try { let script = $("script[type='application/ld+json']")?.innerText ?? ""; // Remove multi line comments inside the script script = script.replace(/\/\*([\s\S]*?)\*\//g, ""); // Remove new lines script = script.replace(/(\r\n|\n|\r)/gm, ""); json = JSON.parse(script); } catch (err) { console.warn("Warning: Failed to parse JSON-LD metadata. 
Proceeding without JSON data."); json = null; // allow fallback functions to run } switch (value) { case "image": return safeReturn(getImage(json), "image"); case "directors": return safeReturn(getDirectors(json), "directors"); case "directorsQ": case "directorsQuotes": return formatQuote(getDirectors(json), "directors"); case "directorsL": case "directorsList": return formatList(getDirectors(json), "directors"); case "directorsW": case "directorsLinks": return formatLink(getDirectors(json), "directors"); case "studios": return safeReturn(getStudios(json), "studios"); case "studiosQ": case "studiosQuotes": return formatQuote(getStudios(json), "studios"); case "studiosL": case "studiosList": return formatList(getStudios(json), "studios"); case "studiosW": case "studiosLinks": return formatLink(getStudios(json), "studios"); case "published": return safeReturn(json?.releasedEvent?.[0]?.startDate, "published"); case "url": return safeReturn(json?.url, "url"); case "cast": return safeReturn(getCast(json), "cast"); case "castQ": case "castQuotes": return formatQuote(getCast(json), "cast"); case "castL": case "castList": return formatList(getCast(json), "cast"); case "castW": case "castLinks": return formatLink(getCast(json), "cast"); case "castShort": return safeReturn(getCastShort(json), "castShort"); case "castShortQ": case "castShortQuotes": return formatQuote(getCastShort(json), "castShort"); case "castShortL": case "castShortList": return formatList(getCastShort(json), "castShort"); case "castShortW": case "castShortLinks": return formatLink(getCastShort(json), "castShort"); case "title": return safeReturn(json?.name?.replace(/"/g, "”"), "title"); case "genres": return safeReturn(getGenres(json), "genres"); case "genresQ": case "genresQuotes": return formatQuote(getGenres(json), "genres"); case "genresL": case "genresList": return formatList(getGenres(json), "genres"); case "genresW": case "genresLinks": return formatLink(getGenres(json), "genres"); case 
"countries": return safeReturn(getCountries(json), "countries"); case "countriesQ": case "countriesQuotes": return formatQuote(getCountries(json), "countries"); case "countriesL": case "countriesList": return formatList(getCountries(json), "countries"); case "countriesW": case "countriesLinks": return formatLink(getCountries(json), "countries"); case "rating": return safeReturn(json?.aggregateRating?.ratingValue, "rating"); case "description": return safeReturn($("meta[name='description']")?.content, "description"); case "imdbUrl": return safeReturn(getImdbUrl(doc), "imdbUrl"); case "tmdbUrl": return safeReturn($("a[data-track-action='TMDB']")?.href, "tmdbUrl"); case "languages": return safeReturn(getLanguages(doc), "languages"); case "languagesQ": case "languagesQuotes": return formatQuote(getLanguages(doc), "languages"); case "languagesL": case "languagesList": return formatList(getLanguages(doc), "languages"); case "languagesW": case "languagesLinks": return formatLink(getLanguages(doc), "languages"); case "writers": return safeReturn(getWriters(doc), "writers"); case "writersQ": case "writersQuotes": return formatQuote(getWriters(doc), "writers"); case "writersL": case "writersList": return formatList(getWriters(doc), "writers"); case "writersW": case "writersLinks": return formatLink(getWriters(doc), "writers"); case "runtime": return safeReturn(getRuntime(doc), "runtime"); case "altTitle": return safeReturn(getAltTitle(doc), "altTitle"); case "altTitleUTF8": return safeReturn(getAltTitleUTF8(doc), "altTitle"); default: new Notice("Incorrect parameter: " + value, 5000); return ""; } } // --- Data Extractors --- function getImage(json) { return (json?.image || "").replace(/\?.*$/, ""); } function getDirectors(json) { if (json?.director) { return json.director.map((d) => d.name).join(", "); } return ""; } function getStudios(json) { if (json?.productionCompany) { return json.productionCompany.map((s) => s.name).join(", "); } return ""; } function getCast(json) { 
if (json?.actors) { return json.actors.map((a) => a.name).join(", "); } return ""; } function getCastShort(json, n = 5) { let _cast = getCast(json); if (!_cast) return ""; return _cast.split(", ").slice(0, n).join(", "); } function getGenres(json) { return Array.isArray(json?.genre) ? json.genre.join(", ").toLowerCase() : (json?.genre || "").toLowerCase(); } function getCountries(json) { if (json?.countryOfOrigin) { return json.countryOfOrigin.map((c) => c.name).join(", "); } return ""; } function getLanguages(doc) { let languages = doc.querySelectorAll("a[href^='/films/language/']"); languages = Array.from(languages, (languages) => languages.textContent); languages = [...new Set(languages)]; // Remove duplicates return languages.join(", "); } function getWriters(doc) { let writers = doc.querySelectorAll("a[href^='/writer/']"); return Array.from(writers, (writers) => writers.textContent).join(", "); } function getRuntime(doc) { let runtime = doc.querySelector("p[class*='text-link']")?.innerText || ""; // Remove new lines runtime = runtime.replace(/(\r\n|\n|\r)/gm, "").trim(); runtime = runtime.substring(0, runtime.indexOf(" ")).replace(/\smins/, ""); return runtime; } function getImdbUrl(doc) { let imdb = doc.querySelector("a[data-track-action='IMDb']")?.href; return imdb ? 
    imdb.replace(/\/maindetails/, "") : "";
}

function getAltTitle(doc) {
  // let alt = doc.querySelector("section[id='featured-film-header'] em")?.innerText || "";
  let altTitle = doc.querySelector("h2.originalname em")?.innerText || "";
  return altTitle.replace(/[‘’]/g, "").replace(/"/g, "”");
}

// Despite its name, this checks that every character code is at most 0xFF
// (the Latin-1 range), so titles in e.g. Chinese or Cyrillic are rejected.
function isUTF8(input) {
  for (let i = 0; i < input.length; i++) {
    if (input.charCodeAt(i) > 0xFF) {
      return false;
    }
  }
  return true;
}

function getAltTitleUTF8(doc) {
  // Declared locally; the bare assignment would create an implicit global
  const altTitle = getAltTitle(doc);
  return isUTF8(altTitle) ? altTitle : "";
}

// --- Helpers ---

function isValidHttpUrl(string) {
  try {
    let url = new URL(string);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch (_) {
    return false;
  }
}

function logParsingError(variable) {
  console.error(`Parsing Error: Couldn't get ${variable}.`);
}

function safeReturn(result, name) {
  if (!result) logParsingError(name);
  return result || "";
}

function formatQuote(value, name) {
  if (!value) logParsingError(name);
  return value ? `"${value.replace(/, /g, '", "')}"` : "";
}

function formatList(value, name) {
  if (!value) logParsingError(name);
  return value ? `\n- ${value.replace(/, /g, "\n- ")}` : "";
}

function formatLink(value, name) {
  if (!value) logParsingError(name);
  return value ? `[[${value.replace(/, /g, "]], [[")}]]` : "";
}

module.exports = letterboxd;

================================================
FILE: scripts/odysee.js
================================================
// source: https://github.com/basilioss/obsidian-scrapers

async function odysee(value, tp, doc) {
  let url = await tp.system.clipboard();

  if (!isValidHttpUrl(url)) {
    console.error("Invalid URL for " + value);
    return "";
  }

  if (doc === undefined) {
    let page = await tp.obsidian.request({ url });
    let p = new DOMParser();
    doc = p.parseFromString(page, "text/html");
  }

  // A missing or malformed JSON-LD block leaves json as null,
  // so every case below falls back to an empty string.
  let json = null;
  try {
    json = JSON.parse(
      doc.querySelector("script[type='application/ld+json']")?.innerHTML ?? ""
    );
  } catch (err) {
    console.warn("Warning: Failed to parse JSON-LD metadata.");
  }

  switch (value) {
    case "title":
      // Get title from JSON. If undefined, return empty string
      let title = json?.name || "";
      return title
        .replace(/&amp;/g, "&")
        .replace(/&(?:#39|#x27|apos);/g, "'")
        .replace(/"/g, "”");
    case "description":
      let description = json?.description || "";
      return description
        .replace(/&amp;/g, "&")
        .replace(/&(?:#39|#x27|apos);/g, "'")
        .replace(/"/g, "”");
    case "thumbnail":
      return json?.thumbnailUrl || "";
    case "published":
      let published = json?.uploadDate || "";
      return published.substring(0, 10);
    case "duration":
      let duration = json?.duration || "";
      return duration
        .replace(/PT/, "")
        .replace(/H/, "h ")
        .replace(/M/, "m ")
        .replace(/S/, "s");
    case "url":
      return json?.url || "";
    case "contentUrl":
      return json?.contentUrl || "";
    case "embedUrl":
      return json?.embedUrl || "";
    case "channel":
      return json?.author?.name || "";
    case "keywords":
      let keywords = json?.keywords || "";
      return keywords.replace(/,/g, ", ");
    case "keywordsQ":
    case "keywordsQuotes": // Quotes (long name as documented in the README)
      let keywordsQ = json?.keywords || "";
      return keywordsQ ? '"' + keywordsQ.replace(/,/g, '", "') + '"' : "";
    case "keywordsL":
    case "keywordsList": // List
      let keywordsL = json?.keywords || "";
      return keywordsL ? "\n- " + keywordsL.replace(/,/g, "\n- ") : "";
    case "keywordsW":
    case "keywordsLinks": // Wiki links
      let keywordsW = json?.keywords || "";
      return keywordsW ? "[[" + keywordsW.replace(/,/g, "]], [[") + "]]" : "";
    default:
      new Notice("Incorrect parameter: " + value, 5000);
      return "";
  }
}

function isValidHttpUrl(string) {
  let url;
  try {
    url = new URL(string);
  } catch (_) {
    return false;
  }
  return url.protocol === "http:" || url.protocol === "https:";
}

module.exports = odysee;

================================================
FILE: scripts/website.js
================================================
// source: https://github.com/basilioss/obsidian-scrapers

async function website(value, tp, doc) {
  let url = await tp.system.clipboard();

  if (!isValidHttpUrl(url)) {
    console.error("Invalid URL for " + value);
    return "";
  }

  if (doc === undefined) {
    let page = await tp.obsidian.request(url);
    let p = new DOMParser();
    doc = p.parseFromString(page, "text/html");
  }

  // Alias for querySelector
  let $ = (s) =>
    doc.querySelector(s);

  switch (value) {
    case "url":
      return url.trim();
    case "title":
      return (
        $("meta[property='title']")?.content ||
        $("meta[property='og:title']")?.content ||
        $("meta[name='twitter:title']")?.content ||
        $("title")?.textContent.trim() ||
        ""
      );
    case "description":
      let description =
        $("meta[property='og:description']")?.content ||
        $("meta[name='description']")?.content ||
        $("meta[name='twitter:description']")?.content ||
        "";
      // Decode apostrophe entities (decimal and hex forms)
      description = description
        .replace(/&#0*39;/g, "'")
        .replace(/&#x27;/gi, "'")
        .trim();
      return description
        .replace(/&amp;/g, "&")
        .replace(/&quot;/g, '"')
        .replace(/&nbsp;/g, " ");
    case "image":
      let image =
        $("meta[property='og:image']")?.content ||
        $("meta[name='twitter:image']")?.content ||
        $("meta[name='twitter:image:src']")?.content ||
        "";
      // Remove the query string
      return image.replace(/\?.*$/g, "");
    default:
      new Notice("Incorrect parameter: " + value, 5000);
      return "";
  }
}

function isValidHttpUrl(string) {
  let url;
  try {
    url = new URL(string);
  } catch (_) {
    return false;
  }
  return url.protocol === "http:" || url.protocol === "https:";
}

module.exports = website;

================================================
FILE: scripts/wikipedia.js
================================================
// source: https://github.com/basilioss/obsidian-scrapers

async function wikipedia(value, tp, doc) {
  let url = await tp.system.clipboard();

  if (!isValidHttpUrl(url)) {
    console.error("Invalid URL for " + value);
    return "";
  }

  if (doc === undefined) {
    let page = await tp.obsidian.request({ url });
    let p = new DOMParser();
    doc = p.parseFromString(page, "text/html");
  }

  // Stays null when the JSON-LD block is missing or malformed,
  // so every case below falls back to an empty string.
  let json = null;
  try {
    json = JSON.parse(
      doc.querySelector("script[type='application/ld+json']").innerHTML
    );
  } catch (error) {
    new Notice("Failed to parse JSON-LD metadata: " + error, 5000);
  }

  switch (value) {
    case "title":
      return json?.name || "";
    case "url":
      return json?.url || "";
    case "image":
      return json?.image || "";
    case "headline": // Short description
      return json?.headline || "";
    default:
      new Notice("Incorrect parameter: " + value, 5000);
      return "";
  }
}

function
isValidHttpUrl(string) {
  let url;

  try {
    url = new URL(string);
  } catch (_) {
    return false;
  }

  return url.protocol === "http:" || url.protocol === "https:";
}

module.exports = wikipedia;

================================================
FILE: scripts/youtube.js
================================================
// source: https://github.com/basilioss/obsidian-scrapers

async function youtube(value, tp, doc) {
  let url = await tp.system.clipboard();

  if (!isValidHttpUrl(url)) {
    console.error("Invalid URL for " + value);
    return "";
  }

  if (doc === undefined) {
    // Alternative front-end (invidious.io)
    let altDomain = "yewtu.be";
    if (url.includes(altDomain)) {
      var regex = new RegExp(altDomain, "g");
      url = url.replace(regex, "youtube.com");
    }

    let page = await tp.obsidian.request({ url });
    let p = new DOMParser();
    doc = p.parseFromString(page, "text/html");
  }

  const $ = (selector) => doc.querySelector(selector);

  switch (value) {
    case "title":
      return safeReturn(getTitle(doc), "title");
    case "channel":
      return safeReturn(getChannel(doc), "channel");
    case "published":
      return safeReturn(getPublished(doc), "published");
    case "url":
      return safeReturn(getShortUrl(doc), "url");
    case "thumbnail":
      return safeReturn(getThumbnail(doc), "thumbnail");
    case "keywords":
      return safeReturn(getKeywords(doc), "keywords");
    case "keywordsQ":
    case "keywordsQuotes":
      return formatQuote(getKeywords(doc), "keywords");
    case "keywordsL":
    case "keywordsList":
      return formatList(getKeywords(doc), "keywords");
    case "keywordsW":
    case "keywordsLinks":
      return formatLink(getKeywords(doc), "keywords");
    case "duration":
      return safeReturn(getDuration(doc), "duration");
    case "description":
      return safeReturn(getDescription(doc), "description");
    case "descriptionFull":
      return safeReturn(getDescriptionFull(doc), "descriptionFull");
    case "id":
      return safeReturn(getId(doc), "id");
    default:
      new Notice("Incorrect parameter: " + value, 5000);
      return "";
  }
}

// --- Data extractors ---

function getTitle(doc) {
  const title =
    doc.querySelector("meta[property='og:title']")?.content || "";
  return title.replace(/"/g, "'");
}

function getChannel(doc) {
  return doc.querySelector("link[itemprop='name']")?.getAttribute("content") || "";
}

function getPublished(doc) {
  return doc.querySelector("meta[itemprop='uploadDate']")?.content || "";
}

function getShortUrl(doc) {
  return doc.querySelector("link[rel='shortLinkUrl']")?.href || "";
}

function getThumbnail(doc) {
  const shortUrl = getShortUrl(doc);
  return shortUrl
    ? shortUrl.replace(/youtu\.be/, "img.youtube.com/vi") + "/maxresdefault.jpg"
    : "";
}

function getKeywords(doc) {
  return doc.querySelector("meta[name='keywords']")?.content || "";
}

function getDuration(doc) {
  let duration = doc.querySelector("meta[itemprop='duration']")?.content || "";
  if (duration.startsWith("PT")) duration = duration.slice(2);
  return duration.replace(/M/gi, "m ").replace(/S/gi, "s");
}

function getDescription(doc) {
  const desc = doc.querySelector("meta[itemprop='description']")?.content || "";
  return desc.replace(/"/g, "'");
}

function getDescriptionFull(doc) {
  const html = new XMLSerializer().serializeToString(doc);
  const match = html.match(/"shortDescription":"(.*?)","isCrawlable":/);
  if (!match) return "";
  return match[1]
    .replace(/\\u0026/g, "&")
    .replace(/\\n/g, "\n")
    .replace(/\\r/g, "")
    .replace(/\\"/g, '"');
}

function getId(doc) {
  return doc.querySelector("meta[itemprop='identifier']")?.content || "";
}

// --- Helpers ---

function safeReturn(result, name) {
  if (!result) logParsingError(name);
  return result || "";
}

function formatQuote(value, name) {
  if (!value) logParsingError(name);
  return value ? `"${value.replace(/, /g, '", "')}"` : "";
}

function formatList(value, name) {
  if (!value) logParsingError(name);
  return value ? `\n- ${value.replace(/, /g, "\n- ")}` : "";
}

function formatLink(value, name) {
  if (!value) logParsingError(name);
  return value ?
    `[[${value.replace(/, /g, "]], [[")}]]` : "";
}

function isValidHttpUrl(string) {
  try {
    let url = new URL(string);
    return url.protocol === "http:" || url.protocol === "https:";
  } catch (_) {
    return false;
  }
}

function logParsingError(variable) {
  console.error(`Parsing Error: Couldn't get ${variable}.`);
}

module.exports = youtube;

================================================
FILE: templates/goodreads.md
================================================
---
<%*
// Request a web page to speed up execution time
let page = await tp.obsidian.request(await tp.system.clipboard())
let doc = new DOMParser().parseFromString(page,"text/html")
let title = await tp.user.goodreads('title', tp, doc)
-%>
url: "<% tp.user.goodreads('url', tp, doc) %>"
isbn: <% tp.user.goodreads('isbn', tp, doc) %>
published: <% tp.user.goodreads('published', tp, doc) %>
pages: <% tp.user.goodreads('pageCount', tp, doc) %>
ratings: <% tp.user.goodreads('rating', tp, doc) %>
authors: [<% tp.user.goodreads('authorsQuotes', tp) %>]
genres: [<% tp.user.goodreads('genresQuotes', tp, doc) %>]
---

# <% title %>

![](<% tp.user.goodreads('cover', tp, doc) %>)

## Description

<% tp.user.goodreads('description', tp, doc) %>

## Authors

<% tp.user.goodreads('authors', tp) %>

Links: <% tp.user.goodreads('authorsLinks', tp) %>

List: <% tp.user.goodreads('authorsList', tp) %>

## Genres

<% tp.user.goodreads('genres', tp) %>

Links: <% tp.user.goodreads('genresLinks', tp) %>

List: <% tp.user.goodreads('genresList', tp) %>

<%*
let filename = title
// Remove prohibited characters
filename = filename.replace(/[/\:*?<>|""]/g, "")
// Rename a note
await tp.file.move(filename)
-%>

================================================
FILE: templates/imdb.md
================================================
---
<%*
// Request a web page to speed up execution time
let page = await tp.obsidian.request(await tp.system.clipboard())
let doc = new DOMParser().parseFromString(page,"text/html")
let title = await tp.user.imdb('title',
  tp, doc)
-%>
url: "<% tp.user.imdb('url', tp, doc) %>"
imdb-rating: <% tp.user.imdb('imdbRating', tp, doc) %>
content-rating: <% tp.user.imdb('contentRating', tp, doc) %>
duration: <% tp.user.imdb('duration', tp, doc) %>
year: <% tp.user.imdb('published', tp, doc) %>
type: <% tp.user.imdb('type', tp, doc) %>
genres: [<% tp.user.imdb('genresQuotes', tp, doc) %>]
keywords: [<% tp.user.imdb('keywordsQuotes', tp, doc) %>]
directors: [<% tp.user.imdb('directorsQuotes', tp, doc) %>]
creators: [<% tp.user.imdb('creatorsQuotes', tp, doc) %>]
cast: [<% tp.user.imdb('starsQuotes', tp, doc) %>]
countries: [<% tp.user.imdb('countriesQuotes', tp, doc) %>]
---

# <% title %>

## Image

![](<% tp.user.imdb('image', tp, doc) %>)

## Description

<% tp.user.imdb('description', tp, doc) %>

## Genres

- <% tp.user.imdb('genres', tp, doc) %>
- Links: <% tp.user.imdb('genresLinks', tp, doc) %>

List: <% tp.user.imdb('genresList', tp, doc) %>

## Keywords

- <% tp.user.imdb('keywords', tp, doc) %>
- Links: <% tp.user.imdb('keywordsLinks', tp, doc) %>

List: <% tp.user.imdb('keywordsList', tp, doc) %>

## Directors

- <% tp.user.imdb('directors', tp, doc) %>
- Links: <% tp.user.imdb('directorsLinks', tp, doc) %>

List: <% tp.user.imdb('directorsList', tp, doc) %>

## Creators

- <% tp.user.imdb('creators', tp, doc) %>
- Links: <% tp.user.imdb('creatorsLinks', tp, doc) %>

List: <% tp.user.imdb('creatorsList', tp, doc) %>

## Countries

- <% tp.user.imdb('countries', tp, doc) %>
- Links: <% tp.user.imdb('countriesLinks', tp, doc) %>

List: <% tp.user.imdb('countriesList', tp, doc) %>

## Cast

- <% tp.user.imdb('stars', tp, doc) %>
- Links: <% tp.user.imdb('starsLinks', tp, doc) %>

List: <% tp.user.imdb('starsList', tp, doc) %>

<%*
let filename = title
// Remove prohibited characters
filename = filename.replace(/[/\:*?<>|""]/g, "")
// Rename a note
await tp.file.move(filename)
-%>

================================================
FILE: templates/letterboxd.md
================================================
---
<%*
// Request a web page to speed up execution time
let page = await tp.obsidian.request(await tp.system.clipboard())
let doc = new DOMParser().parseFromString(page,"text/html")
let title = await tp.user.letterboxd('title', tp, doc)
let altTitle = await tp.user.letterboxd('altTitle', tp, doc)
-%>
aliases: <%* altTitle == "" ? tR += '["' + title + '"]' : tR += '["' + title + '", "' + altTitle + '"]' %>
url: "<% tp.user.letterboxd('url', tp, doc) %>"
imdb-url: <% tp.user.letterboxd('imdbUrl', tp, doc) %>
tmdb-url: <% tp.user.letterboxd('tmdbUrl', tp, doc) %>
rating: <% tp.user.letterboxd('rating', tp, doc) %>
runtime: <% tp.user.letterboxd('runtime', tp, doc) %>
year: <% tp.user.letterboxd('published', tp, doc) %>
genres: [<% tp.user.letterboxd('genresQuotes', tp, doc) %>]
directors: [<% tp.user.letterboxd('directorsQuotes', tp, doc) %>]
studios: [<% tp.user.letterboxd('studiosQuotes', tp, doc) %>]
countries: [<% tp.user.letterboxd('countriesQuotes', tp, doc) %>]
languages: [<% tp.user.letterboxd('languagesQuotes', tp, doc) %>]
writers: [<% tp.user.letterboxd('writersQuotes', tp, doc) %>]
cast: [<% tp.user.letterboxd('castShortQuotes', tp, doc) %>]
---

# <% tp.user.letterboxd('title', tp, doc) %>

## Image

![](<% tp.user.letterboxd('image', tp, doc) %>)

## Description

<% tp.user.letterboxd('description', tp, doc) %>

## Genres

<% tp.user.letterboxd('genres', tp, doc) %>

Links: <% tp.user.letterboxd('genresLinks', tp, doc) %>

List: <% tp.user.letterboxd('genresL', tp, doc) %>

## Directors

<% tp.user.letterboxd('directors', tp, doc) %>

Links: <% tp.user.letterboxd('directorsLinks', tp, doc) %>

List: <% tp.user.letterboxd('directorsList', tp, doc) %>

## Studios

<% tp.user.letterboxd('studios', tp, doc) %>

Links: <% tp.user.letterboxd('studiosLinks', tp, doc) %>

List: <% tp.user.letterboxd('studiosList', tp, doc) %>

## Countries

<% tp.user.letterboxd('countries', tp, doc) %>

Links: <% tp.user.letterboxd('countriesLinks', tp, doc) %>

List: <%
 tp.user.letterboxd('countriesList', tp, doc) %>

## Languages

<% tp.user.letterboxd('languages', tp, doc) %>

Links: <% tp.user.letterboxd('languagesLinks', tp, doc) %>

List: <% tp.user.letterboxd('languagesList', tp, doc) %>

## Writers

<% tp.user.letterboxd('writers', tp, doc) %>

Links: <% tp.user.letterboxd('writersLinks', tp, doc) %>

List: <% tp.user.letterboxd('writersList', tp, doc) %>

## Cast (shortlist)

<% tp.user.letterboxd('castShort', tp, doc) %>

Links: <% tp.user.letterboxd('castShortLinks', tp, doc) %>

List: <% tp.user.letterboxd('castShortList', tp, doc) %>

## Cast

<% tp.user.letterboxd('cast', tp, doc) %>

Links: <% tp.user.letterboxd('castLinks', tp, doc) %>

List: <% tp.user.letterboxd('castList', tp, doc) %>

<%*
let filename = title
// Remove prohibited characters
filename = filename.replace(/[/\:*?<>|""]/g, "")
// Rename a note
await tp.file.move(filename)
-%>

================================================
FILE: templates/odysee.md
================================================
---
<%*
// Request a web page to speed up execution time
let page = await tp.obsidian.request(await tp.system.clipboard())
let doc = new DOMParser().parseFromString(page,"text/html")
let title = await tp.user.odysee('title', tp, doc)
-%>
channel: "<% tp.user.odysee('channel', tp, doc) %>"
published: <% tp.user.odysee('published', tp, doc) %>
url: "<% tp.user.odysee('url', tp, doc) %>"
content-url: "<% tp.user.odysee('contentUrl', tp, doc) %>"
duration: <% tp.user.odysee('duration', tp, doc) %>
keywords: [<% tp.user.odysee('keywordsQ', tp, doc) %>]
---

# <% title %>

## Thumbnail

![](<% tp.user.odysee('thumbnail', tp, doc) %>)

## Keywords

<% tp.user.odysee('keywords', tp, doc) %>

Links: <% tp.user.odysee('keywordsW', tp, doc) %>

List: <% tp.user.odysee('keywordsL', tp, doc) %>

## Description

<% tp.user.odysee('description', tp, doc) %>

<%*
let filename = title
// Remove prohibited characters
filename = filename.replace(/[/\:*?<>|""]/g, "")
// Rename a note
await
tp.file.move(filename)
-%>

================================================
FILE: templates/scraper.md
================================================
<%*
let clipboard = await tp.system.clipboard();
clipboard = clipboard.trim(); // remove whitespace from both ends

let urlExpression = /^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)?/gmi;
let urlRegex = new RegExp(urlExpression);

if (clipboard.includes("/www.youtube.com/")) {
  // Insert YouTube template
  tR += await tp.file.include("[[youtube]]");
} else if (clipboard.includes("/youtu.be/")) {
  tR += await tp.file.include("[[youtube]]");
} else if (clipboard.includes("/yewtu.be/")) {
  // Alternative YouTube front-end (invidious.io)
  tR += await tp.file.include("[[youtube]]");
} else if (clipboard.includes("/www.goodreads.com/")) {
  tR += await tp.file.include("[[goodreads]]");
} else if (clipboard.includes("/www.imdb.com/")) {
  tR += await tp.file.include("[[imdb]]");
} else if (clipboard.includes("/letterboxd.com/")) {
  tR += await tp.file.include("[[letterboxd]]");
} else if (clipboard.includes("wikipedia.org/")) {
  tR += await tp.file.include("[[wikipedia]]");
} else if (clipboard.includes("/odysee.com/")) {
  tR += await tp.file.include("[[odysee]]");
} else if (clipboard.match(urlRegex)) {
  tR += await tp.file.include("[[website]]");
} else {
  new Notice("No link in the clipboard");
}
%>

================================================
FILE: templates/website.md
================================================
[<% tp.user.website('title', tp) %>](<% tp.user.website('url', tp) %>)

![](<% tp.user.website('image', tp) %>)

<% tp.user.website('description', tp) %>

================================================
FILE: templates/wikipedia.md
================================================
[<% tp.user.wikipedia('title', tp) %>](<% tp.user.wikipedia('url', tp) %>)

![](<% tp.user.wikipedia('image', tp) %>)

<% tp.user.wikipedia('headline', tp) %>
================================================
FILE: templates/youtube.md
================================================
---
<%*
// Request a web page to speed up execution time
let page = await tp.obsidian.request(await tp.system.clipboard())
let doc = new DOMParser().parseFromString(page,"text/html")
let title = await tp.user.youtube('title', tp, doc)
-%>
channel: "<% tp.user.youtube('channel', tp, doc) %>"
published: <% tp.user.youtube('published', tp, doc) %>
url: "<% tp.user.youtube('url', tp, doc) %>"
duration: <% tp.user.youtube('duration', tp, doc) %>
id: <% tp.user.youtube('id', tp, doc) %>
keywords: [<% tp.user.youtube('keywordsQuotes', tp, doc) %>]
---

# <% title %>

## Thumbnail

![](<% tp.user.youtube('thumbnail', tp, doc) %>)

## Keywords

<% tp.user.youtube('keywords', tp, doc) %>

Links: <% tp.user.youtube('keywordsLinks', tp, doc) %>

List: <% tp.user.youtube('keywordsList', tp, doc) %>

## Description

<% tp.user.youtube('description', tp, doc) %>

## Full description

<% tp.user.youtube('descriptionFull', tp, doc) %>

<%*
let filename = title
// Remove prohibited characters
filename = filename.replace(/[/\:*?<>|""]/g, "")
// Rename a note
await tp.file.move(filename)
-%>