[
  {
    "path": ".gitattributes",
    "content": "# convert to OS line endings on checkout, back to LF on commit\n* text=auto\n\n# ensure anything copied to the container has unix style line endings\n*.sh text eol=lf\nrequirements.txt text eol=lf"
  },
  {
    "path": ".gitignore",
    "content": "__pycache__\n.mypy_cache/\nmodels/\n"
  },
  {
    "path": "CONTRIBUTORS.md",
    "content": "# Contributors (alphabetically)\n\n* **[madisonmay](https://github.com/madisonmay)**\n\n  Added Dockerfiles\n\n* **[Margaret Mitchell et al](https://arxiv.org/abs/1810.03993)**\n\n  Our [usage](./README.md#usage) writeup was loosely inspired by the paper\n  [Model Cards for Model Reporting](https://arxiv.org/abs/1810.03993)\n  and related conversations with some of the authors.\n\n* **[webproduktion01](https://github.com/webproduktion01)**\n\n  Ported download script to python.\n\n**[Full code contributors list](https://github.com/openai/gpt-2/contributors).**\n"
  },
  {
    "path": "DEVELOPERS.md",
    "content": "# Installation\n\nGit clone this repository, and `cd` into directory for remaining commands\n```\ngit clone https://github.com/openai/gpt-2.git && cd gpt-2\n```\n\nThen, follow instructions for either native or Docker installation.\n\n## Native Installation\n\nAll steps can optionally be done in a virtual environment using tools such as `virtualenv` or `conda`.\n\nInstall tensorflow 1.12 (with GPU support, if you have a GPU and want everything to run faster)\n```\npip3 install tensorflow==1.12.0\n```\nor\n```\npip3 install tensorflow-gpu==1.12.0\n```\n\nInstall other python packages:\n```\npip3 install -r requirements.txt\n```\n\nDownload the model data\n```\npython3 download_model.py 124M\npython3 download_model.py 355M\npython3 download_model.py 774M\npython3 download_model.py 1558M\n```\n\n## Docker Installation\n\nBuild the Dockerfile and tag the created image as `gpt-2`:\n```\ndocker build --tag gpt-2 -f Dockerfile.gpu . # or Dockerfile.cpu\n```\n\nStart an interactive bash session from the `gpt-2` docker image.\n\nYou can opt to use the `--runtime=nvidia` flag if you have access to a NVIDIA GPU\nand a valid install of [nvidia-docker 2.0](https://github.com/nvidia/nvidia-docker/wiki/Installation-(version-2.0)).\n```\ndocker run --runtime=nvidia -it gpt-2 bash\n```\n\n# Running\n\n| WARNING: Samples are unfiltered and may contain offensive content. |\n| --- |\n\nSome of the examples below may include Unicode text characters. 
Set the environment variable:\n```\nexport PYTHONIOENCODING=UTF-8\n```\nto override the standard stream settings in UTF-8 mode.\n\n## Unconditional sample generation\n\nTo generate unconditional samples from the small model:\n```\npython3 src/generate_unconditional_samples.py | tee /tmp/samples\n```\nThere are various flags for controlling the samples:\n```\npython3 src/generate_unconditional_samples.py --top_k 40 --temperature 0.7 | tee /tmp/samples\n```\n\nTo check flag descriptions, use:\n```\npython3 src/generate_unconditional_samples.py -- --help\n```\n\n## Conditional sample generation\n\nTo give the model custom prompts, you can use:\n```\npython3 src/interactive_conditional_samples.py --top_k 40\n```\n\nTo check flag descriptions, use:\n```\npython3 src/interactive_conditional_samples.py -- --help\n```\n"
  },
  {
    "path": "Dockerfile.cpu",
    "content": "FROM tensorflow/tensorflow:1.12.0-py3\n\nENV LANG=C.UTF-8\nRUN mkdir /gpt-2\nWORKDIR /gpt-2\nADD . /gpt-2\nRUN pip3 install -r requirements.txt\nRUN python3 download_model.py 124M\nRUN python3 download_model.py 355M\nRUN python3 download_model.py 774M\nRUN python3 download_model.py 1558M\n"
  },
  {
    "path": "Dockerfile.gpu",
    "content": "FROM tensorflow/tensorflow:1.12.0-gpu-py3\n\n# nvidia-docker 1.0\nLABEL com.nvidia.volumes.needed=\"nvidia_driver\"\nLABEL com.nvidia.cuda.version=\"${CUDA_VERSION}\"\n\n# nvidia-container-runtime\nENV NVIDIA_VISIBLE_DEVICES=all \\\n    NVIDIA_DRIVER_CAPABILITIES=compute,utility \\\n    NVIDIA_REQUIRE_CUDA=\"cuda>=8.0\" \\\n    LANG=C.UTF-8\n\nRUN mkdir /gpt-2\nWORKDIR /gpt-2\nADD . /gpt-2\nRUN pip3 install -r requirements.txt\nRUN python3 download_model.py 124M\nRUN python3 download_model.py 355M\nRUN python3 download_model.py 774M\nRUN python3 download_model.py 1558M\n"
  },
  {
    "path": "LICENSE",
    "content": "Modified MIT License\n\nSoftware Copyright (c) 2019 OpenAI\n\nWe don’t claim ownership of the content you create with GPT-2, so it is yours to do with as you please.\nWe only ask that you use GPT-2 responsibly and clearly indicate your content was created using GPT-2.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy of this software and\nassociated documentation files (the \"Software\"), to deal in the Software without restriction,\nincluding without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense,\nand/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so,\nsubject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included\nin all copies or substantial portions of the Software.\nThe above copyright notice and this permission notice need not be included\nwith content created by the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,\nINCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS\nBE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,\nTORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE\nOR OTHER DEALINGS IN THE SOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "**Status:** Archive (code is provided as-is, no updates expected)\n\n# gpt-2\n\nCode and models from the paper [\"Language Models are Unsupervised Multitask Learners\"](https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf).\n\nYou can read about GPT-2 and its staged release in our [original blog post](https://openai.com/research/better-language-models/), [6 month follow-up post](https://openai.com/blog/gpt-2-6-month-follow-up/), and [final post](https://www.openai.com/blog/gpt-2-1-5b-release/).\n\nWe have also [released a dataset](https://github.com/openai/gpt-2-output-dataset) for researchers to study their behaviors.\n\n<sup>*</sup> *Note that our original parameter counts were wrong due to an error (in our previous blog posts and paper).  Thus you may have seen small referred to as 117M and medium referred to as 345M.*\n\n## Usage\n\nThis repository is meant to be a starting point for researchers and engineers to experiment with GPT-2.\n\nFor basic information, see our [model card](./model_card.md).\n\n### Some caveats\n\n- GPT-2 models' robustness and worst case behaviors are not well-understood.  As with any machine-learned model, carefully evaluate GPT-2 for your use case, especially if used without fine-tuning or in safety-critical applications where reliability is important.\n- The dataset our GPT-2 models were trained on contains many texts with [biases](https://twitter.com/TomerUllman/status/1101485289720242177) and factual inaccuracies, and thus GPT-2 models are likely to be biased and inaccurate as well.\n- To avoid having samples mistaken as human-written, we recommend clearly labeling samples as synthetic before wide dissemination.  Our models are often incoherent or inaccurate in subtle ways, which takes more than a quick read for a human to notice.\n\n### Work with us\n\nPlease [let us know](mailto:languagequestions@openai.com) if you’re doing interesting research with or working on applications of GPT-2!  
We’re especially interested in hearing from and potentially working with those who are studying\n- Potential malicious use cases and defenses against them (e.g. the detectability of synthetic text)\n- The extent of problematic content (e.g. bias) being baked into the models and effective mitigations\n\n## Development\n\nSee [DEVELOPERS.md](./DEVELOPERS.md)\n\n## Contributors\n\nSee [CONTRIBUTORS.md](./CONTRIBUTORS.md)\n\n## Citation\n\nPlease use the following bibtex entry:\n```\n@article{radford2019language,\n  title={Language Models are Unsupervised Multitask Learners},\n  author={Radford, Alec and Wu, Jeff and Child, Rewon and Luan, David and Amodei, Dario and Sutskever, Ilya},\n  year={2019}\n}\n```\n\n## Future work\n\nWe may release code for evaluating the models on various benchmarks.\n\nWe are still considering release of the larger models.\n\n## License\n\n[Modified MIT](./LICENSE)\n"
  },
  {
    "path": "domains.txt",
    "content": "1542261 google\n596207 archive\n456344 blogspot\n414695 github\n333160 nytimes\n321622 wordpress\n315368 washingtonpost\n313137 wikia\n311917 bbc\n246303 theguardian\n210714 ebay\n209416 pastebin\n199360 cnn\n196124 yahoo\n186668 huffingtonpost\n186137 go\n183592 reuters\n183080 imdb\n160553 goo\n139965 nih\n135562 cbc\n128011 apple\n125615 medium\n118676 dailymail\n108012 steampowered\n106417 independent\n105239 etsy\n98941 craigslist\n93048 businessinsider\n92712 telegraph\n90262 wizards\n83266 usatoday\n80384 thehill\n79655 nhl\n79494 foxnews\n79167 taobao\n78070 bloomberg\n77515 npr\n77407 mlb\n77172 latimes\n75676 megalodon\n72525 espn\n72523 kickstarter\n71743 breitbart\n69334 abc\n68009 newegg\n67008 wwe\n66278 myanimelist\n65520 microsoft\n64723 buzzfeed\n63162 vice\n62911 indiatimes\n61845 forbes\n61772 tappedout\n60889 wsj\n60240 vid\n60239 battle\n59996 adf\n58706 politico\n58345 redditgifts\n56769 nexusmods\n56469 goodreads\n54866 magiccards\n53973 nbcnews\n53060 gamepedia\n52110 mediafire\n50567 time\n50144 cbsnews\n49203 ppy\n48442 gstatic\n48042 nfl\n47460 steamusercontent\n47046 thestar\n46603 bugguide\n46340 fanfiction\n45505 mturk\n45458 cbslocal\n44729 theglobeandmail\n44134 nydailynews\n42992 theatlantic\n42941 netflix\n42328 theverge\n41952 smh\n40694 nbcsports\n40613 cnbc\n40469 slate\n40071 ign\n39655 dotabuff\n38968 wired\n38779 chicagotribune\n38590 urbandictionary\n38575 rt\n38092 wuxiaworld\n38065 wowhead\n37954 wolframalpha\n37749 guardian\n37594 xboxdvr\n36841 nypost\n36741 ravelry\n36321 thedailybeast\n36298 nba\n36188 yelp\n36008 arstechnica\n35485 csgo\n35365 flic\n35269 stackexchange\n35124 vidble\n35024 googleusercontent\n34311 msn\n34121 gizmodo\n34120 boardgamegeek\n33867 aljazeera\n33598 rawstory\n33516 scryfall\n33467 bleacherreport\n33419 bit\n33395 thinkprogress\n33170 dailycaller\n32843 ap\n32433 fangraphs\n31742 salon\n31728 mirror\n31496 nintendo\n31294 nationalpost\n31278 nasa\n31110 oddshot\n31057 
hltv\n30952 amzn\n30877 quora\n30586 engadget\n30397 stackoverflow\n30201 aliexpress\n29710 cnet\n28850 leagueoflegends\n28822 surveymonkey\n28704 ctvnews\n28650 walmart\n28644 plays\n28536 sfgate\n28375 cbssports\n28210 globo\n27992 discogs\n27630 wiktionary\n27588 ibb\n27544 stuff\n27349 nature\n27112 news\n27020 biblegateway\n26801 subtletv\n26427 change\n26355 zippyshare\n26311 guildwars2\n26231 vox\n26205 zkillboard\n26174 techcrunch\n25993 economist\n25964 globalnews\n25621 washingtontimes\n25610 hollywoodreporter\n25351 archiveofourown\n25336 ibtimes\n25257 newsweek\n25139 zerohedge\n25074 fav\n25050 sciencedirect\n24894 bestbuy\n24870 spiegel\n24869 247sports\n24866 smmry\n24764 xda-developers\n24726 tvtropes\n24698 phys\n24663 teamliquid\n24619 state\n23953 gleam\n23676 sbnation\n23644 asahi\n23620 foxsports\n23240 ndtv\n23189 si\n23183 alternet\n23009 redbubble\n22846 metro\n22845 theonion\n22835 playstation\n22808 washingtonexaminer\n22682 thehindu\n22557 espncricinfo\n22482 mozilla\n22219 op\n22038 t\n21984 nj\n21921 indianexpress\n21707 apnews\n21603 dw\n21422 nationalgeographic\n21399 pinterest\n21368 ft\n21319 wiley\n21254 about\n21074 skysports\n21033 gamespot\n21014 dailykos\n21009 goal\n20858 patheos\n20842 irishtimes\n20664 variety\n20592 kotaku\n20584 mashable\n20575 scientificamerican\n20448 basketball-reference\n20262 yle\n20218 theage\n20176 usnews\n20133 animenewsnetwork\n20092 livejournal\n20068\n20024 pbs\n19802 nhk\n19741 newyorker\n19727 seattletimes\n19672 mlssoccer\n19619 meetup\n19543 nzherald\n19509 philly\n19496 uol\n19470 patreon\n19429 wikileaks\n19400 gravitytales\n19294 oregonlive\n19267 xbox\n19216 linkedin\n19202 crunchyroll\n19045 target\n19021 ew\n18922 redditpoll\n18875 homedepot\n18867 qz\n18865 donmai\n18653 baseball-reference\n18646 talkingpointsmemo\n18576 pathofexile\n18536 makeameme\n18489 postimg\n18308 clyp\n18175 scribd\n18120 thegatewaypundit\n18097 removeddit\n18063 deadspin\n18049 sciencedaily\n18019 
huffpost\n17987 dallasnews\n17956 europa\n17878 merriam-webster\n17816 haaretz\n17746 deadline\n17637 msnbc\n17579 hindustantimes\n17531 nymag\n17429 gph\n17208 typepad\n17204 express\n17098 naver\n17085 bizjournals\n17084 mlive\n16834 rollingstone\n16793 motherjones\n16704 okcupid\n16441 tinyurl\n16410 espnfc\n16397 bostonglobe\n16374 thingiverse\n16351 denverpost\n16332 bitcointalk\n16256 timesofisrael\n16209 xnxx\n16202 wikihow\n16051 neopets\n16043 indiegogo\n16033 al\n16032 chron\n16004 avclub\n15970 marketwatch\n15933 mercurynews\n15675 startribune\n15646 pro-football-reference\n15568 d20pfsrd\n15545 pcgamer\n15451 reason\n15422 uesp\n15356 lds\n15152 polygon\n15132 humblebundle\n14962 tradingview\n14931 baltimoresun\n14914 strava\n14912 firstpost\n14856 commondreams\n14801 sky\n14739 eventbrite\n14722 nicovideo\n14697 fortune\n14693 knowyourmeme\n14666 robertsspaceindustries\n14471 pitchfork\n14466 psychologytoday\n14435 combodeck\n14392 mixcloud\n14372 lemonde\n14290 sciencemag\n14060 jpost\n13926 miamiherald\n13902 patch\n13850 nationalreview\n13849 gofundme\n13798 thelocal\n13763 derpibooru\n13726 techdirt\n13658 townhall\n13596 mtg\n13588 gettyimages\n13530 mit\n13436 challonge\n13369 mediaite\n13357 tsn\n13350 pokemonshowdown\n13176 neogaf\n13130 publico\n13126 snopes\n13092 scmp\n13082 cleveland\n13044 thesun\n13025 mtggoldfish\n12994 freep\n12984 grailed\n12948 standard\n12923 theconversation\n12913 upi\n12870 bing\n12778 blockchain\n12774 people\n12771 arxiv\n12760 hearthpwn\n12668 reference\n12626 edhrec\n12611 sputniknews\n12551 nordstrom\n12550 lapresse\n12496 metacritic\n12447 last\n12395 ajc\n12355 mangadex\n12349 ycombinator\n12345 csmonitor\n12240 sportsnet\n12229 cornell\n12205 smithsonianmag\n12201 sephora\n12194 bulbagarden\n12181 japantimes\n12171 zdnet\n12152 comicbook\n12139 whitehouse\n12109 theregister\n12089 libsyn\n12052 asos\n12016 neatclip\n12001 imirhil\n12000 boston\n11973 behance\n11966 eveonline\n11954 androidpolice\n11935 
livescience\n11843 instructables\n11817 hs\n11788 infowars\n11712 ca\n11704 runescape\n11699 suntimes\n11697 eurogamer\n11654 roblox\n11622 genius\n11602 stltoday\n11499 elpais\n11494 motorsport\n11461 ceddit\n11426 france24\n11373 bungie\n11371 youtubedoubler\n11362 openload\n11348 jstor\n11328 thefreedictionary\n11307 inquisitr\n11215 nhentai\n11204 zeit\n11198 ikea\n11114 springer\n11108 tripadvisor\n11082 thescore\n11036 kerbalspaceprogram\n11007 cdc\n10995 dailywire\n10965 gawker\n10953 a\n10950 brooksbaseball\n10940 dn\n10927 sltrib\n10867 brickset\n10823 dictionary\n10821 squarespace\n10819 battlefield\n10807 harvard\n10786 afpbb\n10734 steemit\n10730 billboard\n10707 tampabay\n10654 nola\n10621 stanford\n10602 sbs\n10524 cc\n10520 dailydot\n10510 straitstimes\n10493 itch\n10490 foreignpolicy\n10465 vancouversun\n10440 rottentomatoes\n10419 dnainfo\n10389 digi24\n10348 dropboxusercontent\n10332 complex\n10330 scp-wiki\n10327 prnt\n10313 ottawacitizen\n10304 anandtech\n10269 thenation\n10253 fivethirtyeight\n10244 newscientist\n10240 svt\n10240 inquirer\n10236 coindesk\n10227 codepen\n10208 lichess\n10204 sankei\n10189 ted\n10181 roosterteeth\n10170 livemint\n10161 teamfortress\n10141 sourceforge\n10119 sapo\n10113 countle\n10086 mtv\n10075 sacbee\n10066 fimfiction\n10057 hentai-foundry\n10054 gamesplanet\n10044 io9\n10032 lifehacker\n10007 cracked\n9991 mainichi\n9984 itmedia\n9966 warthunder\n9936 nos\n9935 boingboing\n9925 vulture\n9904 lanacion\n9892 qualtrics\n9884 muthead\n9856 jcrew\n9814 jsonline\n9787 spacebattles\n9748 worldstarhiphop\n9734 jalopnik\n9721 welt\n9717 curbed\n9708 dbr\n9705 mmafighting\n9697 bigcartel\n9682 transfermarkt\n9680 vlive\n9659 vanityfair\n9658 dawn\n9621 dnaindia\n9601 theblaze\n9599 allrecipes\n9576 thejournal\n9572 dailystar\n9521 minecraftforum\n9505 theweek\n9502 kansascity\n9494 anilist\n9443 gog\n9420 bato\n9401 oxforddictionaries\n9400 soompi\n9394 sagepub\n9389 wikiwand\n9382 lolking\n9322 torontosun\n9319 
mangapanda\n9316 politifact\n9306 realclearpolitics\n9278 tagpro\n9261 webmd\n9206 app\n9202 hotnews\n9184 9news\n9174 bhphotovideo\n9147 giantbomb\n9132 gamestop\n9073 azcentral\n9053 noaa\n9040 repubblica\n9021 mangaupdates\n8998 space\n8998 researchgate\n8971 bitcoin\n8957 sueddeutsche\n8898 rightwingwatch\n8892 mediacru\n8890 afl\n8862 fasttech\n8858 tmz\n8841 orlandosentinel\n8832 tomshardware\n8828 altomfotball\n8822 mtgprice\n8821 haskell\n8816 discovery\n8810 destinytracker\n8808 massdrop\n8800 csgolounge\n8791 weather\n8778 daddyleagues\n8720 govtrack\n8678 mentalfloss\n8678 justice\n8663 frontier\n8655 youporn\n8641 paradoxplaza\n8640 rockstargames\n8632 derstandard\n8622 pinknews\n8619 macrumors\n8598 gamefaqs\n8587 thepiratebay\n8586 4chan\n8582 post-gazette\n8573 faz\n8563 e-hentai\n8530 jiji\n8525 quoracdn\n8519 fullmatchesandshows\n8516 sun-sentinel\n8513 xboxclips\n8488 financialpost\n8476 audible\n8439 investopedia\n8425 loc\n8418 venturebeat\n8414 amazonaws\n8368 ubi\n8345 etymonline\n8326 wsws\n8316 jezebel\n8300 americanthinker\n8284 wikidot\n8269 digitaltrends\n8260 nrk\n8232 weebly\n8228 thenextweb\n8225 snahp\n8223 gematsu\n8210 daum\n8206 ea\n8189 liverpoolecho\n8186 freebeacon\n8178 thetimes\n8168 naturalcrit\n8153 warframe\n8150 1drv\n8143 gap\n8131 seriouseats\n8119 myfigurecollection\n8109 gov\n8086 eporner\n8080 hulu\n8077 senate\n8046 esquire\n8015 gosugamers\n8000 radionz\n7997 eater\n7982 politicususa\n7978 rte\n7956 marvel\n7942 metronews\n7917 starcitygames\n7917 hotair\n7914 marca\n7872 eurekalert\n7840 screenrant\n7834 dota2\n7797 truth-out\n7784 dell\n7783 eldiario\n7782 pcworld\n7782 doi\n7780 comicbookresources\n7765 dr\n7729 howstuffworks\n7727 gocomics\n7715 worldoftanks\n7707 tandfonline\n7690 examiner\n7688 newrepublic\n7682 curseforge\n7680 findlaw\n7673 nikkei\n7665 heraldsun\n7652 podbean\n7645 aftonbladet\n7638 duckduckgo\n7633 ynetnews\n7629 timesofindia\n7628 freshphase\n7591 westeros\n7576 youjizz\n7574 
spectator\n7548 justia\n7537 antiwar\n7536 mmajunkie\n7516 yomiuri\n7485 newstatesman\n7481 greenmangaming\n7475 joystiq\n7444 jsfiddle\n7424 anime-planet\n7415 counterpunch\n7410 autosport\n7395 archlinux\n7384 berkeley\n7383 smbc-comics\n7374 rockpapershotgun\n7372 pjmedia\n7367 estadao\n7365 intoday\n7361 newsmax\n7346 newsbusters\n7337 grantland\n7329 voanews\n7292 myshopify\n7286 wnd\n7265 9to5mac\n7257 hurriyetdailynews\n7229 bleedingcool\n7225 indiewire\n7222 radio-canada\n7216 viewsync\n7211 cambridge\n7204 drsd\n7197 house\n7185 uproxx\n7152 mlbtraderumors\n7145 gamasutra\n7134 bricklink\n7122 foodnetwork\n7122 presstv\n7119 opensecrets\n7118 canada\n7116 bgr\n7097 democracynow\n7091 businessweek\n7085 smash\n7080 usda\n7078 cloudfront\n7044 psu\n7028 detroitnews\n7028 explosm\n7013 woobox\n7011 football-italia\n7005 academia\n6948 channelnewsasia\n6927 siliconera\n6923 rei\n6917 deseretnews\n6916 supload\n6914 mises\n6905 rotoworld\n6886 gsmarena\n6878 rappler\n6876 kijiji\n6866 metal-archives\n6826 theaustralian\n6823 mediamatters\n6823 wa\n6818 bodybuilding\n6811 memedad\n6803 ucsd\n6802 barnesandnoble\n6791 india\n6780 readability\n6777 today\n6726 indystar\n6720 scotsman\n6694 impress\n6689 torrentfreak\n6675 heise\n6668 sportingnews\n6658 pnas\n6650 chzbgr\n6650 milb\n6631 business-standard\n6630 bustle\n6623 square-enix\n6622 madison\n6615 moddb\n6613 uniqlo\n6599 zillow\n6577 tribune\n6556 airliners\n6552 svd\n6547 gameinformer\n6536 brisbanetimes\n6536 ocregister\n6533 swtor\n6526 calgaryherald\n6521 c-span\n6518 slashdot\n6505 belfasttelegraph\n6499 hiyo\n6494 news24\n6484 theintercept\n6479 technologyreview\n6455 gutenberg\n6449 cinemablend\n6438 dailytelegraph\n6424 globalresearch\n6411 lefigaro\n6405 tenor\n6381 redstate\n6374 aclu\n6361 bloodyelbow\n6357 axios\n6353 thewrap\n6349 redditmetrics\n6345 evike\n6339 aol\n6327 ulta\n6326 plos\n6324 periscope\n6312 drivethrurpg\n6308 infobae\n6300 debian\n6298 congress\n6289 warcraftlogs\n6284 
gothamist\n6281 mangastream\n6276 newgrounds\n6275 berniesanders\n6263 lolesports\n6262 mayoclinic\n6242 sfchronicle\n6235 edmontonjournal\n6200 dhgate\n6194 cincinnati\n6180 history\n6176 xtube\n6169 nike\n6160 kiji\n6147 tube8\n6140 vdare\n6133 unity3d\n6130 twincities\n6127 escapistmagazine\n6126 komonews\n6104 openneo\n6090 oup\n6082 dispatch\n6079 newsobserver\n6060 ballotpedia\n6058 indiegala\n6054 index\n6050 charlotteobserver\n6048 androidcentral\n6032 webtoons\n6028 tcgplayer\n6018 zappos\n6004 intel\n5998 seattlepi\n5996 profootballfocus\n5990 ksl\n5989 macleans\n5984 atlasobscura\n5981 yugiohprices\n5980 ubuntu\n5964 gq\n5952 myvidster\n5941 tv2\n5930 paizo\n5926 montrealgazette\n5919 al-monitor\n5919 herokuapp\n5918 volarenovels\n5909 usgs\n5906 nme\n5906 society6\n5905 vg247\n5902 popsci\n5895 lowes\n5893 thefederalist\n5878 amiami\n5862 nyti\n5848 steamdb\n5841 crooksandliars\n5833 popularmechanics\n5832 slashfilm\n5826 woot\n5818 ev\n5807 illinois\n5792 nps\n5791 destructoid\n5790 mysanantonio\n5772 sbtl\n5742 smashboards\n5700 biblehub\n5696 euronews\n5694 urbanoutfitters\n5687 itv\n5685 fastcompany\n5684 techpowerup\n5674 hearthhead\n5656 mic\n5649 autoblog\n5646 futbin\n5638 voat\n5636 statesman\n5626 zap2it\n5623 userbenchmark\n5623 legaliq\n5622 mspaintadventures\n5622 familysearch\n5616 themoscowtimes\n5606 theprovince\n5604 allkpop\n5594 Omegle\n5570 activistpost\n5565 thefreethoughtproject\n5565 in\n5559 sandiegouniontribune\n5556 consumerist\n5554 eff\n5532 lego\n5520 translationnations\n5515 clickhole\n5498 etherscan\n5491 live\n5486 vndb\n5484 poll-maker\n5481 mtgsalvation\n5481 computerworld\n5475 comicvine\n5470 python\n5469 digitalspy\n5468 citylab\n5458 expressen\n5455 oxfordjournals\n5451 collider\n5447 statista\n5437 apa\n5434 g\n5430 thenational\n5430 eslgaming\n5425 politiken\n5421 ktla\n5420 webmshare\n5408 bostonherald\n5407 comixology\n5400 ustream\n5399 sony\n5396 tennessean\n5377 scout\n5374 drop\n5372 ieee\n5359 
sverigesradio\n5356 sherdog\n5353 viooz\n5353 marxists\n5353 adobe\n5349 myfitnesspal\n5342 seahawks\n5339 rferl\n5338 thediplomat\n5335 storeparser\n5332 prnewswire\n5330 midwayusa\n5327 liverpoolfc\n5326 cisco\n5326 windowsphone\n5323 toysrus\n5321 archivesofnethys\n5317 eluniversal\n5309 gmanetwork\n5303 asus\n5297 android\n5297 finalfantasyxiv\n5296 cyclingnews\n5293 worldbank\n5288 boxingscene\n5285 ticketmaster\n5279 grooveshark\n5277 khl\n5276 gallup\n5268 britannica\n5263 abc7\n5260 penny-arcade\n5257 hsreplay\n5257 oculus\n5256 bt\n5250 theroot\n5246 makeagif\n5246 cnsnews\n5243 nbc\n5243 rbc\n5243 fextralife\n5234 legislation\n5225 sendvid\n5221 sciencealert\n5214 wbur\n5212 myfonts\n5207 picsarus\n5206 phoronix\n5204 nerdist\n5203 eonline\n5195 advocate\n5191 king5\n5189 xkcd\n5183 kitsu\n5182 weibo\n5181 mangareader\n5178 palmbeachpost\n5176 go1dfish\n5175 livestrong\n5174 truthdig\n5173 lgbtqnation\n5172 nikkansports\n5167 slickdeals\n5166 streamja\n5164 irs\n5158 readms\n5152 microcenter\n5137 telesurtv\n5135 lastwordonsports\n5129 alarabiya\n5117 cointelegraph\n5114 iltalehti\n5112 fc2\n5108 wral\n5108 thinkgeek\n5102 bitbucket\n5101 letterboxd\n5098 ehow\n5092 abc13\n5083 beeradvocate\n5077 umich\n5067 macys\n5064 factorio\n5063 comicbookmovie\n5042 telegram\n5039 scroll\n5034 setlist\n5028 dailyherald\n5019 games-workshop\n5015 irishexaminer\n5008 fbi\n5007 heraldscotland\n5001 jellyneo\n4999 yale\n4996 cbr\n4994 masslive\n4984 thestranger\n4982 bundlestars\n4981 alibaba\n4977 filedropper\n4974 monoprice\n4968 forward\n4964 parliament\n4960 theringer\n4950 hobbyking\n4950 manchestereveningnews\n4949 bmj\n4948 thewire\n4947 ff2ebook\n4938 ashemaletube\n4937 Twitch\n4933 sketchtoy\n4932 mcclatchydc\n4931 memory-alpha\n4925 newsok\n4911 desmoinesregister\n4901 puzzledragonx\n4889 memecrunch\n"
  },
  {
    "path": "download_model.py",
    "content": "import os\nimport sys\nimport requests\nfrom tqdm import tqdm\n\nif len(sys.argv) != 2:\n    print('You must enter the model name as a parameter, e.g.: download_model.py 124M')\n    sys.exit(1)\n\nmodel = sys.argv[1]\n\nsubdir = os.path.join('models', model)\nif not os.path.exists(subdir):\n    os.makedirs(subdir)\nsubdir = subdir.replace('\\\\','/') # needed for Windows\n\nfor filename in ['checkpoint','encoder.json','hparams.json','model.ckpt.data-00000-of-00001', 'model.ckpt.index', 'model.ckpt.meta', 'vocab.bpe']:\n\n    r = requests.get(\"https://openaipublic.blob.core.windows.net/gpt-2/\" + subdir + \"/\" + filename, stream=True)\n\n    with open(os.path.join(subdir, filename), 'wb') as f:\n        file_size = int(r.headers[\"content-length\"])\n        chunk_size = 1000\n        with tqdm(ncols=100, desc=\"Fetching \" + filename, total=file_size, unit_scale=True) as pbar:\n            # 1k for chunk_size, since Ethernet packet size is around 1500 bytes\n            for chunk in r.iter_content(chunk_size=chunk_size):\n                f.write(chunk)\n                pbar.update(chunk_size)\n"
  },
  {
    "path": "model_card.md",
    "content": "# GPT-2 model card\n\nLast updated: November 2019\n\nInspired by [Model Cards for Model Reporting (Mitchell et al.)](https://arxiv.org/abs/1810.03993), we’re providing some accompanying information about the GPT-2 family of models we're releasing.\n\n## Model Details.\n\nThis model was developed by researchers at OpenAI to help us understand how the capabilities of language model capabilities scale as a function of the size of the models (by parameter count) combined with very large internet-scale datasets (WebText).\n\n### Model date\n\nFebruary 2019, trained on data that cuts off at the end of 2017.\n\n### Model type\n\nLanguage model\n\n### Model version\n\n1.5 billion parameters: the fourth and largest GPT-2 version. We have also released 124 million, 355 million, and 774 million parameter models.\n\n### Paper or other resource for more information\n[Blog post](https://openai.com/blog/better-language-models/) and [paper](https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf)\n\n### Where to send questions or comments about the model\nPlease use this [Google Form](https://forms.gle/A7WBSbTY2EkKdroPA)\n\n## Intended Uses:\n\n### Primary intended uses\n\nThe primary intended users of these models are *AI researchers and practitioners*.\n\nWe primarily imagine these language models will be used by researchers to better understand the behaviors, capabilities, biases, and constraints of large-scale generative language models.\n\n### Secondary uses\n\nHere are some secondary use cases we believe are likely:\n\n- **Writing assistance**: Grammar assistance, autocompletion (for normal prose or code)\n- **Creative writing and art**: exploring the generation of creative, fictional texts; aiding creation of poetry and other literary art.\n- **Entertainment**: Creation of games, chat bots, and amusing generations.\n\n### Out-of-scope use cases\n\nBecause large-scale language models like GPT-2 do not 
distinguish fact from fiction, we don’t support use-cases that require the generated text to be true.\n\nAdditionally, language models like GPT-2 reflect the biases inherent to the systems they were trained on, so we do not recommend that they be deployed into systems that interact with humans unless the deployers first carry out a study of biases relevant to the intended use-case. We found no statistically significant difference in gender, race, and religious bias probes between 774M and 1.5B, implying all versions of GPT-2 should be approached with similar levels of caution around use cases that are sensitive to biases around human attributes.\n\n## Evaluation Data\n\n### Datasets\n\nThis model was trained on (and evaluated against) WebText, a dataset consisting of the text contents of 45 million links posted by users of the ‘Reddit’ social network. WebText is made of data derived from outbound links from Reddit and does not consist of data taken directly from Reddit itself. Before generating the dataset we used a blocklist to ensure we didn’t sample from a variety of subreddits which contain sexually explicit or otherwise offensive content.\n\nTo get a sense of the data that went into GPT-2, we’ve [published a list](domains.txt) of the top 1,000 domains present in WebText and their frequency.  The top 15 domains by volume in WebText are: Google, Archive, Blogspot, GitHub, NYTimes, Wordpress, Washington Post, Wikia, BBC, The Guardian, eBay, Pastebin, CNN, Yahoo!, and the Huffington Post.\n\n### Motivation\n\nThe motivation behind WebText was to create an Internet-scale, heterogeneous dataset that we could use to test large-scale language models against. 
WebText was (and is) intended to be primarily for research purposes rather than production purposes.\n\n### Caveats and Recommendations\n\nBecause GPT-2 is an internet-scale language model, it’s currently difficult to know what disciplined testing procedures can be applied to it to fully understand its capabilities and how the data it is trained on influences its vast range of outputs. We recommend researchers investigate these aspects of the model and share their results.\n\nAdditionally, as indicated in our discussion of issues relating to potential misuse of the model, it remains unclear what the long-term dynamics are of detecting outputs from these models. We conducted [in-house automated ML-based detection research](https://github.com/openai/gpt-2-output-dataset/tree/master/detector) using simple classifiers, zero shot, and fine-tuning methods. Our fine-tuned detector model reached accuracy levels of approximately 95%. However, no one detection method is a panacea; automated ML-based detection, human detection, human-machine teaming, and metadata-based detection are all methods that can be combined for more confident classification. Developing better approaches to detection today will give us greater intuitions when thinking about future models and could help us understand ahead of time if detection methods will eventually become ineffective.\n\n\n"
  },
  {
    "path": "requirements.txt",
    "content": "fire>=0.1.3\nregex==2017.4.5\nrequests==2.21.0\ntqdm==4.31.1\n"
  },
  {
    "path": "src/encoder.py",
    "content": "\"\"\"Byte pair encoding utilities\"\"\"\n\nimport os\nimport json\nimport regex as re\nfrom functools import lru_cache\n\n@lru_cache()\ndef bytes_to_unicode():\n    \"\"\"\n    Returns list of utf-8 byte and a corresponding list of unicode strings.\n    The reversible bpe codes work on unicode strings.\n    This means you need a large # of unicode characters in your vocab if you want to avoid UNKs.\n    When you're at something like a 10B token dataset you end up needing around 5K for decent coverage.\n    This is a signficant percentage of your normal, say, 32K bpe vocab.\n    To avoid that, we want lookup tables between utf-8 bytes and unicode strings.\n    And avoids mapping to whitespace/control characters the bpe code barfs on.\n    \"\"\"\n    bs = list(range(ord(\"!\"), ord(\"~\")+1))+list(range(ord(\"¡\"), ord(\"¬\")+1))+list(range(ord(\"®\"), ord(\"ÿ\")+1))\n    cs = bs[:]\n    n = 0\n    for b in range(2**8):\n        if b not in bs:\n            bs.append(b)\n            cs.append(2**8+n)\n            n += 1\n    cs = [chr(n) for n in cs]\n    return dict(zip(bs, cs))\n\ndef get_pairs(word):\n    \"\"\"Return set of symbol pairs in a word.\n\n    Word is represented as tuple of symbols (symbols being variable-length strings).\n    \"\"\"\n    pairs = set()\n    prev_char = word[0]\n    for char in word[1:]:\n        pairs.add((prev_char, char))\n        prev_char = char\n    return pairs\n\nclass Encoder:\n    def __init__(self, encoder, bpe_merges, errors='replace'):\n        self.encoder = encoder\n        self.decoder = {v:k for k,v in self.encoder.items()}\n        self.errors = errors # how to handle errors in decoding\n        self.byte_encoder = bytes_to_unicode()\n        self.byte_decoder = {v:k for k, v in self.byte_encoder.items()}\n        self.bpe_ranks = dict(zip(bpe_merges, range(len(bpe_merges))))\n        self.cache = {}\n\n        # Should haved added re.IGNORECASE so BPE merges can happen for capitalized versions of 
contractions\n        self.pat = re.compile(r\"\"\"'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)|\\s+\"\"\")\n\n    def bpe(self, token):\n        if token in self.cache:\n            return self.cache[token]\n        word = tuple(token)\n        pairs = get_pairs(word)\n\n        if not pairs:\n            return token\n\n        while True:\n            bigram = min(pairs, key = lambda pair: self.bpe_ranks.get(pair, float('inf')))\n            if bigram not in self.bpe_ranks:\n                break\n            first, second = bigram\n            new_word = []\n            i = 0\n            while i < len(word):\n                try:\n                    j = word.index(first, i)\n                    new_word.extend(word[i:j])\n                    i = j\n                except:\n                    new_word.extend(word[i:])\n                    break\n\n                if word[i] == first and i < len(word)-1 and word[i+1] == second:\n                    new_word.append(first+second)\n                    i += 2\n                else:\n                    new_word.append(word[i])\n                    i += 1\n            new_word = tuple(new_word)\n            word = new_word\n            if len(word) == 1:\n                break\n            else:\n                pairs = get_pairs(word)\n        word = ' '.join(word)\n        self.cache[token] = word\n        return word\n\n    def encode(self, text):\n        bpe_tokens = []\n        for token in re.findall(self.pat, text):\n            token = ''.join(self.byte_encoder[b] for b in token.encode('utf-8'))\n            bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' '))\n        return bpe_tokens\n\n    def decode(self, tokens):\n        text = ''.join([self.decoder[token] for token in tokens])\n        text = bytearray([self.byte_decoder[c] for c in text]).decode('utf-8', errors=self.errors)\n        return text\n\ndef get_encoder(model_name, 
models_dir):\n    with open(os.path.join(models_dir, model_name, 'encoder.json'), 'r') as f:\n        encoder = json.load(f)\n    with open(os.path.join(models_dir, model_name, 'vocab.bpe'), 'r', encoding=\"utf-8\") as f:\n        bpe_data = f.read()\n    bpe_merges = [tuple(merge_str.split()) for merge_str in bpe_data.split('\\n')[1:-1]]\n    return Encoder(\n        encoder=encoder,\n        bpe_merges=bpe_merges,\n    )\n"
  },
  {
    "path": "src/generate_unconditional_samples.py",
    "content": "#!/usr/bin/env python3\n\nimport fire\nimport json\nimport os\nimport numpy as np\nimport tensorflow as tf\n\nimport model, sample, encoder\n\ndef sample_model(\n    model_name='124M',\n    seed=None,\n    nsamples=0,\n    batch_size=1,\n    length=None,\n    temperature=1,\n    top_k=0,\n    top_p=1,\n    models_dir='models',\n):\n    \"\"\"\n    Run the sample_model\n    :model_name=124M : String, which model to use\n    :seed=None : Integer seed for random number generators, fix seed to\n     reproduce results\n    :nsamples=0 : Number of samples to return, if 0, continues to\n     generate samples indefinately.\n    :batch_size=1 : Number of batches (only affects speed/memory).\n    :length=None : Number of tokens in generated text, if None (default), is\n     determined by model hyperparameters\n    :temperature=1 : Float value controlling randomness in boltzmann\n     distribution. Lower temperature results in less random completions. As the\n     temperature approaches zero, the model will become deterministic and\n     repetitive. Higher temperature results in more random completions.\n    :top_k=0 : Integer value controlling diversity. 1 means only 1 word is\n     considered for each step (token), resulting in deterministic completions,\n     while 40 means 40 words are considered at each step. 0 (default) is a\n     special setting meaning no restrictions. 40 generally is a good value.\n     :models_dir : path to parent folder containing model subfolders\n     (i.e. 
contains the <model_name> folder)\n    \"\"\"\n    models_dir = os.path.expanduser(os.path.expandvars(models_dir))\n    enc = encoder.get_encoder(model_name, models_dir)\n    hparams = model.default_hparams()\n    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:\n        hparams.override_from_dict(json.load(f))\n\n    if length is None:\n        length = hparams.n_ctx\n    elif length > hparams.n_ctx:\n        raise ValueError(\"Can't get samples longer than window size: %s\" % hparams.n_ctx)\n\n    with tf.Session(graph=tf.Graph()) as sess:\n        np.random.seed(seed)\n        tf.set_random_seed(seed)\n\n        output = sample.sample_sequence(\n            hparams=hparams, length=length,\n            start_token=enc.encoder['<|endoftext|>'],\n            batch_size=batch_size,\n            temperature=temperature, top_k=top_k, top_p=top_p\n        )[:, 1:]\n\n        saver = tf.train.Saver()\n        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))\n        saver.restore(sess, ckpt)\n\n        generated = 0\n        while nsamples == 0 or generated < nsamples:\n            out = sess.run(output)\n            for i in range(batch_size):\n                # count one sample per decoded row, not batch_size per row\n                generated += 1\n                text = enc.decode(out[i])\n                print(\"=\" * 40 + \" SAMPLE \" + str(generated) + \" \" + \"=\" * 40)\n                print(text)\n\nif __name__ == '__main__':\n    fire.Fire(sample_model)\n\n"
  },
  {
    "path": "src/interactive_conditional_samples.py",
    "content": "#!/usr/bin/env python3\n\nimport fire\nimport json\nimport os\nimport numpy as np\nimport tensorflow as tf\n\nimport model, sample, encoder\n\ndef interact_model(\n    model_name='124M',\n    seed=None,\n    nsamples=1,\n    batch_size=1,\n    length=None,\n    temperature=1,\n    top_k=0,\n    top_p=1,\n    models_dir='models',\n):\n    \"\"\"\n    Interactively run the model\n    :model_name=124M : String, which model to use\n    :seed=None : Integer seed for random number generators, fix seed to reproduce\n     results\n    :nsamples=1 : Number of samples to return total\n    :batch_size=1 : Number of batches (only affects speed/memory).  Must divide nsamples.\n    :length=None : Number of tokens in generated text, if None (default), is\n     determined by model hyperparameters\n    :temperature=1 : Float value controlling randomness in boltzmann\n     distribution. Lower temperature results in less random completions. As the\n     temperature approaches zero, the model will become deterministic and\n     repetitive. Higher temperature results in more random completions.\n    :top_k=0 : Integer value controlling diversity. 1 means only 1 word is\n     considered for each step (token), resulting in deterministic completions,\n     while 40 means 40 words are considered at each step. 0 (default) is a\n     special setting meaning no restrictions. 40 generally is a good value.\n     :models_dir : path to parent folder containing model subfolders\n     (i.e. 
contains the <model_name> folder)\n    \"\"\"\n    models_dir = os.path.expanduser(os.path.expandvars(models_dir))\n    if batch_size is None:\n        batch_size = 1\n    assert nsamples % batch_size == 0\n\n    enc = encoder.get_encoder(model_name, models_dir)\n    hparams = model.default_hparams()\n    with open(os.path.join(models_dir, model_name, 'hparams.json')) as f:\n        hparams.override_from_dict(json.load(f))\n\n    if length is None:\n        length = hparams.n_ctx // 2\n    elif length > hparams.n_ctx:\n        raise ValueError(\"Can't get samples longer than window size: %s\" % hparams.n_ctx)\n\n    with tf.Session(graph=tf.Graph()) as sess:\n        context = tf.placeholder(tf.int32, [batch_size, None])\n        np.random.seed(seed)\n        tf.set_random_seed(seed)\n        output = sample.sample_sequence(\n            hparams=hparams, length=length,\n            context=context,\n            batch_size=batch_size,\n            temperature=temperature, top_k=top_k, top_p=top_p\n        )\n\n        saver = tf.train.Saver()\n        ckpt = tf.train.latest_checkpoint(os.path.join(models_dir, model_name))\n        saver.restore(sess, ckpt)\n\n        while True:\n            raw_text = input(\"Model prompt >>> \")\n            while not raw_text:\n                print('Prompt should not be empty!')\n                raw_text = input(\"Model prompt >>> \")\n            context_tokens = enc.encode(raw_text)\n            generated = 0\n            for _ in range(nsamples // batch_size):\n                out = sess.run(output, feed_dict={\n                    context: [context_tokens for _ in range(batch_size)]\n                })[:, len(context_tokens):]\n                for i in range(batch_size):\n                    generated += 1\n                    text = enc.decode(out[i])\n                    print(\"=\" * 40 + \" SAMPLE \" + str(generated) + \" \" + \"=\" * 40)\n                    print(text)\n            print(\"=\" * 80)\n\nif __name__ == 
'__main__':\n    fire.Fire(interact_model)\n\n"
  },
  {
    "path": "src/model.py",
    "content": "import numpy as np\nimport tensorflow as tf\nfrom tensorflow.contrib.training import HParams\n\ndef default_hparams():\n    return HParams(\n        n_vocab=0,\n        n_ctx=1024,\n        n_embd=768,\n        n_head=12,\n        n_layer=12,\n    )\n\ndef shape_list(x):\n    \"\"\"Deal with dynamic shape in tensorflow cleanly.\"\"\"\n    static = x.shape.as_list()\n    dynamic = tf.shape(x)\n    return [dynamic[i] if s is None else s for i, s in enumerate(static)]\n\ndef softmax(x, axis=-1):\n    x = x - tf.reduce_max(x, axis=axis, keepdims=True)\n    ex = tf.exp(x)\n    return ex / tf.reduce_sum(ex, axis=axis, keepdims=True)\n\ndef gelu(x):\n    return 0.5*x*(1+tf.tanh(np.sqrt(2/np.pi)*(x+0.044715*tf.pow(x, 3))))\n\ndef norm(x, scope, *, axis=-1, epsilon=1e-5):\n    \"\"\"Normalize to mean = 0, std = 1, then do a diagonal affine transform.\"\"\"\n    with tf.variable_scope(scope):\n        n_state = x.shape[-1].value\n        g = tf.get_variable('g', [n_state], initializer=tf.constant_initializer(1))\n        b = tf.get_variable('b', [n_state], initializer=tf.constant_initializer(0))\n        u = tf.reduce_mean(x, axis=axis, keepdims=True)\n        s = tf.reduce_mean(tf.square(x-u), axis=axis, keepdims=True)\n        x = (x - u) * tf.rsqrt(s + epsilon)\n        x = x*g + b\n        return x\n\ndef split_states(x, n):\n    \"\"\"Reshape the last dimension of x into [n, x.shape[-1]/n].\"\"\"\n    *start, m = shape_list(x)\n    return tf.reshape(x, start + [n, m//n])\n\ndef merge_states(x):\n    \"\"\"Smash the last two dimensions of x into a single dimension.\"\"\"\n    *start, a, b = shape_list(x)\n    return tf.reshape(x, start + [a*b])\n\ndef conv1d(x, scope, nf, *, w_init_stdev=0.02):\n    with tf.variable_scope(scope):\n        *start, nx = shape_list(x)\n        w = tf.get_variable('w', [1, nx, nf], initializer=tf.random_normal_initializer(stddev=w_init_stdev))\n        b = tf.get_variable('b', [nf], initializer=tf.constant_initializer(0))\n   
     c = tf.reshape(tf.matmul(tf.reshape(x, [-1, nx]), tf.reshape(w, [-1, nf]))+b, start+[nf])\n        return c\n\ndef attention_mask(nd, ns, *, dtype):\n    \"\"\"1's in the lower triangle, counting from the lower right corner.\n\n    Same as tf.matrix_band_part(tf.ones([nd, ns]), -1, ns-nd), but doesn't produce garbage on TPUs.\n    \"\"\"\n    i = tf.range(nd)[:,None]\n    j = tf.range(ns)\n    m = i >= j - ns + nd\n    return tf.cast(m, dtype)\n\n\ndef attn(x, scope, n_state, *, past, hparams):\n    assert x.shape.ndims == 3  # Should be [batch, sequence, features]\n    assert n_state % hparams.n_head == 0\n    if past is not None:\n        assert past.shape.ndims == 5  # Should be [batch, 2, heads, sequence, features], where 2 is [k, v]\n\n    def split_heads(x):\n        # From [batch, sequence, features] to [batch, heads, sequence, features]\n        return tf.transpose(split_states(x, hparams.n_head), [0, 2, 1, 3])\n\n    def merge_heads(x):\n        # Reverse of split_heads\n        return merge_states(tf.transpose(x, [0, 2, 1, 3]))\n\n    def mask_attn_weights(w):\n        # w has shape [batch, heads, dst_sequence, src_sequence], where information flows from src to dst.\n        _, _, nd, ns = shape_list(w)\n        b = attention_mask(nd, ns, dtype=w.dtype)\n        b = tf.reshape(b, [1, 1, nd, ns])\n        w = w*b - tf.cast(1e10, w.dtype)*(1-b)\n        return w\n\n    def multihead_attn(q, k, v):\n        # q, k, v have shape [batch, heads, sequence, features]\n        w = tf.matmul(q, k, transpose_b=True)\n        w = w * tf.rsqrt(tf.cast(v.shape[-1].value, w.dtype))\n\n        w = mask_attn_weights(w)\n        w = softmax(w)\n        a = tf.matmul(w, v)\n        return a\n\n    with tf.variable_scope(scope):\n        c = conv1d(x, 'c_attn', n_state*3)\n        q, k, v = map(split_heads, tf.split(c, 3, axis=2))\n        present = tf.stack([k, v], axis=1)\n        if past is not None:\n            pk, pv = tf.unstack(past, axis=1)\n            k = 
tf.concat([pk, k], axis=-2)\n            v = tf.concat([pv, v], axis=-2)\n        a = multihead_attn(q, k, v)\n        a = merge_heads(a)\n        a = conv1d(a, 'c_proj', n_state)\n        return a, present\n\n\ndef mlp(x, scope, n_state, *, hparams):\n    with tf.variable_scope(scope):\n        nx = x.shape[-1].value\n        h = gelu(conv1d(x, 'c_fc', n_state))\n        h2 = conv1d(h, 'c_proj', nx)\n        return h2\n\n\ndef block(x, scope, *, past, hparams):\n    with tf.variable_scope(scope):\n        nx = x.shape[-1].value\n        a, present = attn(norm(x, 'ln_1'), 'attn', nx, past=past, hparams=hparams)\n        x = x + a\n        m = mlp(norm(x, 'ln_2'), 'mlp', nx*4, hparams=hparams)\n        x = x + m\n        return x, present\n\ndef past_shape(*, hparams, batch_size=None, sequence=None):\n    return [batch_size, hparams.n_layer, 2, hparams.n_head, sequence, hparams.n_embd // hparams.n_head]\n\ndef expand_tile(value, size):\n    \"\"\"Add a new axis of given size.\"\"\"\n    value = tf.convert_to_tensor(value, name='value')\n    ndims = value.shape.ndims\n    return tf.tile(tf.expand_dims(value, axis=0), [size] + [1]*ndims)\n\ndef positions_for(tokens, past_length):\n    batch_size = tf.shape(tokens)[0]\n    nsteps = tf.shape(tokens)[1]\n    return expand_tile(past_length + tf.range(nsteps), batch_size)\n\n\ndef model(hparams, X, past=None, scope='model', reuse=False):\n    with tf.variable_scope(scope, reuse=reuse):\n        results = {}\n        batch, sequence = shape_list(X)\n\n        wpe = tf.get_variable('wpe', [hparams.n_ctx, hparams.n_embd],\n                             initializer=tf.random_normal_initializer(stddev=0.01))\n        wte = tf.get_variable('wte', [hparams.n_vocab, hparams.n_embd],\n                             initializer=tf.random_normal_initializer(stddev=0.02))\n        past_length = 0 if past is None else tf.shape(past)[-2]\n        h = tf.gather(wte, X) + tf.gather(wpe, positions_for(X, past_length))\n\n        # 
Transformer\n        presents = []\n        pasts = tf.unstack(past, axis=1) if past is not None else [None] * hparams.n_layer\n        assert len(pasts) == hparams.n_layer\n        for layer, past in enumerate(pasts):\n            h, present = block(h, 'h%d' % layer, past=past, hparams=hparams)\n            presents.append(present)\n        results['present'] = tf.stack(presents, axis=1)\n        h = norm(h, 'ln_f')\n\n        # Language model loss.  Do tokens <n predict token n?\n        h_flat = tf.reshape(h, [batch*sequence, hparams.n_embd])\n        logits = tf.matmul(h_flat, wte, transpose_b=True)\n        logits = tf.reshape(logits, [batch, sequence, hparams.n_vocab])\n        results['logits'] = logits\n        return results\n"
  },
  {
    "path": "src/sample.py",
    "content": "import tensorflow as tf\n\nimport model\n\ndef top_k_logits(logits, k):\n    if k == 0:\n        # no truncation\n        return logits\n\n    def _top_k():\n        values, _ = tf.nn.top_k(logits, k=k)\n        min_values = values[:, -1, tf.newaxis]\n        return tf.where(\n            logits < min_values,\n            tf.ones_like(logits, dtype=logits.dtype) * -1e10,\n            logits,\n        )\n    return tf.cond(\n       tf.equal(k, 0),\n       lambda: logits,\n       lambda: _top_k(),\n    )\n\n\ndef top_p_logits(logits, p):\n    \"\"\"Nucleus sampling\"\"\"\n    batch, _ = logits.shape.as_list()\n    sorted_logits = tf.sort(logits, direction='DESCENDING', axis=-1)\n    cumulative_probs = tf.cumsum(tf.nn.softmax(sorted_logits, axis=-1), axis=-1)\n    indices = tf.stack([\n        tf.range(0, batch),\n        # number of indices to include\n        tf.maximum(tf.reduce_sum(tf.cast(cumulative_probs <= p, tf.int32), axis=-1) - 1, 0),\n    ], axis=-1)\n    min_values = tf.gather_nd(sorted_logits, indices)\n    return tf.where(\n        logits < min_values,\n        tf.ones_like(logits) * -1e10,\n        logits,\n    )\n\n\ndef sample_sequence(*, hparams, length, start_token=None, batch_size=None, context=None, temperature=1, top_k=0, top_p=1):\n    if start_token is None:\n        assert context is not None, 'Specify exactly one of start_token and context!'\n    else:\n        assert context is None, 'Specify exactly one of start_token and context!'\n        context = tf.fill([batch_size, 1], start_token)\n\n    def step(hparams, tokens, past=None):\n        lm_output = model.model(hparams=hparams, X=tokens, past=past, reuse=tf.AUTO_REUSE)\n\n        logits = lm_output['logits'][:, :, :hparams.n_vocab]\n        presents = lm_output['present']\n        presents.set_shape(model.past_shape(hparams=hparams, batch_size=batch_size))\n        return {\n            'logits': logits,\n            'presents': presents,\n        }\n\n    with 
tf.name_scope('sample_sequence'):\n        def body(past, prev, output):\n            next_outputs = step(hparams, prev, past=past)\n            logits = next_outputs['logits'][:, -1, :]  / tf.to_float(temperature)\n            logits = top_k_logits(logits, k=top_k)\n            logits = top_p_logits(logits, p=top_p)\n            samples = tf.multinomial(logits, num_samples=1, output_dtype=tf.int32)\n            return [\n                next_outputs['presents'] if past is None else tf.concat([past, next_outputs['presents']], axis=-2),\n                samples,\n                tf.concat([output, samples], axis=1)\n            ]\n\n        past, prev, output = body(None, context, context)\n\n        def cond(*args):\n            return True\n\n        _, _, tokens = tf.while_loop(\n            cond=cond, body=body,\n            maximum_iterations=length - 1,\n            loop_vars=[\n                past,\n                prev,\n                output\n            ],\n            shape_invariants=[\n                tf.TensorShape(model.past_shape(hparams=hparams, batch_size=batch_size)),\n                tf.TensorShape([batch_size, None]),\n                tf.TensorShape([batch_size, None]),\n            ],\n            back_prop=False,\n        )\n\n        return tokens\n"
  }
]