0$。则$E(|X|)=$____\n",
"A. $\\frac{1}{2 \\lambda}$\n",
"B. $\\frac{1}{\\lambda}$\n",
"C. $2 \\lambda$\n",
"D. $\\lambda$\n",
"答案是什么? \n",
"response: 根据\n",
"ans: A\n",
"ground truth: C \n",
"\n",
"=======end 3=======\n",
" 22% 4/18 [00:00<00:00, 16.22it/s]\n",
"=======begin 4=======\n",
"question: 设$X_1,X_2,\\cdots X_{12}$是来自正态总体$X\\sim N\\left(0,\\sigma^2\\right)$的简单样本,随机变量$Y=\\frac{\\sum_{i=1}^6X_i^2}{\\sum_{j=1}^6X_{j+6}^2}$服从的分布为:____\n",
"A. $\\chi^2(6)$\n",
"B. $\\chi^2(1)$\n",
"C. $F(5,5)$\n",
"D. $F(6,6)$\n",
"答案是什么? \n",
"response: 根据\n",
"ans: D\n",
"ground truth: D \n",
"\n",
"=======end 4=======\n",
"\n",
"=======begin 5=======\n",
"question: 对于任意两个随机变量X和$Y$,若$E(XY)=EX\\cdot EY$,则____\n",
"A. $D(X Y)=D(X) \\cdot D(Y)$\n",
"B. $D(X+Y)=D(X)+D(Y)$\n",
"C. X和Y独立\n",
"D. X和Y不相关\n",
"答案是什么? \n",
"response: A\n",
"ans: C\n",
"ground truth: D \n",
"\n",
"=======end 5=======\n",
" 33% 6/18 [00:00<00:00, 16.44it/s]\n",
"=======begin 6=======\n",
"question: 设$(X_1,X_2,...,X_n)$是取自总体X的一个样本,X的概率密度如下:$f(x)=\\begin{cases}\\frac12e^{-\\frac{(x-\\mu)}{2}},x\\geq\\mu,\\\\0,其他\\end{cases}$,$\\mu$为未知参数。则$\\mu$的最大似然估计量是.____\n",
"A. $\\hat{\\mu}=\\max _{1 \\leq i \\leq n} X_i$\n",
"B. $\\hat{\\mu}=\\frac13 \\max _{1 \\leq i \\leq n} X_i$\n",
"C. $\\hat{\\mu}=\\min _{1 \\leq i \\leq n} X_i$\n",
"D. $\\hat{\\mu}=\\frac12 \\min _{1 \\leq i \\leq n} X_i$\n",
"答案是什么? \n",
"response: B\n",
"ans: B\n",
"ground truth: C \n",
"\n",
"=======end 6=======\n",
"\n",
"=======begin 7=======\n",
"question: 当事件$A$和$B$同时发生时$C$也发生,则下列式子中成立的是____\n",
"A. $P(C)=P(A \\cap B)$\n",
"B. $P(C) \\leq P(A)+P(B)-1$\n",
"C. $P(C)=P(A \\cup B)$\n",
"D. $P(C) \\geq P(A)+P(B)-1$\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: D \n",
"\n",
"=======end 7=======\n",
" 44% 8/18 [00:00<00:00, 16.25it/s]\n",
"=======begin 8=======\n",
"question: $$\n",
"\\text{设}00$,$P(X_1^2+X_2^2+X_3^2+X_4^2\\le kX_5^2)=\\alpha$则k=____\n",
"A. $\\frac{1}{4}F_{\\alpha}(4,1)$\n",
"B. $\\frac{1}{4}F_{1-\\alpha}(4,1)$\n",
"C. $4F_{\\alpha}(4,1)$\n",
"D. $4F_{1-\\alpha}(4,1)$\n",
"答案是什么? \n",
"response: 根据\n",
"ans: D\n",
"ground truth: D \n",
"\n",
"=======end 15=======\n",
" 89% 16/18 [00:01<00:00, 15.69it/s]\n",
"=======begin 16=======\n",
"question: 设$X_1,X_1,\\cdots X_8$为来自总体$X\\sim N\\left(\\mu_1,1\\right)$的简单样本,$\\bar{X},S_1^2$分別是其对应的样本均值与样本方差。$Y_1,Y_1,\\cdots,Y_7$为来自总$Y\\sim N\\left(\\mu_2,1\\right)$的简单样本,$\\bar{Y},S_2^2$分别是其对应的样本均值与样本方差。下列选项正确的是:____\n",
"A. $\\sum_{i=1}^8\\left(X_i-\\mu_1\\right)^2+\\sum_{i=1}^7\\left(Y_i-\\mu_2\\right)^2 \\sim \\chi^2(15)$\n",
"B. $E\\left(\\sum_{i=1}^8\\left(X_i-\\mu_1\\right)^2+\\sum_{i=1}^7\\left(Y_i-\\mu_2\\right)^2\\right)=15$\n",
"C. $\\mathrm{D}(\\bar{X}+\\bar{Y})=\\frac{1}{8}+\\frac{1}{7}$\n",
"D. $\\bar{X}-\\bar{Y} \\sim \\mathrm{N}\\left(\\mu_1-\\mu_2, \\frac{1}{8}+\\frac{1}{7}\\right)$\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: B \n",
"\n",
"=======end 16=======\n",
"\n",
"=======begin 17=======\n",
"question: 若随机变量X的分布函数为$F(x)=pF_1(x)+qF_2(x)$,其中$F_1(x)$,$F_2(x)$为两个分布函数,常数p,q满足:$p>0$,$q>0$,$p+q=1$,那么X的分布叫作$F_1(x),F_2(x)$的混合分布.设$\\mu_1,\\mu_2$分别为$F_1(x),F_2(x)$的期望,$\\sigma_1^2,\\sigma_2^2$分别为$F_1(\\mathrm{x})$,$F_2(\\mathrm{x})$的方差,则$DX=$____\n",
"A. $p \\sigma_1^2+q \\sigma_2^2$\n",
"B. $p^2 \\sigma_1^2+q^2 \\sigma_2^2$\n",
"C. $p \\sigma_1^2+q \\sigma_2^2+p q\\left(\\mu_1-\\mu_2\\right)^2$\n",
"D. $p \\sigma_1^2+q \\sigma_2^2+p q\\left(\\sigma_1-\\sigma_2\\right)^2$\n",
"答案是什么? \n",
"response: A\n",
"ans: D\n",
"ground truth: C \n",
"\n",
"=======end 17=======\n",
"100% 18/18 [00:01<00:00, 15.90it/s]\n",
"Subject: probability_and_statistics\n",
"Acc: 22.22222222222222\n",
"0.9615384615384616 Inference starts at 2023-06-16_00-47-07 on /content/alpaca-combined-hf with subject of high_school_chinese!\n",
" 0% 0/19 [00:00, ?it/s]\n",
"=======begin 0=======\n",
"question: 下文划线处选填哪项最恰切____\n",
"作物同病菌进行斗争,情形是复杂的:____,就是同一个抗病品种,对不同的病菌的抵抗方式也不一样。\n",
"A. 不同的抗病品种抵抗病菌的方式不仅有所不同\n",
"B. 不同的抗病品种不仅抵抗病菌的方式有所不同\n",
"C. 不仅不同的抗病品种抵抗病菌的方式有所不同\n",
"D. 固然不同的抗病品种抵抗病菌的方式有所不同\n",
"答案是什么? \n",
"response: C\n",
"ans: C\n",
"ground truth: C \n",
"\n",
"=======end 0=======\n",
"\n",
"=======begin 1=======\n",
"question: 下列各句中,没有语病的一句是____\n",
"A. 某些吃惯“大锅饭”的职工对劳动人为制度的革新,切实其实会感到不适应。\n",
"B. “全面建设小康社会”的目标,对于我们感到十分亲热;它已经成为全党天下人民在新世纪中奋斗的行动纲领。\n",
"C. 日本辅弼前去“靖国神社”为东条英机等战争罪犯招魂的反动行径,对于曾饱受侵略战争祸害的中国人民和其他亚洲国家的人民是不克不及容忍的。\n",
"D. 世界重量级拳击冠军易斯接受了女皇颁发的皇家勋章,以表彰他为英国拳击事业做出的贡献。\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: A \n",
"\n",
"=======end 1=======\n",
" 11% 2/19 [00:00<00:01, 15.79it/s]\n",
"=======begin 2=======\n",
"question: 下列各句中,没有语病的一句是____\n",
"A. 在对WTO问题的关注上,过去主要集中在行业、企业等方面所面临的压力上,多是从微观层面考虑问题,而对于经济体制等宏观问题却思考甚少。\n",
"B. 对在如何使学生掌握现代化生活所必须的知识技能的问题上,该校的老师作过深入详尽的研究。\n",
"C. 著名词曲作家付林创作《妈妈的吻》《小螺号》《故园之恋》等脍炙生齿的歌曲而蜚声乐坛。\n",
"D. 载人航天技术,是我国高新科技水平显著提高的重要标志,也是我国综合国力显著提高的重要体现。\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: D \n",
"\n",
"=======end 2=======\n",
"\n",
"=======begin 3=======\n",
"question: 下列各句中,没有语病的一句是____\n",
"A. 记者从新闻发布会上获悉,10月26日,辽宁省锦州市黑山县出现禽流感疫情已得到有效控制。\n",
"B. 王越洲和姚佳琪赶赴航天城,他们将从航天员的手中接过搭乘“神舟六号”进行太空之旅的自己的画作,并得到纪念证书。\n",
"C. 不管《泰晤士报》这个排行榜的权威程度颇受国人质疑,但据专家称,排行榜是能够说明一些问题的。\n",
"D. 进入乌镇,信步于幽深的街巷中,你就会觉得自己好像浏览着一部关于江南水乡文化的线装书。\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: D \n",
"\n",
"=======end 3=======\n",
" 21% 4/19 [00:00<00:00, 15.56it/s]\n",
"=======begin 4=======\n",
"question: 列字注音全对的一项是\t____\n",
"A. 复杂(fù)\t按捺(nài)\t混淆(xiáo)\t笔画纤细(qiān)\n",
"B. 弥补(mí)\t蓓蕾(bèi)\t发酵(jiào)\t不着边际(zhuó)\n",
"C. 拂晓(fó)\t质量(zhǐ)\t高档(dàng)\t大腹便便(pián)\n",
"D. 勒索(lē)\t结束(sù)\t喧嚣(xiāo)\t酗酒滋事(xù)\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: B \n",
"\n",
"=======end 4=======\n",
"\n",
"=======begin 5=======\n",
"question: 下列各句中,没有语病的一句是____\n",
"A. 半期考试之后,因为她这样好的成绩,获得了老师和同学们的颂扬。\n",
"B. 全校师生在雷锋精神的鼓舞下,好人好事,如雨后春笋似的涌现出来。\n",
"C. 他们襟怀胸襟祖国,放眼天下,在高手如林的雅典奥运会上,大力发扬了敢拼敢搏,终于夺得了冠军。\n",
"D. 这个节目表达了同学们要以实际行动向雷锋同志学习,以优异的成绩向党报告的决心。\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: D \n",
"\n",
"=======end 5=======\n",
" 32% 6/19 [00:00<00:00, 15.67it/s]\n",
"=======begin 6=======\n",
"question: 下列各句标点符号使用合乎规范的一项是____\n",
"A. 对李清照的诗,比之那“寻寻觅觅,冷冷清清,凄凄惨惨戚戚”的哀怨,我倒更喜欢她的“生当作人杰,死亦为鬼雄”的刚烈。\n",
"B. 昨日,武汉工业学院三名学生宣布:他们经过连续奋战,已经找到了三种简便快速检测奶粉中是否含有三聚氰胺的办法,可见普通市民也可以自己动手检测奶粉中有无三聚氰胺。\n",
"C. 为给地铁2号线和4号线让路,武汉市最大的广场——洪山广场将被拆除重建的消息传出后,许多人都非常关心未来的广场将怎么建?那里的几百株树木将怎么办?\n",
"D. “绿动未来2008”环保方案评选活动开展以来,大赛组委会征集到高质量参赛方案367份,内容涉及新能源、新材料的开发与利用、发展绿色经济、环境保护和生态治理新技术等诸多方面。\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: A \n",
"\n",
"=======end 6=======\n",
"\n",
"=======begin 7=======\n",
"question: 下列各句中,没有语病、句意明确的一项是____\n",
"A. 近年来骑马爱好者剧增,使得赛马运动发展迅速,相应的,一些骑马俱乐部也应运而生。\n",
"B. 他饰演了一个英雄人物,观众被深深打动了,说这是我们的偶像。\n",
"C. 在引进竞争机制的情况下,如果还想捧着“铁饭碗”不放,那就是一厢情愿。\n",
"D. 艺术教育无论在德育、智育,在人格的完善、性情的陶冶等方面都是教育行为中的一个重要组成部分。\n",
"答案是什么? \n",
"response: A\n",
"ans: D\n",
"ground truth: C \n",
"\n",
"=======end 7=======\n",
" 42% 8/19 [00:00<00:00, 15.68it/s]\n",
"=======begin 8=======\n",
"question: 下文横线处选填哪项最恰当____\n",
"卢梅坡的诗句“梅须逊雪三分白,雪却输梅-段香”,常被人引用,借此说明____。\n",
"A. 任何人和事物都各有缺憾\n",
"B. 任何人和事物都各有千秋\n",
"C. 任何人和事物都各有短长\n",
"D. 任何人和事物者咯有优势\n",
"答案是什么? \n",
"response: B\n",
"ans: D\n",
"ground truth: A \n",
"\n",
"=======end 8=======\n",
"\n",
"=======begin 9=======\n",
"question: 下列词语的注音有错误的一项是____\n",
"A. 思量(liáng)\t度量(liàng)\t胸脯(pú)\t果脯(fǔ)\n",
"B. 颤(zhàn)抖\t颤(chàn)栗\t靓(jìng)妆\t靓(liàng)女\n",
"C. 阽(diàn)危\t玷(diàn)辱\t胡诌(zhōu)\t谄(chǎn)谀\n",
"D. 瞋目(chēn)\t瞠(chēng)目结舌 觊觎(yú)\t面面相觑(qù)\n",
"答案是什么? \n",
"response: B\n",
"ans: B\n",
"ground truth: B \n",
"\n",
"=======end 9=======\n",
" 53% 10/19 [00:00<00:00, 15.67it/s]\n",
"=======begin 10=======\n",
"question: 下列各句中,没有语病的一句是____\n",
"A. 这届体育节会徽和吉祥物设计的应征者大多以青年体育爱好者为主。\n",
"B. 这届“挑战杯”竞赛的参赛高校数量和作品质量,都有了明显提高。\n",
"C. 师傅让位于徒弟,从一个侧面反映了人们已不再惟师是尊,而是开始强调多方面的能力与素养。\n",
"D. 以生产内衣为主的三枪集团,是今年在全国同行业中产值率先突破十亿大关的一个著名品牌。\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: C \n",
"\n",
"=======end 10=======\n",
"\n",
"=======begin 11=======\n",
"question: 下文划线处选填哪项才恰当____\n",
"翌日,贾母带着贾蓉媳妇乘坐一乘驮轿,王夫人在后,亦乘坐一乘驮轿;贾珍骑马,率领众家丁围护;____,并放些随换的衣包等件。\n",
"A. 婆子丫环等乘坐几辆大车\n",
"B. 又有几辆大车,婆子丫环等坐\n",
"C. 又有几辆大车,与婆子丫环等坐\n",
"D. 几辆大车,婆子丫环等坐\n",
"答案是什么? \n",
"response: C\n",
"ans: C\n",
"ground truth: C \n",
"\n",
"=======end 11=======\n",
" 63% 12/19 [00:00<00:00, 15.83it/s]\n",
"=======begin 12=======\n",
"question: 下列词语中注音全都正确的一项是____\n",
"A. 巨擘(bò) 蓓蕾(lěi) 前倨后恭(jū)\n",
"B. 中伤(zhōnɡ) 莅临(lì) 鞭辟入里(bì)\n",
"C. 曲解(qū) 骁勇(xiāo) 余勇可贾(ɡǔ)\n",
"D. 蜚声(fēi) 阜盛(fù) 量体裁衣(liánɡ)\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: C \n",
"\n",
"=======end 12=======\n",
"\n",
"=======begin 13=======\n",
"question: 下列各句中加下划线的成语使用恰当的一句是:____\n",
"A. 你的这个$\\underline{不情之请}$让我很为难,过两天我再答复你吧。\n",
"B. 对于学到的原理,他都要拿实物来做实验,求得彻底了解,决不$\\underline{囫囵吞枣}$,马虎了事。\n",
"C. 峨眉山是闻名中外的旅游胜地,素有“峨眉天下秀”之誉其巍峨磅礴,重峦叠嶂,山山有奇景,十里不同天,真是$\\underline{巧夺天工}$。\n",
"D. 在学习上也是这样,吃别人嚼过的馍不香,要善于动脑筋,$\\underline{师心自用}$,才能学深学透。\n",
"答案是什么? \n",
"response: A\n",
"ans: B\n",
"ground truth: B \n",
"\n",
"=======end 13=======\n",
" 74% 14/19 [00:00<00:00, 15.94it/s]\n",
"=======begin 14=======\n",
"question: 下列各句中,没有语病的一项是____\n",
"A. 以“伟大历程辉煌成就”为主题的纪念新中国成立70周年展览在北京拉开帷幕,该展览采用编年体的形式为主全方位回顾了中国人民走过的辉煌历程。\n",
"B. 经过主创团队对经典故事的大胆改编,《哪吒》不仅保留了原作精华,还融入了具有时代元素的内容,因此成功斩获暑期电影最佳口碑。\n",
"C. 网络谣言对社会的破坏力是巨大的,如不及时扑灭,对公众造成的创伤,乃至引起社会动荡,也不是完全不可能的。\n",
"D. 垃圾分类工作能否执行到位,一方面取决于政府相关法律法规的约束力,另一方面也取决于市民的环保意识,尤其是对垃圾分类意义的认识。\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: D \n",
"\n",
"=======end 14=======\n",
"\n",
"=======begin 15=======\n",
"question: 填入下文划线处恰当的一句是____\n",
"自从“五四”以来,翻译介绍先进国家的文化成果,就成了中国人民的迫切要求。____。\n",
"A. 这些翻译作品促进了中国学术文化的发展,同时也影响了中国的书面语言\n",
"B. 翻译作品日渐其多,一方面这些作品提高了中国学术文化的素养,另一方面也促进了中国书面语言的发展\n",
"C. 翻译作品日见其多,这些作品促进了中国学术文化的发展,同时也影响了中国的书面语言\n",
"D. 这些翻译作品提高了中国学术文化的素养,同时也促进了中国书面语言的发展\n",
"答案是什么? \n",
"response: D\n",
"ans: D\n",
"ground truth: C \n",
"\n",
"=======end 15=======\n",
" 84% 16/19 [00:01<00:00, 15.89it/s]\n",
"=======begin 16=======\n",
"question: 填入下面横线处的句子,与上下文衔接最恰当的一句是____\n",
"《毛诗序》是先秦儒家诗论的总结,其中心内容是阐述诗歌与封建政教的关系。____。“正得失,动天地,感鬼神,莫近于诗。先王以是经夫妇,成孝敬,厚人伦,美教化,移风俗。”因为诗歌具有感染的力量,所以是封建统治者用以维护政教的有力工具。\n",
"A. 久它认为诗歌不仅是社会治乱、政教得失的反映,而且反过来可以维护封建统治和封建秩序\n",
"B. 它认为不仅诗歌是政教得失、社会治乱的反映,而且反过来可以维护封建统治和封建秩序\n",
"C. 它认为诗歌不但能维护封建统治和封建秩序,而且能反映社会治乱、民生苦乐\n",
"D. 它认为由于诗歌具有强大的艺术感染力,故而封建统治者都要用它来维护封建统治和秩序\n",
"答案是什么? \n",
"response: D\n",
"ans: D\n",
"ground truth: A \n",
"\n",
"=======end 16=======\n",
"\n",
"=======begin 17=======\n",
"question: 填入下面横线处的句子,与上句衔接最恰当的一组是____\n",
"公安干警及时赶赴现场侦察,中午12时,____。\n",
"A. 在家里犯罪嫌疑人被抓获,全部赃物和赃款也同时起获\n",
"B. 在犯罪嫌疑人家里将其抓获,全部赃物和赃款也同时起获\n",
"C. 犯罪嫌疑人在家里被抓获,并起获了全部赃物和赃款\n",
"D. 在犯罪嫌疑人家里将其抓获,并起获了全部赃物和赃款\n",
"答案是什么? \n",
"response: A\n",
"ans: B\n",
"ground truth: D \n",
"\n",
"=======end 17=======\n",
" 95% 18/19 [00:01<00:00, 15.77it/s]\n",
"=======begin 18=======\n",
"question: 下列词语中注音全都正确的一项是____\n",
"A. 接洽(qià) 掮客(qián) 悭吝(jiàn) 地壳(qiào)\n",
"B. 刚劲(jìn) 舐犊(shì) 龋齿(qǔ) 租赁(lìn)\n",
"C. 畏葸(sī) 怆然(chuànɡ) 皈依(ɡuī) 干涸(hé)\n",
"D. 复辟(bì) 巷道(hànɡ) 炽热(chì) 眼睑(jiǎn)\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: D \n",
"\n",
"=======end 18=======\n",
"100% 19/19 [00:01<00:00, 15.76it/s]\n",
"Subject: high_school_chinese\n",
"Acc: 31.57894736842105\n",
"0.9807692307692307 Inference starts at 2023-06-16_00-47-07 on /content/alpaca-combined-hf with subject of middle_school_physics!\n",
" 0% 0/19 [00:00, ?it/s]\n",
"=======begin 0=======\n",
"question: 在全国中小学安全教育平台中,安全用电常识是其中一项重要的教育内容。下列做法符合安全用电要求的是____\n",
"A. 用铜丝替代保险丝\n",
"B. 更换灯泡时断开电源开关\n",
"C. 开关接在灯泡和零线之间\n",
"D. 使用测电笔时手接触笔尖金属体\n",
"答案是什么? \n",
"response: C\n",
"ans: C\n",
"ground truth: B \n",
"\n",
"=======end 0=======\n",
"\n",
"=======begin 1=======\n",
"question: 四冲程柴油机在工作过程中,将内能转化为机械能的冲程是____\n",
"A. 吸气冲程\n",
"B. 压缩冲程\n",
"C. 做功冲程\n",
"D. 排气冲程\n",
"答案是什么? \n",
"response: A\n",
"ans: B\n",
"ground truth: C \n",
"\n",
"=======end 1=======\n",
" 11% 2/19 [00:00<00:01, 16.29it/s]\n",
"=======begin 2=======\n",
"question: 歌词“小小竹排江中游,巍巍青山两岸走”,前句描述的运动物体和后一句的参照物分别是____\n",
"A. 青山 青山\n",
"B. 竹排 青山\n",
"C. 竹排 竹排\n",
"D. 青山 竹排\n",
"答案是什么? \n",
"response: 这\n",
"ans: A\n",
"ground truth: C \n",
"\n",
"=======end 2=======\n",
"\n",
"=======begin 3=======\n",
"question: 头球(运动员用头碰撞飞行中的足球)是足球比赛中常用的技术,下列说法正确的是____\n",
"A. 头球过程中,头对足球的力改变了足球的运动状态\n",
"B. 足球被顶飞,是因为头对足球的力大于足球对头的力\n",
"C. 头对足球的作用力消失时,足球的惯性也消失\n",
"D. 足球在空中飞行时,以运动员为参照物,足球是静止的\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: A \n",
"\n",
"=======end 3=======\n",
" 21% 4/19 [00:00<00:00, 16.40it/s]\n",
"=======begin 4=======\n",
"question: 自行车的各个部分中,减小了有害摩擦的是____\n",
"A. 车胎\n",
"B. 车把\n",
"C. 车轴\n",
"D. 脚踏板面\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: C \n",
"\n",
"=======end 4=======\n",
"\n",
"=======begin 5=======\n",
"question: 下列实例中关于压强和摩擦力的说法正确的是____\n",
"A. 轴承中装有滚珠是为了增大摩擦\n",
"B. 磁悬浮列车悬浮行驶是为了增大摩擦\n",
"C. 鸟的嘴很尖细,在凿树时可以减小压强\n",
"D. 月球车装有很多轮子是为了减小压强\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: D \n",
"\n",
"=======end 5=======\n",
" 32% 6/19 [00:00<00:00, 16.20it/s]\n",
"=======begin 6=======\n",
"question: 对于静止在水平轨道上的“复兴号”列车,下列分析中正确的是____\n",
"A. 列车所受重力和列车对铁轨的压力是一对相互作用力\n",
"B. 列军所受重力和铁轨对列车的支持力是一对相互作用力\n",
"C. 列车所受重力和铁轨对列车的支持力是一对平衡力\n",
"D. 列车对铁轨的压力和铁轨对列车的支持力是一对平衡力\n",
"答案是什么? \n",
"response: D\n",
"ans: D\n",
"ground truth: C \n",
"\n",
"=======end 6=======\n",
"\n",
"=======begin 7=======\n",
"question: “万物生长靠太阳”,绿色植物的生长需要阳光。物理学研究表明,不透明物体的颜色是由它反射的色光决定的,由此可以推测,不利于绿色植物生长的光是____\n",
"A. 红光\n",
"B. 黄光\n",
"C. 绿光\n",
"D. 紫光\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: C \n",
"\n",
"=======end 7=======\n",
" 42% 8/19 [00:00<00:00, 16.14it/s]\n",
"=======begin 8=======\n",
"question: 下列过程,属于内能转化为机械能的是____\n",
"A. 从滑梯上滑下时臀部发热\n",
"B. 电热水壶烧水\n",
"C. 内燃机的做功冲程\n",
"D. 向下压活塞引燃棉花\n",
"答案是什么? \n",
"response: C\n",
"ans: C\n",
"ground truth: C \n",
"\n",
"=======end 8=======\n",
"\n",
"=======begin 9=======\n",
"question: 能解释“倒影”形成的是____\n",
"A. 光的色散\n",
"B. 光的折射\n",
"C. 光的反射\n",
"D. 光的直线传播\n",
"答案是什么? \n",
"response: C\n",
"ans: C\n",
"ground truth: C \n",
"\n",
"=======end 9=======\n",
" 53% 10/19 [00:00<00:00, 16.20it/s]\n",
"=======begin 10=======\n",
"question: 中国国家航天局宣布,2018年5月21日5时28分,我国在西昌卫星发射中心用“长征四号丙”运载火箭,成功将“鹊桥号”中继星发射升空,为“嫦娥四号”月球探测任务提供地月间的中继通信,负责地球与未来“嫦娥四号”通信的中继接力。下列说法正确的是____\n",
"A. 中继星与地月间不可能靠电磁波通信\n",
"B. 地球和太阳系中的其他行星起源于不同的星云\n",
"C. 发射当天地球运行在绕太阳公转轨道中的夏至与秋分之间\n",
"D. 月球的自转周期和公转周期相同,人类只能看到月球的正面\n",
"答案是什么? \n",
"response: B\n",
"ans: B\n",
"ground truth: D \n",
"\n",
"=======end 10=======\n",
"\n",
"=======begin 11=======\n",
"question: 生活处处有物理,以下估测最接近生活实际的是____\n",
"A. 宿迁六月份平均气温约为10 °C\n",
"B. 初中生背负沉重的书包上学,书包平均重300 N\n",
"C. 初中生课桌高度约为75 cm\n",
"D. 中考体育考试中某同学50 m短跑成绩约为4 s\n",
"答案是什么? \n",
"response: D\n",
"ans: D\n",
"ground truth: C \n",
"\n",
"=======end 11=======\n",
" 63% 12/19 [00:00<00:00, 16.19it/s]\n",
"=======begin 12=======\n",
"question: 下列实例中,属于增大摩擦的是____\n",
"A. 往自行车轴承中加润滑油\n",
"B. 行李箱下安装滚动轮子\n",
"C. 骑自行车刹车时用力捏闸\n",
"D. 将滑梯的滑道做得光滑\n",
"答案是什么? \n",
"response: B\n",
"ans: B\n",
"ground truth: C \n",
"\n",
"=======end 12=======\n",
"\n",
"=======begin 13=======\n",
"question: 下列做法中,符合安全用电原则的是____\n",
"A. 高压线下钓鱼\n",
"B. 机壳没有接地\n",
"C. 绝缘皮破损\n",
"D. 安装避雷针\n",
"答案是什么? \n",
"response: D\n",
"ans: D\n",
"ground truth: D \n",
"\n",
"=======end 13=======\n",
" 74% 14/19 [00:00<00:00, 16.25it/s]\n",
"=======begin 14=======\n",
"question: 超导现象是指某些物质在温度很低时电阻变为零的现象。如果某种超导材料能应用于实际,最适合用来制作____\n",
"A. 保险丝\n",
"B. 输电导线\n",
"C. 电炉丝\n",
"D. 变阻器的电阻丝\n",
"答案是什么? \n",
"response: B\n",
"ans: B\n",
"ground truth: B \n",
"\n",
"=======end 14=======\n",
"\n",
"=======begin 15=======\n",
"question: 棒球比赛时,向斜上方击球时的情景中,下列有关说法正确的是____\n",
"A. 击球的一瞬间,棒对球的力大于球对棒的力\n",
"B. 球在上升过程中,重力势能转化为动能\n",
"C. 球上升到最高点时,若所受力全部消失,球将做减速直线运动\n",
"D. 球下落过程中速度越来越大,因为重力改变了球的运动状态\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: D \n",
"\n",
"=======end 15=======\n",
" 84% 16/19 [00:00<00:00, 16.29it/s]\n",
"=======begin 16=======\n",
"question: 下列关于力和运动的说法,正确的是____\n",
"A. 物体运动状态发生改变,一定受到力的作用\n",
"B. 行驶的汽车急刹车时,乘客会出现向后倾的现象\n",
"C. 用力推桌子,桌子静止不动,因为推力小于摩擦阻力\n",
"D. 踢出去的足球能在空中飞行,是因为足球没有受到力的作用\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: A \n",
"\n",
"=======end 16=======\n",
"\n",
"=======begin 17=======\n",
"question: 声音可以表达情感,传递信息,对于声现象的理解正确的是____\n",
"A. 教师讲课的声音是由声带振动产生的\n",
"B. “静止鸣笛”是在传播过程中减弱噪音\n",
"C. 声音的振幅越大,音调越高\n",
"D. 只要物体在振动,我们就能听到声音\n",
"答案是什么? \n",
"response: D\n",
"ans: D\n",
"ground truth: A \n",
"\n",
"=======end 17=======\n",
" 95% 18/19 [00:01<00:00, 16.36it/s]\n",
"=======begin 18=======\n",
"question: 为加强校园安全管理,在校内安装监控摄像机,来自物体的光经过摄像机的镜头后形成____\n",
"A. 倒立、缩小的实像\n",
"B. 正立、放大的实像\n",
"C. 倒立、放大的虚像\n",
"D. 正立、缩小的虚像\n",
"答案是什么? \n",
"response: A\n",
"ans: A\n",
"ground truth: A \n",
"\n",
"=======end 18=======\n",
"100% 19/19 [00:01<00:00, 16.27it/s]\n",
"Subject: middle_school_physics\n",
"Acc: 36.8421052631579\n",
"Accuracy:\n",
"law : 25.0\n",
"environmental_impact_assessment_engineer : 45.16129032258065\n",
"middle_school_biology : 47.61904761904762\n",
"college_chemistry : 29.166666666666668\n",
"college_economics : 34.54545454545455\n",
"middle_school_mathematics : 26.31578947368421\n",
"computer_architecture : 28.571428571428573\n",
"high_school_mathematics : 16.666666666666668\n",
"college_programming : 37.83783783783784\n",
"computer_network : 31.57894736842105\n",
"basic_medicine : 47.36842105263158\n",
"urban_and_rural_planner : 36.95652173913044\n",
"logic : 50.0\n",
"civil_servant : 36.170212765957444\n",
"art_studies : 39.39393939393939\n",
"advanced_mathematics : 26.31578947368421\n",
"electrical_engineer : 32.432432432432435\n",
"accountant : 32.6530612244898\n",
"operating_system : 42.10526315789474\n",
"middle_school_politics : 57.142857142857146\n",
"sports_science : 36.8421052631579\n",
"middle_school_chemistry : 30.0\n",
"marxism : 52.63157894736842\n",
"fire_engineer : 25.806451612903224\n",
"middle_school_geography : 8.333333333333334\n",
"high_school_history : 50.0\n",
"professional_tour_guide : 37.93103448275862\n",
"middle_school_history : 31.818181818181817\n",
"modern_chinese_history : 39.130434782608695\n",
"clinical_medicine : 36.36363636363637\n",
"high_school_biology : 52.63157894736842\n",
"high_school_politics : 21.05263157894737\n",
"tax_accountant : 34.69387755102041\n",
"teacher_qualification : 56.81818181818182\n",
"high_school_geography : 26.31578947368421\n",
"high_school_chemistry : 31.57894736842105\n",
"plant_protection : 54.54545454545455\n",
"legal_professional : 39.130434782608695\n",
"high_school_physics : 26.31578947368421\n",
"ideological_and_moral_cultivation : 42.10526315789474\n",
"veterinary_medicine : 39.130434782608695\n",
"physician : 30.612244897959183\n",
"college_physics : 21.05263157894737\n",
"discrete_mathematics : 43.75\n",
"mao_zedong_thought : 58.333333333333336\n",
"education_science : 34.48275862068966\n",
"business_administration : 33.333333333333336\n",
"chinese_language_and_literature : 43.47826086956522\n",
"metrology_engineer : 37.5\n",
"probability_and_statistics : 22.22222222222222\n",
"high_school_chinese : 31.57894736842105\n",
"middle_school_physics : 36.8421052631579\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### 第三步:查看预测结果\n",
"\n",
"运行以下命令查看最终结果,json最后的ALL里会显示,这一次运行结果是:\n",
"```\n",
" \"All\": {\n",
" \"score\": 0.36701337295690933,\n",
" \"num\": 1346,\n",
" \"correct\": 494.0\n",
" }\n",
"```\n",
"\n",
"上述结果与我们论文中汇报的zero-shot 36.7(%)一致。需要注意的是解码存在随机性,如果希望多次运行可将`n_times`改为需要运行的次数。\n"
],
"metadata": {
"id": "6ZW0bynVoP5K"
}
},
{
"cell_type": "code",
"source": [
"!cat ./ceval-output/take0/summary.json"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "W84IQ1RGraet",
"outputId": "79748646-3092-40ca-f980-2ea88078e420"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"{\n",
" \"law\": {\n",
" \"score\": 25.0,\n",
" \"num\": 24,\n",
" \"correct\": 6.0\n",
" },\n",
" \"environmental_impact_assessment_engineer\": {\n",
" \"score\": 45.16129032258065,\n",
" \"num\": 31,\n",
" \"correct\": 14.0\n",
" },\n",
" \"middle_school_biology\": {\n",
" \"score\": 47.61904761904762,\n",
" \"num\": 21,\n",
" \"correct\": 10.0\n",
" },\n",
" \"college_chemistry\": {\n",
" \"score\": 29.166666666666668,\n",
" \"num\": 24,\n",
" \"correct\": 7.0\n",
" },\n",
" \"college_economics\": {\n",
" \"score\": 34.54545454545455,\n",
" \"num\": 55,\n",
" \"correct\": 19.0\n",
" },\n",
" \"middle_school_mathematics\": {\n",
" \"score\": 26.31578947368421,\n",
" \"num\": 19,\n",
" \"correct\": 4.999999999999999\n",
" },\n",
" \"computer_architecture\": {\n",
" \"score\": 28.571428571428573,\n",
" \"num\": 21,\n",
" \"correct\": 6.0\n",
" },\n",
" \"high_school_mathematics\": {\n",
" \"score\": 16.666666666666668,\n",
" \"num\": 18,\n",
" \"correct\": 3.0\n",
" },\n",
" \"college_programming\": {\n",
" \"score\": 37.83783783783784,\n",
" \"num\": 37,\n",
" \"correct\": 14.0\n",
" },\n",
" \"computer_network\": {\n",
" \"score\": 31.57894736842105,\n",
" \"num\": 19,\n",
" \"correct\": 6.0\n",
" },\n",
" \"basic_medicine\": {\n",
" \"score\": 47.36842105263158,\n",
" \"num\": 19,\n",
" \"correct\": 9.000000000000002\n",
" },\n",
" \"urban_and_rural_planner\": {\n",
" \"score\": 36.95652173913044,\n",
" \"num\": 46,\n",
" \"correct\": 17.0\n",
" },\n",
" \"logic\": {\n",
" \"score\": 50.0,\n",
" \"num\": 22,\n",
" \"correct\": 11.0\n",
" },\n",
" \"civil_servant\": {\n",
" \"score\": 36.170212765957444,\n",
" \"num\": 47,\n",
" \"correct\": 17.0\n",
" },\n",
" \"art_studies\": {\n",
" \"score\": 39.39393939393939,\n",
" \"num\": 33,\n",
" \"correct\": 13.0\n",
" },\n",
" \"advanced_mathematics\": {\n",
" \"score\": 26.31578947368421,\n",
" \"num\": 19,\n",
" \"correct\": 4.999999999999999\n",
" },\n",
" \"electrical_engineer\": {\n",
" \"score\": 32.432432432432435,\n",
" \"num\": 37,\n",
" \"correct\": 12.0\n",
" },\n",
" \"accountant\": {\n",
" \"score\": 32.6530612244898,\n",
" \"num\": 49,\n",
" \"correct\": 16.0\n",
" },\n",
" \"operating_system\": {\n",
" \"score\": 42.10526315789474,\n",
" \"num\": 19,\n",
" \"correct\": 8.0\n",
" },\n",
" \"middle_school_politics\": {\n",
" \"score\": 57.142857142857146,\n",
" \"num\": 21,\n",
" \"correct\": 12.0\n",
" },\n",
" \"sports_science\": {\n",
" \"score\": 36.8421052631579,\n",
" \"num\": 19,\n",
" \"correct\": 7.0\n",
" },\n",
" \"middle_school_chemistry\": {\n",
" \"score\": 30.0,\n",
" \"num\": 20,\n",
" \"correct\": 6.0\n",
" },\n",
" \"marxism\": {\n",
" \"score\": 52.63157894736842,\n",
" \"num\": 19,\n",
" \"correct\": 9.999999999999998\n",
" },\n",
" \"fire_engineer\": {\n",
" \"score\": 25.806451612903224,\n",
" \"num\": 31,\n",
" \"correct\": 8.0\n",
" },\n",
" \"middle_school_geography\": {\n",
" \"score\": 8.333333333333334,\n",
" \"num\": 12,\n",
" \"correct\": 1.0\n",
" },\n",
" \"high_school_history\": {\n",
" \"score\": 50.0,\n",
" \"num\": 20,\n",
" \"correct\": 10.0\n",
" },\n",
" \"professional_tour_guide\": {\n",
" \"score\": 37.93103448275862,\n",
" \"num\": 29,\n",
" \"correct\": 11.0\n",
" },\n",
" \"middle_school_history\": {\n",
" \"score\": 31.818181818181817,\n",
" \"num\": 22,\n",
" \"correct\": 7.0\n",
" },\n",
" \"modern_chinese_history\": {\n",
" \"score\": 39.130434782608695,\n",
" \"num\": 23,\n",
" \"correct\": 9.0\n",
" },\n",
" \"clinical_medicine\": {\n",
" \"score\": 36.36363636363637,\n",
" \"num\": 22,\n",
" \"correct\": 8.000000000000002\n",
" },\n",
" \"high_school_biology\": {\n",
" \"score\": 52.63157894736842,\n",
" \"num\": 19,\n",
" \"correct\": 9.999999999999998\n",
" },\n",
" \"high_school_politics\": {\n",
" \"score\": 21.05263157894737,\n",
" \"num\": 19,\n",
" \"correct\": 4.0\n",
" },\n",
" \"tax_accountant\": {\n",
" \"score\": 34.69387755102041,\n",
" \"num\": 49,\n",
" \"correct\": 17.0\n",
" },\n",
" \"teacher_qualification\": {\n",
" \"score\": 56.81818181818182,\n",
" \"num\": 44,\n",
" \"correct\": 25.0\n",
" },\n",
" \"high_school_geography\": {\n",
" \"score\": 26.31578947368421,\n",
" \"num\": 19,\n",
" \"correct\": 4.999999999999999\n",
" },\n",
" \"high_school_chemistry\": {\n",
" \"score\": 31.57894736842105,\n",
" \"num\": 19,\n",
" \"correct\": 6.0\n",
" },\n",
" \"plant_protection\": {\n",
" \"score\": 54.54545454545455,\n",
" \"num\": 22,\n",
" \"correct\": 12.0\n",
" },\n",
" \"legal_professional\": {\n",
" \"score\": 39.130434782608695,\n",
" \"num\": 23,\n",
" \"correct\": 9.0\n",
" },\n",
" \"high_school_physics\": {\n",
" \"score\": 26.31578947368421,\n",
" \"num\": 19,\n",
" \"correct\": 4.999999999999999\n",
" },\n",
" \"ideological_and_moral_cultivation\": {\n",
" \"score\": 42.10526315789474,\n",
" \"num\": 19,\n",
" \"correct\": 8.0\n",
" },\n",
" \"veterinary_medicine\": {\n",
" \"score\": 39.130434782608695,\n",
" \"num\": 23,\n",
" \"correct\": 9.0\n",
" },\n",
" \"physician\": {\n",
" \"score\": 30.612244897959183,\n",
" \"num\": 49,\n",
" \"correct\": 15.0\n",
" },\n",
" \"college_physics\": {\n",
" \"score\": 21.05263157894737,\n",
" \"num\": 19,\n",
" \"correct\": 4.0\n",
" },\n",
" \"discrete_mathematics\": {\n",
" \"score\": 43.75,\n",
" \"num\": 16,\n",
" \"correct\": 7.0\n",
" },\n",
" \"mao_zedong_thought\": {\n",
" \"score\": 58.333333333333336,\n",
" \"num\": 24,\n",
" \"correct\": 14.0\n",
" },\n",
" \"education_science\": {\n",
" \"score\": 34.48275862068966,\n",
" \"num\": 29,\n",
" \"correct\": 10.000000000000002\n",
" },\n",
" \"business_administration\": {\n",
" \"score\": 33.333333333333336,\n",
" \"num\": 33,\n",
" \"correct\": 11.0\n",
" },\n",
" \"chinese_language_and_literature\": {\n",
" \"score\": 43.47826086956522,\n",
" \"num\": 23,\n",
" \"correct\": 10.0\n",
" },\n",
" \"metrology_engineer\": {\n",
" \"score\": 37.5,\n",
" \"num\": 24,\n",
" \"correct\": 9.0\n",
" },\n",
" \"probability_and_statistics\": {\n",
" \"score\": 22.22222222222222,\n",
" \"num\": 18,\n",
" \"correct\": 4.0\n",
" },\n",
" \"high_school_chinese\": {\n",
" \"score\": 31.57894736842105,\n",
" \"num\": 19,\n",
" \"correct\": 6.0\n",
" },\n",
" \"middle_school_physics\": {\n",
" \"score\": 36.8421052631579,\n",
" \"num\": 19,\n",
" \"correct\": 7.0\n",
" },\n",
" \"grouped\": {\n",
" \"STEM\": {\n",
" \"correct\": 143.0,\n",
" \"num\": 430,\n",
" \"score\": 0.3325581395348837\n",
" },\n",
" \"Social Science\": {\n",
" \"correct\": 111.0,\n",
" \"num\": 275,\n",
" \"score\": 0.4036363636363636\n",
" },\n",
" \"Humanities\": {\n",
" \"correct\": 100.0,\n",
" \"num\": 257,\n",
" \"score\": 0.38910505836575876\n",
" },\n",
" \"Other\": {\n",
" \"correct\": 140.0,\n",
" \"num\": 384,\n",
" \"score\": 0.3645833333333333\n",
" }\n",
" },\n",
" \"All\": {\n",
" \"score\": 0.36701337295690933,\n",
" \"num\": 1346,\n",
" \"correct\": 494.0\n",
" }\n",
"}"
]
}
]
}
]
}
================================================
FILE: notebooks/convert_and_quantize_chinese_llama_and_alpaca.ipynb
================================================
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"machine_shape": "hm",
"gpuType": "T4"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# 转换并量化中文LLaMA和Alpaca模型\n",
"\n",
"项目地址:https://github.com/ymcui/Chinese-LLaMA-Alpaca\n",
"\n",
"⚠️ 内存消耗提示(确保刷出来的机器RAM大于以下要求):\n",
"- 7B模型:15G+\n",
"- 13B模型:18G+\n",
"- 33B模型:22G+\n",
"\n",
"💡 提示和小窍门:\n",
"- 免费用户默认的内存只有12G左右,不足以转换模型。**实测选择TPU的话有机会随机出35G内存**,建议多试几次\n",
"- Pro(+)用户请选择 “代码执行程序” -> “更改运行时类型” -> “高RAM”\n",
"- 程序莫名崩掉或断开连接就说明内存爆了\n",
"- 如果选了“高RAM”之后内存还是不够大的话,选择以下操作,有的时候会分配出很高内存的机器,祝你好运😄!\n",
" - 可以把GPU或者TPU也选上(虽然不会用到)\n",
" - 选GPU时,Pro(+)用户可选“A100”类型GPU\n",
"\n",
"*温馨提示:用完之后注意断开运行时,选择满足要求的最低配置即可,避免不必要的计算单元消耗(Pro只给100个计算单元)。*"
],
"metadata": {
"id": "B1c96_k3MahN"
}
},
{
"cell_type": "markdown",
"source": [
"## 安装相关依赖"
],
"metadata": {
"id": "vScqHD_jMFOV"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "E5WKFJXIL6ZU",
"outputId": "a7baeebb-9b74-4d14-93dc-fb1f6e1b3716"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Requirement already satisfied: torch==1.13.1 in /usr/local/lib/python3.10/dist-packages (1.13.1)\n",
"Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch==1.13.1) (4.5.0)\n",
"Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /usr/local/lib/python3.10/dist-packages (from torch==1.13.1) (11.7.99)\n",
"Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /usr/local/lib/python3.10/dist-packages (from torch==1.13.1) (8.5.0.96)\n",
"Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /usr/local/lib/python3.10/dist-packages (from torch==1.13.1) (11.10.3.66)\n",
"Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /usr/local/lib/python3.10/dist-packages (from torch==1.13.1) (11.7.99)\n",
"Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch==1.13.1) (67.7.2)\n",
"Requirement already satisfied: wheel in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch==1.13.1) (0.40.0)\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Requirement already satisfied: transformers==4.30.2 in /usr/local/lib/python3.10/dist-packages (4.30.2)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.30.2) (3.12.0)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.14.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.30.2) (0.15.1)\n",
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.30.2) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers==4.30.2) (23.1)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.30.2) (6.0)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.30.2) (2022.10.31)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.30.2) (2.27.1)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.30.2) (0.13.3)\n",
"Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.30.2) (0.3.1)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.30.2) (4.65.0)\n",
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.30.2) (2023.4.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers==4.30.2) (4.5.0)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.30.2) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.30.2) (2022.12.7)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.30.2) (2.0.12)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.30.2) (3.4)\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting peft==0.3.0\n",
" Downloading peft-0.3.0-py3-none-any.whl (56 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.8/56.8 kB\u001b[0m \u001b[31m2.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0) (23.1)\n",
"Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0) (5.9.5)\n",
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0) (6.0)\n",
"Requirement already satisfied: torch>=1.13.0 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0) (1.13.1)\n",
"Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0) (4.30.2)\n",
"Collecting accelerate (from peft==0.3.0)\n",
" Downloading accelerate-0.20.3-py3-none-any.whl (227 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m227.6/227.6 kB\u001b[0m \u001b[31m10.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0) (4.5.0)\n",
"Requirement already satisfied: nvidia-cuda-runtime-cu11==11.7.99 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0) (11.7.99)\n",
"Requirement already satisfied: nvidia-cudnn-cu11==8.5.0.96 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0) (8.5.0.96)\n",
"Requirement already satisfied: nvidia-cublas-cu11==11.10.3.66 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0) (11.10.3.66)\n",
"Requirement already satisfied: nvidia-cuda-nvrtc-cu11==11.7.99 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0) (11.7.99)\n",
"Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.13.0->peft==0.3.0) (67.7.2)\n",
"Requirement already satisfied: wheel in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.13.0->peft==0.3.0) (0.40.0)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0) (3.12.0)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.14.1 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0) (0.15.1)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0) (2022.10.31)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0) (2.27.1)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0) (0.13.3)\n",
"Requirement already satisfied: safetensors>=0.3.1 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0) (0.3.1)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0) (4.65.0)\n",
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.14.1->transformers->peft==0.3.0) (2023.4.0)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0) (2022.12.7)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0) (2.0.12)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0) (3.4)\n",
"Installing collected packages: accelerate, peft\n",
"Successfully installed accelerate-0.20.3 peft-0.3.0\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Requirement already satisfied: sentencepiece in /usr/local/lib/python3.10/dist-packages (0.1.99)\n"
]
}
],
"source": [
"!pip install torch==1.13.1\n",
"!pip install transformers==4.30.2\n",
"!pip install peft==0.3.0\n",
"!pip install sentencepiece"
]
},
{
"cell_type": "markdown",
"source": [
"## 克隆目录和代码"
],
"metadata": {
"id": "ygb1xFIMNQKw"
}
},
{
"cell_type": "code",
"source": [
"!git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca\n",
"!git clone https://github.com/ggerganov/llama.cpp"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yCEJh7NJNXz9",
"outputId": "bfa34a83-a8b9-4e24-e956-83c7313eb448"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Cloning into 'Chinese-LLaMA-Alpaca'...\n",
"remote: Enumerating objects: 1407, done.\u001b[K\n",
"remote: Counting objects: 100% (599/599), done.\u001b[K\n",
"remote: Compressing objects: 100% (257/257), done.\u001b[K\n",
"remote: Total 1407 (delta 369), reused 494 (delta 338), pack-reused 808\u001b[K\n",
"Receiving objects: 100% (1407/1407), 22.61 MiB | 27.14 MiB/s, done.\n",
"Resolving deltas: 100% (831/831), done.\n",
"Cloning into 'llama.cpp'...\n",
"remote: Enumerating objects: 3618, done.\u001b[K\n",
"remote: Counting objects: 100% (1155/1155), done.\u001b[K\n",
"remote: Compressing objects: 100% (124/124), done.\u001b[K\n",
"remote: Total 3618 (delta 1076), reused 1036 (delta 1031), pack-reused 2463\u001b[K\n",
"Receiving objects: 100% (3618/3618), 3.28 MiB | 21.36 MiB/s, done.\n",
"Resolving deltas: 100% (2424/2424), done.\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## 合并模型(以Alpaca-7B为例)\n",
"\n",
"此处使用的是🤗模型库中提供的基模型(已是HF格式),而不是Facebook官方的LLaMA模型,因此略去将原版LLaMA转换为HF格式的步骤。\n",
"**这里直接运行第二步:合并LoRA权重**,生成全量模型权重。可以直接指定🤗模型库的地址,也可以是本地存放地址。\n",
"- 基模型:`elinas/llama-7b-hf-transformers-4.29` *(use at your own risk,我们比对过SHA256和正版一致,但你应确保自己有权使用该模型)*\n",
"- LoRA模型:`ziqingyang/chinese-alpaca-lora-7b`\n",
" - 如果是Alpaca-Plus模型,记得要同时传入llama和alpaca的lora,教程:[这里](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换#多lora权重合并适用于chinese-alpaca-plus)\n",
"- 输出格式:可选pth或者huggingface,这里选择pth,因为后面要用llama.cpp量化\n",
"\n",
"由于要下载模型,所以需要耐心等待一下,尤其是33B模型。\n",
"转换好的模型存放在`alpaca-combined`目录。\n",
"如果你不需要量化模型,那么到这一步就结束了,可自行下载或者转存到Google Drive。"
],
"metadata": {
"id": "nIyxX0DSNsgQ"
}
},
{
"cell_type": "code",
"source": [
"!python ./Chinese-LLaMA-Alpaca/scripts/merge_llama_with_chinese_lora_low_mem.py \\\n",
" --base_model 'elinas/llama-7b-hf-transformers-4.29' \\\n",
" --lora_model 'ziqingyang/chinese-alpaca-lora-7b' \\\n",
" --output_type pth \\\n",
" --output_dir alpaca-combined"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5AV4EW5hNhVV",
"outputId": "5cb36099-4ca1-403e-c6b5-c8c8441eaa11"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Base model: elinas/llama-7b-hf-transformers-4.29\n",
"LoRA model(s) ['ziqingyang/chinese-alpaca-lora-7b']:\n",
"Loading ziqingyang/chinese-alpaca-lora-7b\n",
"Cannot find lora model on the disk. Downloading lora model from hub...\n",
"Fetching 7 files: 0% 0/7 [00:00, ?it/s]\n",
"Downloading (…)c39d6ac454/README.md: 100% 316/316 [00:00<00:00, 1.93MB/s]\n",
"\n",
"Downloading (…)/adapter_config.json: 100% 472/472 [00:00<00:00, 3.48MB/s]\n",
"\n",
"Downloading (…)cial_tokens_map.json: 100% 96.0/96.0 [00:00<00:00, 661kB/s]\n",
"\n",
"Downloading (…)ac454/.gitattributes: 100% 1.48k/1.48k [00:00<00:00, 7.92MB/s]\n",
"Fetching 7 files: 14% 1/7 [00:00<00:00, 6.42it/s]\n",
"Downloading (…)okenizer_config.json: 100% 166/166 [00:00<00:00, 804kB/s]\n",
"\n",
"Downloading tokenizer.model: 0% 0.00/758k [00:00, ?B/s]\u001b[A\n",
"\n",
"Downloading tokenizer.model: 100% 758k/758k [00:00<00:00, 15.6MB/s]\n",
"\n",
"\n",
"Downloading adapter_model.bin: 1% 10.5M/858M [00:00<00:12, 66.0MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 2% 21.0M/858M [00:00<00:11, 75.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 4% 31.5M/858M [00:00<00:10, 78.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 5% 41.9M/858M [00:00<00:10, 80.0MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 6% 52.4M/858M [00:00<00:10, 78.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 7% 62.9M/858M [00:00<00:10, 77.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 9% 73.4M/858M [00:00<00:09, 80.0MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 10% 83.9M/858M [00:01<00:09, 78.4MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 11% 94.4M/858M [00:01<00:09, 80.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 12% 105M/858M [00:01<00:09, 82.9MB/s] \u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 13% 115M/858M [00:01<00:08, 82.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 15% 126M/858M [00:01<00:08, 84.1MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 16% 136M/858M [00:01<00:09, 78.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 17% 147M/858M [00:01<00:09, 78.4MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 18% 157M/858M [00:01<00:08, 80.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 20% 168M/858M [00:02<00:08, 78.1MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 21% 178M/858M [00:02<00:08, 81.4MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 22% 189M/858M [00:02<00:08, 78.1MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 23% 199M/858M [00:02<00:08, 78.7MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 24% 210M/858M [00:02<00:07, 81.3MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 26% 220M/858M [00:02<00:07, 81.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 27% 231M/858M [00:02<00:07, 82.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 28% 241M/858M [00:03<00:07, 82.3MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 29% 252M/858M [00:03<00:07, 83.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 31% 262M/858M [00:03<00:07, 80.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 32% 273M/858M [00:03<00:07, 82.3MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 33% 283M/858M [00:03<00:06, 82.3MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 34% 294M/858M [00:03<00:06, 84.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 35% 304M/858M [00:03<00:06, 82.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 37% 315M/858M [00:03<00:06, 79.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 38% 325M/858M [00:04<00:06, 82.7MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 39% 336M/858M [00:04<00:06, 84.1MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 40% 346M/858M [00:04<00:05, 85.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 42% 357M/858M [00:04<00:05, 86.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 43% 367M/858M [00:04<00:05, 87.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 44% 377M/858M [00:04<00:05, 87.3MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 45% 388M/858M [00:04<00:05, 88.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 46% 398M/858M [00:04<00:05, 85.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 48% 409M/858M [00:04<00:05, 85.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 49% 419M/858M [00:05<00:05, 85.7MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 50% 430M/858M [00:05<00:05, 84.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 51% 440M/858M [00:05<00:04, 85.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 53% 451M/858M [00:05<00:04, 86.3MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 54% 461M/858M [00:05<00:04, 87.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 55% 472M/858M [00:05<00:04, 87.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 56% 482M/858M [00:05<00:04, 85.3MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 57% 493M/858M [00:05<00:04, 84.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 59% 503M/858M [00:06<00:04, 85.3MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 60% 514M/858M [00:06<00:04, 85.1MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 61% 524M/858M [00:06<00:03, 85.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 62% 535M/858M [00:06<00:03, 86.7MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 64% 545M/858M [00:06<00:03, 87.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 65% 556M/858M [00:06<00:03, 89.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 66% 566M/858M [00:06<00:03, 89.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 67% 577M/858M [00:06<00:03, 89.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 68% 587M/858M [00:07<00:03, 85.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 70% 598M/858M [00:07<00:03, 86.4MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 71% 608M/858M [00:07<00:02, 88.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 72% 619M/858M [00:07<00:02, 89.0MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 73% 629M/858M [00:07<00:02, 85.7MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 75% 640M/858M [00:07<00:02, 81.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 76% 650M/858M [00:07<00:02, 80.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 77% 661M/858M [00:07<00:02, 81.7MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 78% 671M/858M [00:08<00:02, 82.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 79% 682M/858M [00:08<00:02, 76.3MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 81% 692M/858M [00:08<00:02, 79.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 82% 703M/858M [00:08<00:01, 80.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 83% 713M/858M [00:08<00:01, 80.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 84% 724M/858M [00:08<00:01, 83.1MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 85% 734M/858M [00:08<00:01, 85.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 87% 744M/858M [00:08<00:01, 84.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 88% 755M/858M [00:09<00:01, 83.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 89% 765M/858M [00:09<00:01, 81.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 90% 776M/858M [00:09<00:00, 85.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 92% 786M/858M [00:09<00:00, 86.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 93% 797M/858M [00:09<00:00, 85.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 94% 807M/858M [00:09<00:00, 87.7MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 95% 818M/858M [00:09<00:00, 86.0MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 96% 828M/858M [00:09<00:00, 87.4MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 98% 839M/858M [00:10<00:00, 86.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 99% 849M/858M [00:10<00:00, 82.4MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading adapter_model.bin: 100% 858M/858M [00:10<00:00, 83.3MB/s]\n",
"Fetching 7 files: 100% 7/7 [00:10<00:00, 1.50s/it]\n",
"Cannot find lora model on the disk. Downloading lora model from hub...\n",
"Fetching 11 files: 0% 0/11 [00:00, ?it/s]\n",
"Downloading (…)a55fdb7a/config.json: 100% 507/507 [00:00<00:00, 2.66MB/s]\n",
"\n",
"Downloading (…)l-00001-of-00002.bin: 0% 0.00/9.98G [00:00, ?B/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 0% 0.00/3.50G [00:00, ?B/s]\u001b[A\u001b[A\n",
"\n",
"\n",
"Downloading (…)neration_config.json: 100% 137/137 [00:00<00:00, 814kB/s]\n",
"\n",
"\n",
"\n",
"Downloading (…)fdb7a/.gitattributes: 100% 1.48k/1.48k [00:00<00:00, 11.4MB/s]\n",
"Fetching 11 files: 9% 1/11 [00:00<00:01, 9.05it/s]\n",
"\n",
"\n",
"Downloading (…)cial_tokens_map.json: 100% 411/411 [00:00<00:00, 1.57MB/s]\n",
"\n",
"\n",
"\n",
"Downloading (…)82a55fdb7a/README.md: 100% 8.34k/8.34k [00:00<00:00, 24.1MB/s]\n",
"\n",
"Downloading (…)l-00001-of-00002.bin: 0% 10.5M/9.98G [00:00<01:36, 103MB/s]\u001b[A\n",
"\n",
"\n",
"Downloading tokenizer.model: 0% 0.00/500k [00:00, ?B/s]\u001b[A\u001b[A\u001b[A\n",
"\n",
"\n",
"\n",
"Downloading (…)fdb7a/tokenizer.json: 0% 0.00/1.84M [00:00, ?B/s]\u001b[A\u001b[A\u001b[A\u001b[A\n",
"\n",
"Downloading tokenizer.model: 100% 500k/500k [00:00<00:00, 11.7MB/s]\n",
"Downloading (…)fdb7a/tokenizer.json: 100% 1.84M/1.84M [00:00<00:00, 28.3MB/s]\n",
"\n",
"Downloading (…)l-00001-of-00002.bin: 0% 41.9M/9.98G [00:00<00:55, 180MB/s]\u001b[A\n",
"\n",
"\n",
"Downloading (…)model.bin.index.json: 100% 26.8k/26.8k [00:00<00:00, 53.1MB/s]\n",
"\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 1% 21.0M/3.50G [00:00<00:44, 78.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 1% 73.4M/9.98G [00:00<00:50, 197MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 1% 31.5M/3.50G [00:00<00:47, 73.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 1% 94.4M/9.98G [00:00<00:49, 200MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 1% 41.9M/3.50G [00:00<00:44, 77.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 1% 115M/9.98G [00:00<00:49, 200MB/s] \u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 1% 52.4M/3.50G [00:00<00:42, 80.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 1% 136M/9.98G [00:00<00:49, 200MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 2% 62.9M/3.50G [00:00<00:42, 81.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 2% 168M/9.98G [00:00<00:47, 206MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 2% 73.4M/3.50G [00:00<00:40, 83.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 2% 199M/9.98G [00:00<00:46, 210MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 2% 83.9M/3.50G [00:01<00:39, 85.9MB/s]\u001b[A\u001b[A\n",
"\n",
"\n",
"Downloading (…)okenizer_config.json: 100% 727/727 [00:00<00:00, 3.07MB/s]\n",
"\n",
"Downloading (…)l-00001-of-00002.bin: 2% 231M/9.98G [00:01<00:45, 212MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 3% 94.4M/3.50G [00:01<00:40, 83.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 3% 262M/9.98G [00:01<00:45, 212MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 3% 105M/3.50G [00:01<00:42, 80.7MB/s] \u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 3% 115M/3.50G [00:01<00:40, 83.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 3% 294M/9.98G [00:01<00:45, 214MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 4% 126M/3.50G [00:01<00:40, 84.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 3% 325M/9.98G [00:01<00:44, 216MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 4% 136M/3.50G [00:01<00:40, 83.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 4% 357M/9.98G [00:01<00:44, 216MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 4% 147M/3.50G [00:01<00:40, 82.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 4% 388M/9.98G [00:01<00:44, 215MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 4% 157M/3.50G [00:01<00:39, 83.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 4% 419M/9.98G [00:02<00:44, 213MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 5% 168M/3.50G [00:02<00:40, 82.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 5% 451M/9.98G [00:02<00:44, 214MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 5% 178M/3.50G [00:02<00:40, 82.3MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 5% 189M/3.50G [00:02<00:40, 81.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 5% 482M/9.98G [00:02<00:44, 211MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 6% 199M/3.50G [00:02<00:40, 81.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 5% 514M/9.98G [00:02<00:44, 214MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 6% 210M/3.50G [00:02<00:41, 80.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 5% 545M/9.98G [00:02<00:42, 220MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 6% 220M/3.50G [00:02<00:41, 78.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 6% 577M/9.98G [00:02<00:42, 220MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 7% 231M/3.50G [00:02<00:41, 79.1MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 7% 241M/3.50G [00:02<00:40, 79.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 7% 252M/3.50G [00:03<00:40, 79.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 6% 608M/9.98G [00:03<01:04, 145MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 6% 629M/9.98G [00:03<01:01, 153MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 7% 262M/3.50G [00:03<00:41, 77.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 7% 650M/9.98G [00:03<00:57, 161MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 8% 273M/3.50G [00:03<00:40, 79.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 7% 671M/9.98G [00:03<00:54, 171MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 8% 283M/3.50G [00:03<00:41, 77.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 7% 692M/9.98G [00:03<00:53, 174MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 8% 294M/3.50G [00:03<00:40, 78.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 7% 713M/9.98G [00:03<00:53, 172MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 9% 304M/3.50G [00:03<00:39, 81.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 7% 734M/9.98G [00:03<00:56, 165MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 9% 315M/3.50G [00:03<00:39, 80.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 8% 755M/9.98G [00:03<00:57, 161MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 9% 325M/3.50G [00:04<00:38, 82.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 8% 776M/9.98G [00:04<00:59, 154MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 10% 336M/3.50G [00:04<00:38, 83.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 8% 797M/9.98G [00:04<00:56, 164MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 10% 346M/3.50G [00:04<00:37, 84.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 10% 357M/3.50G [00:04<00:36, 85.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 8% 818M/9.98G [00:04<01:04, 143MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 10% 367M/3.50G [00:07<04:22, 11.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 11% 377M/3.50G [00:07<03:34, 14.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 8% 839M/9.98G [00:07<07:14, 21.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 11% 398M/3.50G [00:07<02:06, 24.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 9% 860M/9.98G [00:07<05:26, 27.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 12% 409M/3.50G [00:07<01:43, 30.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 9% 881M/9.98G [00:07<04:15, 35.7MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 12% 419M/3.50G [00:07<01:27, 35.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 12% 430M/3.50G [00:07<01:13, 41.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 9% 902M/9.98G [00:07<03:23, 44.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 13% 440M/3.50G [00:08<01:02, 49.0MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 13% 451M/3.50G [00:08<00:54, 56.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 9% 923M/9.98G [00:08<02:48, 53.7MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 13% 461M/3.50G [00:08<00:48, 62.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 9% 944M/9.98G [00:08<02:23, 63.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 13% 472M/3.50G [00:08<00:46, 65.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 14% 482M/3.50G [00:08<00:43, 69.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 10% 965M/9.98G [00:08<02:05, 72.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 14% 493M/3.50G [00:08<00:41, 71.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 10% 986M/9.98G [00:08<01:55, 77.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 14% 503M/3.50G [00:08<00:39, 76.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 10% 996M/9.98G [00:08<01:52, 79.7MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 15% 514M/3.50G [00:08<00:38, 77.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 10% 1.01G/9.98G [00:09<01:48, 82.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 15% 524M/3.50G [00:09<00:38, 76.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 10% 1.02G/9.98G [00:09<01:46, 84.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 15% 535M/3.50G [00:09<00:38, 77.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 10% 1.03G/9.98G [00:09<01:42, 87.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 16% 545M/3.50G [00:09<00:36, 81.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 10% 1.04G/9.98G [00:09<01:39, 89.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 16% 556M/3.50G [00:09<00:36, 79.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 11% 1.06G/9.98G [00:09<01:26, 104MB/s] \u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 16% 566M/3.50G [00:09<00:37, 79.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 11% 1.08G/9.98G [00:09<01:13, 121MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 16% 577M/3.50G [00:09<00:36, 79.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 11% 1.11G/9.98G [00:09<00:59, 150MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 17% 587M/3.50G [00:09<00:35, 82.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 11% 1.14G/9.98G [00:12<05:28, 26.9MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 12% 1.16G/9.98G [00:12<04:13, 34.7MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 17% 598M/3.50G [00:12<04:14, 11.4MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 17% 608M/3.50G [00:12<03:09, 15.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 12% 1.18G/9.98G [00:12<03:37, 40.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 18% 629M/3.50G [00:12<01:50, 25.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 19% 650M/3.50G [00:13<01:12, 39.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 12% 1.21G/9.98G [00:13<03:04, 47.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 19% 682M/3.50G [00:13<00:44, 63.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 12% 1.22G/9.98G [00:13<02:49, 51.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 20% 713M/3.50G [00:13<00:31, 88.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 12% 1.23G/9.98G [00:13<02:37, 55.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 21% 734M/3.50G [00:13<00:27, 101MB/s] \u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 12% 1.24G/9.98G [00:13<02:25, 60.0MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 13% 1.25G/9.98G [00:13<02:15, 64.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 22% 755M/3.50G [00:13<00:30, 90.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 13% 1.26G/9.98G [00:13<02:09, 67.4MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 13% 1.27G/9.98G [00:13<02:06, 68.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 22% 776M/3.50G [00:14<00:32, 83.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 13% 1.28G/9.98G [00:14<02:03, 70.6MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 13% 1.29G/9.98G [00:14<01:57, 73.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 23% 797M/3.50G [00:14<00:30, 87.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 13% 1.30G/9.98G [00:14<01:51, 77.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 23% 818M/3.50G [00:14<00:26, 102MB/s] \u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 13% 1.31G/9.98G [00:14<01:54, 76.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 24% 839M/3.50G [00:14<00:23, 115MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 25% 860M/3.50G [00:14<00:24, 108MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 13% 1.32G/9.98G [00:14<02:42, 53.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 25% 881M/3.50G [00:14<00:21, 123MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 13% 1.33G/9.98G [00:14<02:30, 57.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 26% 902M/3.50G [00:14<00:19, 135MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 26% 923M/3.50G [00:15<00:17, 148MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 14% 1.35G/9.98G [00:15<01:53, 75.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 27% 944M/3.50G [00:15<00:16, 158MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 14% 1.37G/9.98G [00:15<01:37, 88.7MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 28% 965M/3.50G [00:15<00:15, 164MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 14% 1.38G/9.98G [00:15<01:40, 85.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 28% 986M/3.50G [00:15<00:15, 167MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 29% 1.01G/3.50G [00:15<00:14, 171MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 14% 1.39G/9.98G [00:15<01:40, 85.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 29% 1.03G/3.50G [00:15<00:13, 177MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 14% 1.41G/9.98G [00:15<01:37, 87.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 30% 1.05G/3.50G [00:15<00:13, 176MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 14% 1.42G/9.98G [00:15<01:38, 86.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 31% 1.07G/3.50G [00:15<00:13, 181MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 14% 1.43G/9.98G [00:15<01:40, 85.3MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 14% 1.44G/9.98G [00:16<01:39, 86.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 31% 1.09G/3.50G [00:16<00:15, 160MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 15% 1.45G/9.98G [00:16<01:39, 85.9MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 15% 1.46G/9.98G [00:16<01:38, 86.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 32% 1.11G/3.50G [00:16<00:18, 129MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 15% 1.47G/9.98G [00:16<01:53, 75.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 32% 1.13G/3.50G [00:16<00:21, 110MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 15% 1.48G/9.98G [00:16<01:54, 74.0MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 15% 1.49G/9.98G [00:16<01:50, 76.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 33% 1.15G/3.50G [00:16<00:23, 101MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 15% 1.50G/9.98G [00:16<01:45, 80.4MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 15% 1.51G/9.98G [00:16<01:42, 82.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 34% 1.17G/3.50G [00:17<00:24, 96.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 15% 1.52G/9.98G [00:17<01:41, 83.5MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 15% 1.53G/9.98G [00:17<01:41, 83.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 34% 1.20G/3.50G [00:17<00:24, 93.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 15% 1.54G/9.98G [00:17<01:39, 84.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 34% 1.21G/3.50G [00:17<00:25, 90.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 16% 1.55G/9.98G [00:17<01:39, 84.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 35% 1.22G/3.50G [00:17<00:25, 90.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 16% 1.56G/9.98G [00:17<01:40, 83.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 35% 1.23G/3.50G [00:17<00:26, 85.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 16% 1.57G/9.98G [00:17<01:39, 84.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 35% 1.24G/3.50G [00:17<00:26, 85.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 16% 1.58G/9.98G [00:17<01:38, 85.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 36% 1.25G/3.50G [00:17<00:26, 86.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 16% 1.59G/9.98G [00:17<01:37, 85.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 36% 1.26G/3.50G [00:18<00:25, 88.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 16% 1.60G/9.98G [00:18<01:38, 85.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 36% 1.27G/3.50G [00:18<00:25, 88.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 16% 1.61G/9.98G [00:18<01:39, 83.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 37% 1.28G/3.50G [00:18<00:25, 88.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 16% 1.63G/9.98G [00:18<01:38, 85.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 37% 1.29G/3.50G [00:18<00:24, 88.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 16% 1.64G/9.98G [00:18<01:39, 83.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 37% 1.30G/3.50G [00:18<00:25, 88.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 17% 1.65G/9.98G [00:18<01:36, 85.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 37% 1.31G/3.50G [00:18<00:24, 88.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 17% 1.66G/9.98G [00:18<01:38, 84.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 38% 1.32G/3.50G [00:18<00:25, 86.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 17% 1.67G/9.98G [00:18<01:37, 85.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 38% 1.33G/3.50G [00:18<00:24, 87.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 17% 1.68G/9.98G [00:18<01:37, 85.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 38% 1.34G/3.50G [00:18<00:25, 84.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 17% 1.69G/9.98G [00:19<01:38, 84.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 39% 1.35G/3.50G [00:19<00:25, 85.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 17% 1.70G/9.98G [00:19<01:38, 84.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 39% 1.36G/3.50G [00:19<00:25, 83.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 17% 1.71G/9.98G [00:19<01:37, 84.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 39% 1.37G/3.50G [00:19<00:26, 79.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 17% 1.72G/9.98G [00:19<01:41, 81.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 40% 1.38G/3.50G [00:19<00:26, 81.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 17% 1.73G/9.98G [00:19<01:39, 82.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 40% 1.39G/3.50G [00:19<00:25, 82.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 17% 1.74G/9.98G [00:19<01:37, 84.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 40% 1.41G/3.50G [00:19<00:25, 81.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 18% 1.75G/9.98G [00:19<01:35, 86.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 40% 1.42G/3.50G [00:19<00:24, 83.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 18% 1.76G/9.98G [00:19<01:33, 87.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 41% 1.43G/3.50G [00:19<00:24, 84.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 18% 1.77G/9.98G [00:20<01:37, 84.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 41% 1.44G/3.50G [00:20<00:24, 84.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 18% 1.78G/9.98G [00:20<01:36, 85.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 41% 1.45G/3.50G [00:20<00:23, 85.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 18% 1.79G/9.98G [00:20<01:34, 86.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 42% 1.46G/3.50G [00:20<00:23, 86.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 18% 1.80G/9.98G [00:20<01:34, 86.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 42% 1.47G/3.50G [00:20<00:23, 85.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 18% 1.81G/9.98G [00:20<01:34, 86.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 42% 1.48G/3.50G [00:20<00:23, 84.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 18% 1.82G/9.98G [00:20<01:36, 84.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 43% 1.49G/3.50G [00:20<00:24, 83.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 18% 1.84G/9.98G [00:20<01:53, 72.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 43% 1.50G/3.50G [00:20<00:24, 83.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 18% 1.85G/9.98G [00:20<01:49, 74.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 43% 1.51G/3.50G [00:20<00:24, 81.0MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 43% 1.52G/3.50G [00:21<00:23, 83.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 19% 1.86G/9.98G [00:21<01:54, 70.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 44% 1.53G/3.50G [00:21<00:23, 83.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 19% 1.87G/9.98G [00:21<01:47, 75.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 44% 1.54G/3.50G [00:21<00:23, 84.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 19% 1.88G/9.98G [00:21<01:53, 71.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 44% 1.55G/3.50G [00:21<00:22, 87.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 19% 1.89G/9.98G [00:21<01:47, 75.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 45% 1.56G/3.50G [00:21<00:22, 88.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 19% 1.90G/9.98G [00:21<01:41, 79.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 45% 1.57G/3.50G [00:21<00:22, 87.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 19% 1.91G/9.98G [00:21<01:38, 82.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 45% 1.58G/3.50G [00:21<00:22, 83.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 19% 1.92G/9.98G [00:21<01:39, 81.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 46% 1.59G/3.50G [00:21<00:23, 80.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 19% 1.93G/9.98G [00:22<01:47, 74.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 46% 1.60G/3.50G [00:22<00:23, 82.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 19% 1.94G/9.98G [00:22<01:45, 76.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 46% 1.61G/3.50G [00:22<00:22, 84.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 20% 1.95G/9.98G [00:22<01:42, 78.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 46% 1.63G/3.50G [00:22<00:22, 84.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 20% 1.96G/9.98G [00:22<01:41, 78.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 47% 1.64G/3.50G [00:22<00:23, 80.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 20% 1.97G/9.98G [00:22<01:40, 79.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 47% 1.65G/3.50G [00:22<00:22, 83.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 20% 1.98G/9.98G [00:22<01:39, 80.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 47% 1.66G/3.50G [00:22<00:21, 84.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 20% 1.99G/9.98G [00:22<01:37, 81.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 48% 1.67G/3.50G [00:22<00:21, 84.6MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 48% 1.68G/3.50G [00:22<00:22, 82.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 20% 2.00G/9.98G [00:23<01:41, 78.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 48% 1.69G/3.50G [00:23<00:21, 84.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 20% 2.01G/9.98G [00:23<01:46, 74.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 49% 1.70G/3.50G [00:23<00:22, 81.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 20% 2.02G/9.98G [00:23<01:48, 73.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 49% 1.71G/3.50G [00:23<00:22, 80.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 49% 1.72G/3.50G [00:23<00:21, 81.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 20% 2.03G/9.98G [00:23<02:03, 64.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 49% 1.73G/3.50G [00:23<00:21, 83.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 20% 2.04G/9.98G [00:23<01:53, 69.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 50% 1.74G/3.50G [00:23<00:20, 84.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 21% 2.06G/9.98G [00:23<01:50, 71.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 50% 1.75G/3.50G [00:23<00:20, 85.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 21% 2.07G/9.98G [00:23<01:49, 72.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 50% 1.76G/3.50G [00:23<00:19, 87.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 21% 2.08G/9.98G [00:24<01:45, 75.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 51% 1.77G/3.50G [00:24<00:20, 85.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 21% 2.09G/9.98G [00:24<01:42, 77.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 51% 1.78G/3.50G [00:24<00:20, 85.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 21% 2.10G/9.98G [00:24<01:50, 71.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 51% 1.79G/3.50G [00:24<00:19, 85.4MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 52% 1.80G/3.50G [00:24<00:19, 87.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 21% 2.11G/9.98G [00:24<01:45, 74.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 52% 1.81G/3.50G [00:24<00:19, 85.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 21% 2.12G/9.98G [00:24<01:43, 75.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 52% 1.82G/3.50G [00:24<00:19, 84.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 21% 2.13G/9.98G [00:24<01:47, 73.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 52% 1.84G/3.50G [00:24<00:19, 83.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 21% 2.14G/9.98G [00:24<01:42, 76.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 53% 1.85G/3.50G [00:24<00:20, 81.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 22% 2.15G/9.98G [00:25<01:45, 73.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 53% 1.86G/3.50G [00:25<00:20, 81.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 22% 2.16G/9.98G [00:25<01:42, 76.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 53% 1.87G/3.50G [00:25<00:20, 80.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 22% 2.17G/9.98G [00:25<01:40, 77.7MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 54% 1.88G/3.50G [00:25<00:19, 82.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 22% 2.18G/9.98G [00:25<01:38, 79.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 54% 1.89G/3.50G [00:25<00:19, 84.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 22% 2.19G/9.98G [00:25<01:38, 79.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 54% 1.90G/3.50G [00:25<00:18, 85.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 22% 2.20G/9.98G [00:25<01:39, 78.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 55% 1.91G/3.50G [00:25<00:19, 82.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 22% 2.21G/9.98G [00:25<01:41, 76.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 55% 1.92G/3.50G [00:25<00:18, 84.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 22% 2.22G/9.98G [00:25<01:40, 77.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 55% 1.93G/3.50G [00:25<00:18, 85.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 22% 2.23G/9.98G [00:26<01:36, 79.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 55% 1.94G/3.50G [00:26<00:19, 81.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 22% 2.24G/9.98G [00:26<01:36, 80.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 56% 1.95G/3.50G [00:26<00:19, 81.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 23% 2.25G/9.98G [00:26<01:34, 81.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 56% 1.96G/3.50G [00:26<00:18, 82.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 23% 2.26G/9.98G [00:26<01:36, 79.8MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 56% 1.97G/3.50G [00:26<00:18, 82.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 23% 2.28G/9.98G [00:26<01:35, 80.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 57% 1.98G/3.50G [00:26<00:18, 82.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 23% 2.29G/9.98G [00:26<01:38, 78.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 57% 1.99G/3.50G [00:26<00:18, 81.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 23% 2.30G/9.98G [00:26<01:35, 80.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 57% 2.00G/3.50G [00:26<00:17, 83.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 58% 2.01G/3.50G [00:26<00:17, 84.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 23% 2.31G/9.98G [00:27<01:39, 77.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 58% 2.02G/3.50G [00:27<00:17, 82.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 23% 2.32G/9.98G [00:27<01:43, 74.1MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 58% 2.03G/3.50G [00:27<00:18, 77.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 23% 2.33G/9.98G [00:27<01:49, 69.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 58% 2.04G/3.50G [00:27<00:19, 76.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 23% 2.34G/9.98G [00:27<01:46, 71.5MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 59% 2.06G/3.50G [00:27<00:18, 79.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 24% 2.35G/9.98G [00:27<01:37, 77.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 59% 2.07G/3.50G [00:27<00:17, 82.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 24% 2.37G/9.98G [00:27<01:14, 102MB/s] \u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 59% 2.08G/3.50G [00:27<00:17, 81.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 24% 2.39G/9.98G [00:27<01:00, 125MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 60% 2.09G/3.50G [00:27<00:17, 81.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 24% 2.41G/9.98G [00:27<00:52, 145MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 24% 2.43G/9.98G [00:28<00:46, 162MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 60% 2.10G/3.50G [00:28<00:16, 83.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 25% 2.45G/9.98G [00:28<00:43, 173MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 60% 2.11G/3.50G [00:28<00:16, 85.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 25% 2.47G/9.98G [00:28<00:42, 175MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 25% 2.51G/9.98G [00:28<00:39, 190MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 61% 2.12G/3.50G [00:28<00:24, 56.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 25% 2.54G/9.98G [00:28<00:37, 201MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 61% 2.14G/3.50G [00:28<00:16, 82.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 26% 2.57G/9.98G [00:28<00:36, 205MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 26% 2.59G/9.98G [00:28<00:35, 205MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 62% 2.16G/3.50G [00:28<00:14, 90.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 26% 2.62G/9.98G [00:28<00:34, 212MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 62% 2.17G/3.50G [00:28<00:14, 88.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 62% 2.18G/3.50G [00:29<00:14, 89.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 27% 2.65G/9.98G [00:29<00:34, 214MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 63% 2.19G/3.50G [00:29<00:14, 87.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 27% 2.68G/9.98G [00:29<00:36, 201MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 63% 2.20G/3.50G [00:29<00:14, 86.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 27% 2.71G/9.98G [00:29<00:39, 183MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 63% 2.21G/3.50G [00:29<00:14, 87.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 64% 2.22G/3.50G [00:29<00:14, 85.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 27% 2.73G/9.98G [00:29<00:42, 171MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 64% 2.23G/3.50G [00:29<00:14, 86.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 28% 2.75G/9.98G [00:29<00:46, 156MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 64% 2.24G/3.50G [00:29<00:14, 86.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 28% 2.77G/9.98G [00:29<00:44, 160MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 64% 2.25G/3.50G [00:29<00:14, 85.0MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 28% 2.79G/9.98G [00:29<00:44, 160MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 65% 2.26G/3.50G [00:30<00:14, 85.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 28% 2.81G/9.98G [00:32<04:32, 26.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 65% 2.28G/3.50G [00:32<01:37, 12.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 28% 2.83G/9.98G [00:32<03:25, 34.7MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 65% 2.29G/3.50G [00:32<01:12, 16.8MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 66% 2.31G/3.50G [00:32<00:41, 28.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 29% 2.85G/9.98G [00:32<02:55, 40.7MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 67% 2.33G/3.50G [00:32<00:26, 43.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 29% 2.88G/9.98G [00:32<01:59, 59.6MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 67% 2.35G/3.50G [00:33<00:19, 58.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 29% 2.90G/9.98G [00:33<01:50, 64.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 68% 2.37G/3.50G [00:33<00:17, 63.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 29% 2.93G/9.98G [00:33<01:40, 70.4MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 68% 2.39G/3.50G [00:33<00:15, 71.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 30% 2.95G/9.98G [00:33<01:36, 72.9MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 69% 2.41G/3.50G [00:33<00:14, 73.4MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 69% 2.42G/3.50G [00:33<00:14, 74.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 30% 2.97G/9.98G [00:33<01:31, 76.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 69% 2.43G/3.50G [00:34<00:13, 76.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 30% 2.98G/9.98G [00:34<01:31, 76.3MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 70% 2.44G/3.50G [00:34<00:13, 78.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 30% 2.99G/9.98G [00:34<01:28, 78.6MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 30% 3.00G/9.98G [00:34<01:24, 82.7MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 70% 2.46G/3.50G [00:34<00:11, 87.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 30% 3.02G/9.98G [00:34<01:12, 96.0MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 71% 2.49G/3.50G [00:34<00:10, 101MB/s] \u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 30% 3.04G/9.98G [00:34<01:01, 113MB/s] \u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 72% 2.51G/3.50G [00:34<00:08, 119MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 31% 3.06G/9.98G [00:34<00:53, 129MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 72% 2.53G/3.50G [00:34<00:07, 136MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 31% 3.08G/9.98G [00:34<00:48, 143MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 73% 2.55G/3.50G [00:34<00:06, 148MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 31% 3.10G/9.98G [00:34<00:43, 158MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 73% 2.57G/3.50G [00:34<00:05, 158MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 31% 3.12G/9.98G [00:35<00:44, 152MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 74% 2.59G/3.50G [00:35<00:05, 158MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 32% 3.15G/9.98G [00:35<00:57, 119MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 75% 2.61G/3.50G [00:35<00:08, 107MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 32% 3.17G/9.98G [00:35<01:12, 94.2MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 75% 2.63G/3.50G [00:35<00:09, 95.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 32% 3.19G/9.98G [00:35<01:02, 109MB/s] \u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 76% 2.65G/3.50G [00:35<00:07, 108MB/s] \u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 32% 3.21G/9.98G [00:35<00:53, 125MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 76% 2.67G/3.50G [00:35<00:06, 122MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 32% 3.23G/9.98G [00:36<00:48, 139MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 77% 2.69G/3.50G [00:36<00:05, 134MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 33% 3.25G/9.98G [00:36<00:43, 154MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 78% 2.72G/3.50G [00:36<00:05, 147MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 33% 3.27G/9.98G [00:36<00:40, 165MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 78% 2.74G/3.50G [00:36<00:04, 158MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 33% 3.29G/9.98G [00:36<00:39, 170MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 79% 2.76G/3.50G [00:36<00:04, 167MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 33% 3.31G/9.98G [00:36<00:37, 176MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 79% 2.78G/3.50G [00:36<00:04, 174MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 33% 3.33G/9.98G [00:36<00:36, 183MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 80% 2.80G/3.50G [00:36<00:03, 177MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 34% 3.36G/9.98G [00:36<00:35, 186MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 81% 2.82G/3.50G [00:36<00:04, 148MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 34% 3.38G/9.98G [00:36<00:41, 159MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 34% 3.40G/9.98G [00:36<00:39, 168MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 34% 3.42G/9.98G [00:37<00:36, 177MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 81% 2.84G/3.50G [00:37<00:05, 117MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 34% 3.44G/9.98G [00:37<00:36, 177MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 35% 3.46G/9.98G [00:37<00:35, 181MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 82% 2.86G/3.50G [00:37<00:06, 104MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 35% 3.48G/9.98G [00:37<00:37, 173MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 35% 3.51G/9.98G [00:37<00:37, 172MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 82% 2.88G/3.50G [00:37<00:06, 91.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 35% 3.53G/9.98G [00:37<00:39, 164MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 83% 2.90G/3.50G [00:37<00:06, 93.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 36% 3.55G/9.98G [00:37<00:46, 139MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 83% 2.92G/3.50G [00:38<00:06, 91.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 36% 3.58G/9.98G [00:38<00:46, 138MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 84% 2.93G/3.50G [00:38<00:06, 85.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 36% 3.60G/9.98G [00:38<00:46, 138MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 84% 2.94G/3.50G [00:38<00:06, 83.5MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 84% 2.95G/3.50G [00:38<00:06, 83.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 36% 3.62G/9.98G [00:38<00:45, 140MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 84% 2.96G/3.50G [00:38<00:06, 79.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 36% 3.64G/9.98G [00:38<00:46, 137MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 85% 2.97G/3.50G [00:38<00:06, 81.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 37% 3.66G/9.98G [00:38<00:46, 137MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 85% 2.98G/3.50G [00:38<00:06, 79.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 37% 3.68G/9.98G [00:38<00:47, 132MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 85% 2.99G/3.50G [00:38<00:06, 82.0MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 86% 3.00G/3.50G [00:39<00:05, 83.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 37% 3.70G/9.98G [00:39<00:50, 123MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 86% 3.01G/3.50G [00:39<00:06, 77.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 37% 3.72G/9.98G [00:39<01:03, 99.2MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 38% 3.74G/9.98G [00:39<01:00, 103MB/s] \u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 86% 3.02G/3.50G [00:39<00:10, 47.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 38% 3.76G/9.98G [00:39<00:59, 105MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 87% 3.03G/3.50G [00:39<00:09, 48.7MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 38% 3.79G/9.98G [00:39<00:52, 118MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 87% 3.04G/3.50G [00:39<00:08, 56.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 38% 3.81G/9.98G [00:40<00:47, 130MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 87% 3.06G/3.50G [00:40<00:05, 77.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 38% 3.83G/9.98G [00:40<00:42, 145MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 39% 3.85G/9.98G [00:40<00:39, 154MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 88% 3.08G/3.50G [00:40<00:04, 94.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 39% 3.87G/9.98G [00:40<00:37, 161MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 89% 3.10G/3.50G [00:40<00:03, 111MB/s] \u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 39% 3.89G/9.98G [00:40<00:35, 170MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 39% 3.91G/9.98G [00:40<00:34, 178MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 89% 3.12G/3.50G [00:40<00:03, 109MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 39% 3.93G/9.98G [00:40<00:33, 182MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 40% 3.95G/9.98G [00:40<00:31, 190MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 90% 3.15G/3.50G [00:40<00:03, 102MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 40% 3.97G/9.98G [00:40<00:31, 193MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 40% 4.00G/9.98G [00:41<00:33, 180MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 90% 3.17G/3.50G [00:41<00:03, 96.2MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 91% 3.18G/3.50G [00:41<00:03, 93.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 40% 4.02G/9.98G [00:41<00:38, 156MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 40% 4.04G/9.98G [00:41<00:36, 163MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 91% 3.19G/3.50G [00:41<00:03, 90.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 41% 4.06G/9.98G [00:41<00:33, 175MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 91% 3.20G/3.50G [00:41<00:03, 89.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 41% 4.08G/9.98G [00:41<00:33, 178MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 92% 3.21G/3.50G [00:41<00:03, 89.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 41% 4.10G/9.98G [00:41<00:33, 175MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 92% 3.22G/3.50G [00:41<00:03, 87.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 41% 4.12G/9.98G [00:41<00:34, 172MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 92% 3.23G/3.50G [00:41<00:03, 87.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 42% 4.14G/9.98G [00:41<00:34, 170MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 93% 3.24G/3.50G [00:41<00:02, 87.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 42% 4.16G/9.98G [00:42<00:32, 177MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 93% 3.25G/3.50G [00:42<00:02, 86.1MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 42% 4.18G/9.98G [00:42<00:34, 170MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 93% 3.26G/3.50G [00:42<00:02, 85.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 42% 4.20G/9.98G [00:42<00:34, 168MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 93% 3.27G/3.50G [00:42<00:02, 82.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 42% 4.23G/9.98G [00:42<00:34, 169MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 94% 3.28G/3.50G [00:42<00:02, 83.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 43% 4.25G/9.98G [00:42<00:34, 166MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 94% 3.29G/3.50G [00:42<00:02, 80.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 43% 4.27G/9.98G [00:42<00:34, 166MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 94% 3.30G/3.50G [00:42<00:02, 81.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 43% 4.29G/9.98G [00:42<00:36, 158MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 95% 3.31G/3.50G [00:42<00:02, 78.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 43% 4.31G/9.98G [00:42<00:37, 150MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 95% 3.32G/3.50G [00:42<00:02, 80.4MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 95% 3.33G/3.50G [00:43<00:02, 81.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 43% 4.33G/9.98G [00:43<00:44, 128MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 96% 3.34G/3.50G [00:43<00:01, 82.3MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 44% 4.35G/9.98G [00:43<00:41, 137MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 96% 3.36G/3.50G [00:43<00:01, 79.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 44% 4.37G/9.98G [00:43<00:41, 135MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 96% 3.37G/3.50G [00:43<00:01, 82.1MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 96% 3.38G/3.50G [00:43<00:01, 83.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 44% 4.39G/9.98G [00:43<00:40, 137MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 97% 3.39G/3.50G [00:43<00:01, 82.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 44% 4.41G/9.98G [00:43<00:40, 136MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 97% 3.40G/3.50G [00:43<00:01, 84.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 44% 4.44G/9.98G [00:43<00:40, 135MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 97% 3.41G/3.50G [00:43<00:01, 84.9MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 98% 3.42G/3.50G [00:44<00:00, 84.8MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 45% 4.46G/9.98G [00:44<00:42, 131MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 98% 3.43G/3.50G [00:44<00:00, 84.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 45% 4.48G/9.98G [00:44<00:43, 125MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 98% 3.44G/3.50G [00:44<00:00, 85.1MB/s]\u001b[A\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 99% 3.45G/3.50G [00:44<00:00, 83.5MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 45% 4.50G/9.98G [00:44<00:46, 119MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 99% 3.46G/3.50G [00:44<00:00, 84.2MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 45% 4.52G/9.98G [00:44<00:44, 123MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 99% 3.47G/3.50G [00:44<00:00, 83.9MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 46% 4.54G/9.98G [00:44<00:43, 124MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 99% 3.48G/3.50G [00:44<00:00, 80.4MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 46% 4.56G/9.98G [00:44<00:39, 138MB/s]\u001b[A\n",
"\n",
"Downloading (…)l-00002-of-00002.bin: 100% 3.49G/3.50G [00:45<00:00, 81.6MB/s]\u001b[A\u001b[A\n",
"Downloading (…)l-00002-of-00002.bin: 100% 3.50G/3.50G [00:45<00:00, 77.6MB/s]\n",
"\n",
"Downloading (…)l-00001-of-00002.bin: 46% 4.60G/9.98G [00:45<00:41, 129MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 46% 4.62G/9.98G [00:45<00:40, 134MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 47% 4.65G/9.98G [00:45<00:36, 148MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 47% 4.67G/9.98G [00:45<00:32, 162MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 47% 4.70G/9.98G [00:45<00:29, 179MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 47% 4.72G/9.98G [00:45<00:28, 185MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 48% 4.75G/9.98G [00:46<00:26, 196MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 48% 4.77G/9.98G [00:46<00:26, 198MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 48% 4.79G/9.98G [00:46<00:25, 201MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 48% 4.82G/9.98G [00:46<00:25, 205MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 49% 4.85G/9.98G [00:46<00:24, 207MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 49% 4.89G/9.98G [00:46<00:24, 209MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 49% 4.91G/9.98G [00:46<00:24, 208MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 50% 4.94G/9.98G [00:46<00:23, 211MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 50% 4.97G/9.98G [00:47<00:23, 214MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 50% 5.00G/9.98G [00:47<00:26, 187MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 50% 5.02G/9.98G [00:47<00:27, 180MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 51% 5.05G/9.98G [00:47<00:25, 192MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 51% 5.09G/9.98G [00:47<00:24, 200MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 51% 5.12G/9.98G [00:47<00:23, 206MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 52% 5.15G/9.98G [00:47<00:22, 210MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 52% 5.18G/9.98G [00:48<00:22, 215MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 52% 5.21G/9.98G [00:48<00:21, 218MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 53% 5.24G/9.98G [00:48<00:21, 220MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 53% 5.27G/9.98G [00:48<00:21, 219MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 53% 5.31G/9.98G [00:48<00:21, 218MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 53% 5.34G/9.98G [00:48<00:21, 217MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 54% 5.37G/9.98G [00:48<00:21, 213MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 54% 5.40G/9.98G [00:49<00:30, 149MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 54% 5.42G/9.98G [00:49<00:29, 157MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 55% 5.44G/9.98G [00:49<00:27, 164MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 55% 5.46G/9.98G [00:49<00:26, 170MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 55% 5.48G/9.98G [00:49<00:25, 178MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 55% 5.52G/9.98G [00:49<00:23, 190MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 56% 5.55G/9.98G [00:50<00:22, 200MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 56% 5.58G/9.98G [00:50<00:21, 208MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 56% 5.61G/9.98G [00:50<00:20, 211MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 57% 5.64G/9.98G [00:50<00:20, 214MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 57% 5.67G/9.98G [00:50<00:19, 219MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 57% 5.70G/9.98G [00:50<00:19, 220MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 57% 5.74G/9.98G [00:50<00:19, 216MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 58% 5.77G/9.98G [00:51<00:19, 217MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 58% 5.80G/9.98G [00:51<00:19, 219MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 58% 5.83G/9.98G [00:51<00:18, 220MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 59% 5.86G/9.98G [00:51<00:21, 194MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 59% 5.88G/9.98G [00:51<00:21, 194MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 59% 5.90G/9.98G [00:51<00:21, 192MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 59% 5.92G/9.98G [00:51<00:21, 193MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 60% 5.95G/9.98G [00:51<00:20, 196MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 60% 5.97G/9.98G [00:52<00:20, 193MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 60% 5.99G/9.98G [00:52<00:20, 193MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 60% 6.01G/9.98G [00:52<00:20, 196MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 60% 6.03G/9.98G [00:52<00:20, 197MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 61% 6.06G/9.98G [00:52<00:18, 207MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 61% 6.09G/9.98G [00:52<00:18, 210MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 61% 6.12G/9.98G [00:52<00:18, 211MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 62% 6.16G/9.98G [00:53<00:20, 187MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 62% 6.18G/9.98G [00:53<00:21, 175MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 62% 6.21G/9.98G [00:53<00:20, 188MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 63% 6.24G/9.98G [00:53<00:19, 196MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 63% 6.27G/9.98G [00:53<00:18, 204MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 63% 6.30G/9.98G [00:53<00:17, 208MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 63% 6.33G/9.98G [00:53<00:17, 210MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 64% 6.36G/9.98G [00:54<00:17, 209MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 64% 6.40G/9.98G [00:54<00:17, 207MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 64% 6.42G/9.98G [00:54<00:17, 203MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 65% 6.44G/9.98G [00:54<00:17, 204MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 65% 6.46G/9.98G [00:54<00:17, 203MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 65% 6.48G/9.98G [00:54<00:17, 201MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 65% 6.50G/9.98G [00:54<00:18, 188MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 65% 6.52G/9.98G [00:54<00:18, 190MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 66% 6.54G/9.98G [00:54<00:18, 191MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 66% 6.56G/9.98G [00:55<00:20, 168MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 66% 6.59G/9.98G [00:55<00:20, 168MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 66% 6.61G/9.98G [00:55<00:19, 175MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 66% 6.63G/9.98G [00:55<00:18, 183MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 67% 6.66G/9.98G [00:55<00:17, 194MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 67% 6.69G/9.98G [00:55<00:16, 203MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 67% 6.72G/9.98G [00:55<00:15, 208MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 68% 6.74G/9.98G [00:56<00:15, 209MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 68% 6.77G/9.98G [00:56<00:15, 212MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 68% 6.81G/9.98G [00:56<00:15, 211MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 69% 6.84G/9.98G [00:56<00:15, 206MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 69% 6.86G/9.98G [00:56<00:15, 205MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 69% 6.88G/9.98G [00:56<00:15, 204MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 69% 6.90G/9.98G [00:56<00:15, 198MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 69% 6.93G/9.98G [00:56<00:14, 204MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 70% 6.95G/9.98G [00:57<00:15, 200MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 70% 6.97G/9.98G [00:57<00:16, 184MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 70% 6.99G/9.98G [00:57<00:18, 160MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 70% 7.01G/9.98G [00:57<00:19, 148MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 71% 7.04G/9.98G [00:57<00:19, 150MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 71% 7.07G/9.98G [00:57<00:17, 170MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 71% 7.10G/9.98G [00:57<00:15, 186MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 71% 7.13G/9.98G [00:58<00:14, 198MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 72% 7.16G/9.98G [00:58<00:13, 206MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 72% 7.19G/9.98G [00:58<00:13, 210MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 72% 7.22G/9.98G [00:58<00:12, 215MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 73% 7.26G/9.98G [00:58<00:12, 216MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 73% 7.29G/9.98G [00:58<00:12, 217MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 73% 7.32G/9.98G [00:58<00:12, 207MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 74% 7.35G/9.98G [00:59<00:12, 212MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 74% 7.38G/9.98G [00:59<00:14, 176MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 74% 7.40G/9.98G [00:59<00:17, 151MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 74% 7.42G/9.98G [00:59<00:17, 143MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 75% 7.44G/9.98G [00:59<00:18, 136MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 75% 7.47G/9.98G [01:00<00:20, 120MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 75% 7.49G/9.98G [01:00<00:26, 93.1MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 75% 7.51G/9.98G [01:00<00:33, 73.9MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 75% 7.53G/9.98G [01:01<00:28, 85.5MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 76% 7.54G/9.98G [01:01<00:27, 87.9MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 76% 7.55G/9.98G [01:01<00:27, 87.5MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 76% 7.57G/9.98G [01:01<00:27, 87.0MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 76% 7.58G/9.98G [01:01<00:26, 89.3MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 76% 7.60G/9.98G [01:01<00:22, 108MB/s] \u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 76% 7.62G/9.98G [01:01<00:19, 124MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 77% 7.64G/9.98G [01:02<00:16, 139MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 77% 7.67G/9.98G [01:02<00:15, 151MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 77% 7.69G/9.98G [01:02<00:14, 163MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 77% 7.72G/9.98G [01:02<00:12, 180MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 78% 7.74G/9.98G [01:02<00:12, 180MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 78% 7.77G/9.98G [01:02<00:11, 191MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 78% 7.80G/9.98G [01:02<00:10, 200MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 79% 7.83G/9.98G [01:02<00:10, 205MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 79% 7.86G/9.98G [01:03<00:10, 206MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 79% 7.89G/9.98G [01:03<00:11, 181MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 79% 7.92G/9.98G [01:03<00:10, 192MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 80% 7.94G/9.98G [01:03<00:10, 195MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 80% 7.97G/9.98G [01:03<00:09, 203MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 80% 8.00G/9.98G [01:03<00:09, 210MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 81% 8.03G/9.98G [01:03<00:09, 213MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 81% 8.06G/9.98G [01:04<00:09, 212MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 81% 8.10G/9.98G [01:04<00:08, 209MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 81% 8.13G/9.98G [01:04<00:09, 201MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 82% 8.15G/9.98G [01:04<00:09, 191MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 82% 8.17G/9.98G [01:04<00:09, 193MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 82% 8.19G/9.98G [01:04<00:09, 184MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 82% 8.21G/9.98G [01:04<00:09, 177MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 83% 8.23G/9.98G [01:05<00:09, 180MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 83% 8.25G/9.98G [01:05<00:11, 145MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 83% 8.27G/9.98G [01:05<00:11, 151MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 83% 8.29G/9.98G [01:05<00:10, 162MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 83% 8.33G/9.98G [01:05<00:09, 181MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 84% 8.36G/9.98G [01:05<00:08, 195MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 84% 8.39G/9.98G [01:05<00:07, 203MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 84% 8.41G/9.98G [01:05<00:07, 203MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 85% 8.44G/9.98G [01:06<00:07, 210MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 85% 8.47G/9.98G [01:06<00:07, 213MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 85% 8.50G/9.98G [01:06<00:06, 211MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 86% 8.54G/9.98G [01:06<00:07, 204MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 86% 8.56G/9.98G [01:06<00:07, 202MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 86% 8.59G/9.98G [01:06<00:06, 207MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 86% 8.62G/9.98G [01:06<00:06, 208MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 87% 8.65G/9.98G [01:07<00:06, 207MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 87% 8.67G/9.98G [01:07<00:06, 206MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 87% 8.70G/9.98G [01:07<00:06, 211MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 88% 8.73G/9.98G [01:07<00:05, 214MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 88% 8.77G/9.98G [01:07<00:05, 217MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 88% 8.80G/9.98G [01:07<00:05, 217MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 88% 8.83G/9.98G [01:08<00:05, 192MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 89% 8.85G/9.98G [01:08<00:05, 194MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 89% 8.87G/9.98G [01:08<00:05, 192MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 89% 8.89G/9.98G [01:08<00:05, 194MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 89% 8.91G/9.98G [01:08<00:05, 193MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 90% 8.93G/9.98G [01:08<00:05, 198MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 90% 8.95G/9.98G [01:08<00:05, 195MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 90% 8.98G/9.98G [01:08<00:05, 193MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 90% 9.00G/9.98G [01:08<00:05, 187MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 90% 9.02G/9.98G [01:09<00:05, 186MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 91% 9.04G/9.98G [01:09<00:05, 172MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 91% 9.06G/9.98G [01:09<00:05, 171MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 91% 9.08G/9.98G [01:09<00:05, 176MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 91% 9.10G/9.98G [01:09<00:06, 131MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 91% 9.12G/9.98G [01:10<00:09, 92.4MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 92% 9.14G/9.98G [01:10<00:08, 97.1MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 92% 9.16G/9.98G [01:10<00:07, 112MB/s] \u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 92% 9.19G/9.98G [01:10<00:06, 127MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 92% 9.21G/9.98G [01:10<00:06, 120MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 92% 9.23G/9.98G [01:10<00:07, 100MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 93% 9.25G/9.98G [01:11<00:07, 91.3MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 93% 9.27G/9.98G [01:11<00:06, 108MB/s] \u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 93% 9.29G/9.98G [01:11<00:07, 87.6MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 93% 9.31G/9.98G [01:11<00:06, 100MB/s] \u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 94% 9.33G/9.98G [01:11<00:05, 116MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 94% 9.35G/9.98G [01:12<00:04, 133MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 94% 9.37G/9.98G [01:12<00:04, 149MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 94% 9.41G/9.98G [01:12<00:03, 167MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 94% 9.43G/9.98G [01:12<00:03, 172MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 95% 9.45G/9.98G [01:12<00:02, 177MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 95% 9.48G/9.98G [01:12<00:02, 191MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 95% 9.50G/9.98G [01:12<00:02, 194MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 96% 9.53G/9.98G [01:12<00:02, 199MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 96% 9.55G/9.98G [01:12<00:02, 201MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 96% 9.57G/9.98G [01:13<00:02, 192MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 96% 9.59G/9.98G [01:13<00:02, 169MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 96% 9.63G/9.98G [01:13<00:01, 185MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 97% 9.66G/9.98G [01:13<00:01, 194MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 97% 9.69G/9.98G [01:13<00:01, 200MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 97% 9.72G/9.98G [01:13<00:01, 207MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 98% 9.75G/9.98G [01:14<00:01, 211MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 98% 9.78G/9.98G [01:14<00:00, 212MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 98% 9.81G/9.98G [01:14<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 99% 9.84G/9.98G [01:14<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 99% 9.86G/9.98G [01:14<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 99% 9.88G/9.98G [01:14<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 99% 9.90G/9.98G [01:14<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 99% 9.92G/9.98G [01:14<00:00, 195MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 100% 9.94G/9.98G [01:14<00:00, 190MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00002.bin: 100% 9.98G/9.98G [01:15<00:00, 132MB/s]\n",
"Fetching 11 files: 100% 11/11 [01:15<00:00, 6.86s/it]\n",
"Loading ckpt pytorch_model-00001-of-00002.bin\n",
"copying base_model.model.model.embed_tokens.weight from 0-th LoRA weight to model.embed_tokens.weight\n",
"merging base_model.model.model.layers.0.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.0.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.0.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.0.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.0.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.0.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.0.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.0.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.0.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.0.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.0.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.0.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.0.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.0.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.1.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.1.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.1.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.1.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.1.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.1.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.1.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.1.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.1.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.1.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.1.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.1.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.1.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.1.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.2.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.2.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.2.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.2.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.2.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.2.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.2.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.2.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.2.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.2.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.2.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.2.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.2.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.2.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.3.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.3.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.3.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.3.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.3.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.3.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.3.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.3.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.3.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.3.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.3.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.3.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.3.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.3.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.4.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.4.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.4.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.4.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.4.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.4.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.4.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.4.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.4.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.4.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.4.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.4.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.4.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.4.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.5.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.5.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.5.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.5.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.5.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.5.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.5.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.5.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.5.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.5.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.5.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.5.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.5.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.5.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.6.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.6.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.6.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.6.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.6.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.6.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.6.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.6.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.6.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.6.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.6.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.6.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.6.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.6.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.7.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.7.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.7.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.7.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.7.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.7.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.7.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.7.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.7.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.7.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.7.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.7.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.7.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.7.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.8.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.8.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.8.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.8.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.8.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.8.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.8.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.8.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.8.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.8.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.8.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.8.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.8.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.8.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.9.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.9.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.9.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.9.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.9.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.9.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.9.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.9.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.9.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.9.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.9.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.9.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.9.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.9.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.10.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.10.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.10.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.10.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.10.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.10.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.10.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.10.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.10.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.10.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.10.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.10.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.10.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.10.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.11.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.11.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.11.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.11.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.11.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.11.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.11.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.11.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.11.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.11.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.11.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.11.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.11.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.11.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.12.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.12.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.12.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.12.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.12.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.12.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.12.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.12.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.12.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.12.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.12.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.12.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.12.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.12.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.13.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.13.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.13.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.13.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.13.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.13.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.13.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.13.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.13.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.13.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.13.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.13.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.13.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.13.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.14.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.14.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.14.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.14.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.14.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.14.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.14.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.14.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.14.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.14.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.14.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.14.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.14.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.14.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.15.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.15.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.15.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.15.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.15.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.15.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.15.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.15.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.15.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.15.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.15.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.15.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.15.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.15.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.16.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.16.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.16.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.16.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.16.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.16.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.16.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.16.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.16.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.16.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.16.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.16.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.16.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.16.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.17.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.17.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.17.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.17.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.17.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.17.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.17.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.17.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.17.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.17.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.17.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.17.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.17.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.17.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.18.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.18.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.18.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.18.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.18.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.18.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.18.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.18.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.18.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.18.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.18.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.18.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.18.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.18.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.19.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.19.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.19.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.19.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.19.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.19.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.19.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.19.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.19.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.19.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.19.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.19.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.19.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.19.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.20.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.20.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.20.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.20.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.20.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.20.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.20.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.20.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.20.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.20.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.20.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.20.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.20.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.20.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.21.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.21.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.21.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.21.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.21.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.21.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.21.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.21.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.21.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.21.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.21.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.21.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.21.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.21.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.22.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.22.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.22.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.22.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.22.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.22.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.22.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.22.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.22.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.22.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.22.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.22.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.22.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.22.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.23.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.23.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.23.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.23.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.23.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.23.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.23.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.23.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.23.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.23.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.23.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.23.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.23.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.23.mlp.up_proj.weight\n",
"Saving ckpt pytorch_model-00001-of-00002.bin to alpaca-combined in pth format...\n",
"Saving shard 1 of 1 into alpaca-combined/L1-consolidated.00.pth\n",
"Loading ckpt pytorch_model-00002-of-00002.bin\n",
"merging base_model.model.model.layers.24.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.24.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.24.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.24.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.24.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.24.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.24.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.24.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.24.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.24.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.24.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.24.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.24.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.24.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.25.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.25.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.25.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.25.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.25.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.25.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.25.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.25.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.25.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.25.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.25.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.25.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.25.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.25.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.26.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.26.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.26.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.26.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.26.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.26.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.26.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.26.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.26.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.26.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.26.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.26.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.26.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.26.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.27.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.27.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.27.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.27.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.27.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.27.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.27.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.27.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.27.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.27.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.27.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.27.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.27.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.27.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.28.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.28.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.28.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.28.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.28.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.28.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.28.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.28.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.28.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.28.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.28.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.28.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.28.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.28.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.29.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.29.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.29.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.29.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.29.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.29.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.29.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.29.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.29.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.29.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.29.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.29.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.29.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.29.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.30.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.30.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.30.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.30.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.30.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.30.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.30.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.30.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.30.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.30.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.30.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.30.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.30.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.30.mlp.up_proj.weight\n",
"merging base_model.model.model.layers.31.self_attn.q_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.31.self_attn.q_proj.weight\n",
"merging base_model.model.model.layers.31.self_attn.k_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.31.self_attn.k_proj.weight\n",
"merging base_model.model.model.layers.31.self_attn.v_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.31.self_attn.v_proj.weight\n",
"merging base_model.model.model.layers.31.self_attn.o_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.31.self_attn.o_proj.weight\n",
"merging base_model.model.model.layers.31.mlp.gate_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.31.mlp.gate_proj.weight\n",
"merging base_model.model.model.layers.31.mlp.down_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.31.mlp.down_proj.weight\n",
"merging base_model.model.model.layers.31.mlp.up_proj.lora_A.weight and lora_B.weight form 0-th LoRA weight to model.layers.31.mlp.up_proj.weight\n",
"copying base_model.model.lm_head.weight from 0-th LoRA weight to lm_head.weight\n",
"Saving ckpt pytorch_model-00002-of-00002.bin to alpaca-combined in pth format...\n",
"Saving shard 1 of 1 into alpaca-combined/L2-consolidated.00.pth\n",
"Saving tokenizer\n",
"Saving params.json into alpaca-combined/params.json\n",
"Loading ['L1-consolidated.00.pth', 'L2-consolidated.00.pth'] ...\n",
"Saving the merged shard to alpaca-combined/consolidated.00.pth\n",
"Cleaning up...\n",
"Done.\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## 比对SHA256\n",
"\n",
"完整值:https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/main/SHA256.md\n",
"\n",
"其中本示例生成的Alpaca-7B的标准SHA256:\n",
"- fbfccc91183169842aac8d093379f0a449b5a26c5ee7a298baf0d556f1499b90\n",
"\n",
"使用下述命令评测后发现两者相同,合并无误。"
],
"metadata": {
"id": "iO6f_kZOPB_q"
}
},
{
"cell_type": "code",
"source": [
"!sha256sum alpaca-combined/consolidated.*.pth"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "t5u4QDNZPYI_",
"outputId": "d0ceb9f9-b2bd-424d-eff7-b7e4dcb459d0"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"fbfccc91183169842aac8d093379f0a449b5a26c5ee7a298baf0d556f1499b90 alpaca-combined/consolidated.00.pth\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## 量化模型\n",
"接下来我们使用[llama.cpp](https://github.com/ggerganov/llama.cpp)工具对上一步生成的全量版本权重进行转换,生成4-bit量化模型。\n",
"\n",
"### 编译工具\n",
"\n",
"首先对llama.cpp工具进行编译。"
],
"metadata": {
"id": "ueexcKo-Q_EW"
}
},
{
"cell_type": "code",
"source": [
"!cd llama.cpp && make"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_GbjsT2wRRCR",
"outputId": "2d66c72f-0ef1-4a56-eebb-1a658827e8e3"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"I llama.cpp build info: \n",
"I UNAME_S: Linux\n",
"I UNAME_P: x86_64\n",
"I UNAME_M: x86_64\n",
"I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS\n",
"I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS\n",
"I LDFLAGS: \n",
"I CC: cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
"I CXX: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
"\n",
"cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -c ggml.c -o ggml.o\n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -c llama.cpp -o llama.o\n",
"\u001b[01m\u001b[Kllama.cpp:\u001b[m\u001b[K In function ‘\u001b[01m\u001b[Kbool kv_cache_init(const llama_hparams&, llama_kv_cache&, ggml_type, int, int)\u001b[m\u001b[K’:\n",
"\u001b[01m\u001b[Kllama.cpp:877:38:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kunused parameter ‘\u001b[01m\u001b[Kn_gpu_layers\u001b[m\u001b[K’ [\u001b[01;35m\u001b[K-Wunused-parameter\u001b[m\u001b[K]\n",
" 877 | \u001b[01;35m\u001b[Kint n_gpu_layers\u001b[m\u001b[K) {\n",
" | \u001b[01;35m\u001b[K~~~~~~^~~~~~~~~~~~\u001b[m\u001b[K\n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -c examples/common.cpp -o common.o\n",
"cc -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -c -o k_quants.o k_quants.c\n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS examples/main/main.cpp ggml.o llama.o common.o k_quants.o -o main \n",
"\n",
"==== Run ./main -h for help. ====\n",
"\n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS examples/quantize/quantize.cpp ggml.o llama.o k_quants.o -o quantize \n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS examples/quantize-stats/quantize-stats.cpp ggml.o llama.o k_quants.o -o quantize-stats \n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS examples/perplexity/perplexity.cpp ggml.o llama.o common.o k_quants.o -o perplexity \n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS examples/embedding/embedding.cpp ggml.o llama.o common.o k_quants.o -o embedding \n",
"g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS pocs/vdot/vdot.cpp ggml.o k_quants.o -o vdot \n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### 模型转换为ggml格式(FP16)\n",
"\n",
"这一步,我们将模型转换为ggml格式(FP16)。\n",
"- 在这之前需要把`alpaca-combined`目录挪个位置,把模型文件放到`llama.cpp/zh-models/7B`下,把`tokenizer.model`放到`llama.cpp/zh-models`\n",
"- tokenizer在哪里?\n",
" - `alpaca-combined`目录下有\n",
" - 或者从以下网址下载:https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b/resolve/main/tokenizer.model (注意,Alpaca和LLaMA的`tokenizer.model`不能混用!)\n",
"\n",
"💡 转换13B/33B模型提示:\n",
"- tokenizer可以直接用7B的,13B/33B和7B的相同\n",
"- Alpaca和LLaMA的`tokenizer.model`不能混用!\n",
"- 以下看到7B字样的都是文件夹名,与转换过程没有关系了,改不改都行"
],
"metadata": {
"id": "gw2xpYC0RcQC"
}
},
{
"cell_type": "code",
"source": [
"!cd llama.cpp && mkdir zh-models && mv ../alpaca-combined zh-models/7B\n",
"!mv llama.cpp/zh-models/7B/tokenizer.model llama.cpp/zh-models/\n",
"!ls llama.cpp/zh-models/"
],
"metadata": {
"id": "5KgnFVStRjio",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "02a9192c-941d-4636-befc-2d4c981d65e8"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"7B tokenizer.model\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"!cd llama.cpp && python convert.py zh-models/7B/"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NUHeoTMQS1AQ",
"outputId": "95ec2886-a8c7-4537-9242-131f9d235f33"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Loading model file zh-models/7B/consolidated.00.pth\n",
"Loading vocab file zh-models/tokenizer.model\n",
"Writing vocab...\n",
"[ 1/291] Writing tensor tok_embeddings.weight | size 49954 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 2/291] Writing tensor norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 3/291] Writing tensor output.weight | size 49954 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 4/291] Writing tensor layers.0.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 5/291] Writing tensor layers.0.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 6/291] Writing tensor layers.0.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 7/291] Writing tensor layers.0.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 8/291] Writing tensor layers.0.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 9/291] Writing tensor layers.0.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 10/291] Writing tensor layers.0.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 11/291] Writing tensor layers.0.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 12/291] Writing tensor layers.0.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 13/291] Writing tensor layers.1.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 14/291] Writing tensor layers.1.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 15/291] Writing tensor layers.1.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 16/291] Writing tensor layers.1.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 17/291] Writing tensor layers.1.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 18/291] Writing tensor layers.1.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 19/291] Writing tensor layers.1.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 20/291] Writing tensor layers.1.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 21/291] Writing tensor layers.1.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 22/291] Writing tensor layers.2.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 23/291] Writing tensor layers.2.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 24/291] Writing tensor layers.2.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 25/291] Writing tensor layers.2.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 26/291] Writing tensor layers.2.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 27/291] Writing tensor layers.2.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 28/291] Writing tensor layers.2.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 29/291] Writing tensor layers.2.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 30/291] Writing tensor layers.2.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 31/291] Writing tensor layers.3.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 32/291] Writing tensor layers.3.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 33/291] Writing tensor layers.3.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 34/291] Writing tensor layers.3.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 35/291] Writing tensor layers.3.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 36/291] Writing tensor layers.3.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 37/291] Writing tensor layers.3.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 38/291] Writing tensor layers.3.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 39/291] Writing tensor layers.3.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 40/291] Writing tensor layers.4.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 41/291] Writing tensor layers.4.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 42/291] Writing tensor layers.4.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 43/291] Writing tensor layers.4.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 44/291] Writing tensor layers.4.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 45/291] Writing tensor layers.4.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 46/291] Writing tensor layers.4.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 47/291] Writing tensor layers.4.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 48/291] Writing tensor layers.4.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 49/291] Writing tensor layers.5.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 50/291] Writing tensor layers.5.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 51/291] Writing tensor layers.5.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 52/291] Writing tensor layers.5.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 53/291] Writing tensor layers.5.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 54/291] Writing tensor layers.5.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 55/291] Writing tensor layers.5.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 56/291] Writing tensor layers.5.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 57/291] Writing tensor layers.5.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 58/291] Writing tensor layers.6.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 59/291] Writing tensor layers.6.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 60/291] Writing tensor layers.6.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 61/291] Writing tensor layers.6.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 62/291] Writing tensor layers.6.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 63/291] Writing tensor layers.6.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 64/291] Writing tensor layers.6.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 65/291] Writing tensor layers.6.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 66/291] Writing tensor layers.6.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 67/291] Writing tensor layers.7.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 68/291] Writing tensor layers.7.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 69/291] Writing tensor layers.7.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 70/291] Writing tensor layers.7.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 71/291] Writing tensor layers.7.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 72/291] Writing tensor layers.7.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 73/291] Writing tensor layers.7.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 74/291] Writing tensor layers.7.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 75/291] Writing tensor layers.7.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 76/291] Writing tensor layers.8.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 77/291] Writing tensor layers.8.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 78/291] Writing tensor layers.8.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 79/291] Writing tensor layers.8.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 80/291] Writing tensor layers.8.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 81/291] Writing tensor layers.8.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 82/291] Writing tensor layers.8.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 83/291] Writing tensor layers.8.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 84/291] Writing tensor layers.8.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 85/291] Writing tensor layers.9.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 86/291] Writing tensor layers.9.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 87/291] Writing tensor layers.9.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 88/291] Writing tensor layers.9.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 89/291] Writing tensor layers.9.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 90/291] Writing tensor layers.9.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 91/291] Writing tensor layers.9.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 92/291] Writing tensor layers.9.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 93/291] Writing tensor layers.9.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 94/291] Writing tensor layers.10.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 95/291] Writing tensor layers.10.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 96/291] Writing tensor layers.10.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 97/291] Writing tensor layers.10.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 98/291] Writing tensor layers.10.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 99/291] Writing tensor layers.10.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[100/291] Writing tensor layers.10.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[101/291] Writing tensor layers.10.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[102/291] Writing tensor layers.10.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[103/291] Writing tensor layers.11.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[104/291] Writing tensor layers.11.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[105/291] Writing tensor layers.11.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[106/291] Writing tensor layers.11.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[107/291] Writing tensor layers.11.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[108/291] Writing tensor layers.11.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[109/291] Writing tensor layers.11.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[110/291] Writing tensor layers.11.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[111/291] Writing tensor layers.11.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[112/291] Writing tensor layers.12.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[113/291] Writing tensor layers.12.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[114/291] Writing tensor layers.12.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[115/291] Writing tensor layers.12.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[116/291] Writing tensor layers.12.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[117/291] Writing tensor layers.12.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[118/291] Writing tensor layers.12.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[119/291] Writing tensor layers.12.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[120/291] Writing tensor layers.12.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[121/291] Writing tensor layers.13.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[122/291] Writing tensor layers.13.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[123/291] Writing tensor layers.13.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[124/291] Writing tensor layers.13.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[125/291] Writing tensor layers.13.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[126/291] Writing tensor layers.13.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[127/291] Writing tensor layers.13.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[128/291] Writing tensor layers.13.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[129/291] Writing tensor layers.13.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[130/291] Writing tensor layers.14.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[131/291] Writing tensor layers.14.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[132/291] Writing tensor layers.14.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[133/291] Writing tensor layers.14.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[134/291] Writing tensor layers.14.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[135/291] Writing tensor layers.14.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[136/291] Writing tensor layers.14.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[137/291] Writing tensor layers.14.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[138/291] Writing tensor layers.14.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[139/291] Writing tensor layers.15.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[140/291] Writing tensor layers.15.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[141/291] Writing tensor layers.15.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[142/291] Writing tensor layers.15.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[143/291] Writing tensor layers.15.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[144/291] Writing tensor layers.15.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[145/291] Writing tensor layers.15.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[146/291] Writing tensor layers.15.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[147/291] Writing tensor layers.15.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[148/291] Writing tensor layers.16.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[149/291] Writing tensor layers.16.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[150/291] Writing tensor layers.16.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[151/291] Writing tensor layers.16.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[152/291] Writing tensor layers.16.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[153/291] Writing tensor layers.16.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[154/291] Writing tensor layers.16.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[155/291] Writing tensor layers.16.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[156/291] Writing tensor layers.16.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[157/291] Writing tensor layers.17.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[158/291] Writing tensor layers.17.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[159/291] Writing tensor layers.17.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[160/291] Writing tensor layers.17.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[161/291] Writing tensor layers.17.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[162/291] Writing tensor layers.17.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[163/291] Writing tensor layers.17.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[164/291] Writing tensor layers.17.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[165/291] Writing tensor layers.17.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[166/291] Writing tensor layers.18.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[167/291] Writing tensor layers.18.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[168/291] Writing tensor layers.18.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[169/291] Writing tensor layers.18.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[170/291] Writing tensor layers.18.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[171/291] Writing tensor layers.18.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[172/291] Writing tensor layers.18.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[173/291] Writing tensor layers.18.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[174/291] Writing tensor layers.18.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[175/291] Writing tensor layers.19.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[176/291] Writing tensor layers.19.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[177/291] Writing tensor layers.19.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[178/291] Writing tensor layers.19.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[179/291] Writing tensor layers.19.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[180/291] Writing tensor layers.19.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[181/291] Writing tensor layers.19.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[182/291] Writing tensor layers.19.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[183/291] Writing tensor layers.19.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[184/291] Writing tensor layers.20.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[185/291] Writing tensor layers.20.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[186/291] Writing tensor layers.20.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[187/291] Writing tensor layers.20.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[188/291] Writing tensor layers.20.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[189/291] Writing tensor layers.20.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[190/291] Writing tensor layers.20.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[191/291] Writing tensor layers.20.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[192/291] Writing tensor layers.20.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[193/291] Writing tensor layers.21.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[194/291] Writing tensor layers.21.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[195/291] Writing tensor layers.21.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[196/291] Writing tensor layers.21.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[197/291] Writing tensor layers.21.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[198/291] Writing tensor layers.21.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[199/291] Writing tensor layers.21.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[200/291] Writing tensor layers.21.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[201/291] Writing tensor layers.21.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[202/291] Writing tensor layers.22.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[203/291] Writing tensor layers.22.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[204/291] Writing tensor layers.22.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[205/291] Writing tensor layers.22.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[206/291] Writing tensor layers.22.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[207/291] Writing tensor layers.22.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[208/291] Writing tensor layers.22.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[209/291] Writing tensor layers.22.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[210/291] Writing tensor layers.22.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[211/291] Writing tensor layers.23.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[212/291] Writing tensor layers.23.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[213/291] Writing tensor layers.23.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[214/291] Writing tensor layers.23.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[215/291] Writing tensor layers.23.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[216/291] Writing tensor layers.23.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[217/291] Writing tensor layers.23.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[218/291] Writing tensor layers.23.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[219/291] Writing tensor layers.23.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[220/291] Writing tensor layers.24.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[221/291] Writing tensor layers.24.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[222/291] Writing tensor layers.24.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[223/291] Writing tensor layers.24.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[224/291] Writing tensor layers.24.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[225/291] Writing tensor layers.24.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[226/291] Writing tensor layers.24.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[227/291] Writing tensor layers.24.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[228/291] Writing tensor layers.24.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[229/291] Writing tensor layers.25.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[230/291] Writing tensor layers.25.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[231/291] Writing tensor layers.25.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[232/291] Writing tensor layers.25.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[233/291] Writing tensor layers.25.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[234/291] Writing tensor layers.25.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[235/291] Writing tensor layers.25.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[236/291] Writing tensor layers.25.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[237/291] Writing tensor layers.25.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[238/291] Writing tensor layers.26.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[239/291] Writing tensor layers.26.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[240/291] Writing tensor layers.26.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[241/291] Writing tensor layers.26.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[242/291] Writing tensor layers.26.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[243/291] Writing tensor layers.26.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[244/291] Writing tensor layers.26.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[245/291] Writing tensor layers.26.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[246/291] Writing tensor layers.26.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[247/291] Writing tensor layers.27.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[248/291] Writing tensor layers.27.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[249/291] Writing tensor layers.27.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[250/291] Writing tensor layers.27.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[251/291] Writing tensor layers.27.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[252/291] Writing tensor layers.27.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[253/291] Writing tensor layers.27.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[254/291] Writing tensor layers.27.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[255/291] Writing tensor layers.27.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[256/291] Writing tensor layers.28.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[257/291] Writing tensor layers.28.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[258/291] Writing tensor layers.28.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[259/291] Writing tensor layers.28.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[260/291] Writing tensor layers.28.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[261/291] Writing tensor layers.28.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[262/291] Writing tensor layers.28.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[263/291] Writing tensor layers.28.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[264/291] Writing tensor layers.28.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[265/291] Writing tensor layers.29.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[266/291] Writing tensor layers.29.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[267/291] Writing tensor layers.29.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[268/291] Writing tensor layers.29.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[269/291] Writing tensor layers.29.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[270/291] Writing tensor layers.29.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[271/291] Writing tensor layers.29.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[272/291] Writing tensor layers.29.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[273/291] Writing tensor layers.29.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[274/291] Writing tensor layers.30.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[275/291] Writing tensor layers.30.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[276/291] Writing tensor layers.30.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[277/291] Writing tensor layers.30.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[278/291] Writing tensor layers.30.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[279/291] Writing tensor layers.30.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[280/291] Writing tensor layers.30.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[281/291] Writing tensor layers.30.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[282/291] Writing tensor layers.30.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[283/291] Writing tensor layers.31.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[284/291] Writing tensor layers.31.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[285/291] Writing tensor layers.31.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[286/291] Writing tensor layers.31.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[287/291] Writing tensor layers.31.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[288/291] Writing tensor layers.31.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[289/291] Writing tensor layers.31.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[290/291] Writing tensor layers.31.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[291/291] Writing tensor layers.31.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"Wrote zh-models/7B/ggml-model-f16.bin\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### 将FP16模型量化为4-bit\n",
"\n",
"我们进一步将FP16模型转换为4-bit量化模型,此处选择的是新版Q4_K方法。"
],
"metadata": {
"id": "hEZEJAVYCHkc"
}
},
{
"cell_type": "code",
"source": [
"!cd llama.cpp && ./quantize ./zh-models/7B/ggml-model-f16.bin ./zh-models/7B/ggml-model-q4_K.bin q4_K"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2xyais7OUVDI",
"outputId": "ebe6c758-15ff-4150-f68f-c5cddb1dfff6"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"main: build = 670 (254a7a7)\n",
"main: quantizing './zh-models/7B/ggml-model-f16.bin' to './zh-models/7B/ggml-model-q4_K.bin' as Q4_K\n",
"llama.cpp: loading model from ./zh-models/7B/ggml-model-f16.bin\n",
"llama.cpp: saving model to ./zh-models/7B/ggml-model-q4_K.bin\n",
"[ 1/ 291] tok_embeddings.weight - 4096 x 49954, type = f16, quantizing .. size = 390.27 MB -> 109.76 MB | hist: \n",
"[ 2/ 291] norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 3/ 291] output.weight - 4096 x 49954, type = f16, quantizing .. size = 390.27 MB -> 160.07 MB | hist: \n",
"[ 4/ 291] layers.0.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 5/ 291] layers.0.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 6/ 291] layers.0.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 7/ 291] layers.0.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 8/ 291] layers.0.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 9/ 291] layers.0.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 10/ 291] layers.0.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 11/ 291] layers.0.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 12/ 291] layers.0.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 13/ 291] layers.1.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 14/ 291] layers.1.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 15/ 291] layers.1.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 16/ 291] layers.1.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 17/ 291] layers.1.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 18/ 291] layers.1.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 19/ 291] layers.1.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 20/ 291] layers.1.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 21/ 291] layers.1.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 22/ 291] layers.2.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 23/ 291] layers.2.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 24/ 291] layers.2.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 25/ 291] layers.2.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 26/ 291] layers.2.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 27/ 291] layers.2.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 28/ 291] layers.2.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 29/ 291] layers.2.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 30/ 291] layers.2.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 31/ 291] layers.3.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 32/ 291] layers.3.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 33/ 291] layers.3.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 34/ 291] layers.3.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 35/ 291] layers.3.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 36/ 291] layers.3.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 37/ 291] layers.3.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 38/ 291] layers.3.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 39/ 291] layers.3.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 40/ 291] layers.4.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 41/ 291] layers.4.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 42/ 291] layers.4.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 43/ 291] layers.4.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 44/ 291] layers.4.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 45/ 291] layers.4.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 46/ 291] layers.4.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 47/ 291] layers.4.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 48/ 291] layers.4.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 49/ 291] layers.5.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 50/ 291] layers.5.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 51/ 291] layers.5.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 52/ 291] layers.5.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 53/ 291] layers.5.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 54/ 291] layers.5.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 55/ 291] layers.5.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 56/ 291] layers.5.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 57/ 291] layers.5.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 58/ 291] layers.6.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 59/ 291] layers.6.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 60/ 291] layers.6.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 61/ 291] layers.6.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 62/ 291] layers.6.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 63/ 291] layers.6.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 64/ 291] layers.6.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 65/ 291] layers.6.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 66/ 291] layers.6.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 67/ 291] layers.7.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 68/ 291] layers.7.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 69/ 291] layers.7.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 70/ 291] layers.7.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 71/ 291] layers.7.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 72/ 291] layers.7.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 73/ 291] layers.7.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 74/ 291] layers.7.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 75/ 291] layers.7.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 76/ 291] layers.8.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 77/ 291] layers.8.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 78/ 291] layers.8.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 79/ 291] layers.8.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 80/ 291] layers.8.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 81/ 291] layers.8.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 82/ 291] layers.8.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 83/ 291] layers.8.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 84/ 291] layers.8.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 85/ 291] layers.9.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 86/ 291] layers.9.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 87/ 291] layers.9.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 88/ 291] layers.9.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 89/ 291] layers.9.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 90/ 291] layers.9.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 91/ 291] layers.9.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 92/ 291] layers.9.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 93/ 291] layers.9.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 94/ 291] layers.10.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 95/ 291] layers.10.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 96/ 291] layers.10.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 97/ 291] layers.10.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 98/ 291] layers.10.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 99/ 291] layers.10.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 100/ 291] layers.10.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 101/ 291] layers.10.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 102/ 291] layers.10.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 103/ 291] layers.11.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 104/ 291] layers.11.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 105/ 291] layers.11.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 106/ 291] layers.11.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 107/ 291] layers.11.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 108/ 291] layers.11.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 109/ 291] layers.11.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 110/ 291] layers.11.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 111/ 291] layers.11.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 112/ 291] layers.12.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 113/ 291] layers.12.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 114/ 291] layers.12.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 115/ 291] layers.12.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 116/ 291] layers.12.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 117/ 291] layers.12.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 118/ 291] layers.12.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 119/ 291] layers.12.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 120/ 291] layers.12.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 121/ 291] layers.13.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 122/ 291] layers.13.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 123/ 291] layers.13.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 124/ 291] layers.13.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 125/ 291] layers.13.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 126/ 291] layers.13.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 127/ 291] layers.13.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 128/ 291] layers.13.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 129/ 291] layers.13.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 130/ 291] layers.14.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 131/ 291] layers.14.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 132/ 291] layers.14.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 133/ 291] layers.14.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 134/ 291] layers.14.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 135/ 291] layers.14.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 136/ 291] layers.14.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 137/ 291] layers.14.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 138/ 291] layers.14.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 139/ 291] layers.15.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 140/ 291] layers.15.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 141/ 291] layers.15.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 142/ 291] layers.15.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 143/ 291] layers.15.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 144/ 291] layers.15.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 145/ 291] layers.15.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 146/ 291] layers.15.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 147/ 291] layers.15.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 148/ 291] layers.16.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 149/ 291] layers.16.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 150/ 291] layers.16.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 151/ 291] layers.16.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 152/ 291] layers.16.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 153/ 291] layers.16.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 154/ 291] layers.16.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 155/ 291] layers.16.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 156/ 291] layers.16.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 157/ 291] layers.17.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 158/ 291] layers.17.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 159/ 291] layers.17.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 160/ 291] layers.17.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 161/ 291] layers.17.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 162/ 291] layers.17.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 163/ 291] layers.17.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 164/ 291] layers.17.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 165/ 291] layers.17.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 166/ 291] layers.18.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 167/ 291] layers.18.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 168/ 291] layers.18.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 169/ 291] layers.18.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 170/ 291] layers.18.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 171/ 291] layers.18.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 172/ 291] layers.18.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 173/ 291] layers.18.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 174/ 291] layers.18.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 175/ 291] layers.19.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 176/ 291] layers.19.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 177/ 291] layers.19.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 178/ 291] layers.19.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 179/ 291] layers.19.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 180/ 291] layers.19.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 181/ 291] layers.19.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 182/ 291] layers.19.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 183/ 291] layers.19.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 184/ 291] layers.20.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 185/ 291] layers.20.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 186/ 291] layers.20.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 187/ 291] layers.20.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 188/ 291] layers.20.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 189/ 291] layers.20.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 190/ 291] layers.20.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 191/ 291] layers.20.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 192/ 291] layers.20.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 193/ 291] layers.21.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 194/ 291] layers.21.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 195/ 291] layers.21.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 196/ 291] layers.21.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 197/ 291] layers.21.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 198/ 291] layers.21.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 199/ 291] layers.21.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 200/ 291] layers.21.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 201/ 291] layers.21.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 202/ 291] layers.22.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 203/ 291] layers.22.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 204/ 291] layers.22.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 205/ 291] layers.22.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 206/ 291] layers.22.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 207/ 291] layers.22.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 208/ 291] layers.22.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 209/ 291] layers.22.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 210/ 291] layers.22.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 211/ 291] layers.23.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 212/ 291] layers.23.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 213/ 291] layers.23.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 214/ 291] layers.23.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 215/ 291] layers.23.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 216/ 291] layers.23.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 217/ 291] layers.23.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 218/ 291] layers.23.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 219/ 291] layers.23.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 220/ 291] layers.24.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 221/ 291] layers.24.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 222/ 291] layers.24.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 223/ 291] layers.24.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 224/ 291] layers.24.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 225/ 291] layers.24.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 226/ 291] layers.24.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 227/ 291] layers.24.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 228/ 291] layers.24.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 229/ 291] layers.25.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 230/ 291] layers.25.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 231/ 291] layers.25.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 232/ 291] layers.25.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 233/ 291] layers.25.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 234/ 291] layers.25.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 235/ 291] layers.25.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 236/ 291] layers.25.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 237/ 291] layers.25.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 238/ 291] layers.26.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 239/ 291] layers.26.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 240/ 291] layers.26.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 241/ 291] layers.26.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 242/ 291] layers.26.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 243/ 291] layers.26.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 244/ 291] layers.26.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 245/ 291] layers.26.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 246/ 291] layers.26.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 247/ 291] layers.27.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 248/ 291] layers.27.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 249/ 291] layers.27.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 250/ 291] layers.27.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 251/ 291] layers.27.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 252/ 291] layers.27.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 253/ 291] layers.27.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 254/ 291] layers.27.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 255/ 291] layers.27.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 256/ 291] layers.28.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 257/ 291] layers.28.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 258/ 291] layers.28.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 259/ 291] layers.28.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 260/ 291] layers.28.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 261/ 291] layers.28.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 262/ 291] layers.28.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 263/ 291] layers.28.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 264/ 291] layers.28.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 265/ 291] layers.29.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 266/ 291] layers.29.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 267/ 291] layers.29.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 268/ 291] layers.29.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 269/ 291] layers.29.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 270/ 291] layers.29.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 271/ 291] layers.29.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 272/ 291] layers.29.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 273/ 291] layers.29.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 274/ 291] layers.30.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 275/ 291] layers.30.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 276/ 291] layers.30.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 277/ 291] layers.30.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 278/ 291] layers.30.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 279/ 291] layers.30.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 280/ 291] layers.30.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 281/ 291] layers.30.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 282/ 291] layers.30.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 283/ 291] layers.31.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 284/ 291] layers.31.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 285/ 291] layers.31.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 13.12 MB | hist: \n",
"[ 286/ 291] layers.31.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 9.00 MB | hist: \n",
"[ 287/ 291] layers.31.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 288/ 291] layers.31.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 289/ 291] layers.31.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 35.27 MB | hist: \n",
"[ 290/ 291] layers.31.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 24.19 MB | hist: \n",
"[ 291/ 291] layers.31.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"llama_model_quantize_internal: model size = 13133.55 MB\n",
"llama_model_quantize_internal: quant size = 3988.22 MB\n",
"\n",
"main: quantize time = 153421.48 ms\n",
"main: total time = 153421.48 ms\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### (可选)测试量化模型解码\n",
"至此已完成了所有转换步骤。\n",
"我们运行一条命令测试一下是否能够正常加载并进行对话。\n",
"\n",
"FP16和Q4量化文件存放在./llama.cpp/zh-models/7B下,可按需下载使用。"
],
"metadata": {
"id": "DLkuRAo9Vkb1"
}
},
{
"cell_type": "code",
"source": [
"!cd llama.cpp && ./main -m ./zh-models/7B/ggml-model-q4_K.bin --color -p \"详细介绍一下北京的名胜古迹:\" -n 128"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "tW-ep1BsVQtG",
"outputId": "03f0343f-3b7c-490e-a0ab-6724d79c5dc8"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"main: build = 670 (254a7a7)\n",
"main: seed = 1686819449\n",
"llama.cpp: loading model from ./zh-models/7B/ggml-model-q4_K.bin\n",
"llama_model_load_internal: format = ggjt v3 (latest)\n",
"llama_model_load_internal: n_vocab = 49954\n",
"llama_model_load_internal: n_ctx = 512\n",
"llama_model_load_internal: n_embd = 4096\n",
"llama_model_load_internal: n_mult = 256\n",
"llama_model_load_internal: n_head = 32\n",
"llama_model_load_internal: n_layer = 32\n",
"llama_model_load_internal: n_rot = 128\n",
"llama_model_load_internal: ftype = 15 (mostly Q4_K - Medium)\n",
"llama_model_load_internal: n_ff = 11008\n",
"llama_model_load_internal: n_parts = 1\n",
"llama_model_load_internal: model size = 7B\n",
"llama_model_load_internal: ggml ctx size = 0.07 MB\n",
"llama_model_load_internal: mem required = 5780.29 MB (+ 1026.00 MB per state)\n",
"................................................................................................\n",
"llama_init_from_file: kv self size = 256.00 MB\n",
"\n",
"system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | \n",
"sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000\n",
"generate: n_ctx = 512, n_batch = 512, n_predict = 128, n_keep = 0\n",
"\n",
"\n",
"\u001b[33m 详细介绍一下北京的名胜古迹:\u001b[0m天安门、故宫、颐和园、圆明园、北海公园等。 参观后你一定会爱上这座城市! [end of text]\n",
"\n",
"llama_print_timings: load time = 16410.24 ms\n",
"llama_print_timings: sample time = 30.04 ms / 30 runs ( 1.00 ms per token)\n",
"llama_print_timings: prompt eval time = 3479.21 ms / 11 tokens ( 316.29 ms per token)\n",
"llama_print_timings: eval time = 10516.40 ms / 29 runs ( 362.63 ms per token)\n",
"llama_print_timings: total time = 14042.46 ms\n"
]
}
]
}
]
}
================================================
FILE: notebooks/finetune_chinese_alpaca_lora.ipynb
================================================
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "cHgpJN1uzD8B"
},
"source": [
"# Tutorial on instruction tuning of Chinese-Alpaca-7B\n",
"\n",
"More info: https://github.com/ymcui/Chinese-LLaMA-Alpaca"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "2meQHBlHxcsi"
},
"source": [
"## Install Dependencies"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "quRXOPaZwmwz",
"outputId": "0fadc49a-b1a0-4131-9997-58c6c1a76fc4"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting transformers==4.28.1\n",
" Downloading transformers-4.28.1-py3-none-any.whl (7.0 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.0/7.0 MB\u001b[0m \u001b[31m53.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (3.12.0)\n",
"Collecting huggingface-hub<1.0,>=0.11.0 (from transformers==4.28.1)\n",
" Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m224.5/224.5 kB\u001b[0m \u001b[31m19.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (23.1)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (6.0)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (2022.10.31)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (2.27.1)\n",
"Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.28.1)\n",
" Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m98.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (4.65.0)\n",
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers==4.28.1) (2023.4.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers==4.28.1) (4.5.0)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.28.1) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.28.1) (2022.12.7)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.28.1) (2.0.12)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.28.1) (3.4)\n",
"Installing collected packages: tokenizers, huggingface-hub, transformers\n",
"Successfully installed huggingface-hub-0.14.1 tokenizers-0.13.3 transformers-4.28.1\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting git+https://github.com/huggingface/peft.git@13e53fc\n",
" Cloning https://github.com/huggingface/peft.git (to revision 13e53fc) to /tmp/pip-req-build-ccgd0ls9\n",
" Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git /tmp/pip-req-build-ccgd0ls9\n",
"\u001b[33m WARNING: Did not find branch or tag '13e53fc', assuming revision or ref.\u001b[0m\u001b[33m\n",
"\u001b[0m Running command git checkout -q 13e53fc\n",
" Resolved https://github.com/huggingface/peft.git to commit 13e53fc\n",
" Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
" Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
" Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (23.1)\n",
"Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (5.9.5)\n",
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (6.0)\n",
"Requirement already satisfied: torch>=1.13.0 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (2.0.0+cu118)\n",
"Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (4.28.1)\n",
"Collecting accelerate (from peft==0.3.0.dev0)\n",
" Downloading accelerate-0.19.0-py3-none-any.whl (219 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m219.1/219.1 kB\u001b[0m \u001b[31m5.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (3.12.0)\n",
"Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (4.5.0)\n",
"Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (1.11.1)\n",
"Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (3.1)\n",
"Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (3.1.2)\n",
"Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (2.0.0)\n",
"Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.13.0->peft==0.3.0.dev0) (3.25.2)\n",
"Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.13.0->peft==0.3.0.dev0) (16.0.3)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (0.14.1)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (2022.10.31)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (2.27.1)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (0.13.3)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (4.65.0)\n",
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers->peft==0.3.0.dev0) (2023.4.0)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.13.0->peft==0.3.0.dev0) (2.1.2)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (2022.12.7)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (2.0.12)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (3.4)\n",
"Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.13.0->peft==0.3.0.dev0) (1.3.0)\n",
"Building wheels for collected packages: peft\n",
" Building wheel for peft (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for peft: filename=peft-0.3.0.dev0-py3-none-any.whl size=40652 sha256=be80ba07bd8ab5299c123251a644846225e6b44e69e49a85212b0acf8ed05f59\n",
" Stored in directory: /tmp/pip-ephem-wheel-cache-9iaw_m7j/wheels/d9/13/c6/404d5f8a81c5620f65f7fd75b6a66619f013cd79c2875b981c\n",
"Successfully built peft\n",
"Installing collected packages: accelerate, peft\n",
"Successfully installed accelerate-0.19.0 peft-0.3.0.dev0\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting datasets\n",
" Downloading datasets-2.12.0-py3-none-any.whl (474 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m474.6/474.6 kB\u001b[0m \u001b[31m10.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from datasets) (1.22.4)\n",
"Requirement already satisfied: pyarrow>=8.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (9.0.0)\n",
"Collecting dill<0.3.7,>=0.3.0 (from datasets)\n",
" Downloading dill-0.3.6-py3-none-any.whl (110 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m110.5/110.5 kB\u001b[0m \u001b[31m14.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets) (1.5.3)\n",
"Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (2.27.1)\n",
"Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.10/dist-packages (from datasets) (4.65.0)\n",
"Collecting xxhash (from datasets)\n",
" Downloading xxhash-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m212.5/212.5 kB\u001b[0m \u001b[31m26.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting multiprocess (from datasets)\n",
" Downloading multiprocess-0.70.14-py310-none-any.whl (134 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.3/134.3 kB\u001b[0m \u001b[31m18.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: fsspec[http]>=2021.11.1 in /usr/local/lib/python3.10/dist-packages (from datasets) (2023.4.0)\n",
"Collecting aiohttp (from datasets)\n",
" Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m39.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: huggingface-hub<1.0.0,>=0.11.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (0.14.1)\n",
"Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from datasets) (23.1)\n",
"Collecting responses<0.19 (from datasets)\n",
" Downloading responses-0.18.0-py3-none-any.whl (38 kB)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from datasets) (6.0)\n",
"Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (23.1.0)\n",
"Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (2.0.12)\n",
"Collecting multidict<7.0,>=4.5 (from aiohttp->datasets)\n",
" Downloading multidict-6.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m114.5/114.5 kB\u001b[0m \u001b[31m15.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting async-timeout<5.0,>=4.0.0a3 (from aiohttp->datasets)\n",
" Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\n",
"Collecting yarl<2.0,>=1.0 (from aiohttp->datasets)\n",
" Downloading yarl-1.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (268 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m268.8/268.8 kB\u001b[0m \u001b[31m31.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting frozenlist>=1.1.1 (from aiohttp->datasets)\n",
" Downloading frozenlist-1.3.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (149 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m149.6/149.6 kB\u001b[0m \u001b[31m19.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting aiosignal>=1.1.2 (from aiohttp->datasets)\n",
" Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0.0,>=0.11.0->datasets) (3.12.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0.0,>=0.11.0->datasets) (4.5.0)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets) (2022.12.7)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets) (3.4)\n",
"Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets) (2.8.2)\n",
"Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets) (2022.7.1)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->datasets) (1.16.0)\n",
"Installing collected packages: xxhash, multidict, frozenlist, dill, async-timeout, yarl, responses, multiprocess, aiosignal, aiohttp, datasets\n",
"Successfully installed aiohttp-3.8.4 aiosignal-1.3.1 async-timeout-4.0.2 datasets-2.12.0 dill-0.3.6 frozenlist-1.3.3 multidict-6.0.4 multiprocess-0.70.14 responses-0.18.0 xxhash-3.2.0 yarl-1.9.2\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting sentencepiece\n",
" Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m20.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hInstalling collected packages: sentencepiece\n",
"Successfully installed sentencepiece-0.1.99\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting deepspeed\n",
" Downloading deepspeed-0.9.2.tar.gz (779 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m779.3/779.3 kB\u001b[0m \u001b[31m13.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
"Collecting hjson (from deepspeed)\n",
" Downloading hjson-3.1.0-py3-none-any.whl (54 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m54.0/54.0 kB\u001b[0m \u001b[31m6.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting ninja (from deepspeed)\n",
" Downloading ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m146.0/146.0 kB\u001b[0m \u001b[31m19.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from deepspeed) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from deepspeed) (23.1)\n",
"Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from deepspeed) (5.9.5)\n",
"Requirement already satisfied: py-cpuinfo in /usr/local/lib/python3.10/dist-packages (from deepspeed) (9.0.0)\n",
"Requirement already satisfied: pydantic<2.0.0 in /usr/local/lib/python3.10/dist-packages (from deepspeed) (1.10.7)\n",
"Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from deepspeed) (2.0.0+cu118)\n",
"Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from deepspeed) (4.65.0)\n",
"Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<2.0.0->deepspeed) (4.5.0)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed) (3.12.0)\n",
"Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed) (1.11.1)\n",
"Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed) (3.1)\n",
"Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed) (3.1.2)\n",
"Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed) (2.0.0)\n",
"Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->deepspeed) (3.25.2)\n",
"Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->deepspeed) (16.0.3)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->deepspeed) (2.1.2)\n",
"Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->deepspeed) (1.3.0)\n",
"Building wheels for collected packages: deepspeed\n",
" Building wheel for deepspeed (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for deepspeed: filename=deepspeed-0.9.2-py3-none-any.whl size=811220 sha256=00fc399e490063552f9d2be31b65ad7c35f72a82750be6620340046283b26149\n",
" Stored in directory: /root/.cache/pip/wheels/a6/d2/b1/b15210b5dc024bab4eccbac2148db29959fe01fe6042557d07\n",
"Successfully built deepspeed\n",
"Installing collected packages: ninja, hjson, deepspeed\n",
"Successfully installed deepspeed-0.9.2 hjson-3.1.0 ninja-1.11.1\n"
]
}
],
"source": [
"!pip install transformers==4.28.1\n",
"!pip install git+https://github.com/huggingface/peft.git@13e53fc\n",
"!pip install datasets\n",
"!pip install sentencepiece\n",
"!pip install deepspeed"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ji21WFqexASI"
},
"source": [
"## Clone our repository\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dulrlPMexFNN",
"outputId": "c24273ec-d313-40c2-c0d9-1495524c84db"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cloning into 'Chinese-LLaMA-Alpaca'...\n",
"remote: Enumerating objects: 889, done.\u001b[K\n",
"remote: Counting objects: 100% (330/330), done.\u001b[K\n",
"remote: Compressing objects: 100% (220/220), done.\u001b[K\n",
"remote: Total 889 (delta 127), reused 190 (delta 108), pack-reused 559\u001b[K\n",
"Receiving objects: 100% (889/889), 11.28 MiB | 26.30 MiB/s, done.\n",
"Resolving deltas: 100% (517/517), done.\n"
]
}
],
"source": [
"!git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca.git"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "pqH_h_ZAz4_e"
},
"source": [
"## Instruction tuning for Alpaca-7B\n",
"\n",
"This follows the setting in https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/SFT-Script, except that to simplify the tutorial,\n",
"- continue training the Chinese-Alpaca-LoRA\n",
"- only train 100 steps\n",
"- omit validation"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!mkdir Chinese-LLaMA-Alpaca/sft_data\n",
"!cp Chinese-LLaMA-Alpaca/data/alpaca_data_zh_51k.json Chinese-LLaMA-Alpaca/sft_data"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IhrAVNUKSw9_",
"outputId": "791aa99b-ab3d-4e0b-f001-429dbc56dc4d"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2023-05-12 05:07:23.549181: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"[2023-05-12 05:07:24,469] [INFO] [comm.py:622:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl\n",
"05/12/2023 05:07:26 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True\n",
"[INFO|configuration_utils.py:668] 2023-05-12 05:07:27,030 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/config.json\n",
"[INFO|configuration_utils.py:720] 2023-05-12 05:07:27,031 >> Model config LlamaConfig {\n",
" \"_name_or_path\": \"decapoda-research/llama-7b-hf\",\n",
" \"architectures\": [\n",
" \"LLaMAForCausalLM\"\n",
" ],\n",
" \"bos_token_id\": 0,\n",
" \"eos_token_id\": 1,\n",
" \"hidden_act\": \"silu\",\n",
" \"hidden_size\": 4096,\n",
" \"initializer_range\": 0.02,\n",
" \"intermediate_size\": 11008,\n",
" \"max_position_embeddings\": 2048,\n",
" \"max_sequence_length\": 2048,\n",
" \"model_type\": \"llama\",\n",
" \"num_attention_heads\": 32,\n",
" \"num_hidden_layers\": 32,\n",
" \"pad_token_id\": -1,\n",
" \"rms_norm_eps\": 1e-06,\n",
" \"tie_word_embeddings\": false,\n",
" \"torch_dtype\": \"float16\",\n",
" \"transformers_version\": \"4.28.1\",\n",
" \"use_cache\": true,\n",
" \"vocab_size\": 32000\n",
"}\n",
"\n",
"[INFO|tokenization_utils_base.py:1809] 2023-05-12 05:07:27,074 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--ziqingyang--chinese-alpaca-lora-7b/snapshots/89dd628daa79cae8b930f2a7066006c39d6ac454/tokenizer.model\n",
"[INFO|tokenization_utils_base.py:1809] 2023-05-12 05:07:27,074 >> loading file added_tokens.json from cache at None\n",
"[INFO|tokenization_utils_base.py:1809] 2023-05-12 05:07:27,074 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--ziqingyang--chinese-alpaca-lora-7b/snapshots/89dd628daa79cae8b930f2a7066006c39d6ac454/special_tokens_map.json\n",
"[INFO|tokenization_utils_base.py:1809] 2023-05-12 05:07:27,074 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--ziqingyang--chinese-alpaca-lora-7b/snapshots/89dd628daa79cae8b930f2a7066006c39d6ac454/tokenizer_config.json\n",
"05/12/2023 05:07:27 - INFO - __main__ - training files: /content/Chinese-LLaMA-Alpaca/data/alpaca_data_zh_51k.json\n",
"05/12/2023 05:07:27 - WARNING - root - building dataset...\n",
"05/12/2023 05:07:27 - INFO - __name__ - training datasets-/content/Chinese-LLaMA-Alpaca/data/alpaca_data_zh_51k.json has been loaded from disk\n",
"05/12/2023 05:07:27 - INFO - __main__ - Num train_samples 51179\n",
"05/12/2023 05:07:27 - INFO - __main__ - training example:\n",
"05/12/2023 05:07:27 - INFO - __main__ - Below is an instruction that describes a task. Write a response that appropriately completes the request.\n",
"\n",
"### Instruction:\n",
"我们如何在日常生活中减少用水?\n",
"\n",
"### Response: 1. 使用节水装置,如节水淋浴喷头和水龙头。 \n",
"2. 使用水箱或水桶收集家庭废水,例如洗碗和洗浴。 \n",
"3. 在社区中提高节水意识。 \n",
"4. 检查水管和灌溉系统的漏水情况,并及时修复它们。 \n",
"5. 洗澡时间缩短,使用低流量淋浴头节约用水。 \n",
"6. 收集雨水,用于园艺或其他非饮用目的。 \n",
"7. 刷牙或擦手时关掉水龙头。 \n",
"8. 减少浇水草坪的时间。 \n",
"9. 尽可能多地重复使用灰水(来自洗衣机、浴室水槽和淋浴的水)。 \n",
"10. 只购买能源效率高的洗碗机和洗衣机。 \n",
"[INFO|modeling_utils.py:2534] 2023-05-12 05:07:27,107 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/pytorch_model.bin.index.json\n",
"[INFO|modeling_utils.py:1176] 2023-05-12 05:07:27,110 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.\n",
"[INFO|configuration_utils.py:575] 2023-05-12 05:07:27,110 >> Generate config GenerationConfig {\n",
" \"_from_model_config\": true,\n",
" \"bos_token_id\": 0,\n",
" \"eos_token_id\": 1,\n",
" \"pad_token_id\": -1,\n",
" \"transformers_version\": \"4.28.1\"\n",
"}\n",
"\n",
"Loading checkpoint shards: 100% 33/33 [00:13<00:00, 2.46it/s]\n",
"[INFO|modeling_utils.py:3190] 2023-05-12 05:07:40,719 >> All model checkpoint weights were used when initializing LlamaForCausalLM.\n",
"\n",
"[INFO|modeling_utils.py:3198] 2023-05-12 05:07:40,719 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at decapoda-research/llama-7b-hf.\n",
"If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.\n",
"[INFO|configuration_utils.py:537] 2023-05-12 05:07:40,762 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/generation_config.json\n",
"[INFO|configuration_utils.py:575] 2023-05-12 05:07:40,762 >> Generate config GenerationConfig {\n",
" \"_from_model_config\": true,\n",
" \"bos_token_id\": 0,\n",
" \"eos_token_id\": 1,\n",
" \"pad_token_id\": 0,\n",
" \"transformers_version\": \"4.28.1\"\n",
"}\n",
"\n",
"len(tokenizer):49954\n",
"05/12/2023 05:08:12 - INFO - __main__ - Peft from pre-trained model\n",
"trainable params: 429211648 || all params: 6905483264 || trainable%: 6.215519342977586\n",
"model.modules_to_save: ['embed_tokens', 'lm_head']\n",
"[INFO|trainer.py:564] 2023-05-12 05:09:45,537 >> max_steps is given, it will override any value given in num_train_epochs\n",
"[INFO|trainer.py:621] 2023-05-12 05:09:45,537 >> Using cuda_amp half precision backend\n",
"/usr/local/lib/python3.10/dist-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n",
" warnings.warn(\n",
"[2023-05-12 05:09:45,560] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.2, git-hash=unknown, git-branch=unknown\n",
"05/12/2023 05:09:49 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 0\n",
"05/12/2023 05:09:49 - INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.\n",
"[2023-05-12 05:09:49,272] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False\n",
"[2023-05-12 05:09:49,272] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer\n",
"[2023-05-12 05:09:49,272] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer\n",
"[2023-05-12 05:09:49,316] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW\n",
"[2023-05-12 05:09:49,316] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=\n",
"[2023-05-12 05:09:49,316] [WARNING] [engine.py:1104:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****\n",
"[2023-05-12 05:09:49,317] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer\n",
"[2023-05-12 05:09:49,317] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 100000000\n",
"[2023-05-12 05:09:49,317] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 100000000\n",
"[2023-05-12 05:09:49,317] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False\n",
"[2023-05-12 05:09:49,317] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False\n",
"Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...\n",
"Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/utils/build.ninja...\n",
"Building extension module utils...\n",
"Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)\n",
"ninja: no work to do.\n",
"Loading extension module utils...\n",
"Time to load utils op: 0.0975944995880127 seconds\n",
"Rank: 0 partition count [1] and sizes[(429211648, False)] \n",
"[2023-05-12 05:09:51,439] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states\n",
"[2023-05-12 05:09:51,440] [INFO] [utils.py:786:see_memory_usage] MA 14.49 GB Max_MA 15.29 GB CA 15.73 GB Max_CA 16 GB \n",
"[2023-05-12 05:09:51,440] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 4.29 GB, percent = 5.1%\n",
"[2023-05-12 05:09:51,639] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states\n",
"[2023-05-12 05:09:51,640] [INFO] [utils.py:786:see_memory_usage] MA 17.69 GB Max_MA 20.89 GB CA 22.12 GB Max_CA 22 GB \n",
"[2023-05-12 05:09:51,640] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 4.3 GB, percent = 5.1%\n",
"[2023-05-12 05:09:51,641] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized\n",
"[2023-05-12 05:09:51,832] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer\n",
"[2023-05-12 05:09:51,832] [INFO] [utils.py:786:see_memory_usage] MA 17.69 GB Max_MA 17.69 GB CA 22.12 GB Max_CA 22 GB \n",
"[2023-05-12 05:09:51,833] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 4.29 GB, percent = 5.1%\n",
"[2023-05-12 05:09:51,846] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW\n",
"[2023-05-12 05:09:51,846] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler\n",
"[2023-05-12 05:09:51,846] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = \n",
"[2023-05-12 05:09:51,846] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.999)]\n",
"[2023-05-12 05:09:51,848] [INFO] [config.py:955:print] DeepSpeedEngine configuration:\n",
"[2023-05-12 05:09:51,848] [INFO] [config.py:959:print] activation_checkpointing_config {\n",
" \"partition_activations\": false, \n",
" \"contiguous_memory_optimization\": false, \n",
" \"cpu_checkpointing\": false, \n",
" \"number_checkpoints\": null, \n",
" \"synchronize_checkpoint_boundary\": false, \n",
" \"profile\": false\n",
"}\n",
"[2023-05-12 05:09:51,848] [INFO] [config.py:959:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] amp_enabled .................. False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] amp_params ................... False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] autotuning_config ............ {\n",
" \"enabled\": false, \n",
" \"start_step\": null, \n",
" \"end_step\": null, \n",
" \"metric_path\": null, \n",
" \"arg_mappings\": null, \n",
" \"metric\": \"throughput\", \n",
" \"model_info\": null, \n",
" \"results_dir\": \"autotuning_results\", \n",
" \"exps_dir\": \"autotuning_exps\", \n",
" \"overwrite\": true, \n",
" \"fast\": true, \n",
" \"start_profile_step\": 3, \n",
" \"end_profile_step\": 5, \n",
" \"tuner_type\": \"gridsearch\", \n",
" \"tuner_early_stopping\": 5, \n",
" \"tuner_num_trials\": 50, \n",
" \"model_info_path\": null, \n",
" \"mp_size\": 1, \n",
" \"max_train_batch_size\": null, \n",
" \"min_train_batch_size\": 1, \n",
" \"max_train_micro_batch_size_per_gpu\": 1.024000e+03, \n",
" \"min_train_micro_batch_size_per_gpu\": 1, \n",
" \"num_tuning_micro_batch_sizes\": 3\n",
"}\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] bfloat16_enabled ............. False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] checkpoint_parallel_write_pipeline False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] checkpoint_tag_validation_enabled True\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] checkpoint_tag_validation_fail False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] comms_config ................. \n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] communication_data_type ...... None\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] curriculum_enabled_legacy .... False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] curriculum_params_legacy ..... False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] data_efficiency_enabled ...... False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] dataloader_drop_last ......... False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] disable_allgather ............ False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] dump_state ................... False\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'min_scale': 1e-10}\n",
"[2023-05-12 05:09:51,849] [INFO] [config.py:959:print] eigenvalue_enabled ........... False\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] eigenvalue_gas_boundary_resolution 1\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] eigenvalue_layer_name ........ bert.encoder.layer\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] eigenvalue_layer_num ......... 0\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] eigenvalue_max_iter .......... 100\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] eigenvalue_stability ......... 1e-06\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] eigenvalue_tol ............... 0.01\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] eigenvalue_verbose ........... False\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] elasticity_enabled ........... False\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] flops_profiler_config ........ {\n",
" \"enabled\": false, \n",
" \"profile_step\": 1, \n",
" \"module_depth\": -1, \n",
" \"top_modules\": 1, \n",
" \"detailed\": true, \n",
" \"output_file\": null\n",
"}\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] fp16_auto_cast ............... False\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] fp16_enabled ................. True\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] fp16_master_weights_and_gradients False\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] global_rank .................. 0\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] grad_accum_dtype ............. None\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] gradient_accumulation_steps .. 1\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] gradient_clipping ............ 1.0\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] gradient_predivide_factor .... 1.0\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] initial_dynamic_scale ........ 65536\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] load_universal_checkpoint .... False\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] loss_scale ................... 0\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] memory_breakdown ............. False\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] mics_hierarchial_params_gather False\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] mics_shard_size .............. -1\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] nebula_config ................ {\n",
" \"enabled\": false, \n",
" \"persistent_storage_path\": null, \n",
" \"persistent_time_interval\": 100, \n",
" \"num_of_version_in_retention\": 2, \n",
" \"enable_nebula_load\": true, \n",
" \"load_path\": null\n",
"}\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] optimizer_legacy_fusion ...... False\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] optimizer_name ............... None\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] optimizer_params ............. None\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}\n",
"[2023-05-12 05:09:51,850] [INFO] [config.py:959:print] pld_enabled .................. False\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] pld_params ................... False\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] prescale_gradients ........... False\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] scheduler_name ............... None\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] scheduler_params ............. None\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] sparse_attention ............. None\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] sparse_gradients_enabled ..... False\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] steps_per_print .............. 2000\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] train_batch_size ............. 1\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] train_micro_batch_size_per_gpu 1\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] use_node_local_storage ....... False\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] wall_clock_breakdown ......... False\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] world_size ................... 1\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] zero_allow_untested_optimizer True\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=100000000 allgather_partitions=True allgather_bucket_size=100000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] zero_enabled ................. True\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] zero_force_ds_cpu_optimizer .. True\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:959:print] zero_optimization_stage ...... 2\n",
"[2023-05-12 05:09:51,851] [INFO] [config.py:945:print_user_config] json = {\n",
" \"fp16\": {\n",
" \"enabled\": true, \n",
" \"loss_scale\": 0, \n",
" \"loss_scale_window\": 100, \n",
" \"initial_scale_power\": 16, \n",
" \"hysteresis\": 2, \n",
" \"min_loss_scale\": 1e-10\n",
" }, \n",
" \"zero_optimization\": {\n",
" \"stage\": 2, \n",
" \"allgather_partitions\": true, \n",
" \"allgather_bucket_size\": 1.000000e+08, \n",
" \"overlap_comm\": true, \n",
" \"reduce_scatter\": true, \n",
" \"reduce_bucket_size\": 1.000000e+08, \n",
" \"contiguous_gradients\": true\n",
" }, \n",
" \"gradient_accumulation_steps\": 1, \n",
" \"gradient_clipping\": 1.0, \n",
" \"steps_per_print\": 2.000000e+03, \n",
" \"train_batch_size\": 1, \n",
" \"train_micro_batch_size_per_gpu\": 1, \n",
" \"wall_clock_breakdown\": false, \n",
" \"zero_allow_untested_optimizer\": true\n",
"}\n",
"Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...\n",
"No modifications detected for re-loaded extension module utils, skipping build step...\n",
"Loading extension module utils...\n",
"Time to load utils op: 0.0003616809844970703 seconds\n",
"[INFO|trainer.py:1769] 2023-05-12 05:09:51,853 >> ***** Running training *****\n",
"[INFO|trainer.py:1770] 2023-05-12 05:09:51,853 >> Num examples = 51,179\n",
"[INFO|trainer.py:1771] 2023-05-12 05:09:51,853 >> Num Epochs = 1\n",
"[INFO|trainer.py:1772] 2023-05-12 05:09:51,854 >> Instantaneous batch size per device = 1\n",
"[INFO|trainer.py:1773] 2023-05-12 05:09:51,854 >> Total train batch size (w. parallel, distributed & accumulation) = 1\n",
"[INFO|trainer.py:1774] 2023-05-12 05:09:51,854 >> Gradient Accumulation steps = 1\n",
"[INFO|trainer.py:1775] 2023-05-12 05:09:51,854 >> Total optimization steps = 100\n",
"[INFO|trainer.py:1776] 2023-05-12 05:09:51,857 >> Number of trainable parameters = 429,211,648\n",
" 0% 0/100 [00:00, ?it/s][WARNING|logging.py:295] 2023-05-12 05:09:51,880 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
"[2023-05-12 05:09:53,218] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1\n",
"{'loss': 1.8672, 'learning_rate': 0.0, 'epoch': 0.0}\n",
" 1% 1/100 [00:01<02:13, 1.35s/it][2023-05-12 05:09:53,556] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768\n",
" 8% 8/100 [00:04<00:41, 2.20it/s][2023-05-12 05:09:56,507] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384\n",
"{'loss': 1.5694, 'learning_rate': 9.958100506132127e-05, 'epoch': 0.0}\n",
" 15% 15/100 [00:07<00:35, 2.37it/s][2023-05-12 05:09:59,385] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192\n",
"{'loss': 1.4827, 'learning_rate': 9.563325576007701e-05, 'epoch': 0.0}\n",
"{'loss': 1.7555, 'learning_rate': 8.675731968536002e-05, 'epoch': 0.0}\n",
"{'loss': 1.5479, 'learning_rate': 7.405929722454026e-05, 'epoch': 0.0}\n",
"{'loss': 1.9587, 'learning_rate': 5.885954957896115e-05, 'epoch': 0.0}\n",
" 50% 50/100 [00:22<00:21, 2.34it/s][INFO|trainer.py:2868] 2023-05-12 05:10:13,882 >> Saving model checkpoint to /content/output_model/checkpoint-50\n",
"[INFO|trainer.py:2880] 2023-05-12 05:10:13,889 >> Trainer.model is not a `PreTrainedModel`, only saving its state dict.\n",
"[INFO|tokenization_utils_base.py:2171] 2023-05-12 05:10:15,418 >> tokenizer config file saved in /content/output_model/checkpoint-50/tokenizer_config.json\n",
"[INFO|tokenization_utils_base.py:2178] 2023-05-12 05:10:15,418 >> Special tokens file saved in /content/output_model/checkpoint-50/special_tokens_map.json\n",
"[2023-05-12 05:10:15,420] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step50 is about to be saved!\n",
"[2023-05-12 05:10:23,879] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /content/output_model/checkpoint-50/global_step50/mp_rank_00_model_states.pt\n",
"[2023-05-12 05:10:23,879] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /content/output_model/checkpoint-50/global_step50/mp_rank_00_model_states.pt...\n",
"[2023-05-12 05:11:02,126] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /content/output_model/checkpoint-50/global_step50/mp_rank_00_model_states.pt.\n",
"[2023-05-12 05:11:02,639] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /content/output_model/checkpoint-50/global_step50/zero_pp_rank_0_mp_rank_00_optim_states.pt...\n",
"[2023-05-12 05:11:17,041] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /content/output_model/checkpoint-50/global_step50/zero_pp_rank_0_mp_rank_00_optim_states.pt.\n",
"[2023-05-12 05:11:17,041] [INFO] [engine.py:3228:_save_zero_checkpoint] zero checkpoint saved /content/output_model/checkpoint-50/global_step50/zero_pp_rank_0_mp_rank_00_optim_states.pt\n",
"[2023-05-12 05:11:17,042] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!\n",
"{'loss': 2.2106, 'learning_rate': 4.27385714377255e-05, 'epoch': 0.0}\n",
"{'loss': 1.7592, 'learning_rate': 2.737264854777306e-05, 'epoch': 0.0}\n",
"{'loss': 1.0934, 'learning_rate': 1.4359554772658552e-05, 'epoch': 0.0}\n",
"{'loss': 1.7842, 'learning_rate': 5.05241294573024e-06, 'epoch': 0.0}\n",
"{'loss': 2.0819, 'learning_rate': 4.189949386787462e-07, 'epoch': 0.0}\n",
"100% 100/100 [01:46<00:00, 2.35it/s][INFO|trainer.py:2868] 2023-05-12 05:11:38,410 >> Saving model checkpoint to /content/output_model/checkpoint-100\n",
"[INFO|trainer.py:2880] 2023-05-12 05:11:38,416 >> Trainer.model is not a `PreTrainedModel`, only saving its state dict.\n",
"[INFO|tokenization_utils_base.py:2171] 2023-05-12 05:11:39,937 >> tokenizer config file saved in /content/output_model/checkpoint-100/tokenizer_config.json\n",
"[INFO|tokenization_utils_base.py:2178] 2023-05-12 05:11:39,937 >> Special tokens file saved in /content/output_model/checkpoint-100/special_tokens_map.json\n",
"[2023-05-12 05:11:39,939] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step100 is about to be saved!\n",
"[2023-05-12 05:11:48,459] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /content/output_model/checkpoint-100/global_step100/mp_rank_00_model_states.pt\n",
"[2023-05-12 05:11:48,459] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /content/output_model/checkpoint-100/global_step100/mp_rank_00_model_states.pt...\n",
"[2023-05-12 05:12:27,009] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /content/output_model/checkpoint-100/global_step100/mp_rank_00_model_states.pt.\n",
"[2023-05-12 05:12:27,603] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /content/output_model/checkpoint-100/global_step100/zero_pp_rank_0_mp_rank_00_optim_states.pt...\n",
"[2023-05-12 05:12:41,962] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /content/output_model/checkpoint-100/global_step100/zero_pp_rank_0_mp_rank_00_optim_states.pt.\n",
"[2023-05-12 05:12:41,962] [INFO] [engine.py:3228:_save_zero_checkpoint] zero checkpoint saved /content/output_model/checkpoint-100/global_step100/zero_pp_rank_0_mp_rank_00_optim_states.pt\n",
"[2023-05-12 05:12:41,962] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step100 is ready now!\n",
"[INFO|trainer.py:2039] 2023-05-12 05:12:41,965 >> \n",
"\n",
"Training completed. Do not forget to share your model on huggingface.co/models =)\n",
"\n",
"\n",
"{'train_runtime': 170.1089, 'train_samples_per_second': 0.588, 'train_steps_per_second': 0.588, 'train_loss': 1.72732666015625, 'epoch': 0.0}\n",
"100% 100/100 [02:50<00:00, 1.70s/it]\n",
"[INFO|trainer.py:2868] 2023-05-12 05:12:42,261 >> Saving model checkpoint to /content/output_model\n",
"[INFO|trainer.py:2880] 2023-05-12 05:12:42,271 >> Trainer.model is not a `PreTrainedModel`, only saving its state dict.\n",
"[INFO|tokenization_utils_base.py:2171] 2023-05-12 05:12:44,658 >> tokenizer config file saved in /content/output_model/tokenizer_config.json\n",
"[INFO|tokenization_utils_base.py:2178] 2023-05-12 05:12:44,658 >> Special tokens file saved in /content/output_model/special_tokens_map.json\n",
"***** train metrics *****\n",
" epoch = 0.0\n",
" train_loss = 1.7273\n",
" train_runtime = 0:02:50.10\n",
" train_samples = 51179\n",
" train_samples_per_second = 0.588\n",
" train_steps_per_second = 0.588\n"
]
}
],
"source": [
"!cd Chinese-LLaMA-Alpaca/scripts && torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \\\n",
" --deepspeed ds_zero2_no_offload.json \\\n",
" --model_name_or_path decapoda-research/llama-7b-hf \\\n",
" --tokenizer_name_or_path ziqingyang/chinese-alpaca-lora-7b \\\n",
" --dataset_dir /content/Chinese-LLaMA-Alpaca/sft_data \\\n",
" --validation_split_percentage 0.001 \\\n",
" --per_device_train_batch_size 1 \\\n",
" --do_train \\\n",
" --fp16 \\\n",
" --seed $RANDOM \\\n",
" --max_steps 100 \\\n",
" --lr_scheduler_type cosine \\\n",
" --learning_rate 1e-4 \\\n",
" --warmup_ratio 0.03 \\\n",
" --weight_decay 0 \\\n",
" --logging_strategy steps \\\n",
" --logging_steps 10 \\\n",
" --save_strategy steps \\\n",
" --save_total_limit 3 \\\n",
" --save_steps 50 \\\n",
" --gradient_accumulation_steps 1 \\\n",
" --preprocessing_num_workers 8 \\\n",
" --max_seq_length 512 \\\n",
" --output_dir /content/output_model \\\n",
" --overwrite_output_dir \\\n",
" --ddp_timeout 30000 \\\n",
" --logging_first_step True \\\n",
" --torch_dtype float16 \\\n",
" --peft_path ziqingyang/chinese-alpaca-lora-7b \\\n",
" --gradient_checkpointing \\\n",
" --ddp_find_unused_parameters False"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "hvVWDy9YPzG1"
},
"source": [
"After training, rename saved `pytorch_model.bin` to `adapter_model.bin`"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KnA4qnBCX3ev"
},
"outputs": [],
"source": [
"!mkdir output_model/peft_model\n",
"!mv output_model/pytorch_model.bin output_model/peft_model/adapter_model.bin"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "hDRJlD8sYs7E"
},
"source": [
"Lastly, you need to manually create an `adapter_config.json` under `peft_model` and fill in the hyperparamters such as `lora_rank`, `lora_alpha` etc., whose content and \n",
"format can be referenced from the corresponding file in Chinese-Alpaca-LoRA."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "A100",
"machine_shape": "hm",
"provenance": []
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
================================================
FILE: notebooks/gradio_web_demo.ipynb
================================================
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "8rclB0ccBp2X"
},
"source": [
""
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "daB8KJRc5VAx"
},
"source": [
"以ChatGPT、GPT-4等为代表的大语言模型(Large Language Model, LLM)掀起了新一轮自然语言处理领域的研究浪潮,展现出了类通用人工智能(AGI)的能力,受到业界广泛关注。然而,由于大语言模型的训练和部署都极为昂贵,为构建透明且开放的学术研究造成了一定的阻碍。\n",
"\n",
"为了促进大模型在中文NLP社区的开放研究,本项目开源了中文LLaMA模型和指令精调的Alpaca大模型。这些模型在原版LLaMA的基础上扩充了中文词表并使用了中文数据进行二次预训练,进一步提升了中文基础语义理解能力。同时,中文Alpaca模型进一步使用了中文指令数据进行精调,显著提升了模型对指令的理解和执行能力。详细内容请参考技术报告[(Cui, Yang, and Yao, 2023)](https://arxiv.org/abs/2304.08177)。"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"注意:\n",
"\n",
"1. 该demo中默认加载了由第三方转换的LLaMA权重。本项目不对其合规性和正确性负责。\n",
"2. 该demo默认加载Chinese-Alpaca-7B模型。若想尝试其他模型,请先阅读[模型合并与转换](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/模型合并与转换)。"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "8w6hgNddOTwH"
},
"source": [
"克隆项目源码"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "rcVpwUNCBX66"
},
"outputs": [],
"source": [
"!git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca.git"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ifMwZ6aRByXr"
},
"source": [
"安装依赖"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "b3jLONDhB1JZ"
},
"outputs": [],
"source": [
"!pip install -r Chinese-LLaMA-Alpaca/requirements.txt\n",
"!pip install peft==0.3.0\n",
"!pip install gradio\n",
"!pip install sentencepiece"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "t5IAVbuAOmWg"
},
"source": [
"从Hugging Face下载llama权重和chinese llama lora权重"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FwhwVq0FHTQo"
},
"outputs": [],
"source": [
"!git clone https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b\n",
"!git clone https://huggingface.co/elinas/llama-7b-hf-transformers-4.29"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "iR6CkYAUCW2h"
},
"source": [
"启动web demo,如果没有申请到T4 colab,则添加--only_cpu选项,如果是T4机器,则使用--load_in_8bit选项,如果是V100、A100机器则不需要,选用load_in_8bits需安装bitsandbytes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "CuTPG8YDNMu-"
},
"outputs": [],
"source": [
"!pip install bitsandbytes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HVWjCNMjCp0p"
},
"outputs": [],
"source": [
"!python Chinese-LLaMA-Alpaca/scripts/inference/gradio_demo.py --base_model llama-7b-hf-transformers-4.29 --lora_model chinese-alpaca-lora-7b --load_in_8bit"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
================================================
FILE: notebooks/legacy/convert_and_quantize_chinese_alpaca_plus.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "B1c96_k3MahN"
},
"source": [
"# 转换并量化中文Alpaca Plus模型\n",
"\n",
"关于其他模型请参考另一个notebook:https://colab.research.google.com/drive/1Eak6azD3MLeb-YsfbP8UZC8wrL1ddIMI?usp=sharing\n",
"\n",
"\n",
"🎉🎉🎉 **新:现在免费用户也有机会能够转换7B和13B模型了!**\n",
"\n",
"💡 提示和小窍门:\n",
"- 免费用户默认的内存只有12G左右,**笔者用免费账号实测选择TPU的话有机会随机出35G内存**,建议多试几次。如果能随机出25G内存以上的机器就可以了转换7B模型了,35G内存以上机器就能转换13B模型了\n",
"- Pro(+)用户请选择 “代码执行程序” -> “更改运行时类型” -> “高RAM”\n",
"- 实测:转换7B级别模型,25G内存的机器就够了;转换13B级别模型需要30G以上的内存(程序莫名崩掉或断开连接就说明内存爆了)\n",
"- 如果选了“高RAM”之后内存还是不够大的话,选择以下操作,有的时候会分配出很高内存的机器,祝你好运😄!\n",
" - 可以把GPU或者TPU也选上(虽然不会用到)\n",
" - 选GPU时,Pro用户可选“高级”类型GPU\n",
"\n",
"以下信息配置信息供参考(Pro订阅下测试),运行时规格设置为“高RAM”时的设备配置如下(有随机性):\n",
"\n",
"| 硬件加速器 | RAM | 硬盘 |\n",
"| :-- | :--: | :--: |\n",
"| None | 25GB | 225GB |\n",
"| TPU | 35GB | 225GB |\n",
"| GPU(标准,T4)| 25GB | 166GB |\n",
"| GPU(高性能,V100)| 25GB | 166GB |\n",
"| GPU(高性能,A100)| **80GB** | 166GB |\n",
"\n",
"*温馨提示:用完之后注意断开运行时,选择满足要求的最低配置即可,避免不必要的计算单元消耗(Pro只给100个计算单元)。*"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "vScqHD_jMFOV"
},
"source": [
"## 安装相关依赖"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "E5WKFJXIL6ZU",
"outputId": "87a89bed-053e-4e61-e2f8-1dfcbdf87fbf"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting torch==1.12.0\n",
" Downloading torch-1.12.0-cp310-cp310-manylinux1_x86_64.whl (776.3 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m776.3/776.3 MB\u001b[0m \u001b[31m1.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch==1.12.0) (4.5.0)\n",
"Installing collected packages: torch\n",
" Attempting uninstall: torch\n",
" Found existing installation: torch 2.0.0+cu118\n",
" Uninstalling torch-2.0.0+cu118:\n",
" Successfully uninstalled torch-2.0.0+cu118\n",
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"torchvision 0.15.1+cu118 requires torch==2.0.0, but you have torch 1.12.0 which is incompatible.\n",
"torchtext 0.15.1 requires torch==2.0.0, but you have torch 1.12.0 which is incompatible.\n",
"torchdata 0.6.0 requires torch==2.0.0, but you have torch 1.12.0 which is incompatible.\n",
"torchaudio 2.0.1+cu118 requires torch==2.0.0, but you have torch 1.12.0 which is incompatible.\n",
"peft 0.2.0 requires torch>=1.13.0, but you have torch 1.12.0 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0mSuccessfully installed torch-1.12.0\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (4.28.1)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers) (2.27.1)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers) (4.65.0)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers) (23.1)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (2022.10.31)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.13.3)\n",
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (1.22.4)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.14.1)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers) (3.12.0)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (6.0)\n",
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers) (2023.4.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers) (4.5.0)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (3.4)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2.0.12)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers) (2022.12.7)\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting git+https://github.com/huggingface/peft\n",
" Cloning https://github.com/huggingface/peft to /tmp/pip-req-build-tnxzt7q0\n",
" Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft /tmp/pip-req-build-tnxzt7q0\n",
" Resolved https://github.com/huggingface/peft to commit 632997d1fb776c3cf05d8c2537ac9a98a7ce9435\n",
" Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
" Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
" Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (23.1)\n",
"Requirement already satisfied: accelerate in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (0.18.0)\n",
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (1.22.4)\n",
"Collecting torch>=1.13.0\n",
" Downloading torch-2.0.0-cp310-cp310-manylinux1_x86_64.whl (619.9 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m619.9/619.9 MB\u001b[0m \u001b[31m1.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (6.0)\n",
"Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (5.9.5)\n",
"Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (4.28.1)\n",
"Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (3.1)\n",
"Collecting nvidia-cufft-cu11==10.9.0.58\n",
" Downloading nvidia_cufft_cu11-10.9.0.58-py3-none-manylinux1_x86_64.whl (168.4 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m168.4/168.4 MB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting nvidia-cudnn-cu11==8.5.0.96\n",
" Downloading nvidia_cudnn_cu11-8.5.0.96-2-py3-none-manylinux1_x86_64.whl (557.1 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m557.1/557.1 MB\u001b[0m \u001b[31m2.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (2.0.0)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (3.12.0)\n",
"Collecting nvidia-cuda-runtime-cu11==11.7.99\n",
" Downloading nvidia_cuda_runtime_cu11-11.7.99-py3-none-manylinux1_x86_64.whl (849 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m849.3/849.3 kB\u001b[0m \u001b[31m48.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (3.1.2)\n",
"Collecting nvidia-nccl-cu11==2.14.3\n",
" Downloading nvidia_nccl_cu11-2.14.3-py3-none-manylinux1_x86_64.whl (177.1 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m177.1/177.1 MB\u001b[0m \u001b[31m5.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (1.11.1)\n",
"Collecting nvidia-cusparse-cu11==11.7.4.91\n",
" Downloading nvidia_cusparse_cu11-11.7.4.91-py3-none-manylinux1_x86_64.whl (173.2 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m173.2/173.2 MB\u001b[0m \u001b[31m5.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting nvidia-cublas-cu11==11.10.3.66\n",
" Downloading nvidia_cublas_cu11-11.10.3.66-py3-none-manylinux1_x86_64.whl (317.1 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m317.1/317.1 MB\u001b[0m \u001b[31m3.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting nvidia-nvtx-cu11==11.7.91\n",
" Downloading nvidia_nvtx_cu11-11.7.91-py3-none-manylinux1_x86_64.whl (98 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m98.6/98.6 kB\u001b[0m \u001b[31m10.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (4.5.0)\n",
"Collecting nvidia-curand-cu11==10.2.10.91\n",
" Downloading nvidia_curand_cu11-10.2.10.91-py3-none-manylinux1_x86_64.whl (54.6 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m54.6/54.6 MB\u001b[0m \u001b[31m24.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting nvidia-cusolver-cu11==11.4.0.1\n",
" Downloading nvidia_cusolver_cu11-11.4.0.1-2-py3-none-manylinux1_x86_64.whl (102.6 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m102.6/102.6 MB\u001b[0m \u001b[31m8.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting nvidia-cuda-nvrtc-cu11==11.7.99\n",
" Downloading nvidia_cuda_nvrtc_cu11-11.7.99-2-py3-none-manylinux1_x86_64.whl (21.0 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m21.0/21.0 MB\u001b[0m \u001b[31m63.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting nvidia-cuda-cupti-cu11==11.7.101\n",
" Downloading nvidia_cuda_cupti_cu11-11.7.101-py3-none-manylinux1_x86_64.whl (11.8 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m11.8/11.8 MB\u001b[0m \u001b[31m75.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: wheel in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.13.0->peft==0.3.0.dev0) (0.40.0)\n",
"Requirement already satisfied: setuptools in /usr/local/lib/python3.10/dist-packages (from nvidia-cublas-cu11==11.10.3.66->torch>=1.13.0->peft==0.3.0.dev0) (67.7.2)\n",
"Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.13.0->peft==0.3.0.dev0) (3.25.2)\n",
"Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.13.0->peft==0.3.0.dev0) (16.0.2)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (0.14.1)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (2022.10.31)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (0.13.3)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (4.65.0)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (2.27.1)\n",
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers->peft==0.3.0.dev0) (2023.4.0)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.13.0->peft==0.3.0.dev0) (2.1.2)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (2.0.12)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (2022.12.7)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (3.4)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (1.26.15)\n",
"Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.13.0->peft==0.3.0.dev0) (1.3.0)\n",
"Building wheels for collected packages: peft\n",
" Building wheel for peft (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for peft: filename=peft-0.3.0.dev0-py3-none-any.whl size=55537 sha256=3cc2a65c09926ac217ac671b7d9c1640eac9857f0aca55b78a9fcda484263073\n",
" Stored in directory: /tmp/pip-ephem-wheel-cache-1rjlvx70/wheels/4c/16/67/1002a2d4daa822eff130e6d85b90051b75d2ce0d26b9448e4a\n",
"Successfully built peft\n",
"Installing collected packages: nvidia-nvtx-cu11, nvidia-nccl-cu11, nvidia-cusparse-cu11, nvidia-curand-cu11, nvidia-cufft-cu11, nvidia-cuda-runtime-cu11, nvidia-cuda-nvrtc-cu11, nvidia-cuda-cupti-cu11, nvidia-cublas-cu11, nvidia-cusolver-cu11, nvidia-cudnn-cu11, torch, peft\n",
" Attempting uninstall: torch\n",
" Found existing installation: torch 1.12.0\n",
" Uninstalling torch-1.12.0:\n",
" Successfully uninstalled torch-1.12.0\n",
" Attempting uninstall: peft\n",
" Found existing installation: peft 0.2.0\n",
" Uninstalling peft-0.2.0:\n",
" Successfully uninstalled peft-0.2.0\n",
"Successfully installed nvidia-cublas-cu11-11.10.3.66 nvidia-cuda-cupti-cu11-11.7.101 nvidia-cuda-nvrtc-cu11-11.7.99 nvidia-cuda-runtime-cu11-11.7.99 nvidia-cudnn-cu11-8.5.0.96 nvidia-cufft-cu11-10.9.0.58 nvidia-curand-cu11-10.2.10.91 nvidia-cusolver-cu11-11.4.0.1 nvidia-cusparse-cu11-11.7.4.91 nvidia-nccl-cu11-2.14.3 nvidia-nvtx-cu11-11.7.91 peft-0.3.0.dev0 torch-2.0.0\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Requirement already satisfied: sentencepiece in /usr/local/lib/python3.10/dist-packages (0.1.98)\n"
]
}
],
"source": [
"!pip install torch==1.12.0\n",
"!pip install transformers\n",
"!pip install git+https://github.com/huggingface/peft\n",
"!pip install sentencepiece"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ygb1xFIMNQKw"
},
"source": [
"## 克隆目录和代码"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yCEJh7NJNXz9",
"outputId": "ec16f31b-7af7-4eb8-82ce-5f9317bad941"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Cloning into 'Chinese-LLaMA-Alpaca'...\n",
"remote: Enumerating objects: 761, done.\u001b[K\n",
"remote: Counting objects: 100% (202/202), done.\u001b[K\n",
"remote: Compressing objects: 100% (172/172), done.\u001b[K\n",
"remote: Total 761 (delta 54), reused 69 (delta 29), pack-reused 559\u001b[K\n",
"Receiving objects: 100% (761/761), 11.16 MiB | 22.49 MiB/s, done.\n",
"Resolving deltas: 100% (444/444), done.\n",
"Cloning into 'llama.cpp'...\n",
"remote: Enumerating objects: 2086, done.\u001b[K\n",
"remote: Counting objects: 100% (842/842), done.\u001b[K\n",
"remote: Compressing objects: 100% (99/99), done.\u001b[K\n",
"remote: Total 2086 (delta 778), reused 756 (delta 743), pack-reused 1244\u001b[K\n",
"Receiving objects: 100% (2086/2086), 2.12 MiB | 16.33 MiB/s, done.\n",
"Resolving deltas: 100% (1345/1345), done.\n"
]
}
],
"source": [
"!git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca\n",
"!git clone https://github.com/ggerganov/llama.cpp"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "nIyxX0DSNsgQ"
},
"source": [
"## 合并模型(Alpaca-Plus-7B)\n",
"\n",
"**⚠️ 再次提醒:7B模型需要25G内存,13B模型需要35G+内存。**\n",
"\n",
"此处使用的是🤗模型库中提供的基模型(已是HF格式),而不是Facebook官方的LLaMA模型,因此略去将原版LLaMA转换为HF格式的步骤。\n",
"\n",
"**这里直接运行第二步:合并LoRA权重**,生成全量模型权重。可以直接指定🤗模型库的地址,也可以是本地存放地址。\n",
"- 基模型:`decapoda-research/llama-7b-hf` *(use at your own risk)*\n",
"- LoRA模型:先写`ziqingyang/chinese-llama-plus-lora-7b`然后再写`ziqingyang/chinese-alpaca-plus-lora-7b`\n",
"- 输出类型:因为后续要量化,这里将`output_type`设置为`pth`\n",
"\n",
"💡 转换13B模型提示:\n",
"- 请将参数`--base_model`和`--lora_model`中的的`7b`改为`13b`即可\n",
"- **免费用户必须增加一个参数`--offload_dir`以缓解内存压力**,例如`--offload_dir ./offload_temp`\n",
"\n",
"该过程比较耗时(下载+转换),需要几分钟到十几分钟不等,请耐心等待。\n",
"转换好的模型存放在`alpaca-combined`目录。\n",
"如果你不需要量化模型,那么到这一步就结束了。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5AV4EW5hNhVV",
"outputId": "91901b82-88c4-405d-cf86-32f1a3a60467"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"2023-04-28 08:07:00.276520: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"Base model: decapoda-research/llama-7b-hf\n",
"LoRA model(s) ['ziqingyang/chinese-llama-plus-lora-7b', 'ziqingyang/chinese-alpaca-plus-lora-7b']:\n",
"Loading checkpoint shards: 100% 33/33 [01:18<00:00, 2.39s/it]\n",
"Peft version: 0.3.0.dev0\n",
"Loading LoRA for 7B model\n",
"Loading LoRA ziqingyang/chinese-llama-plus-lora-7b\n",
"Extended vocabulary size to 49953\n",
"Downloading (…)/adapter_config.json: 100% 420/420 [00:00<00:00, 1.61MB/s]\n",
"Downloading adapter_model.bin: 100% 858M/858M [00:04<00:00, 185MB/s]\n",
"Merging with merge_and_unload...\n",
"Loading LoRA ziqingyang/chinese-alpaca-plus-lora-7b\n",
"Downloading tokenizer.model: 100% 758k/758k [00:00<00:00, 13.4MB/s]\n",
"Downloading (…)cial_tokens_map.json: 100% 96.0/96.0 [00:00<00:00, 535kB/s]\n",
"Downloading (…)okenizer_config.json: 100% 166/166 [00:00<00:00, 854kB/s]\n",
"Extended vocabulary size to 49954\n",
"Downloading (…)/adapter_config.json: 100% 423/423 [00:00<00:00, 2.31MB/s]\n",
"Downloading adapter_model.bin: 100% 1.14G/1.14G [00:16<00:00, 70.6MB/s]\n",
"Merging with merge_and_unload...\n",
"Saving to pth format...\n",
"Saving shard 1 of 1 into alpaca-combined/consolidated.00.pth\n"
]
}
],
"source": [
"!python ./Chinese-LLaMA-Alpaca/scripts/merge_llama_with_chinese_lora.py \\\n",
" --base_model decapoda-research/llama-7b-hf \\\n",
" --lora_model ziqingyang/chinese-llama-plus-lora-7b,ziqingyang/chinese-alpaca-plus-lora-7b \\\n",
" --output_type pth \\\n",
" --output_dir alpaca-combined"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ueexcKo-Q_EW"
},
"source": [
"## 量化模型\n",
"接下来我们使用[llama.cpp](https://github.com/ggerganov/llama.cpp)工具对上一步生成的全量版本权重进行转换,生成4-bit量化模型。\n",
"\n",
"### 编译工具\n",
"\n",
"首先对llama.cpp工具进行编译。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_GbjsT2wRRCR",
"outputId": "2b4f2a38-d22d-4764-9a81-bad8bd72b7fe"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"I llama.cpp build info: \n",
"I UNAME_S: Linux\n",
"I UNAME_P: x86_64\n",
"I UNAME_M: x86_64\n",
"I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native\n",
"I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native\n",
"I LDFLAGS: \n",
"I CC: cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
"I CXX: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
"\n",
"cc -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -c ggml.c -o ggml.o\n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o\n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/common.cpp -o common.o\n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/main/main.cpp ggml.o llama.o common.o -o main \n",
"\n",
"==== Run ./main -h for help. ====\n",
"\n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/quantize/quantize.cpp ggml.o llama.o -o quantize \n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/quantize-stats/quantize-stats.cpp ggml.o llama.o -o quantize-stats \n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity \n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding \n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native pocs/vdot/vdot.cpp ggml.o -o vdot \n"
]
}
],
"source": [
"!cd llama.cpp && make"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "gw2xpYC0RcQC"
},
"source": [
"### 模型转换为ggml格式(FP16)\n",
"\n",
"这一步,我们将模型转换为ggml格式(FP16)。\n",
"- 在这之前需要把`alpaca-combined`目录挪个位置,把模型文件放到`llama.cpp/zh-models/7B`下,把`tokenizer.model`放到`llama.cpp/zh-models`\n",
"- tokenizer在哪里?\n",
" - `alpaca-combined`目录下有\n",
" - 或者从以下网址下载:https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b/resolve/main/tokenizer.model (注意,Alpaca和LLaMA的`tokenizer.model`不能混用!)\n",
"\n",
"💡 转换13B模型提示:\n",
"- tokenizer可以直接用7B的,13B和7B的相同\n",
"- Alpaca和LLaMA的`tokenizer.model`不能混用!\n",
"- 以下看到7B字样的都是文件夹名,与转换过程没有关系了,改不改都行"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5KgnFVStRjio",
"outputId": "19293a4a-a400-4cd3-c98b-80022dcd1f35"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"7B tokenizer.model\n"
]
}
],
"source": [
"!cd llama.cpp && mkdir zh-models && mv ../alpaca-combined zh-models/7B\n",
"!mv llama.cpp/zh-models/7B/tokenizer.model llama.cpp/zh-models/\n",
"!ls llama.cpp/zh-models/"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NUHeoTMQS1AQ",
"outputId": "378b70db-d13b-4aa9-8bb0-a1fc1cd4b13f"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Loading model file zh-models/7B/consolidated.00.pth\n",
"Loading vocab file zh-models/tokenizer.model\n",
"Writing vocab...\n",
"[ 1/291] Writing tensor tok_embeddings.weight | size 49954 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 2/291] Writing tensor norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 3/291] Writing tensor output.weight | size 49954 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 4/291] Writing tensor layers.0.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 5/291] Writing tensor layers.0.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 6/291] Writing tensor layers.0.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 7/291] Writing tensor layers.0.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 8/291] Writing tensor layers.0.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 9/291] Writing tensor layers.0.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 10/291] Writing tensor layers.0.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 11/291] Writing tensor layers.0.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 12/291] Writing tensor layers.0.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 13/291] Writing tensor layers.1.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 14/291] Writing tensor layers.1.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 15/291] Writing tensor layers.1.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 16/291] Writing tensor layers.1.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 17/291] Writing tensor layers.1.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 18/291] Writing tensor layers.1.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 19/291] Writing tensor layers.1.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 20/291] Writing tensor layers.1.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 21/291] Writing tensor layers.1.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 22/291] Writing tensor layers.2.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 23/291] Writing tensor layers.2.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 24/291] Writing tensor layers.2.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 25/291] Writing tensor layers.2.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 26/291] Writing tensor layers.2.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 27/291] Writing tensor layers.2.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 28/291] Writing tensor layers.2.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 29/291] Writing tensor layers.2.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 30/291] Writing tensor layers.2.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 31/291] Writing tensor layers.3.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 32/291] Writing tensor layers.3.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 33/291] Writing tensor layers.3.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 34/291] Writing tensor layers.3.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 35/291] Writing tensor layers.3.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 36/291] Writing tensor layers.3.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 37/291] Writing tensor layers.3.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 38/291] Writing tensor layers.3.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 39/291] Writing tensor layers.3.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 40/291] Writing tensor layers.4.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 41/291] Writing tensor layers.4.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 42/291] Writing tensor layers.4.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 43/291] Writing tensor layers.4.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 44/291] Writing tensor layers.4.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 45/291] Writing tensor layers.4.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 46/291] Writing tensor layers.4.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 47/291] Writing tensor layers.4.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 48/291] Writing tensor layers.4.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 49/291] Writing tensor layers.5.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 50/291] Writing tensor layers.5.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 51/291] Writing tensor layers.5.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 52/291] Writing tensor layers.5.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 53/291] Writing tensor layers.5.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 54/291] Writing tensor layers.5.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 55/291] Writing tensor layers.5.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 56/291] Writing tensor layers.5.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 57/291] Writing tensor layers.5.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 58/291] Writing tensor layers.6.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 59/291] Writing tensor layers.6.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 60/291] Writing tensor layers.6.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 61/291] Writing tensor layers.6.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 62/291] Writing tensor layers.6.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 63/291] Writing tensor layers.6.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 64/291] Writing tensor layers.6.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 65/291] Writing tensor layers.6.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 66/291] Writing tensor layers.6.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 67/291] Writing tensor layers.7.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 68/291] Writing tensor layers.7.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 69/291] Writing tensor layers.7.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 70/291] Writing tensor layers.7.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 71/291] Writing tensor layers.7.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 72/291] Writing tensor layers.7.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 73/291] Writing tensor layers.7.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 74/291] Writing tensor layers.7.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 75/291] Writing tensor layers.7.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 76/291] Writing tensor layers.8.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 77/291] Writing tensor layers.8.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 78/291] Writing tensor layers.8.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 79/291] Writing tensor layers.8.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 80/291] Writing tensor layers.8.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 81/291] Writing tensor layers.8.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 82/291] Writing tensor layers.8.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 83/291] Writing tensor layers.8.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 84/291] Writing tensor layers.8.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 85/291] Writing tensor layers.9.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 86/291] Writing tensor layers.9.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 87/291] Writing tensor layers.9.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 88/291] Writing tensor layers.9.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 89/291] Writing tensor layers.9.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 90/291] Writing tensor layers.9.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 91/291] Writing tensor layers.9.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[ 92/291] Writing tensor layers.9.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 93/291] Writing tensor layers.9.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 94/291] Writing tensor layers.10.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 95/291] Writing tensor layers.10.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 96/291] Writing tensor layers.10.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 97/291] Writing tensor layers.10.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[ 98/291] Writing tensor layers.10.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[ 99/291] Writing tensor layers.10.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[100/291] Writing tensor layers.10.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[101/291] Writing tensor layers.10.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[102/291] Writing tensor layers.10.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[103/291] Writing tensor layers.11.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[104/291] Writing tensor layers.11.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[105/291] Writing tensor layers.11.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[106/291] Writing tensor layers.11.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[107/291] Writing tensor layers.11.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[108/291] Writing tensor layers.11.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[109/291] Writing tensor layers.11.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[110/291] Writing tensor layers.11.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[111/291] Writing tensor layers.11.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[112/291] Writing tensor layers.12.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[113/291] Writing tensor layers.12.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[114/291] Writing tensor layers.12.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[115/291] Writing tensor layers.12.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[116/291] Writing tensor layers.12.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[117/291] Writing tensor layers.12.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[118/291] Writing tensor layers.12.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[119/291] Writing tensor layers.12.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[120/291] Writing tensor layers.12.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[121/291] Writing tensor layers.13.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[122/291] Writing tensor layers.13.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[123/291] Writing tensor layers.13.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[124/291] Writing tensor layers.13.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[125/291] Writing tensor layers.13.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[126/291] Writing tensor layers.13.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[127/291] Writing tensor layers.13.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[128/291] Writing tensor layers.13.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[129/291] Writing tensor layers.13.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[130/291] Writing tensor layers.14.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[131/291] Writing tensor layers.14.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[132/291] Writing tensor layers.14.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[133/291] Writing tensor layers.14.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[134/291] Writing tensor layers.14.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[135/291] Writing tensor layers.14.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[136/291] Writing tensor layers.14.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[137/291] Writing tensor layers.14.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[138/291] Writing tensor layers.14.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[139/291] Writing tensor layers.15.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[140/291] Writing tensor layers.15.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[141/291] Writing tensor layers.15.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[142/291] Writing tensor layers.15.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[143/291] Writing tensor layers.15.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[144/291] Writing tensor layers.15.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[145/291] Writing tensor layers.15.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[146/291] Writing tensor layers.15.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[147/291] Writing tensor layers.15.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[148/291] Writing tensor layers.16.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[149/291] Writing tensor layers.16.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[150/291] Writing tensor layers.16.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[151/291] Writing tensor layers.16.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[152/291] Writing tensor layers.16.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[153/291] Writing tensor layers.16.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[154/291] Writing tensor layers.16.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[155/291] Writing tensor layers.16.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[156/291] Writing tensor layers.16.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[157/291] Writing tensor layers.17.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[158/291] Writing tensor layers.17.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[159/291] Writing tensor layers.17.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[160/291] Writing tensor layers.17.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[161/291] Writing tensor layers.17.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[162/291] Writing tensor layers.17.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[163/291] Writing tensor layers.17.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[164/291] Writing tensor layers.17.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[165/291] Writing tensor layers.17.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[166/291] Writing tensor layers.18.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[167/291] Writing tensor layers.18.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[168/291] Writing tensor layers.18.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[169/291] Writing tensor layers.18.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[170/291] Writing tensor layers.18.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[171/291] Writing tensor layers.18.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[172/291] Writing tensor layers.18.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[173/291] Writing tensor layers.18.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[174/291] Writing tensor layers.18.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[175/291] Writing tensor layers.19.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[176/291] Writing tensor layers.19.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[177/291] Writing tensor layers.19.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[178/291] Writing tensor layers.19.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[179/291] Writing tensor layers.19.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[180/291] Writing tensor layers.19.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[181/291] Writing tensor layers.19.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[182/291] Writing tensor layers.19.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[183/291] Writing tensor layers.19.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[184/291] Writing tensor layers.20.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[185/291] Writing tensor layers.20.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[186/291] Writing tensor layers.20.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[187/291] Writing tensor layers.20.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[188/291] Writing tensor layers.20.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[189/291] Writing tensor layers.20.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[190/291] Writing tensor layers.20.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[191/291] Writing tensor layers.20.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[192/291] Writing tensor layers.20.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[193/291] Writing tensor layers.21.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[194/291] Writing tensor layers.21.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[195/291] Writing tensor layers.21.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[196/291] Writing tensor layers.21.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[197/291] Writing tensor layers.21.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[198/291] Writing tensor layers.21.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[199/291] Writing tensor layers.21.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[200/291] Writing tensor layers.21.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[201/291] Writing tensor layers.21.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[202/291] Writing tensor layers.22.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[203/291] Writing tensor layers.22.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[204/291] Writing tensor layers.22.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[205/291] Writing tensor layers.22.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[206/291] Writing tensor layers.22.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[207/291] Writing tensor layers.22.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[208/291] Writing tensor layers.22.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[209/291] Writing tensor layers.22.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[210/291] Writing tensor layers.22.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[211/291] Writing tensor layers.23.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[212/291] Writing tensor layers.23.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[213/291] Writing tensor layers.23.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[214/291] Writing tensor layers.23.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[215/291] Writing tensor layers.23.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[216/291] Writing tensor layers.23.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[217/291] Writing tensor layers.23.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[218/291] Writing tensor layers.23.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[219/291] Writing tensor layers.23.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[220/291] Writing tensor layers.24.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[221/291] Writing tensor layers.24.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[222/291] Writing tensor layers.24.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[223/291] Writing tensor layers.24.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[224/291] Writing tensor layers.24.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[225/291] Writing tensor layers.24.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[226/291] Writing tensor layers.24.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[227/291] Writing tensor layers.24.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[228/291] Writing tensor layers.24.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[229/291] Writing tensor layers.25.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[230/291] Writing tensor layers.25.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[231/291] Writing tensor layers.25.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[232/291] Writing tensor layers.25.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[233/291] Writing tensor layers.25.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[234/291] Writing tensor layers.25.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[235/291] Writing tensor layers.25.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[236/291] Writing tensor layers.25.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[237/291] Writing tensor layers.25.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[238/291] Writing tensor layers.26.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[239/291] Writing tensor layers.26.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[240/291] Writing tensor layers.26.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[241/291] Writing tensor layers.26.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[242/291] Writing tensor layers.26.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[243/291] Writing tensor layers.26.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[244/291] Writing tensor layers.26.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[245/291] Writing tensor layers.26.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[246/291] Writing tensor layers.26.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[247/291] Writing tensor layers.27.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[248/291] Writing tensor layers.27.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[249/291] Writing tensor layers.27.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[250/291] Writing tensor layers.27.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[251/291] Writing tensor layers.27.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[252/291] Writing tensor layers.27.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[253/291] Writing tensor layers.27.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[254/291] Writing tensor layers.27.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[255/291] Writing tensor layers.27.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[256/291] Writing tensor layers.28.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[257/291] Writing tensor layers.28.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[258/291] Writing tensor layers.28.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[259/291] Writing tensor layers.28.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[260/291] Writing tensor layers.28.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[261/291] Writing tensor layers.28.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[262/291] Writing tensor layers.28.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[263/291] Writing tensor layers.28.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[264/291] Writing tensor layers.28.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[265/291] Writing tensor layers.29.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[266/291] Writing tensor layers.29.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[267/291] Writing tensor layers.29.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[268/291] Writing tensor layers.29.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[269/291] Writing tensor layers.29.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[270/291] Writing tensor layers.29.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[271/291] Writing tensor layers.29.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[272/291] Writing tensor layers.29.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[273/291] Writing tensor layers.29.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[274/291] Writing tensor layers.30.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[275/291] Writing tensor layers.30.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[276/291] Writing tensor layers.30.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[277/291] Writing tensor layers.30.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[278/291] Writing tensor layers.30.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[279/291] Writing tensor layers.30.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[280/291] Writing tensor layers.30.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[281/291] Writing tensor layers.30.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[282/291] Writing tensor layers.30.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[283/291] Writing tensor layers.31.attention.wq.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[284/291] Writing tensor layers.31.attention.wk.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[285/291] Writing tensor layers.31.attention.wv.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[286/291] Writing tensor layers.31.attention.wo.weight | size 4096 x 4096 | type UnquantizedDataType(name='F16')\n",
"[287/291] Writing tensor layers.31.attention_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"[288/291] Writing tensor layers.31.feed_forward.w1.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[289/291] Writing tensor layers.31.feed_forward.w2.weight | size 4096 x 11008 | type UnquantizedDataType(name='F16')\n",
"[290/291] Writing tensor layers.31.feed_forward.w3.weight | size 11008 x 4096 | type UnquantizedDataType(name='F16')\n",
"[291/291] Writing tensor layers.31.ffn_norm.weight | size 4096 | type UnquantizedDataType(name='F32')\n",
"Wrote zh-models/7B/ggml-model-f16.bin\n"
]
}
],
"source": [
"!cd llama.cpp && python convert.py zh-models/7B/"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "hEZEJAVYCHkc"
},
"source": [
"### 将FP16模型量化为8-bit\n",
"\n",
"我们进一步将FP16模型转换为8-bit量化模型。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2xyais7OUVDI",
"outputId": "b7fe3c62-489a-42e5-927a-8ab6088a3ecc"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"llama.cpp: loading model from ./zh-models/7B/ggml-model-f16.bin\n",
"llama.cpp: saving model to ./zh-models/7B/ggml-model-q4_0.bin\n",
"[ 1/ 291] tok_embeddings.weight - 4096 x 49954, type = f16, quantizing .. size = 390.27 MB -> 219.52 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 2/ 291] norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 3/ 291] output.weight - 4096 x 49954, type = f16, quantizing .. size = 390.27 MB -> 219.52 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 4/ 291] layers.0.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.026 0.018 0.028 0.044 0.064 0.088 0.111 0.245 0.111 0.087 0.064 0.044 0.028 0.018 0.026 \n",
"[ 5/ 291] layers.0.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.026 0.017 0.028 0.043 0.063 0.087 0.111 0.250 0.112 0.087 0.063 0.043 0.028 0.017 0.026 \n",
"[ 6/ 291] layers.0.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.019 0.031 0.046 0.065 0.087 0.107 0.237 0.107 0.087 0.065 0.046 0.030 0.019 0.027 \n",
"[ 7/ 291] layers.0.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.026 0.017 0.027 0.042 0.062 0.087 0.113 0.253 0.113 0.087 0.062 0.042 0.027 0.017 0.026 \n",
"[ 8/ 291] layers.0.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 9/ 291] layers.0.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 10/ 291] layers.0.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 11/ 291] layers.0.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 12/ 291] layers.0.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 13/ 291] layers.1.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.107 0.228 0.107 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 14/ 291] layers.1.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.019 0.031 0.047 0.067 0.088 0.107 0.229 0.107 0.088 0.067 0.047 0.031 0.019 0.027 \n",
"[ 15/ 291] layers.1.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.019 0.030 0.046 0.066 0.088 0.108 0.235 0.108 0.088 0.065 0.046 0.030 0.019 0.027 \n",
"[ 16/ 291] layers.1.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.026 0.017 0.027 0.042 0.062 0.087 0.113 0.256 0.113 0.086 0.062 0.042 0.027 0.017 0.026 \n",
"[ 17/ 291] layers.1.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 18/ 291] layers.1.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 19/ 291] layers.1.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 20/ 291] layers.1.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 21/ 291] layers.1.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 22/ 291] layers.2.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 23/ 291] layers.2.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.019 0.031 0.047 0.066 0.088 0.107 0.231 0.107 0.088 0.066 0.047 0.031 0.019 0.027 \n",
"[ 24/ 291] layers.2.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.087 0.106 0.228 0.106 0.087 0.067 0.047 0.031 0.020 0.027 \n",
"[ 25/ 291] layers.2.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.107 0.228 0.107 0.088 0.067 0.047 0.031 0.019 0.027 \n",
"[ 26/ 291] layers.2.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 27/ 291] layers.2.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 28/ 291] layers.2.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 29/ 291] layers.2.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 30/ 291] layers.2.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 31/ 291] layers.3.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.228 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 32/ 291] layers.3.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.229 0.106 0.088 0.066 0.047 0.031 0.020 0.027 \n",
"[ 33/ 291] layers.3.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.228 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 34/ 291] layers.3.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 35/ 291] layers.3.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 36/ 291] layers.3.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 37/ 291] layers.3.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 38/ 291] layers.3.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 39/ 291] layers.3.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 40/ 291] layers.4.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 41/ 291] layers.4.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.228 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 42/ 291] layers.4.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.228 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 43/ 291] layers.4.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 44/ 291] layers.4.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 45/ 291] layers.4.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 46/ 291] layers.4.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 47/ 291] layers.4.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 48/ 291] layers.4.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 49/ 291] layers.5.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 50/ 291] layers.5.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 51/ 291] layers.5.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 52/ 291] layers.5.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 53/ 291] layers.5.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 54/ 291] layers.5.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 55/ 291] layers.5.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 56/ 291] layers.5.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 57/ 291] layers.5.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 58/ 291] layers.6.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 59/ 291] layers.6.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.031 0.020 0.027 \n",
"[ 60/ 291] layers.6.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 61/ 291] layers.6.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 62/ 291] layers.6.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 63/ 291] layers.6.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 64/ 291] layers.6.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 65/ 291] layers.6.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 66/ 291] layers.6.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 67/ 291] layers.7.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 68/ 291] layers.7.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 69/ 291] layers.7.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 70/ 291] layers.7.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 71/ 291] layers.7.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 72/ 291] layers.7.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 73/ 291] layers.7.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.031 0.020 0.027 \n",
"[ 74/ 291] layers.7.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 75/ 291] layers.7.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 76/ 291] layers.8.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 77/ 291] layers.8.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.031 0.020 0.027 \n",
"[ 78/ 291] layers.8.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 79/ 291] layers.8.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 80/ 291] layers.8.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 81/ 291] layers.8.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 82/ 291] layers.8.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 83/ 291] layers.8.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 84/ 291] layers.8.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 85/ 291] layers.9.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 86/ 291] layers.9.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 87/ 291] layers.9.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 88/ 291] layers.9.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 89/ 291] layers.9.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 90/ 291] layers.9.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 91/ 291] layers.9.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 92/ 291] layers.9.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 93/ 291] layers.9.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 94/ 291] layers.10.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 95/ 291] layers.10.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 96/ 291] layers.10.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.228 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 97/ 291] layers.10.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 98/ 291] layers.10.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 99/ 291] layers.10.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 100/ 291] layers.10.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 101/ 291] layers.10.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 102/ 291] layers.10.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 103/ 291] layers.11.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 104/ 291] layers.11.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 105/ 291] layers.11.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.228 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 106/ 291] layers.11.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 107/ 291] layers.11.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 108/ 291] layers.11.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 109/ 291] layers.11.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 110/ 291] layers.11.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 111/ 291] layers.11.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 112/ 291] layers.12.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 113/ 291] layers.12.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 114/ 291] layers.12.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 115/ 291] layers.12.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 116/ 291] layers.12.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 117/ 291] layers.12.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 118/ 291] layers.12.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 119/ 291] layers.12.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 120/ 291] layers.12.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 121/ 291] layers.13.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 122/ 291] layers.13.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 123/ 291] layers.13.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 124/ 291] layers.13.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.105 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 125/ 291] layers.13.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 126/ 291] layers.13.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 127/ 291] layers.13.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 128/ 291] layers.13.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 129/ 291] layers.13.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 130/ 291] layers.14.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 131/ 291] layers.14.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 132/ 291] layers.14.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 133/ 291] layers.14.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 134/ 291] layers.14.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 135/ 291] layers.14.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 136/ 291] layers.14.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 137/ 291] layers.14.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 138/ 291] layers.14.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 139/ 291] layers.15.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 140/ 291] layers.15.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 141/ 291] layers.15.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 142/ 291] layers.15.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.105 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 143/ 291] layers.15.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 144/ 291] layers.15.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 145/ 291] layers.15.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 146/ 291] layers.15.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 147/ 291] layers.15.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 148/ 291] layers.16.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 149/ 291] layers.16.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 150/ 291] layers.16.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 151/ 291] layers.16.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 152/ 291] layers.16.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 153/ 291] layers.16.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 154/ 291] layers.16.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 155/ 291] layers.16.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 156/ 291] layers.16.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 157/ 291] layers.17.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 158/ 291] layers.17.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 159/ 291] layers.17.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.048 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 160/ 291] layers.17.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 161/ 291] layers.17.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 162/ 291] layers.17.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 163/ 291] layers.17.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.031 0.020 0.027 \n",
"[ 164/ 291] layers.17.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 165/ 291] layers.17.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 166/ 291] layers.18.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 167/ 291] layers.18.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 168/ 291] layers.18.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 169/ 291] layers.18.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.105 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 170/ 291] layers.18.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 171/ 291] layers.18.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 172/ 291] layers.18.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 173/ 291] layers.18.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 174/ 291] layers.18.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 175/ 291] layers.19.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 176/ 291] layers.19.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 177/ 291] layers.19.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 178/ 291] layers.19.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 179/ 291] layers.19.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 180/ 291] layers.19.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 181/ 291] layers.19.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 182/ 291] layers.19.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 183/ 291] layers.19.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 184/ 291] layers.20.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 185/ 291] layers.20.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 186/ 291] layers.20.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 187/ 291] layers.20.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 188/ 291] layers.20.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 189/ 291] layers.20.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.028 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 190/ 291] layers.20.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 191/ 291] layers.20.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 192/ 291] layers.20.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 193/ 291] layers.21.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 194/ 291] layers.21.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 195/ 291] layers.21.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 196/ 291] layers.21.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 197/ 291] layers.21.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 198/ 291] layers.21.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.028 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 199/ 291] layers.21.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 200/ 291] layers.21.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 201/ 291] layers.21.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 202/ 291] layers.22.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 203/ 291] layers.22.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 204/ 291] layers.22.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 205/ 291] layers.22.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 206/ 291] layers.22.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 207/ 291] layers.22.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.028 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 208/ 291] layers.22.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 209/ 291] layers.22.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 210/ 291] layers.22.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 211/ 291] layers.23.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 212/ 291] layers.23.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 213/ 291] layers.23.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 214/ 291] layers.23.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 215/ 291] layers.23.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 216/ 291] layers.23.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.028 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 217/ 291] layers.23.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 218/ 291] layers.23.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 219/ 291] layers.23.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 220/ 291] layers.24.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 221/ 291] layers.24.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 222/ 291] layers.24.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 223/ 291] layers.24.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.105 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 224/ 291] layers.24.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 225/ 291] layers.24.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.028 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 226/ 291] layers.24.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 227/ 291] layers.24.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 228/ 291] layers.24.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 229/ 291] layers.25.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 230/ 291] layers.25.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 231/ 291] layers.25.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 232/ 291] layers.25.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 233/ 291] layers.25.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 234/ 291] layers.25.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.028 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 235/ 291] layers.25.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 236/ 291] layers.25.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 237/ 291] layers.25.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 238/ 291] layers.26.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 239/ 291] layers.26.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 240/ 291] layers.26.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 241/ 291] layers.26.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 242/ 291] layers.26.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 243/ 291] layers.26.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.068 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 244/ 291] layers.26.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 245/ 291] layers.26.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 246/ 291] layers.26.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 247/ 291] layers.27.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 248/ 291] layers.27.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 249/ 291] layers.27.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 250/ 291] layers.27.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 251/ 291] layers.27.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 252/ 291] layers.27.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.028 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 253/ 291] layers.27.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 254/ 291] layers.27.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 255/ 291] layers.27.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 256/ 291] layers.28.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 257/ 291] layers.28.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 258/ 291] layers.28.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 259/ 291] layers.28.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 260/ 291] layers.28.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 261/ 291] layers.28.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.105 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 262/ 291] layers.28.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 263/ 291] layers.28.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 264/ 291] layers.28.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 265/ 291] layers.29.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 266/ 291] layers.29.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 267/ 291] layers.29.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 268/ 291] layers.29.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 269/ 291] layers.29.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 270/ 291] layers.29.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.224 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 271/ 291] layers.29.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.227 0.107 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 272/ 291] layers.29.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 273/ 291] layers.29.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 274/ 291] layers.30.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 275/ 291] layers.30.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 276/ 291] layers.30.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.032 0.020 0.027 \n",
"[ 277/ 291] layers.30.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 278/ 291] layers.30.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 279/ 291] layers.30.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 280/ 291] layers.30.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.019 0.030 0.046 0.066 0.088 0.108 0.232 0.108 0.088 0.066 0.046 0.031 0.019 0.027 \n",
"[ 281/ 291] layers.30.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 282/ 291] layers.30.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 283/ 291] layers.31.attention.wq.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.228 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 284/ 291] layers.31.attention.wk.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 285/ 291] layers.31.attention.wv.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.031 0.047 0.067 0.088 0.106 0.228 0.106 0.088 0.067 0.047 0.031 0.020 0.027 \n",
"[ 286/ 291] layers.31.attention.wo.weight - 4096 x 4096, type = f16, quantizing .. size = 32.00 MB -> 18.00 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 287/ 291] layers.31.attention_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"[ 288/ 291] layers.31.feed_forward.w1.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.225 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 289/ 291] layers.31.feed_forward.w2.weight - 11008 x 4096, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.019 0.030 0.045 0.065 0.088 0.109 0.237 0.109 0.088 0.065 0.045 0.030 0.019 0.027 \n",
"[ 290/ 291] layers.31.feed_forward.w3.weight - 4096 x 11008, type = f16, quantizing .. size = 86.00 MB -> 48.38 MB | hist: 0.000 0.027 0.020 0.032 0.047 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"[ 291/ 291] layers.31.ffn_norm.weight - 4096, type = f32, size = 0.016 MB\n",
"llama_model_quantize_internal: model size = 13133.55 MB\n",
"llama_model_quantize_internal: quant size = 7388.06 MB\n",
"llama_model_quantize_internal: hist: 0.000 0.027 0.020 0.032 0.048 0.067 0.088 0.106 0.226 0.106 0.088 0.067 0.048 0.032 0.020 0.027 \n",
"\n",
"main: quantize time = 146381.23 ms\n",
"main: total time = 146381.23 ms\n"
]
}
],
"source": [
"!cd llama.cpp && ./quantize ./zh-models/7B/ggml-model-f16.bin ./zh-models/7B/ggml-model-q8_0.bin 7"
]
},
{
"cell_type": "code",
"source": [
"!sha256sum ./llama.cpp/zh-models/7B/ggml-model-q8_0.bin"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2PR5jo2P-hOw",
"outputId": "2d808543-557d-4d0a-becb-ab35c4ccb8ff"
},
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"0eec8927427f159397c79961a28d62d78849514a4a19033b247edd6ac3fc2cfd ./llama.cpp/zh-models/7B/ggml-model-q8_0.bin\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "DLkuRAo9Vkb1"
},
"source": [
"### (可选)测试量化模型解码\n",
"至此已完成了所有转换步骤。\n",
"我们运行一条命令测试一下是否能够正常加载并进行对话。\n",
"\n",
"FP16和Q8量化文件存放在./llama.cpp/zh-models/7B下,可按需下载使用。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "tW-ep1BsVQtG",
"outputId": "b3b28e5e-c731-4bb5-d3ae-c09d4c7bfb81"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"main: seed = 1682671021\n",
"llama.cpp: loading model from ./zh-models/7B/ggml-model-q8_0.bin\n",
"llama_model_load_internal: format = ggjt v1 (latest)\n",
"llama_model_load_internal: n_vocab = 49954\n",
"llama_model_load_internal: n_ctx = 512\n",
"llama_model_load_internal: n_embd = 4096\n",
"llama_model_load_internal: n_mult = 256\n",
"llama_model_load_internal: n_head = 32\n",
"llama_model_load_internal: n_layer = 32\n",
"llama_model_load_internal: n_rot = 128\n",
"llama_model_load_internal: ftype = 7 (mostly Q8_0)\n",
"llama_model_load_internal: n_ff = 11008\n",
"llama_model_load_internal: n_parts = 1\n",
"llama_model_load_internal: model size = 7B\n",
"llama_model_load_internal: ggml ctx size = 59.11 KB\n",
"llama_model_load_internal: mem required = 9180.12 MB (+ 1026.00 MB per state)\n",
"llama_init_from_file: kv self size = 256.00 MB\n",
"\n",
"system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | \n",
"sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000\n",
"generate: n_ctx = 512, n_batch = 512, n_predict = 512, n_keep = 0\n",
"\n",
"\n",
"\u001b[33m 详细介绍一下北京的名胜古迹:\u001b[0m长城、故宫等。同时介绍一些小众景点,比如颐和园中的石舫、圆明园中的琉璃花门等等。 [end of text]\n",
"\n",
"llama_print_timings: load time = 19881.66 ms\n",
"llama_print_timings: sample time = 48.31 ms / 32 runs ( 1.51 ms per run)\n",
"llama_print_timings: prompt eval time = 11365.17 ms / 11 tokens ( 1033.20 ms per token)\n",
"llama_print_timings: eval time = 33910.03 ms / 31 runs ( 1093.87 ms per run)\n",
"llama_print_timings: total time = 53841.09 ms\n"
]
}
],
"source": [
"!cd llama.cpp && ./main -m ./zh-models/7B/ggml-model-q8_0.bin --color -f ./prompts/alpaca.txt -p \"详细介绍一下北京的名胜古迹:\" -n 512"
]
}
],
"metadata": {
"accelerator": "TPU",
"colab": {
"machine_shape": "hm",
"provenance": []
},
"gpuClass": "premium",
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
================================================
FILE: notebooks/legacy/convert_and_quantize_chinese_llama.ipynb
================================================
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"provenance": [],
"machine_shape": "hm"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"gpuClass": "standard",
"accelerator": "TPU"
},
"cells": [
{
"cell_type": "markdown",
"source": [
"# 转换并量化中文LLaMA/Alpaca模型\n",
"\n",
"🎉🎉🎉 **新:现在免费用户也有机会能够转换7B和13B模型了!**\n",
"\n",
"💡 提示和小窍门:\n",
"- 免费用户默认的内存只有12G左右,**笔者用免费账号实测选择TPU的话有机会随机出35G内存**,建议多试几次。如果能随机出25G内存以上的机器就可以了转换7B模型了,35G内存以上机器就能转换13B模型了\n",
"- Pro(+)用户请选择 “代码执行程序” -> “更改运行时类型” -> “高RAM”\n",
"- 实测:转换7B级别模型,25G内存的机器就够了;转换13B级别模型需要30G以上的内存(程序莫名崩掉或断开连接就说明内存爆了)\n",
"- 如果选了“高RAM”之后内存还是不够大的话,选择以下操作,有的时候会分配出很高内存的机器,祝你好运😄!\n",
" - 可以把GPU或者TPU也选上(虽然不会用到)\n",
" - 选GPU时,Pro用户可选“高级”类型GPU\n",
"\n",
"以下信息配置信息供参考(Pro订阅下测试),运行时规格设置为“高RAM”时的设备配置如下(有随机性):\n",
"\n",
"| 硬件加速器 | RAM | 硬盘 |\n",
"| :-- | :--: | :--: |\n",
"| None | 25GB | 225GB |\n",
"| TPU | 35GB | 225GB |\n",
"| GPU(标准,T4)| 25GB | 166GB |\n",
"| GPU(高性能,V100)| 25GB | 166GB |\n",
"| GPU(高性能,A100)| **80GB** | 166GB |\n",
"\n",
"*温馨提示:用完之后注意断开运行时,选择满足要求的最低配置即可,避免不必要的计算单元消耗(Pro只给100个计算单元)。*"
],
"metadata": {
"id": "B1c96_k3MahN"
}
},
{
"cell_type": "markdown",
"source": [
"## 安装相关依赖"
],
"metadata": {
"id": "vScqHD_jMFOV"
}
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "E5WKFJXIL6ZU",
"outputId": "7ce317e5-c105-49a8-d1af-70c29e6246e1"
},
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting transformers\n",
" Downloading transformers-4.28.0-py3-none-any.whl (7.0 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.0/7.0 MB\u001b[0m \u001b[31m54.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from transformers) (2.27.1)\n",
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (1.24.2)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.9/dist-packages (from transformers) (6.0)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.9/dist-packages (from transformers) (4.65.0)\n",
"Collecting huggingface-hub<1.0,>=0.11.0\n",
" Downloading huggingface_hub-0.13.4-py3-none-any.whl (200 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m200.1/200.1 kB\u001b[0m \u001b[31m24.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/dist-packages (from transformers) (2022.10.31)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from transformers) (3.11.0)\n",
"Collecting tokenizers!=0.11.3,<0.14,>=0.11.1\n",
" Downloading tokenizers-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m97.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/dist-packages (from transformers) (23.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.9/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers) (4.5.0)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2.0.12)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (2022.12.7)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (1.26.15)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->transformers) (3.4)\n",
"Installing collected packages: tokenizers, huggingface-hub, transformers\n",
"Successfully installed huggingface-hub-0.13.4 tokenizers-0.13.3 transformers-4.28.0\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting peft\n",
" Downloading peft-0.2.0-py3-none-any.whl (40 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m40.3/40.3 kB\u001b[0m \u001b[31m3.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: psutil in /usr/local/lib/python3.9/dist-packages (from peft) (5.9.4)\n",
"Requirement already satisfied: transformers in /usr/local/lib/python3.9/dist-packages (from peft) (4.28.0)\n",
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.9/dist-packages (from peft) (6.0)\n",
"Requirement already satisfied: torch>=1.13.0 in /usr/local/lib/python3.9/dist-packages (from peft) (2.0.0+cu118)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/dist-packages (from peft) (23.0)\n",
"Collecting accelerate\n",
" Downloading accelerate-0.18.0-py3-none-any.whl (215 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m215.3/215.3 kB\u001b[0m \u001b[31m6.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.9/dist-packages (from peft) (1.24.2)\n",
"Requirement already satisfied: sympy in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (1.11.1)\n",
"Requirement already satisfied: jinja2 in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (3.1.2)\n",
"Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (2.0.0)\n",
"Requirement already satisfied: networkx in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (3.1)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (3.11.0)\n",
"Requirement already satisfied: typing-extensions in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (4.5.0)\n",
"Requirement already satisfied: lit in /usr/local/lib/python3.9/dist-packages (from triton==2.0.0->torch>=1.13.0->peft) (16.0.1)\n",
"Requirement already satisfied: cmake in /usr/local/lib/python3.9/dist-packages (from triton==2.0.0->torch>=1.13.0->peft) (3.25.2)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (0.13.4)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (2022.10.31)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (2.27.1)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (0.13.3)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (4.65.0)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.9/dist-packages (from jinja2->torch>=1.13.0->peft) (2.1.2)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (2.0.12)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (3.4)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (2022.12.7)\n",
"Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.9/dist-packages (from sympy->torch>=1.13.0->peft) (1.3.0)\n",
"Installing collected packages: accelerate, peft\n",
"Successfully installed accelerate-0.18.0 peft-0.2.0\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting sentencepiece\n",
" Downloading sentencepiece-0.1.98-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m18.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hInstalling collected packages: sentencepiece\n",
"Successfully installed sentencepiece-0.1.98\n"
]
}
],
"source": [
"!pip install transformers\n",
"!pip install peft\n",
"!pip install sentencepiece"
]
},
{
"cell_type": "markdown",
"source": [
"## 克隆目录和代码"
],
"metadata": {
"id": "ygb1xFIMNQKw"
}
},
{
"cell_type": "code",
"source": [
"!git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca\n",
"!git clone https://github.com/ggerganov/llama.cpp"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yCEJh7NJNXz9",
"outputId": "91a0e4ff-af63-4f8e-ab82-ee4ddf583033"
},
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Cloning into 'Chinese-LLaMA-Alpaca'...\n",
"remote: Enumerating objects: 559, done.\u001b[K\n",
"remote: Counting objects: 100% (129/129), done.\u001b[K\n",
"remote: Compressing objects: 100% (115/115), done.\u001b[K\n",
"remote: Total 559 (delta 30), reused 22 (delta 14), pack-reused 430\u001b[K\n",
"Receiving objects: 100% (559/559), 10.71 MiB | 25.49 MiB/s, done.\n",
"Resolving deltas: 100% (333/333), done.\n",
"Cloning into 'llama.cpp'...\n",
"remote: Enumerating objects: 1701, done.\u001b[K\n",
"remote: Counting objects: 100% (1701/1701), done.\u001b[K\n",
"remote: Compressing objects: 100% (620/620), done.\u001b[K\n",
"remote: Total 1701 (delta 1084), reused 1623 (delta 1047), pack-reused 0\u001b[K\n",
"Receiving objects: 100% (1701/1701), 1.86 MiB | 14.74 MiB/s, done.\n",
"Resolving deltas: 100% (1084/1084), done.\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## 合并模型(以Alpaca-7B为例)\n",
"\n",
"**⚠️ 再次提醒:7B模型需要25G内存,13B模型需要35G+内存。**\n",
"\n",
"此处使用的是🤗模型库中提供的基模型(已是HF格式),而不是Facebook官方的LLaMA模型,因此略去将原版LLaMA转换为HF格式的步骤。\n",
"\n",
"**这里直接运行第二步:合并LoRA权重**,生成全量模型权重。可以直接指定🤗模型库的地址,也可以是本地存放地址。\n",
"- 基模型:`decapoda-research/llama-7b-hf` *(use at your own risk)*\n",
"- LoRA模型:`ziqingyang/chinese-alpaca-lora-7b`\n",
"\n",
"💡 转换13B模型提示:\n",
"- 请将参数`--base_model`和`--lora_model`中的的`7b`改为`13b`即可\n",
"- **免费用户必须增加一个参数`--offload_dir`以缓解内存压力**,例如`--offload_dir ./offload_temp`\n",
"\n",
"该过程比较耗时(下载+转换),需要几分钟到十几分钟不等,请耐心等待。\n",
"转换好的模型存放在`alpaca-combined`目录。\n",
"如果你不需要量化模型,那么到这一步就结束了。"
],
"metadata": {
"id": "nIyxX0DSNsgQ"
}
},
{
"cell_type": "code",
"source": [
"!python ./Chinese-LLaMA-Alpaca/scripts/merge_llama_with_chinese_lora.py \\\n",
" --base_model 'decapoda-research/llama-7b-hf' \\\n",
" --lora_model 'ziqingyang/chinese-alpaca-lora-7b' \\\n",
" --output_dir alpaca-combined"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5AV4EW5hNhVV",
"outputId": "e34419d4-b7c9-4e22-af37-abf80d4163ba"
},
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"2023-04-14 10:13:45.382526: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"Downloading tokenizer.model: 100% 758k/758k [00:00<00:00, 12.7MB/s]\n",
"Downloading (…)cial_tokens_map.json: 100% 96.0/96.0 [00:00<00:00, 15.3kB/s]\n",
"Downloading (…)okenizer_config.json: 100% 166/166 [00:00<00:00, 63.2kB/s]\n",
"Downloading (…)lve/main/config.json: 100% 427/427 [00:00<00:00, 63.4kB/s]\n",
"Downloading (…)model.bin.index.json: 100% 25.5k/25.5k [00:00<00:00, 9.41MB/s]\n",
"Downloading shards: 0% 0/33 [00:00, ?it/s]\n",
"Downloading (…)l-00001-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 95.1MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 155MB/s] \u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 16% 62.9M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 197MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 31% 126M/405M [00:00<00:01, 205MB/s] \u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 39% 157M/405M [00:00<00:01, 208MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 47% 189M/405M [00:00<00:01, 210MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 54% 220M/405M [00:01<00:00, 213MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 62% 252M/405M [00:01<00:00, 214MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 70% 283M/405M [00:01<00:00, 215MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 78% 315M/405M [00:01<00:00, 216MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 85% 346M/405M [00:01<00:00, 216MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 93% 377M/405M [00:01<00:00, 214MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 100% 405M/405M [00:01<00:00, 205MB/s]\n",
"Downloading shards: 3% 1/33 [00:02<01:07, 2.11s/it]\n",
"Downloading (…)l-00002-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 91.8MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 150MB/s] \u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 173MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 182MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 186MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 28% 115M/405M [00:00<00:01, 192MB/s] \u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 34% 136M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 39% 157M/405M [00:00<00:01, 199MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 44% 178M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 49% 199M/405M [00:01<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 54% 220M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 60% 241M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 65% 262M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 75% 304M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 80% 325M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 85% 346M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 100% 405M/405M [00:02<00:00, 194MB/s]\n",
"Downloading shards: 6% 2/33 [00:04<01:07, 2.17s/it]\n",
"Downloading (…)l-00003-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 90.9MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 143MB/s] \u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 166MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 179MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 28% 115M/405M [00:00<00:01, 190MB/s] \u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 34% 136M/405M [00:00<00:01, 192MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 39% 157M/405M [00:00<00:01, 194MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 44% 178M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 49% 199M/405M [00:01<00:01, 197MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 54% 220M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 60% 241M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 65% 262M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 70% 283M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 75% 304M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 80% 325M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 85% 346M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 100% 405M/405M [00:02<00:00, 187MB/s]\n",
"Downloading shards: 9% 3/33 [00:06<01:06, 2.23s/it]\n",
"Downloading (…)l-00004-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 93.5MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 152MB/s] \u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 175MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 182MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 28% 115M/405M [00:00<00:01, 195MB/s] \u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 34% 136M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 39% 157M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 44% 178M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 49% 199M/405M [00:01<00:01, 203MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 54% 220M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 60% 241M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 65% 262M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 70% 283M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 75% 304M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 80% 325M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 85% 346M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 100% 405M/405M [00:02<00:00, 195MB/s]\n",
"Downloading shards: 12% 4/33 [00:08<01:04, 2.22s/it]\n",
"Downloading (…)l-00005-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 86.5MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 144MB/s] \u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 167MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 178MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 28% 115M/405M [00:00<00:01, 189MB/s] \u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 34% 136M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 39% 157M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 44% 178M/405M [00:00<00:01, 192MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 49% 199M/405M [00:01<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 54% 220M/405M [00:01<00:00, 192MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 60% 241M/405M [00:01<00:00, 192MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 65% 262M/405M [00:01<00:00, 193MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 70% 283M/405M [00:01<00:00, 195MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 75% 304M/405M [00:01<00:00, 195MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 80% 325M/405M [00:01<00:00, 196MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 85% 346M/405M [00:01<00:00, 196MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 91% 367M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 100% 405M/405M [00:02<00:00, 188MB/s]\n",
"Downloading shards: 15% 5/33 [00:11<01:03, 2.26s/it]\n",
"Downloading (…)l-00006-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 90.3MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 150MB/s] \u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 173MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 182MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 186MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 28% 115M/405M [00:00<00:01, 190MB/s] \u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 34% 136M/405M [00:00<00:01, 194MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 39% 157M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 44% 178M/405M [00:00<00:01, 194MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 49% 199M/405M [00:01<00:01, 196MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 54% 220M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 60% 241M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 65% 262M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 70% 283M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 75% 304M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 80% 325M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 85% 346M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 91% 367M/405M [00:01<00:00, 191MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 100% 405M/405M [00:02<00:00, 190MB/s]\n",
"Downloading shards: 18% 6/33 [00:13<01:01, 2.26s/it]\n",
"Downloading (…)l-00007-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 93.4MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 152MB/s] \u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 176MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 188MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 28% 115M/405M [00:00<00:01, 198MB/s] \u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 34% 136M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 39% 157M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 44% 178M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 49% 199M/405M [00:01<00:01, 205MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 54% 220M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 60% 241M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 65% 262M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 70% 283M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 75% 304M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 80% 325M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 85% 346M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 91% 367M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 100% 405M/405M [00:02<00:00, 197MB/s]\n",
"Downloading shards: 21% 7/33 [00:15<00:58, 2.24s/it]\n",
"Downloading (…)l-00008-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 93.4MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 153MB/s] \u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 13% 52.4M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 188MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 28% 115M/405M [00:00<00:01, 199MB/s] \u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 34% 136M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 39% 157M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 44% 178M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 49% 199M/405M [00:01<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 54% 220M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 60% 241M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 65% 262M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 75% 304M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 80% 325M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 85% 346M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 91% 367M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 100% 405M/405M [00:02<00:00, 197MB/s]\n",
"Downloading shards: 24% 8/33 [00:17<00:55, 2.22s/it]\n",
"Downloading (…)l-00009-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 89.5MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 149MB/s] \u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 173MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 182MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 28% 115M/405M [00:00<00:01, 195MB/s] \u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 34% 136M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 39% 157M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 44% 178M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 49% 199M/405M [00:01<00:01, 202MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 54% 220M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 60% 241M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 65% 262M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 70% 283M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 75% 304M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 80% 325M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 85% 346M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 100% 405M/405M [00:02<00:00, 194MB/s]\n",
"Downloading shards: 27% 9/33 [00:20<00:53, 2.22s/it]\n",
"Downloading (…)l-00010-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 93.7MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 152MB/s] \u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 175MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 186MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 28% 115M/405M [00:00<00:01, 196MB/s] \u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 34% 136M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 39% 157M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 44% 178M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 52% 210M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 57% 231M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 62% 252M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 67% 273M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 73% 294M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 78% 315M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 83% 336M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 88% 357M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 93% 377M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 100% 405M/405M [00:02<00:00, 196MB/s]\n",
"Downloading shards: 30% 10/33 [00:22<00:50, 2.22s/it]\n",
"Downloading (…)l-00011-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 92.9MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 149MB/s] \u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 173MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 183MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 28% 115M/405M [00:00<00:01, 195MB/s] \u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 34% 136M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 39% 157M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 44% 178M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 49% 199M/405M [00:01<00:01, 203MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 54% 220M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 60% 241M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 65% 262M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 75% 304M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 80% 325M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 85% 346M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 100% 405M/405M [00:02<00:00, 195MB/s]\n",
"Downloading shards: 33% 11/33 [00:24<00:48, 2.21s/it]\n",
"Downloading (…)l-00012-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 86.7MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 143MB/s] \u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 165MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 28% 115M/405M [00:00<00:01, 189MB/s] \u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 34% 136M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 39% 157M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 44% 178M/405M [00:00<00:01, 192MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 49% 199M/405M [00:01<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 54% 220M/405M [00:01<00:00, 192MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 60% 241M/405M [00:01<00:00, 191MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 65% 262M/405M [00:01<00:00, 191MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 70% 283M/405M [00:01<00:00, 192MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 75% 304M/405M [00:01<00:00, 193MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 80% 325M/405M [00:01<00:00, 193MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 85% 346M/405M [00:01<00:00, 194MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 91% 367M/405M [00:01<00:00, 194MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 100% 405M/405M [00:02<00:00, 186MB/s]\n",
"Downloading shards: 36% 12/33 [00:26<00:47, 2.25s/it]\n",
"Downloading (…)l-00013-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 92.5MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 151MB/s] \u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 175MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 28% 115M/405M [00:00<00:01, 197MB/s] \u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 34% 136M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 39% 157M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 44% 178M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 49% 199M/405M [00:01<00:01, 198MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 54% 220M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 60% 241M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 65% 262M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 75% 304M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 80% 325M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 85% 346M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 91% 367M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 100% 405M/405M [00:02<00:00, 195MB/s]\n",
"Downloading shards: 39% 13/33 [00:28<00:44, 2.23s/it]\n",
"Downloading (…)l-00014-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 3% 10.5M/405M [00:02<01:50, 3.56MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 5% 21.0M/405M [00:04<01:10, 5.46MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 8% 31.5M/405M [00:04<00:50, 7.45MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 10% 41.9M/405M [00:05<00:37, 9.67MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 13% 52.4M/405M [00:06<00:29, 12.1MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 16% 62.9M/405M [00:06<00:22, 14.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 18% 73.4M/405M [00:06<00:18, 17.6MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 21% 83.9M/405M [00:07<00:16, 20.0MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 23% 94.4M/405M [00:07<00:14, 21.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 26% 105M/405M [00:07<00:12, 23.5MB/s] \u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 28% 115M/405M [00:08<00:11, 24.7MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 31% 126M/405M [00:08<00:10, 25.6MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 34% 136M/405M [00:09<00:10, 26.3MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 36% 147M/405M [00:09<00:09, 26.8MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 39% 157M/405M [00:09<00:09, 27.1MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 41% 168M/405M [00:10<00:08, 27.4MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 44% 178M/405M [00:10<00:08, 27.6MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 47% 189M/405M [00:10<00:07, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 49% 199M/405M [00:11<00:07, 27.8MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 52% 210M/405M [00:11<00:07, 27.8MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 54% 220M/405M [00:12<00:06, 27.8MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 57% 231M/405M [00:12<00:06, 27.8MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 60% 241M/405M [00:12<00:05, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 62% 252M/405M [00:13<00:05, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 65% 262M/405M [00:13<00:05, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 67% 273M/405M [00:13<00:04, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 70% 283M/405M [00:14<00:04, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 73% 294M/405M [00:14<00:03, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 75% 304M/405M [00:15<00:03, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 78% 315M/405M [00:15<00:03, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 80% 325M/405M [00:15<00:02, 27.8MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 83% 336M/405M [00:16<00:02, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 85% 346M/405M [00:16<00:02, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 88% 357M/405M [00:16<00:01, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 91% 367M/405M [00:17<00:01, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 93% 377M/405M [00:17<00:00, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 96% 388M/405M [00:18<00:00, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 98% 398M/405M [00:18<00:00, 27.9MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 100% 405M/405M [00:18<00:00, 21.7MB/s]\n",
"Downloading shards: 42% 14/33 [00:48<02:19, 7.34s/it]\n",
"Downloading (…)l-00015-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 3% 10.5M/405M [00:02<01:20, 4.90MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 5% 21.0M/405M [00:03<00:54, 7.08MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 8% 31.5M/405M [00:03<00:40, 9.31MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 10% 41.9M/405M [00:04<00:30, 11.8MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 13% 52.4M/405M [00:04<00:24, 14.4MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 16% 62.9M/405M [00:05<00:19, 17.2MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 18% 73.4M/405M [00:05<00:16, 19.6MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 21% 83.9M/405M [00:05<00:14, 21.6MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 23% 94.4M/405M [00:06<00:13, 23.2MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 26% 105M/405M [00:06<00:12, 24.4MB/s] \u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 28% 115M/405M [00:07<00:11, 25.3MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 31% 126M/405M [00:07<00:10, 26.0MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 34% 136M/405M [00:07<00:10, 26.5MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 36% 147M/405M [00:08<00:09, 26.8MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 39% 157M/405M [00:08<00:09, 27.1MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 41% 168M/405M [00:09<00:08, 27.3MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 44% 178M/405M [00:09<00:08, 27.4MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 47% 189M/405M [00:09<00:07, 27.5MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 49% 199M/405M [00:10<00:07, 27.5MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 52% 210M/405M [00:10<00:07, 27.6MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 54% 220M/405M [00:10<00:06, 27.6MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 57% 231M/405M [00:11<00:06, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 60% 241M/405M [00:11<00:05, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 62% 252M/405M [00:12<00:05, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 65% 262M/405M [00:12<00:05, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 67% 273M/405M [00:12<00:04, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 70% 283M/405M [00:13<00:04, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 73% 294M/405M [00:13<00:04, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 75% 304M/405M [00:13<00:03, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 78% 315M/405M [00:14<00:03, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 80% 325M/405M [00:14<00:02, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 83% 336M/405M [00:15<00:02, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 85% 346M/405M [00:15<00:02, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 88% 357M/405M [00:15<00:01, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 91% 367M/405M [00:16<00:01, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 93% 377M/405M [00:16<00:00, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 96% 388M/405M [00:16<00:00, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 98% 398M/405M [00:17<00:00, 27.7MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 100% 405M/405M [00:17<00:00, 23.0MB/s]\n",
"Downloading shards: 45% 15/33 [01:06<03:10, 10.56s/it]\n",
"Downloading (…)l-00016-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 91.8MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 144MB/s] \u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 171MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 28% 115M/405M [00:00<00:01, 191MB/s] \u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 36% 147M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 41% 168M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 47% 189M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 52% 210M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 57% 231M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 62% 252M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 67% 273M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 73% 294M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 78% 315M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 83% 336M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 88% 357M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 93% 377M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 100% 405M/405M [00:02<00:00, 196MB/s]\n",
"Downloading shards: 48% 16/33 [01:08<02:16, 8.06s/it]\n",
"Downloading (…)l-00017-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 90.4MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 143MB/s] \u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 169MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 183MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 182MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 28% 115M/405M [00:00<00:01, 189MB/s] \u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 34% 136M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 39% 157M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 44% 178M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 49% 199M/405M [00:01<00:01, 202MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 54% 220M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 60% 241M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 65% 262M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 70% 283M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 75% 304M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 80% 325M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 85% 346M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 91% 367M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 100% 405M/405M [00:02<00:00, 194MB/s]\n",
"Downloading shards: 52% 17/33 [01:10<01:40, 6.30s/it]\n",
"Downloading (…)l-00018-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 89.0MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 144MB/s] \u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 170MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 183MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 28% 115M/405M [00:00<00:01, 194MB/s] \u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 34% 136M/405M [00:00<00:01, 197MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 39% 157M/405M [00:00<00:01, 199MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 44% 178M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 49% 199M/405M [00:01<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 54% 220M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 60% 241M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 65% 262M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 70% 283M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 75% 304M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 80% 325M/405M [00:01<00:00, 194MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 85% 346M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 91% 367M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 100% 405M/405M [00:02<00:00, 192MB/s]\n",
"Downloading shards: 55% 18/33 [01:12<01:16, 5.09s/it]\n",
"Downloading (…)l-00019-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 85.9MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 142MB/s] \u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 167MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 28% 115M/405M [00:00<00:01, 185MB/s] \u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 34% 136M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 39% 157M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 44% 178M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 49% 199M/405M [00:01<00:01, 196MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 54% 220M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 60% 241M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 65% 262M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 70% 283M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 75% 304M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 80% 325M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 85% 346M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 91% 367M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 100% 405M/405M [00:02<00:00, 189MB/s]\n",
"Downloading shards: 58% 19/33 [01:15<00:59, 4.24s/it]\n",
"Downloading (…)l-00020-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 85.8MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 144MB/s] \u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 169MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 181MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 28% 115M/405M [00:00<00:01, 191MB/s] \u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 34% 136M/405M [00:00<00:01, 192MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 39% 157M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 44% 178M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 49% 199M/405M [00:01<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 54% 220M/405M [00:01<00:00, 196MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 60% 241M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 65% 262M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 70% 283M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 75% 304M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 80% 325M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 88% 357M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 93% 377M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 100% 405M/405M [00:02<00:00, 192MB/s]\n",
"Downloading shards: 61% 20/33 [01:17<00:47, 3.64s/it]\n",
"Downloading (…)l-00021-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 84.8MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 141MB/s] \u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 168MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 181MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 28% 115M/405M [00:00<00:01, 192MB/s] \u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 34% 136M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 39% 157M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 44% 178M/405M [00:00<00:01, 199MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 49% 199M/405M [00:01<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 54% 220M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 60% 241M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 65% 262M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 70% 283M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 75% 304M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 80% 325M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 85% 346M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 91% 367M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 100% 405M/405M [00:02<00:00, 196MB/s]\n",
"Downloading shards: 64% 21/33 [01:19<00:38, 3.21s/it]\n",
"Downloading (…)l-00022-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 89.8MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 147MB/s] \u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 169MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 179MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 28% 115M/405M [00:00<00:01, 194MB/s] \u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 34% 136M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 39% 157M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 44% 178M/405M [00:00<00:01, 197MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 49% 199M/405M [00:01<00:01, 199MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 54% 220M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 60% 241M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 65% 262M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 75% 304M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 80% 325M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 85% 346M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 100% 405M/405M [00:02<00:00, 193MB/s]\n",
"Downloading shards: 67% 22/33 [01:21<00:32, 2.92s/it]\n",
"Downloading (…)l-00023-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 92.9MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 153MB/s] \u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 13% 52.4M/405M [00:00<00:01, 176MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 188MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 28% 115M/405M [00:00<00:01, 199MB/s] \u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 34% 136M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 39% 157M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 44% 178M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 49% 199M/405M [00:01<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 54% 220M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 60% 241M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 65% 262M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 70% 283M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 75% 304M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 80% 325M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 85% 346M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 91% 367M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 100% 405M/405M [00:02<00:00, 197MB/s]\n",
"Downloading shards: 70% 23/33 [01:23<00:27, 2.70s/it]\n",
"Downloading (…)l-00024-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 93.8MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 152MB/s] \u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 13% 52.4M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 21% 83.9M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 28% 115M/405M [00:00<00:01, 200MB/s] \u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 34% 136M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 41% 168M/405M [00:00<00:01, 206MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 49% 199M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 57% 231M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 62% 252M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 67% 273M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 73% 294M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 80% 325M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 85% 346M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 93% 377M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 100% 405M/405M [00:02<00:00, 199MB/s]\n",
"Downloading shards: 73% 24/33 [01:26<00:22, 2.54s/it]\n",
"Downloading (…)l-00025-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 92.7MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 152MB/s] \u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 175MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 188MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 28% 115M/405M [00:00<00:01, 195MB/s] \u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 34% 136M/405M [00:00<00:01, 197MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 39% 157M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 44% 178M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 49% 199M/405M [00:01<00:01, 205MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 54% 220M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 60% 241M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 65% 262M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 70% 283M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 75% 304M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 80% 325M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 85% 346M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 91% 367M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 100% 405M/405M [00:02<00:00, 198MB/s]\n",
"Downloading shards: 76% 25/33 [01:28<00:19, 2.43s/it]\n",
"Downloading (…)l-00026-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 88.3MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 148MB/s] \u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 174MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 194MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 28% 115M/405M [00:00<00:01, 199MB/s] \u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 34% 136M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 39% 157M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 44% 178M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 49% 199M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 54% 220M/405M [00:01<00:00, 195MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 60% 241M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 65% 262M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 70% 283M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 78% 315M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 83% 336M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 88% 357M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 93% 377M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 100% 405M/405M [00:02<00:00, 196MB/s]\n",
"Downloading shards: 79% 26/33 [01:30<00:16, 2.37s/it]\n",
"Downloading (…)l-00027-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 92.8MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 153MB/s] \u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 13% 52.4M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 188MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 31% 126M/405M [00:00<00:01, 203MB/s] \u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 36% 147M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 41% 168M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 47% 189M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 52% 210M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 57% 231M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 62% 252M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 67% 273M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 73% 294M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 78% 315M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 83% 336M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 88% 357M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 93% 377M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 100% 405M/405M [00:02<00:00, 198MB/s]\n",
"Downloading shards: 82% 27/33 [01:32<00:13, 2.31s/it]\n",
"Downloading (…)l-00028-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 92.9MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 154MB/s] \u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 13% 52.4M/405M [00:00<00:01, 176MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 188MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 194MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 28% 115M/405M [00:00<00:01, 199MB/s] \u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 34% 136M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 39% 157M/405M [00:00<00:01, 199MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 44% 178M/405M [00:00<00:01, 199MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 49% 199M/405M [00:01<00:01, 200MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 54% 220M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 60% 241M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 65% 262M/405M [00:01<00:00, 190MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 70% 283M/405M [00:01<00:00, 193MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 75% 304M/405M [00:01<00:00, 195MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 80% 325M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 85% 346M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 91% 367M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 100% 405M/405M [00:02<00:00, 193MB/s]\n",
"Downloading shards: 85% 28/33 [01:34<00:11, 2.29s/it]\n",
"Downloading (…)l-00029-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 91.7MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 149MB/s] \u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 173MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 175MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 28% 115M/405M [00:00<00:01, 191MB/s] \u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 34% 136M/405M [00:00<00:01, 194MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 39% 157M/405M [00:00<00:01, 194MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 44% 178M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 49% 199M/405M [00:01<00:01, 196MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 54% 220M/405M [00:01<00:00, 198MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 60% 241M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 65% 262M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 70% 283M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 75% 304M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 80% 325M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 85% 346M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 91% 367M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 100% 405M/405M [00:02<00:00, 192MB/s]\n",
"Downloading shards: 88% 29/33 [01:37<00:09, 2.28s/it]\n",
"Downloading (…)l-00030-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 3% 10.5M/405M [00:00<00:04, 89.7MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 8% 31.5M/405M [00:00<00:02, 149MB/s] \u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 13% 52.4M/405M [00:00<00:02, 173MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 18% 73.4M/405M [00:00<00:01, 186MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 23% 94.4M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 28% 115M/405M [00:00<00:01, 197MB/s] \u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 34% 136M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 39% 157M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 44% 178M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 49% 199M/405M [00:01<00:01, 205MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 54% 220M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 60% 241M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 65% 262M/405M [00:01<00:00, 187MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 70% 283M/405M [00:01<00:00, 190MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 75% 304M/405M [00:01<00:00, 193MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 80% 325M/405M [00:01<00:00, 196MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 85% 346M/405M [00:01<00:00, 197MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 91% 367M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 100% 405M/405M [00:02<00:00, 192MB/s]\n",
"Downloading shards: 91% 30/33 [01:39<00:06, 2.27s/it]\n",
"Downloading (…)l-00031-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 3% 10.5M/405M [00:00<00:05, 70.4MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 5% 21.0M/405M [00:00<00:04, 82.7MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 8% 31.5M/405M [00:00<00:04, 84.8MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 10% 41.9M/405M [00:00<00:04, 87.9MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 13% 52.4M/405M [00:00<00:03, 89.8MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 16% 62.9M/405M [00:00<00:03, 90.0MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 18% 73.4M/405M [00:00<00:03, 86.3MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 21% 83.9M/405M [00:00<00:03, 87.6MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 23% 94.4M/405M [00:01<00:03, 78.5MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 26% 105M/405M [00:01<00:03, 80.1MB/s] \u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 28% 115M/405M [00:01<00:03, 84.6MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 31% 126M/405M [00:01<00:03, 82.3MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 34% 136M/405M [00:01<00:03, 85.8MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 36% 147M/405M [00:01<00:02, 88.7MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 39% 157M/405M [00:01<00:02, 87.0MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 41% 168M/405M [00:01<00:02, 87.9MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 44% 178M/405M [00:02<00:02, 90.2MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 47% 189M/405M [00:02<00:02, 90.6MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 49% 199M/405M [00:02<00:02, 89.6MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 52% 210M/405M [00:02<00:02, 92.6MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 54% 220M/405M [00:02<00:02, 88.2MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 57% 231M/405M [00:02<00:01, 90.8MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 60% 241M/405M [00:02<00:01, 91.7MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 62% 252M/405M [00:02<00:01, 91.5MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 65% 262M/405M [00:02<00:01, 91.7MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 67% 273M/405M [00:03<00:01, 91.4MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 70% 283M/405M [00:03<00:01, 93.7MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 73% 294M/405M [00:03<00:01, 93.9MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 75% 304M/405M [00:03<00:01, 94.9MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 78% 315M/405M [00:03<00:00, 92.4MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 80% 325M/405M [00:03<00:00, 91.3MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 83% 336M/405M [00:03<00:00, 91.2MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 85% 346M/405M [00:03<00:00, 89.0MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 88% 357M/405M [00:04<00:00, 91.1MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 91% 367M/405M [00:04<00:00, 92.1MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 93% 377M/405M [00:04<00:00, 93.2MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 96% 388M/405M [00:04<00:00, 93.6MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 100% 405M/405M [00:04<00:00, 89.2MB/s]\n",
"Downloading shards: 94% 31/33 [01:44<00:06, 3.01s/it]\n",
"Downloading (…)l-00032-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 3% 10.5M/405M [00:00<00:05, 72.1MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 5% 21.0M/405M [00:00<00:04, 84.6MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 8% 31.5M/405M [00:00<00:04, 90.3MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 10% 41.9M/405M [00:00<00:03, 92.3MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 13% 52.4M/405M [00:00<00:03, 92.0MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 16% 62.9M/405M [00:00<00:03, 93.0MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 18% 73.4M/405M [00:00<00:03, 93.6MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 21% 83.9M/405M [00:00<00:03, 94.5MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 23% 94.4M/405M [00:01<00:03, 94.9MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 26% 105M/405M [00:01<00:03, 96.0MB/s] \u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 28% 115M/405M [00:01<00:03, 96.2MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 31% 126M/405M [00:01<00:02, 94.7MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 34% 136M/405M [00:01<00:03, 88.4MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 36% 147M/405M [00:01<00:02, 88.4MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 39% 157M/405M [00:01<00:02, 90.1MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 41% 168M/405M [00:01<00:02, 91.8MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 44% 178M/405M [00:01<00:02, 92.5MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 47% 189M/405M [00:02<00:02, 91.0MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 49% 199M/405M [00:02<00:02, 91.9MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 52% 210M/405M [00:02<00:02, 92.8MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 54% 220M/405M [00:02<00:02, 89.9MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 57% 231M/405M [00:02<00:01, 90.4MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 60% 241M/405M [00:02<00:01, 90.8MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 62% 252M/405M [00:02<00:01, 90.1MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 65% 262M/405M [00:02<00:01, 91.8MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 67% 273M/405M [00:02<00:01, 93.0MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 70% 283M/405M [00:03<00:01, 93.2MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 73% 294M/405M [00:03<00:01, 94.4MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 75% 304M/405M [00:03<00:01, 93.9MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 78% 315M/405M [00:03<00:00, 92.3MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 80% 325M/405M [00:03<00:00, 93.9MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 83% 336M/405M [00:03<00:00, 91.7MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 85% 346M/405M [00:03<00:00, 89.6MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 88% 357M/405M [00:03<00:00, 92.0MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 91% 367M/405M [00:03<00:00, 92.7MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 93% 377M/405M [00:04<00:00, 93.7MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 96% 388M/405M [00:04<00:00, 94.6MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 100% 405M/405M [00:04<00:00, 92.0MB/s]\n",
"Downloading shards: 97% 32/33 [01:48<00:03, 3.47s/it]\n",
"Downloading (…)l-00033-of-00033.bin: 0% 0.00/524M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 2% 10.5M/524M [00:01<00:50, 10.1MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 4% 21.0M/524M [00:01<00:23, 21.2MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 8% 41.9M/524M [00:01<00:10, 44.8MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 10% 52.4M/524M [00:01<00:08, 53.9MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 12% 62.9M/524M [00:01<00:07, 62.4MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 14% 73.4M/524M [00:01<00:06, 69.9MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 16% 83.9M/524M [00:01<00:05, 76.0MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 18% 94.4M/524M [00:01<00:05, 78.2MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 20% 105M/524M [00:01<00:05, 82.6MB/s] \u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 22% 115M/524M [00:02<00:04, 85.7MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 24% 126M/524M [00:02<00:04, 88.5MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 26% 136M/524M [00:02<00:04, 90.9MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 28% 147M/524M [00:02<00:04, 93.0MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 30% 157M/524M [00:02<00:03, 94.7MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 32% 168M/524M [00:02<00:03, 95.4MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 34% 178M/524M [00:02<00:03, 95.0MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 36% 189M/524M [00:02<00:03, 95.5MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 38% 199M/524M [00:02<00:03, 96.6MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 40% 210M/524M [00:03<00:03, 95.4MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 42% 220M/524M [00:03<00:03, 96.1MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 44% 231M/524M [00:03<00:03, 96.1MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 46% 241M/524M [00:03<00:02, 96.8MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 48% 252M/524M [00:03<00:02, 96.7MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 50% 262M/524M [00:03<00:02, 92.2MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 52% 273M/524M [00:03<00:02, 92.0MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 54% 283M/524M [00:03<00:02, 93.3MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 56% 294M/524M [00:03<00:02, 94.2MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 58% 304M/524M [00:04<00:02, 93.4MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 60% 315M/524M [00:04<00:02, 92.6MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 62% 325M/524M [00:04<00:02, 92.4MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 64% 336M/524M [00:04<00:02, 93.5MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 66% 346M/524M [00:04<00:01, 94.2MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 68% 357M/524M [00:04<00:01, 94.3MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 70% 367M/524M [00:04<00:01, 93.8MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 72% 377M/524M [00:04<00:01, 94.6MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 74% 388M/524M [00:04<00:01, 92.8MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 76% 398M/524M [00:05<00:01, 93.5MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 78% 409M/524M [00:05<00:01, 92.6MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 80% 419M/524M [00:05<00:01, 93.5MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 82% 430M/524M [00:05<00:00, 94.4MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 84% 440M/524M [00:05<00:00, 94.7MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 86% 451M/524M [00:05<00:00, 94.4MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 88% 461M/524M [00:05<00:00, 95.3MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 90% 472M/524M [00:05<00:00, 94.6MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 92% 482M/524M [00:05<00:00, 94.3MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 94% 493M/524M [00:06<00:00, 87.8MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 96% 503M/524M [00:06<00:00, 89.0MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 98% 514M/524M [00:06<00:00, 91.8MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 100% 524M/524M [00:06<00:00, 81.3MB/s]\n",
"Downloading shards: 100% 33/33 [01:55<00:00, 3.49s/it]\n",
"Loading checkpoint shards: 100% 33/33 [00:14<00:00, 2.30it/s]\n",
"Downloading (…)neration_config.json: 100% 124/124 [00:00<00:00, 18.4kB/s]\n",
"Extended vocabulary size: 49954\n",
"Loading LoRA for 7B model\n",
"Downloading (…)/adapter_config.json: 100% 472/472 [00:00<00:00, 166kB/s]\n",
"Downloading adapter_model.bin: 100% 858M/858M [00:08<00:00, 103MB/s]\n",
"Peft version: 0.2.0\n",
"Merging model\n",
"Saving shard 1 of 1 into alpaca-combined/consolidated.00.pth\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"## 量化模型\n",
"接下来我们使用[llama.cpp](https://github.com/ggerganov/llama.cpp)工具对上一步生成的全量版本权重进行转换,生成4-bit量化模型。\n",
"\n",
"### 编译工具\n",
"\n",
"首先对llama.cpp工具进行编译。"
],
"metadata": {
"id": "ueexcKo-Q_EW"
}
},
{
"cell_type": "code",
"source": [
"!cd llama.cpp && make"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_GbjsT2wRRCR",
"outputId": "8da3382c-6bff-4030-905b-bb4f622766d7"
},
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"I llama.cpp build info: \n",
"I UNAME_S: Linux\n",
"I UNAME_P: x86_64\n",
"I UNAME_M: x86_64\n",
"I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -march=native -mtune=native\n",
"I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native\n",
"I LDFLAGS: \n",
"I CC: cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
"I CXX: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
"\n",
"cc -I. -O3 -DNDEBUG -std=c11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -march=native -mtune=native -c ggml.c -o ggml.o\n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o\n",
"In file included from \u001b[01m\u001b[Kllama.cpp:6\u001b[m\u001b[K:\n",
"\u001b[01m\u001b[Kllama_util.h:60:2:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kextra ‘\u001b[01m\u001b[K;\u001b[m\u001b[K’ [\u001b[01;35m\u001b[K-Wpedantic\u001b[m\u001b[K]\n",
" 60 | }\u001b[01;35m\u001b[K;\u001b[m\u001b[K\n",
" | \u001b[01;35m\u001b[K^\u001b[m\u001b[K\n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/common.cpp -o common.o\n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/main/main.cpp ggml.o llama.o common.o -o main \n",
"\n",
"==== Run ./main -h for help. ====\n",
"\n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/quantize/quantize.cpp ggml.o llama.o -o quantize \n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity \n",
"g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding \n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### 模型转换为ggml格式(FP16)\n",
"\n",
"这一步,我们将模型转换为ggml格式(FP16)。\n",
"- 在这之前需要把`alpaca-combined`目录挪个位置,把模型文件放到`llama.cpp/zh-models/7B`下,把`tokenizer.model`放到`llama.cpp/zh-models`\n",
"- tokenizer在哪里?\n",
" - `alpaca-combined`目录下有\n",
" - 或者从以下网址下载:https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b/resolve/main/tokenizer.model (注意,Alpaca和LLaMA的`tokenizer.model`不能混用!)\n",
"\n",
"💡 转换13B模型提示:\n",
"- tokenizer可以直接用7B的,13B和7B的相同\n",
"- Alpaca和LLaMA的`tokenizer.model`不能混用!\n",
"- 以下看到7B字样的都是文件夹名,与转换过程没有关系了,改不改都行"
],
"metadata": {
"id": "gw2xpYC0RcQC"
}
},
{
"cell_type": "code",
"source": [
"!cd llama.cpp && mkdir zh-models && mv ../alpaca-combined zh-models/7B\n",
"!mv llama.cpp/zh-models/7B/tokenizer.model llama.cpp/zh-models/\n",
"!ls llama.cpp/zh-models/"
],
"metadata": {
"id": "5KgnFVStRjio",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "09ba7058-e2fb-4ae1-8539-62228df6ea09"
},
"execution_count": 6,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"7B tokenizer.model\n"
]
}
]
},
{
"cell_type": "code",
"source": [
"!cd llama.cpp && python convert.py zh-models/7B/"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NUHeoTMQS1AQ",
"outputId": "356f9e70-d05d-42d3-ed8c-fc052e11a855"
},
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Loading model file zh-models/7B/consolidated.00.pth\n",
"Loading vocab file zh-models/tokenizer.model\n",
"Writing vocab...\n",
"[1/291] Writing tensor tok_embeddings.weight, size 49954 x 4096...\n",
"[2/291] Writing tensor norm.weight, size 4096...\n",
"[3/291] Writing tensor output.weight, size 49954 x 4096...\n",
"[4/291] Writing tensor layers.0.attention.wq.weight, size 4096 x 4096...\n",
"[5/291] Writing tensor layers.0.attention.wk.weight, size 4096 x 4096...\n",
"[6/291] Writing tensor layers.0.attention.wv.weight, size 4096 x 4096...\n",
"[7/291] Writing tensor layers.0.attention.wo.weight, size 4096 x 4096...\n",
"[8/291] Writing tensor layers.0.attention_norm.weight, size 4096...\n",
"[9/291] Writing tensor layers.0.feed_forward.w1.weight, size 11008 x 4096...\n",
"[10/291] Writing tensor layers.0.feed_forward.w2.weight, size 4096 x 11008...\n",
"[11/291] Writing tensor layers.0.feed_forward.w3.weight, size 11008 x 4096...\n",
"[12/291] Writing tensor layers.0.ffn_norm.weight, size 4096...\n",
"[13/291] Writing tensor layers.1.attention.wq.weight, size 4096 x 4096...\n",
"[14/291] Writing tensor layers.1.attention.wk.weight, size 4096 x 4096...\n",
"[15/291] Writing tensor layers.1.attention.wv.weight, size 4096 x 4096...\n",
"[16/291] Writing tensor layers.1.attention.wo.weight, size 4096 x 4096...\n",
"[17/291] Writing tensor layers.1.attention_norm.weight, size 4096...\n",
"[18/291] Writing tensor layers.1.feed_forward.w1.weight, size 11008 x 4096...\n",
"[19/291] Writing tensor layers.1.feed_forward.w2.weight, size 4096 x 11008...\n",
"[20/291] Writing tensor layers.1.feed_forward.w3.weight, size 11008 x 4096...\n",
"[21/291] Writing tensor layers.1.ffn_norm.weight, size 4096...\n",
"[22/291] Writing tensor layers.2.attention.wq.weight, size 4096 x 4096...\n",
"[23/291] Writing tensor layers.2.attention.wk.weight, size 4096 x 4096...\n",
"[24/291] Writing tensor layers.2.attention.wv.weight, size 4096 x 4096...\n",
"[25/291] Writing tensor layers.2.attention.wo.weight, size 4096 x 4096...\n",
"[26/291] Writing tensor layers.2.attention_norm.weight, size 4096...\n",
"[27/291] Writing tensor layers.2.feed_forward.w1.weight, size 11008 x 4096...\n",
"[28/291] Writing tensor layers.2.feed_forward.w2.weight, size 4096 x 11008...\n",
"[29/291] Writing tensor layers.2.feed_forward.w3.weight, size 11008 x 4096...\n",
"[30/291] Writing tensor layers.2.ffn_norm.weight, size 4096...\n",
"[31/291] Writing tensor layers.3.attention.wq.weight, size 4096 x 4096...\n",
"[32/291] Writing tensor layers.3.attention.wk.weight, size 4096 x 4096...\n",
"[33/291] Writing tensor layers.3.attention.wv.weight, size 4096 x 4096...\n",
"[34/291] Writing tensor layers.3.attention.wo.weight, size 4096 x 4096...\n",
"[35/291] Writing tensor layers.3.attention_norm.weight, size 4096...\n",
"[36/291] Writing tensor layers.3.feed_forward.w1.weight, size 11008 x 4096...\n",
"[37/291] Writing tensor layers.3.feed_forward.w2.weight, size 4096 x 11008...\n",
"[38/291] Writing tensor layers.3.feed_forward.w3.weight, size 11008 x 4096...\n",
"[39/291] Writing tensor layers.3.ffn_norm.weight, size 4096...\n",
"[40/291] Writing tensor layers.4.attention.wq.weight, size 4096 x 4096...\n",
"[41/291] Writing tensor layers.4.attention.wk.weight, size 4096 x 4096...\n",
"[42/291] Writing tensor layers.4.attention.wv.weight, size 4096 x 4096...\n",
"[43/291] Writing tensor layers.4.attention.wo.weight, size 4096 x 4096...\n",
"[44/291] Writing tensor layers.4.attention_norm.weight, size 4096...\n",
"[45/291] Writing tensor layers.4.feed_forward.w1.weight, size 11008 x 4096...\n",
"[46/291] Writing tensor layers.4.feed_forward.w2.weight, size 4096 x 11008...\n",
"[47/291] Writing tensor layers.4.feed_forward.w3.weight, size 11008 x 4096...\n",
"[48/291] Writing tensor layers.4.ffn_norm.weight, size 4096...\n",
"[49/291] Writing tensor layers.5.attention.wq.weight, size 4096 x 4096...\n",
"[50/291] Writing tensor layers.5.attention.wk.weight, size 4096 x 4096...\n",
"[51/291] Writing tensor layers.5.attention.wv.weight, size 4096 x 4096...\n",
"[52/291] Writing tensor layers.5.attention.wo.weight, size 4096 x 4096...\n",
"[53/291] Writing tensor layers.5.attention_norm.weight, size 4096...\n",
"[54/291] Writing tensor layers.5.feed_forward.w1.weight, size 11008 x 4096...\n",
"[55/291] Writing tensor layers.5.feed_forward.w2.weight, size 4096 x 11008...\n",
"[56/291] Writing tensor layers.5.feed_forward.w3.weight, size 11008 x 4096...\n",
"[57/291] Writing tensor layers.5.ffn_norm.weight, size 4096...\n",
"[58/291] Writing tensor layers.6.attention.wq.weight, size 4096 x 4096...\n",
"[59/291] Writing tensor layers.6.attention.wk.weight, size 4096 x 4096...\n",
"[60/291] Writing tensor layers.6.attention.wv.weight, size 4096 x 4096...\n",
"[61/291] Writing tensor layers.6.attention.wo.weight, size 4096 x 4096...\n",
"[62/291] Writing tensor layers.6.attention_norm.weight, size 4096...\n",
"[63/291] Writing tensor layers.6.feed_forward.w1.weight, size 11008 x 4096...\n",
"[64/291] Writing tensor layers.6.feed_forward.w2.weight, size 4096 x 11008...\n",
"[65/291] Writing tensor layers.6.feed_forward.w3.weight, size 11008 x 4096...\n",
"[66/291] Writing tensor layers.6.ffn_norm.weight, size 4096...\n",
"[67/291] Writing tensor layers.7.attention.wq.weight, size 4096 x 4096...\n",
"[68/291] Writing tensor layers.7.attention.wk.weight, size 4096 x 4096...\n",
"[69/291] Writing tensor layers.7.attention.wv.weight, size 4096 x 4096...\n",
"[70/291] Writing tensor layers.7.attention.wo.weight, size 4096 x 4096...\n",
"[71/291] Writing tensor layers.7.attention_norm.weight, size 4096...\n",
"[72/291] Writing tensor layers.7.feed_forward.w1.weight, size 11008 x 4096...\n",
"[73/291] Writing tensor layers.7.feed_forward.w2.weight, size 4096 x 11008...\n",
"[74/291] Writing tensor layers.7.feed_forward.w3.weight, size 11008 x 4096...\n",
"[75/291] Writing tensor layers.7.ffn_norm.weight, size 4096...\n",
"[76/291] Writing tensor layers.8.attention.wq.weight, size 4096 x 4096...\n",
"[77/291] Writing tensor layers.8.attention.wk.weight, size 4096 x 4096...\n",
"[78/291] Writing tensor layers.8.attention.wv.weight, size 4096 x 4096...\n",
"[79/291] Writing tensor layers.8.attention.wo.weight, size 4096 x 4096...\n",
"[80/291] Writing tensor layers.8.attention_norm.weight, size 4096...\n",
"[81/291] Writing tensor layers.8.feed_forward.w1.weight, size 11008 x 4096...\n",
"[82/291] Writing tensor layers.8.feed_forward.w2.weight, size 4096 x 11008...\n",
"[83/291] Writing tensor layers.8.feed_forward.w3.weight, size 11008 x 4096...\n",
"[84/291] Writing tensor layers.8.ffn_norm.weight, size 4096...\n",
"[85/291] Writing tensor layers.9.attention.wq.weight, size 4096 x 4096...\n",
"[86/291] Writing tensor layers.9.attention.wk.weight, size 4096 x 4096...\n",
"[87/291] Writing tensor layers.9.attention.wv.weight, size 4096 x 4096...\n",
"[88/291] Writing tensor layers.9.attention.wo.weight, size 4096 x 4096...\n",
"[89/291] Writing tensor layers.9.attention_norm.weight, size 4096...\n",
"[90/291] Writing tensor layers.9.feed_forward.w1.weight, size 11008 x 4096...\n",
"[91/291] Writing tensor layers.9.feed_forward.w2.weight, size 4096 x 11008...\n",
"[92/291] Writing tensor layers.9.feed_forward.w3.weight, size 11008 x 4096...\n",
"[93/291] Writing tensor layers.9.ffn_norm.weight, size 4096...\n",
"[94/291] Writing tensor layers.10.attention.wq.weight, size 4096 x 4096...\n",
"[95/291] Writing tensor layers.10.attention.wk.weight, size 4096 x 4096...\n",
"[96/291] Writing tensor layers.10.attention.wv.weight, size 4096 x 4096...\n",
"[97/291] Writing tensor layers.10.attention.wo.weight, size 4096 x 4096...\n",
"[98/291] Writing tensor layers.10.attention_norm.weight, size 4096...\n",
"[99/291] Writing tensor layers.10.feed_forward.w1.weight, size 11008 x 4096...\n",
"[100/291] Writing tensor layers.10.feed_forward.w2.weight, size 4096 x 11008...\n",
"[101/291] Writing tensor layers.10.feed_forward.w3.weight, size 11008 x 4096...\n",
"[102/291] Writing tensor layers.10.ffn_norm.weight, size 4096...\n",
"[103/291] Writing tensor layers.11.attention.wq.weight, size 4096 x 4096...\n",
"[104/291] Writing tensor layers.11.attention.wk.weight, size 4096 x 4096...\n",
"[105/291] Writing tensor layers.11.attention.wv.weight, size 4096 x 4096...\n",
"[106/291] Writing tensor layers.11.attention.wo.weight, size 4096 x 4096...\n",
"[107/291] Writing tensor layers.11.attention_norm.weight, size 4096...\n",
"[108/291] Writing tensor layers.11.feed_forward.w1.weight, size 11008 x 4096...\n",
"[109/291] Writing tensor layers.11.feed_forward.w2.weight, size 4096 x 11008...\n",
"[110/291] Writing tensor layers.11.feed_forward.w3.weight, size 11008 x 4096...\n",
"[111/291] Writing tensor layers.11.ffn_norm.weight, size 4096...\n",
"[112/291] Writing tensor layers.12.attention.wq.weight, size 4096 x 4096...\n",
"[113/291] Writing tensor layers.12.attention.wk.weight, size 4096 x 4096...\n",
"[114/291] Writing tensor layers.12.attention.wv.weight, size 4096 x 4096...\n",
"[115/291] Writing tensor layers.12.attention.wo.weight, size 4096 x 4096...\n",
"[116/291] Writing tensor layers.12.attention_norm.weight, size 4096...\n",
"[117/291] Writing tensor layers.12.feed_forward.w1.weight, size 11008 x 4096...\n",
"[118/291] Writing tensor layers.12.feed_forward.w2.weight, size 4096 x 11008...\n",
"[119/291] Writing tensor layers.12.feed_forward.w3.weight, size 11008 x 4096...\n",
"[120/291] Writing tensor layers.12.ffn_norm.weight, size 4096...\n",
"[121/291] Writing tensor layers.13.attention.wq.weight, size 4096 x 4096...\n",
"[122/291] Writing tensor layers.13.attention.wk.weight, size 4096 x 4096...\n",
"[123/291] Writing tensor layers.13.attention.wv.weight, size 4096 x 4096...\n",
"[124/291] Writing tensor layers.13.attention.wo.weight, size 4096 x 4096...\n",
"[125/291] Writing tensor layers.13.attention_norm.weight, size 4096...\n",
"[126/291] Writing tensor layers.13.feed_forward.w1.weight, size 11008 x 4096...\n",
"[127/291] Writing tensor layers.13.feed_forward.w2.weight, size 4096 x 11008...\n",
"[128/291] Writing tensor layers.13.feed_forward.w3.weight, size 11008 x 4096...\n",
"[129/291] Writing tensor layers.13.ffn_norm.weight, size 4096...\n",
"[130/291] Writing tensor layers.14.attention.wq.weight, size 4096 x 4096...\n",
"[131/291] Writing tensor layers.14.attention.wk.weight, size 4096 x 4096...\n",
"[132/291] Writing tensor layers.14.attention.wv.weight, size 4096 x 4096...\n",
"[133/291] Writing tensor layers.14.attention.wo.weight, size 4096 x 4096...\n",
"[134/291] Writing tensor layers.14.attention_norm.weight, size 4096...\n",
"[135/291] Writing tensor layers.14.feed_forward.w1.weight, size 11008 x 4096...\n",
"[136/291] Writing tensor layers.14.feed_forward.w2.weight, size 4096 x 11008...\n",
"[137/291] Writing tensor layers.14.feed_forward.w3.weight, size 11008 x 4096...\n",
"[138/291] Writing tensor layers.14.ffn_norm.weight, size 4096...\n",
"[139/291] Writing tensor layers.15.attention.wq.weight, size 4096 x 4096...\n",
"[140/291] Writing tensor layers.15.attention.wk.weight, size 4096 x 4096...\n",
"[141/291] Writing tensor layers.15.attention.wv.weight, size 4096 x 4096...\n",
"[142/291] Writing tensor layers.15.attention.wo.weight, size 4096 x 4096...\n",
"[143/291] Writing tensor layers.15.attention_norm.weight, size 4096...\n",
"[144/291] Writing tensor layers.15.feed_forward.w1.weight, size 11008 x 4096...\n",
"[145/291] Writing tensor layers.15.feed_forward.w2.weight, size 4096 x 11008...\n",
"[146/291] Writing tensor layers.15.feed_forward.w3.weight, size 11008 x 4096...\n",
"[147/291] Writing tensor layers.15.ffn_norm.weight, size 4096...\n",
"[148/291] Writing tensor layers.16.attention.wq.weight, size 4096 x 4096...\n",
"[149/291] Writing tensor layers.16.attention.wk.weight, size 4096 x 4096...\n",
"[150/291] Writing tensor layers.16.attention.wv.weight, size 4096 x 4096...\n",
"[151/291] Writing tensor layers.16.attention.wo.weight, size 4096 x 4096...\n",
"[152/291] Writing tensor layers.16.attention_norm.weight, size 4096...\n",
"[153/291] Writing tensor layers.16.feed_forward.w1.weight, size 11008 x 4096...\n",
"[154/291] Writing tensor layers.16.feed_forward.w2.weight, size 4096 x 11008...\n",
"[155/291] Writing tensor layers.16.feed_forward.w3.weight, size 11008 x 4096...\n",
"[156/291] Writing tensor layers.16.ffn_norm.weight, size 4096...\n",
"[157/291] Writing tensor layers.17.attention.wq.weight, size 4096 x 4096...\n",
"[158/291] Writing tensor layers.17.attention.wk.weight, size 4096 x 4096...\n",
"[159/291] Writing tensor layers.17.attention.wv.weight, size 4096 x 4096...\n",
"[160/291] Writing tensor layers.17.attention.wo.weight, size 4096 x 4096...\n",
"[161/291] Writing tensor layers.17.attention_norm.weight, size 4096...\n",
"[162/291] Writing tensor layers.17.feed_forward.w1.weight, size 11008 x 4096...\n",
"[163/291] Writing tensor layers.17.feed_forward.w2.weight, size 4096 x 11008...\n",
"[164/291] Writing tensor layers.17.feed_forward.w3.weight, size 11008 x 4096...\n",
"[165/291] Writing tensor layers.17.ffn_norm.weight, size 4096...\n",
"[166/291] Writing tensor layers.18.attention.wq.weight, size 4096 x 4096...\n",
"[167/291] Writing tensor layers.18.attention.wk.weight, size 4096 x 4096...\n",
"[168/291] Writing tensor layers.18.attention.wv.weight, size 4096 x 4096...\n",
"[169/291] Writing tensor layers.18.attention.wo.weight, size 4096 x 4096...\n",
"[170/291] Writing tensor layers.18.attention_norm.weight, size 4096...\n",
"[171/291] Writing tensor layers.18.feed_forward.w1.weight, size 11008 x 4096...\n",
"[172/291] Writing tensor layers.18.feed_forward.w2.weight, size 4096 x 11008...\n",
"[173/291] Writing tensor layers.18.feed_forward.w3.weight, size 11008 x 4096...\n",
"[174/291] Writing tensor layers.18.ffn_norm.weight, size 4096...\n",
"[175/291] Writing tensor layers.19.attention.wq.weight, size 4096 x 4096...\n",
"[176/291] Writing tensor layers.19.attention.wk.weight, size 4096 x 4096...\n",
"[177/291] Writing tensor layers.19.attention.wv.weight, size 4096 x 4096...\n",
"[178/291] Writing tensor layers.19.attention.wo.weight, size 4096 x 4096...\n",
"[179/291] Writing tensor layers.19.attention_norm.weight, size 4096...\n",
"[180/291] Writing tensor layers.19.feed_forward.w1.weight, size 11008 x 4096...\n",
"[181/291] Writing tensor layers.19.feed_forward.w2.weight, size 4096 x 11008...\n",
"[182/291] Writing tensor layers.19.feed_forward.w3.weight, size 11008 x 4096...\n",
"[183/291] Writing tensor layers.19.ffn_norm.weight, size 4096...\n",
"[184/291] Writing tensor layers.20.attention.wq.weight, size 4096 x 4096...\n",
"[185/291] Writing tensor layers.20.attention.wk.weight, size 4096 x 4096...\n",
"[186/291] Writing tensor layers.20.attention.wv.weight, size 4096 x 4096...\n",
"[187/291] Writing tensor layers.20.attention.wo.weight, size 4096 x 4096...\n",
"[188/291] Writing tensor layers.20.attention_norm.weight, size 4096...\n",
"[189/291] Writing tensor layers.20.feed_forward.w1.weight, size 11008 x 4096...\n",
"[190/291] Writing tensor layers.20.feed_forward.w2.weight, size 4096 x 11008...\n",
"[191/291] Writing tensor layers.20.feed_forward.w3.weight, size 11008 x 4096...\n",
"[192/291] Writing tensor layers.20.ffn_norm.weight, size 4096...\n",
"[193/291] Writing tensor layers.21.attention.wq.weight, size 4096 x 4096...\n",
"[194/291] Writing tensor layers.21.attention.wk.weight, size 4096 x 4096...\n",
"[195/291] Writing tensor layers.21.attention.wv.weight, size 4096 x 4096...\n",
"[196/291] Writing tensor layers.21.attention.wo.weight, size 4096 x 4096...\n",
"[197/291] Writing tensor layers.21.attention_norm.weight, size 4096...\n",
"[198/291] Writing tensor layers.21.feed_forward.w1.weight, size 11008 x 4096...\n",
"[199/291] Writing tensor layers.21.feed_forward.w2.weight, size 4096 x 11008...\n",
"[200/291] Writing tensor layers.21.feed_forward.w3.weight, size 11008 x 4096...\n",
"[201/291] Writing tensor layers.21.ffn_norm.weight, size 4096...\n",
"[202/291] Writing tensor layers.22.attention.wq.weight, size 4096 x 4096...\n",
"[203/291] Writing tensor layers.22.attention.wk.weight, size 4096 x 4096...\n",
"[204/291] Writing tensor layers.22.attention.wv.weight, size 4096 x 4096...\n",
"[205/291] Writing tensor layers.22.attention.wo.weight, size 4096 x 4096...\n",
"[206/291] Writing tensor layers.22.attention_norm.weight, size 4096...\n",
"[207/291] Writing tensor layers.22.feed_forward.w1.weight, size 11008 x 4096...\n",
"[208/291] Writing tensor layers.22.feed_forward.w2.weight, size 4096 x 11008...\n",
"[209/291] Writing tensor layers.22.feed_forward.w3.weight, size 11008 x 4096...\n",
"[210/291] Writing tensor layers.22.ffn_norm.weight, size 4096...\n",
"[211/291] Writing tensor layers.23.attention.wq.weight, size 4096 x 4096...\n",
"[212/291] Writing tensor layers.23.attention.wk.weight, size 4096 x 4096...\n",
"[213/291] Writing tensor layers.23.attention.wv.weight, size 4096 x 4096...\n",
"[214/291] Writing tensor layers.23.attention.wo.weight, size 4096 x 4096...\n",
"[215/291] Writing tensor layers.23.attention_norm.weight, size 4096...\n",
"[216/291] Writing tensor layers.23.feed_forward.w1.weight, size 11008 x 4096...\n",
"[217/291] Writing tensor layers.23.feed_forward.w2.weight, size 4096 x 11008...\n",
"[218/291] Writing tensor layers.23.feed_forward.w3.weight, size 11008 x 4096...\n",
"[219/291] Writing tensor layers.23.ffn_norm.weight, size 4096...\n",
"[220/291] Writing tensor layers.24.attention.wq.weight, size 4096 x 4096...\n",
"[221/291] Writing tensor layers.24.attention.wk.weight, size 4096 x 4096...\n",
"[222/291] Writing tensor layers.24.attention.wv.weight, size 4096 x 4096...\n",
"[223/291] Writing tensor layers.24.attention.wo.weight, size 4096 x 4096...\n",
"[224/291] Writing tensor layers.24.attention_norm.weight, size 4096...\n",
"[225/291] Writing tensor layers.24.feed_forward.w1.weight, size 11008 x 4096...\n",
"[226/291] Writing tensor layers.24.feed_forward.w2.weight, size 4096 x 11008...\n",
"[227/291] Writing tensor layers.24.feed_forward.w3.weight, size 11008 x 4096...\n",
"[228/291] Writing tensor layers.24.ffn_norm.weight, size 4096...\n",
"[229/291] Writing tensor layers.25.attention.wq.weight, size 4096 x 4096...\n",
"[230/291] Writing tensor layers.25.attention.wk.weight, size 4096 x 4096...\n",
"[231/291] Writing tensor layers.25.attention.wv.weight, size 4096 x 4096...\n",
"[232/291] Writing tensor layers.25.attention.wo.weight, size 4096 x 4096...\n",
"[233/291] Writing tensor layers.25.attention_norm.weight, size 4096...\n",
"[234/291] Writing tensor layers.25.feed_forward.w1.weight, size 11008 x 4096...\n",
"[235/291] Writing tensor layers.25.feed_forward.w2.weight, size 4096 x 11008...\n",
"[236/291] Writing tensor layers.25.feed_forward.w3.weight, size 11008 x 4096...\n",
"[237/291] Writing tensor layers.25.ffn_norm.weight, size 4096...\n",
"[238/291] Writing tensor layers.26.attention.wq.weight, size 4096 x 4096...\n",
"[239/291] Writing tensor layers.26.attention.wk.weight, size 4096 x 4096...\n",
"[240/291] Writing tensor layers.26.attention.wv.weight, size 4096 x 4096...\n",
"[241/291] Writing tensor layers.26.attention.wo.weight, size 4096 x 4096...\n",
"[242/291] Writing tensor layers.26.attention_norm.weight, size 4096...\n",
"[243/291] Writing tensor layers.26.feed_forward.w1.weight, size 11008 x 4096...\n",
"[244/291] Writing tensor layers.26.feed_forward.w2.weight, size 4096 x 11008...\n",
"[245/291] Writing tensor layers.26.feed_forward.w3.weight, size 11008 x 4096...\n",
"[246/291] Writing tensor layers.26.ffn_norm.weight, size 4096...\n",
"[247/291] Writing tensor layers.27.attention.wq.weight, size 4096 x 4096...\n",
"[248/291] Writing tensor layers.27.attention.wk.weight, size 4096 x 4096...\n",
"[249/291] Writing tensor layers.27.attention.wv.weight, size 4096 x 4096...\n",
"[250/291] Writing tensor layers.27.attention.wo.weight, size 4096 x 4096...\n",
"[251/291] Writing tensor layers.27.attention_norm.weight, size 4096...\n",
"[252/291] Writing tensor layers.27.feed_forward.w1.weight, size 11008 x 4096...\n",
"[253/291] Writing tensor layers.27.feed_forward.w2.weight, size 4096 x 11008...\n",
"[254/291] Writing tensor layers.27.feed_forward.w3.weight, size 11008 x 4096...\n",
"[255/291] Writing tensor layers.27.ffn_norm.weight, size 4096...\n",
"[256/291] Writing tensor layers.28.attention.wq.weight, size 4096 x 4096...\n",
"[257/291] Writing tensor layers.28.attention.wk.weight, size 4096 x 4096...\n",
"[258/291] Writing tensor layers.28.attention.wv.weight, size 4096 x 4096...\n",
"[259/291] Writing tensor layers.28.attention.wo.weight, size 4096 x 4096...\n",
"[260/291] Writing tensor layers.28.attention_norm.weight, size 4096...\n",
"[261/291] Writing tensor layers.28.feed_forward.w1.weight, size 11008 x 4096...\n",
"[262/291] Writing tensor layers.28.feed_forward.w2.weight, size 4096 x 11008...\n",
"[263/291] Writing tensor layers.28.feed_forward.w3.weight, size 11008 x 4096...\n",
"[264/291] Writing tensor layers.28.ffn_norm.weight, size 4096...\n",
"[265/291] Writing tensor layers.29.attention.wq.weight, size 4096 x 4096...\n",
"[266/291] Writing tensor layers.29.attention.wk.weight, size 4096 x 4096...\n",
"[267/291] Writing tensor layers.29.attention.wv.weight, size 4096 x 4096...\n",
"[268/291] Writing tensor layers.29.attention.wo.weight, size 4096 x 4096...\n",
"[269/291] Writing tensor layers.29.attention_norm.weight, size 4096...\n",
"[270/291] Writing tensor layers.29.feed_forward.w1.weight, size 11008 x 4096...\n",
"[271/291] Writing tensor layers.29.feed_forward.w2.weight, size 4096 x 11008...\n",
"[272/291] Writing tensor layers.29.feed_forward.w3.weight, size 11008 x 4096...\n",
"[273/291] Writing tensor layers.29.ffn_norm.weight, size 4096...\n",
"[274/291] Writing tensor layers.30.attention.wq.weight, size 4096 x 4096...\n",
"[275/291] Writing tensor layers.30.attention.wk.weight, size 4096 x 4096...\n",
"[276/291] Writing tensor layers.30.attention.wv.weight, size 4096 x 4096...\n",
"[277/291] Writing tensor layers.30.attention.wo.weight, size 4096 x 4096...\n",
"[278/291] Writing tensor layers.30.attention_norm.weight, size 4096...\n",
"[279/291] Writing tensor layers.30.feed_forward.w1.weight, size 11008 x 4096...\n",
"[280/291] Writing tensor layers.30.feed_forward.w2.weight, size 4096 x 11008...\n",
"[281/291] Writing tensor layers.30.feed_forward.w3.weight, size 11008 x 4096...\n",
"[282/291] Writing tensor layers.30.ffn_norm.weight, size 4096...\n",
"[283/291] Writing tensor layers.31.attention.wq.weight, size 4096 x 4096...\n",
"[284/291] Writing tensor layers.31.attention.wk.weight, size 4096 x 4096...\n",
"[285/291] Writing tensor layers.31.attention.wv.weight, size 4096 x 4096...\n",
"[286/291] Writing tensor layers.31.attention.wo.weight, size 4096 x 4096...\n",
"[287/291] Writing tensor layers.31.attention_norm.weight, size 4096...\n",
"[288/291] Writing tensor layers.31.feed_forward.w1.weight, size 11008 x 4096...\n",
"[289/291] Writing tensor layers.31.feed_forward.w2.weight, size 4096 x 11008...\n",
"[290/291] Writing tensor layers.31.feed_forward.w3.weight, size 11008 x 4096...\n",
"[291/291] Writing tensor layers.31.ffn_norm.weight, size 4096...\n",
"Wrote zh-models/7B/ggml-model-f16.bin\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### 将FP16模型量化为4-bit\n",
"\n",
"我们进一步将FP16模型转换为4-bit量化模型。"
],
"metadata": {
"id": "hEZEJAVYCHkc"
}
},
{
"cell_type": "code",
"source": [
"!cd llama.cpp && ./quantize ./zh-models/7B/ggml-model-f16.bin ./zh-models/7B/ggml-model-q4_0.bin 2"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2xyais7OUVDI",
"outputId": "99b4154e-1370-4240-c06b-69ff2f49ee37"
},
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"llama.cpp: loading model from ./zh-models/7B/ggml-model-f16.bin\n",
"llama.cpp: saving model to ./zh-models/7B/ggml-model-q4_0.bin\n",
"[1/291] tok_embeddings.weight - [4096 x 49954], type = f16, quantizing .. size = 390.27 MB -> 121.96 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[2/291] norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[3/291] output.weight - [4096 x 49954], type = f16, quantizing .. size = 390.27 MB -> 121.96 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[4/291] layers.0.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.021 0.016 0.028 0.046 0.071 0.103 0.137 0.158 0.137 0.103 0.071 0.046 0.028 0.016 0.021 \n",
"[5/291] layers.0.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.021 0.016 0.027 0.045 0.071 0.104 0.138 0.158 0.139 0.104 0.071 0.045 0.027 0.016 0.021 \n",
"[6/291] layers.0.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.018 0.032 0.051 0.076 0.103 0.128 0.141 0.128 0.103 0.075 0.051 0.032 0.019 0.022 \n",
"[7/291] layers.0.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.021 0.016 0.028 0.046 0.072 0.105 0.136 0.151 0.136 0.105 0.072 0.046 0.028 0.016 0.021 \n",
"[8/291] layers.0.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[9/291] layers.0.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[10/291] layers.0.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[11/291] layers.0.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[12/291] layers.0.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[13/291] layers.1.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.032 0.051 0.077 0.104 0.127 0.137 0.127 0.104 0.077 0.051 0.032 0.019 0.022 \n",
"[14/291] layers.1.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.021 0.018 0.032 0.051 0.076 0.104 0.128 0.138 0.128 0.104 0.077 0.051 0.032 0.018 0.022 \n",
"[15/291] layers.1.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.018 0.031 0.051 0.076 0.104 0.129 0.139 0.129 0.104 0.076 0.051 0.031 0.018 0.021 \n",
"[16/291] layers.1.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.021 0.016 0.028 0.046 0.071 0.104 0.137 0.154 0.137 0.104 0.071 0.046 0.028 0.016 0.021 \n",
"[17/291] layers.1.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[18/291] layers.1.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[19/291] layers.1.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[20/291] layers.1.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[21/291] layers.1.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[22/291] layers.2.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[23/291] layers.2.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.032 0.051 0.076 0.104 0.127 0.137 0.127 0.104 0.077 0.051 0.032 0.019 0.022 \n",
"[24/291] layers.2.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.136 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[25/291] layers.2.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
"[26/291] layers.2.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[27/291] layers.2.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[28/291] layers.2.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[29/291] layers.2.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[30/291] layers.2.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[31/291] layers.3.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
"[32/291] layers.3.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.136 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
"[33/291] layers.3.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[34/291] layers.3.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[35/291] layers.3.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[36/291] layers.3.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[37/291] layers.3.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[38/291] layers.3.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[39/291] layers.3.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[40/291] layers.4.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[41/291] layers.4.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
"[42/291] layers.4.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.135 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[43/291] layers.4.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[44/291] layers.4.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[45/291] layers.4.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[46/291] layers.4.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[47/291] layers.4.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[48/291] layers.4.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[49/291] layers.5.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[50/291] layers.5.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[51/291] layers.5.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[52/291] layers.5.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[53/291] layers.5.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[54/291] layers.5.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[55/291] layers.5.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[56/291] layers.5.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[57/291] layers.5.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[58/291] layers.6.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[59/291] layers.6.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[60/291] layers.6.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[61/291] layers.6.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[62/291] layers.6.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[63/291] layers.6.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[64/291] layers.6.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[65/291] layers.6.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[66/291] layers.6.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[67/291] layers.7.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[68/291] layers.7.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[69/291] layers.7.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[70/291] layers.7.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[71/291] layers.7.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[72/291] layers.7.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[73/291] layers.7.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[74/291] layers.7.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[75/291] layers.7.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[76/291] layers.8.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[77/291] layers.8.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.052 0.033 0.019 0.022 \n",
"[78/291] layers.8.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[79/291] layers.8.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[80/291] layers.8.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[81/291] layers.8.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[82/291] layers.8.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.078 0.052 0.033 0.019 0.022 \n",
"[83/291] layers.8.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[84/291] layers.8.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[85/291] layers.9.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[86/291] layers.9.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[87/291] layers.9.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[88/291] layers.9.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[89/291] layers.9.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[90/291] layers.9.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[91/291] layers.9.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[92/291] layers.9.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[93/291] layers.9.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[94/291] layers.10.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[95/291] layers.10.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[96/291] layers.10.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[97/291] layers.10.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[98/291] layers.10.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[99/291] layers.10.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[100/291] layers.10.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[101/291] layers.10.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[102/291] layers.10.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[103/291] layers.11.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[104/291] layers.11.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[105/291] layers.11.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[106/291] layers.11.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[107/291] layers.11.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[108/291] layers.11.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[109/291] layers.11.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
"[110/291] layers.11.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[111/291] layers.11.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[112/291] layers.12.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[113/291] layers.12.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[114/291] layers.12.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[115/291] layers.12.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[116/291] layers.12.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[117/291] layers.12.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[118/291] layers.12.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[119/291] layers.12.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[120/291] layers.12.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[121/291] layers.13.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[122/291] layers.13.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.052 0.033 0.019 0.022 \n",
"[123/291] layers.13.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[124/291] layers.13.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[125/291] layers.13.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[126/291] layers.13.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[127/291] layers.13.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.078 0.052 0.033 0.019 0.022 \n",
"[128/291] layers.13.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[129/291] layers.13.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[130/291] layers.14.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[131/291] layers.14.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[132/291] layers.14.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[133/291] layers.14.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[134/291] layers.14.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[135/291] layers.14.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[136/291] layers.14.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[137/291] layers.14.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[138/291] layers.14.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[139/291] layers.15.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[140/291] layers.15.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[141/291] layers.15.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[142/291] layers.15.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[143/291] layers.15.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[144/291] layers.15.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[145/291] layers.15.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[146/291] layers.15.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[147/291] layers.15.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[148/291] layers.16.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[149/291] layers.16.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[150/291] layers.16.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[151/291] layers.16.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[152/291] layers.16.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[153/291] layers.16.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[154/291] layers.16.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.126 0.134 0.126 0.104 0.078 0.052 0.033 0.019 0.022 \n",
"[155/291] layers.16.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[156/291] layers.16.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[157/291] layers.17.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[158/291] layers.17.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[159/291] layers.17.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[160/291] layers.17.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[161/291] layers.17.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[162/291] layers.17.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[163/291] layers.17.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.052 0.033 0.019 0.022 \n",
"[164/291] layers.17.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[165/291] layers.17.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[166/291] layers.18.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[167/291] layers.18.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[168/291] layers.18.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[169/291] layers.18.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[170/291] layers.18.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[171/291] layers.18.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[172/291] layers.18.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[173/291] layers.18.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[174/291] layers.18.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[175/291] layers.19.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[176/291] layers.19.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[177/291] layers.19.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[178/291] layers.19.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[179/291] layers.19.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[180/291] layers.19.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[181/291] layers.19.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[182/291] layers.19.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[183/291] layers.19.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[184/291] layers.20.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[185/291] layers.20.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[186/291] layers.20.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[187/291] layers.20.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[188/291] layers.20.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[189/291] layers.20.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[190/291] layers.20.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[191/291] layers.20.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[192/291] layers.20.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[193/291] layers.21.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[194/291] layers.21.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[195/291] layers.21.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[196/291] layers.21.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[197/291] layers.21.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[198/291] layers.21.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[199/291] layers.21.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[200/291] layers.21.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[201/291] layers.21.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[202/291] layers.22.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[203/291] layers.22.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[204/291] layers.22.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[205/291] layers.22.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[206/291] layers.22.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[207/291] layers.22.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[208/291] layers.22.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[209/291] layers.22.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[210/291] layers.22.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[211/291] layers.23.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[212/291] layers.23.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[213/291] layers.23.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[214/291] layers.23.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[215/291] layers.23.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[216/291] layers.23.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[217/291] layers.23.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[218/291] layers.23.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[219/291] layers.23.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[220/291] layers.24.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[221/291] layers.24.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[222/291] layers.24.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[223/291] layers.24.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[224/291] layers.24.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[225/291] layers.24.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[226/291] layers.24.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[227/291] layers.24.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[228/291] layers.24.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[229/291] layers.25.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[230/291] layers.25.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[231/291] layers.25.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[232/291] layers.25.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[233/291] layers.25.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[234/291] layers.25.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[235/291] layers.25.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[236/291] layers.25.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[237/291] layers.25.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[238/291] layers.26.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[239/291] layers.26.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[240/291] layers.26.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[241/291] layers.26.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[242/291] layers.26.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[243/291] layers.26.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[244/291] layers.26.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[245/291] layers.26.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[246/291] layers.26.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[247/291] layers.27.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[248/291] layers.27.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[249/291] layers.27.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[250/291] layers.27.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[251/291] layers.27.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[252/291] layers.27.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[253/291] layers.27.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[254/291] layers.27.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[255/291] layers.27.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[256/291] layers.28.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[257/291] layers.28.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[258/291] layers.28.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[259/291] layers.28.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[260/291] layers.28.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[261/291] layers.28.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[262/291] layers.28.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[263/291] layers.28.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[264/291] layers.28.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[265/291] layers.29.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[266/291] layers.29.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[267/291] layers.29.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[268/291] layers.29.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[269/291] layers.29.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[270/291] layers.29.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[271/291] layers.29.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
"[272/291] layers.29.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[273/291] layers.29.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[274/291] layers.30.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[275/291] layers.30.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[276/291] layers.30.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
"[277/291] layers.30.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[278/291] layers.30.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[279/291] layers.30.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[280/291] layers.30.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.018 0.032 0.051 0.076 0.104 0.128 0.137 0.128 0.104 0.076 0.051 0.032 0.018 0.022 \n",
"[281/291] layers.30.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[282/291] layers.30.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[283/291] layers.31.attention.wq.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
"[284/291] layers.31.attention.wk.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
"[285/291] layers.31.attention.wv.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
"[286/291] layers.31.attention.wo.weight - [4096 x 4096], type = f16, quantizing .. size = 32.00 MB -> 10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[287/291] layers.31.attention_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"[288/291] layers.31.feed_forward.w1.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[289/291] layers.31.feed_forward.w2.weight - [11008 x 4096], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.021 0.018 0.031 0.050 0.075 0.104 0.130 0.140 0.130 0.104 0.075 0.050 0.031 0.018 0.021 \n",
"[290/291] layers.31.feed_forward.w3.weight - [4096 x 11008], type = f16, quantizing .. size = 86.00 MB -> 26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"[291/291] layers.31.ffn_norm.weight - [4096], type = f32, size = 0.016 MB\n",
"llama_model_quantize_internal: model size = 13133.55 MB\n",
"llama_model_quantize_internal: quant size = 4104.93 MB\n",
"llama_model_quantize_internal: hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
"\n",
"main: quantize time = 178732.41 ms\n",
"main: total time = 178732.41 ms\n"
]
}
]
},
{
"cell_type": "markdown",
"source": [
"### (可选)测试量化模型解码\n",
"至此已完成了所有转换步骤。\n",
"我们运行一条命令测试一下是否能够正常加载并进行对话。\n",
"\n",
"FP16和Q4量化文件存放在./llama.cpp/zh-models/7B下,可按需下载使用。"
],
"metadata": {
"id": "DLkuRAo9Vkb1"
}
},
{
"cell_type": "code",
"source": [
"!cd llama.cpp && ./main -m ./zh-models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -p \"详细介绍一下北京的名胜古迹:\" -n 512"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "tW-ep1BsVQtG",
"outputId": "0706c974-127e-4f21-be6b-d71ea4fb989b"
},
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"main: seed = 1681467955\n",
"llama.cpp: loading model from ./zh-models/7B/ggml-model-q4_0.bin\n",
"llama_model_load_internal: format = ggjt v1 (latest)\n",
"llama_model_load_internal: n_vocab = 49954\n",
"llama_model_load_internal: n_ctx = 512\n",
"llama_model_load_internal: n_embd = 4096\n",
"llama_model_load_internal: n_mult = 256\n",
"llama_model_load_internal: n_head = 32\n",
"llama_model_load_internal: n_layer = 32\n",
"llama_model_load_internal: n_rot = 128\n",
"llama_model_load_internal: ftype = 2 (mostly Q4_0)\n",
"llama_model_load_internal: n_ff = 11008\n",
"llama_model_load_internal: n_parts = 1\n",
"llama_model_load_internal: model size = 7B\n",
"llama_model_load_internal: ggml ctx size = 59.11 KB\n",
"llama_model_load_internal: mem required = 5896.99 MB (+ 1026.00 MB per state)\n",
"llama_init_from_file: kv self size = 256.00 MB\n",
"\n",
"system_info: n_threads = 40 / 40 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | \n",
"sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000\n",
"generate: n_ctx = 512, n_batch = 8, n_predict = 512, n_keep = 0\n",
"\n",
"\n",
"\u001b[33m 详细介绍一下北京的名胜古迹:\u001b[0m\n",
" 故宫:明、清两代皇室,御花园及八达门大街。 宫殿内有大量文物珍品; [end of text]\n",
"\n",
"llama_print_timings: load time = 717.01 ms\n",
"llama_print_timings: sample time = 48.97 ms / 32 runs ( 1.53 ms per run)\n",
"llama_print_timings: prompt eval time = 680.93 ms / 11 tokens ( 61.90 ms per token)\n",
"llama_print_timings: eval time = 4490.00 ms / 31 runs ( 144.84 ms per run)\n",
"llama_print_timings: total time = 5461.05 ms\n"
]
}
]
}
]
}
================================================
FILE: notebooks/pretrain_chinese_llama_lora.ipynb
================================================
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "cHgpJN1uzD8B"
},
"source": [
"# Tutorial on pre-training of Chinese-LLaMA-7B\n",
"\n",
"More info: https://github.com/ymcui/Chinese-LLaMA-Alpaca"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "2meQHBlHxcsi"
},
"source": [
"## Install Dependencies"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "quRXOPaZwmwz",
"outputId": "6d9febc4-b0d2-41ee-ce66-9284758e928e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting transformers==4.28.1\n",
" Downloading transformers-4.28.1-py3-none-any.whl (7.0 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.0/7.0 MB\u001b[0m \u001b[31m83.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (3.12.0)\n",
"Collecting huggingface-hub<1.0,>=0.11.0 (from transformers==4.28.1)\n",
" Downloading huggingface_hub-0.14.1-py3-none-any.whl (224 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m224.5/224.5 kB\u001b[0m \u001b[31m28.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (23.1)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (6.0)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (2022.10.31)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (2.27.1)\n",
"Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers==4.28.1)\n",
" Downloading tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m105.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers==4.28.1) (4.65.0)\n",
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers==4.28.1) (2023.4.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers==4.28.1) (4.5.0)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.28.1) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.28.1) (2022.12.7)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.28.1) (2.0.12)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers==4.28.1) (3.4)\n",
"Installing collected packages: tokenizers, huggingface-hub, transformers\n",
"Successfully installed huggingface-hub-0.14.1 tokenizers-0.13.3 transformers-4.28.1\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting git+https://github.com/huggingface/peft.git@13e53fc\n",
" Cloning https://github.com/huggingface/peft.git (to revision 13e53fc) to /tmp/pip-req-build-6fdyuocz\n",
" Running command git clone --filter=blob:none --quiet https://github.com/huggingface/peft.git /tmp/pip-req-build-6fdyuocz\n",
"\u001b[33m WARNING: Did not find branch or tag '13e53fc', assuming revision or ref.\u001b[0m\u001b[33m\n",
"\u001b[0m Running command git checkout -q 13e53fc\n",
" Resolved https://github.com/huggingface/peft.git to commit 13e53fc\n",
" Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
" Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
" Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
"Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (23.1)\n",
"Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (5.9.5)\n",
"Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (6.0)\n",
"Requirement already satisfied: torch>=1.13.0 in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (2.0.0+cu118)\n",
"Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (from peft==0.3.0.dev0) (4.28.1)\n",
"Collecting accelerate (from peft==0.3.0.dev0)\n",
" Downloading accelerate-0.19.0-py3-none-any.whl (219 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m219.1/219.1 kB\u001b[0m \u001b[31m18.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (3.12.0)\n",
"Requirement already satisfied: typing-extensions in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (4.5.0)\n",
"Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (1.11.1)\n",
"Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (3.1)\n",
"Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (3.1.2)\n",
"Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch>=1.13.0->peft==0.3.0.dev0) (2.0.0)\n",
"Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.13.0->peft==0.3.0.dev0) (3.25.2)\n",
"Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch>=1.13.0->peft==0.3.0.dev0) (16.0.3)\n",
"Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (0.14.1)\n",
"Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (2022.10.31)\n",
"Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (2.27.1)\n",
"Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (0.13.3)\n",
"Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers->peft==0.3.0.dev0) (4.65.0)\n",
"Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers->peft==0.3.0.dev0) (2023.4.0)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=1.13.0->peft==0.3.0.dev0) (2.1.2)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (2022.12.7)\n",
"Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (2.0.12)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests->transformers->peft==0.3.0.dev0) (3.4)\n",
"Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch>=1.13.0->peft==0.3.0.dev0) (1.3.0)\n",
"Building wheels for collected packages: peft\n",
" Building wheel for peft (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for peft: filename=peft-0.3.0.dev0-py3-none-any.whl size=40652 sha256=45f5ed4c68d8fd79aafc4e46cab06b9f7825ddaf6ffbd65c3b9a48aff09aef7c\n",
" Stored in directory: /tmp/pip-ephem-wheel-cache-2yuu1wzt/wheels/d9/13/c6/404d5f8a81c5620f65f7fd75b6a66619f013cd79c2875b981c\n",
"Successfully built peft\n",
"Installing collected packages: accelerate, peft\n",
"Successfully installed accelerate-0.19.0 peft-0.3.0.dev0\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting datasets\n",
" Downloading datasets-2.12.0-py3-none-any.whl (474 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m474.6/474.6 kB\u001b[0m \u001b[31m31.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from datasets) (1.22.4)\n",
"Requirement already satisfied: pyarrow>=8.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (9.0.0)\n",
"Collecting dill<0.3.7,>=0.3.0 (from datasets)\n",
" Downloading dill-0.3.6-py3-none-any.whl (110 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m110.5/110.5 kB\u001b[0m \u001b[31m17.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets) (1.5.3)\n",
"Requirement already satisfied: requests>=2.19.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (2.27.1)\n",
"Requirement already satisfied: tqdm>=4.62.1 in /usr/local/lib/python3.10/dist-packages (from datasets) (4.65.0)\n",
"Collecting xxhash (from datasets)\n",
" Downloading xxhash-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (212 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m212.5/212.5 kB\u001b[0m \u001b[31m27.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting multiprocess (from datasets)\n",
" Downloading multiprocess-0.70.14-py310-none-any.whl (134 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.3/134.3 kB\u001b[0m \u001b[31m20.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: fsspec[http]>=2021.11.1 in /usr/local/lib/python3.10/dist-packages (from datasets) (2023.4.0)\n",
"Collecting aiohttp (from datasets)\n",
" Downloading aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.0/1.0 MB\u001b[0m \u001b[31m71.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: huggingface-hub<1.0.0,>=0.11.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (0.14.1)\n",
"Requirement already satisfied: packaging in /usr/local/lib/python3.10/dist-packages (from datasets) (23.1)\n",
"Collecting responses<0.19 (from datasets)\n",
" Downloading responses-0.18.0-py3-none-any.whl (38 kB)\n",
"Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from datasets) (6.0)\n",
"Requirement already satisfied: attrs>=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (23.1.0)\n",
"Requirement already satisfied: charset-normalizer<4.0,>=2.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp->datasets) (2.0.12)\n",
"Collecting multidict<7.0,>=4.5 (from aiohttp->datasets)\n",
" Downloading multidict-6.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (114 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m114.5/114.5 kB\u001b[0m \u001b[31m16.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting async-timeout<5.0,>=4.0.0a3 (from aiohttp->datasets)\n",
" Downloading async_timeout-4.0.2-py3-none-any.whl (5.8 kB)\n",
"Collecting yarl<2.0,>=1.0 (from aiohttp->datasets)\n",
" Downloading yarl-1.9.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (268 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m268.8/268.8 kB\u001b[0m \u001b[31m35.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting frozenlist>=1.1.1 (from aiohttp->datasets)\n",
" Downloading frozenlist-1.3.3-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (149 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m149.6/149.6 kB\u001b[0m \u001b[31m24.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting aiosignal>=1.1.2 (from aiohttp->datasets)\n",
" Downloading aiosignal-1.3.1-py3-none-any.whl (7.6 kB)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0.0,>=0.11.0->datasets) (3.12.0)\n",
"Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0.0,>=0.11.0->datasets) (4.5.0)\n",
"Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets) (1.26.15)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets) (2022.12.7)\n",
"Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from requests>=2.19.0->datasets) (3.4)\n",
"Requirement already satisfied: python-dateutil>=2.8.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets) (2.8.2)\n",
"Requirement already satisfied: pytz>=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas->datasets) (2022.7.1)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil>=2.8.1->pandas->datasets) (1.16.0)\n",
"Installing collected packages: xxhash, multidict, frozenlist, dill, async-timeout, yarl, responses, multiprocess, aiosignal, aiohttp, datasets\n",
"Successfully installed aiohttp-3.8.4 aiosignal-1.3.1 async-timeout-4.0.2 datasets-2.12.0 dill-0.3.6 frozenlist-1.3.3 multidict-6.0.4 multiprocess-0.70.14 responses-0.18.0 xxhash-3.2.0 yarl-1.9.2\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting sentencepiece\n",
" Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m60.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hInstalling collected packages: sentencepiece\n",
"Successfully installed sentencepiece-0.1.99\n",
"Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
"Collecting deepspeed\n",
" Downloading deepspeed-0.9.2.tar.gz (779 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m779.3/779.3 kB\u001b[0m \u001b[31m28.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h Preparing metadata (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
"Collecting hjson (from deepspeed)\n",
" Downloading hjson-3.1.0-py3-none-any.whl (54 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m54.0/54.0 kB\u001b[0m \u001b[31m8.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hCollecting ninja (from deepspeed)\n",
" Downloading ninja-1.11.1-py2.py3-none-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (145 kB)\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m146.0/146.0 kB\u001b[0m \u001b[31m22.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25hRequirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from deepspeed) (1.22.4)\n",
"Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from deepspeed) (23.1)\n",
"Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from deepspeed) (5.9.5)\n",
"Requirement already satisfied: py-cpuinfo in /usr/local/lib/python3.10/dist-packages (from deepspeed) (9.0.0)\n",
"Requirement already satisfied: pydantic<2.0.0 in /usr/local/lib/python3.10/dist-packages (from deepspeed) (1.10.7)\n",
"Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from deepspeed) (2.0.0+cu118)\n",
"Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (from deepspeed) (4.65.0)\n",
"Requirement already satisfied: typing-extensions>=4.2.0 in /usr/local/lib/python3.10/dist-packages (from pydantic<2.0.0->deepspeed) (4.5.0)\n",
"Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed) (3.12.0)\n",
"Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed) (1.11.1)\n",
"Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed) (3.1)\n",
"Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed) (3.1.2)\n",
"Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.10/dist-packages (from torch->deepspeed) (2.0.0)\n",
"Requirement already satisfied: cmake in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->deepspeed) (3.25.2)\n",
"Requirement already satisfied: lit in /usr/local/lib/python3.10/dist-packages (from triton==2.0.0->torch->deepspeed) (16.0.3)\n",
"Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch->deepspeed) (2.1.2)\n",
"Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy->torch->deepspeed) (1.3.0)\n",
"Building wheels for collected packages: deepspeed\n",
" Building wheel for deepspeed (setup.py) ... \u001b[?25l\u001b[?25hdone\n",
" Created wheel for deepspeed: filename=deepspeed-0.9.2-py3-none-any.whl size=811216 sha256=493b7db5f75d0669b1e656a7ae283c973e2e11ee59959cbd7f71ceb648bdfa27\n",
" Stored in directory: /root/.cache/pip/wheels/a6/d2/b1/b15210b5dc024bab4eccbac2148db29959fe01fe6042557d07\n",
"Successfully built deepspeed\n",
"Installing collected packages: ninja, hjson, deepspeed\n",
"Successfully installed deepspeed-0.9.2 hjson-3.1.0 ninja-1.11.1\n"
]
}
],
"source": [
"!pip install transformers==4.28.1\n",
"!pip install git+https://github.com/huggingface/peft.git@13e53fc\n",
"!pip install datasets\n",
"!pip install sentencepiece\n",
"!pip install deepspeed"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "ji21WFqexASI"
},
"source": [
"## Clone our repository\n",
"\n",
"\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dulrlPMexFNN",
"outputId": "b9168c88-e72e-4f18-9450-d28832f7fe58"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cloning into 'Chinese-LLaMA-Alpaca'...\n",
"remote: Enumerating objects: 911, done.\u001b[K\n",
"remote: Counting objects: 100% (352/352), done.\u001b[K\n",
"remote: Compressing objects: 100% (233/233), done.\u001b[K\n",
"remote: Total 911 (delta 135), reused 211 (delta 117), pack-reused 559\u001b[K\n",
"Receiving objects: 100% (911/911), 18.13 MiB | 10.57 MiB/s, done.\n",
"Resolving deltas: 100% (527/527), done.\n"
]
}
],
"source": [
"!git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca.git"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "pqH_h_ZAz4_e"
},
"source": [
"## Pre-training for LLaMA-7B\n",
"\n",
"This follows the setting in https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Pretraining-Script, except that to simplify the tutorial,\n",
"- only train 100 steps\n",
"- use a sample data file built from alpaca_data_zh_51k.json"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!mkdir Chinese-LLaMA-Alpaca/pt_data\n",
"!cp Chinese-LLaMA-Alpaca/data/pt_sample_data.txt Chinese-LLaMA-Alpaca/pt_data"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IhrAVNUKSw9_",
"outputId": "4f240f2b-2396-405b-9a94-84446db8d3e6"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2023-05-12 06:15:17.883035: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
"[2023-05-12 06:15:18,834] [INFO] [comm.py:622:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl\n",
"05/12/2023 06:15:21 - WARNING - __main__ - Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True\n",
"[INFO|configuration_utils.py:668] 2023-05-12 06:15:21,697 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/config.json\n",
"[INFO|configuration_utils.py:720] 2023-05-12 06:15:21,698 >> Model config LlamaConfig {\n",
" \"_name_or_path\": \"decapoda-research/llama-7b-hf\",\n",
" \"architectures\": [\n",
" \"LLaMAForCausalLM\"\n",
" ],\n",
" \"bos_token_id\": 0,\n",
" \"eos_token_id\": 1,\n",
" \"hidden_act\": \"silu\",\n",
" \"hidden_size\": 4096,\n",
" \"initializer_range\": 0.02,\n",
" \"intermediate_size\": 11008,\n",
" \"max_position_embeddings\": 2048,\n",
" \"max_sequence_length\": 2048,\n",
" \"model_type\": \"llama\",\n",
" \"num_attention_heads\": 32,\n",
" \"num_hidden_layers\": 32,\n",
" \"pad_token_id\": -1,\n",
" \"rms_norm_eps\": 1e-06,\n",
" \"tie_word_embeddings\": false,\n",
" \"torch_dtype\": \"float16\",\n",
" \"transformers_version\": \"4.28.1\",\n",
" \"use_cache\": true,\n",
" \"vocab_size\": 32000\n",
"}\n",
"\n",
"[INFO|tokenization_utils_base.py:1809] 2023-05-12 06:15:21,947 >> loading file tokenizer.model from cache at /root/.cache/huggingface/hub/models--ziqingyang--chinese-llama-lora-7b/snapshots/b5e520ae0a1282c6105a72ad6063a3b3de211067/tokenizer.model\n",
"[INFO|tokenization_utils_base.py:1809] 2023-05-12 06:15:21,947 >> loading file added_tokens.json from cache at None\n",
"[INFO|tokenization_utils_base.py:1809] 2023-05-12 06:15:21,947 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--ziqingyang--chinese-llama-lora-7b/snapshots/b5e520ae0a1282c6105a72ad6063a3b3de211067/special_tokens_map.json\n",
"[INFO|tokenization_utils_base.py:1809] 2023-05-12 06:15:21,947 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--ziqingyang--chinese-llama-lora-7b/snapshots/b5e520ae0a1282c6105a72ad6063a3b3de211067/tokenizer_config.json\n",
"05/12/2023 06:15:22 - INFO - datasets.builder - Using custom data configuration default-6e69300db3ed54e6\n",
"05/12/2023 06:15:22 - INFO - datasets.info - Loading Dataset Infos from /usr/local/lib/python3.10/dist-packages/datasets/packaged_modules/text\n",
"05/12/2023 06:15:22 - INFO - datasets.builder - Generating dataset text (/content/Chinese-LLaMA-Alpaca/scripts/data_cache/pt_sample_data_text/text/default-6e69300db3ed54e6/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2)\n",
"Downloading and preparing dataset text/default to /content/Chinese-LLaMA-Alpaca/scripts/data_cache/pt_sample_data_text/text/default-6e69300db3ed54e6/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2...\n",
"Downloading data files: 100% 1/1 [00:00<00:00, 7667.83it/s]\n",
"05/12/2023 06:15:22 - INFO - datasets.download.download_manager - Downloading took 0.0 min\n",
"05/12/2023 06:15:22 - INFO - datasets.download.download_manager - Checksum Computation took 0.0 min\n",
"Extracting data files: 100% 1/1 [00:00<00:00, 1421.32it/s]\n",
"05/12/2023 06:15:22 - INFO - datasets.builder - Generating train split\n",
"05/12/2023 06:15:23 - INFO - datasets.utils.info_utils - Unable to verify splits sizes.\n",
"Dataset text downloaded and prepared to /content/Chinese-LLaMA-Alpaca/scripts/data_cache/pt_sample_data_text/text/default-6e69300db3ed54e6/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2. Subsequent calls will reuse this data.\n",
"100% 1/1 [00:00<00:00, 218.67it/s]\n",
"05/12/2023 06:15:23 - INFO - __main__ - pt_sample_data.txt has been loaded\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Process #0 will write at data_cache/pt_sample_data_text/tokenized_00000_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Process #1 will write at data_cache/pt_sample_data_text/tokenized_00001_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Process #2 will write at data_cache/pt_sample_data_text/tokenized_00002_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Process #3 will write at data_cache/pt_sample_data_text/tokenized_00003_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Process #4 will write at data_cache/pt_sample_data_text/tokenized_00004_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Process #5 will write at data_cache/pt_sample_data_text/tokenized_00005_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Process #6 will write at data_cache/pt_sample_data_text/tokenized_00006_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Process #7 will write at data_cache/pt_sample_data_text/tokenized_00007_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Spawning 8 processes\n",
"Running tokenizer on dataset (num_proc=8): 0% 0/125987 [00:00, ? examples/s]05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/tokenized_00003_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/tokenized_00000_of_00008.arrow\n",
"Running tokenizer on dataset (num_proc=8): 1% 1000/125987 [00:00<00:29, 4287.42 examples/s]05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/tokenized_00005_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/tokenized_00004_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/tokenized_00002_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/tokenized_00001_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/tokenized_00006_of_00008.arrow\n",
"05/12/2023 06:15:23 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/tokenized_00007_of_00008.arrow\n",
"05/12/2023 06:15:26 - INFO - datasets.arrow_dataset - Concatenating 8 shards\n",
"05/12/2023 06:15:26 - INFO - datasets.arrow_dataset - Process #0 will write at data_cache/pt_sample_data_text/grouped_00000_of_00008.arrow\n",
"05/12/2023 06:15:26 - INFO - datasets.arrow_dataset - Process #1 will write at data_cache/pt_sample_data_text/grouped_00001_of_00008.arrow\n",
"05/12/2023 06:15:26 - INFO - datasets.arrow_dataset - Process #2 will write at data_cache/pt_sample_data_text/grouped_00002_of_00008.arrow\n",
"05/12/2023 06:15:26 - INFO - datasets.arrow_dataset - Process #3 will write at data_cache/pt_sample_data_text/grouped_00003_of_00008.arrow\n",
"05/12/2023 06:15:26 - INFO - datasets.arrow_dataset - Process #4 will write at data_cache/pt_sample_data_text/grouped_00004_of_00008.arrow\n",
"05/12/2023 06:15:26 - INFO - datasets.arrow_dataset - Process #5 will write at data_cache/pt_sample_data_text/grouped_00005_of_00008.arrow\n",
"05/12/2023 06:15:26 - INFO - datasets.arrow_dataset - Process #6 will write at data_cache/pt_sample_data_text/grouped_00006_of_00008.arrow\n",
"05/12/2023 06:15:26 - INFO - datasets.arrow_dataset - Process #7 will write at data_cache/pt_sample_data_text/grouped_00007_of_00008.arrow\n",
"05/12/2023 06:15:27 - INFO - datasets.arrow_dataset - Spawning 8 processes\n",
"Grouping texts in chunks of 512 (num_proc=8): 0% 0/125987 [00:00, ? examples/s]05/12/2023 06:15:27 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/grouped_00004_of_00008.arrow\n",
"05/12/2023 06:15:27 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/grouped_00002_of_00008.arrow\n",
"05/12/2023 06:15:27 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/grouped_00003_of_00008.arrow\n",
"05/12/2023 06:15:27 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/grouped_00000_of_00008.arrow\n",
"05/12/2023 06:15:27 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/grouped_00001_of_00008.arrow\n",
"Grouping texts in chunks of 512 (num_proc=8): 1% 1000/125987 [00:00<00:14, 8900.08 examples/s]05/12/2023 06:15:27 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/grouped_00005_of_00008.arrow\n",
"05/12/2023 06:15:27 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/grouped_00006_of_00008.arrow\n",
"05/12/2023 06:15:27 - INFO - datasets.arrow_dataset - Caching processed dataset at data_cache/pt_sample_data_text/grouped_00007_of_00008.arrow\n",
"05/12/2023 06:15:28 - INFO - datasets.arrow_dataset - Concatenating 8 shards\n",
"05/12/2023 06:15:28 - INFO - datasets.arrow_dataset - Caching indices mapping at /content/Chinese-LLaMA-Alpaca/scripts/data_cache/pt_sample_data_text/cache-2e6939e892c734e7.arrow\n",
"05/12/2023 06:15:28 - INFO - datasets.arrow_dataset - Caching indices mapping at /content/Chinese-LLaMA-Alpaca/scripts/data_cache/pt_sample_data_text/cache-058d2fd64a628682.arrow\n",
"05/12/2023 06:15:28 - INFO - __main__ - Num train_samples 6906\n",
"05/12/2023 06:15:28 - INFO - __main__ - training example:\n",
"05/12/2023 06:15:28 - INFO - __main__ - 学生的自尊心和自我激励。 最后,音乐教育有助于教育学生了解其他文化和世界。通过学习和演奏其他文化的乐曲,学生可以了解他们在其他情况下不会听到的不同音效和节奏的细微差别。例如,学习一首来自印度的音乐将让学生接触到以前从未遇到过的节奏和音阶的新思想。他们可以深入了解一种他们不熟悉的文化,并拓宽自己的世界观。 总之,音乐教育是学生整体教育经验中的必要组成部分。它为他们提供学习、创造和表达自己的机会,同时也提供了一个平台,建立身份、信心和文化意识。音乐教育在当今世界是至关重要的,因为它让学生有机会探索自己的才能,并帮助他们成长为充满自信的个体。 从以下句子中找出名字。我的兄弟汤姆和姐姐简要来拜访。汤姆 简 列出五种有羽毛的动物。鸡,鸭,老鹰,孔雀,蜂鸟。 阅读说明,然后编写待办事项清单。去购物,收拾你的卧室,完成项目报告。待办事项清单: 1. 去购物 2. 收拾你的卧室 3. 完成项目报告 使用给定的单词之一生成唯一密码。DogD0g@P4ssW0rd 发一条推文,谈谈环保的重要性。关注环保是人类健康和可持续发展的重要保证。让我们更加谨慎地行动,保护我们的星球!#环保 #可持续发展 #绿色生活 使用给定单词中的一个创建电影标题。Cloud命运之云 列举三个志愿参与对事业有益的原因。志愿参与对事业有益的原因有很多。首先,它使个人能够在世界上产生积极的影响。其次,它帮助个人建立对目标的同理心、热情和承诺。最后,还是结识新朋友和建立有意义的联系的好方法。 填写空白以使下列句子变为真实句子: David 获得了一份 _______ 学士学位。计算机科学David 获得了一份计算机科学学士学位。 重新组织句子,使其结构反映以下结构:主语 - 谓语 - 宾语Sleeping has been my hobby for years.我已经睡觉多年了。创建一条带有#SmallActionsMakeADifference标签的推文,以促进可持续发展。#SmallActionsMakeADifference!投资于节能电器,使用可重复\n",
"Downloading (…)model.bin.index.json: 100% 25.5k/25.5k [00:00<00:00, 80.3MB/s]\n",
"[INFO|modeling_utils.py:2534] 2023-05-12 06:15:28,999 >> loading weights file pytorch_model.bin from cache at /root/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/pytorch_model.bin.index.json\n",
"Downloading shards: 0% 0/33 [00:00, ?it/s]\n",
"Downloading (…)l-00001-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 10% 41.9M/405M [00:00<00:00, 391MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 23% 94.4M/405M [00:00<00:00, 451MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 36% 147M/405M [00:00<00:00, 414MB/s] \u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 52% 210M/405M [00:00<00:00, 450MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 65% 262M/405M [00:00<00:00, 432MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 80% 325M/405M [00:00<00:00, 468MB/s]\u001b[A\n",
"Downloading (…)l-00001-of-00033.bin: 100% 405M/405M [00:00<00:00, 472MB/s]\n",
"Downloading shards: 3% 1/33 [00:01<00:35, 1.12s/it]\n",
"Downloading (…)l-00002-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 483MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 28% 115M/405M [00:00<00:00, 561MB/s] \u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 44% 178M/405M [00:00<00:00, 505MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 60% 241M/405M [00:00<00:00, 539MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 78% 315M/405M [00:00<00:00, 575MB/s]\u001b[A\n",
"Downloading (…)l-00002-of-00033.bin: 100% 405M/405M [00:00<00:00, 552MB/s]\n",
"Downloading shards: 6% 2/33 [00:02<00:31, 1.03s/it]\n",
"Downloading (…)l-00003-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 564MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 31% 126M/405M [00:00<00:00, 555MB/s] \u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 47% 189M/405M [00:00<00:00, 560MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 62% 252M/405M [00:00<00:00, 554MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 78% 315M/405M [00:00<00:00, 547MB/s]\u001b[A\n",
"Downloading (…)l-00003-of-00033.bin: 100% 405M/405M [00:00<00:00, 523MB/s]\n",
"Downloading shards: 9% 3/33 [00:03<00:30, 1.03s/it]\n",
"Downloading (…)l-00004-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 565MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 31% 126M/405M [00:00<00:00, 579MB/s] \u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 47% 189M/405M [00:00<00:00, 589MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 62% 252M/405M [00:00<00:00, 585MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 78% 315M/405M [00:00<00:00, 524MB/s]\u001b[A\n",
"Downloading (…)l-00004-of-00033.bin: 100% 405M/405M [00:00<00:00, 553MB/s]\n",
"Downloading shards: 12% 4/33 [00:04<00:29, 1.01s/it]\n",
"Downloading (…)l-00005-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 511MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 28% 115M/405M [00:00<00:00, 557MB/s] \u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 44% 178M/405M [00:00<00:00, 561MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 60% 241M/405M [00:00<00:00, 535MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 75% 304M/405M [00:00<00:00, 530MB/s]\u001b[A\n",
"Downloading (…)l-00005-of-00033.bin: 100% 405M/405M [00:00<00:00, 546MB/s]\n",
"Downloading shards: 15% 5/33 [00:05<00:28, 1.00s/it]\n",
"Downloading (…)l-00006-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 539MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 31% 126M/405M [00:00<00:00, 514MB/s] \u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 44% 178M/405M [00:00<00:00, 493MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 57% 231M/405M [00:00<00:00, 455MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 70% 283M/405M [00:00<00:00, 448MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 85% 346M/405M [00:00<00:00, 485MB/s]\u001b[A\n",
"Downloading (…)l-00006-of-00033.bin: 100% 405M/405M [00:00<00:00, 489MB/s]\n",
"Downloading shards: 18% 6/33 [00:06<00:27, 1.03s/it]\n",
"Downloading (…)l-00007-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 568MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 31% 126M/405M [00:00<00:00, 592MB/s] \u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 47% 189M/405M [00:00<00:00, 581MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 62% 252M/405M [00:00<00:00, 534MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 78% 315M/405M [00:00<00:00, 513MB/s]\u001b[A\n",
"Downloading (…)l-00007-of-00033.bin: 100% 405M/405M [00:00<00:00, 509MB/s]\n",
"Downloading shards: 21% 7/33 [00:07<00:26, 1.04s/it]\n",
"Downloading (…)l-00008-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 439MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 26% 105M/405M [00:00<00:00, 457MB/s] \u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 39% 157M/405M [00:00<00:00, 468MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 52% 210M/405M [00:00<00:00, 470MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 65% 262M/405M [00:00<00:00, 474MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 78% 315M/405M [00:00<00:00, 483MB/s]\u001b[A\n",
"Downloading (…)l-00008-of-00033.bin: 100% 405M/405M [00:00<00:00, 494MB/s]\n",
"Downloading shards: 24% 8/33 [00:08<00:26, 1.05s/it]\n",
"Downloading (…)l-00009-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 588MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 31% 126M/405M [00:00<00:00, 602MB/s] \u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 47% 189M/405M [00:00<00:00, 608MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 62% 252M/405M [00:00<00:00, 614MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 78% 315M/405M [00:00<00:00, 591MB/s]\u001b[A\n",
"Downloading (…)l-00009-of-00033.bin: 100% 405M/405M [00:00<00:00, 502MB/s]\n",
"Downloading shards: 27% 9/33 [00:09<00:25, 1.05s/it]\n",
"Downloading (…)l-00010-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 5% 21.0M/405M [00:00<00:03, 106MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 16% 62.9M/405M [00:00<00:01, 235MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 31% 126M/405M [00:00<00:00, 361MB/s] \u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 47% 189M/405M [00:00<00:00, 437MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 62% 252M/405M [00:00<00:00, 491MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 78% 315M/405M [00:00<00:00, 519MB/s]\u001b[A\n",
"Downloading (…)l-00010-of-00033.bin: 100% 405M/405M [00:00<00:00, 427MB/s]\n",
"Downloading shards: 30% 10/33 [00:10<00:25, 1.10s/it]\n",
"Downloading (…)l-00011-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 551MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 31% 126M/405M [00:00<00:00, 579MB/s] \u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 47% 189M/405M [00:00<00:00, 583MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 62% 252M/405M [00:00<00:00, 538MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 78% 315M/405M [00:00<00:00, 429MB/s]\u001b[A\n",
"Downloading (…)l-00011-of-00033.bin: 100% 405M/405M [00:00<00:00, 423MB/s]\n",
"Downloading shards: 33% 11/33 [00:11<00:24, 1.13s/it]\n",
"Downloading (…)l-00012-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 580MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 31% 126M/405M [00:00<00:00, 583MB/s] \u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 47% 189M/405M [00:00<00:00, 588MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 62% 252M/405M [00:00<00:00, 333MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 75% 304M/405M [00:00<00:00, 370MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 88% 357M/405M [00:00<00:00, 393MB/s]\u001b[A\n",
"Downloading (…)l-00012-of-00033.bin: 100% 405M/405M [00:00<00:00, 413MB/s]\n",
"Downloading shards: 36% 12/33 [00:12<00:24, 1.16s/it]\n",
"Downloading (…)l-00013-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 10% 41.9M/405M [00:00<00:00, 364MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 26% 105M/405M [00:00<00:00, 492MB/s] \u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 39% 157M/405M [00:00<00:00, 447MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 52% 210M/405M [00:00<00:00, 454MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 67% 273M/405M [00:00<00:00, 506MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 80% 325M/405M [00:00<00:00, 473MB/s]\u001b[A\n",
"Downloading (…)l-00013-of-00033.bin: 100% 405M/405M [00:00<00:00, 464MB/s]\n",
"Downloading shards: 39% 13/33 [00:14<00:22, 1.15s/it]\n",
"Downloading (…)l-00014-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 500MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 26% 105M/405M [00:00<00:00, 466MB/s] \u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 41% 168M/405M [00:00<00:00, 500MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 54% 220M/405M [00:00<00:00, 460MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 70% 283M/405M [00:00<00:00, 506MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 83% 336M/405M [00:00<00:00, 500MB/s]\u001b[A\n",
"Downloading (…)l-00014-of-00033.bin: 100% 405M/405M [00:00<00:00, 458MB/s]\n",
"Downloading shards: 42% 14/33 [00:15<00:21, 1.14s/it]\n",
"Downloading (…)l-00015-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 540MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 31% 126M/405M [00:00<00:00, 533MB/s] \u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 47% 189M/405M [00:00<00:00, 504MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 60% 241M/405M [00:00<00:00, 467MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 73% 294M/405M [00:00<00:00, 427MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 85% 346M/405M [00:00<00:00, 395MB/s]\u001b[A\n",
"Downloading (…)l-00015-of-00033.bin: 100% 405M/405M [00:00<00:00, 408MB/s]\n",
"Downloading shards: 45% 15/33 [00:16<00:21, 1.17s/it]\n",
"Downloading (…)l-00016-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 482MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 26% 105M/405M [00:00<00:00, 501MB/s] \u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 39% 157M/405M [00:00<00:00, 509MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 52% 210M/405M [00:00<00:00, 513MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 65% 262M/405M [00:00<00:00, 488MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 78% 315M/405M [00:00<00:00, 499MB/s]\u001b[A\n",
"Downloading (…)l-00016-of-00033.bin: 100% 405M/405M [00:00<00:00, 502MB/s]\n",
"Downloading shards: 48% 16/33 [00:17<00:19, 1.14s/it]\n",
"Downloading (…)l-00017-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 503MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 26% 105M/405M [00:00<00:00, 502MB/s] \u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 39% 157M/405M [00:00<00:00, 498MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 52% 210M/405M [00:00<00:00, 496MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 65% 262M/405M [00:00<00:00, 497MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 78% 315M/405M [00:00<00:00, 481MB/s]\u001b[A\n",
"Downloading (…)l-00017-of-00033.bin: 100% 405M/405M [00:00<00:00, 488MB/s]\n",
"Downloading shards: 52% 17/33 [00:18<00:18, 1.18s/it]\n",
"Downloading (…)l-00018-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 536MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 31% 126M/405M [00:00<00:00, 462MB/s] \u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 44% 178M/405M [00:00<00:00, 477MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 57% 231M/405M [00:00<00:00, 488MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 70% 283M/405M [00:00<00:00, 493MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 83% 336M/405M [00:00<00:00, 498MB/s]\u001b[A\n",
"Downloading (…)l-00018-of-00033.bin: 100% 405M/405M [00:00<00:00, 486MB/s]\n",
"Downloading shards: 55% 18/33 [00:20<00:17, 1.19s/it]\n",
"Downloading (…)l-00019-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 516MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 26% 105M/405M [00:00<00:00, 513MB/s] \u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 39% 157M/405M [00:00<00:00, 509MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 52% 210M/405M [00:00<00:00, 507MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 65% 262M/405M [00:00<00:00, 506MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 78% 315M/405M [00:00<00:00, 493MB/s]\u001b[A\n",
"Downloading (…)l-00019-of-00033.bin: 100% 405M/405M [00:00<00:00, 451MB/s]\n",
"Downloading shards: 58% 19/33 [00:21<00:16, 1.18s/it]\n",
"Downloading (…)l-00020-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 512MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 26% 105M/405M [00:00<00:00, 500MB/s] \u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 39% 157M/405M [00:00<00:00, 500MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 52% 210M/405M [00:00<00:00, 501MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 65% 262M/405M [00:00<00:00, 449MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 78% 315M/405M [00:00<00:00, 436MB/s]\u001b[A\n",
"Downloading (…)l-00020-of-00033.bin: 100% 405M/405M [00:00<00:00, 451MB/s]\n",
"Downloading shards: 61% 20/33 [00:22<00:15, 1.17s/it]\n",
"Downloading (…)l-00021-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 3% 10.5M/405M [00:00<00:08, 44.4MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 16% 62.9M/405M [00:00<00:01, 215MB/s] \u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 28% 115M/405M [00:00<00:00, 295MB/s] \u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 39% 157M/405M [00:00<00:00, 333MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 49% 199M/405M [00:00<00:00, 358MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 62% 252M/405M [00:00<00:00, 403MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 75% 304M/405M [00:00<00:00, 434MB/s]\u001b[A\n",
"Downloading (…)l-00021-of-00033.bin: 100% 405M/405M [00:01<00:00, 371MB/s]\n",
"Downloading shards: 64% 21/33 [00:23<00:14, 1.22s/it]\n",
"Downloading (…)l-00022-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 483MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 26% 105M/405M [00:00<00:00, 495MB/s] \u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 39% 157M/405M [00:00<00:00, 498MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 52% 210M/405M [00:00<00:00, 500MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 65% 262M/405M [00:00<00:00, 503MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 78% 315M/405M [00:00<00:00, 503MB/s]\u001b[A\n",
"Downloading (…)l-00022-of-00033.bin: 100% 405M/405M [00:00<00:00, 501MB/s]\n",
"Downloading shards: 67% 22/33 [00:24<00:12, 1.17s/it]\n",
"Downloading (…)l-00023-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 513MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 28% 115M/405M [00:00<00:00, 523MB/s] \u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 44% 178M/405M [00:00<00:00, 526MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 60% 241M/405M [00:00<00:00, 495MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 73% 294M/405M [00:00<00:00, 500MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 85% 346M/405M [00:00<00:00, 503MB/s]\u001b[A\n",
"Downloading (…)l-00023-of-00033.bin: 100% 405M/405M [00:00<00:00, 481MB/s]\n",
"Downloading shards: 70% 23/33 [00:25<00:11, 1.15s/it]\n",
"Downloading (…)l-00024-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 586MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 31% 126M/405M [00:00<00:00, 511MB/s] \u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 44% 178M/405M [00:00<00:00, 438MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 57% 231M/405M [00:00<00:00, 382MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 67% 273M/405M [00:00<00:00, 381MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 78% 315M/405M [00:00<00:00, 386MB/s]\u001b[A\n",
"Downloading (…)l-00024-of-00033.bin: 100% 405M/405M [00:00<00:00, 416MB/s]\n",
"Downloading shards: 73% 24/33 [00:27<00:10, 1.17s/it]\n",
"Downloading (…)l-00025-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 596MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 31% 126M/405M [00:00<00:00, 579MB/s] \u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 47% 189M/405M [00:00<00:00, 553MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 62% 252M/405M [00:00<00:00, 554MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 78% 315M/405M [00:00<00:00, 545MB/s]\u001b[A\n",
"Downloading (…)l-00025-of-00033.bin: 100% 405M/405M [00:00<00:00, 542MB/s]\n",
"Downloading shards: 76% 25/33 [00:28<00:08, 1.12s/it]\n",
"Downloading (…)l-00026-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 534MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 31% 126M/405M [00:00<00:00, 541MB/s] \u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 47% 189M/405M [00:00<00:00, 562MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 62% 252M/405M [00:00<00:00, 539MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 78% 315M/405M [00:00<00:00, 514MB/s]\u001b[A\n",
"Downloading (…)l-00026-of-00033.bin: 100% 405M/405M [00:00<00:00, 521MB/s]\n",
"Downloading shards: 79% 26/33 [00:29<00:07, 1.12s/it]\n",
"Downloading (…)l-00027-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 591MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 31% 126M/405M [00:00<00:00, 602MB/s] \u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 49% 199M/405M [00:00<00:00, 616MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 65% 262M/405M [00:00<00:00, 440MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 78% 315M/405M [00:00<00:00, 360MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 88% 357M/405M [00:00<00:00, 324MB/s]\u001b[A\n",
"Downloading (…)l-00027-of-00033.bin: 100% 405M/405M [00:01<00:00, 354MB/s]\n",
"Downloading shards: 82% 27/33 [00:30<00:07, 1.20s/it]\n",
"Downloading (…)l-00028-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 523MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 28% 115M/405M [00:00<00:00, 521MB/s] \u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 41% 168M/405M [00:00<00:00, 456MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 54% 220M/405M [00:00<00:00, 406MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 65% 262M/405M [00:00<00:00, 398MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 75% 304M/405M [00:00<00:00, 399MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 88% 357M/405M [00:00<00:00, 418MB/s]\u001b[A\n",
"Downloading (…)l-00028-of-00033.bin: 100% 405M/405M [00:00<00:00, 427MB/s]\n",
"Downloading shards: 85% 28/33 [00:31<00:05, 1.20s/it]\n",
"Downloading (…)l-00029-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 16% 62.9M/405M [00:00<00:00, 508MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 28% 115M/405M [00:00<00:00, 501MB/s] \u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 44% 178M/405M [00:00<00:00, 512MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 60% 241M/405M [00:00<00:00, 517MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 75% 304M/405M [00:00<00:00, 520MB/s]\u001b[A\n",
"Downloading (…)l-00029-of-00033.bin: 100% 405M/405M [00:00<00:00, 516MB/s]\n",
"Downloading shards: 88% 29/33 [00:32<00:04, 1.15s/it]\n",
"Downloading (…)l-00030-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 518MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 28% 115M/405M [00:00<00:00, 527MB/s] \u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 44% 178M/405M [00:00<00:00, 528MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 60% 241M/405M [00:00<00:00, 533MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 75% 304M/405M [00:00<00:00, 531MB/s]\u001b[A\n",
"Downloading (…)l-00030-of-00033.bin: 100% 405M/405M [00:00<00:00, 527MB/s]\n",
"Downloading shards: 91% 30/33 [00:33<00:03, 1.11s/it]\n",
"Downloading (…)l-00031-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 521MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 28% 115M/405M [00:00<00:00, 530MB/s] \u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 44% 178M/405M [00:00<00:00, 528MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 60% 241M/405M [00:00<00:00, 556MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 75% 304M/405M [00:00<00:00, 578MB/s]\u001b[A\n",
"Downloading (…)l-00031-of-00033.bin: 100% 405M/405M [00:00<00:00, 511MB/s]\n",
"Downloading shards: 94% 31/33 [00:34<00:02, 1.09s/it]\n",
"Downloading (…)l-00032-of-00033.bin: 0% 0.00/405M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 13% 52.4M/405M [00:00<00:00, 505MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 26% 105M/405M [00:00<00:00, 514MB/s] \u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 39% 157M/405M [00:00<00:00, 280MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 49% 199M/405M [00:00<00:00, 251MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 57% 231M/405M [00:00<00:00, 243MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 65% 262M/405M [00:00<00:00, 247MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 75% 304M/405M [00:01<00:00, 274MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 85% 346M/405M [00:01<00:00, 308MB/s]\u001b[A\n",
"Downloading (…)l-00032-of-00033.bin: 100% 405M/405M [00:01<00:00, 303MB/s]\n",
"Downloading shards: 97% 32/33 [00:36<00:01, 1.24s/it]\n",
"Downloading (…)l-00033-of-00033.bin: 0% 0.00/524M [00:00, ?B/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 10% 52.4M/524M [00:00<00:00, 511MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 22% 115M/524M [00:00<00:00, 524MB/s] \u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 34% 178M/524M [00:00<00:00, 525MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 44% 231M/524M [00:00<00:00, 523MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 54% 283M/524M [00:00<00:00, 523MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 66% 346M/524M [00:00<00:00, 525MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 78% 409M/524M [00:00<00:00, 521MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 88% 461M/524M [00:00<00:00, 515MB/s]\u001b[A\n",
"Downloading (…)l-00033-of-00033.bin: 100% 524M/524M [00:01<00:00, 504MB/s]\n",
"Downloading shards: 100% 33/33 [00:37<00:00, 1.14s/it]\n",
"[INFO|modeling_utils.py:1176] 2023-05-12 06:16:06,686 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.\n",
"[INFO|configuration_utils.py:575] 2023-05-12 06:16:06,687 >> Generate config GenerationConfig {\n",
" \"_from_model_config\": true,\n",
" \"bos_token_id\": 0,\n",
" \"eos_token_id\": 1,\n",
" \"pad_token_id\": -1,\n",
" \"transformers_version\": \"4.28.1\"\n",
"}\n",
"\n",
"Loading checkpoint shards: 100% 33/33 [00:14<00:00, 2.35it/s]\n",
"[INFO|modeling_utils.py:3190] 2023-05-12 06:16:23,699 >> All model checkpoint weights were used when initializing LlamaForCausalLM.\n",
"\n",
"[INFO|modeling_utils.py:3198] 2023-05-12 06:16:23,699 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at decapoda-research/llama-7b-hf.\n",
"If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.\n",
"Downloading (…)neration_config.json: 100% 124/124 [00:00<00:00, 781kB/s]\n",
"[INFO|configuration_utils.py:537] 2023-05-12 06:16:24,167 >> loading configuration file generation_config.json from cache at /root/.cache/huggingface/hub/models--decapoda-research--llama-7b-hf/snapshots/5f98eefcc80e437ef68d457ad7bf167c2c6a1348/generation_config.json\n",
"[INFO|configuration_utils.py:575] 2023-05-12 06:16:24,167 >> Generate config GenerationConfig {\n",
" \"_from_model_config\": true,\n",
" \"bos_token_id\": 0,\n",
" \"eos_token_id\": 1,\n",
" \"pad_token_id\": 0,\n",
" \"transformers_version\": \"4.28.1\"\n",
"}\n",
"\n",
"05/12/2023 06:16:56 - INFO - __main__ - Init new peft model\n",
"['q_proj', 'v_proj', 'k_proj', 'o_proj', 'gate_proj', 'down_proj', 'up_proj']\n",
"8\n",
"trainable params: 429203456 || all params: 6905475072 || trainable%: 6.2154080859739\n",
"[INFO|trainer.py:564] 2023-05-12 06:18:27,733 >> max_steps is given, it will override any value given in num_train_epochs\n",
"[INFO|trainer.py:621] 2023-05-12 06:18:27,734 >> Using cuda_amp half precision backend\n",
"/usr/local/lib/python3.10/dist-packages/transformers/optimization.py:391: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning\n",
" warnings.warn(\n",
"[2023-05-12 06:18:27,759] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed info: version=0.9.2, git-hash=unknown, git-branch=unknown\n",
"05/12/2023 06:18:31 - INFO - torch.distributed.distributed_c10d - Added key: store_based_barrier_key:2 to store for rank: 0\n",
"05/12/2023 06:18:31 - INFO - torch.distributed.distributed_c10d - Rank 0: Completed store-based barrier for key:store_based_barrier_key:2 with 1 nodes.\n",
"[2023-05-12 06:18:31,488] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Flops Profiler Enabled: False\n",
"[2023-05-12 06:18:31,489] [INFO] [logging.py:96:log_dist] [Rank 0] Removing param_group that has no 'params' in the client Optimizer\n",
"[2023-05-12 06:18:31,489] [INFO] [logging.py:96:log_dist] [Rank 0] Using client Optimizer as basic optimizer\n",
"[2023-05-12 06:18:31,533] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Basic Optimizer = AdamW\n",
"[2023-05-12 06:18:31,534] [INFO] [utils.py:54:is_zero_supported_optimizer] Checking ZeRO support for optimizer=AdamW type=\n",
"[2023-05-12 06:18:31,534] [WARNING] [engine.py:1104:_do_optimizer_sanity_check] **** You are using ZeRO with an untested optimizer, proceed with caution *****\n",
"[2023-05-12 06:18:31,534] [INFO] [logging.py:96:log_dist] [Rank 0] Creating torch.float16 ZeRO stage 2 optimizer\n",
"[2023-05-12 06:18:31,534] [INFO] [stage_1_and_2.py:133:__init__] Reduce bucket size 100000000\n",
"[2023-05-12 06:18:31,534] [INFO] [stage_1_and_2.py:134:__init__] Allgather bucket size 100000000\n",
"[2023-05-12 06:18:31,534] [INFO] [stage_1_and_2.py:135:__init__] CPU Offload: False\n",
"[2023-05-12 06:18:31,534] [INFO] [stage_1_and_2.py:136:__init__] Round robin gradient partitioning: False\n",
"Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...\n",
"Creating extension directory /root/.cache/torch_extensions/py310_cu118/utils...\n",
"Emitting ninja build file /root/.cache/torch_extensions/py310_cu118/utils/build.ninja...\n",
"Building extension module utils...\n",
"Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)\n",
"[1/2] c++ -MMD -MF flatten_unflatten.o.d -DTORCH_EXTENSION_NAME=utils -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\\\"_gcc\\\" -DPYBIND11_STDLIB=\\\"_libstdcpp\\\" -DPYBIND11_BUILD_ABI=\\\"_cxxabi1011\\\" -isystem /usr/local/lib/python3.10/dist-packages/torch/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/torch/csrc/api/include -isystem /usr/local/lib/python3.10/dist-packages/torch/include/TH -isystem /usr/local/lib/python3.10/dist-packages/torch/include/THC -isystem /usr/include/python3.10 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /usr/local/lib/python3.10/dist-packages/deepspeed/ops/csrc/utils/flatten_unflatten.cpp -o flatten_unflatten.o \n",
"[2/2] c++ flatten_unflatten.o -shared -L/usr/local/lib/python3.10/dist-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o utils.so\n",
"Loading extension module utils...\n",
"Time to load utils op: 18.753242015838623 seconds\n",
"Rank: 0 partition count [1] and sizes[(429203456, False)] \n",
"[2023-05-12 06:18:52,360] [INFO] [utils.py:785:see_memory_usage] Before initializing optimizer states\n",
"[2023-05-12 06:18:52,361] [INFO] [utils.py:786:see_memory_usage] MA 14.49 GB Max_MA 15.29 GB CA 15.33 GB Max_CA 15 GB \n",
"[2023-05-12 06:18:52,361] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 4.4 GB, percent = 5.3%\n",
"[2023-05-12 06:18:52,592] [INFO] [utils.py:785:see_memory_usage] After initializing optimizer states\n",
"[2023-05-12 06:18:52,592] [INFO] [utils.py:786:see_memory_usage] MA 17.69 GB Max_MA 20.89 GB CA 21.73 GB Max_CA 22 GB \n",
"[2023-05-12 06:18:52,593] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 4.4 GB, percent = 5.3%\n",
"[2023-05-12 06:18:52,593] [INFO] [stage_1_and_2.py:489:__init__] optimizer state initialized\n",
"[2023-05-12 06:18:52,803] [INFO] [utils.py:785:see_memory_usage] After initializing ZeRO optimizer\n",
"[2023-05-12 06:18:52,804] [INFO] [utils.py:786:see_memory_usage] MA 17.69 GB Max_MA 17.69 GB CA 21.73 GB Max_CA 22 GB \n",
"[2023-05-12 06:18:52,804] [INFO] [utils.py:793:see_memory_usage] CPU Virtual Memory: used = 4.4 GB, percent = 5.3%\n",
"[2023-05-12 06:18:52,818] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed Final Optimizer = AdamW\n",
"[2023-05-12 06:18:52,818] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed using client LR scheduler\n",
"[2023-05-12 06:18:52,818] [INFO] [logging.py:96:log_dist] [Rank 0] DeepSpeed LR Scheduler = \n",
"[2023-05-12 06:18:52,818] [INFO] [logging.py:96:log_dist] [Rank 0] step=0, skipped=0, lr=[0.0], mom=[(0.9, 0.999)]\n",
"[2023-05-12 06:18:52,820] [INFO] [config.py:955:print] DeepSpeedEngine configuration:\n",
"[2023-05-12 06:18:52,821] [INFO] [config.py:959:print] activation_checkpointing_config {\n",
" \"partition_activations\": false, \n",
" \"contiguous_memory_optimization\": false, \n",
" \"cpu_checkpointing\": false, \n",
" \"number_checkpoints\": null, \n",
" \"synchronize_checkpoint_boundary\": false, \n",
" \"profile\": false\n",
"}\n",
"[2023-05-12 06:18:52,821] [INFO] [config.py:959:print] aio_config ................... {'block_size': 1048576, 'queue_depth': 8, 'thread_count': 1, 'single_submit': False, 'overlap_events': True}\n",
"[2023-05-12 06:18:52,821] [INFO] [config.py:959:print] amp_enabled .................. False\n",
"[2023-05-12 06:18:52,821] [INFO] [config.py:959:print] amp_params ................... False\n",
"[2023-05-12 06:18:52,821] [INFO] [config.py:959:print] autotuning_config ............ {\n",
" \"enabled\": false, \n",
" \"start_step\": null, \n",
" \"end_step\": null, \n",
" \"metric_path\": null, \n",
" \"arg_mappings\": null, \n",
" \"metric\": \"throughput\", \n",
" \"model_info\": null, \n",
" \"results_dir\": \"autotuning_results\", \n",
" \"exps_dir\": \"autotuning_exps\", \n",
" \"overwrite\": true, \n",
" \"fast\": true, \n",
" \"start_profile_step\": 3, \n",
" \"end_profile_step\": 5, \n",
" \"tuner_type\": \"gridsearch\", \n",
" \"tuner_early_stopping\": 5, \n",
" \"tuner_num_trials\": 50, \n",
" \"model_info_path\": null, \n",
" \"mp_size\": 1, \n",
" \"max_train_batch_size\": null, \n",
" \"min_train_batch_size\": 1, \n",
" \"max_train_micro_batch_size_per_gpu\": 1.024000e+03, \n",
" \"min_train_micro_batch_size_per_gpu\": 1, \n",
" \"num_tuning_micro_batch_sizes\": 3\n",
"}\n",
"[2023-05-12 06:18:52,821] [INFO] [config.py:959:print] bfloat16_enabled ............. False\n",
"[2023-05-12 06:18:52,821] [INFO] [config.py:959:print] checkpoint_parallel_write_pipeline False\n",
"[2023-05-12 06:18:52,821] [INFO] [config.py:959:print] checkpoint_tag_validation_enabled True\n",
"[2023-05-12 06:18:52,821] [INFO] [config.py:959:print] checkpoint_tag_validation_fail False\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] comms_config ................. \n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] communication_data_type ...... None\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] compression_config ........... {'weight_quantization': {'shared_parameters': {'enabled': False, 'quantizer_kernel': False, 'schedule_offset': 0, 'quantize_groups': 1, 'quantize_verbose': False, 'quantization_type': 'symmetric', 'quantize_weight_in_forward': False, 'rounding': 'nearest', 'fp16_mixed_quantize': False, 'quantize_change_ratio': 0.001}, 'different_groups': {}}, 'activation_quantization': {'shared_parameters': {'enabled': False, 'quantization_type': 'symmetric', 'range_calibration': 'dynamic', 'schedule_offset': 1000}, 'different_groups': {}}, 'sparse_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'row_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'head_pruning': {'shared_parameters': {'enabled': False, 'method': 'topk', 'schedule_offset': 1000}, 'different_groups': {}}, 'channel_pruning': {'shared_parameters': {'enabled': False, 'method': 'l1', 'schedule_offset': 1000}, 'different_groups': {}}, 'layer_reduction': {'enabled': False}}\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] curriculum_enabled_legacy .... False\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] curriculum_params_legacy ..... False\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] data_efficiency_config ....... {'enabled': False, 'seed': 1234, 'data_sampling': {'enabled': False, 'num_epochs': 1000, 'num_workers': 0, 'curriculum_learning': {'enabled': False}}, 'data_routing': {'enabled': False, 'random_ltd': {'enabled': False, 'layer_token_lr_schedule': {'enabled': False}}}}\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] data_efficiency_enabled ...... False\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] dataloader_drop_last ......... False\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] disable_allgather ............ False\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] dump_state ................... False\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] dynamic_loss_scale_args ...... {'init_scale': 65536, 'scale_window': 100, 'delayed_shift': 2, 'min_scale': 1e-10}\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] eigenvalue_enabled ........... False\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] eigenvalue_gas_boundary_resolution 1\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] eigenvalue_layer_name ........ bert.encoder.layer\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] eigenvalue_layer_num ......... 0\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] eigenvalue_max_iter .......... 100\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] eigenvalue_stability ......... 1e-06\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] eigenvalue_tol ............... 0.01\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] eigenvalue_verbose ........... False\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] elasticity_enabled ........... False\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] flops_profiler_config ........ {\n",
" \"enabled\": false, \n",
" \"profile_step\": 1, \n",
" \"module_depth\": -1, \n",
" \"top_modules\": 1, \n",
" \"detailed\": true, \n",
" \"output_file\": null\n",
"}\n",
"[2023-05-12 06:18:52,822] [INFO] [config.py:959:print] fp16_auto_cast ............... False\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] fp16_enabled ................. True\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] fp16_master_weights_and_gradients False\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] global_rank .................. 0\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] grad_accum_dtype ............. None\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] gradient_accumulation_steps .. 1\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] gradient_clipping ............ 1.0\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] gradient_predivide_factor .... 1.0\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] hybrid_engine ................ enabled=False max_out_tokens=512 inference_tp_size=1 release_inference_cache=False pin_parameters=True tp_gather_partition_size=8\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] initial_dynamic_scale ........ 65536\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] load_universal_checkpoint .... False\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] loss_scale ................... 0\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] memory_breakdown ............. False\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] mics_hierarchial_params_gather False\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] mics_shard_size .............. -1\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] monitor_config ............... tensorboard=TensorBoardConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') wandb=WandbConfig(enabled=False, group=None, team=None, project='deepspeed') csv_monitor=CSVConfig(enabled=False, output_path='', job_name='DeepSpeedJobName') enabled=False\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] nebula_config ................ {\n",
" \"enabled\": false, \n",
" \"persistent_storage_path\": null, \n",
" \"persistent_time_interval\": 100, \n",
" \"num_of_version_in_retention\": 2, \n",
" \"enable_nebula_load\": true, \n",
" \"load_path\": null\n",
"}\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] optimizer_legacy_fusion ...... False\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] optimizer_name ............... None\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] optimizer_params ............. None\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] pipeline ..................... {'stages': 'auto', 'partition': 'best', 'seed_layers': False, 'activation_checkpoint_interval': 0}\n",
"[2023-05-12 06:18:52,823] [INFO] [config.py:959:print] pld_enabled .................. False\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] pld_params ................... False\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] prescale_gradients ........... False\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] scheduler_name ............... None\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] scheduler_params ............. None\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] sparse_attention ............. None\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] sparse_gradients_enabled ..... False\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] steps_per_print .............. 2000\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] train_batch_size ............. 1\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] train_micro_batch_size_per_gpu 1\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] use_node_local_storage ....... False\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] wall_clock_breakdown ......... False\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] world_size ................... 1\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] zero_allow_untested_optimizer True\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] zero_config .................. stage=2 contiguous_gradients=True reduce_scatter=True reduce_bucket_size=100000000 allgather_partitions=True allgather_bucket_size=100000000 overlap_comm=True load_from_fp32_weights=True elastic_checkpoint=False offload_param=None offload_optimizer=None sub_group_size=1,000,000,000 cpu_offload_param=None cpu_offload_use_pin_memory=None cpu_offload=None prefetch_bucket_size=50,000,000 param_persistence_threshold=100,000 model_persistence_threshold=sys.maxsize max_live_parameters=1,000,000,000 max_reuse_distance=1,000,000,000 gather_16bit_weights_on_model_save=False stage3_gather_fp16_weights_on_model_save=False ignore_unused_parameters=True legacy_stage1=False round_robin_gradients=False mics_shard_size=-1 mics_hierarchical_params_gather=False memory_efficient_linear=True\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] zero_enabled ................. True\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] zero_force_ds_cpu_optimizer .. True\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:959:print] zero_optimization_stage ...... 2\n",
"[2023-05-12 06:18:52,824] [INFO] [config.py:945:print_user_config] json = {\n",
" \"fp16\": {\n",
" \"enabled\": true, \n",
" \"loss_scale\": 0, \n",
" \"loss_scale_window\": 100, \n",
" \"initial_scale_power\": 16, \n",
" \"hysteresis\": 2, \n",
" \"min_loss_scale\": 1e-10\n",
" }, \n",
" \"zero_optimization\": {\n",
" \"stage\": 2, \n",
" \"allgather_partitions\": true, \n",
" \"allgather_bucket_size\": 1.000000e+08, \n",
" \"overlap_comm\": true, \n",
" \"reduce_scatter\": true, \n",
" \"reduce_bucket_size\": 1.000000e+08, \n",
" \"contiguous_gradients\": true\n",
" }, \n",
" \"gradient_accumulation_steps\": 1, \n",
" \"gradient_clipping\": 1.0, \n",
" \"steps_per_print\": 2.000000e+03, \n",
" \"train_batch_size\": 1, \n",
" \"train_micro_batch_size_per_gpu\": 1, \n",
" \"wall_clock_breakdown\": false, \n",
" \"zero_allow_untested_optimizer\": true\n",
"}\n",
"Using /root/.cache/torch_extensions/py310_cu118 as PyTorch extensions root...\n",
"No modifications detected for re-loaded extension module utils, skipping build step...\n",
"Loading extension module utils...\n",
"Time to load utils op: 0.00040268898010253906 seconds\n",
"[INFO|trainer.py:1769] 2023-05-12 06:18:52,827 >> ***** Running training *****\n",
"[INFO|trainer.py:1770] 2023-05-12 06:18:52,827 >> Num examples = 6,906\n",
"[INFO|trainer.py:1771] 2023-05-12 06:18:52,827 >> Num Epochs = 1\n",
"[INFO|trainer.py:1772] 2023-05-12 06:18:52,827 >> Instantaneous batch size per device = 1\n",
"[INFO|trainer.py:1773] 2023-05-12 06:18:52,827 >> Total train batch size (w. parallel, distributed & accumulation) = 1\n",
"[INFO|trainer.py:1774] 2023-05-12 06:18:52,827 >> Gradient Accumulation steps = 1\n",
"[INFO|trainer.py:1775] 2023-05-12 06:18:52,827 >> Total optimization steps = 100\n",
"[INFO|trainer.py:1776] 2023-05-12 06:18:52,830 >> Number of trainable parameters = 429,203,456\n",
" 0% 0/100 [00:00, ?it/s][WARNING|logging.py:295] 2023-05-12 06:18:52,874 >> `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...\n",
"[2023-05-12 06:18:56,313] [INFO] [loss_scaler.py:188:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, but hysteresis is 2. Reducing hysteresis to 1\n",
"{'loss': 10.3047, 'learning_rate': 0.0, 'epoch': 0.0}\n",
" 1% 1/100 [00:03<05:43, 3.47s/it][2023-05-12 06:18:56,654] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 65536, reducing to 32768\n",
" 2% 2/100 [00:03<02:39, 1.63s/it][2023-05-12 06:18:56,989] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 32768, reducing to 16384\n",
" 4% 4/100 [00:04<01:16, 1.25it/s][2023-05-12 06:18:57,753] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 16384, reducing to 8192\n",
" 6% 6/100 [00:05<00:52, 1.78it/s][2023-05-12 06:18:58,534] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 8192, reducing to 4096\n",
"{'loss': 10.4913, 'learning_rate': 0.0002, 'epoch': 0.0}\n",
"{'loss': 8.243, 'learning_rate': 0.00019458172417006347, 'epoch': 0.0}\n",
" 25% 25/100 [00:13<00:32, 2.30it/s][2023-05-12 06:19:06,764] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 4096, reducing to 2048\n",
"{'loss': 7.6727, 'learning_rate': 0.00018090169943749476, 'epoch': 0.0}\n",
"{'loss': 7.4988, 'learning_rate': 0.00015743286626829437, 'epoch': 0.01}\n",
"{'loss': 7.0074, 'learning_rate': 0.00012774029087618446, 'epoch': 0.01}\n",
" 50% 50/100 [00:24<00:21, 2.31it/s][INFO|trainer.py:2868] 2023-05-12 06:19:17,175 >> Saving model checkpoint to /content/output_model/checkpoint-50\n",
"[INFO|trainer.py:2880] 2023-05-12 06:19:17,182 >> Trainer.model is not a `PreTrainedModel`, only saving its state dict.\n",
"[INFO|tokenization_utils_base.py:2171] 2023-05-12 06:19:18,702 >> tokenizer config file saved in /content/output_model/checkpoint-50/tokenizer_config.json\n",
"[INFO|tokenization_utils_base.py:2178] 2023-05-12 06:19:18,702 >> Special tokens file saved in /content/output_model/checkpoint-50/special_tokens_map.json\n",
"[2023-05-12 06:19:18,704] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step50 is about to be saved!\n",
"[2023-05-12 06:19:27,197] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /content/output_model/checkpoint-50/global_step50/mp_rank_00_model_states.pt\n",
"[2023-05-12 06:19:27,197] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /content/output_model/checkpoint-50/global_step50/mp_rank_00_model_states.pt...\n",
"[2023-05-12 06:20:06,793] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /content/output_model/checkpoint-50/global_step50/mp_rank_00_model_states.pt.\n",
"[2023-05-12 06:20:07,322] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /content/output_model/checkpoint-50/global_step50/zero_pp_rank_0_mp_rank_00_optim_states.pt...\n",
"[2023-05-12 06:20:21,414] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /content/output_model/checkpoint-50/global_step50/zero_pp_rank_0_mp_rank_00_optim_states.pt.\n",
"[2023-05-12 06:20:21,415] [INFO] [engine.py:3228:_save_zero_checkpoint] zero checkpoint saved /content/output_model/checkpoint-50/global_step50/zero_pp_rank_0_mp_rank_00_optim_states.pt\n",
"[2023-05-12 06:20:21,415] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step50 is ready now!\n",
"{'loss': 7.0254, 'learning_rate': 9.504162453267777e-05, 'epoch': 0.01}\n",
" 66% 66/100 [01:35<00:17, 1.90it/s][2023-05-12 06:20:28,718] [INFO] [loss_scaler.py:181:update_scale] [deepspeed] OVERFLOW! Rank 0 Skipping step. Attempted loss scale: 2048, reducing to 1024\n",
"{'loss': 6.607, 'learning_rate': 6.59706825558357e-05, 'epoch': 0.01}\n",
"{'loss': 6.4305, 'learning_rate': 3.7282364152646297e-05, 'epoch': 0.01}\n",
"{'loss': 6.5699, 'learning_rate': 1.5390474757906446e-05, 'epoch': 0.01}\n",
"{'loss': 5.8727, 'learning_rate': 2.667340275199426e-06, 'epoch': 0.01}\n",
"100% 100/100 [01:50<00:00, 2.32it/s][INFO|trainer.py:2868] 2023-05-12 06:20:42,974 >> Saving model checkpoint to /content/output_model/checkpoint-100\n",
"[INFO|trainer.py:2880] 2023-05-12 06:20:42,981 >> Trainer.model is not a `PreTrainedModel`, only saving its state dict.\n",
"[INFO|tokenization_utils_base.py:2171] 2023-05-12 06:20:44,510 >> tokenizer config file saved in /content/output_model/checkpoint-100/tokenizer_config.json\n",
"[INFO|tokenization_utils_base.py:2178] 2023-05-12 06:20:44,510 >> Special tokens file saved in /content/output_model/checkpoint-100/special_tokens_map.json\n",
"[2023-05-12 06:20:44,512] [INFO] [logging.py:96:log_dist] [Rank 0] [Torch] Checkpoint global_step100 is about to be saved!\n",
"[2023-05-12 06:20:53,180] [INFO] [logging.py:96:log_dist] [Rank 0] Saving model checkpoint: /content/output_model/checkpoint-100/global_step100/mp_rank_00_model_states.pt\n",
"[2023-05-12 06:20:53,180] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /content/output_model/checkpoint-100/global_step100/mp_rank_00_model_states.pt...\n",
"[2023-05-12 06:21:31,680] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /content/output_model/checkpoint-100/global_step100/mp_rank_00_model_states.pt.\n",
"[2023-05-12 06:21:32,241] [INFO] [torch_checkpoint_engine.py:21:save] [Torch] Saving /content/output_model/checkpoint-100/global_step100/zero_pp_rank_0_mp_rank_00_optim_states.pt...\n",
"[2023-05-12 06:21:46,217] [INFO] [torch_checkpoint_engine.py:23:save] [Torch] Saved /content/output_model/checkpoint-100/global_step100/zero_pp_rank_0_mp_rank_00_optim_states.pt.\n",
"[2023-05-12 06:21:46,218] [INFO] [engine.py:3228:_save_zero_checkpoint] zero checkpoint saved /content/output_model/checkpoint-100/global_step100/zero_pp_rank_0_mp_rank_00_optim_states.pt\n",
"[2023-05-12 06:21:46,218] [INFO] [torch_checkpoint_engine.py:33:commit] [Torch] Checkpoint global_step100 is ready now!\n",
"[INFO|trainer.py:2039] 2023-05-12 06:21:46,220 >> \n",
"\n",
"Training completed. Do not forget to share your model on huggingface.co/models =)\n",
"\n",
"\n",
"{'train_runtime': 173.3904, 'train_samples_per_second': 0.577, 'train_steps_per_second': 0.577, 'train_loss': 7.34, 'epoch': 0.01}\n",
"100% 100/100 [02:54<00:00, 1.75s/it]\n",
"[INFO|trainer.py:2868] 2023-05-12 06:21:47,497 >> Saving model checkpoint to /content/output_model\n",
"[INFO|trainer.py:2880] 2023-05-12 06:21:47,504 >> Trainer.model is not a `PreTrainedModel`, only saving its state dict.\n",
"[INFO|tokenization_utils_base.py:2171] 2023-05-12 06:21:49,250 >> tokenizer config file saved in /content/output_model/tokenizer_config.json\n",
"[INFO|tokenization_utils_base.py:2178] 2023-05-12 06:21:49,250 >> Special tokens file saved in /content/output_model/special_tokens_map.json\n",
"***** train metrics *****\n",
" epoch = 0.01\n",
" train_loss = 7.34\n",
" train_runtime = 0:02:53.39\n",
" train_samples = 6906\n",
" train_samples_per_second = 0.577\n",
" train_steps_per_second = 0.577\n"
]
}
],
"source": [
"!cd Chinese-LLaMA-Alpaca/scripts && torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \\\n",
" --deepspeed ds_zero2_no_offload.json \\\n",
" --model_name_or_path decapoda-research/llama-7b-hf \\\n",
" --tokenizer_name_or_path ziqingyang/chinese-llama-lora-7b \\\n",
" --dataset_dir /content/Chinese-LLaMA-Alpaca/pt_data \\\n",
" --data_cache_dir data_cache \\\n",
" --validation_split_percentage 0.001 \\\n",
" --per_device_train_batch_size 1 \\\n",
" --do_train \\\n",
" --fp16 \\\n",
" --seed $RANDOM \\\n",
" --max_steps 100 \\\n",
" --lr_scheduler_type cosine \\\n",
" --learning_rate 2e-4 \\\n",
" --warmup_ratio 0.05 \\\n",
" --weight_decay 0.01 \\\n",
" --logging_strategy steps \\\n",
" --logging_steps 10 \\\n",
" --save_strategy steps \\\n",
" --save_total_limit 3 \\\n",
" --save_steps 50 \\\n",
" --gradient_accumulation_steps 1 \\\n",
" --preprocessing_num_workers 8 \\\n",
" --block_size 512 \\\n",
" --output_dir /content/output_model \\\n",
" --overwrite_output_dir \\\n",
" --ddp_timeout 30000 \\\n",
" --logging_first_step True \\\n",
" --lora_rank 8 \\\n",
" --lora_alpha 32\\\n",
" --trainable q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj \\\n",
" --modules_to_save embed_tokens,lm_head \\\n",
" --lora_dropout 0.05 \\\n",
" --torch_dtype float16 \\\n",
" --gradient_checkpointing \\\n",
" --ddp_find_unused_parameters False"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "hvVWDy9YPzG1"
},
"source": [
"After training, rename saved `pytorch_model.bin` to `adapter_model.bin`"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"id": "KnA4qnBCX3ev"
},
"outputs": [],
"source": [
"!mkdir output_model/peft_model\n",
"!mv output_model/pytorch_model.bin output_model/peft_model/adapter_model.bin"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {
"id": "hDRJlD8sYs7E"
},
"source": [
"Lastly, you need to manually create an `adapter_config.json` under `peft_model` and fill in the hyperparamters such as `lora_rank`, `lora_alpha` etc., whose content and \n",
"format can be referenced from the corresponding file in Chinese-LLaMA-LoRA."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "A100",
"machine_shape": "hm",
"provenance": []
},
"gpuClass": "standard",
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
================================================
FILE: requirements.txt
================================================
torch==1.13.1
git+https://github.com/huggingface/peft.git@13e53fc
transformers==4.30.0
sentencepiece==0.1.97
================================================
FILE: scripts/README.md
================================================
# 代码与脚本 Code and Scripts
### training/
预训练与指令精调代码,Wiki:
- 预训练:[https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/预训练脚本](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/预训练脚本)
- 指令精调:[https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/指令精调脚本](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/指令精调脚本)
Pre-training and instruction finetuning code, Wiki:
- Pre-training: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Pretraining-Script
- Instruction finetuning: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/SFT-Script
### inference/
使用🤗transformers进行推理,Wiki:[https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用Transformers推理](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用Transformers推理)
Inference using 🤗transformers, Wiki: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Inference-with-Transformers
### langchain/
使用LangChain进行检索式问答和文本摘要的示例,Wiki:[https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/与LangChain进行集成](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/与LangChain进行集成)
Using LangChain for Retrieval QA and Summarization, Wiki: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Integrated-with-LangChain
### openai_server_demo/
使用fastapi实现的仿OPENAI API风格的服务器,Wiki:[https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/API调用](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/API调用)
A server that implements OPENAI API using fastapi, Wiki: [https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/API-Calls](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/API-Calls)
### merge_tokenizer/
中文词表扩充代码,Wiki: [https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/训练细节#准备工作词表扩充](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/训练细节#准备工作词表扩充)
Code for extending Chinese vocabulary, Wiki: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Training-Details#preparation-vocabulary-expansion
### merge_llama_with_chinese_lora.py
合并LLaMA/Alpaca LoRA脚本,Wiki: [https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换)
Script for merging LLaMA/Alpaca LoRA. Wiki: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Manual-Conversion
### merge_llama_with_chinese_lora_low_mem.py
(推荐)低资源版合并LLaMA/Alpaca LoRA脚本,Wiki: [https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换)
(recommended)Script for merging LLaMA/Alpaca LoRA (low-resource version). Wiki: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Manual-Conversion
### crawl_prompt.py
指令数据爬取脚本,Wiki:[https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/训练细节#训练数据](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/训练细节#训练数据)
Crawl prompt script, Wiki: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Training-Details#training-data
================================================
FILE: scripts/ceval/eval.py
================================================
# This code is modified from C-Eval Project: https://github.com/SJTU-LIT/ceval
import os
import argparse
import pandas as pd
import torch
import json
from llama_evaluator import Llama_Evaluator
import time
choices = ["A", "B", "C", "D"]
def main(args, evaluator,take):
assert os.path.exists("subject_mapping.json"), "subject_mapping.json not found!"
with open("subject_mapping.json") as f:
subject_mapping = json.load(f)
filenames = os.listdir("data/val")
subject_list = [val_file.replace("_val.csv","") for val_file in filenames]
accuracy, summary = {}, {}
run_date=time.strftime('%Y-%m-%d_%H-%M-%S',time.localtime(time.time()))
output_dir = args.output_dir
save_result_dir=os.path.join(output_dir,f"take{take}")
if not os.path.exists(save_result_dir):
os.makedirs(save_result_dir,exist_ok=True)
all_answers = {}
for index,subject_name in enumerate(subject_list):
print(f"{index/len(subject_list)} Inference starts at {run_date} on {args.model_path} with subject of {subject_name}!")
val_file_path=os.path.join('data/val',f'{subject_name}_val.csv')
dev_file_path=os.path.join('data/dev',f'{subject_name}_dev.csv')
test_file_path=os.path.join('data/test',f'{subject_name}_test.csv')
val_df=pd.read_csv(val_file_path) if args.do_test is False else pd.read_csv(test_file_path)
dev_df=pd.read_csv(dev_file_path) if args.few_shot else None
correct_ratio, answers = evaluator.eval_subject(subject_name, val_df, dev_df,
save_result_dir=save_result_dir if args.do_save_csv else None,
few_shot=args.few_shot,
cot=args.cot,
with_prompt=args.with_prompt,
constrained_decoding=args.constrained_decoding,
do_test=args.do_test)
print(f"Subject: {subject_name}")
print(f"Acc: {correct_ratio}")
accuracy[subject_name] = correct_ratio
summary[subject_name] = {"score":correct_ratio,
"num":len(val_df),
"correct":correct_ratio*len(val_df)/100}
all_answers[subject_name] = answers
json.dump(all_answers,open(save_result_dir+'/submission.json','w'),ensure_ascii=False,indent=4)
print("Accuracy:")
for k, v in accuracy.items():
print(k, ": ", v)
total_num = 0
total_correct = 0
summary['grouped'] = {
"STEM": {"correct": 0.0, "num": 0},
"Social Science": {"correct": 0.0, "num": 0},
"Humanities": {"correct": 0.0, "num": 0},
"Other": {"correct": 0.0, "num": 0}
}
for subj, info in subject_mapping.items():
group = info[2]
summary['grouped'][group]["num"] += summary[subj]['num']
summary['grouped'][group]["correct"] += summary[subj]['correct']
for group, info in summary['grouped'].items():
info['score'] = info["correct"] / info["num"]
total_num += info["num"]
total_correct += info["correct"]
summary['All'] = {"score": total_correct / total_num, "num": total_num, "correct": total_correct}
json.dump(summary,open(save_result_dir+'/summary.json','w'),ensure_ascii=False,indent=2)
if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument("--model_path", type=str)
parser.add_argument("--cot",choices=["False","True"], default="False")
parser.add_argument("--few_shot", choices=["False","True"], default="True")
parser.add_argument("--ntrain", "-k", type=int, default=5)
parser.add_argument("--with_prompt", choices=["False","True"], default="False")
parser.add_argument("--constrained_decoding", choices=["False","True"], default="True")
parser.add_argument("--temperature",type=float,default=0.2)
parser.add_argument("--n_times", default=1,type=int)
parser.add_argument("--do_save_csv", choices=["False","True"], default="False")
parser.add_argument("--output_dir", type=str)
parser.add_argument("--do_test", choices=["False","True"], default="False")
args = parser.parse_args()
args.cot = args.cot == "True"
args.few_shot = args.few_shot == "True"
args.with_prompt = args.with_prompt == "True"
args.constrained_decoding = args.constrained_decoding == "True"
args.do_test = args.do_test == "True"
args.do_save_csv = args.do_save_csv == "True"
if args.constrained_decoding is True:
args.n_times=max(args.n_times,1)
print(args)
device = torch.device(0)
print(device)
evaluator=Llama_Evaluator(
choices=choices,
k=args.ntrain,
model_path=args.model_path,
device=device,
temperature = args.temperature
)
for i in range(args.n_times):
main(args,evaluator=evaluator,take=i)
================================================
FILE: scripts/ceval/evaluator.py
================================================
# This code is modified from C-Eval Project: https://github.com/SJTU-LIT/ceval
import string
class Evaluator:
def __init__(self, choices, model_name, k=-1):
self.choices = choices
self.model_name = model_name
self.k = k
self.puncs = list(string.punctuation)
def format_example(self, line, include_answer=True):
example = line['question']
for choice in self.choices:
example += f'\n{choice}. {line[f"{choice}"]}'
example += '\n答案:'
if include_answer:
example += f'{line["answer"]}\n\n'
return example
def generate_few_shot_prompt(self, subject, dev_df):
prompt = f"以下是中国关于{subject}考试的单项选择题,请选出其中的正确答案。\n\n"
k = self.k
if self.k == -1:
k = dev_df.shape[0]
for i in range(k):
prompt += self.format_example(dev_df.iloc[i, :])
return prompt
def eval_subject(self, subject_name, test_df, dev_df=None, few_shot=False, save_result_dir=None):
pass
def normalize_answer(self,s):
def white_space_fix(text):
return ' '.join(text.split())
def remove_punc(text):
exclude=set(self.puncs)
return ''.join(ch for ch in text if ch not in exclude)
def lower(text):
return text.lower()
return white_space_fix(remove_punc(lower(s)))
def exact_match(self,pred, target):
return self.normalize_answer(pred)==self.normalize_answer(target)
================================================
FILE: scripts/ceval/llama_evaluator.py
================================================
# This code is modified from C-Eval Project: https://github.com/SJTU-LIT/ceval
import os
import re
from tqdm import tqdm
import random
import numpy as np
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from evaluator import Evaluator
class Llama_Evaluator(Evaluator):
def __init__(self, choices, k, model_path, device, temperature=0.2):
super(Llama_Evaluator, self).__init__(choices, model_path, k)
load_type = torch.float16
self.model_path = model_path
self.device = device
self.tokenizer = LlamaTokenizer.from_pretrained(model_path)
self.model = LlamaForCausalLM.from_pretrained(
model_path,
load_in_8bit=False,
torch_dtype=load_type,
low_cpu_mem_usage=True,
device_map='auto')
self.generation_config = dict(
temperature=temperature,
top_k=40,
top_p=0.9,
do_sample=True,
num_beams=1,
repetition_penalty=1.1,
max_new_tokens=20
)
self.sA_id = self.tokenizer.encode("A", add_special_tokens=False)[0]
self.sB_id = self.tokenizer.encode("B", add_special_tokens=False)[0]
self.sC_id = self.tokenizer.encode("C", add_special_tokens=False)[0]
self.sD_id = self.tokenizer.encode("D", add_special_tokens=False)[0]
self.A_id = self.tokenizer.encode(":A")[-1]
self.B_id = self.tokenizer.encode(":B")[-1]
self.C_id = self.tokenizer.encode(":C")[-1]
self.D_id = self.tokenizer.encode(":D")[-1]
def eval_subject(self, subject_name,
test_df,
dev_df=None,
few_shot=False,
cot=False,
save_result_dir=None,
with_prompt=False,
constrained_decoding=False,
do_test=False):
all_answers = {}
if constrained_decoding is True:
self.generation_config['output_scores'] = True
self.generation_config['return_dict_in_generate'] = True
self.generation_config['max_new_tokens'] = 1
self.generation_config['top_p'] = 1.0
self.generation_config['top_k'] = 0
correct_num = 0
if save_result_dir:
result = []
score = []
if few_shot:
history = self.generate_few_shot_prompt(subject_name, dev_df, cot=cot)
else:
history = ''
answers = ['NA'] * len(test_df) if do_test is True else list(test_df['answer'])
for row_index, row in tqdm(test_df.iterrows(), total=len(test_df)):
question = self.format_example(row, include_answer=False, cot=cot,with_prompt=with_prompt)
instruction = history + question
if with_prompt:
prompt_template = (
"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n{instruction}\n\n### Response: ")
instruction = prompt_template.format_map({'instruction': instruction,'subject':subject_name})
inputs = self.tokenizer(instruction, return_tensors="pt")
generation_output = self.model.generate(
input_ids = inputs["input_ids"].to(self.device),
attention_mask = inputs['attention_mask'].to(self.device),
eos_token_id=self.tokenizer.eos_token_id,
pad_token_id=self.tokenizer.pad_token_id,
**self.generation_config
)
batch_size, length = inputs.input_ids.shape
if constrained_decoding is True:
logits = generation_output.scores[0][0]
logits = logits.float().cpu().detach()
choices1_logits = logits[[self.sA_id,self.sB_id,self.sC_id,self.sD_id]]
choices2_logits = logits[[self.A_id,self.B_id,self.C_id,self.D_id]]
choicesAll_logits = (choices1_logits + choices2_logits).numpy()
assert not (np.any(np.isinf(choicesAll_logits)) or np.any(np.isnan(choicesAll_logits)))
ans = {0: "A", 1: "B", 2: "C", 3: "D"}[np.argmax(choicesAll_logits)]
response = self.tokenizer.decode([logits.argmax(-1).item()])
else:
response = self.tokenizer.decode(generation_output[0, length:], skip_special_tokens=True)
ans, direct_extract = self.extract_answer(row, response)
if ans == answers[row_index]:
correct_num += 1
correct = 1
else:
correct = 0
print(f"\n=======begin {str(row_index)}=======")
print("question: ", question)
print("response: ", response)
print("ans: ", ans)
print("ground truth: ", answers[row_index], "\n")
if save_result_dir:
result.append(response)
score.append(correct)
print(f"=======end {str(row_index)}=======")
all_answers[str(row_index)] = ans
correct_ratio = 100*correct_num/len(answers)
if save_result_dir:
test_df['model_output'] = result
test_df['correctness'] = score
test_df.to_csv(os.path.join(save_result_dir, f'{subject_name}_test.csv'))
return correct_ratio, all_answers
def format_example(self, line, include_answer=True, cot=False, with_prompt=False):
example = line['question']
for choice in self.choices:
example += f'\n{choice}. {line[f"{choice}"]}'
if include_answer:
if cot:
example += "\n答案:让我们一步一步思考,\n" + \
line["explanation"] + f"\n所以答案是{line['answer']}。\n\n"
else:
example += '\n答案:' + line["answer"] + '\n\n'
else:
if with_prompt is False:
if cot:
example += "\n答案:让我们一步一步思考,\n1."
else:
example += '\n答案:'
else:
if cot:
example += "\n答案是什么?让我们一步一步思考,\n1."
else:
example += '\n答案是什么? '
return example
def generate_few_shot_prompt(self, subject, dev_df, cot=False):
prompt = f"以下是中国关于{subject}考试的单项选择题,请选出其中的正确答案。\n\n"
k = self.k
if self.k == -1:
k = dev_df.shape[0]
for i in range(k):
prompt += self.format_example(
dev_df.iloc[i, :],
include_answer=True,
cot=cot
)
return prompt
def extract_answer(self, line, gen_ans):
m = re.findall(r'所以答案是(.+?)。', gen_ans, re.M)
if len(m) > 0 and m[-1] in self.choices:
return m[-1], True
answer_patterns = [
r'([ABCD])是正确的',
r'选项([ABCD])正确',
r'答案为([ABCD])',
r'答案是([ABCD])',
r'答案([ABCD])',
r'选择([ABCD])',
r'答案:([ABCD])',
r'选择答案([ABCD])'
]
# RE extraction
for answer_pattern in answer_patterns:
m = re.search(answer_pattern, gen_ans, re.M)
if m:
answer = m.group(1)
return answer, False
# only containing one choice-character
m = re.findall(r'[ABCD]', gen_ans, re.M)
if len(m) >= 1:
answer = m[0]
return answer, False
# only containing one choice-context
choices_dict = {}
pattern = ""
for c in self.choices:
choices_dict[str(line[f'{c}'])] = c
pattern += re.escape(str(line[f'{c}']))+"|"
pattern = pattern[:-1]
m = re.findall(pattern, gen_ans, re.M)
print("w/ escape:",repr(pattern),gen_ans,(len(m)>=1))
if len(m) >= 1:
answer = choices_dict[m[0]]
return answer, False
return random.choice('ABCD'), False
================================================
FILE: scripts/ceval/subject_mapping.json
================================================
{
"computer_network": [
"Computer Network",
"\u8ba1\u7b97\u673a\u7f51\u7edc",
"STEM"
],
"operating_system": [
"Operating System",
"\u64cd\u4f5c\u7cfb\u7edf",
"STEM"
],
"computer_architecture": [
"Computer Architecture",
"\u8ba1\u7b97\u673a\u7ec4\u6210",
"STEM"
],
"college_programming": [
"College Programming",
"\u5927\u5b66\u7f16\u7a0b",
"STEM"
],
"college_physics": [
"College Physics",
"\u5927\u5b66\u7269\u7406",
"STEM"
],
"college_chemistry": [
"College Chemistry",
"\u5927\u5b66\u5316\u5b66",
"STEM"
],
"advanced_mathematics": [
"Advanced Mathematics",
"\u9ad8\u7b49\u6570\u5b66",
"STEM"
],
"probability_and_statistics": [
"Probability and Statistics",
"\u6982\u7387\u7edf\u8ba1",
"STEM"
],
"discrete_mathematics": [
"Discrete Mathematics",
"\u79bb\u6563\u6570\u5b66",
"STEM"
],
"electrical_engineer": [
"Electrical Engineer",
"\u6ce8\u518c\u7535\u6c14\u5de5\u7a0b\u5e08",
"STEM"
],
"metrology_engineer": [
"Metrology Engineer",
"\u6ce8\u518c\u8ba1\u91cf\u5e08",
"STEM"
],
"high_school_mathematics": [
"High School Mathematics",
"\u9ad8\u4e2d\u6570\u5b66",
"STEM"
],
"high_school_physics": [
"High School Physics",
"\u9ad8\u4e2d\u7269\u7406",
"STEM"
],
"high_school_chemistry": [
"High School Chemistry",
"\u9ad8\u4e2d\u5316\u5b66",
"STEM"
],
"high_school_biology": [
"High School Biology",
"\u9ad8\u4e2d\u751f\u7269",
"STEM"
],
"middle_school_mathematics": [
"Middle School Mathematics",
"\u521d\u4e2d\u6570\u5b66",
"STEM"
],
"middle_school_biology": [
"Middle School Biology",
"\u521d\u4e2d\u751f\u7269",
"STEM"
],
"middle_school_physics": [
"Middle School Physics",
"\u521d\u4e2d\u7269\u7406",
"STEM"
],
"middle_school_chemistry": [
"Middle School Chemistry",
"\u521d\u4e2d\u5316\u5b66",
"STEM"
],
"veterinary_medicine": [
"Veterinary Medicine",
"\u517d\u533b\u5b66",
"STEM"
],
"college_economics": [
"College Economics",
"\u5927\u5b66\u7ecf\u6d4e\u5b66",
"Social Science"
],
"business_administration": [
"Business Administration",
"\u5de5\u5546\u7ba1\u7406",
"Social Science"
],
"marxism": [
"Marxism",
"\u9a6c\u514b\u601d\u4e3b\u4e49\u57fa\u672c\u539f\u7406",
"Social Science"
],
"mao_zedong_thought": [
"Mao Zedong Thought",
"\u6bdb\u6cfd\u4e1c\u601d\u60f3\u548c\u4e2d\u56fd\u7279\u8272\u793e\u4f1a\u4e3b\u4e49\u7406\u8bba\u4f53\u7cfb\u6982\u8bba",
"Social Science"
],
"education_science": [
"Education Science",
"\u6559\u80b2\u5b66",
"Social Science"
],
"teacher_qualification": [
"Teacher Qualification",
"\u6559\u5e08\u8d44\u683c",
"Social Science"
],
"high_school_politics": [
"High School Politics",
"\u9ad8\u4e2d\u653f\u6cbb",
"Social Science"
],
"high_school_geography": [
"High School Geography",
"\u9ad8\u4e2d\u5730\u7406",
"Social Science"
],
"middle_school_politics": [
"Middle School Politics",
"\u521d\u4e2d\u653f\u6cbb",
"Social Science"
],
"middle_school_geography": [
"Middle School Geography",
"\u521d\u4e2d\u5730\u7406",
"Social Science"
],
"modern_chinese_history": [
"Modern Chinese History",
"\u8fd1\u4ee3\u53f2\u7eb2\u8981",
"Humanities"
],
"ideological_and_moral_cultivation": [
"Ideological and Moral Cultivation",
"\u601d\u60f3\u9053\u5fb7\u4fee\u517b\u4e0e\u6cd5\u5f8b\u57fa\u7840",
"Humanities"
],
"logic": [
"Logic",
"\u903b\u8f91\u5b66",
"Humanities"
],
"law": [
"Law",
"\u6cd5\u5b66",
"Humanities"
],
"chinese_language_and_literature": [
"Chinese Language and Literature",
"\u4e2d\u56fd\u8bed\u8a00\u6587\u5b66",
"Humanities"
],
"art_studies": [
"Art Studies",
"\u827a\u672f\u5b66",
"Humanities"
],
"professional_tour_guide": [
"Professional Tour Guide",
"\u5bfc\u6e38\u8d44\u683c",
"Humanities"
],
"legal_professional": [
"Legal Professional",
"\u6cd5\u5f8b\u804c\u4e1a\u8d44\u683c",
"Humanities"
],
"high_school_chinese": [
"High School Chinese",
"\u9ad8\u4e2d\u8bed\u6587",
"Humanities"
],
"high_school_history": [
"High School History",
"\u9ad8\u4e2d\u5386\u53f2",
"Humanities"
],
"middle_school_history": [
"Middle School History",
"\u521d\u4e2d\u5386\u53f2",
"Humanities"
],
"civil_servant": [
"Civil Servant",
"\u516c\u52a1\u5458",
"Other"
],
"sports_science": [
"Sports Science",
"\u4f53\u80b2\u5b66",
"Other"
],
"plant_protection": [
"Plant Protection",
"\u690d\u7269\u4fdd\u62a4",
"Other"
],
"basic_medicine": [
"Basic Medicine",
"\u57fa\u7840\u533b\u5b66",
"Other"
],
"clinical_medicine": [
"Clinical Medicine",
"\u4e34\u5e8a\u533b\u5b66",
"Other"
],
"urban_and_rural_planner": [
"Urban and Rural Planner",
"\u6ce8\u518c\u57ce\u4e61\u89c4\u5212\u5e08",
"Other"
],
"accountant": [
"Accountant",
"\u6ce8\u518c\u4f1a\u8ba1\u5e08",
"Other"
],
"fire_engineer": [
"Fire Engineer",
"\u6ce8\u518c\u6d88\u9632\u5de5\u7a0b\u5e08",
"Other"
],
"environmental_impact_assessment_engineer": [
"Environmental Impact Assessment Engineer",
"\u73af\u5883\u5f71\u54cd\u8bc4\u4ef7\u5de5\u7a0b\u5e08",
"Other"
],
"tax_accountant": [
"Tax Accountant",
"\u7a0e\u52a1\u5e08",
"Other"
],
"physician": [
"Physician",
"\u533b\u5e08\u8d44\u683c",
"Other"
]
}
================================================
FILE: scripts/crawl_prompt.py
================================================
import openai
import sys
import random
openai.api_key = "" # you must provide your OpenAI API key before crawling
if not openai.api_key:
raise ValueError("OpenAI API key not provided. Please set the 'openai.api_key' variable.")
def return_random_prompt():
system_prompt = "你需要尽可能给出多样化的任务指令和对应的回答。我们将用于人工评估ChatGPT模型对指令的完成情况。要求:\n"
# generate random topics
topic_list = ["科技", "娱乐", "体育", "金融", "时政", "教育", "医疗", "旅游", "美食", "汽车", "房产", "文化", "历史", "地理", "自然", "人文", "社会", "法律", "军事", "政治", "经济", "文学", "艺术", "宗教", "哲学", "语言", "数学", "物理", "化学", "生物", "地球科学", "天文学", "计算机科学", "工程", "建筑", "设计", "音乐", "舞蹈", "电影", "电视", "动漫", "游戏", "健康", "美容", "时尚", "家居", "家电", "家具", "家装", "母婴", "育儿", "职场", "工作", "生活", "养生", "心理", "情感", "人际", "社交", "交友", "恋爱", "婚姻", "家庭", "亲子", "宠物", "动物", "植物", "食品", "饮料", "餐饮", "酒店", "购物", "消费", "理财", "税务", "法规", "法院", "司法", "刑事", "民事", "行政", "战争"]
system_prompt += "1. 主题多样化,涵盖各个领域,例如:" + "、".join(random.sample(topic_list, 10)) + "等。\n"
# generate random tasks
task_list = ["开放式生成", "分类", "问答", "编辑", "摘要", "写作", "翻译", "写代码", "分析", "代码解析", "常识推理", "写信", "抽取", "推荐"]
system_prompt += "2. 表述多样化,结合真实问题;指令类型多样化,例如:" + "、".join(random.sample(task_list, 10)) + "等。\n"
# other requirements
system_prompt += "3. 如果遇到无法处理的指令(只靠文本无法回答),给出无法处理的回复。\n"
system_prompt += "4. 除非特别要求,请使用中文,指令可以是命令句、疑问句、或其他合适的类型。\n"
system_prompt += "5. 为指令生成一个适当且涉及真实情况的 ,不应该只包含简单的占位符。 应提供实质性的内容,具有挑战性。字数不超过" + str(random.randint(80, 120)) + "字。\n"
system_prompt += "6. 应该是对指令的适当且真实的回应,不能只回复答应或拒绝请求。如果需要额外信息才能回复时,请努力预测用户意图并尝试回复。的内容应少于" + str(random.randint(128, 512)) + "字。\n\n"
system_prompt += "请给出满足条件的20条JSON格式数据:\n"
return system_prompt
if __name__ == "__main__":
if len(sys.argv) != 2:
print("Usage: python crawl_prompt.py ")
exit(1)
output_file = open(sys.argv[1], 'w')
MAX_EPOCHS = 1 # number of data to generate (each prompt contains 20 JSON-formatted data)
for k in range(MAX_EPOCHS):
response = openai.ChatCompletion.create(
model="gpt-3.5-turbo", # here we use `gpt-3.5-turbo` model, while Stanford-Alpaca uses `text-davinci-003`
messages=[
{"role": "user", "content": return_random_prompt()},
]
)
output_file.write(response["choices"][0]["message"]["content"] + '\n')
output_file.close()
================================================
FILE: scripts/inference/gradio_demo.py
================================================
import torch
from transformers import (
LlamaForCausalLM,
LlamaTokenizer,
StoppingCriteria,
)
import gradio as gr
import argparse
import os
from queue import Queue
from threading import Thread
import traceback
import gc
# Parse command-line arguments
parser = argparse.ArgumentParser()
parser.add_argument(
'--base_model',
default=None,
type=str,
required=True,
help='Base model path')
parser.add_argument('--lora_model', default=None, type=str,
help="If None, perform inference on the base model")
parser.add_argument(
'--tokenizer_path',
default=None,
type=str,
help='If None, lora model path or base model path will be used')
parser.add_argument(
'--gpus',
default="0",
type=str,
help='If None, cuda:0 will be used. Inference using multi-cards: --gpus=0,1,... ')
parser.add_argument('--share', default=True, help='Share gradio domain name')
parser.add_argument('--port', default=19324, type=int, help='Port of gradio demo')
parser.add_argument(
'--max_memory',
default=256,
type=int,
help='Maximum input prompt length, if exceeded model will receive prompt[-max_memory:]')
parser.add_argument(
'--load_in_8bit',
action='store_true',
help='Use 8 bit quantified model')
parser.add_argument(
'--only_cpu',
action='store_true',
help='Only use CPU for inference')
parser.add_argument(
'--alpha',
type=str,
default="1.0",
help="The scaling factor of NTK method, can be a float or 'auto'. ")
args = parser.parse_args()
if args.only_cpu is True:
args.gpus = ""
from patches import apply_attention_patch, apply_ntk_scaling_patch
apply_attention_patch(use_memory_efficient_attention=True)
apply_ntk_scaling_patch(args.alpha)
# Set CUDA devices if available
os.environ["CUDA_VISIBLE_DEVICES"] = args.gpus
# Peft library can only import after setting CUDA devices
from peft import PeftModel
# Set up the required components: model and tokenizer
def setup():
global tokenizer, model, device, share, port, max_memory
max_memory = args.max_memory
port = args.port
share = args.share
load_in_8bit = args.load_in_8bit
load_type = torch.float16
if torch.cuda.is_available():
device = torch.device(0)
else:
device = torch.device('cpu')
if args.tokenizer_path is None:
args.tokenizer_path = args.lora_model
if args.lora_model is None:
args.tokenizer_path = args.base_model
tokenizer = LlamaTokenizer.from_pretrained(args.tokenizer_path)
base_model = LlamaForCausalLM.from_pretrained(
args.base_model,
load_in_8bit=load_in_8bit,
torch_dtype=load_type,
low_cpu_mem_usage=True,
device_map='auto',
)
model_vocab_size = base_model.get_input_embeddings().weight.size(0)
tokenzier_vocab_size = len(tokenizer)
print(f"Vocab of the base model: {model_vocab_size}")
print(f"Vocab of the tokenizer: {tokenzier_vocab_size}")
if model_vocab_size != tokenzier_vocab_size:
assert tokenzier_vocab_size > model_vocab_size
print("Resize model embeddings to fit tokenizer")
base_model.resize_token_embeddings(tokenzier_vocab_size)
if args.lora_model is not None:
print("loading peft model")
model = PeftModel.from_pretrained(
base_model,
args.lora_model,
torch_dtype=load_type,
device_map='auto',
)
else:
model = base_model
if device == torch.device('cpu'):
model.float()
model.eval()
# Reset the user input
def reset_user_input():
return gr.update(value='')
# Reset the state
def reset_state():
return []
# Generate the prompt for the input of LM model
def generate_prompt(instruction):
return f"""
Below is an instruction that describes a task. Write a response that appropriately completes the request.
{instruction}
"""
# User interaction function for chat
def user(user_message, history):
return gr.update(value="", interactive=False), history + \
[[user_message, None]]
class Stream(StoppingCriteria):
def __init__(self, callback_func=None):
self.callback_func = callback_func
def __call__(self, input_ids, scores) -> bool:
if self.callback_func is not None:
self.callback_func(input_ids[0])
return False
class Iteratorize:
"""
Transforms a function that takes a callback
into a lazy iterator (generator).
Adapted from: https://stackoverflow.com/a/9969000
"""
def __init__(self, func, kwargs=None, callback=None):
self.mfunc = func
self.c_callback = callback
self.q = Queue()
self.sentinel = object()
self.kwargs = kwargs or {}
self.stop_now = False
def _callback(val):
if self.stop_now:
raise ValueError
self.q.put(val)
def gentask():
try:
ret = self.mfunc(callback=_callback, **self.kwargs)
except ValueError:
pass
except Exception:
traceback.print_exc()
clear_torch_cache()
self.q.put(self.sentinel)
if self.c_callback:
self.c_callback(ret)
self.thread = Thread(target=gentask)
self.thread.start()
def __iter__(self):
return self
def __next__(self):
obj = self.q.get(True, None)
if obj is self.sentinel:
raise StopIteration
else:
return obj
def __del__(self):
clear_torch_cache()
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.stop_now = True
clear_torch_cache()
def clear_torch_cache():
gc.collect()
if torch.cuda.device_count() > 0:
torch.cuda.empty_cache()
# Perform prediction based on the user input and history
@torch.no_grad()
def predict(
history,
max_new_tokens=128,
top_p=0.75,
temperature=0.1,
top_k=40,
do_sample=True,
repetition_penalty=1.0
):
history[-1][1] = ""
if len(history) != 0:
input = "".join(["### Instruction:\n" +
i[0] +
"\n\n" +
"### Response: " +
i[1] +
("\n\n" if i[1] != "" else "") for i in history])
if len(input) > max_memory:
input = input[-max_memory:]
prompt = generate_prompt(input)
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(device)
generate_params = {
'input_ids': input_ids,
'max_new_tokens': max_new_tokens,
'top_p': top_p,
'temperature': temperature,
'top_k': top_k,
'do_sample': do_sample,
'repetition_penalty': repetition_penalty,
}
def generate_with_callback(callback=None, **kwargs):
if 'stopping_criteria' in kwargs:
kwargs['stopping_criteria'].append(Stream(callback_func=callback))
else:
kwargs['stopping_criteria'] = [Stream(callback_func=callback)]
clear_torch_cache()
with torch.no_grad():
model.generate(**kwargs)
def generate_with_streaming(**kwargs):
return Iteratorize(generate_with_callback, kwargs, callback=None)
with generate_with_streaming(**generate_params) as generator:
for output in generator:
next_token_ids = output[len(input_ids[0]):]
if next_token_ids[0] == tokenizer.eos_token_id:
break
new_tokens = tokenizer.decode(
next_token_ids, skip_special_tokens=True)
if isinstance(tokenizer, LlamaTokenizer) and len(next_token_ids) > 0:
if tokenizer.convert_ids_to_tokens(int(next_token_ids[0])).startswith('▁'):
new_tokens = ' ' + new_tokens
history[-1][1] = new_tokens
yield history
if len(next_token_ids) >= max_new_tokens:
break
# Call the setup function to initialize the components
setup()
# Create the Gradio interface
with gr.Blocks() as demo:
github_banner_path = 'https://raw.githubusercontent.com/ymcui/Chinese-LLaMA-Alpaca/main/pics/banner.png'
gr.HTML(f'
')
gr.Markdown("> 为了促进大模型在中文NLP社区的开放研究,本项目开源了中文LLaMA模型和指令精调的Alpaca大模型。这些模型在原版LLaMA的基础上扩充了中文词表并使用了中文数据进行二次预训练,进一步提升了中文基础语义理解能力。同时,中文Alpaca模型进一步使用了中文指令数据进行精调,显著提升了模型对指令的理解和执行能力。")
chatbot = gr.Chatbot()
with gr.Row():
with gr.Column(scale=4):
with gr.Column(scale=12):
user_input = gr.Textbox(
show_label=False,
placeholder="Shift + Enter发送消息...",
lines=10).style(
container=False)
with gr.Column(min_width=32, scale=1):
submitBtn = gr.Button("Submit", variant="primary")
with gr.Column(scale=1):
emptyBtn = gr.Button("Clear History")
max_new_token = gr.Slider(
0,
4096,
value=512,
step=1.0,
label="Maximum New Token Length",
interactive=True)
top_p = gr.Slider(0, 1, value=0.9, step=0.01,
label="Top P", interactive=True)
temperature = gr.Slider(
0,
1,
value=0.5,
step=0.01,
label="Temperature",
interactive=True)
top_k = gr.Slider(1, 40, value=40, step=1,
label="Top K", interactive=True)
do_sample = gr.Checkbox(
value=True,
label="Do Sample",
info="use random sample strategy",
interactive=True)
repetition_penalty = gr.Slider(
1.0,
3.0,
value=1.1,
step=0.1,
label="Repetition Penalty",
interactive=True)
params = [user_input, chatbot]
predict_params = [
chatbot,
max_new_token,
top_p,
temperature,
top_k,
do_sample,
repetition_penalty]
submitBtn.click(
user,
params,
params,
queue=False).then(
predict,
predict_params,
chatbot).then(
lambda: gr.update(
interactive=True),
None,
[user_input],
queue=False)
user_input.submit(
user,
params,
params,
queue=False).then(
predict,
predict_params,
chatbot).then(
lambda: gr.update(
interactive=True),
None,
[user_input],
queue=False)
submitBtn.click(reset_user_input, [], [user_input])
emptyBtn.click(reset_state, outputs=[chatbot], show_progress=True)
# Launch the Gradio interface
demo.queue().launch(
share=share,
inbrowser=True,
server_name='0.0.0.0',
server_port=port)
================================================
FILE: scripts/inference/inference_hf.py
================================================
import argparse
import json, os
parser = argparse.ArgumentParser()
parser.add_argument('--base_model', default=None, type=str, required=True)
parser.add_argument('--lora_model', default=None, type=str,help="If None, perform inference on the base model")
parser.add_argument('--tokenizer_path',default=None,type=str)
parser.add_argument('--data_file',default=None, type=str,help="A file that contains instructions (one instruction per line)")
parser.add_argument('--with_prompt',action='store_true',help="wrap the input with the prompt automatically")
parser.add_argument('--interactive',action='store_true',help="run in the instruction mode (single-turn)")
parser.add_argument('--predictions_file', default='./predictions.json', type=str)
parser.add_argument('--gpus', default="0", type=str)
parser.add_argument('--only_cpu',action='store_true',help='only use CPU for inference')
parser.add_argument('--alpha',type=str,default="1.0", help="The scaling factor of NTK method, can be a float or 'auto'. ")
parser.add_argument('--load_in_8bit',action='store_true', help="Load the LLM in the 8bit mode")
args = parser.parse_args()
if args.only_cpu is True:
args.gpus = ""
os.environ["CUDA_VISIBLE_DEVICES"] = args.gpus
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel
from patches import apply_attention_patch, apply_ntk_scaling_patch
apply_attention_patch(use_memory_efficient_attention=True)
apply_ntk_scaling_patch(args.alpha)
generation_config = dict(
temperature=0.2,
top_k=40,
top_p=0.9,
do_sample=True,
num_beams=1,
repetition_penalty=1.1,
max_new_tokens=400
)
# The prompt template below is taken from llama.cpp
# and is slightly different from the one used in training.
# But we find it gives better results
prompt_input = (
"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n\n{instruction}\n\n### Response:\n\n"
)
sample_data = ["为什么要减少污染,保护环境?"]
def generate_prompt(instruction, input=None):
if input:
instruction = instruction + '\n' + input
return prompt_input.format_map({'instruction': instruction})
if __name__ == '__main__':
load_type = torch.float16
if torch.cuda.is_available():
device = torch.device(0)
else:
device = torch.device('cpu')
if args.tokenizer_path is None:
args.tokenizer_path = args.lora_model
if args.lora_model is None:
args.tokenizer_path = args.base_model
tokenizer = LlamaTokenizer.from_pretrained(args.tokenizer_path)
base_model = LlamaForCausalLM.from_pretrained(
args.base_model,
load_in_8bit=args.load_in_8bit,
torch_dtype=load_type,
low_cpu_mem_usage=True,
device_map='auto',
)
model_vocab_size = base_model.get_input_embeddings().weight.size(0)
tokenzier_vocab_size = len(tokenizer)
print(f"Vocab of the base model: {model_vocab_size}")
print(f"Vocab of the tokenizer: {tokenzier_vocab_size}")
if model_vocab_size!=tokenzier_vocab_size:
assert tokenzier_vocab_size > model_vocab_size
print("Resize model embeddings to fit tokenizer")
base_model.resize_token_embeddings(tokenzier_vocab_size)
if args.lora_model is not None:
print("loading peft model")
model = PeftModel.from_pretrained(base_model, args.lora_model,torch_dtype=load_type,device_map='auto',)
else:
model = base_model
if device==torch.device('cpu'):
model.float()
# test data
if args.data_file is None:
examples = sample_data
else:
with open(args.data_file,'r') as f:
examples = [l.strip() for l in f.readlines()]
print("first 10 examples:")
for example in examples[:10]:
print(example)
model.eval()
with torch.no_grad():
if args.interactive:
print("Start inference with instruction mode.")
print('='*85)
print("+ 该模式下仅支持单轮问答,无多轮对话能力。\n"
"+ 如要进行多轮对话,请使用llama.cpp或llamachat工具。")
print('-'*85)
print("+ This mode only supports single-turn QA.\n"
"+ If you want to experience multi-turn dialogue, please use llama.cpp or llamachat.")
print('='*85)
while True:
raw_input_text = input("Input:")
if len(raw_input_text.strip())==0:
break
if args.with_prompt:
input_text = generate_prompt(instruction=raw_input_text)
else:
input_text = raw_input_text
inputs = tokenizer(input_text,return_tensors="pt") #add_special_tokens=False ?
generation_output = model.generate(
input_ids = inputs["input_ids"].to(device),
attention_mask = inputs['attention_mask'].to(device),
eos_token_id=tokenizer.eos_token_id,
pad_token_id=tokenizer.pad_token_id,
**generation_config
)
s = generation_output[0]
output = tokenizer.decode(s,skip_special_tokens=True)
if args.with_prompt:
response = output.split("### Response:")[1].strip()
else:
response = output
print("Response: ",response)
print("\n")
else:
print("Start inference.")
results = []
for index, example in enumerate(examples):
if args.with_prompt is True:
input_text = generate_prompt(instruction=example)
else:
input_text = example
inputs = tokenizer(input_text,return_tensors="pt") #add_special_tokens=False ?
generation_output = model.generate(
input_ids = inputs["input_ids"].to(device),
attention_mask = inputs['attention_mask'].to(device),
eos_token_id=tokenizer.eos_token_id,
pad_token_id=tokenizer.pad_token_id,
**generation_config
)
s = generation_output[0]
output = tokenizer.decode(s,skip_special_tokens=True)
if args.with_prompt:
response = output.split("### Response:")[1].strip()
else:
response = output
print(f"======={index}=======")
print(f"Input: {example}\n")
print(f"Output: {response}\n")
results.append({"Input":input_text,"Output":response})
dirname = os.path.dirname(args.predictions_file)
os.makedirs(dirname,exist_ok=True)
with open(args.predictions_file,'w') as f:
json.dump(results,f,ensure_ascii=False,indent=2)
with open(dirname+'/generation_config.json','w') as f:
json.dump(generation_config,f,ensure_ascii=False,indent=2)
================================================
FILE: scripts/inference/patches.py
================================================
import torch
from torch import nn
from typing import Optional, Tuple, Union
import transformers
from transformers.models.llama.modeling_llama import apply_rotary_pos_emb, rotate_half
import math
try:
from xformers import ops as xops
except ImportError:
xops = None
print(
"Xformers is not installed correctly. If you want to use memory_efficient_attention use the following command to install Xformers\npip install xformers."
)
STORE_KV_BEFORE_ROPE = False
USE_MEM_EFF_ATTENTION = False
ALPHA = 1.0
def apply_rotary_pos_emb_single(q, cos, sin, position_ids):
# The first two dimensions of cos and sin are always 1, so we can `squeeze` them.
cos = cos.squeeze(1).squeeze(0) # [seq_len, dim]
sin = sin.squeeze(1).squeeze(0) # [seq_len, dim]
cos = cos[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim]
sin = sin[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim]
q_embed = (q * cos) + (rotate_half(q) * sin)
return q_embed
def xformers_forward(
self,
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_value: Optional[Tuple[torch.Tensor]] = None,
output_attentions: bool = False,
use_cache: bool = False,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
bsz, q_len, _ = hidden_states.size()
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
value_states = self.v_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
kv_seq_len = key_states.shape[-2]
if past_key_value is not None:
kv_seq_len += past_key_value[0].shape[-2]
if STORE_KV_BEFORE_ROPE is False:
cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
# [bsz, nh, t, hd]
if past_key_value is not None:
# reuse k, v, self_attention
key_states = torch.cat([past_key_value[0], key_states], dim=2)
value_states = torch.cat([past_key_value[1], value_states], dim=2)
past_key_value = (key_states, value_states) if use_cache else None
else:
if past_key_value is not None:
# reuse k, v, self_attention
key_states = torch.cat([past_key_value[0], key_states], dim=2)
value_states = torch.cat([past_key_value[1], value_states], dim=2)
past_key_value = (key_states, value_states) if use_cache else None
cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
query_states = apply_rotary_pos_emb_single(query_states, cos, sin, position_ids)
position_ids = torch.arange(kv_seq_len, dtype=torch.long, device=cos.device)
position_ids = position_ids.unsqueeze(0).view(-1, kv_seq_len)
key_states = apply_rotary_pos_emb_single(key_states, cos, sin, position_ids)
if xops is not None and USE_MEM_EFF_ATTENTION:
attn_weights = None
query_states = query_states.transpose(1, 2)
key_states = key_states.transpose(1, 2)
value_states = value_states.transpose(1, 2)
attn_bias = None if (query_states.size(1)==1 and key_states.size(1)>1) else xops.LowerTriangularMask()
attn_output = xops.memory_efficient_attention(
query_states, key_states, value_states, attn_bias=attn_bias, p=0)
else:
attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
raise ValueError(
f"Attention weights should be of size {(bsz * self.num_heads, q_len, kv_seq_len)}, but is"
f" {attn_weights.size()}"
)
if attention_mask is not None:
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
raise ValueError(
f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}"
)
attn_weights = attn_weights + attention_mask
attn_weights = torch.max(
attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min, device=attn_weights.device)
)
# upcast attention to fp32
attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
attn_output = torch.matmul(attn_weights, value_states)
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
raise ValueError(
f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
f" {attn_output.size()}"
)
attn_output = attn_output.transpose(1, 2)
attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
attn_output = self.o_proj(attn_output)
if not output_attentions:
attn_weights = None
return attn_output, attn_weights, past_key_value
old_init = transformers.models.llama.modeling_llama.LlamaRotaryEmbedding.__init__
def adaptive_ntk_init(self, dim, max_position_embeddings=2048, base=10000, device=None):
self.dim = dim
self.alpha = ALPHA
if isinstance(ALPHA,(float,int)):
base = base * ALPHA ** (dim / (dim-2))
self.base = base
elif ALPHA=='auto':
self.base = base
else:
raise ValueError(ALPHA)
old_init(self, dim, max_position_embeddings, base, device)
ntk_inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(device) / dim))
self.register_buffer("ntk_inv_freq", ntk_inv_freq, persistent=False)
def adaptive_ntk_forward(self, x, seq_len=None):
if seq_len > self.max_seq_len_cached:
if isinstance(self.alpha,(float,int)):
self.max_seq_len_cached = seq_len
t = torch.arange(seq_len, device=x.device, dtype=self.ntk_inv_freq.dtype)
freqs = torch.einsum("i,j->ij", t, self.ntk_inv_freq)
emb = torch.cat((freqs, freqs), dim=-1).to(x.device)
self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)
return (
self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
)
elif self.alpha=='auto':
t = torch.arange(seq_len, device=x.device, dtype=self.ntk_inv_freq.dtype)
dim = self.dim
alpha = (seq_len / 1024 - 1) * 1.1
base = self.base * alpha ** (dim / (dim-2))
ntk_inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(x.device) / dim ))
freqs = torch.einsum("i,j->ij", t, ntk_inv_freq)
emb = torch.cat((freqs, freqs), dim=-1).to(x.device)
cos_cached = emb.cos()[None, None, :, :]
sin_cached = emb.sin()[None, None, :, :]
return (
cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype)
)
else:
return (
self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype)
)
def apply_attention_patch(
use_memory_efficient_attention=False,
store_kv_before_rope=False
):
global USE_MEM_EFF_ATTENTION, STORE_KV_BEFORE_ROPE
if use_memory_efficient_attention is True and xops is not None:
USE_MEM_EFF_ATTENTION = use_memory_efficient_attention
print("USE_MEM_EFF_ATTENTION: ",USE_MEM_EFF_ATTENTION)
STORE_KV_BEFORE_ROPE = store_kv_before_rope
print("STORE_KV_BEFORE_ROPE:", STORE_KV_BEFORE_ROPE)
transformers.models.llama.modeling_llama.LlamaAttention.forward = xformers_forward
def apply_ntk_scaling_patch(alpha: Union[float,str]):
global ALPHA
ALPHA = alpha
try:
ALPHA = float(ALPHA)
except ValueError:
if ALPHA!="auto":
raise ValueError(f"Alpha can only be a float or 'auto', but given {ALPHA}")
print(f"Apply NTK scaling with ALPHA={ALPHA}")
transformers.models.llama.modeling_llama.LlamaRotaryEmbedding.__init__ = adaptive_ntk_init
transformers.models.llama.modeling_llama.LlamaRotaryEmbedding.forward = adaptive_ntk_forward
================================================
FILE: scripts/langchain/doc.txt
================================================
李白[注 1](701年5月19日—762年11月30日),字太白,号青莲居士,中国唐朝诗人。李白自言祖籍陇西成纪(今甘肃静宁西南),汉飞将军李广后裔,西凉武昭王李暠之后,与李唐皇室同宗。
一说其幼时内迁,寄籍剑南道绵州昌隆(今四川省江油市青莲镇)。一说先人隋末被窜于碎叶,出生于碎叶,属唐安西都护府(今吉尔吉斯斯坦共和国楚河州托克马克市)。有“诗仙”、“诗侠”、“酒仙”、“谪仙人”等称呼,活跃于盛唐[1],为杰出的浪漫主义诗人。与杜甫合称“李杜”[注 2]。被贺知章呼为“天上谪仙”、“李谪仙”。
李白的诗歌在唐朝已被选进殷璠编选的《河岳英灵集》、于敦煌石室发现的《唐写本唐人选唐诗》、韦庄编选的《又玄集》和韦縠编选的《才调集》。唐文宗御封李白的诗歌、裴旻的剑舞、张旭的草书称为“三绝”[2]。其作品想像奇特丰富,风格雄奇浪漫,意境独特,清新俊逸;善于利用夸饰与譬喻等手法、自然优美的词句,表现出奔放的情感。诗句行云流水,浑然天成。李白诗篇传诵千年,众多诗句已成经典,清赵翼称:“李杜诗篇万口传”(例如“抽刀断水水更流,举杯消愁愁更愁”等,更被谱入曲)。李白在诗歌的艺术成就被认为是中国浪漫主义诗歌的巅峰。诗作在全唐诗收录于卷161至卷185。有《李太白集》传世。杜甫曾经这样评价过李白的文章:“笔落惊风雨,诗成泣鬼神”、“白也诗无敌,飘然思不群”。
生平
早年
据《新唐书》记载李白为兴圣皇帝(凉武昭王李暠)九世孙[3],如果按照这个说法李白与李唐诸王实际上同宗,应是唐太宗李世民的同辈族弟。亦有野史说其祖是李建成或李元吉,因为被李世民族灭而逃往西域;但此说缺乏佐证,且李建成、李元吉诸子尚在幼年即在玄武门之变后全数被害,留有亲生后嗣的可能性很小。据《旧唐书》记载,李白之父李客为任城尉。更为了学习而隐居。
李白于武则天大足元年(701年)[4]出生,关于其出生地有多种说法,现在主要有剑南道绵州昌隆县(今四川省江油市)[5]青莲乡(今青莲镇)和西域的碎叶(Suyab,位于今吉尔吉斯托克马克附近)[6]这两种说法,其中后一种说法认为李白直到四岁时(705年)才跟随他的父亲李客迁居蜀地,入籍绵州。李白自四岁(705年)接受启蒙教育,从景云元年(710年)开始,李白开始读诸子史籍[7],开元三年时十四岁(715年)——喜好作赋、剑术、奇书、神仙:“十五观奇书,做赋凌相如”。在青年时期开始在中国各地游历。开元五年左右,李白曾拜撰写《长短经》的赵蕤为师,学习一年有余,这段时期的学习对李白产生了深远的影响。开元六年,在戴天山(约在四川省昌隆县北五十里处)大明寺读书。二十五岁时只身出四川,开始了广泛漫游,南到洞庭湘江,东至吴、越,寓居在安陆(今湖北省安陆市)、应山(今湖北省广水市)。
中年
李白曾经在唐玄宗天宝元年(742年)供奉翰林。有一次皇帝因酒酣问李白说:“我朝与天后(武后)之朝何如?”白曰:“天后朝政出多门,国由奸幸,任人之道,如小儿市瓜,不择香味,惟拣肥大者;我朝任人如淘沙取金,剖石采用,皆得其精粹者。”玄宗听后大笑不止[8][9]。但是由于他桀骜不驯的性格,所以仅仅不到两年他就离开了长安。据说是因为他作的《清平调》得罪了当时宠冠后宫的杨贵妃(因李白命“力士脱靴”,高力士引以为大耻,因而以言语诱使杨贵妃认为“可怜飞燕倚新妆”几句是讽刺她)而不容于宫中[注 3]。天宝三年(745年)“恳求还山,帝赐金放还”,离开长安。
后在洛阳与另两位著名诗人杜甫、高适相识,并结为好友。
晚年
天宝十一年(752年)李白年届五十二岁,北上途中游广平郡邯郸、临洺、清漳等地。十月,抵幽州。初有立功边疆思想,在边地习骑射。后发现安禄山野心,登黄金台痛哭。不久即离幽州南下。
安史之乱爆发时,李白游华山,南下回宣城,后上庐山。756年12月,李白被三次邀请,下山赴寻阳入永王李璘幕僚[10]。永王触怒唐肃宗被杀后,李白也获罪入狱。幸得郭子仪力保,方得免死,改为流徙夜郎(今贵州关岭县一带),在途经巫山时遇赦,此时他已经59岁。(参见李璘之乱)
李白晚年在江南一带漂泊。在他61岁时,听到太尉李光弼率领大军讨伐安史叛军,于是他北上准备追随李光弼从军杀敌,但是中途因病折回。第二年,李白投奔他的族叔、当时在当涂(今属安徽省马鞍山)当县令的李阳冰。同年11月,李白病逝于寓所,终年61岁,葬当涂龙山。唐宪宗元和十二年(817年),宣歙观察使范传正根据李白生前“志在青山”的遗愿,将其墓迁至当涂青山。
去世
《新唐书》记载,唐代宗继位后以左拾遗召李白,但李白当时已去世。
李阳冰在《草堂集序》中说李白是病死的[11];皮日休在诗作中记载,李白是患“腐胁疾”而死的[12]。
《旧唐书》则记载,李白流放虽然遇赦,但因途中饮酒过度,醉死于宣城。中国民间有“太白捞月”的传说:李白在舟中赏月,饮酒大醉,想要跳下船至水里捞月而溺死[13][14][15];在民间的求签活动中亦有“太白捞月”一签文,乃是下下签[16]。
作品
李白一生创作大量的诗歌,绝大多数已散佚[17],流传至今的只有九百多首。他的诗歌创作涉及的中国古典诗歌的题材非常广泛,而且在许多题材都有名作出现,而且因为际遇的不同,每个时期的诗风都有所不同。
================================================
FILE: scripts/langchain/langchain_qa.py
================================================
import argparse
import os
parser = argparse.ArgumentParser()
parser.add_argument('--file_path',required=True,type=str)
parser.add_argument('--embedding_path',required=True,type=str)
parser.add_argument('--model_path',required=True,type=str)
parser.add_argument('--gpus', default="0", type=str)
parser.add_argument('--chain_type', default="refine", type=str)
args = parser.parse_args()
os.environ["CUDA_VISIBLE_DEVICES"] = args.gpus
# os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION']='python'
file_path = args.file_path
embedding_path = args.embedding_path
model_path = args.model_path
import torch
from langchain import HuggingFacePipeline
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.document_loaders import TextLoader
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
prompt_template = ("Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n{context}\n{question}\n\n### Response: ")
refine_prompt_template = (
"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n"
"这是原始问题: {question}\n"
"已有的回答: {existing_answer}\n"
"现在还有一些文字,(如果有需要)你可以根据它们完善现有的回答。"
"\n\n"
"{context_str}\n"
"\\nn"
"请根据新的文段,进一步完善你的回答。\n\n"
"### Response: "
)
initial_qa_template = (
"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n"
"以下为背景知识:\n"
"{context_str}"
"\n"
"请根据以上背景知识, 回答这个问题:{question}。\n\n"
"### Response: "
)
if __name__ == '__main__':
load_type = torch.float16
if torch.cuda.is_available():
device = torch.device(0)
else:
device = torch.device('cpu')
loader = TextLoader(file_path)
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=600, chunk_overlap=100)
texts = text_splitter.split_documents(documents)
print("Loading the embedding model...")
embeddings = HuggingFaceEmbeddings(model_name=embedding_path)
docsearch = FAISS.from_documents(texts, embeddings)
print("loading LLM...")
model = HuggingFacePipeline.from_model_id(model_id=model_path,
task="text-generation",
model_kwargs={
"torch_dtype" : load_type,
"low_cpu_mem_usage" : True,
"temperature": 0.2,
"max_length": 1000,
"device_map": "auto",
"repetition_penalty":1.1}
)
if args.chain_type == "stuff":
PROMPT = PromptTemplate(
template=prompt_template, input_variables=["context", "question"]
)
chain_type_kwargs = {"prompt": PROMPT}
qa = RetrievalQA.from_chain_type(
llm=model,
chain_type="stuff",
retriever=docsearch.as_retriever(search_kwargs={"k": 1}),
chain_type_kwargs=chain_type_kwargs)
elif args.chain_type == "refine":
refine_prompt = PromptTemplate(
input_variables=["question", "existing_answer", "context_str"],
template=refine_prompt_template,
)
initial_qa_prompt = PromptTemplate(
input_variables=["context_str", "question"],
template=initial_qa_template,
)
chain_type_kwargs = {"question_prompt": initial_qa_prompt, "refine_prompt": refine_prompt}
qa = RetrievalQA.from_chain_type(
llm=model, chain_type="refine",
retriever=docsearch.as_retriever(search_kwargs={"k": 1}),
chain_type_kwargs=chain_type_kwargs)
while True:
query = input("请输入问题:")
if len(query.strip())==0:
break
print(qa.run(query))
================================================
FILE: scripts/langchain/langchain_sum.py
================================================
import argparse
import os
parser = argparse.ArgumentParser()
parser.add_argument('--file_path',required=True,type=str)
parser.add_argument('--model_path',required=True,type=str)
parser.add_argument('--gpus', default="0", type=str)
parser.add_argument('--chain_type', default="refine", type=str)
args = parser.parse_args()
os.environ["CUDA_VISIBLE_DEVICES"] = args.gpus
# os.environ['PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION']='python'
file_path = args.file_path
model_path = args.model_path
import torch
from langchain import HuggingFacePipeline
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.chains.summarize import load_summarize_chain
prompt_template = ("Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n请为以下文字写一段摘要:\n{text}\n\n### Response: ")
refine_template = (
"Below is an instruction that describes a task."
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n"
"已有一段摘要:{existing_answer}\n"
"现在还有一些文字,(如果有需要)你可以根据它们完善现有的摘要。"
"\n"
"{text}\n"
"\n"
"如果这段文字没有用,返回原来的摘要即可。请你生成一个最终的摘要。"
"\n\n### Response: "
)
if __name__ == '__main__':
load_type = torch.float16
if torch.cuda.is_available():
device = torch.device(0)
else:
device = torch.device('cpu')
text_splitter = RecursiveCharacterTextSplitter(chunk_size=600, chunk_overlap=100, length_function=len)
with open(file_path) as f:
text = f.read()
docs = text_splitter.create_documents([text])
print("loading LLM...")
model = HuggingFacePipeline.from_model_id(model_id=model_path,
task="text-generation",
model_kwargs={
"torch_dtype" : load_type,
"low_cpu_mem_usage" : True,
"temperature": 0.2,
"max_length": 1000,
"device_map": "auto",
"repetition_penalty":1.1}
)
PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
REFINE_PROMPT = PromptTemplate(
template=refine_template,input_variables=["existing_answer", "text"],
)
if args.chain_type == "stuff":
chain = load_summarize_chain(model, chain_type="stuff", prompt=PROMPT)
elif args.chain_type == "refine":
chain = load_summarize_chain(model, chain_type="refine", question_prompt=PROMPT, refine_prompt=REFINE_PROMPT)
print(chain.run(docs))
================================================
FILE: scripts/merge_llama_with_chinese_lora.py
================================================
"""
Usage:
python merge_llama_with_chinese_lora.py \
--base_model path/to/llama/model \
--lora_model path/to/first/lora/model [path/to/second/lora/model] \
--output_type [pth|huggingface] \
--output_dir path/to/output/dir
"""
import argparse
import json
import os
import gc
import torch
import peft
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer
from huggingface_hub import hf_hub_download
parser = argparse.ArgumentParser()
parser.add_argument('--base_model', default=None, required=True,
type=str, help="Please specify a base_model")
parser.add_argument('--lora_model', default=None, required=True,
type=str, help="Please specify LoRA models to be merged (ordered); use commas to separate multiple LoRA models.")
parser.add_argument('--offload_dir', default=None, type=str,
help="(Optional) Please specify a temp folder for offloading (useful for low-RAM machines). Default None (disable offload).")
parser.add_argument('--output_type', default='pth',choices=['pth','huggingface'], type=str,
help="save the merged model in pth or huggingface format.")
parser.add_argument('--output_dir', default='./', type=str)
emb_to_model_size = {
4096 : '7B',
5120 : '13B',
6656 : '33B',
8192 : '65B',
}
num_shards_of_models = {'7B': 1, '13B': 2, '33B': 4, '65B': 8}
params_of_models = {
'7B':
{
"dim": 4096,
"multiple_of": 256,
"n_heads": 32,
"n_layers": 32,
"norm_eps": 1e-06,
"vocab_size": -1,
},
'13B':
{
"dim": 5120,
"multiple_of": 256,
"n_heads": 40,
"n_layers": 40,
"norm_eps": 1e-06,
"vocab_size": -1,
},
'33B':
{
"dim": 6656,
"multiple_of": 256,
"n_heads": 52,
"n_layers": 60,
"norm_eps": 1e-06,
"vocab_size": -1,
},
'65B':
{
"dim": 8192,
"multiple_of": 256,
"n_heads": 64,
"n_layers": 80,
"norm_eps": 1e-05,
"vocab_size": -1,
},
}
def transpose(weight, fan_in_fan_out):
return weight.T if fan_in_fan_out else weight
# Borrowed and modified from https://github.com/tloen/alpaca-lora
def translate_state_dict_key(k):
k = k.replace("base_model.model.", "")
if k == "model.embed_tokens.weight":
return "tok_embeddings.weight"
elif k == "model.norm.weight":
return "norm.weight"
elif k == "lm_head.weight":
return "output.weight"
elif k.startswith("model.layers."):
layer = k.split(".")[2]
if k.endswith(".self_attn.q_proj.weight"):
return f"layers.{layer}.attention.wq.weight"
elif k.endswith(".self_attn.k_proj.weight"):
return f"layers.{layer}.attention.wk.weight"
elif k.endswith(".self_attn.v_proj.weight"):
return f"layers.{layer}.attention.wv.weight"
elif k.endswith(".self_attn.o_proj.weight"):
return f"layers.{layer}.attention.wo.weight"
elif k.endswith(".mlp.gate_proj.weight"):
return f"layers.{layer}.feed_forward.w1.weight"
elif k.endswith(".mlp.down_proj.weight"):
return f"layers.{layer}.feed_forward.w2.weight"
elif k.endswith(".mlp.up_proj.weight"):
return f"layers.{layer}.feed_forward.w3.weight"
elif k.endswith(".input_layernorm.weight"):
return f"layers.{layer}.attention_norm.weight"
elif k.endswith(".post_attention_layernorm.weight"):
return f"layers.{layer}.ffn_norm.weight"
elif k.endswith("rotary_emb.inv_freq") or "lora" in k:
return None
else:
print(layer, k)
raise NotImplementedError
else:
print(k)
raise NotImplementedError
def unpermute(w):
return (
w.view(n_heads, 2, dim // n_heads // 2, dim).transpose(1, 2).reshape(dim, dim)
)
def save_shards(model_sd, num_shards: int):
# Add the no_grad context manager
with torch.no_grad():
if num_shards == 1:
new_state_dict = {}
for k, v in model_sd.items():
new_k = translate_state_dict_key(k)
if new_k is not None:
if "wq" in new_k or "wk" in new_k:
new_state_dict[new_k] = unpermute(v)
else:
new_state_dict[new_k] = v
os.makedirs(output_dir, exist_ok=True)
print(f"Saving shard 1 of {num_shards} into {output_dir}/consolidated.00.pth")
torch.save(new_state_dict, output_dir + "/consolidated.00.pth")
with open(output_dir + "/params.json", "w") as f:
json.dump(params, f)
else:
new_state_dicts = [dict() for _ in range(num_shards)]
for k in list(model_sd.keys()):
v = model_sd[k]
new_k = translate_state_dict_key(k)
if new_k is not None:
if new_k=='tok_embeddings.weight':
print(f"Processing {new_k}")
assert v.size(1)%num_shards==0
splits = v.split(v.size(1)//num_shards,dim=1)
elif new_k=='output.weight':
print(f"Processing {new_k}")
if v.size(0)%num_shards==0:
splits = v.split(v.size(0)//num_shards,dim=0)
else:
size_list = [v.size(0)//num_shards] * num_shards
size_list[-1] += v.size(0)%num_shards
splits = v.split(size_list, dim=0) # 13B: size_list == [24976,24977]
elif new_k=='norm.weight':
print(f"Processing {new_k}")
splits = [v] * num_shards
elif 'ffn_norm.weight' in new_k:
print(f"Processing {new_k}")
splits = [v] * num_shards
elif 'attention_norm.weight' in new_k:
print(f"Processing {new_k}")
splits = [v] * num_shards
elif 'w1.weight' in new_k:
print(f"Processing {new_k}")
splits = v.split(v.size(0)//num_shards,dim=0)
elif 'w2.weight' in new_k:
print(f"Processing {new_k}")
splits = v.split(v.size(1)//num_shards,dim=1)
elif 'w3.weight' in new_k:
print(f"Processing {new_k}")
splits = v.split(v.size(0)//num_shards,dim=0)
elif 'wo.weight' in new_k:
print(f"Processing {new_k}")
splits = v.split(v.size(1)//num_shards,dim=1)
elif 'wv.weight' in new_k:
print(f"Processing {new_k}")
splits = v.split(v.size(0)//num_shards,dim=0)
elif "wq.weight" in new_k or "wk.weight" in new_k:
print(f"Processing {new_k}")
v = unpermute(v)
splits = v.split(v.size(0)//num_shards,dim=0)
else:
print(f"Unexpected key {new_k}")
raise ValueError
for sd,split in zip(new_state_dicts,splits):
sd[new_k] = split.clone()
del split
del splits
del model_sd[k],v
gc.collect() # Effectively enforce garbage collection
os.makedirs(output_dir, exist_ok=True)
for i,new_state_dict in enumerate(new_state_dicts):
print(f"Saving shard {i+1} of {num_shards} into {output_dir}/consolidated.0{i}.pth")
torch.save(new_state_dict, output_dir + f"/consolidated.0{i}.pth")
with open(output_dir + "/params.json", "w") as f:
print(f"Saving params.json into {output_dir}/params.json")
json.dump(params, f)
if __name__=='__main__':
args = parser.parse_args()
base_model_path = args.base_model
lora_model_paths = [s.strip() for s in args.lora_model.split(',') if len(s.strip())!=0]
output_dir = args.output_dir
output_type = args.output_type
offload_dir = args.offload_dir
print(f"Base model: {base_model_path}")
print(f"LoRA model(s) {lora_model_paths}:")
if offload_dir is not None:
# Load with offloading, which is useful for low-RAM machines.
# Note that if you have enough RAM, please use original method instead, as it is faster.
base_model = LlamaForCausalLM.from_pretrained(
base_model_path,
load_in_8bit=False,
torch_dtype=torch.float16,
offload_folder=offload_dir,
offload_state_dict=True,
low_cpu_mem_usage=True,
device_map={"": "cpu"},
)
else:
# Original method without offloading
base_model = LlamaForCausalLM.from_pretrained(
base_model_path,
load_in_8bit=False,
torch_dtype=torch.float16,
device_map={"": "cpu"},
)
## infer the model size from the checkpoint
embedding_size = base_model.get_input_embeddings().weight.size(1)
model_size = emb_to_model_size[embedding_size]
print(f"Peft version: {peft.__version__}")
print(f"Loading LoRA for {model_size} model")
lora_model = None
lora_model_sd = None
for lora_index, lora_model_path in enumerate(lora_model_paths):
print(f"Loading LoRA {lora_model_path}...")
tokenizer = LlamaTokenizer.from_pretrained(lora_model_path)
print(f"base_model vocab size: {base_model.get_input_embeddings().weight.size(0)}")
print(f"tokenizer vocab size: {len(tokenizer)}")
model_vocab_size = base_model.get_input_embeddings().weight.size(0)
assert len(tokenizer) >= model_vocab_size, \
(f"The vocab size of the tokenizer {len(tokenizer)} is smaller than the vocab size of the base model {model_vocab_size}\n"
"This is not the intended use. Please check your model and tokenizer.")
if model_vocab_size != len(tokenizer):
base_model.resize_token_embeddings(len(tokenizer))
print(f"Extended vocabulary size to {len(tokenizer)}")
first_weight = base_model.model.layers[0].self_attn.q_proj.weight
first_weight_old = first_weight.clone()
print(f"Loading LoRA weights")
if hasattr(peft.LoraModel,'merge_and_unload'):
try:
lora_model = PeftModel.from_pretrained(
base_model,
lora_model_path,
device_map={"": "cpu"},
torch_dtype=torch.float16,
)
except RuntimeError as e:
if '[49953, 4096]' in str(e):
print("The vocab size of the tokenizer does not match the vocab size of the LoRA weight. \n"
"Did you misuse the LLaMA tokenizer with the Alpaca-LoRA weight?\n"
"Make sure that you use LLaMA tokenizer with the LLaMA-LoRA weight and Alpaca tokenizer with the Alpaca-LoRA weight!")
raise e
assert torch.allclose(first_weight_old, first_weight)
print(f"Merging with merge_and_unload...")
base_model = lora_model.merge_and_unload()
else:
base_model_sd = base_model.state_dict()
try:
lora_model_sd = torch.load(os.path.join(lora_model_path,'adapter_model.bin'),map_location='cpu')
except FileNotFoundError:
print("Cannot find lora model on the disk. Downloading lora model from hub...")
filename = hf_hub_download(repo_id=lora_model_path,filename='adapter_model.bin')
lora_model_sd = torch.load(filename,map_location='cpu')
if 'base_model.model.model.embed_tokens.weight' in lora_model_sd:
assert lora_model_sd['base_model.model.model.embed_tokens.weight'].shape[0]==len(tokenizer), \
("The vocab size of the tokenizer does not match the vocab size of the LoRA weight. \n"
"Did you misuse the LLaMA tokenizer with the Alpaca-LoRA weight?\n"
"Make sure that you use LLaMA tokenizer with the LLaMA-LoRA weight and Alpaca tokenizer with the Alpaca-LoRA weight!")
lora_config = peft.LoraConfig.from_pretrained(lora_model_path)
lora_scaling = lora_config.lora_alpha / lora_config.r
fan_in_fan_out = lora_config.fan_in_fan_out
lora_keys = [k for k in lora_model_sd if 'lora_A' in k]
non_lora_keys = [k for k in lora_model_sd if not 'lora_' in k]
for k in non_lora_keys:
print(f"merging {k}")
original_k = k.replace('base_model.model.','')
base_model_sd[original_k].copy_(lora_model_sd[k])
for k in lora_keys:
print(f"merging {k}")
original_key = k.replace('.lora_A','').replace('base_model.model.','')
assert original_key in base_model_sd
lora_a_key = k
lora_b_key = k.replace('lora_A','lora_B')
base_model_sd[original_key] += (
transpose(lora_model_sd[lora_b_key].float() @ lora_model_sd[lora_a_key].float(),fan_in_fan_out) * lora_scaling
)
assert base_model_sd[original_key].dtype == torch.float16
# did we do anything?
assert not torch.allclose(first_weight_old, first_weight)
tokenizer.save_pretrained(output_dir)
if output_type=='huggingface':
print("Saving to Hugging Face format...")
LlamaForCausalLM.save_pretrained(base_model, output_dir) #, state_dict=deloreanized_sd)
else: # output_type=='pth
print("Saving to pth format...")
base_model_sd = base_model.state_dict()
del lora_model, base_model, lora_model_sd
params = params_of_models[model_size]
num_shards = num_shards_of_models[model_size]
n_layers = params["n_layers"]
n_heads = params["n_heads"]
dim = params["dim"]
dims_per_head = dim // n_heads
base = 10000.0
inv_freq = 1.0 / (base ** (torch.arange(0, dims_per_head, 2).float() / dims_per_head))
save_shards(model_sd=base_model_sd, num_shards=num_shards)
================================================
FILE: scripts/merge_llama_with_chinese_lora_low_mem.py
================================================
"""
Usage:
python merge_llama_with_chinese_lora_low_mem.py \
--base_model path/to/llama/model \
--lora_model path/to/first/lora[,path/to/second/lora] \
--output_type [pth|huggingface] \
--output_dir path/to/output/dir
"""
import argparse
import json
import os
import gc
import torch
import peft
from transformers import LlamaTokenizer
from transformers.modeling_utils import dtype_byte_size
from huggingface_hub import snapshot_download
import re
parser = argparse.ArgumentParser()
parser.add_argument('--base_model', default=None, required=True,
type=str, help="Please specify a base model")
parser.add_argument('--lora_model', default=None, required=True,
type=str, help="Please specify LoRA models to be merged (ordered); use commas to separate multiple LoRA models")
parser.add_argument('--output_type', default='pth',choices=['pth','huggingface'],
type=str, help="Save the merged model in pth or huggingface format")
parser.add_argument('--output_dir', default='./merged_model',
type=str, help="The output folder to save the merged model")
parser.add_argument('--verbose', default=False, action='store_true',
help="Show detailed messages")
emb_to_model_size = {
4096 : '7B',
5120 : '13B',
6656 : '33B',
8192 : '65B',
}
num_shards_of_models = {'7B': 1, '13B': 2, '33B': 4, '65B': 8}
params_of_models = {
'7B':
{
"dim": 4096,
"multiple_of": 256,
"n_heads": 32,
"n_layers": 32,
"norm_eps": 1e-06,
"vocab_size": -1,
},
'13B':
{
"dim": 5120,
"multiple_of": 256,
"n_heads": 40,
"n_layers": 40,
"norm_eps": 1e-06,
"vocab_size": -1,
},
'33B':
{
"dim": 6656,
"multiple_of": 256,
"n_heads": 52,
"n_layers": 60,
"norm_eps": 1e-06,
"vocab_size": -1,
},
'65B':
{
"dim": 8192,
"multiple_of": 256,
"n_heads": 64,
"n_layers": 80,
"norm_eps": 1e-05,
"vocab_size": -1,
},
}
def transpose(weight, fan_in_fan_out):
return weight.T if fan_in_fan_out else weight
# Borrowed and modified from https://github.com/tloen/alpaca-lora
def translate_state_dict_key(k):
k = k.replace("base_model.model.", "")
if k == "model.embed_tokens.weight":
return "tok_embeddings.weight"
elif k == "model.norm.weight":
return "norm.weight"
elif k == "lm_head.weight":
return "output.weight"
elif k.startswith("model.layers."):
layer = k.split(".")[2]
if k.endswith(".self_attn.q_proj.weight"):
return f"layers.{layer}.attention.wq.weight"
elif k.endswith(".self_attn.k_proj.weight"):
return f"layers.{layer}.attention.wk.weight"
elif k.endswith(".self_attn.v_proj.weight"):
return f"layers.{layer}.attention.wv.weight"
elif k.endswith(".self_attn.o_proj.weight"):
return f"layers.{layer}.attention.wo.weight"
elif k.endswith(".mlp.gate_proj.weight"):
return f"layers.{layer}.feed_forward.w1.weight"
elif k.endswith(".mlp.down_proj.weight"):
return f"layers.{layer}.feed_forward.w2.weight"
elif k.endswith(".mlp.up_proj.weight"):
return f"layers.{layer}.feed_forward.w3.weight"
elif k.endswith(".input_layernorm.weight"):
return f"layers.{layer}.attention_norm.weight"
elif k.endswith(".post_attention_layernorm.weight"):
return f"layers.{layer}.ffn_norm.weight"
elif k.endswith("rotary_emb.inv_freq") or "lora" in k:
return None
else:
print(layer, k)
raise NotImplementedError
else:
print(k)
raise NotImplementedError
def unpermute(w):
return (
w.view(n_heads, 2, dim // n_heads // 2, dim).transpose(1, 2).reshape(dim, dim)
)
def save_shards(model_sd, num_shards: int, prefix="", verbose=False):
"""
Convert and save the HF format weights to PTH format weights
"""
with torch.no_grad():
if num_shards == 1:
new_state_dict = {}
for k, v in model_sd.items():
new_k = translate_state_dict_key(k)
if new_k is not None:
if "wq" in new_k or "wk" in new_k:
new_state_dict[new_k] = unpermute(v)
else:
new_state_dict[new_k] = v
os.makedirs(output_dir, exist_ok=True)
print(f"Saving shard 1 of {num_shards} into {output_dir}/{prefix}consolidated.00.pth")
torch.save(new_state_dict, output_dir + f"/{prefix}consolidated.00.pth")
else:
new_state_dicts = [dict() for _ in range(num_shards)]
for k in list(model_sd.keys()):
v = model_sd[k]
new_k = translate_state_dict_key(k)
if new_k is not None:
if new_k=='tok_embeddings.weight':
assert v.size(1)%num_shards==0
splits = v.split(v.size(1)//num_shards,dim=1)
elif new_k=='output.weight':
if v.size(0)%num_shards==0:
splits = v.split(v.size(0)//num_shards,dim=0)
else:
size_list = [v.size(0)//num_shards] * num_shards
size_list[-1] += v.size(0)%num_shards
splits = v.split(size_list, dim=0) # 13B: size_list == [24976,24977]
elif new_k=='norm.weight':
splits = [v] * num_shards
elif 'ffn_norm.weight' in new_k:
splits = [v] * num_shards
elif 'attention_norm.weight' in new_k:
splits = [v] * num_shards
elif 'w1.weight' in new_k:
splits = v.split(v.size(0)//num_shards,dim=0)
elif 'w2.weight' in new_k:
splits = v.split(v.size(1)//num_shards,dim=1)
elif 'w3.weight' in new_k:
splits = v.split(v.size(0)//num_shards,dim=0)
elif 'wo.weight' in new_k:
splits = v.split(v.size(1)//num_shards,dim=1)
elif 'wv.weight' in new_k:
splits = v.split(v.size(0)//num_shards,dim=0)
elif "wq.weight" in new_k or "wk.weight" in new_k:
v = unpermute(v)
splits = v.split(v.size(0)//num_shards,dim=0)
else:
print(f"Unexpected key {new_k}")
raise ValueError
if verbose:
print(f"Processing {new_k}")
for sd,split in zip(new_state_dicts,splits):
sd[new_k] = split.clone()
del split
del splits
del model_sd[k],v
gc.collect() # Effectively enforce garbage collection
os.makedirs(output_dir, exist_ok=True)
for i,new_state_dict in enumerate(new_state_dicts):
print(f"Saving shard {i+1} of {num_shards} into {output_dir}/{prefix}consolidated.0{i}.pth")
torch.save(new_state_dict, output_dir + f"/{prefix}consolidated.0{i}.pth")
def merge_shards(output_dir, num_shards: int):
ckpt_filenames = sorted([f for f in os.listdir(output_dir) if re.match('L(\d+)-consolidated.(\d+).pth',f)])
for i in range(num_shards):
shards_filenames = sorted([f for f in ckpt_filenames if re.match(f'L(\d+)-consolidated.0{i}.pth',f)])
print(f"Loading {shards_filenames} ...")
shards_dicts = [torch.load(os.path.join(output_dir,fn)) for fn in shards_filenames]
shards_merged = {}
for d in shards_dicts:
shards_merged |= d
print(f"Saving the merged shard to " + os.path.join(output_dir, f"consolidated.0{i}.pth"))
torch.save(shards_merged, os.path.join(output_dir, f"consolidated.0{i}.pth"))
print("Cleaning up...")
del shards_merged
for d in shards_dicts:
del d
del shards_dicts
gc.collect() # Effectively enforce garbage collection
for fn in shards_filenames:
os.remove(os.path.join(output_dir,fn))
if __name__=='__main__':
args = parser.parse_args()
base_model_path = args.base_model
lora_model_paths = [s.strip() for s in args.lora_model.split(',') if len(s.strip())!=0]
output_dir = args.output_dir
output_type = args.output_type
os.makedirs(output_dir, exist_ok=True)
print(f"Base model: {base_model_path}")
print(f"LoRA model(s) {lora_model_paths}:")
tokenizers_and_loras = []
for lora_model_path in lora_model_paths:
print(f"Loading {lora_model_path}")
if not os.path.exists(lora_model_path):
print("Cannot find lora model on the disk. Downloading lora model from hub...")
lora_model_path = snapshot_download(repo_id=lora_model_path)
tokenizer = LlamaTokenizer.from_pretrained(lora_model_path)
lora_config = peft.LoraConfig.from_pretrained(lora_model_path)
lora_state_dict = torch.load(os.path.join(lora_model_path,'adapter_model.bin'),map_location='cpu')
if 'base_model.model.model.embed_tokens.weight' in lora_state_dict:
lora_vocab_size = lora_state_dict['base_model.model.model.embed_tokens.weight'].shape[0]
assert lora_vocab_size==len(tokenizer), \
(f"The vocab size of the tokenizer {len(tokenizer)} does not match the vocab size of the LoRA weight {lora_vocab_size}.\n"
"Make sure that you use LLaMA tokenizer with the LLaMA-LoRA weight and Alpaca tokenizer with the Alpaca-LoRA weight!")
tokenizers_and_loras.append(
{
"tokenizer" :tokenizer,
"state_dict" :lora_state_dict,
"config": lora_config,
"scaling": lora_config.lora_alpha / lora_config.r,
"fan_in_fan_out" : lora_config.fan_in_fan_out,
})
if len(tokenizers_and_loras)==2:
t1_vocab_size = len(tokenizers_and_loras[0]["tokenizer"])
t2_vocab_size = len(tokenizers_and_loras[1]["tokenizer"])
assert t1_vocab_size<=t2_vocab_size, \
(f"The vocab size of the first tokenizer is {t1_vocab_size}\n"
f"The vocab size of the second tokenizer is {t2_vocab_size}, found to be smaller than {t1_vocab_size}\n"
"This is not the intended use. Please check your model and tokenizer.")
if not os.path.exists(base_model_path):
print("Cannot find lora model on the disk. Downloading lora model from hub...")
base_model_path = snapshot_download(repo_id=base_model_path)
ckpt_filenames = sorted([f for f in os.listdir(base_model_path) if re.match('pytorch_model-(\d+)-of-(\d+).bin',f)])
embedding_size = None
model_size = None
total_size = 0
for index, filename in enumerate(ckpt_filenames):
print(f"Loading ckpt {filename}")
state_dict = torch.load(os.path.join(base_model_path,filename), map_location='cpu')
if index == 0:
embedding_size = state_dict['model.embed_tokens.weight'].shape[1]
model_size = emb_to_model_size[embedding_size]
if output_type=='pth':
params = params_of_models[model_size]
num_shards = num_shards_of_models[model_size]
n_layers = params["n_layers"]
n_heads = params["n_heads"]
dim = params["dim"]
dims_per_head = dim // n_heads
base = 10000.0
inv_freq = 1.0 / (base ** (torch.arange(0, dims_per_head, 2).float() / dims_per_head))
print("Merging...")
for k in state_dict:
for tl_idx, t_and_l in enumerate(tokenizers_and_loras):
saved_key = 'base_model.model.'+k
lora_key_A = saved_key.replace('.weight','.lora_A.weight')
if saved_key in t_and_l['state_dict']:
if args.verbose:
print(f"copying {saved_key} from {tl_idx}-th LoRA weight to {k}")
state_dict[k] = t_and_l['state_dict'][saved_key].half().clone() # do we need half()?
if lora_key_A in t_and_l['state_dict']:
lora_key_B = lora_key_A.replace('lora_A.weight','lora_B.weight')
if args.verbose:
print(f"merging {lora_key_A} and lora_B.weight form {tl_idx}-th LoRA weight to {k}")
state_dict[k] += (
transpose(
t_and_l['state_dict'][lora_key_B].float()
@ t_and_l['state_dict'][lora_key_A].float(), t_and_l['fan_in_fan_out']) * t_and_l['scaling']
)
weight_size = state_dict[k].numel() * dtype_byte_size(state_dict[k].dtype)
total_size += weight_size
if output_type=='huggingface':
print(f"Saving ckpt {filename} to {output_dir} in HF format...")
torch.save(state_dict,os.path.join(output_dir, filename))
elif output_type=='pth':
print(f"Converting to pth format...")
save_shards(model_sd=state_dict, num_shards=num_shards,prefix=f"L{index+1}-", verbose=args.verbose)
del state_dict
gc.collect() # Effectively enforce garbage collection
print(f"Saving tokenizer")
tokenizers_and_loras[-1]['tokenizer'].save_pretrained(output_dir)
if output_type == 'pth':
with open(output_dir + "/params.json", "w") as f:
print(f"Saving params.json into {output_dir}/params.json")
json.dump(params, f)
merge_shards(output_dir, num_shards=num_shards)
if output_type=='huggingface':
configs = ('config.json', 'generation_config.json', 'pytorch_model.bin.index.json')
for config in configs:
if os.path.exists(os.path.join(base_model_path, config)):
print(f"Saving {config}")
with open(os.path.join(base_model_path, config),'r') as f:
obj = json.load(f)
if config=='config.json':
obj['vocab_size'] = len(tokenizers_and_loras[-1]['tokenizer'])
if config=='pytorch_model.bin.index.json':
obj['metadata']['total_size'] = total_size
with open(os.path.join(output_dir, config), 'w') as f:
json.dump(obj, f, indent=2)
print("Done.")
================================================
FILE: scripts/merge_tokenizer/merge_tokenizers.py
================================================
import os
os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"]="python"
from transformers import LlamaTokenizer
from sentencepiece import sentencepiece_model_pb2 as sp_pb2_model
import sentencepiece as spm
import argparse
parser = argparse.ArgumentParser()
parser.add_argument('--llama_tokenizer_dir', default=None, type=str, required=True)
parser.add_argument('--chinese_sp_model_file', default='./chinese_sp.model', type=str)
args = parser.parse_args()
llama_tokenizer_dir = args.llama_tokenizer_dir
chinese_sp_model_file = args.chinese_sp_model_file
# load
llama_tokenizer = LlamaTokenizer.from_pretrained(llama_tokenizer_dir)
chinese_sp_model = spm.SentencePieceProcessor()
chinese_sp_model.Load(chinese_sp_model_file)
llama_spm = sp_pb2_model.ModelProto()
llama_spm.ParseFromString(llama_tokenizer.sp_model.serialized_model_proto())
chinese_spm = sp_pb2_model.ModelProto()
chinese_spm.ParseFromString(chinese_sp_model.serialized_model_proto())
# print number of tokens
print(len(llama_tokenizer),len(chinese_sp_model))
print(llama_tokenizer.all_special_tokens)
print(llama_tokenizer.all_special_ids)
print(llama_tokenizer.special_tokens_map)
## Add Chinese tokens to LLaMA tokenizer
llama_spm_tokens_set=set(p.piece for p in llama_spm.pieces)
print(len(llama_spm_tokens_set))
print(f"Before:{len(llama_spm_tokens_set)}")
for p in chinese_spm.pieces:
piece = p.piece
if piece not in llama_spm_tokens_set:
new_p = sp_pb2_model.ModelProto().SentencePiece()
new_p.piece = piece
new_p.score = 0
llama_spm.pieces.append(new_p)
print(f"New model pieces: {len(llama_spm.pieces)}")
## Save
output_sp_dir = 'merged_tokenizer_sp'
output_hf_dir = 'merged_tokenizer_hf' # the path to save Chinese-LLaMA tokenizer
os.makedirs(output_sp_dir,exist_ok=True)
with open(output_sp_dir+'/chinese_llama.model', 'wb') as f:
f.write(llama_spm.SerializeToString())
tokenizer = LlamaTokenizer(vocab_file=output_sp_dir+'/chinese_llama.model')
tokenizer.save_pretrained(output_hf_dir)
print(f"Chinese-LLaMA tokenizer has been saved to {output_hf_dir}")
# Test
llama_tokenizer = LlamaTokenizer.from_pretrained(llama_tokenizer_dir)
chinese_llama_tokenizer = LlamaTokenizer.from_pretrained(output_hf_dir)
print(tokenizer.all_special_tokens)
print(tokenizer.all_special_ids)
print(tokenizer.special_tokens_map)
text='''白日依山尽,黄河入海流。欲穷千里目,更上一层楼。
The primary use of LLaMA is research on large language models, including'''
print("Test text:\n",text)
print(f"Tokenized by LLaMA tokenizer:{llama_tokenizer.tokenize(text)}")
print(f"Tokenized by Chinese-LLaMA tokenizer:{chinese_llama_tokenizer.tokenize(text)}")
================================================
FILE: scripts/openai_server_demo/README.md
================================================
# OPENAI API DEMO
> 更加详细的OPENAI API信息:
这是一个使用fastapi实现的简易的仿OPENAI API风格的服务器DEMO,您可以使用这个API DEMO来快速搭建基于中文大模型的个人网站以及其他有趣的WEB DEMO。
## 部署方式
安装依赖
``` shell
pip install fastapi uvicorn shortuuid
```
启动脚本
``` shell
python scripts/openai_server_demo/openai_api_server.py --base_model /path/to/base_model --lora_model /path/to/lora_model --gpus 0,1
```
### 参数说明
`--base_model {base_model}`:存放HF格式的LLaMA模型权重和配置文件的目录,可以是合并后的中文Alpaca或Alpaca Plus模型(此时无需提供`--lora_model`),也可以是转后HF格式后的原版LLaMA模型(需要提供`--lora_model`)
`--lora_model {lora_model}`:中文Alpaca LoRA解压后文件所在目录,也可使用🤗Model Hub模型调用名称。若不提供此参数,则只加载--base_model指定的模型
`--tokenizer_path {tokenizer_path}`:存放对应tokenizer的目录。若不提供此参数,则其默认值与`--lora_model`相同;若也未提供`--lora_model`参数,则其默认值与--base_model相同
`--only_cpu`: 仅使用CPU进行推理
`--gpus {gpu_ids}`: 指定使用的GPU设备编号,默认为0。如使用多张GPU,以逗号分隔,如0,1,2
`--load_in_8bit`: 使用8bit模型进行推理,可节省显存,但可能影响模型效果
## API文档
### 文字接龙(completion)
> 有关completion的中文翻译,李宏毅教授将其翻译为文字接龙
最基础的API接口,输入prompt,输出语言大模型的文字接龙(completion)结果。
API DEMO内置有alpaca prompt模板,prompt将被套入alpaca instruction模板中,这里输入的prompt应更像指令而非对话。
#### 快速体验completion接口
请求command:
``` shell
curl http://localhost:19327/v1/completions \
-H "Content-Type: application/json" \
-d '{
"prompt": "告诉我中国的首都在哪里"
}'
```
json返回体:
``` json
{
"id": "cmpl-3watqWsbmYgbWXupsSik7s",
"object": "text_completion",
"created": 1686067311,
"model": "chinese-llama-alpaca",
"choices": [
{
"index": 0,
"text": "中国的首都是北京。"
}
]
}
```
#### completion接口高级参数
请求command:
``` shell
curl http://localhost:19327/v1/completions \
-H "Content-Type: application/json" \
-d '{
"prompt": "告诉我中国和美国分别各有哪些优点缺点",
"max_tokens": 90,
"temperature": 0.7,
"num_beams": 4,
"top_k": 40
}'
```
json返回体:
``` json
{
"id": "cmpl-PvVwfMq2MVWHCBKiyYJfKM",
"object": "text_completion",
"created": 1686149471,
"model": "chinese-llama-alpaca",
"choices": [
{
"index": 0,
"text": "中国的优点是拥有丰富的文化和历史,而美国的优点是拥有先进的科技和经济体系。"
}
]
}
```
#### completion接口高级参数说明
> 有关Decoding策略,更加详细的细节可以参考 该文章详细讲述了三种Llama会用到的Decoding策略:Greedy Decoding、Random Sampling 和 Beam Search,Decoding策略是top_k、top_p、temperature、num_beam等高级参数的基础。
`prompt`: 生成文字接龙(completion)的提示。
`max_tokens`: 新生成的句子的token长度。
`temperature`: 在0和2之间选择的采样温度。较高的值如0.8会使输出更加随机,而较低的值如0.2则会使其输出更具有确定性。temperature越高,使用随机采样最为decoding的概率越大。
`num_beams`: 当搜索策略为束搜索(beam search)时,该参数为在束搜索(beam search)中所使用的束个数,当num_beams=1时,实际上就是贪心搜索(greedy decoding)。
`top_k`: 在随机采样(random sampling)时,前top_k高概率的token将作为候选token被随机采样。
`top_p`: 在随机采样(random sampling)时,累积概率超过top_p的token将作为候选token被随机采样,越低随机性越大,举个例子,当top_p设定为0.6时,概率前5的token概率分别为{0.23, 0.20, 0.18, 0.11, 0.10}时,前三个token的累积概率为0.61,那么第4个token将被过滤掉,只有前三的token将作为候选token被随机采样。
`repetition_penalty`: 重复惩罚,具体细节可以参考这篇文章: 。
`do_sample`: 启用随机采样策略。默认为true。
### 聊天(chat completion)
聊天接口支持多轮对话
#### 快速体验聊天接口
请求command:
``` shell
curl http://localhost:19327/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user","message": "给我讲一些有关杭州的故事吧"}
],
"repetition_penalty": 1.0
}'
```
json返回体:
``` json
{
"id": "chatcmpl-5L99pYoW2ov5ra44Ghwupt",
"object": "chat.completion",
"created": 1686143170,
"model": "chinese-llama-alpaca",
"choices": [
{
"index": 0,
"message": {
"role": "user",
"content": "给我讲一些有关杭州的故事吧"
}
},
{
"index": 1,
"message": {
"role": "assistant",
"content": "好的,请问您对杭州有什么特别的偏好吗?"
}
}
]
}
```
#### 多轮对话
请求command:
``` shell
curl http://localhost:19327/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"messages": [
{"role": "user","message": "给我讲一些有关杭州的故事吧"},
{"role": "assistant","message": "好的,请问您对杭州有什么特别的偏好吗?"},
{"role": "user","message": "我比较喜欢和西湖,可以给我讲一下西湖吗"}
],
"repetition_penalty": 1.0
}'
```
json返回体:
``` json
{
"id": "chatcmpl-hmvrQNPGYTcLtmYruPJbv6",
"object": "chat.completion",
"created": 1686143439,
"model": "chinese-llama-alpaca",
"choices": [
{
"index": 0,
"message": {
"role": "user",
"content": "给我讲一些有关杭州的故事吧"
}
},
{
"index": 1,
"message": {
"role": "assistant",
"content": "好的,请问您对杭州有什么特别的偏好吗?"
}
},
{
"index": 2,
"message": {
"role": "user",
"content": "我比较喜欢和西湖,可以给我讲一下西湖吗"
}
},
{
"index": 3,
"message": {
"role": "assistant",
"content": "是的,西湖是杭州最著名的景点之一,它被誉为“人间天堂”。 <\\s>"
}
}
]
}
```
#### 聊天接口高级参数说明
`prompt`: 生成文字接龙(completion)的提示。
`max_tokens`: 新生成的句子的token长度。
`temperature`: 在0和2之间选择的采样温度。较高的值如0.8会使输出更加随机,而较低的值如0.2则会使其输出更具有确定性。temperature越高,使用随机采样最为decoding的概率越大。
`num_beams`: 当搜索策略为束搜索(beam search)时,该参数为在束搜索(beam search)中所使用的束个数,当num_beams=1时,实际上就是贪心搜索(greedy decoding)。
`top_k`: 在随机采样(random sampling)时,前top_k高概率的token将作为候选token被随机采样。
`top_p`: 在随机采样(random sampling)时,累积概率超过top_p的token将作为候选token被随机采样,越低随机性越大,举个例子,当top_p设定为0.6时,概率前5的token概率分别为[0.23, 0.20, 0.18, 0.11, 0.10]时,前三个token的累积概率为0.61,那么第4个token将被过滤掉,只有前三的token将作为候选token被随机采样。
`repetition_penalty`: 重复惩罚,具体细节可以参考这篇文章: 。
`do_sample`: 启用随机采样策略。默认为true。
### 文本嵌入向量(text embedding)
文本嵌入向量有很多作用,包括但不限于基于大型文档问答、总结一本书中的内容、为大语言模型找到与当前用户输入最相近的记忆等等。
请求command:
``` shell
curl http://localhost:19327/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"input": "今天天气真不错"
}'
```
json返回体:
``` json
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [
0.003643923671916127,
-0.0072653163224458694,
0.0075545101426541805,
....,
0.0045851171016693115
],
"index": 0
}
],
"model": "chinese-llama-alpaca"
}
```
embedding向量的长度与所使用模型hidden size相同。比如当使用7B模型时,embedding的长度为4096。
================================================
FILE: scripts/openai_server_demo/openai_api_protocol.py
================================================
from typing import Optional, List, Dict, Any, Union
import time
import shortuuid
from pydantic import BaseModel, Field
class ChatCompletionRequest(BaseModel):
model: str = "chinese-llama-alpaca"
messages: Union[str, List[Dict[str, str]]]
temperature: Optional[float] = 0.7
top_p: Optional[float] = 1.0
top_k: Optional[int] = 40
n: Optional[int] = 1
max_tokens: Optional[int] = 128
num_beams: Optional[int] = 1
stop: Optional[Union[str, List[str]]] = None
stream: Optional[bool] = False
repetition_penalty: Optional[float] = 1.0
user: Optional[str] = None
do_sample: Optional[bool] = True
class ChatMessage(BaseModel):
role: str
content: str
class ChatCompletionResponseChoice(BaseModel):
index: int
message: ChatMessage
class ChatCompletionResponse(BaseModel):
id: str = Field(default_factory=lambda: f"chatcmpl-{shortuuid.random()}")
object: str = "chat.completion"
created: int = Field(default_factory=lambda: int(time.time()))
model: str = "chinese-llama-alpaca"
choices: List[ChatCompletionResponseChoice]
class EmbeddingsRequest(BaseModel):
input: Union[str, List[Any]]
user: Optional[str] = None
class EmbeddingsResponse(BaseModel):
object: str = "list"
data: List[Dict[str, Any]]
model: str = "chinese-llama-alpaca"
class CompletionRequest(BaseModel):
prompt: Union[str, List[Any]]
temperature: Optional[float] = 0.1
n: Optional[int] = 1
max_tokens: Optional[int] = 128
stop: Optional[Union[str, List[str]]] = None
stream: Optional[bool] = False
top_p: Optional[float] = 0.75
top_k: Optional[int] = 40
num_beams: Optional[int] = 1
logprobs: Optional[int] = None
echo: Optional[bool] = False
repetition_penalty: Optional[float] = 1.0
user: Optional[str] = None
do_sample: Optional[bool] = True
class CompletionResponseChoice(BaseModel):
index: int
text: str
class CompletionResponse(BaseModel):
id: Optional[str] = Field(default_factory=lambda: f"cmpl-{shortuuid.random()}")
object: Optional[str] = "text_completion"
created: Optional[int] = Field(default_factory=lambda: int(time.time()))
model: Optional[str] = 'chinese-llama-alpaca'
choices: List[CompletionResponseChoice]
================================================
FILE: scripts/openai_server_demo/openai_api_server.py
================================================
import argparse
import os
from fastapi import FastAPI
import uvicorn
parser = argparse.ArgumentParser()
parser.add_argument('--base_model', default=None, type=str, required=True)
parser.add_argument('--lora_model', default=None, type=str,help="If None, perform inference on the base model")
parser.add_argument('--tokenizer_path',default=None,type=str)
parser.add_argument('--gpus', default="0", type=str)
parser.add_argument('--load_in_8bit',action='store_true', help='use 8 bit model')
parser.add_argument('--only_cpu',action='store_true',help='only use CPU for inference')
parser.add_argument('--alpha',type=str,default="1.0", help="The scaling factor of NTK method, can be a float or 'auto'. ")
args = parser.parse_args()
load_in_8bit = args.load_in_8bit
if args.only_cpu is True:
args.gpus = ""
os.environ["CUDA_VISIBLE_DEVICES"] = args.gpus
import torch
import torch.nn.functional as F
from transformers import LlamaForCausalLM, LlamaTokenizer, GenerationConfig
from peft import PeftModel
from patches import apply_attention_patch, apply_ntk_scaling_patch
apply_attention_patch(use_memory_efficient_attention=True)
apply_ntk_scaling_patch(args.alpha)
from openai_api_protocol import (
ChatCompletionRequest,
ChatCompletionResponse,
ChatMessage,
ChatCompletionResponseChoice,
CompletionRequest,
CompletionResponse,
CompletionResponseChoice,
EmbeddingsRequest,
EmbeddingsResponse,
)
load_type = torch.float16
if torch.cuda.is_available():
device = torch.device(0)
else:
device = torch.device('cpu')
if args.tokenizer_path is None:
args.tokenizer_path = args.lora_model
if args.lora_model is None:
args.tokenizer_path = args.base_model
tokenizer = LlamaTokenizer.from_pretrained(args.tokenizer_path)
base_model = LlamaForCausalLM.from_pretrained(
args.base_model,
load_in_8bit=load_in_8bit,
torch_dtype=load_type,
low_cpu_mem_usage=True,
device_map='auto' if not args.only_cpu else None,
)
model_vocab_size = base_model.get_input_embeddings().weight.size(0)
tokenzier_vocab_size = len(tokenizer)
print(f"Vocab of the base model: {model_vocab_size}")
print(f"Vocab of the tokenizer: {tokenzier_vocab_size}")
if model_vocab_size!=tokenzier_vocab_size:
assert tokenzier_vocab_size > model_vocab_size
print("Resize model embeddings to fit tokenizer")
base_model.resize_token_embeddings(tokenzier_vocab_size)
if args.lora_model is not None:
print("loading peft model")
model = PeftModel.from_pretrained(base_model, args.lora_model,torch_dtype=load_type,device_map='auto',)
else:
model = base_model
if device==torch.device('cpu'):
model.float()
model.eval()
def generate_completion_prompt(instruction: str):
"""Generate prompt for completion"""
return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response: """
def generate_chat_prompt(messages: list):
"""Generate prompt for chat completion"""
system_msg = '''Below is an instruction that describes a task. Write a response that appropriately completes the request.'''
for msg in messages:
if msg.role == 'system':
system_msg = msg.content
prompt = f"{system_msg}\n\n"
for msg in messages:
if msg.role == 'system':
continue
if msg.role == 'assistant':
prompt += f"### Response: {msg.content}\n\n"
if msg.role == 'user':
prompt += f"### Instruction:\n{msg.content}\n\n"
prompt += "### Response: "
return prompt
def predict(
input,
max_new_tokens=128,
top_p=0.75,
temperature=0.1,
top_k=40,
num_beams=4,
repetition_penalty=1.0,
do_sample=True,
**kwargs,
):
"""
Main inference method
type(input) == str -> /v1/completions
type(input) == list -> /v1/chat/completions
"""
if isinstance(input, str):
prompt = generate_completion_prompt(input)
else:
prompt = generate_chat_prompt(input)
inputs = tokenizer(prompt, return_tensors="pt")
input_ids = inputs["input_ids"].to(device)
generation_config = GenerationConfig(
temperature=temperature,
top_p=top_p,
top_k=top_k,
num_beams=num_beams,
do_sample=do_sample,
**kwargs,
)
with torch.no_grad():
generation_output = model.generate(
input_ids=input_ids,
generation_config=generation_config,
return_dict_in_generate=True,
output_scores=False,
max_new_tokens=max_new_tokens,
repetition_penalty=float(repetition_penalty),
)
s = generation_output.sequences[0]
output = tokenizer.decode(s, skip_special_tokens=True)
output = output.split("### Response:")[-1].strip()
return output
def get_embedding(input):
"""Get embedding main function"""
with torch.no_grad():
if tokenizer.pad_token == None:
tokenizer.add_special_tokens({'pad_token': '[PAD]'})
encoding = tokenizer(
input, padding=True, return_tensors="pt"
)
input_ids = encoding["input_ids"].to(device)
attention_mask = encoding["attention_mask"].to(device)
model_output = model(
input_ids, attention_mask, output_hidden_states=True
)
data = model_output.hidden_states[-1]
mask = attention_mask.unsqueeze(-1).expand(data.size()).float()
masked_embeddings = data * mask
sum_embeddings = torch.sum(masked_embeddings, dim=1)
seq_length = torch.sum(mask, dim=1)
embedding = sum_embeddings / seq_length
normalized_embeddings = F.normalize(embedding, p=2, dim=1)
ret = normalized_embeddings.squeeze(0).tolist()
return ret
app = FastAPI()
@app.post("/v1/chat/completions")
async def create_chat_completion(request: ChatCompletionRequest):
"""Creates a completion for the chat message"""
msgs = request.messages
if isinstance(msgs, str):
msgs = [ChatMessage(role='user',content=msgs)]
else:
msgs = [ChatMessage(role=x['role'],content=x['message']) for x in msgs]
output = predict(
input=msgs,
max_new_tokens=request.max_tokens,
top_p=request.top_p,
top_k=request.top_k,
temperature=request.temperature,
num_beams=request.num_beams,
repetition_penalty=request.repetition_penalty,
do_sample=request.do_sample,
)
choices = [ChatCompletionResponseChoice(index = i, message = msg) for i, msg in enumerate(msgs)]
choices += [ChatCompletionResponseChoice(index = len(choices), message = ChatMessage(role='assistant',content=output))]
return ChatCompletionResponse(choices = choices)
@app.post("/v1/completions")
async def create_completion(request: CompletionRequest):
"""Creates a completion"""
output = predict(
input=request.prompt,
max_new_tokens=request.max_tokens,
top_p=request.top_p,
top_k=request.top_k,
temperature=request.temperature,
num_beams=request.num_beams,
repetition_penalty=request.repetition_penalty,
do_sample=request.do_sample,
)
choices = [CompletionResponseChoice(index = 0, text = output)]
return CompletionResponse(choices = choices)
@app.post("/v1/embeddings")
async def create_embeddings(request: EmbeddingsRequest):
"""Creates text embedding"""
embedding = get_embedding(request.input)
data = [{
"object": "embedding",
"embedding": embedding,
"index": 0
}]
return EmbeddingsResponse(data=data)
if __name__ == "__main__":
log_config = uvicorn.config.LOGGING_CONFIG
log_config["formatters"]["access"]["fmt"] = "%(asctime)s - %(levelname)s - %(message)s"
log_config["formatters"]["default"]["fmt"] = "%(asctime)s - %(levelname)s - %(message)s"
uvicorn.run(app, host='0.0.0.0', port=19327, workers=1, log_config=log_config)
================================================
FILE: scripts/openai_server_demo/patches.py
================================================
import torch
from torch import nn
from typing import Optional, Tuple, Union
import transformers
from transformers.models.llama.modeling_llama import apply_rotary_pos_emb, rotate_half
import math
try:
from xformers import ops as xops
except ImportError:
xops = None
print(
"Xformers is not installed correctly. If you want to use memory_efficient_attention use the following command to install Xformers\npip install xformers."
)
STORE_KV_BEFORE_ROPE = False
USE_MEM_EFF_ATTENTION = False
ALPHA = 1.0
def apply_rotary_pos_emb_single(q, cos, sin, position_ids):
# The first two dimensions of cos and sin are always 1, so we can `squeeze` them.
cos = cos.squeeze(1).squeeze(0) # [seq_len, dim]
sin = sin.squeeze(1).squeeze(0) # [seq_len, dim]
cos = cos[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim]
sin = sin[position_ids].unsqueeze(1) # [bs, 1, seq_len, dim]
q_embed = (q * cos) + (rotate_half(q) * sin)
return q_embed
def xformers_forward(
self,
hidden_states: torch.Tensor,
attention_mask: Optional[torch.Tensor] = None,
position_ids: Optional[torch.LongTensor] = None,
past_key_value: Optional[Tuple[torch.Tensor]] = None,
output_attentions: bool = False,
use_cache: bool = False,
) -> Tuple[torch.Tensor, Optional[torch.Tensor], Optional[Tuple[torch.Tensor]]]:
bsz, q_len, _ = hidden_states.size()
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
key_states = self.k_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
value_states = self.v_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
kv_seq_len = key_states.shape[-2]
if past_key_value is not None:
kv_seq_len += past_key_value[0].shape[-2]
if STORE_KV_BEFORE_ROPE is False:
cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin, position_ids)
# [bsz, nh, t, hd]
if past_key_value is not None:
# reuse k, v, self_attention
key_states = torch.cat([past_key_value[0], key_states], dim=2)
value_states = torch.cat([past_key_value[1], value_states], dim=2)
past_key_value = (key_states, value_states) if use_cache else None
else:
if past_key_value is not None:
# reuse k, v, self_attention
key_states = torch.cat([past_key_value[0], key_states], dim=2)
value_states = torch.cat([past_key_value[1], value_states], dim=2)
past_key_value = (key_states, value_states) if use_cache else None
cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
query_states = apply_rotary_pos_emb_single(query_states, cos, sin, position_ids)
position_ids = torch.arange(kv_seq_len, dtype=torch.long, device=cos.device)
position_ids = position_ids.unsqueeze(0).view(-1, kv_seq_len)
key_states = apply_rotary_pos_emb_single(key_states, cos, sin, position_ids)
if xops is not None and USE_MEM_EFF_ATTENTION:
attn_weights = None
query_states = query_states.transpose(1, 2)
key_states = key_states.transpose(1, 2)
value_states = value_states.transpose(1, 2)
attn_bias = None if (query_states.size(1)==1 and key_states.size(1)>1) else xops.LowerTriangularMask()
attn_output = xops.memory_efficient_attention(
query_states, key_states, value_states, attn_bias=attn_bias, p=0)
else:
attn_weights = torch.matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
raise ValueError(
f"Attention weights should be of size {(bsz * self.num_heads, q_len, kv_seq_len)}, but is"
f" {attn_weights.size()}"
)
if attention_mask is not None:
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
raise ValueError(
f"Attention mask should be of size {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.size()}"
)
attn_weights = attn_weights + attention_mask
attn_weights = torch.max(
attn_weights, torch.tensor(torch.finfo(attn_weights.dtype).min, device=attn_weights.device)
)
# upcast attention to fp32
attn_weights = nn.functional.softmax(attn_weights, dim=-1, dtype=torch.float32).to(query_states.dtype)
attn_output = torch.matmul(attn_weights, value_states)
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
raise ValueError(
f"`attn_output` should be of size {(bsz, self.num_heads, q_len, self.head_dim)}, but is"
f" {attn_output.size()}"
)
attn_output = attn_output.transpose(1, 2)
attn_output = attn_output.reshape(bsz, q_len, self.hidden_size)
attn_output = self.o_proj(attn_output)
if not output_attentions:
attn_weights = None
return attn_output, attn_weights, past_key_value
old_init = transformers.models.llama.modeling_llama.LlamaRotaryEmbedding.__init__
def adaptive_ntk_init(self, dim, max_position_embeddings=2048, base=10000, device=None):
self.dim = dim
self.alpha = ALPHA
if isinstance(ALPHA,(float,int)):
base = base * ALPHA ** (dim / (dim-2))
self.base = base
elif ALPHA=='auto':
self.base = base
else:
raise ValueError(ALPHA)
old_init(self, dim, max_position_embeddings, base, device)
ntk_inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(device) / dim))
self.register_buffer("ntk_inv_freq", ntk_inv_freq, persistent=False)
def adaptive_ntk_forward(self, x, seq_len=None):
if seq_len > self.max_seq_len_cached:
if isinstance(self.alpha,(float,int)):
self.max_seq_len_cached = seq_len
t = torch.arange(seq_len, device=x.device, dtype=self.ntk_inv_freq.dtype)
freqs = torch.einsum("i,j->ij", t, self.ntk_inv_freq)
emb = torch.cat((freqs, freqs), dim=-1).to(x.device)
self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)
return (
self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
)
elif self.alpha=='auto':
t = torch.arange(seq_len, device=x.device, dtype=self.ntk_inv_freq.dtype)
dim = self.dim
alpha = (seq_len / 1024 - 1) * 1.1
base = self.base * alpha ** (dim / (dim-2))
ntk_inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(x.device) / dim ))
freqs = torch.einsum("i,j->ij", t, ntk_inv_freq)
emb = torch.cat((freqs, freqs), dim=-1).to(x.device)
cos_cached = emb.cos()[None, None, :, :]
sin_cached = emb.sin()[None, None, :, :]
return (
cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype)
)
else:
return (
self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype)
)
def apply_attention_patch(
use_memory_efficient_attention=False,
store_kv_before_rope=False
):
global USE_MEM_EFF_ATTENTION, STORE_KV_BEFORE_ROPE
if use_memory_efficient_attention is True and xops is not None:
USE_MEM_EFF_ATTENTION = use_memory_efficient_attention
print("USE_MEM_EFF_ATTENTION: ",USE_MEM_EFF_ATTENTION)
STORE_KV_BEFORE_ROPE = store_kv_before_rope
print("STORE_KV_BEFORE_ROPE:", STORE_KV_BEFORE_ROPE)
transformers.models.llama.modeling_llama.LlamaAttention.forward = xformers_forward
def apply_ntk_scaling_patch(alpha: Union[float,str]):
global ALPHA
ALPHA = alpha
try:
ALPHA = float(ALPHA)
except ValueError:
if ALPHA!="auto":
raise ValueError(f"Alpha can only be a float or 'auto', but given {ALPHA}")
print(f"Apply NTK scaling with ALPHA={ALPHA}")
transformers.models.llama.modeling_llama.LlamaRotaryEmbedding.__init__ = adaptive_ntk_init
transformers.models.llama.modeling_llama.LlamaRotaryEmbedding.forward = adaptive_ntk_forward
================================================
FILE: scripts/training/build_dataset.py
================================================
import logging
import os
from dataclasses import dataclass
from typing import Dict, Sequence, Union, List
import datasets
import torch
from datasets import load_dataset, concatenate_datasets
import transformers
IGNORE_INDEX = -100
logger = logging.getLogger('__name__')
PROMPT_TEMPLATE = (
"Below is an instruction that describes a task. "
"Write a response that appropriately completes the request.\n\n"
"### Instruction:\n{instruction}\n\n### Response: "
)
def build_instruction_dataset(data_path: Union[List[str],str],
tokenizer: transformers.PreTrainedTokenizer,
max_seq_length: int, data_cache_dir = None,
preprocessing_num_workers = None,
):
def tokenization(examples):
sources = []
targets = []
prompt = PROMPT_TEMPLATE
for instruction, input, output in zip(examples['instruction'],examples['input'],examples['output']):
if input is not None and input !="":
instruction = instruction+'\n'+input
source = prompt.format_map({'instruction':instruction})
target = f"{output}{tokenizer.eos_token}"
sources.append(source)
targets.append(target)
tokenized_sources = tokenizer(sources,return_attention_mask=False)
tokenized_targets = tokenizer(targets,return_attention_mask=False,add_special_tokens=False)
all_input_ids = []
all_labels = []
for s,t in zip(tokenized_sources['input_ids'],tokenized_targets['input_ids']):
input_ids = torch.LongTensor(s + t)[:max_seq_length]
labels = torch.LongTensor([IGNORE_INDEX] * len(s) + t)[:max_seq_length]
assert len(input_ids) == len(labels)
all_input_ids.append(input_ids)
all_labels.append(labels)
results = {'input_ids':all_input_ids, 'labels': all_labels}
return results
logging.warning("building dataset...")
all_datasets = []
if not isinstance(data_path,(list,tuple)):
data_path = [data_path]
for file in data_path:
if data_cache_dir is None:
data_cache_dir = str(os.path.dirname(file))
cache_path = os.path.join(data_cache_dir,os.path.basename(file).split('.')[0])
os.makedirs(cache_path, exist_ok=True)
try:
processed_dataset = datasets.load_from_disk(cache_path)
logger.info(f'training datasets-{file} has been loaded from disk')
except Exception:
raw_dataset = load_dataset("json", data_files=file, cache_dir=cache_path)
tokenization_func = tokenization
tokenized_dataset = raw_dataset.map(
tokenization_func,
batched=True,
num_proc=preprocessing_num_workers,
remove_columns=["instruction","input","output"],
keep_in_memory=False,
desc="preprocessing on dataset",
)
processed_dataset = tokenized_dataset
processed_dataset.save_to_disk(cache_path)
processed_dataset.set_format('torch')
all_datasets.append(processed_dataset['train'])
all_datasets = concatenate_datasets(all_datasets)
return all_datasets
@dataclass
class DataCollatorForSupervisedDataset(object):
"""Collate examples for supervised fine-tuning."""
tokenizer: transformers.PreTrainedTokenizer
def __call__(self, instances: Sequence[Dict]) -> Dict[str, torch.Tensor]:
input_ids, labels = tuple([instance[key] for instance in instances] for key in ("input_ids", "labels"))
input_ids = torch.nn.utils.rnn.pad_sequence(
input_ids, batch_first=True, padding_value=self.tokenizer.pad_token_id
)
labels = torch.nn.utils.rnn.pad_sequence(labels, batch_first=True, padding_value=-100)
return dict(
input_ids=input_ids,
labels=labels,
attention_mask=input_ids.ne(self.tokenizer.pad_token_id),
)
================================================
FILE: scripts/training/ds_zero2_no_offload.json
================================================
{
"fp16": {
"enabled": "auto",
"loss_scale": 0,
"loss_scale_window": 100,
"initial_scale_power": 16,
"hysteresis": 2,
"min_loss_scale": 1e-10
},
"zero_optimization": {
"stage": 2,
"allgather_partitions": true,
"allgather_bucket_size": 1e8,
"overlap_comm": true,
"reduce_scatter": true,
"reduce_bucket_size": 1e8,
"contiguous_gradients": true
},
"gradient_accumulation_steps": "auto",
"gradient_clipping": "auto",
"steps_per_print": 2000,
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": "auto",
"wall_clock_breakdown": false
}
================================================
FILE: scripts/training/run_clm_pt_with_peft.py
================================================
#!/usr/bin/env python
# coding=utf-8
# Copyright 2020 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Fine-tuning the library models for causal language modeling (GPT, GPT-2, CTRL, ...) on a text file or a dataset.
Here is the full list of checkpoints on the hub that can be fine-tuned by this script:
https://huggingface.co/models?filter=text-generation
"""
# You can also adapt this script on your own causal language modeling task. Pointers for this are left as comments.
import logging
import numpy as np
import math
import os
import sys
from dataclasses import dataclass, field
from itertools import chain
from typing import Optional, List, Dict, Any, Mapping
from pathlib import Path
import datasets
import torch
from datasets import load_dataset, concatenate_datasets
import transformers
from transformers import (
CONFIG_MAPPING,
MODEL_FOR_CAUSAL_LM_MAPPING,
AutoConfig,
AutoModelForCausalLM,
LlamaForCausalLM,
LlamaTokenizer,
AutoTokenizer,
HfArgumentParser,
Trainer,
TrainingArguments,
is_torch_tpu_available,
set_seed,
)
from transformers.testing_utils import CaptureLogger
from transformers.trainer_utils import get_last_checkpoint
from transformers.utils import send_example_telemetry
from transformers.utils.versions import require_version
from sklearn.metrics import accuracy_score
from peft import LoraConfig, TaskType, get_peft_model, PeftModel, get_peft_model_state_dict
from transformers.trainer_utils import PREFIX_CHECKPOINT_DIR
class SavePeftModelCallback(transformers.TrainerCallback):
def save_model(self, args, state, kwargs):
if state.best_model_checkpoint is not None:
checkpoint_folder = os.path.join(state.best_model_checkpoint, "pt_lora_model")
else:
checkpoint_folder = os.path.join(args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}")
peft_model_path = os.path.join(checkpoint_folder, "pt_lora_model")
kwargs["model"].save_pretrained(peft_model_path)
kwargs["tokenizer"].save_pretrained(peft_model_path)
def on_save(self, args, state, control, **kwargs):
self.save_model(args, state, kwargs)
return control
def on_train_end(self, args, state, control, **kwargs):
peft_model_path = os.path.join(args.output_dir, "pt_lora_model")
kwargs["model"].save_pretrained(peft_model_path)
kwargs["tokenizer"].save_pretrained(peft_model_path)
def accuracy(predictions, references, normalize=True, sample_weight=None):
return {
"accuracy": float(
accuracy_score(references, predictions, normalize=normalize, sample_weight=sample_weight)
)
}
def compute_metrics(eval_preds):
preds, labels = eval_preds
# preds have the same shape as the labels, after the argmax(-1) has been calculated
# by preprocess_logits_for_metrics but we need to shift the labels
labels = labels[:, 1:].reshape(-1)
preds = preds[:, :-1].reshape(-1)
return accuracy(predictions=preds, references=labels)
def preprocess_logits_for_metrics(logits, labels):
if isinstance(logits, tuple):
# Depending on the model and config, logits may contain extra tensors,
# like past_key_values, but logits always come first
logits = logits[0]
return logits.argmax(dim=-1)
def fault_tolerance_data_collator(features: List) -> Dict[str, Any]:
if not isinstance(features[0], Mapping):
features = [vars(f) for f in features]
first = features[0]
batch = {}
# Special handling for labels.
# Ensure that tensor is created with the correct type
# (it should be automatically the case, but let's make sure of it.)
if "label" in first and first["label"] is not None:
label = first["label"].item() if isinstance(first["label"], torch.Tensor) else first["label"]
dtype = torch.long if isinstance(label, int) else torch.float
batch["labels"] = torch.tensor([f["label"] for f in features], dtype=dtype)
elif "label_ids" in first and first["label_ids"] is not None:
if isinstance(first["label_ids"], torch.Tensor):
batch["labels"] = torch.stack([f["label_ids"] for f in features])
else:
dtype = torch.long if isinstance(first["label_ids"][0], int) else torch.float
batch["labels"] = torch.tensor([f["label_ids"] for f in features], dtype=dtype)
# Handling of all other possible keys.
# Again, we will use the first element to figure out which key/values are not None for this model.
try:
for k, v in first.items():
if k not in ("label", "label_ids") and v is not None and not isinstance(v, str):
if isinstance(v, torch.Tensor):
batch[k] = torch.stack([f[k] for f in features])
elif isinstance(v, np.ndarray):
batch[k] = torch.tensor(np.stack([f[k] for f in features]))
else:
batch[k] = torch.tensor([f[k] for f in features])
except ValueError: # quick fix by simply take the first example
for k, v in first.items():
if k not in ("label", "label_ids") and v is not None and not isinstance(v, str):
if isinstance(v, torch.Tensor):
batch[k] = torch.stack([features[0][k]] * len(features))
elif isinstance(v, np.ndarray):
batch[k] = torch.tensor(np.stack([features[0][k]] * len(features)))
else:
batch[k] = torch.tensor([features[0][k]] * len(features))
return batch
MODEL_CONFIG_CLASSES = list(MODEL_FOR_CAUSAL_LM_MAPPING.keys())
MODEL_TYPES = tuple(conf.model_type for conf in MODEL_CONFIG_CLASSES)
@dataclass
class ModelArguments:
"""
Arguments pertaining to which model/config/tokenizer we are going to fine-tune, or train from scratch.
"""
model_name_or_path: Optional[str] = field(
default=None,
metadata={
"help": (
"The model checkpoint for weights initialization.Don't set if you want to train a model from scratch."
)
},
)
tokenizer_name_or_path: Optional[str] = field(
default=None,
metadata={
"help": (
"The tokenizer for weights initialization.Don't set if you want to train a model from scratch."
)
},
)
model_type: Optional[str] = field(
default=None,
metadata={"help": "If training from scratch, pass a model type from the list: " + ", ".join(MODEL_TYPES)},
)
config_overrides: Optional[str] = field(
default=None,
metadata={
"help": (
"Override some existing default config settings when a model is trained from scratch. Example: "
"n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index"
)
},
)
config_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
)
tokenizer_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
)
cache_dir: Optional[str] = field(
default=None,
metadata={"help": "Where do you want to store the pretrained models downloaded from huggingface.co"},
)
use_fast_tokenizer: bool = field(
default=True,
metadata={"help": "Whether to use one of the fast tokenizer (backed by the tokenizers library) or not."},
)
model_revision: str = field(
default="main",
metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
)
use_auth_token: bool = field(
default=False,
metadata={
"help": (
"Will use the token generated when running `huggingface-cli login` (necessary to use this script "
"with private models)."
)
},
)
torch_dtype: Optional[str] = field(
default=None,
metadata={
"help": (
"Override the default `torch.dtype` and load the model under this dtype. If `auto` is passed, the "
"dtype will be automatically derived from the model's weights."
),
"choices": ["auto", "bfloat16", "float16", "float32"],
},
)
def __post_init__(self):
if self.config_overrides is not None and (self.config_name is not None or self.model_name_or_path is not None):
raise ValueError(
"--config_overrides can't be used in combination with --config_name or --model_name_or_path"
)
@dataclass
class DataTrainingArguments:
"""
Arguments pertaining to what data we are going to input our model for training and eval.
"""
dataset_dir: Optional[str] = field(
default=None, metadata={"help": "The name of the dataset to use (via the datasets library)."}
)
dataset_config_name: Optional[str] = field(
default=None, metadata={"help": "The configuration name of the dataset to use (via the datasets library)."}
)
train_file: Optional[str] = field(default=None, metadata={"help": "The input training data file (a text file)."})
validation_file: Optional[str] = field(
default=None,
metadata={"help": "An optional input evaluation data file to evaluate the perplexity on (a text file)."},
)
max_train_samples: Optional[int] = field(
default=None,
metadata={
"help": (
"For debugging purposes or quicker training, truncate the number of training examples to this "
"value if set."
)
},
)
max_eval_samples: Optional[int] = field(
default=None,
metadata={
"help": (
"For debugging purposes or quicker training, truncate the number of evaluation examples to this "
"value if set."
)
},
)
streaming: bool = field(default=False, metadata={"help": "Enable streaming mode"})
block_size: Optional[int] = field(
default=None,
metadata={
"help": (
"Optional input sequence length after tokenization. "
"The training dataset will be truncated in block of this size for training. "
"Default to the model max input length for single sentence inputs (take into account special tokens)."
)
},
)
overwrite_cache: bool = field(
default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
)
validation_split_percentage: Optional[float] = field(
default=0.05,
metadata={
"help": "The percentage of the train set used as validation set in case there's no validation split"
},
)
preprocessing_num_workers: Optional[int] = field(
default=None,
metadata={"help": "The number of processes to use for the preprocessing."},
)
keep_linebreaks: bool = field(
default=True, metadata={"help": "Whether to keep line breaks when using TXT files or not."}
)
data_cache_dir: Optional[str] = field(default="./", metadata={"help": "The datasets processed stored"})
def __post_init__(self):
if self.streaming:
require_version("datasets>=2.0.0", "The streaming feature requires `datasets>=2.0.0`")
@dataclass
class MyTrainingArguments(TrainingArguments):
trainable : Optional[str] = field(default="q_proj,v_proj")
lora_rank : Optional[int] = field(default=8)
lora_dropout : Optional[float] = field(default=0.1)
lora_alpha : Optional[float] = field(default=32.)
modules_to_save : Optional[str] = field(default=None)
debug_mode : Optional[bool] = field(default=False)
peft_path : Optional[str] = field(default=None)
logger = logging.getLogger(__name__)
def main():
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, MyTrainingArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
# If we pass only one argument to the script and it's the path to a json file,
# let's parse it to get our arguments.
model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
else:
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
# Sending telemetry. Tracking the example usage helps us better allocate resources to maintain them. The
# information sent is the one passed as arguments along with your Python/PyTorch versions.
send_example_telemetry("run_clm", model_args, data_args)
# Setup logging
logging.basicConfig(format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",datefmt="%m/%d/%Y %H:%M:%S",
level=logging.INFO, # if training_args.local_rank in [-1, 0] else logging.WARN,
handlers=[logging.StreamHandler(sys.stdout)],)
if training_args.should_log:
# The default of training_args.log_level is passive, so we set log level at info here to have that default.
transformers.utils.logging.set_verbosity_info()
log_level = training_args.get_process_log_level()
logger.setLevel(log_level)
datasets.utils.logging.set_verbosity(log_level)
transformers.utils.logging.set_verbosity(log_level)
transformers.utils.logging.enable_default_handler()
transformers.utils.logging.enable_explicit_format()
# transformers.tokenization_utils.logging.set_verbosity_warning()
# Log on each process the small summary:
logger.warning(
f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
+ f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
)
# Detecting last checkpoint.
last_checkpoint = None
if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
last_checkpoint = get_last_checkpoint(training_args.output_dir)
if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
raise ValueError(
f"Output directory ({training_args.output_dir}) already exists and is not empty. "
"Use --overwrite_output_dir to overcome."
)
elif last_checkpoint is not None and training_args.resume_from_checkpoint is None:
logger.info(
f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change "
"the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
)
# Set seed before initializing model.
set_seed(training_args.seed)
config_kwargs = {
"cache_dir": model_args.cache_dir,
"revision": model_args.model_revision,
"use_auth_token": True if model_args.use_auth_token else None,
}
if model_args.config_name:
config = AutoConfig.from_pretrained(model_args.config_name, **config_kwargs)
elif model_args.model_name_or_path:
config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
else:
config = CONFIG_MAPPING[model_args.model_type]()
logger.warning("You are instantiating a new config instance from scratch.")
if model_args.config_overrides is not None:
logger.info(f"Overriding config: {model_args.config_overrides}")
config.update_from_string(model_args.config_overrides)
logger.info(f"New config: {config}")
tokenizer_kwargs = {
"cache_dir": model_args.cache_dir,
"use_fast": model_args.use_fast_tokenizer,
"revision": model_args.model_revision,
"use_auth_token": True if model_args.use_auth_token else None,
}
if model_args.tokenizer_name:
tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name, **tokenizer_kwargs)
elif model_args.tokenizer_name_or_path:
tokenizer = LlamaTokenizer.from_pretrained(model_args.tokenizer_name_or_path, **tokenizer_kwargs)
else:
raise ValueError(
"You are instantiating a new tokenizer from scratch. This is not supported by this script."
"You can do it from another script, save it, and load it from here, using --tokenizer_name."
)
# Preprocessing the datasets.
# First we tokenize all the texts.
# since this will be pickled to avoid _LazyModule error in Hasher force logger loading before tokenize_function
tok_logger = transformers.utils.logging.get_logger("transformers.tokenization_utils_base")
def tokenize_function(examples):
with CaptureLogger(tok_logger) as cl:
output = tokenizer(examples["text"])
# clm input could be much much longer than block_size
if "Token indices sequence length is longer than the" in cl.out:
tok_logger.warning(
"^^^^^^^^^^^^^^^^ Please ignore the warning above - this long input will be chunked into smaller bits"
" before being passed to the model."
)
return output
if data_args.block_size is None:
block_size = tokenizer.model_max_length
if block_size > 1024:
logger.warning(
"The chosen tokenizer supports a `model_max_length` that is longer than the default `block_size` value"
" of 1024. If you would like to use a longer `block_size` up to `tokenizer.model_max_length` you can"
" override this default with `--block_size xxx`."
)
block_size = 1024
else:
if data_args.block_size > tokenizer.model_max_length:
logger.warning(
f"The block_size passed ({data_args.block_size}) is larger than the maximum length for the model"
f"({tokenizer.model_max_length}). Using block_size={tokenizer.model_max_length}."
)
block_size = min(data_args.block_size, tokenizer.model_max_length)
# Main data processing function that will concatenate all texts from our dataset and generate chunks of block_size.
def group_texts(examples):
# Concatenate all texts.
concatenated_examples = {k: list(chain(*examples[k])) for k in examples.keys()}
total_length = len(concatenated_examples[list(examples.keys())[0]])
# We drop the small remainder, we could add padding if the model supported it instead of this drop, you can
# customize this part to your needs.
if total_length >= block_size:
total_length = (total_length // block_size) * block_size
# Split by chunks of max_len.
result = {
k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
for k, t in concatenated_examples.items()
}
result["labels"] = result["input_ids"].copy()
return result
with training_args.main_process_first(desc="dataset map tokenization and grouping"):
lm_datasets = []
path = Path(data_args.dataset_dir)
files = [file.name for file in path.glob("*.txt")]
if training_args.debug_mode is True:
files = [files[0]]
for idx, file in enumerate(files):
data_file = os.path.join(path, file)
filename = ''.join(file.split(".")[:-1])
cache_path = os.path.join(data_args.data_cache_dir, filename)
os.makedirs(cache_path, exist_ok=True)
try:
processed_dataset = datasets.load_from_disk(cache_path, keep_in_memory=False)
logger.info(f'training datasets-{filename} has been loaded from disk')
except Exception:
cache_dir = os.path.join(data_args.data_cache_dir, filename+"_text")
os.makedirs(cache_dir, exist_ok=True)
raw_dataset = load_dataset("text", data_files=data_file, cache_dir=cache_dir, keep_in_memory=False)
logger.info(f"{file} has been loaded")
tokenized_dataset = raw_dataset.map(
tokenize_function,
batched=True,
num_proc=data_args.preprocessing_num_workers,
remove_columns="text",
load_from_cache_file=True,
keep_in_memory=False,
cache_file_names = {k: os.path.join(cache_dir, 'tokenized.arrow') for k in raw_dataset},
desc="Running tokenizer on dataset",
)
grouped_datasets = tokenized_dataset.map(
group_texts,
batched=True,
num_proc=data_args.preprocessing_num_workers,
load_from_cache_file=True,
keep_in_memory=False,
cache_file_names = {k: os.path.join(cache_dir, 'grouped.arrow') for k in tokenized_dataset},
desc=f"Grouping texts in chunks of {block_size}",
)
processed_dataset = grouped_datasets
processed_dataset.save_to_disk(cache_path)
if idx == 0:
lm_datasets = processed_dataset['train']
else:
assert lm_datasets.features.type == processed_dataset["train"].features.type
lm_datasets = concatenate_datasets([lm_datasets, processed_dataset["train"]])
lm_datasets = lm_datasets.train_test_split(test_size = data_args.validation_split_percentage)
if training_args.do_train:
train_dataset = lm_datasets['train']
if data_args.max_train_samples is not None:
max_train_samples = min(len(train_dataset), data_args.max_train_samples)
train_dataset = train_dataset.select(range(max_train_samples))
logger.info(f"Num train_samples {len(train_dataset)}")
logger.info("training example:")
logger.info(tokenizer.decode(train_dataset[0]['input_ids']))
if training_args.do_eval:
eval_dataset = lm_datasets["test"]
if data_args.max_eval_samples is not None:
max_eval_samples = min(len(eval_dataset), data_args.max_eval_samples)
eval_dataset = eval_dataset.select(range(max_eval_samples))
logger.info(f"Num eval_samples {len(eval_dataset)}")
logger.info("training example:")
logger.info(tokenizer.decode(eval_dataset[0]['input_ids']))
if model_args.model_name_or_path:
torch_dtype = (
model_args.torch_dtype
if model_args.torch_dtype in ["auto", None]
else getattr(torch, model_args.torch_dtype)
)
model = LlamaForCausalLM.from_pretrained(
model_args.model_name_or_path,
from_tf=bool(".ckpt" in model_args.model_name_or_path),
config=config,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
torch_dtype=torch_dtype,
low_cpu_mem_usage=True
)
else:
model = AutoModelForCausalLM.from_config(config)
n_params = sum({p.data_ptr(): p.numel() for p in model.parameters()}.values())
logger.info(f"Training new model from scratch - Total size={n_params/2**20:.2f}M params")
model_vocab_size = model.get_output_embeddings().weight.size(0)
if not (
(model_vocab_size==32000 and len(tokenizer)==49953) or \
(model_vocab_size==32000 and len(tokenizer)==32000) or \
(model_vocab_size==49953 and len(tokenizer)==49953) or \
(model_vocab_size==49954 and len(tokenizer)==49954)
):
raise ValueError(
f"The combination of base model (size: {model_vocab_size}) and tokenizer (size: {len(tokenizer)}) is not a valid configuration. Please check our project wiki for further information. \n"
"Valid configurations (base model / tokenizer):\n"
"- Continue pre-training original LLaMA: 32000 / 32000 \n"
"- Pre-training Chinese LLaMA based on original LLaMA: 32000 / 49953 \n"
"- Continue pre-training Chinese LLaMA: 49953 / 49953 \n"
"- Continue pre-training Chinese Alpaca: 49954 / 49954 \n")
model.resize_token_embeddings(len(tokenizer))
if training_args.peft_path is not None:
logger.info("Peft from pre-trained model")
model = PeftModel.from_pretrained(model, training_args.peft_path)
else:
logger.info("Init new peft model")
target_modules = training_args.trainable.split(',')
modules_to_save = training_args.modules_to_save
if modules_to_save is not None:
modules_to_save = modules_to_save.split(',')
lora_rank = training_args.lora_rank
lora_dropout = training_args.lora_dropout
lora_alpha = training_args.lora_alpha
logger.info(f"target_modules: {target_modules}")
logger.info(f"lora_rank: {lora_rank}")
peft_config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
target_modules=target_modules,
inference_mode=False,
r=lora_rank, lora_alpha=lora_alpha,
lora_dropout=lora_dropout,
modules_to_save=modules_to_save)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()
old_state_dict = model.state_dict
model.state_dict = (
lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
).__get__(model, type(model))
# Initialize our Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset if training_args.do_train else None,
eval_dataset=eval_dataset if training_args.do_eval else None,
tokenizer=tokenizer,
data_collator=fault_tolerance_data_collator,
compute_metrics=compute_metrics if training_args.do_eval and not is_torch_tpu_available() else None,
preprocess_logits_for_metrics=preprocess_logits_for_metrics
if training_args.do_eval and not is_torch_tpu_available()
else None,
)
trainer.add_callback(SavePeftModelCallback)
# Training
if training_args.do_train:
checkpoint = None
if training_args.resume_from_checkpoint is not None:
checkpoint = training_args.resume_from_checkpoint
elif last_checkpoint is not None:
checkpoint = last_checkpoint
train_result = trainer.train(resume_from_checkpoint=checkpoint)
metrics = train_result.metrics
max_train_samples = (
data_args.max_train_samples if data_args.max_train_samples is not None else len(train_dataset)
)
metrics["train_samples"] = min(max_train_samples, len(train_dataset))
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()
# Evaluation
if training_args.do_eval:
logger.info("*** Evaluate ***")
metrics = trainer.evaluate()
max_eval_samples = data_args.max_eval_samples if data_args.max_eval_samples is not None else len(eval_dataset)
metrics["eval_samples"] = min(max_eval_samples, len(eval_dataset))
try:
perplexity = math.exp(metrics["eval_loss"])
except OverflowError:
perplexity = float("inf")
metrics["perplexity"] = perplexity
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)
if __name__ == "__main__":
main()
================================================
FILE: scripts/training/run_clm_sft_with_peft.py
================================================
#!/usr/bin/env python
# coding=utf-8
# Copyright 2020 The HuggingFace Inc. team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""
Fine-tuning the library models for causal language modeling (GPT, GPT-2, CTRL, ...) on a text file or a dataset.
Here is the full list of checkpoints on the hub that can be fine-tuned by this script:
https://huggingface.co/models?filter=text-generation
"""
# You can also adapt this script on your own causal language modeling task. Pointers for this are left as comments.
import logging
import math
import os
import sys
from dataclasses import dataclass, field
from typing import Optional
from pathlib import Path
import datasets
import torch
from build_dataset import build_instruction_dataset, DataCollatorForSupervisedDataset
import transformers
from transformers import (
CONFIG_MAPPING,
AutoConfig,
AutoModelForCausalLM,
LlamaForCausalLM,
LlamaTokenizer,
AutoTokenizer,
HfArgumentParser,
Trainer,
TrainingArguments,
set_seed,
)
from transformers.trainer_utils import get_last_checkpoint
from transformers.utils import send_example_telemetry
from transformers.utils.versions import require_version
from peft import LoraConfig, TaskType, get_peft_model, PeftModel, get_peft_model_state_dict
from transformers.trainer_utils import PREFIX_CHECKPOINT_DIR
IGNORE_INDEX = -100
DEFAULT_PAD_TOKEN = "[PAD]"
DEFAULT_EOS_TOKEN = " "
DEFAULT_BOS_TOKEN = ""
DEFAULT_UNK_TOKEN = ""
require_version("datasets>=1.8.0", "To fix: pip install -r examples/pytorch/language-modeling/requirements.txt")
class SavePeftModelCallback(transformers.TrainerCallback):
def save_model(self, args, state, kwargs):
if state.best_model_checkpoint is not None:
checkpoint_folder = os.path.join(state.best_model_checkpoint, "sft_lora_model")
else:
checkpoint_folder = os.path.join(args.output_dir, f"{PREFIX_CHECKPOINT_DIR}-{state.global_step}")
peft_model_path = os.path.join(checkpoint_folder, "sft_lora_model")
kwargs["model"].save_pretrained(peft_model_path)
kwargs["tokenizer"].save_pretrained(peft_model_path)
def on_save(self, args, state, control, **kwargs):
self.save_model(args, state, kwargs)
return control
def on_train_end(self, args, state, control, **kwargs):
peft_model_path = os.path.join(args.output_dir, "sft_lora_model")
kwargs["model"].save_pretrained(peft_model_path)
kwargs["tokenizer"].save_pretrained(peft_model_path)
@dataclass
class ModelArguments:
"""
Arguments pertaining to which model/config/tokenizer we are going to fine-tune, or train from scratch.
"""
model_name_or_path: Optional[str] = field(
default=None,
metadata={
"help": (
"The model checkpoint for weights initialization.Don't set if you want to train a model from scratch."
)
},
)
tokenizer_name_or_path: Optional[str] = field(
default=None,
metadata={
"help": (
"The tokenizer for weights initialization.Don't set if you want to train a model from scratch."
)
},
)
config_overrides: Optional[str] = field(
default=None,
metadata={
"help": (
"Override some existing default config settings when a model is trained from scratch. Example: "
"n_embd=10,resid_pdrop=0.2,scale_attn_weights=false,summary_type=cls_index"
)
},
)
config_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
)
tokenizer_name: Optional[str] = field(
default=None, metadata={"help": "Pretrained tokenizer name or path if not the same as model_name"}
)
cache_dir: Optional[str] = field(
default=None,
metadata={"help": "Where do you want to store the pretrained models downloaded from huggingface.co"},
)
use_fast_tokenizer: bool = field(
default=True,
metadata={"help": "Whether to use one of the fast tokenizer (backed by the tokenizers library) or not."},
)
model_revision: str = field(
default="main",
metadata={"help": "The specific model version to use (can be a branch name, tag name or commit id)."},
)
use_auth_token: bool = field(
default=False,
metadata={
"help": (
"Will use the token generated when running `huggingface-cli login` (necessary to use this script "
"with private models)."
)
},
)
torch_dtype: Optional[str] = field(
default=None,
metadata={
"help": (
"Override the default `torch.dtype` and load the model under this dtype. If `auto` is passed, the "
"dtype will be automatically derived from the model's weights."
),
"choices": ["auto", "bfloat16", "float16", "float32"],
},
)
def __post_init__(self):
if self.config_overrides is not None and (self.config_name is not None or self.model_name_or_path is not None):
raise ValueError(
"--config_overrides can't be used in combination with --config_name or --model_name_or_path"
)
@dataclass
class DataTrainingArguments:
"""
Arguments pertaining to what data we are going to input our model for training and eval.
"""
dataset_dir: Optional[str] = field(
default=None, metadata={"help": "The name of the dataset to use (via the datasets library)."}
)
train_file: Optional[str] = field(default=None, metadata={"help": "The input training data file (a text file)."})
validation_file: Optional[str] = field(
default=None,
metadata={"help": "An optional input evaluation data file to evaluate the perplexity on (a text file)."},
)
overwrite_cache: bool = field(
default=False, metadata={"help": "Overwrite the cached training and evaluation sets"}
)
validation_split_percentage: Optional[float] = field(
default=0.05,
metadata={
"help": "The percentage of the train set used as validation set in case there's no validation split"
},
)
preprocessing_num_workers: Optional[int] = field(
default=None,
metadata={"help": "The number of processes to use for the preprocessing."},
)
keep_linebreaks: bool = field(
default=True, metadata={"help": "Whether to keep line breaks when using TXT files or not."}
)
data_cache_dir: Optional[str] = field(default=None, metadata={"help": "The datasets processed stored"})
max_seq_length: Optional[int] = field(default=512)
@dataclass
class MyTrainingArguments(TrainingArguments):
trainable : Optional[str] = field(default="q_proj,v_proj")
lora_rank : Optional[int] = field(default=8)
lora_dropout : Optional[float] = field(default=0.1)
lora_alpha : Optional[float] = field(default=32.)
modules_to_save : Optional[str] = field(default=None)
peft_path : Optional[str] = field(default=None)
force_resize_embeddings: bool = field(default=False)
logger = logging.getLogger(__name__)
def main():
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, MyTrainingArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
# If we pass only one argument to the script and it's the path to a json file,
# let's parse it to get our arguments.
model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
else:
model_args, data_args, training_args = parser.parse_args_into_dataclasses()
send_example_telemetry("run_clm", model_args, data_args)
# Setup logging
logging.basicConfig(format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",datefmt="%m/%d/%Y %H:%M:%S",
level=logging.INFO, # if training_args.local_rank in [-1, 0] else logging.WARN,
handlers=[logging.StreamHandler(sys.stdout)],)
if training_args.should_log:
# The default of training_args.log_level is passive, so we set log level at info here to have that default.
transformers.utils.logging.set_verbosity_info()
log_level = training_args.get_process_log_level()
logger.setLevel(log_level)
datasets.utils.logging.set_verbosity(log_level)
transformers.utils.logging.set_verbosity(log_level)
transformers.utils.logging.enable_default_handler()
transformers.utils.logging.enable_explicit_format()
# transformers.tokenization_utils.logging.set_verbosity_warning()
# Log on each process the small summary:
logger.warning(
f"Process rank: {training_args.local_rank}, device: {training_args.device}, n_gpu: {training_args.n_gpu}"
+ f"distributed training: {bool(training_args.local_rank != -1)}, 16-bits training: {training_args.fp16}"
)
# Detecting last checkpoint.
last_checkpoint = None
if os.path.isdir(training_args.output_dir) and training_args.do_train and not training_args.overwrite_output_dir:
last_checkpoint = get_last_checkpoint(training_args.output_dir)
if last_checkpoint is None and len(os.listdir(training_args.output_dir)) > 0:
raise ValueError(
f"Output directory ({training_args.output_dir}) already exists and is not empty. "
"Use --overwrite_output_dir to overcome."
)
elif last_checkpoint is not None and training_args.resume_from_checkpoint is None:
logger.info(
f"Checkpoint detected, resuming training at {last_checkpoint}. To avoid this behavior, change "
"the `--output_dir` or add `--overwrite_output_dir` to train from scratch."
)
# Set seed before initializing model.
set_seed(training_args.seed)
config_kwargs = {
"cache_dir": model_args.cache_dir,
"revision": model_args.model_revision,
"use_auth_token": True if model_args.use_auth_token else None,
}
if model_args.config_name:
config = AutoConfig.from_pretrained(model_args.config_name, **config_kwargs)
elif model_args.model_name_or_path:
config = AutoConfig.from_pretrained(model_args.model_name_or_path, **config_kwargs)
else:
config = CONFIG_MAPPING[model_args.model_type]()
logger.warning("You are instantiating a new config instance from scratch.")
if model_args.config_overrides is not None:
logger.info(f"Overriding config: {model_args.config_overrides}")
config.update_from_string(model_args.config_overrides)
logger.info(f"New config: {config}")
tokenizer_kwargs = {
"cache_dir": model_args.cache_dir,
"use_fast": model_args.use_fast_tokenizer,
"revision": model_args.model_revision,
"use_auth_token": True if model_args.use_auth_token else None,
}
if model_args.tokenizer_name:
tokenizer = AutoTokenizer.from_pretrained(model_args.tokenizer_name, **tokenizer_kwargs)
elif model_args.tokenizer_name_or_path:
tokenizer = LlamaTokenizer.from_pretrained(model_args.tokenizer_name_or_path, **tokenizer_kwargs)
else:
raise ValueError(
"You are instantiating a new tokenizer from scratch. This is not supported by this script."
"You can do it from another script, save it, and load it from here, using --tokenizer_name."
)
if (len(tokenizer))!=49954:
raise ValueError(f"The vocab size of the tokenizer must be 49954, but found {len(tokenizer)}.\n"
"Please use Chinese Alpaca tokenizer!")
if tokenizer.pad_token is None:
print(f"Adding pad token {DEFAULT_PAD_TOKEN}")
tokenizer.add_special_tokens(dict(pad_token=DEFAULT_PAD_TOKEN))
data_collator = DataCollatorForSupervisedDataset(tokenizer=tokenizer)
eval_dataset=None
train_dataset = None
if training_args.do_train:
with training_args.main_process_first(desc="loading and tokenization"):
path = Path(data_args.dataset_dir)
files = [os.path.join(path,file.name) for file in path.glob("*.json")]
logger.info(f"Training files: {' '.join(files)}")
train_dataset = build_instruction_dataset(
data_path=files,
tokenizer=tokenizer,
max_seq_length=data_args.max_seq_length,
data_cache_dir = None,
preprocessing_num_workers = data_args.preprocessing_num_workers)
logger.info(f"Num train_samples {len(train_dataset)}")
logger.info("training example:")
logger.info(tokenizer.decode(train_dataset[0]['input_ids']))
if training_args.do_eval:
with training_args.main_process_first(desc="loading and tokenization"):
files = [data_args.validation_file]
logger.info(f"Evaluation files: {' '.join(files)}")
eval_dataset = build_instruction_dataset(
data_path=files,
tokenizer=tokenizer,
max_seq_length=data_args.max_seq_length,
data_cache_dir = None,
preprocessing_num_workers = data_args.preprocessing_num_workers)
logger.info(f"Num eval_samples {len(eval_dataset)}")
logger.info("eval example:")
logger.info(tokenizer.decode(eval_dataset[0]['input_ids']))
if model_args.model_name_or_path:
torch_dtype = (
model_args.torch_dtype
if model_args.torch_dtype in ["auto", None]
else getattr(torch, model_args.torch_dtype)
)
model = LlamaForCausalLM.from_pretrained(
model_args.model_name_or_path,
from_tf=bool(".ckpt" in model_args.model_name_or_path),
config=config,
cache_dir=model_args.cache_dir,
revision=model_args.model_revision,
use_auth_token=True if model_args.use_auth_token else None,
torch_dtype=torch_dtype,
low_cpu_mem_usage=True
)
else:
model = AutoModelForCausalLM.from_config(config)
n_params = sum({p.data_ptr(): p.numel() for p in model.parameters()}.values())
logger.info(f"Training new model from scratch - Total size={n_params/2**20:.2f}M params")
logger.info(f"len(tokenizer):{len(tokenizer)}")
embedding_size = model.get_input_embeddings().weight.shape[0]
if len(tokenizer) != embedding_size:
logger.info("resize the embedding size by the size of the tokenizer")
model.resize_token_embeddings(len(tokenizer))
if training_args.peft_path is not None:
logger.info("Peft from pre-trained model")
model = PeftModel.from_pretrained(model, training_args.peft_path)
else:
logger.info("Init new peft model")
target_modules = training_args.trainable.split(',')
modules_to_save = training_args.modules_to_save
if modules_to_save is not None:
modules_to_save = modules_to_save.split(',')
lora_rank = training_args.lora_rank
lora_dropout = training_args.lora_dropout
lora_alpha = training_args.lora_alpha
logger.info(f"target_modules: {target_modules}")
logger.info(f"lora_rank: {lora_rank}")
peft_config = LoraConfig(
task_type=TaskType.CAUSAL_LM,
target_modules=target_modules,
inference_mode=False,
r=lora_rank, lora_alpha=lora_alpha,
lora_dropout=lora_dropout,
modules_to_save=modules_to_save)
model = get_peft_model(model, peft_config)
#model.base_model.tie_weights()
model.print_trainable_parameters()
logger.info(f"model.modules_to_save: {model.modules_to_save}")
old_state_dict = model.state_dict
model.state_dict = (
lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
).__get__(model, type(model))
# Initialize our Trainer
trainer = Trainer(
model=model,
args=training_args,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
tokenizer=tokenizer,
data_collator=data_collator,
)
trainer.add_callback(SavePeftModelCallback)
# Training
if training_args.do_train:
checkpoint = None
if training_args.resume_from_checkpoint is not None:
checkpoint = training_args.resume_from_checkpoint
elif last_checkpoint is not None:
checkpoint = last_checkpoint
train_result = trainer.train(resume_from_checkpoint=checkpoint)
metrics = train_result.metrics
metrics["train_samples"] = len(train_dataset)
trainer.log_metrics("train", metrics)
trainer.save_metrics("train", metrics)
trainer.save_state()
# Evaluation
if training_args.do_eval:
logger.info("*** Evaluate ***")
metrics = trainer.evaluate()
metrics["eval_samples"] =len(eval_dataset)
try:
perplexity = math.exp(metrics["eval_loss"])
except OverflowError:
perplexity = float("inf")
metrics["perplexity"] = perplexity
trainer.log_metrics("eval", metrics)
trainer.save_metrics("eval", metrics)
if __name__ == "__main__":
main()
================================================
FILE: scripts/training/run_pt.sh
================================================
lr=2e-4
lora_rank=8
lora_alpha=32
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05
pretrained_model=path/to/hf/llama/dir
chinese_tokenizer_path=path/to/chinese/llama/tokenizer/dir
dataset_dir=path/to/pt/data/dir
data_cache=temp_data_cache_dir
per_device_train_batch_size=1
per_device_eval_batch_size=1
gradient_accumulation_steps=8
output_dir=output_dir
deepspeed_config_file=ds_zero2_no_offload.json
torchrun --nnodes 1 --nproc_per_node 1 run_clm_pt_with_peft.py \
--deepspeed ${deepspeed_config_file} \
--model_name_or_path ${pretrained_model} \
--tokenizer_name_or_path ${chinese_tokenizer_path} \
--dataset_dir ${dataset_dir} \
--data_cache_dir ${data_cache} \
--validation_split_percentage 0.001 \
--per_device_train_batch_size ${per_device_train_batch_size} \
--per_device_eval_batch_size ${per_device_eval_batch_size} \
--do_train \
--seed $RANDOM \
--fp16 \
--num_train_epochs 1 \
--lr_scheduler_type cosine \
--learning_rate ${lr} \
--warmup_ratio 0.05 \
--weight_decay 0.01 \
--logging_strategy steps \
--logging_steps 10 \
--save_strategy steps \
--save_total_limit 3 \
--save_steps 200 \
--gradient_accumulation_steps ${gradient_accumulation_steps} \
--preprocessing_num_workers 8 \
--block_size 512 \
--output_dir ${output_dir} \
--overwrite_output_dir \
--ddp_timeout 30000 \
--logging_first_step True \
--lora_rank ${lora_rank} \
--lora_alpha ${lora_alpha} \
--trainable ${lora_trainable} \
--modules_to_save ${modules_to_save} \
--lora_dropout ${lora_dropout} \
--torch_dtype float16 \
--gradient_checkpointing \
--ddp_find_unused_parameters False
================================================
FILE: scripts/training/run_sft.sh
================================================
lr=1e-4
lora_rank=8
lora_alpha=32
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05
pretrained_model=path/to/hf/llama/or/merged/llama/dir/or/model_id
chinese_tokenizer_path=path/to/chinese/llama/tokenizer/dir
dataset_dir=path/to/sft/data/dir
per_device_train_batch_size=1
per_device_eval_batch_size=1
gradient_accumulation_steps=8
output_dir=output_dir
peft_model=path/to/peft/model/dir
validation_file=validation_file_name
deepspeed_config_file=ds_zero2_no_offload.json
torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
--deepspeed ${deepspeed_config_file} \
--model_name_or_path ${pretrained_model} \
--tokenizer_name_or_path ${chinese_tokenizer_path} \
--dataset_dir ${dataset_dir} \
--validation_split_percentage 0.001 \
--per_device_train_batch_size ${per_device_train_batch_size} \
--per_device_eval_batch_size ${per_device_eval_batch_size} \
--do_train \
--do_eval \
--seed $RANDOM \
--fp16 \
--num_train_epochs 1 \
--lr_scheduler_type cosine \
--learning_rate ${lr} \
--warmup_ratio 0.03 \
--weight_decay 0 \
--logging_strategy steps \
--logging_steps 10 \
--save_strategy steps \
--save_total_limit 3 \
--evaluation_strategy steps \
--eval_steps 100 \
--save_steps 200 \
--gradient_accumulation_steps ${gradient_accumulation_steps} \
--preprocessing_num_workers 8 \
--max_seq_length 512 \
--output_dir ${output_dir} \
--overwrite_output_dir \
--ddp_timeout 30000 \
--logging_first_step True \
--lora_rank ${lora_rank} \
--lora_alpha ${lora_alpha} \
--trainable ${lora_trainable} \
--modules_to_save ${modules_to_save} \
--lora_dropout ${lora_dropout} \
--torch_dtype float16 \
--validation_file ${validation_file} \
--peft_path ${peft_model} \
--gradient_checkpointing \
--ddp_find_unused_parameters False