[
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2023 YuhuiXu\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# QA-LoRA\n\nQA-LoRA has been accepted by ICLR 2024!\n\nThis repository provides the official PyTorch implementation of [QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models](https://arxiv.org/pdf/2309.14717.pdf).\n\n<div align=\"center\">\n  <img src=\"image/qalora.png\" width=\"600\"/>\n</div>\n\nQA-LoRA is easily implemented with a few lines of code, and it equips the original LoRA with two-fold abilities: (i) during fine-tuning, the LLM's weights are quantized (e.g., into INT4) to reduce time and memory usage; (ii) after fine-tuning, the LLM and auxiliary weights are naturally integrated into a quantized model without loss of accuracy.\n\n## Todo list\nFix the conflict with the newest Auto-gptq version.\n\n## Installation\n```bash\nconda create -n qalora python=3.8\nconda activate qalora\nconda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia\ngit clone -b v0.3.0 https://github.com/PanQiWei/AutoGPTQ.git && cd AutoGPTQ\npip install .\ncd ..\npip install bitsandbytes\npip install -r requirements.txt\npip install protobuf==3.20.*\n```\nChange the `peft_utils.py` in your own auto-gptq path(python path/auto_gptq/utils/peft_utils.py) with the new one.\nFor the users of [GPTQLORA](https://github.com/qwopqwop200/gptqlora), you only need to change the `peft_utils.py` file.\n\n\n## Quantization\nWe use [GPTQ](https://github.com/qwopqwop200/GPTQ-for-LLaMa) for quantization. 
\nbits=4, group-size=32, act-order=False\nIf you change the group size, you also need to change `group_size` in `peft_utils.py` and `merge.py` accordingly.\n\n## Training\n```bash\npython qalora.py --model_path <path>\n```\n\nThe file structure of the model checkpoint is as follows:\n```\nconfig.json             llama7b-4bit-32g.bin  special_tokens_map.json  tokenizer_config.json\ngeneration_config.json  quantize_config.json      tokenizer.model\n```\n\n## Merge\nNote that our trained LoRA modules can be perfectly merged into the quantized model. We provide a simple merge script, `merge.py`, in this repo.\n\n## Notice \n### About the implementations\nThere are two ways to implement the dimension reduction (x is reduced from D_in to D_in // group_size). Both are mathematically equivalent.\n#### The first one (this repo)\nUse an average-pooling operation. The adapter weights are then divided by `group_size` during merge (refer to `merge.py`).\n```python\nadapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x)))) * scale).type_as(result)\nmodel[tmp_key+'.qzeros'] -= (lora['base_model.model.'+tmp_key+'.lora_B.weight'] @ lora['base_model.model.'+tmp_key+'.lora_A.weight']).t() * scale / group_size / model[tmp_key+'.scales']\n```\n#### The second one \nUse a sum operation. The adapter weights do not need to be divided during merge.\n\n```python\nadapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x) * group_size))) * scale).type_as(result)\nmodel[tmp_key+'.qzeros'] -= (lora['base_model.model.'+tmp_key+'.lora_B.weight'] @ lora['base_model.model.'+tmp_key+'.lora_A.weight']).t() * scale / model[tmp_key+'.scales']\n```\n\n### About the quantization\n\nSome GPTQ implementations, such as [GPTQ-for-llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), further compress the zeros into qzeros. 
You need to decode the qzeros first and restore the zeros to fp16 format.\n\n## Acknowledgements\nOur code is based on [QLoRA](https://github.com/artidoro/qlora), [GPTQLORA](https://github.com/qwopqwop200/gptqlora), and [Auto-GPTQ](https://github.com/PanQiWei/AutoGPTQ/tree/main).\n"
  },
  {
    "path": "environment.yaml",
    "content": "name: alpaca\nchannels:\n  - defaults\ndependencies:\n  - _libgcc_mutex=0.1=main\n  - _openmp_mutex=5.1=1_gnu\n  - ca-certificates=2023.05.30=h06a4308_0\n  - ld_impl_linux-64=2.38=h1181459_1\n  - libffi=3.4.4=h6a678d5_0\n  - libgcc-ng=11.2.0=h1234567_1\n  - libgomp=11.2.0=h1234567_1\n  - libstdcxx-ng=11.2.0=h1234567_1\n  - ncurses=6.4=h6a678d5_0\n  - openssl=3.0.9=h7f8727e_0\n  - pip=23.1.2=py38h06a4308_0\n  - python=3.8.16=h955ad1f_4\n  - readline=8.2=h5eee18b_0\n  - setuptools=67.8.0=py38h06a4308_0\n  - sqlite=3.41.2=h5eee18b_0\n  - tk=8.6.12=h1ccaba5_0\n  - wheel=0.38.4=py38h06a4308_0\n  - xz=5.4.2=h5eee18b_0\n  - zlib=1.2.13=h5eee18b_0\n  - pip:\n    - absl-py==1.2.0\n    - accelerate==0.21.0.dev0\n    - addict==2.4.0\n    - aiohttp==3.8.4\n    - aiosignal==1.3.1\n    - appdirs==1.4.4\n    - async-timeout==4.0.2\n    - attrs==22.2.0\n    - auto-gptq==0.3.0.dev0\n    - bert-score==0.3.13\n    - bitsandbytes==0.39.0\n    - certifi==2022.9.24\n    - charset-normalizer==3.0.1\n    - click==8.1.3\n    - cmake==3.26.3\n    - colorama==0.4.5\n    - contourpy==1.0.6\n    - cycler==0.11.0\n    - datasets==2.12.0\n    - dill==0.3.5.1\n    - evaluate==0.4.0\n    - fonttools==4.39.4\n    - frozenlist==1.3.3\n    - fsspec==2022.11.0\n    - gitdb==4.0.10\n    - gitpython==3.1.31\n    - huggingface-hub==0.14.1\n    - importlib-resources==5.12.0\n    - jinja2==3.1.2\n    - joblib==1.2.0\n    - lazy-import==0.2.2\n    - lit==16.0.5\n    - lxml==4.8.0\n    - markupsafe==2.1.2\n    - matplotlib==3.7.1\n    - mpmath==1.2.1\n    - multidict==6.0.2\n    - multiprocess==0.70.12.2\n    - networkx==2.8.8\n    - ninja==1.11.1\n    - nltk==3.8.1\n    - numpy==1.24.2\n    - packaging==23.1\n    - pandas==2.0.0\n    - pathlib2==2.3.7.post1\n    - pathtools==0.1.2\n    - peft==0.4.0.dev0\n    - pillow==9.3.0\n    - protobuf==3.20.2\n    - psutil==5.9.4\n    - pyarrow==10.0.1\n    - pyparsing==3.0.9\n    - python-dateutil==2.8.2\n    - pytz==2022.6\n    - pyyaml==6.0\n    - 
requests==2.31.0\n    - responses==0.18.0\n    - rouge==1.0.0\n    - rouge-score==0.1.2\n    - safetensors==0.3.1\n    - scikit-learn==1.2.2\n    - scipy==1.10.1\n    - sentencepiece==0.1.99\n    - sentry-sdk==1.24.0\n    - setproctitle==1.3.2\n    - six==1.16.0\n    - smmap==5.0.0\n    - sympy==1.11.1\n    - terminaltables==3.1.10\n    - threadpoolctl==3.1.0\n    - tokenizers==0.13.3\n    - torch==2.0.0+cu117\n    - tqdm==4.63.1\n    - transformers==4.31.0.dev0\n    - triton==2.0.0\n    - typing-extensions==4.6.2\n    - tzdata==2022.7\n    - urllib3==1.26.16\n    - wandb==0.15.2\n    - xformers==0.0.20\n    - xxhash==3.0.0\n    - yapf==0.32.0\n    - yarl==1.8.1\n    - zipp==3.15.0\n\n"
  },
  {
    "path": "merge.py",
    "content": "import torch\r\nmodel_path = 'path of the quantized model'\r\nlora_path = 'path of the saved LoRA adapters'\r\nmerged_path = 'target path of the merged model'\r\nscale = 16 /64\r\ngroup_size = 32\r\n\r\nmodel = torch.load(model_path, map_location='cpu')\r\nlora = torch.load(lora_path, map_location='cpu')\r\ntmp_keys = [key[17:-14] for key in lora.keys() if 'lora_A' in key]\r\nfor tmp_key in tmp_keys:\r\n    model[tmp_key+'.qzeros'] -= (lora['base_model.model.'+tmp_key+'.lora_B.weight'] @ lora['base_model.model.'+tmp_key+'.lora_A.weight']).t() * scale / group_size /model[tmp_key+'.scales']\r\n\r\ntorch.save(model, merged_path)\r\n\r\n"
  },
  {
    "path": "peft_utils.py",
    "content": "import warnings\nimport re\nfrom contextlib import contextmanager\nfrom dataclasses import asdict\nfrom enum import Enum\nfrom typing import List, Optional\n\nimport torch\nfrom peft import get_peft_model, PeftConfig, PeftModel, PeftType\nfrom peft.peft_model import PEFT_TYPE_TO_MODEL_MAPPING\nfrom peft.tuners.lora import LoraConfig, LoraLayer, LoraModel, Embedding\nfrom peft.tuners.adalora import AdaLoraConfig, AdaLoraLayer, AdaLoraModel\nfrom peft.mapping import PEFT_TYPE_TO_CONFIG_MAPPING\nfrom peft.utils.other import _get_submodules\n\nfrom ..modeling._base import BaseGPTQForCausalLM\n\n\ngroup_size = 32  # quantization group_size\n\n\nclass GPTQLoraConfig(LoraConfig):\n    injected_fused_attention: bool = False\n    injected_fused_mlp: bool = False\n\n\nclass GPTQLoraLinear(torch.nn.Linear, LoraLayer):\n    def __init__(\n        self,\n        adapter_name: str,\n        linear_module: torch.nn.Linear,\n        r: int = 0,\n        lora_alpha: int = 1,\n        lora_dropout: float = 0.0,\n        fan_in_fan_out: bool = False,  # Set this to True if the layer to replace stores weight like (fan_in, fan_out)\n        **kwargs,\n    ):\n        init_lora_weights = kwargs.pop(\"init_lora_weights\", True)\n\n        torch.nn.Linear.__init__(self, linear_module.in_features, linear_module.out_features)\n        LoraLayer.__init__(self, linear_module.in_features//group_size, linear_module.out_features)\n\n        self.linear_module = linear_module\n\n        self.weight.requires_grad = False\n        self.weight = self.linear_module.weight\n        self.bias = self.linear_module.bias\n        self.fan_in_fan_out = fan_in_fan_out\n        if fan_in_fan_out:\n            self.weight.data = self.weight.data.T\n\n        self.update_layer(adapter_name, r, lora_alpha, lora_dropout, init_lora_weights)\n        self.active_adapter = adapter_name\n        self.qa_pool = torch.nn.AvgPool1d(group_size)  # using pooling layer to conduct sum operation\n\n    def 
reset_lora_parameters(self, adapter_name):\n        if adapter_name in self.lora_A.keys():\n            torch.nn.init.xavier_uniform_(self.lora_A[adapter_name].weight)\n            torch.nn.init.zeros_(self.lora_B[adapter_name].weight)\n\n    def merge(self):\n        raise NotImplementedError(\"gptq model not support merge lora adapter\")\n\n    def unmerge(self):\n        raise NotImplementedError(\"gptq model not support unmerge lora adapter\")\n\n    def forward(self, x: torch.Tensor):\n        previous_dtype = x.dtype\n        if self.active_adapter not in self.lora_A.keys():\n            return self.linear_module(x)\n        if self.disable_adapters:\n            if self.r[self.active_adapter] > 0 and self.merged:\n                self.unmerge()\n            result = self.linear_module(x)\n        elif self.r[self.active_adapter] > 0 and not self.merged:\n            result = self.linear_module(x)\n\n            lora_B = self.lora_B[self.active_adapter]\n            lora_A = self.lora_A[self.active_adapter]\n            lora_dropout = self.lora_dropout[self.active_adapter]\n            scale = self.scaling[self.active_adapter]\n\n            x = x.type_as(lora_A.weight.data)\n            adapter_result = (lora_B(lora_A(lora_dropout(self.qa_pool(x)))) * scale).type_as(result)\n            result += adapter_result\n        else:\n            result = self.linear_module(x)\n\n        result = result.to(previous_dtype)\n\n        return result\n\n\nclass GPTQLoraModel(LoraModel):\n    def _find_and_replace(self, adapter_name):\n        lora_config = self.peft_config[adapter_name]\n        is_target_modules_in_base_model = False\n        kwargs = {\n            \"r\": lora_config.r,\n            \"lora_alpha\": lora_config.lora_alpha,\n            \"lora_dropout\": lora_config.lora_dropout,\n            \"fan_in_fan_out\": lora_config.fan_in_fan_out,\n            \"init_lora_weights\": lora_config.init_lora_weights,\n        }\n        key_list = [key for key, _ 
in self.model.named_modules()]\n        for key in key_list:\n            if isinstance(lora_config.target_modules, str):\n                target_module_found = re.fullmatch(lora_config.target_modules, key)\n            else:\n                target_module_found = any(key.endswith(target_key) for target_key in lora_config.target_modules)\n            if target_module_found:\n                if not is_target_modules_in_base_model:\n                    is_target_modules_in_base_model = True\n                parent, target, target_name = _get_submodules(self.model, key)\n                bias = False\n                if hasattr(target, \"bias\"):\n                    bias = target.bias is not None\n\n                if isinstance(target, LoraLayer):\n                    target.update_layer(\n                        adapter_name,\n                        lora_config.r,\n                        lora_config.lora_alpha,\n                        lora_config.lora_dropout,\n                        lora_config.init_lora_weights,\n                    )\n                else:\n                    if isinstance(target, torch.nn.Embedding):\n                        embedding_kwargs = kwargs.copy()\n                        embedding_kwargs.pop(\"fan_in_fan_out\", None)\n                        in_features, out_features = target.num_embeddings, target.embedding_dim\n                        new_module = Embedding(adapter_name, in_features, out_features, **embedding_kwargs)\n                    else:\n                        if isinstance(target, torch.nn.Linear):\n                            if kwargs[\"fan_in_fan_out\"]:\n                                warnings.warn(\n                                    \"fan_in_fan_out is set to True but the target module is `torch.nn.Linear`. 
\"\n                                    \"Setting fan_in_fan_out to False.\"\n                                )\n                                kwargs[\"fan_in_fan_out\"] = lora_config.fan_in_fan_out = False\n                        else:\n                            raise ValueError(\n                                f\"Target module {target} is not supported. \"\n                                f\"Currently, only `torch.nn.Linear` and its subclasses are supported.\"\n                            )\n                        new_module = GPTQLoraLinear(adapter_name, target, **kwargs)\n\n                    self._replace_module(parent, target_name, new_module, target)\n        if not is_target_modules_in_base_model:\n            raise ValueError(\n                f\"Target modules {lora_config.target_modules} not found in the base model. \"\n                f\"Please check the target modules and try again.\"\n            )\n\n    def _replace_module(self, parent_module, child_name, new_module, old_module):\n        setattr(parent_module, child_name, new_module)\n        if not isinstance(new_module, GPTQLoraLinear):\n            new_module.weight = old_module.weight\n            if hasattr(old_module, \"bias\"):\n                if old_module.bias is not None:\n                    new_module.bias = old_module.bias\n\n            if getattr(old_module, \"state\", None) is not None:\n                new_module.state = old_module.state\n                new_module.to(old_module.weight.device)\n\n        # dispatch to correct device\n        for name, module in new_module.named_modules():\n            if \"lora_\" in name:\n                module.to(old_module.weight.device)\n\n    def merge_adapter(self):\n        raise NotImplementedError(\"gptq model not support merge ada lora adapter\")\n\n    def unmerge_adapter(self):\n        raise NotImplementedError(\"gptq model not support unmerge ada lora adapter\")\n\n    def merge_and_unload(self):\n        raise 
NotImplementedError(\"gptq model not support merge and unload\")\n\n\nclass GPTQAdaLoraConfig(AdaLoraConfig):\n    injected_fused_attention: bool = False\n    injected_fused_mlp: bool = False\n\n\nclass GPTQSVDLinear(torch.nn.Linear, AdaLoraLayer):\n    def __init__(\n        self,\n        adapter_name: str,\n        linear_module: torch.nn.Linear,\n        r: int = 0,\n        lora_alpha: int = 1,\n        lora_dropout: float = 0.0,\n        fan_in_fan_out: bool = False,  # Set this to True if the layer to replace stores weight like (fan_in, fan_out)\n        **kwargs,\n    ):\n        init_lora_weights = kwargs.pop(\"init_lora_weights\", True)\n\n        torch.nn.Linear.__init__(self, linear_module.in_features, linear_module.out_features)\n        AdaLoraLayer.__init__(self, linear_module.in_features, linear_module.out_features)\n\n        self.linear_module = linear_module\n\n        self.weight.requires_grad = False\n        self.weight = self.linear_module.weight\n        self.bias = self.linear_module.bias\n        self.fan_in_fan_out = fan_in_fan_out\n        if fan_in_fan_out:\n            self.weight.data = self.weight.data.T\n\n        self.update_layer(adapter_name, r, lora_alpha, lora_dropout, init_lora_weights)\n        self.active_adapter = adapter_name\n\n    def merge(self):\n        raise NotImplementedError(\"gptq model not support merge lora adapter\")\n\n    def unmerge(self):\n        raise NotImplementedError(\"gptq model not support unmerge lora adapter\")\n\n    def forward(self, x: torch.Tensor):\n        if self.active_adapter not in self.lora_A.keys():\n            return self.linear_module(x)\n        if self.disable_adapters:\n            if self.r[self.active_adapter] > 0 and self.merged:\n                self.unmerge()\n            result = self.linear_module(x)\n        elif self.r[self.active_adapter] > 0 and not self.merged:\n            result = self.linear_module(x)\n            result += (\n                (\n                   
 self.lora_dropout[self.active_adapter](x)\n                    @ (self.lora_A[self.active_adapter] * self.lora_E[self.active_adapter]).T\n                    @ self.lora_B[self.active_adapter].T\n                )\n                * self.scaling[self.active_adapter]\n                / (self.ranknum[self.active_adapter] + 1e-5)\n            )\n        else:\n            result = self.linear_module(x)\n        return result\n\n\nclass GPTQAdaLoraModel(AdaLoraModel):\n    def _find_and_replace(self, adapter_name):\n        lora_config = self.peft_config[adapter_name]\n        is_target_modules_in_base_model = False\n        kwargs = {\n            \"r\": lora_config.init_r,\n            \"lora_alpha\": lora_config.lora_alpha,\n            \"lora_dropout\": lora_config.lora_dropout,\n            \"fan_in_fan_out\": lora_config.fan_in_fan_out,\n            \"init_lora_weights\": lora_config.init_lora_weights,\n        }\n        key_list = [key for key, _ in self.model.named_modules()]\n        for key in key_list:\n            if isinstance(lora_config.target_modules, str):\n                target_module_found = re.fullmatch(lora_config.target_modules, key)\n            else:\n                target_module_found = any(key.endswith(target_key) for target_key in lora_config.target_modules)\n            if target_module_found:\n                if not is_target_modules_in_base_model:\n                    is_target_modules_in_base_model = True\n                parent, target, target_name = _get_submodules(self.model, key)\n                bias = target.bias is not None\n                if isinstance(target, LoraLayer):\n                    target.update_layer(\n                        adapter_name,\n                        lora_config.init_r,\n                        lora_config.lora_alpha,\n                        lora_config.lora_dropout,\n                        lora_config.init_lora_weights,\n                    )\n                else:\n                    if 
isinstance(target, torch.nn.Linear):\n                        in_features, out_features = target.in_features, target.out_features\n                        if kwargs[\"fan_in_fan_out\"]:\n                            warnings.warn(\n                                \"fan_in_fan_out is set to True but the target module is `torch.nn.Linear`. \"\n                                \"Setting fan_in_fan_out to False.\"\n                            )\n                            kwargs[\"fan_in_fan_out\"] = lora_config.fan_in_fan_out = False\n                    else:\n                        raise ValueError(\n                            f\"Target module {target} is not supported. \"\n                            f\"Currently, only `torch.nn.Linear` and its subclasses are supported.\"\n                        )\n                    new_module = GPTQSVDLinear(adapter_name, target, **kwargs)\n\n                    self._replace_module(parent, target_name, new_module, target)\n        if not is_target_modules_in_base_model:\n            raise ValueError(\n                f\"Target modules {lora_config.target_modules} not found in the base model. 
\"\n                f\"Please check the target modules and try again.\"\n            )\n\n    def _replace_module(self, parent_module, child_name, new_module, old_module):\n        setattr(parent_module, child_name, new_module)\n\n        # dispatch to correct device\n        for name, module in new_module.named_modules():\n            if \"lora_\" in name:\n                module.to(old_module.weight.device)\n\n    def merge_adapter(self):\n        raise NotImplementedError(\"gptq model not support merge ada lora adapter\")\n\n    def unmerge_adapter(self):\n        raise NotImplementedError(\"gptq model not support unmerge ada lora adapter\")\n\n    def merge_and_unload(self):\n        raise NotImplementedError(\"gptq model not support merge and unload\")\n\n\ndef find_all_linear_names(model: BaseGPTQForCausalLM, ignore: Optional[List[str]] = None, ignore_lm_head: bool = True):\n    if not ignore:\n        ignore = []\n    lm_head_name = model.lm_head_name\n    if ignore_lm_head and lm_head_name not in ignore:\n        ignore.append(lm_head_name)\n    results = set()\n    for n, m in model.named_modules():\n        if isinstance(m, torch.nn.Linear):\n            res = n.split('.')[-1]\n            if res not in ignore:\n                results.add(res)\n    return list(results)\n\n\n@contextmanager\ndef hijack_peft_mappings():\n    PEFT_TYPE_TO_CONFIG_MAPPING[PeftType.LORA] = GPTQLoraConfig\n    PEFT_TYPE_TO_MODEL_MAPPING[PeftType.LORA] = GPTQLoraModel\n    PEFT_TYPE_TO_CONFIG_MAPPING[PeftType.ADALORA] = GPTQAdaLoraConfig\n    PEFT_TYPE_TO_MODEL_MAPPING[PeftType.ADALORA] = GPTQAdaLoraModel\n\n    try:\n        yield\n    except:\n        PEFT_TYPE_TO_CONFIG_MAPPING[PeftType.LORA] = GPTQLoraConfig\n        PEFT_TYPE_TO_MODEL_MAPPING[PeftType.LORA] = GPTQLoraModel\n        PEFT_TYPE_TO_CONFIG_MAPPING[PeftType.ADALORA] = GPTQAdaLoraConfig\n        PEFT_TYPE_TO_MODEL_MAPPING[PeftType.ADALORA] = GPTQAdaLoraModel\n        raise\n    finally:\n        
PEFT_TYPE_TO_CONFIG_MAPPING[PeftType.LORA] = GPTQLoraConfig\n        PEFT_TYPE_TO_MODEL_MAPPING[PeftType.LORA] = GPTQLoraModel\n        PEFT_TYPE_TO_CONFIG_MAPPING[PeftType.ADALORA] = GPTQAdaLoraConfig\n        PEFT_TYPE_TO_MODEL_MAPPING[PeftType.ADALORA] = GPTQAdaLoraModel\n\n\ndef get_gptq_peft_model(\n    model: BaseGPTQForCausalLM,\n    peft_config: PeftConfig = None,\n    model_id: str = None,\n    adapter_name: str = \"default\",\n    auto_find_all_linears: bool = True,\n    train_mode: bool = False\n):\n    if train_mode and not model.trainable:\n        model.enable_trainable_mode()\n    if train_mode and not peft_config:\n        raise ValueError(\"peft_config not specified when in train mode.\")\n    if not train_mode and not model_id:\n        raise ValueError(\"model_id(where to load adapters) not specified when in inference mode.\")\n\n    if model.fused_attn_module_type is not None and not model.injected_fused_attention:\n        peft_types = [PeftType.LORA.value, PeftType.ADALORA.value]\n        warnings.warn(\n            f\"You can just ignore this warning if the peft type you use isn't in {peft_types}.\\n\"\n            f\"{model.__class__.__name__} supports injecting fused attention but not enables this time. \"\n            \"If you are training adapters, you must also disable fused attention injection when loading quantized \"\n            \"base model at inference time, otherwise adapters may not be added to base model properly. 
\"\n            \"If you are loading adapters to do inference, you can reference to adapter's config file to check \"\n            \"whether the adapters are trained using base model that not enable fused attention injection.\"\n        )\n    if model.injected_fused_mlp:\n        raise NotImplementedError(\"GPTQ model that enables fused mlp injection is not supported to integrate with peft.\")\n\n    if train_mode:\n        peft_type = peft_config.peft_type\n        if not isinstance(peft_type, str):\n            peft_type = peft_type.value\n        if peft_type in [PeftType.LORA.value, PeftType.ADALORA.value]:\n            if auto_find_all_linears:\n                peft_config.target_modules = find_all_linear_names(model, ignore_lm_head=True)\n            if peft_type == PeftType.LORA.value and not isinstance(peft_config, GPTQLoraConfig):\n                peft_config = GPTQLoraConfig(**peft_config.to_dict())\n            if peft_type == PeftType.ADALORA.value and not isinstance(peft_config, GPTQAdaLoraConfig):\n                peft_config = GPTQAdaLoraConfig(**peft_config.to_dict())\n            peft_config.injected_fused_attention = model.injected_fused_attention\n            peft_config.injected_fused_mlp = model.injected_fused_mlp\n        if peft_type == PeftType.ADAPTION_PROMPT.value:\n            if peft_config.adapter_layers > model.config.num_hidden_layers:\n                warnings.warn(\n                    f\"model has only {model.config.num_hidden_layers} layers \"\n                    f\"but adapter_layers is set to {peft_config.adapter_layers}, \"\n                    f\"will reset value to {model.config.num_hidden_layers}.\"\n                )\n                peft_config.adapter_layers = model.config.num_hidden_layers\n            if model.injected_fused_attention:\n                raise NotImplementedError(\n                    \"model with fused attention injected isn't supported to use ADAPTION_PROMPT peft type yet.\"\n                )\n\n    
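with hijack_peft_mappings():\n        try:\n            if train_mode:\n                peft_model = get_peft_model(model.model, peft_config, adapter_name=adapter_name)\n            else:\n                peft_model = PeftModel.from_pretrained(model.model, model_id, adapter_name)\n        except Exception as exc:\n            # peft_config may be None in inference mode and peft_type may already be a str,\n            # so resolve the type name defensively before building the error message.\n            peft_type = getattr(peft_config, \"peft_type\", None)\n            peft_type = getattr(peft_type, \"value\", peft_type)\n            raise NotImplementedError(\n                f\"{model.__class__.__name__} does not support the {peft_type} peft type yet.\"\n            ) from exc\n\n    return peft_model\n\n\n__all__ = [\n    \"GPTQLoraConfig\",\n    \"GPTQLoraModel\",\n    \"GPTQAdaLoraConfig\",\n    \"GPTQAdaLoraModel\",\n    \"find_all_linear_names\",\n    \"get_gptq_peft_model\"\n]\n"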
  },
  {
    "path": "qalora.py",
    "content": "# This source code is licensed under the MIT license found in the\n# LICENSE file in the root directory of this source tree.\n\nfrom collections import defaultdict\nimport copy\nimport json\nimport os\nfrom os.path import exists, join, isdir\nfrom dataclasses import dataclass, field\nimport sys\nfrom typing import Optional, Dict, Sequence\nimport numpy as np\nfrom tqdm import tqdm\nimport logging\n\nimport torch\nimport transformers\nfrom torch.nn.utils.rnn import pad_sequence\nimport argparse\nfrom transformers import (\n    AutoTokenizer, \n    AutoModelForCausalLM, \n    set_seed, \n    Seq2SeqTrainer,\n    LlamaTokenizerFast\n)\nfrom datasets import load_dataset\nimport evaluate\n\nfrom peft import (\n    LoraConfig,\n    get_peft_model_state_dict,\n    set_peft_model_state_dict,\n    PeftModel\n)\nfrom peft.tuners.lora import LoraLayer\nfrom transformers.trainer_utils import PREFIX_CHECKPOINT_DIR\nfrom auto_gptq.utils.peft_utils import get_gptq_peft_model, GPTQLoraConfig\nfrom auto_gptq import AutoGPTQForCausalLM\nfrom auto_gptq.nn_modules.qlinear import GeneralQuantLinear\n\ntorch.backends.cuda.matmul.allow_tf32 = True\n\nlogger = logging.getLogger(__name__)\n\nIGNORE_INDEX = -100\nDEFAULT_PAD_TOKEN = \"[PAD]\"\n\nimport os\nos.environ[\"TOKENIZERS_PARALLELISM\"] = \"false\"\n\ndef prepare_model_for_int8_training(model, use_gradient_checkpointing=True):\n    r\"\"\"\n    This method wraps the entire protocol for preparing a model before running a training. 
This includes:\n        1. freezing all parameters of the base model;\n        2. making the input embeddings require grads and enabling gradient checkpointing\n           (only when `use_gradient_checkpointing` is set);\n        3. upcasting the lm head to fp32.\n\n    Args:\n        model, (`transformers.PreTrainedModel`):\n            The loaded model from `transformers`\n    \"\"\"\n    for name, param in model.named_parameters():\n        # freeze base model's layers\n        param.requires_grad = False\n\n    if use_gradient_checkpointing:\n        # For backward compatibility\n        if hasattr(model, \"enable_input_require_grads\"):\n            model.enable_input_require_grads()\n        else:\n\n            def make_inputs_require_grad(module, input, output):\n                output.requires_grad_(True)\n\n            model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)\n\n        # enable gradient checkpointing for memory efficiency\n        model.gradient_checkpointing_enable()\n\n    model.lm_head = model.lm_head.float()\n    # NOTE: rebinding the loop variable `param` does not modify the model, so this loop leaves\n    # the remaining fp16 parameters in fp16; assigning to `param.data` would be needed to upcast them.\n    for _, param in model.named_parameters():\n        if param.dtype == torch.float16:\n            param = param.float()\n\n    return model\n\n@dataclass\nclass ModelArguments:\n    model_path: Optional[str] = field(\n        default=\"./llama-7b/\"\n    )\n    trust_remote_code: Optional[bool] = field(\n        default=False,\n        metadata={\"help\": \"Enable unpickling of arbitrary code in AutoModelForCausalLM#from_pretrained.\"}\n    )\n\n@dataclass\nclass DataArguments:\n    eval_dataset_size: int = field(\n        default=1024, metadata={\"help\": \"Size of validation dataset.\"}\n    )\n    max_train_samples: Optional[int] = field(\n        default=None,\n        metadata={\n            \"help\": \"For debugging purposes or quicker training, truncate the number of training examples to this \"\n            \"value if set.\"\n        },\n    )\n    max_eval_samples: Optional[int] = field(\n        default=None,\n        metadata={\n            \"help\": \"For debugging purposes or quicker training, 
truncate the number of evaluation examples to this \"\n            \"value if set.\"\n        },\n    )\n    source_max_len: int = field(\n        default=1024,\n        metadata={\"help\": \"Maximum source sequence length. Sequences will be right padded (and possibly truncated).\"},\n    )\n    target_max_len: int = field(\n        default=256,\n        metadata={\"help\": \"Maximum target sequence length. Sequences will be right padded (and possibly truncated).\"},\n    )\n    dataset: str = field(\n        default='alpaca',\n        metadata={\"help\": \"Which dataset to finetune on. See datamodule for options.\"}\n    )\n\n@dataclass\nclass TrainingArguments(transformers.Seq2SeqTrainingArguments):\n    cache_dir: Optional[str] = field(\n        default=None\n    )\n    train_on_source: Optional[bool] = field(\n        default=False,\n        metadata={\"help\": \"Whether to train on the input in addition to the target text.\"}\n    )\n    mmlu_split: Optional[str] = field(\n        default='eval',\n        metadata={\"help\": \"The MMLU split to run on\"}\n    )\n    mmlu_dataset: Optional[str] = field(\n        default='mmlu-fs',\n        metadata={\"help\": \"MMLU dataset to use: options are `mmlu-zs` for zero-shot or `mmlu-fs` for few shot.\"}\n    )\n    do_mmlu_eval: Optional[bool] = field(\n        default=False,\n        metadata={\"help\": \"Whether to run the MMLU evaluation.\"}\n    )\n    max_mmlu_samples: Optional[int] = field(\n        default=None,\n        metadata={\"help\": \"If set, only evaluates on `max_mmlu_samples` of the MMLU dataset.\"}\n    )\n    mmlu_source_max_len: int = field(\n        default=2048,\n        metadata={\"help\": \"Maximum source sequence length for mmlu.\"}\n    )\n    full_finetune: bool = field(\n        default=False,\n        metadata={\"help\": \"Finetune the entire model without adapters.\"}\n    )\n    adam8bit: bool = field(\n        default=False,\n        metadata={\"help\": \"Use 8-bit adam.\"}\n    )\n   
 lora_r: int = field(\n        default=64,\n        metadata={\"help\": \"Lora R dimension.\"}\n    )\n    lora_alpha: float = field(\n        default=16,\n        metadata={\"help\": \"Lora alpha.\"}\n    )\n    lora_dropout: float = field(\n        default=0.0,\n        metadata={\"help\":\"Lora dropout.\"}\n    )\n    max_memory_MB: int = field(\n        default=24000,\n        metadata={\"help\": \"Free memory per gpu.\"}\n    )\n    report_to: str = field(\n        default='none',\n        metadata={\"help\": \"To use wandb or something else for reporting.\"}\n    )\n    output_dir: str = field(default='./output', metadata={\"help\": 'The output dir for logs and checkpoints'})\n    optim: str = field(default='paged_adamw_32bit', metadata={\"help\": 'The optimizer to be used'})\n    per_device_train_batch_size: int = field(default=1, metadata={\"help\": 'The training batch size per GPU. Increase for better speed.'})\n    gradient_accumulation_steps: int = field(default=16, metadata={\"help\": 'How many gradients to accumulate before performing an optimizer step'})\n    max_steps: int = field(default=10000, metadata={\"help\": 'How many optimizer update steps to take'})\n    weight_decay: float = field(default=0.0, metadata={\"help\": 'The L2 weight decay rate of AdamW'}) # use lora dropout instead for regularization if needed\n    learning_rate: float = field(default=0.0002, metadata={\"help\": 'The learning rate'})\n    remove_unused_columns: bool = field(default=False, metadata={\"help\": 'Remove unused columns. Needed to make this codebase work.'})\n    max_grad_norm: float = field(default=0.3, metadata={\"help\": 'Gradient clipping max norm. This is tuned and works well for all models tested.'})\n    gradient_checkpointing: bool = field(default=True, metadata={\"help\": 'Use gradient checkpointing. 
You want to use this.'})\n    do_train: bool = field(default=True, metadata={\"help\": 'To train or not to train, that is the question?'})\n    lr_scheduler_type: str = field(default='constant', metadata={\"help\": 'Learning rate schedule. Constant a bit better than cosine, and has advantage for analysis'})\n    warmup_ratio: float = field(default=0.03, metadata={\"help\": 'Fraction of steps to do a warmup for'})\n    logging_steps: int = field(default=10, metadata={\"help\": 'The frequency of update steps after which to log the loss'})\n    group_by_length: bool = field(default=True, metadata={\"help\": 'Group sequences into batches with same length. Saves memory and speeds up training considerably.'})\n    save_strategy: str = field(default='steps', metadata={\"help\": 'When to save checkpoints'})\n    save_steps: int = field(default=250, metadata={\"help\": 'How often to save a model'})\n    save_total_limit: int = field(default=40, metadata={\"help\": 'How many checkpoints to save before the oldest is overwritten'})\n\n@dataclass\nclass GenerationArguments:\n    # For more hyperparameters check:\n    # https://huggingface.co/docs/transformers/main_classes/text_generation#transformers.GenerationConfig\n    # Length arguments\n    max_new_tokens: Optional[int] = field(\n        default=256,\n        metadata={\"help\": \"Maximum number of new tokens to be generated in evaluation or prediction loops\"\n                          \"if predict_with_generate is set.\"}\n    )\n    min_new_tokens : Optional[int] = field(\n        default=None,\n        metadata={\"help\": \"Minimum number of new tokens to generate.\"}\n    )\n\n    # Generation strategy\n    do_sample: Optional[bool] = field(default=False)\n    num_beams: Optional[int] = field(default=1)\n    num_beam_groups: Optional[int] = field(default=1)\n    penalty_alpha: Optional[float] = field(default=None)\n    use_cache: Optional[bool] = field(default=True) \n\n    # Hyperparameters for logit manipulation\n   
 temperature: Optional[float] = field(default=1.0)\n    top_k: Optional[int] = field(default=50)\n    top_p: Optional[float] = field(default=1.0)\n    typical_p: Optional[float] = field(default=1.0)\n    diversity_penalty: Optional[float] = field(default=0.0) \n    repetition_penalty: Optional[float] = field(default=1.0) \n    length_penalty: Optional[float] = field(default=1.0)\n    no_repeat_ngram_size: Optional[int] = field(default=0) \n\ndef find_all_linear_names(args, model):\n    cls = GeneralQuantLinear if not(args.full_finetune) else torch.nn.Linear\n    lora_module_names = set()\n    for name, module in model.named_modules():\n        if isinstance(module, cls):\n            names = name.split('.')\n            lora_module_names.add(names[0] if len(names) == 1 else names[-1])\n\n\n    if 'lm_head' in lora_module_names: # needed for 16-bit\n        lora_module_names.remove('lm_head')\n    return list(lora_module_names)\n\n\nclass SavePeftModelCallback(transformers.TrainerCallback):\n    def save_model(self, args, state, kwargs):\n        print('Saving PEFT checkpoint...')\n        if state.best_model_checkpoint is not None:\n            checkpoint_folder = os.path.join(state.best_model_checkpoint, \"adapter_model\")\n        else:\n            checkpoint_folder = os.path.join(args.output_dir, f\"{PREFIX_CHECKPOINT_DIR}-{state.global_step}\")\n\n        peft_model_path = os.path.join(checkpoint_folder, \"adapter_model\")\n        kwargs[\"model\"].save_pretrained(peft_model_path)\n\n        pytorch_model_path = os.path.join(checkpoint_folder, \"pytorch_model.bin\")\n        if os.path.exists(pytorch_model_path):\n            os.remove(pytorch_model_path)\n\n    def on_save(self, args, state, control, **kwargs):\n        self.save_model(args, state, kwargs)\n        return control\n\n    def on_train_end(self, args, state, control, **kwargs):\n        def touch(fname, times=None):\n            with open(fname, 'a'):\n                os.utime(fname, times)\n\n 
       touch(join(args.output_dir, 'completed'))\n        self.save_model(args, state, kwargs)\n\ndef get_accelerate_model(args, checkpoint_dir):\n\n    n_gpus = torch.cuda.device_count()\n    max_memory = f'{args.max_memory_MB}MB'\n    max_memory = {i: max_memory for i in range(n_gpus)}\n\n    if args.full_finetune: assert args.bits in [16, 32]\n\n    print(f'loading base model {args.model_path}...')\n    model = AutoGPTQForCausalLM.from_quantized(\n        args.model_path,\n        device_map='auto',\n        max_memory=max_memory,\n        trust_remote_code=args.trust_remote_code,\n        inject_fused_attention = False,\n        inject_fused_mlp = False,\n        use_triton=True,\n        warmup_triton=False,\n        trainable=True\n    )\n    model.model.quantize_config = model.quantize_config\n    model.train()\n\n    setattr(model, 'model_parallel', True)\n    setattr(model, 'is_parallelizable', True)\n    #modules = find_all_linear_names(args, model)\n\n    model.config.torch_dtype=(torch.float32 if args.fp16 else (torch.bfloat16 if args.bf16 else torch.float32))\n\n    if not args.full_finetune:\n        model = prepare_model_for_int8_training(model, use_gradient_checkpointing=args.gradient_checkpointing)\n    if args.gradient_checkpointing:\n        model.gradient_checkpointing_enable()\n\n    config = GPTQLoraConfig(\n        r=args.lora_r,\n        lora_alpha=args.lora_alpha,\n        #target_modules=modules,\n        lora_dropout=args.lora_dropout,\n        bias=\"none\",\n        task_type=\"CAUSAL_LM\",\n    )\n    if not args.full_finetune:\n        if checkpoint_dir is not None:\n            print(\"Loading adapters from checkpoint.\")\n            model = PeftModel.from_pretrained(model, join(checkpoint_dir, 'adapter_model'))\n            for name, p in model.named_parameters():\n                if 'lora' in name:\n                    print(name, p.sum())\n        else:\n            print(f'adding LoRA modules...')\n            model = 
get_gptq_peft_model(model, config, auto_find_all_linears=True, train_mode=True)\n\n    if args.gradient_checkpointing:\n        if hasattr(model, \"enable_input_require_grads\"):\n            model.enable_input_require_grads()\n        else:\n            def make_inputs_require_grad(module, input, output):\n                output.requires_grad_(True)\n            model.get_input_embeddings().register_forward_hook(make_inputs_require_grad)\n\n\n    for name, module in model.named_modules():\n        if isinstance(module, LoraLayer):\n            if args.bf16:\n                module = module.to(torch.bfloat16)\n        if 'norm' in name:\n            module = module.to(torch.float32)\n        if 'lm_head' in name or 'embed_tokens' in name:\n            if hasattr(module, 'weight'):\n                if args.bf16 and module.weight.dtype == torch.float32:\n                    module = module.to(torch.bfloat16)\n    return model\n\ndef print_trainable_parameters(args, model):\n    \"\"\"\n    Prints the number of trainable parameters in the model.\n    \"\"\"\n    trainable_params = 0\n    all_param = 0\n    for _, param in model.named_parameters():\n        all_param += param.numel()\n        if param.requires_grad:\n            trainable_params += param.numel()\n    try:\n        trainable_params /= (32//model.quantize_config.bits)\n    except:\n        pass\n    print(f\"trainable params: {trainable_params} || all params: {all_param} || trainable: {100 * trainable_params / all_param}\")\n\ndef smart_tokenizer_and_embedding_resize(\n    special_tokens_dict: Dict,\n    tokenizer: transformers.PreTrainedTokenizer,\n    model: transformers.PreTrainedModel,\n):\n    \"\"\"Resize tokenizer and embedding.\n\n    Note: This is the unoptimized version that may make your embedding size not be divisible by 64.\n    \"\"\"\n    num_new_tokens = tokenizer.add_special_tokens(special_tokens_dict)\n    model.resize_token_embeddings(len(tokenizer))\n\n    if num_new_tokens > 0:\n     
   input_embeddings = model.get_input_embeddings().weight.data\n        output_embeddings = model.get_output_embeddings().weight.data\n\n        input_embeddings_avg = input_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)\n        output_embeddings_avg = output_embeddings[:-num_new_tokens].mean(dim=0, keepdim=True)\n\n        input_embeddings[-num_new_tokens:] = input_embeddings_avg\n        output_embeddings[-num_new_tokens:] = output_embeddings_avg\n\n@dataclass\nclass DataCollatorForCausalLM(object):\n    tokenizer: transformers.PreTrainedTokenizer\n    source_max_len: int\n    target_max_len: int\n    train_on_source: bool\n    predict_with_generate: bool\n\n    def __call__(self, instances: Sequence[Dict]) -> Dict[str, torch.Tensor]:\n        # Extract elements\n        sources = [example['input'] for example in instances]\n        targets = [f\"{example['output']}{self.tokenizer.eos_token}\" for example in instances]\n        # Tokenize\n        tokenized_sources_with_prompt = self.tokenizer(\n            sources,\n            max_length=self.source_max_len,\n            truncation=True,\n        )\n        tokenized_targets = self.tokenizer(\n            targets,\n            max_length=self.target_max_len,\n            truncation=True,\n            add_special_tokens=False,\n        )\n        # Build the input and labels for causal LM\n        input_ids = []\n        labels = [] \n        for tokenized_source, tokenized_target in zip(\n            tokenized_sources_with_prompt['input_ids'], \n            tokenized_targets['input_ids']\n        ):\n            if not self.predict_with_generate:\n                input_ids.append(torch.tensor(tokenized_source + tokenized_target))\n                if not self.train_on_source:\n                    labels.append(\n                        torch.tensor([IGNORE_INDEX for _ in range(len(tokenized_source))] + copy.deepcopy(tokenized_target))\n                    )\n                else:\n                    
labels.append(torch.tensor(copy.deepcopy(tokenized_source + tokenized_target)))\n            else:\n                input_ids.append(torch.tensor(tokenized_source))\n        # Apply padding\n        input_ids = pad_sequence(input_ids, batch_first=True, padding_value=self.tokenizer.pad_token_id)\n        labels = pad_sequence(labels, batch_first=True, padding_value=IGNORE_INDEX) if not self.predict_with_generate else None\n        data_dict = {\n            'input_ids': input_ids,\n            'attention_mask':input_ids.ne(self.tokenizer.pad_token_id),\n        }\n        if labels is not None:\n            data_dict['labels'] = labels\n        return data_dict\n\ndef extract_unnatural_instructions_data(examples, extract_reformulations=False):\n    out = {\n        'input': [],\n        'output': [],\n    }\n    for example_instances in examples['instances']:\n        for instance in example_instances:\n            out['input'].append(instance['instruction_with_input'])\n            out['output'].append(instance['output'])\n    if extract_reformulations:\n        for example_reformulations in examples['reformulations']:\n            if example_reformulations is not None:\n                for instance in example_reformulations:\n                    out['input'].append(instance['instruction_with_input'])\n                    out['output'].append(instance['output'])\n    return out\n\nPROMPT_DICT = {\n    \"prompt_input\": (\n        \"Below is an instruction that describes a task, paired with an input that provides further context. \"\n        \"Write a response that appropriately completes the request.\\n\\n\"\n        \"### Instruction:\\n{instruction}\\n\\n### Input:\\n{input}\\n\\n### Response: \"\n    ),\n    \"prompt_no_input\": (\n        \"Below is an instruction that describes a task. 
\"\n        \"Write a response that appropriately completes the request.\\n\\n\"\n        \"### Instruction:\\n{instruction}\\n\\n### Response: \"\n    ),\n}\n\ndef extract_alpaca_dataset(example):\n    if example.get(\"input\", \"\") != \"\":\n        prompt_format = PROMPT_DICT[\"prompt_input\"]\n    else:\n        prompt_format = PROMPT_DICT[\"prompt_no_input\"]\n    return {'input': prompt_format.format(**example)}\n\ndef make_data_module(tokenizer: transformers.PreTrainedTokenizer, args) -> Dict:\n    \"\"\"\n    Make dataset and collator for supervised fine-tuning.\n    Datasets are expected to have the following columns: { `input`, `output` }\n\n    Available datasets to be selected with `dataset` argument:\n        - alpaca, 52002 examples\n        - alpaca cleaned, 51942 examples   \n        - chip2 (OIG), 210289 examples\n        - self-instruct, 82612 examples\n        - hh-rlhf (Anthropic), 160800 examples\n        - longform, 23.7k examples\n\n    Coming soon:\n        - unnatural instructions core, 66010 examples\n        - unnatural instructions full, 240670 examples\n        - alpaca-gpt4, 52002 examples\n        - unnatural-instructions-gpt4, 9000 examples\n        - oa-rlhf (OpenAssistant) primary message tree only, 9209 examples\n        - oa-rlhf-assistant (OpenAssistant) all assistant  replies with ranking\n        - supernatural-instructions, 69624 examples (same as paper with 100 ex/task more can be used)\n        - flan (FLAN v2), up to 20M examples available\n\n    Not Available:\n        - vicuna, not released at the moment.\n    \"\"\"\n    # Load dataset.\n    # Alpaca\n    if args.dataset == 'alpaca':\n        dataset = load_dataset(\"tatsu-lab/alpaca\")\n        dataset = dataset.map(extract_alpaca_dataset, remove_columns=['instruction'])\n    # Alpaca clean\n    elif args.dataset == 'alpaca-clean':\n        dataset = load_dataset(\"yahma/alpaca-cleaned\")\n        dataset = dataset.map(extract_alpaca_dataset, 
remove_columns=['instruction'])\n    # Chip2\n    elif args.dataset == 'chip2':\n        dataset = load_dataset(\"laion/OIG\", data_files='unified_chip2.jsonl')\n        dataset = dataset.map(lambda x: {\n            'input': x['text'].split('\\n<bot>: ')[0].replace('<human>: ', ''),\n            'output': x['text'].split('\\n<bot>: ')[1],\n        }, remove_columns=['text', 'metadata'])\n    # Self Instruct\n    elif args.dataset == 'self-instruct':\n        dataset = load_dataset(\"yizhongw/self_instruct\", name='self_instruct')\n        for old, new in [[\"prompt\", \"input\"], [\"completion\", \"output\"]]:\n            dataset = dataset.rename_column(old, new)\n    # Anthropic rlhf\n    elif args.dataset == 'hh-rlhf':\n        dataset = load_dataset(\"Anthropic/hh-rlhf\")\n        dataset = dataset.map(lambda x: {\n            'input': '',\n            'output': x['chosen']\n        }, remove_columns=['chosen', 'rejected'])\n    # LongForm\n    elif args.dataset == 'longform':\n        dataset = load_dataset(\"akoksal/LongForm\")\n    elif args.dataset == 'vicuna':\n        raise NotImplementedError(\"Vicuna data was not released.\")\n    else:\n        raise NotImplementedError(f\"Dataset {args.dataset} not implemented yet.\")\n\n    # Split train/eval, reduce size\n    if args.do_eval or args.do_predict:\n        if 'eval' in dataset:\n            eval_dataset = dataset['eval']\n        else:\n            print('Splitting train dataset in train and validation according to `eval_dataset_size`')\n            dataset = dataset[\"train\"].train_test_split(\n                test_size=args.eval_dataset_size, shuffle=True, seed=42\n            )\n            eval_dataset = dataset['test']\n        if args.max_eval_samples is not None and len(eval_dataset) > args.max_eval_samples:\n            eval_dataset = eval_dataset.select(range(args.max_eval_samples))\n        if args.group_by_length:\n            eval_dataset = eval_dataset.map(lambda x: {'length': 
len(x['input']) + len(x['output'])})\n    if args.do_train:\n        train_dataset = dataset['train']\n        if args.max_train_samples is not None and len(train_dataset) > args.max_train_samples:\n            train_dataset = train_dataset.select(range(args.max_train_samples))\n        if args.group_by_length:\n            train_dataset = train_dataset.map(lambda x: {'length': len(x['input']) + len(x['output'])})\n\n    data_collator = DataCollatorForCausalLM(\n        tokenizer=tokenizer, \n        source_max_len=args.source_max_len,\n        target_max_len=args.target_max_len,\n        train_on_source=args.train_on_source,\n        predict_with_generate=args.predict_with_generate,\n    )\n    return dict(\n        train_dataset=train_dataset if args.do_train else None, \n        eval_dataset=eval_dataset if args.do_eval else None,\n        predict_dataset=eval_dataset if args.do_predict else None,\n        data_collator=data_collator\n    )\n\ndef get_last_checkpoint(checkpoint_dir):\n    if isdir(checkpoint_dir):\n        is_completed = exists(join(checkpoint_dir, 'completed'))\n        if is_completed: return None, True # already finished\n        max_step = 0\n        for filename in os.listdir(checkpoint_dir):\n            if isdir(join(checkpoint_dir, filename)) and filename.startswith('checkpoint'):\n                max_step = max(max_step, int(filename.replace('checkpoint-', '')))\n        if max_step == 0: return None, is_completed # training started, but no checkpoint\n        checkpoint_dir = join(checkpoint_dir, f'checkpoint-{max_step}')\n        print(f\"Found a previous checkpoint at: {checkpoint_dir}\")\n        return checkpoint_dir, is_completed # checkpoint found!\n    return None, False # first training\n\ndef train():\n    hfparser = transformers.HfArgumentParser((\n        ModelArguments, DataArguments, TrainingArguments, GenerationArguments\n    ))\n    model_args, data_args, training_args, generation_args, extra_args = \\\n        
hfparser.parse_args_into_dataclasses(return_remaining_strings=True)\n    training_args.generation_config = transformers.GenerationConfig(**vars(generation_args))\n    args = argparse.Namespace(\n        **vars(model_args), **vars(data_args), **vars(training_args)\n    )\n    \n\n    checkpoint_dir, completed_training = get_last_checkpoint(args.output_dir)\n    if completed_training:\n        print('Detected that training was already completed!')\n\n    model = get_accelerate_model(args, checkpoint_dir)\n    training_args.skip_loading_checkpoint_weights=True\n\n    resume_from_checkpoint = checkpoint_dir\n    if resume_from_checkpoint:\n        # Check the available weights and load them\n        checkpoint_name = os.path.join(\n            checkpoint_dir, \"pytorch_model.bin\"\n        )  # Full checkpoint\n        if not os.path.exists(checkpoint_name):\n            checkpoint_path = os.path.join(\n                checkpoint_dir, \"adapter_model\"\n            ) \n\n            checkpoint_name = os.path.join(\n                checkpoint_path, \"adapter_model.bin\"\n            )  # only LoRA model - LoRA config above has to fit\n            resume_from_checkpoint = (\n                False  # So the trainer won't try loading its state\n            )\n        # The two files above have a different name depending on how they were saved, but are actually the same.\n        if os.path.exists(checkpoint_name):\n            print(f\"Restarting from {checkpoint_name}\")\n            adapters_weights = torch.load(checkpoint_name)\n            set_peft_model_state_dict(model, adapters_weights)\n        else:\n            print(f\"Checkpoint {checkpoint_name} not found\")\n\n    model.config.use_cache = False\n    print_trainable_parameters(args, model)\n    print('loaded model')\n    set_seed(args.seed)\n\n    # Tokenizer\n    tokenizer = AutoTokenizer.from_pretrained(\n        args.model_path,\n        cache_dir=args.cache_dir,\n        padding_side=\"right\",\n        
use_fast=True,\n    )\n    \n    if tokenizer.pad_token is None:\n        smart_tokenizer_and_embedding_resize(\n            special_tokens_dict=dict(pad_token=DEFAULT_PAD_TOKEN),\n            tokenizer=tokenizer,\n            model=model,\n        )\n        \n    if isinstance(tokenizer, LlamaTokenizerFast):\n        # LLaMA tokenizer may not have correct special tokens set.\n        # Check and add them if missing to prevent them from being parsed into different tokens.\n        # Note that these are present in the vocabulary. \n        # Note also that `model.config.pad_token_id` is 0 which corresponds to `<unk>` token.\n        tokenizer.add_special_tokens(\n            {\n                \"eos_token\": tokenizer.convert_ids_to_tokens(model.config.eos_token_id),\n                \"bos_token\": tokenizer.convert_ids_to_tokens(model.config.bos_token_id),\n                \"unk_token\": tokenizer.convert_ids_to_tokens(model.config.pad_token_id),\n            }\n        )\n    \n    data_module = make_data_module(tokenizer=tokenizer, args=args)\n    trainer = Seq2SeqTrainer(\n        model=model, \n        tokenizer=tokenizer,\n        args=training_args,\n        **{k:v for k,v in data_module.items() if k != 'predict_dataset'},\n    )\n\n    # Callbacks\n    if not args.full_finetune:\n        trainer.add_callback(SavePeftModelCallback)\n    if args.do_mmlu_eval:\n        if args.mmlu_dataset == 'mmlu-zs':\n            mmlu_dataset = load_dataset(\"json\", data_files={\n                'eval': 'data/mmlu/zero_shot_mmlu_val.json',\n                'test': 'data/mmlu/zero_shot_mmlu_test.json',\n            })\n            mmlu_dataset = mmlu_dataset.remove_columns('subject')\n        # MMLU Five-shot (Eval/Test only)\n        elif args.mmlu_dataset == 'mmlu' or args.mmlu_dataset == 'mmlu-fs':\n            mmlu_dataset = load_dataset(\"json\", data_files={\n                'eval': 'data/mmlu/five_shot_mmlu_val.json',\n                'test': 
'data/mmlu/five_shot_mmlu_test.json',\n            })\n            # mmlu_dataset = mmlu_dataset.remove_columns('subject')\n        mmlu_dataset = mmlu_dataset[args.mmlu_split]\n        if args.max_mmlu_samples is not None:\n            mmlu_dataset = mmlu_dataset.select(range(args.max_mmlu_samples))\n        abcd_idx = [\n            tokenizer(\"A\", add_special_tokens=False).input_ids[0],\n            tokenizer(\"B\", add_special_tokens=False).input_ids[0],\n            tokenizer(\"C\", add_special_tokens=False).input_ids[0],\n            tokenizer(\"D\", add_special_tokens=False).input_ids[0],\n        ]\n        accuracy = evaluate.load(\"accuracy\")\n\n        class MMLUEvalCallback(transformers.TrainerCallback):\n            def on_evaluate(self, args, state, control, model, **kwargs):\n                data_loader = trainer.get_eval_dataloader(mmlu_dataset)\n                source_max_len = trainer.data_collator.source_max_len\n                trainer.data_collator.source_max_len = args.mmlu_source_max_len\n                trainer.model.eval()\n                preds, refs = [], []\n                loss_mmlu = 0\n                for batch in tqdm(data_loader, total=len(data_loader)):\n                    (loss, logits, labels) = trainer.prediction_step(trainer.model,batch,prediction_loss_only=False,)\n                    # There are two tokens, the output, and eos token.\n                    for i, logit in enumerate(logits):\n                        label_non_zero_id = (batch['labels'][i] != -100).nonzero()[0][0]\n                        logit_abcd = logit[label_non_zero_id-1][abcd_idx]\n                        preds.append(torch.argmax(logit_abcd).item())\n                    labels = labels[labels != IGNORE_INDEX].view(-1, 2)[:,0]\n                    for label in labels.tolist():\n                        if label in abcd_idx:\n                            refs += [abcd_idx.index(label)]\n                    \n                    loss_mmlu += loss.item()\n   
             # Extract results by subject.\n                results = {'mmlu_loss':loss_mmlu/len(data_loader)}\n                subject = mmlu_dataset['subject']\n                subjects = {s:{'refs':[], 'preds':[]} for s in set(subject)}\n                for s,p,r in zip(subject, preds, refs):\n                    subjects[s]['preds'].append(p)\n                    subjects[s]['refs'].append(r)\n                subject_scores = []\n                for subject in subjects:\n                    subject_score = accuracy.compute(\n                        references=subjects[subject]['refs'],\n                        predictions=subjects[subject]['preds']\n                    )['accuracy']\n                    results[f'mmlu_{args.mmlu_split}_accuracy_{subject}'] = subject_score\n                    subject_scores.append(subject_score)\n                results[f'mmlu_{args.mmlu_split}_accuracy'] = np.mean(subject_scores)\n                trainer.log(results)\n                trainer.data_collator.source_max_len = source_max_len\n\n        trainer.add_callback(MMLUEvalCallback)\n\n    # Verifying the datatypes.\n    dtypes = {}\n    for _, p in model.named_parameters():\n        dtype = p.dtype\n        if dtype not in dtypes: dtypes[dtype] = 0\n        dtypes[dtype] += p.numel()\n    total = 0\n    for k, v in dtypes.items(): total+= v\n    for k, v in dtypes.items():\n        print(k, v, v/total)\n\n    all_metrics = {\"run_name\": args.run_name}\n    # Training\n    if args.do_train:\n        train_result = trainer.train(resume_from_checkpoint=resume_from_checkpoint)\n        metrics = train_result.metrics\n        trainer.log_metrics(\"train\", metrics)\n        trainer.save_metrics(\"train\", metrics)\n        trainer.save_state()\n        all_metrics.update(metrics)\n    # Evaluation\n    if args.do_eval:\n        logger.info(\"*** Evaluate ***\")\n        metrics = trainer.evaluate(metric_key_prefix=\"eval\")\n        trainer.log_metrics(\"eval\", metrics)\n     
   trainer.save_metrics(\"eval\", metrics)\n        all_metrics.update(metrics)\n    # Prediction\n    if args.do_predict:\n        logger.info(\"*** Predict ***\")\n        prediction_output = trainer.predict(test_dataset=data_module['predict_dataset'],metric_key_prefix=\"predict\")\n        prediction_metrics = prediction_output.metrics\n        predictions = prediction_output.predictions\n        predictions = np.where(predictions != -100, predictions, tokenizer.pad_token_id)\n        predictions = tokenizer.batch_decode(\n            predictions, skip_special_tokens=True, clean_up_tokenization_spaces=True\n        )\n        with open(os.path.join(args.output_dir, 'predictions.jsonl'), 'w') as fout:\n            for i, example in enumerate(data_module['predict_dataset']):\n                example['prediction_with_input'] = predictions[i].strip()\n                example['prediction'] = predictions[i].replace(example['input'], '').strip()\n                fout.write(json.dumps(example) + '\\n')\n        print(prediction_metrics)\n        trainer.log_metrics(\"predict\", prediction_metrics)\n        trainer.save_metrics(\"predict\", prediction_metrics)\n        all_metrics.update(prediction_metrics)\n\n    if (args.do_train or args.do_eval or args.do_predict):\n        with open(os.path.join(args.output_dir, \"metrics.json\"), \"w\") as fout:\n            fout.write(json.dumps(all_metrics))\n\nif __name__ == \"__main__\":\n    train()\n"
  },
  {
    "path": "requirements.txt",
    "content": "bert-score==0.3.13\nevaluate==0.4.0\nrouge-score==0.1.2\nscikit-learn==1.2.2\nsentencepiece==0.1.99\nwandb==0.15.2\ntransformers==4.31.0\npeft==0.4.0\naccelerate==0.21.0"
  }
]