[
  {
    "path": ".gitignore",
    "content": "# Byte-compiled / optimized / DLL files\n__pycache__/\n*.py[cod]\n*$py.class\n\n# C extensions\n*.so\n\n# Distribution / packaging\n.Python\nbuild/\ndevelop-eggs/\ndist/\ndownloads/\neggs/\n.eggs/\nlib/\nlib64/\nparts/\nsdist/\nvar/\nwheels/\npip-wheel-metadata/\nshare/python-wheels/\n*.egg-info/\n.installed.cfg\n*.egg\nMANIFEST\n\n# PyInstaller\n#  Usually these files are written by a python script from a template\n#  before PyInstaller builds the exe, so as to inject date/other infos into it.\n*.manifest\n*.spec\n\n# Installer logs\npip-log.txt\npip-delete-this-directory.txt\n\n# Unit test / coverage reports\nhtmlcov/\n.tox/\n.nox/\n.coverage\n.coverage.*\n.cache\nnosetests.xml\ncoverage.xml\n*.cover\n*.py,cover\n.hypothesis/\n.pytest_cache/\n\n# Translations\n*.mo\n*.pot\n\n# Django stuff:\n*.log\nlocal_settings.py\ndb.sqlite3\ndb.sqlite3-journal\n\n# Flask stuff:\ninstance/\n.webassets-cache\n\n# Scrapy stuff:\n.scrapy\n\n# Sphinx documentation\ndocs/_build/\n\n# PyBuilder\ntarget/\n\n# Jupyter Notebook\n.ipynb_checkpoints\n\n# IPython\nprofile_default/\nipython_config.py\n\n# Profiling\n*.pclprof\n\n# pyenv\n.python-version\n\n# pipenv\n#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.\n#   However, in case of collaboration, if having platform-specific dependencies or dependencies\n#   having no cross-platform support, pipenv may install dependencies that don't work, or not\n#   install all needed dependencies.\n#Pipfile.lock\n\n# PEP 582; used by e.g. 
github.com/David-OConnor/pyflow\n__pypackages__/\n\n# Celery stuff\ncelerybeat-schedule\ncelerybeat.pid\n\n# SageMath parsed files\n*.sage.py\n\n# Environments\n.env\n.venv\n.idea\nenv/\nvenv/\nENV/\nenv.bak/\nvenv.bak/\n\n# Spyder project settings\n.spyderproject\n.spyproject\n\n# VSCode project settings\n.vscode/\n\n# Rope project settings\n.ropeproject\n\n# mkdocs documentation\n/site\nmkdocs_github_authors.yaml\n\n# mypy\n.mypy_cache/\n.dmypy.json\ndmypy.json\n\n# Pyre type checker\n.pyre/\n\n# datasets and projects\ndatasets/\nruns/\nwandb/\ntests/\nlogs/\n.DS_Store\n\n# Neural Network weights -----------------------------------------------------------------------------------------------\nweights/\n*.weights\n*.pt\n*.pb\n*.onnx\n*.engine\n*.mlmodel\n*.mlpackage\n*.torchscript\n*.tflite\n*.h5\n*_saved_model/\n*_web_model/\n*_openvino_model/\n*_paddle_model/\npnnx*\n\n# Autogenerated files for tests\n/ultralytics/assets/\n\n# dataset cache \n*.cache"
  },
  {
    "path": "Ultralytics-YOLO-project.md",
    "content": "# Ultralytics-YOLO项目详细说明\n\n1. 本项目集成了YOLOv8、v10、v11、v12乃至前沿的YOLO26等全系列基础模型。 无论是做横向对比实验，还是纵向的版本改进，无需到处找资源，一个项目就能满足你所有的实验需求！\n2. 核心代码已实现高度模块化与解耦，专为新手优化。 你完全不需要死磕底层复杂代码，只需像搭积木一样简单修改YAML配置文件，就能轻松实现各种改进模块的自由组合。\n3. 面对日益内卷的YOLO赛道，简单的“缝合”已难满足毕业要求。 本项目不仅提供现成的创新方案，更配套独家“二次创新”课程，授人以渔。我们将手把手教你掌握模块设计的底层逻辑，助你从“模仿者”进阶为“创造者”，设计出独属于你的创新模块。\n4. 针对有代码基础但受困于Ultralytics复杂架构的同学， 本项目引入了来自DFine、DEIM项目中成熟的“万物皆可融”架构思想。你无需纠结模块注册等信息，只需遵循我所提供的标准接口规范，即可将自定义魔改模块无缝融入YAML配置，与各类CSP变种灵活结合。\n5. 实验跑通了，却不知道如何写创新点？ 本项目将定期拆解高分论文，传授写作心法，教你如何将实验成果转化为逻辑严密、亮点突出的高质量学术论文，解决写作难题！\n6. 毕业设计缺少高大上的展示界面？ 别担心，项目会内置基于PyQt或HTML的通用可视化界面，开箱即用，完美补齐毕业论文的最后一块拼图，助你从容应对答辩！\n7. 购买即享专属技术交流群， 这里有业内公认的高效答疑服务，以及志同道合的伙伴互助交流。拒绝闭门造车，让我们带你避开深坑，高效通关！  \n\n## 针对于已经入手了yolov8/yolo11项目的同学来说，如果你有以下几点需求，可以考虑追加入手！\n1. 想用最新的YOLO26做实验！而且本项目支持v8、v10、11、12、26全系列版本！\n2. 想深入学习改进创新的同学，本项目会附带二次创新的通用教程，手把手教你设计出属于自己的创新模块！\n3. 做完实验不知道怎么写论文？本项目会定期拆解高分论文案例，教你如何把实验结果写成逻辑清晰、亮点突出的高质量学术论文\n4. 想自己魔改模块的同学！本项目提供超级简单的模块注册方式，只需按照教程操作，就能轻松注册自己的模块，还能和各种CSP变种随意组合！\n\n## 模块列表(这些模块均已在代码中注册好，只需要修改yaml可以直接实验)\n\n- ultralytics/nn/extra_modules/attention \n\n    1. ultralytics/nn/extra_modules/attention/SEAM.py\n    2. CVPR2021|ultralytics/nn/extra_modules/attention/ca.py\n    3. ICASSP2023|ultralytics/nn/extra_modules/attention/ema.py\n    4. ICML2021|ultralytics/nn/extra_modules/attention/simam.py\n    5. ICCV2023|ultralytics/nn/extra_modules/attention/lsk.py\n    6. WACV2024|ultralytics/nn/extra_modules/attention/DeformableLKA.py\n    7. ultralytics/nn/extra_modules/attention/mlca.py\n    8. BIBM2024|ultralytics/nn/extra_modules/attention/FSA.py\n    9. AAAI2025|ultralytics/nn/extra_modules/attention/CDFA.py\n    10. TGRS2025|ultralytics/nn/extra_modules/attention/MCA.py\n    11. CVPR2025|ultralytics/nn/extra_modules/attention/CASAB.py \n    12. NN2025|ultralytics/nn/extra_modules/attention/KSFA.py\n    13. TPAMI2025|ultralytics/nn/extra_modules/attention/GQL.py\n    14. TGRS2025|ultralytics/nn/extra_modules/attention/ACA.py\n    15. 
TGRS2025|ultralytics/nn/extra_modules/attention/DHPF.py\n    16. TGRS2025|ultralytics/nn/extra_modules/attention/ACAB.py\n\n- ultralytics/nn/extra_modules/conv_module (for a tutorial on this part, see Section 5 of 改进模块-使用教程 in GuideVideo-MG.md)\n\n    1. CVPR2021|ultralytics/nn/extra_modules/conv_module/dbb.py\n    2. TIP2024|ultralytics/nn/extra_modules/conv_module/deconv.py\n    3. ICCV2023|ultralytics/nn/extra_modules/conv_module/dynamic_snake_conv.py\n    4. CVPR2023|ultralytics/nn/extra_modules/conv_module/pconv.py\n    5. AAAI2025|ultralytics/nn/extra_modules/conv_module/psconv.py\n    6. CVPR2025|ultralytics/nn/extra_modules/conv_module/ShiftwiseConv.py\n    7. ultralytics/nn/extra_modules/conv_module/wdbb.py\n    8. ultralytics/nn/extra_modules/conv_module/deepdbb.py\n    9. ECCV2024|ultralytics/nn/extra_modules/conv_module/wtconv2d.py\n    10. CVPR2023|ultralytics/nn/extra_modules/conv_module/ScConv.py\n    11. ultralytics/nn/extra_modules/conv_module/dcnv2.py\n    12. CVPR2024|ultralytics/nn/extra_modules/conv_module/DilatedReparamConv.py\n    13. ultralytics/nn/extra_modules/conv_module/gConv.py\n    14. CVPR2024|ultralytics/nn/extra_modules/conv_module/IDWC.py\n    15. ultralytics/nn/extra_modules/conv_module/DSA.py\n    16. CVPR2025|ultralytics/nn/extra_modules/conv_module/FDConv.py\n    17. CVPR2023|ultralytics/nn/extra_modules/conv_module/dcnv3.py\n    18. CVPR2024|ultralytics/nn/extra_modules/conv_module/dcnv4.py\n    19. CVPR2024|ultralytics/nn/extra_modules/conv_module/DynamicConv.py\n    20. CVPR2024|ultralytics/nn/extra_modules/conv_module/FADC.py\n    21. CVPR2023|ultralytics/nn/extra_modules/conv_module/SMPConv.py\n    22. MIA2025|ultralytics/nn/extra_modules/conv_module/FourierConv.py\n    23. CVPR2024|ultralytics/nn/extra_modules/conv_module/SFSConv.py\n    24. ICCV2025|ultralytics/nn/extra_modules/conv_module/MBRConv.py\n    25. ICCV2025|ultralytics/nn/extra_modules/conv_module/ConvAttn.py\n    26. ICCV2025|ultralytics/nn/extra_modules/conv_module/Converse2D.py\n    27. 
CVPR2025|ultralytics/nn/extra_modules/conv_module/gcconv.py\n    28. ACCV2024|ultralytics/nn/extra_modules/conv_module/RMBC.py\n    29. CVPR2026|ultralytics/nn/extra_modules/conv_module/DEGConv.py\n\n- ultralytics/nn/extra_modules/stem\n\n    1. ultralytics/nn/extra_modules/stem/SRFD.py\n    2. ultralytics/nn/extra_modules/stem/LoG.py\n    3. ICCV2023|ultralytics/nn/extra_modules/stem/RepStem.py\n\n- ultralytics/nn/extra_modules/upsample\n\n    1. CVPR2024|ultralytics/nn/extra_modules/upsample/eucb.py\n    2. CVPR2024|ultralytics/nn/extra_modules/upsample/eucb_sc.py\n    3. ultralytics/nn/extra_modules/upsample/WaveletUnPool.py\n    4. ICCV2019|ultralytics/nn/extra_modules/upsample/CARAFE.py\n    5. ICCV2023|ultralytics/nn/extra_modules/upsample/DySample.py\n    6. ICCV2025|ultralytics/nn/extra_modules/upsample/Converse2D_Up.py\n    7. CVPR2025|ultralytics/nn/extra_modules/upsample/DSUB.py\n\n- ultralytics/nn/extra_modules/downsample\n\n    1. TIP2020|ultralytics/nn/extra_modules/downsample/gcnet.py\n    2. In-house module|ultralytics/nn/extra_modules/downsample/lawds.py \n    3. ultralytics/nn/extra_modules/downsample/WaveletPool.py\n    4. ultralytics/nn/extra_modules/downsample/ADown.py\n    5. ultralytics/nn/extra_modules/downsample/YOLOV7Down.py\n    6. ultralytics/nn/extra_modules/downsample/SPDConv.py\n    7. ultralytics/nn/extra_modules/downsample/HWD.py\n    8. ultralytics/nn/extra_modules/downsample/DRFD.py\n    9. TGRS2025|ultralytics/nn/extra_modules/conv_module/FSConv.py\n\n- ultralytics/nn/extra_modules/module\n\n    1. AAAI2025|ultralytics/nn/extra_modules/module/APBottleneck.py\n    2. CVPR2025|ultralytics/nn/extra_modules/module/efficientVIM.py\n    3. CVPR2023|ultralytics/nn/extra_modules/module/fasterblock.py\n    4. CVPR2024|ultralytics/nn/extra_modules/module/starblock.py\n    5. ultralytics/nn/extra_modules/module/DWR.py\n    6. CVPR2024|ultralytics/nn/extra_modules/module/UniRepLKBlock.py\n    7. 
CVPR2025|ultralytics/nn/extra_modules/module/mambaout.py\n    8. AAAI2024|ultralytics/nn/extra_modules/module/DynamicFilter.py\n    9. ultralytics/nn/extra_modules/module/StripBlock.py\n    10. TGRS2024|ultralytics/nn/extra_modules/module/elgca.py\n    11. CVPR2024|ultralytics/nn/extra_modules/module/LEGM.py\n    12. ICCV2023|ultralytics/nn/extra_modules/module/iRMB.py\n    13. TPAMI2025|ultralytics/nn/extra_modules/module/MSBlock.py\n    14. ICLR2024|ultralytics/nn/extra_modules/module/FATBlock.py\n    15. CVPR2024|ultralytics/nn/extra_modules/module/MSCB.py\n    16. ultralytics/nn/extra_modules/module/LEGBlock.py\n    17. ultralytics/nn/extra_modules/module/GLSA.py\n    18. CVPR2025|ultralytics/nn/extra_modules/module/RCB.py\n    19. ECCV2024|ultralytics/nn/extra_modules/module/JDPM.py\n    20. CVPR2025|ultralytics/nn/extra_modules/module/vHeat.py\n    21. CVPR2025|ultralytics/nn/extra_modules/module/EBlock.py\n    22. CVPR2025|ultralytics/nn/extra_modules/module/DBlock.py\n    23. ECCV2024|ultralytics/nn/extra_modules/module/FMB.py\n    24. CVPR2024|ultralytics/nn/extra_modules/module/IDWB.py\n    25. ECCV2022|ultralytics/nn/extra_modules/module/LFE.py\n    26. AAAI2025|ultralytics/nn/extra_modules/module/FCM.py\n    27. CVPR2024|ultralytics/nn/extra_modules/module/RepViTBlock.py\n    28. CVPR2024|ultralytics/nn/extra_modules/module/PKIModule.py\n    29. CVPR2024|ultralytics/nn/extra_modules/module/camixer.py\n    30. ICCV2025|ultralytics/nn/extra_modules/module/ESC.py\n    31. CVPR2025|ultralytics/nn/extra_modules/module/nnWNet.py\n    32. TGRS2025|ultralytics/nn/extra_modules/module/ARF.py\n    33. AAAI2024|ultralytics/nn/extra_modules/module/CFBlock.py\n    34. IJCV2024|ultralytics/nn/extra_modules/module/FMA.py\n    35. ultralytics/nn/extra_modules/module/LWGA.py\n    36. TGRS2025|ultralytics/nn/extra_modules/module/CSSC.py\n    37. TGRS2025|ultralytics/nn/extra_modules/module/CNCM.py\n    38. ICCV2025|ultralytics/nn/extra_modules/module/HFRB.py\n    39. 
ICIP2025|ultralytics/nn/extra_modules/module/EVA.py\n    40. CVPR2025|ultralytics/nn/extra_modules/module/IEL.py\n    41. MICCAI2023|ultralytics/nn/extra_modules/module/MFEBlock.py\n    42. AAAI2026|ultralytics/nn/extra_modules/module/PartialNetBlock.py\n    43. TGRS2025|ultralytics/nn/extra_modules/module/DRG.py\n    44. ultralytics/nn/extra_modules/module/Wave2D.py\n    45. TGRS2025|ultralytics/nn/extra_modules/module/GLGM.py\n    46. TGRS2025|ultralytics/nn/extra_modules/module/MAC.py\n    47. AAAI2026|ultralytics/nn/extra_modules/module/SPJFB.py\n\n- ultralytics/nn/extra_modules/block \n    \n    1. ultralytics/nn/extra_modules/block/CSPBlock.py\n    2. TPAMI2025|ultralytics/nn/extra_modules/block/MANet.py\n    3. TPAMI2024|ultralytics/nn/extra_modules/block/MetaFormer.py\n\n- ultralytics/nn/extra_modules/transformer\n\n    1. ICLR2025|ultralytics/nn/extra_modules/transformer/PolaLinearAttention.py\n    2. CVPR2023|ultralytics/nn/extra_modules/transformer/biformer.py\n    3. CVPR2023|ultralytics/nn/extra_modules/transformer/CascadedGroupAttention.py\n    4. CVPR2022|ultralytics/nn/extra_modules/transformer/DAttention.py\n    5. ICLR2022|ultralytics/nn/extra_modules/transformer/DPBAttention.py\n    6. CVPR2024|ultralytics/nn/extra_modules/transformer/AdaptiveSparseSA.py\n    7. ultralytics/nn/extra_modules/transformer/GSA.py\n    8. ultralytics/nn/extra_modules/transformer/RSA.py\n    9. ECCV2024|ultralytics/nn/extra_modules/transformer/FSSA.py\n    10. AAAI2025|ultralytics/nn/extra_modules/transformer/DilatedGCSA.py\n    11. AAAI2025|ultralytics/nn/extra_modules/transformer/DilatedMWSA.py\n    12. CVPR2024|ultralytics/nn/extra_modules/transformer/SHSA.py\n    13. IJCAI2024|ultralytics/nn/extra_modules/transformer/CTA.py\n    14. IJCAI2024|ultralytics/nn/extra_modules/transformer/SFA.py\n    15. ultralytics/nn/extra_modules/transformer/MSLA.py\n    16. ACMMM2025|ultralytics/nn/extra_modules/transformer/CPIA_SA.py\n    17. 
NN2025|ultralytics/nn/extra_modules/transformer/TokenSelectAttention.py\n    18. CVPR2025|ultralytics/nn/extra_modules/transformer/TAB.py\n    19. TPAMI2025|ultralytics/nn/extra_modules/transformer/LRSA.py\n    20. ICCV2025|ultralytics/nn/extra_modules/transformer/MALA.py\n    21. ICML2023|ultralytics/nn/extra_modules/transformer/MUA.py\n    22. ACMMM2025|ultralytics/nn/extra_modules/transformer/EGSA.py\n    23. ACMMM2025|ultralytics/nn/extra_modules/transformer/SWSA.py\n    24. AAAI2026|ultralytics/nn/extra_modules/transformer/DHOGSA.py\n    25. NeurIPS2025|ultralytics/nn/extra_modules/transformer/CBSA.py\n    26. TGRS2025|ultralytics/nn/extra_modules/transformer/DPWA.py\n    27. TIP2025|ultralytics/nn/extra_modules/transformer/DWM_MSA.py\n    28. CVPR2026|ultralytics/nn/extra_modules/transformer/BinaryAttention.py\n    29. CVPR2025|ultralytics/nn/extra_modules/transformer/wca.py\n\n- ultralytics/nn/extra_modules/mamba\n\n    1. AAAI2025|ultralytics/nn/extra_modules/mamba/SS2D.py\n    2. CVPR2025|ultralytics/nn/extra_modules/mamba/ASSM.py\n    3. CVPR2025|ultralytics/nn/extra_modules/mamba/SAVSS.py\n    4. CVPR2025|ultralytics/nn/extra_modules/mamba/MobileMamba/mobilemamba.py\n    5. CVPR2025|ultralytics/nn/extra_modules/mamba/MaIR.py\n    6. TGRS2025|ultralytics/nn/extra_modules/mamba/GLVSS.py\n    7. ICCV2025|ultralytics/nn/extra_modules/mamba/VSSD.py\n    8. ICCV2025|ultralytics/nn/extra_modules/mamba/TinyViM.py\n    9. INFFUS2025|ultralytics/nn/extra_modules/mamba/CSI.py\n    10. TIP2025|ultralytics/nn/extra_modules/mamba/SFMB.py\n    11. TGRS2025|ultralytics/nn/extra_modules/mamba/GLSS.py\n    12. TGRS2025|ultralytics/nn/extra_modules/mamba/GLSS2D.py\n    13. CVPR2026|ultralytics/nn/extra_modules/mamba/TransMixer.py\n\n- ultralytics/nn/extra_modules/mlp\n\n    1. CVPR2024|ultralytics/nn/extra_modules/mlp/ConvolutionalGLU.py\n    2. IJCAI2024|ultralytics/nn/extra_modules/mlp/DFFN.py\n    3. ICLR2024|ultralytics/nn/extra_modules/mlp/FMFFN.py\n    4. 
CVPR2024|ultralytics/nn/extra_modules/mlp/FRFN.py\n    5. ECCV2024|ultralytics/nn/extra_modules/mlp/EFFN.py \n    6. WACV2025|ultralytics/nn/extra_modules/mlp/SEFN.py\n    7. ICLR2025|ultralytics/nn/extra_modules/mlp/KAN.py\n    8. CVPR2025|ultralytics/nn/extra_modules/mlp/EDFFN.py\n    9. IJCV2024|ultralytics/nn/extra_modules/mlp/DML.py\n    10. AAAI2026|ultralytics/nn/extra_modules/mlp/DIFF.py\n\n- ultralytics/nn/extra_modules/neck\n\n    1. ultralytics/nn/extra_modules/neck/ASF.py\n    2. ultralytics/nn/extra_modules/neck/BiFPN.py\n    3. AAAI2022|ultralytics/nn/extra_modules/neck/CTrans.py\n    4. ultralytics/nn/extra_modules/neck/EfficientRepBiPAN.py\n    5. ultralytics/nn/extra_modules/neck/GFPN.py\n    6. ultralytics/nn/extra_modules/neck/HSFPN.py\n    7. AAAI2025|ultralytics/nn/extra_modules/neck/HS_FPN.py\n    8. TPAMI2025|ultralytics/nn/extra_modules/neck/HyperComputeModule.py\n    9. ultralytics/nn/extra_modules/neck/SlimNeck.py\n    10. ultralytics/nn/extra_modules/neck/GoldYOLO.py\n    11. ultralytics/nn/extra_modules/neck/EMBSFPN.py\n\n- ultralytics/nn/extra_modules/featurefusion\n\n    1. In-house module|ultralytics/nn/extra_modules/featurefusion/cgfm.py\n    2. BMVC2024|ultralytics/nn/extra_modules/featurefusion/msga.py\n    3. CVPR2024|ultralytics/nn/extra_modules/featurefusion/mfm.py\n    4. TIP2023|ultralytics/nn/extra_modules/featurefusion/CSFCN.py\n    5. BIBM2024|ultralytics/nn/extra_modules/featurefusion/mpca.py\n    6. ACMMM2024|ultralytics/nn/extra_modules/featurefusion/wfu.py\n    7. CVPR2025|ultralytics/nn/extra_modules/featurefusion/GDSAFusion.py\n    8. ultralytics/nn/extra_modules/featurefusion/PST.py\n    9. TGRS2025|ultralytics/nn/extra_modules/featurefusion/MSAM.py\n    10. INFFUS2025|ultralytics/nn/extra_modules/featurefusion/DPCF.py\n    11. CVPR2025|ultralytics/nn/extra_modules/featurefusion/LCA.py\n    12. TGRS2025|ultralytics/nn/extra_modules/featurefusion/HFFE.py\n    13. 
TGRS2025|ultralytics/nn/extra_modules/featurefusion/MFPM.py\n    14. TGRS2025|ultralytics/nn/extra_modules/featurefusion/ERM.py\n    15. TIP2025|ultralytics/nn/extra_modules/featurefusion/CAFM.py\n    16. TIP2024|ultralytics/nn/extra_modules/featurefusion/CGAFusion.py\n    17. IF2023|ultralytics/nn/extra_modules/featurefusion/PSFM.py\n    18. IF2023|ultralytics/nn/extra_modules/featurefusion/SDFM.py\n    19. In-house module|ultralytics/nn/extra_modules/featurefusion/DAF.py\n    20. In-house module|ultralytics/nn/extra_modules/featurefusion/CIDAF.py\n    21. In-house module|ultralytics/nn/extra_modules/featurefusion/WDAF.py\n\n- ultralytics/nn/extra_modules/norm\n\n    1. ICML2024|engine/extre_module/custom_nn/transformer/repbn.py\n    2. CVPR2025|engine/extre_module/custom_nn/transformer/dyt.py\n    3. engine/extre_module/custom_nn/norm/derf.py\n\n- ultralytics/nn/extra_modules/featurepreprocess\n\n    1. TGRS2025|ultralytics/nn/extra_modules/featurepreprocess/FAENet.py\n\n- ultralytics/nn/extra_modules/head(ultralytics/cfg/models/improve/head)\n\n    1. ultralytics/nn/extra_modules/head/LSPCD.py\n\n## Loss List\n\n#### Default configuration (backward compatible)\n\n- cls_loss=bce\n- iou_loss=ciou\n- iou_aux=none\n\n- cls_loss (classification loss)\n\n    1. bce\n    2. slide\n    3. ema_slide\n    4. focal\n    5. varifocal\n    6. qualityfocal\n\n- iou_loss (primary IoU loss)\n\n    1. Base forms:\n       iou, giou, diou, ciou, eiou, siou, shapeiou, piou, piou2\n    2. Inner forms:\n       inner_<base> (e.g. inner_diou, inner_ciou, inner_siou)\n    3. Focaler forms:\n       focaler_<base> (e.g. focaler_diou, focaler_ciou, focaler_siou)\n    4. MPDIoU family:\n       mpdiou, inner_mpdiou, focaler_mpdiou\n    5. WiseIoU family:\n       wiseiou (equivalent to wiseiou_wiou)\n       wiseiou_<variant>\n       wiseiou_inner_<variant>\n       wiseiou_focaler_<variant>\n    6. Allowed values for the wise <variant>:\n       iou, wiou, giou, diou, ciou, eiou, siou, shapeiou, piou, piou2, mpdiou\n\n- iou_aux (auxiliary IoU loss)\n\n    1. none\n    2. gcd\n    3. nwd\n\n## Update Announcements\n\n- 20260217\n\n    1. Initial project release.\n    2. Added the usage tutorial and the module-improvement usage tutorial videos.\n\n- 20260228\n\n    1. 
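Added common cls and IoU losses, which can be specified directly in train.py; the current loss is printed during training.\n    2. Extended the model-improvement YAML files to yolov8, yolov10, yolo11, and yolo12.\n    3. Added mAP75 output during training.\n    4. Improved the feature-map saving mechanism in detect.py so that it can save each channel's feature map individually as well as the feature map summed over all channels.\n    5. Added a graduation essential: a web-based visualization interface that supports selecting a model, running detection on images and videos, displaying object counts, and more.\n    6. Added a tutorial video for the web interface.\n    7. Added a tutorial video on registering modules.\n   \n- 20260308\n\n    1. Added the auto_coco_eval metric to the val.py script, which computes COCO metrics in one step, with no need to convert and align labels manually.\n    2. Added the AAAI2026-SPJFB module.\n    3. Added the TGRS2025-GLSS2D module.\n    4. Added the TIP2025-CAFM module.\n    5. Added the TIP2025-DWM_MSA module.\n    6. Added the DynamicERF module.\n    7. Added the video tutorial on using CSP, MetaFormer, and Module in YAML (20260307 supplementary edition).\n    8. Fixed bugs reported by users.\n\n- 20260315\n    \n    1. Added the CVPR2026-DEGConv module.\n    2. Added the CVPR2026-BinaryAttention module.\n    3. Added the CVPR2026-TransMixer module.\n    4. Added the CVPR2025-wca module.\n    5. Added the in-house DAF module.\n    6. Added the in-house CIDAF module.\n    7. Added the in-house WDAF module.\n    8. Added the Neck section (ASF, BIFPN, CTrans, ERepBIFPN, GFPN, HSFPN, HS-FPN, hypergraph FPN, SlimNeck, GoldYOLO, EMBSFPN).\n    9. Completed the configuration files for the attention section.\n    10. Added a usage tutorial on how to freely combine conv and attention modules with the CSP module.\n    11. Fixed bugs reported by users."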
  },
  {
    "path": "bilibili-guide.md",
    "content": "# 魔鬼面具-哔哩哔哩视频指南\n\n### 必看干货系列(建议搞深度学习的小伙伴都看看,特别是图像相关)\n1. [深度学习常见实验问题与实验技巧(适用于所有模型，小白初学者必看!)](https://www.bilibili.com/video/BV17j41147j8/)\n2. [还在迷茫深度学习中的改进实验应该从哪里开始改起的同学，一定要进来看看了！用自身经验给你推荐实验顺序！](https://www.bilibili.com/video/BV1Nu4y1G7B9/)\n3. [探究深度学习中预训练权重对改进和精度的影响!](https://www.bilibili.com/video/BV1FH4y1o7GL/)\n4. [什么？你说你不会画模型结构图？行吧，那你进来看看吧，手把手教你画YAML结构图！](https://www.bilibili.com/video/BV1X94y1K76Z/)\n5. [探究深度学习中训练中的可重现性](https://www.bilibili.com/video/BV1Nu4y1s7sc/)\n6. [什么？你说你更换主干后看不懂配置文件也不懂画结构图？那你快点进来看看了！](https://www.bilibili.com/video/BV1WA4m1V7nQ/)\n7. [从三个角度分析，什么条件才算是一个合格的改进专栏！](https://www.bilibili.com/video/BV1E6421g7eb/)\n8. [都2024了，你写论文不会还只用p,r,map这些指标分析目标检测模型吧？](https://www.bilibili.com/video/BV1wF4m177JQ/)\n9. [从简到难手把手教你画Pytorch模块内的结构图！](https://www.bilibili.com/video/BV1dC411p7H7/)\n10. [深度学习论文实验中的其中一大注意点-预训练权重究竟加还是不加？](https://www.bilibili.com/video/BV1Q1421Q7Zw/)\n11. [深度学习改进实验必看！基于YOLOV8的WIDER-FACE改进(轻量化+提点)实验思路讲解](https://www.bilibili.com/video/BV1QJ4m1H7DJ/)\n12. [YOLOV8-硬塞注意力机制？这样做没创新！想知道注意力怎么用才有创新那赶快来看看！](https://www.bilibili.com/video/BV1bm421K7tf/)\n13. [YOLOV8改进-还硬塞注意力机制？这期用注意力机制手把手给大家自研一个ContextGuideFPN！创新真的不难，需要找对方法！](https://www.bilibili.com/video/BV1Vx4y1n7hZ/)\n14. [长达46分钟的肺腑之言！给以后想从事图像算法工程师、小白入门深度学习路线的总结！](https://www.bilibili.com/video/BV16y411h7T9/)\n15. [提升多少才能发paper？轻量化需要看什么指标？需要轻量化到什么程度才能发paper？这期给大家一一解答！](https://www.bilibili.com/video/BV1QZ421M7gu/)\n16. [深度学习实验部分常见疑问解答！(小白刚入门必看！少走弯路！少自我内耗！)](https://www.bilibili.com/video/BV1Bz421B7pC/)\n    ```\n    1. 如何衡量自己的所做的工作量够不够？\n    2. 为什么别人的论文说这个模块对xxx有作用，但是我自己用的时候还掉点了？\n    3. 提升是和什么模型相比呢 比如和yolov8这种基础模型比还是和别人提出的目前最好的模型比\n    4. 对比不同的模型的时候，输入尺寸，学习率，学习次数这些是否需要一致？\n    ```\n17. [深度学习实验部分常见疑问解答二！(小白刚入门必看！少走弯路！少自我内耗！)](https://www.bilibili.com/video/BV1ZM4m1m785/)\n    ```\n    1. 为什么我用yolov8自带的coco8、coco128训练出来的效果很差？\n    2. 我的数据集很大，机器跑得慢，我是否可以用数据集的百分之10的数据去测试这个改进点是否有效？有效再跑整个数据集？\n    ```\n18. 
[深度学习实验部分常见疑问解答三！(怎么判断模型是否收敛？模型过拟合怎么办？)](https://www.bilibili.com/video/BV11S421d76P/)\n19. [YOLO系列模型训练结果详细解答！(训练过程的一些疑问，该放哪个文件运行出来的结果、参数量计算量在哪里看..等等问题)](https://www.bilibili.com/video/BV11b421J7Vx/)\n20. [细谈目标检测中的小目标检测头和大目标检测检测头，并教懂你怎么加微小目标、极大目标检测头！](https://www.bilibili.com/video/BV1jkDWYFEwx/)\n21. [深度学习炼丹必备必看必须知道的小技巧！](https://www.bilibili.com/video/BV1q3SZYsExc/)\n22. [深度学习实验准备-数据集怎么选？有哪些需要注意的点？](https://www.bilibili.com/video/BV11zySYvEhs/)\n23. [深度学习论文实验中新手非常容易陷入的一个误区：抱着解决xxx问题的心态去做实验](https://www.bilibili.com/video/BV1kkkvYJEHG/)\n24. [小目标检测必看系列 | 除了AP-Small指标，可还有AP-VeryTiny、AP-Tiny的指标喔~手把手带你加！](https://www.bilibili.com/video/BV1CYcUeBEzY/)\n25. [YOLO中的实例分割原来是这样巧妙地实现的！你在做YOLO-Seg但是又不知道的话，那你要进来看看咯～](https://www.bilibili.com/video/BV1SkP1e1EHC/)\n26. [长达30分钟的吐血讲解！为什么别人的纯YOLO小目标检测能上AAAI2025，你的连个最差的都费劲！看看差距在哪里，怎么改善！](https://www.bilibili.com/video/BV14DJazTEtV)\n27. [深度学习论文中的基础实验、改进实验、 消融实验、对比实验、泛化实验｜这些究竟是什么？](https://www.bilibili.com/video/BV1NYKUz2E6b/)\n28. [深度学习论文中的推理结果图、热力图、特征图究竟应该怎么放？需要注意什么？有什么作用？](https://www.bilibili.com/video/BV1s5gQzcEPh/)\n29. [YOLO｜RTDETR｜我会跑Ultralytics了！但是输出的这些都怎么看呀？论文中的结果写什么呀？需要注意什么呀？](https://www.bilibili.com/video/BV1VfbVzHEGM/)\n\n### 服务器租用系列\n1. [|DAModel|竟然有一个\"不需要装环境就能跑YOLO代码\"的服务器平台？让我们一起来看看！](https://www.bilibili.com/video/BV1mg2SYGEGF)\n2. [|DAModel|给大家准备好COCO、VOC、VisDrone、CrowdHuman、BDD100K数据集啦～YOLO格式和data.yaml都已配置好～](https://www.bilibili.com/video/BV1UV5qzuEGf)\n3. [智算云扉服务器平台｜0.99每小时的3090？RTX4090-48GB的显卡？已经配置好的YOLO｜RTDETR环境？充值还有额外算力点？标题有限制优势说不完。](https://www.bilibili.com/video/BV11DXTYiENS)\n\n### 必看论文分享系列\n1. [有营养的必看论文分享系列一-RTMDet<考虑到精度、速度、部署的2D目标检测网络>](https://www.bilibili.com/video/BV1ab421J77G/)\n2. [有营养的必看论文分享系列二-MobileNets<轻量化的开山之作>](https://www.bilibili.com/video/BV1hM4m117JW/)\n3. [计算机视觉|YOLO|DETR|2025创新必看的论文之一|MetaFormer(TPAMI2024),选对Baseline是成功的第一步](https://www.bilibili.com/video/BV1W5ATetEg6/)\n\n### 高区论文带读系列\n1. [高区论文带读系列一-40分钟长视频带你分析一篇SCI1区的文章，SCI1区也不是触不可及！](https://www.bilibili.com/video/BV1JESuYxEjn/)\n2. 
[高区论文带读系列二-学会捕捉数据集场景下的要害问题是写好文章的第一步！](https://www.bilibili.com/video/BV1XNqjYNEyg/)\n\n### YOLO系列配置文件系列\n1. [不会把多个改进整合到一个yaml配置文件里面？那来看看这个吧！从简到难手把手带你整合三个yaml](https://www.bilibili.com/video/BV15H4y1Y7a2/)\n2. [细谈目标检测中的小目标检测头和大目标检测检测头，并教懂你怎么加微小目标、极大目标检测头！](https://www.bilibili.com/video/BV1jkDWYFEwx/)\n3. [不会看YOLO的模型yaml配置文件？那你还怎么整合多个配置文件！](https://www.bilibili.com/video/BV1oiBRYnEEw/)\n4. [不会把多个创新点整合到一个yaml配置文件里面？那来看看这个吧！手把手来你整合创新点！](https://www.bilibili.com/video/BV1DUBRYGE3b/)\n\n### YOLOV5,V7-PYQT5项目讲解\n1. [哔哩哔哩合集地址](https://space.bilibili.com/286900343/channel/collectiondetail?sid=917275)\n2. [项目github地址](https://github.com/z1069614715/yolov7-pyqt)\n\n### YOLOV5、V7、V8、V9、V10、V11、V12 热力图源码\n1. [哔哩哔哩合集地址](https://space.bilibili.com/286900343/channel/collectiondetail?sid=1080305)\n2. [项目github地址](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-gradcam)\n\n### YOLO系列模型使用教程系列\n1. [YOLOV7保姆级教程](https://www.bilibili.com/video/BV1gD4y1s7zw/?spm_id_from=333.999.0.0)\n2. [YOLOV5-Seg实例分割教程](https://www.bilibili.com/video/BV1nV4y1P7HQ/?spm_id_from=333.999.0.0)\n3. [YOLOV5-快速上手教程](https://www.bilibili.com/video/BV1tM411a7it/?spm_id_from=333.999.0.0)\n4. [YOLOV8-OBB详细教学视频(包含如何把DOTA数据集分割成小图进行训练)](https://www.bilibili.com/video/BV1xK4y117fg/)\n5. [EfficientTeacher半监督-详细教学和调参注意事项](https://www.bilibili.com/video/BV1494y1v7hF/)\n6. [YOLOV9保姆级别教程来啦~包含环境配置、数据集转换、训练、测试、推理环节~一看就懂！](https://www.bilibili.com/video/BV1d1421z7XW/)\n7. [保姆级别YOLOV11-环境配置、 数据集介绍、训练、验证、推理 详细教学视频，看了它，跑YOLOV11 没问题~](https://www.bilibili.com/video/BV1VA11YBELB/)\n\n### YOLOV8V11源码常见疑问解答小课堂\n1. [关于配置文件中Optimizer参数为auto的时候，究竟Optimizer会怎么选用呢？](https://www.bilibili.com/video/BV1K34y1w7cZ/)\n2. [best.pt究竟是根据什么指标来保存的?](https://www.bilibili.com/video/BV1jN411M7MA/)\n3. [数据增强在yolov8中的应用](https://www.bilibili.com/video/BV1aQ4y1g7ah/)\n4. [如何添加FPS计算代码和FPS的相关的一些疑问](https://www.bilibili.com/video/BV1Sw411g7DD/)\n5. [预测框粗细颜色修改与精度小数位修改](https://www.bilibili.com/video/BV12K421a7rH/)\n6. 
[导出改进/剪枝的onnx模型和讲解onnx-opset和onnxsim的作用](https://www.bilibili.com/video/BV1CK421e7Y3/)\n7. [YOLOV8模型详细讲解(包含该如何改进YOLOV8)(刚入门小白，需要改进YOLOV8的同学必看！)](https://www.bilibili.com/video/BV1Ms421u7VH/)\n8. [学习率变化问题](https://www.bilibili.com/video/BV1frnferEL1/)\n\n### 目标检测干货系列\n1. [深入了解目标检测中的检测头](https://www.bilibili.com/video/BV1AQ4y1j7Cr/)\n2. [目标检测中的标签分配策略做了什么？分配过程中的正负样本又是什么？](https://www.bilibili.com/video/BV1Ek4aeUE2J/)\n\n### 环境配置系列教程\n1. [保姆式AUTODL-YOLO环境教程(上):从0教你如何配置VSCODE、安装新环境和CUDA和CUDNN、跑通YOLOV8、编译DCNV3](https://www.bilibili.com/video/BV1tT4y1b75q/)\n2. [保姆式AUTODL-YOLO环境教程(下):从0教你如何配置VSCODE、安装新环境和CUDA和CUDNN、跑通YOLOV8、编译DCNV3](https://www.bilibili.com/video/BV1nV411Q7mA/)\n\n### 目标检测Tricks\n1. [可视化并统计目标检测中的TP,FP,FN](https://www.bilibili.com/video/BV1yM4y1d7Gp/)\n2. [深度学习小实验-卷积家族(fps,flops,param)对比实验](https://www.bilibili.com/video/BV1UL411R7Qr/)\n3. [yolov5中的FeatureMap可视化(热力图格式)](https://www.bilibili.com/video/BV1LV4y1R7w6/)\n4. [用于yolov5和v7中的yolo格式转换coco格式的脚本.](https://www.bilibili.com/video/BV14T411s7Ts/)\n5. [Segment Anything演示代码](https://www.bilibili.com/video/BV1hv4y1H7eg/)\n6. [固定随机种子在同一个主机上极可能地复现结果](https://www.bilibili.com/video/BV1bh4y1n7Yc/)\n7. [计算yolov5推理时间和FPS的脚本](https://www.bilibili.com/video/BV1Uu4y1C714/)\n8. [计算yolov7推理时间和FPS的脚本](https://www.bilibili.com/video/BV17p4y177Pe/)\n9. [深度学习小实验-YOLO-Block家族(fps,flops,param)对比实验.](https://www.bilibili.com/video/BV17H4y1V7s9/)\n10. [输出YOLOV8、RTDETR各个层的计算量和参数量.](https://www.bilibili.com/video/BV1tb421b7aB/)\n11. [YOLOV8-不会把PR曲线的数据保存并绘制到一张图？不用怕，手把手教程来啦~](https://www.bilibili.com/video/BV1uC41177oE/)\n12. [yolov5、v7、v8、v9、v10曲线对比图、推理时间vs精度对比图绘制手把手教程！](https://www.bilibili.com/video/BV1yf421X7t5/)\n13. [YOLOV8-输出每一层的图特征图尺寸和通道数.](https://www.bilibili.com/video/BV1Mz421B7xz/)\n14. [YOLOV8V10V11V12更详细的输出精度结果](https://www.bilibili.com/video/BV1dBQDY6Ec5/)\n15. [关于数据集的可视化脚本](https://www.bilibili.com/video/BV1k2TizGEnH/)\n\n### MMDet系列教程\n1. 
[一库打尽目标检测对比实验！mmdetection环境、训练、测试手把手教程！](https://www.bilibili.com/video/BV1xA4m1c7H8/)\n2. [一库打尽目标检测对比实验！mmdetection参数量、计算量、FPS、绘制logs手把手教程](https://www.bilibili.com/video/BV17C41137dW/)\n3. [一库打尽目标检测对比实验！mmdetection指标转换YOLO指标！](https://www.bilibili.com/video/BV1AWtCesEc6/)\n\n### 离线数据增强教程\n1. [目标检测数据集离线数据增强教程，包含对目标框、多种变换、天气变化等等增强！](https://www.bilibili.com/video/BV1bT421k7iq/)\n2. [语义分割数据集离线数据增强教程，包含对mask、多种变换、天气变化等等增强！](https://www.bilibili.com/video/BV1xi421a7Gb/)\n3. [CVPR2025-SaMam｜手把手带你用以Mamba为核心的任意风格迁移网络去做数据集扩充！(一个小创新点有了！)](https://www.bilibili.com/video/BV1gWE4z4Eqq/)\n\n### YOLO系列(YOLOV5,YOLOV7,YOLOV8)模型改进大合集\n#### YOLOV5(主干系列修改V7同样也适用)\n1. [添加EIOU，SIOU，ALPHA-IOU, FocalEIOU到yolov5的box_iou中](https://www.bilibili.com/video/BV1KM411b7Sz/)\n2. [Wise-IoU](https://www.bilibili.com/video/BV1tG4y1N7Gk/)\n3. [使用DAMO-YOLO中的GFPN替换YOLOV5中的Head](https://www.bilibili.com/video/BV1iR4y1a7bx/)\n4. [使用DAMO-YOLO中的GFPN替换YOLOV5中的Head](https://www.bilibili.com/video/BV1iR4y1a7bx/)\n5. [使用yolov8中的C2F模块替换yolov5中的C3模块.](https://www.bilibili.com/video/BV1rx4y1g7xt/)\n6. [添加Optimal Transport Assignment到yolov5的Loss中](https://www.bilibili.com/video/BV1xD4y1J76n/)\n7. [添加Deformable convolution V2到yolov5中](https://www.bilibili.com/video/BV1rT411Q76q/)\n8. [添加辅助训练分支到yolov5中](https://www.bilibili.com/video/BV1Fo4y1v7bi/)\n9. [添加context augmentation module到yolov5中](https://www.bilibili.com/video/BV17b411d7ef/)\n10. [添加SAC到yolov5中](https://www.bilibili.com/video/BV1xD4y1u7NU/)\n11. [添加CoordConv到yolov5中](https://www.bilibili.com/video/BV1ng4y1E7rS/)\n12. [添加soft-nms(IoU,GIoU,DIoU,CIoU,EIoU,SIoU)到yolov5中](https://www.bilibili.com/video/BV1cM41147Ry/)\n13. [添加DSConv到yolov5中](https://www.bilibili.com/video/BV1iT411a7Mi/)\n14. [添加DCNV3到yolov5中.](https://www.bilibili.com/video/BV1LY411z7iE/)\n15. [添加Normalized Gaussian Wasserstein Distance到yolov5中.](https://www.bilibili.com/video/BV1zY4y197UP/)\n16. [添加Efficient-DecoupledHead到yolov5中](https://www.bilibili.com/video/BV1mk4y1h7us/)\n17. 
[添加FasterNet中的Faster-Block到yolov5中](https://www.bilibili.com/video/BV1Bs4y1H7Ph/)\n18. [添加Timm支持的主干到yolov5中.](https://www.bilibili.com/video/BV1Mx4y1A7jy/)\n19. [添加Task-Specific Context Decoupling到yolov5中](https://www.bilibili.com/video/BV1mk4y1h7us/)\n20. [添加FasterNet主干到yolov5中](https://www.bilibili.com/video/BV1ra4y1K77u/)\n21. [添加Omni-Dimensional Dynamic Convolution主干(od_mobilenetv2,od_resnet)到yolov5中](https://www.bilibili.com/video/BV1Jk4y1v7EW/)\n22. [融合Omni-Dimensional Dynamic Convolution主干(od_mobilenetv2,od_resnet)中的Conv和BN](https://www.bilibili.com/video/BV1Rs4y1N7fp/)\n23. [添加轻量级上采样算子CARAFE到yolov5中](https://www.bilibili.com/video/BV1kj411c72a/)\n24. [添加CFPNet中的EVC-Block到yolov5中](https://www.bilibili.com/video/BV1Pg4y1u7cM/)\n25. [添加基于注意力机制的目标检测头(DYHEAD)到yolov5中](https://www.bilibili.com/video/BV1qs4y117Mx/)\n26. [添加(2023年New)InceptionNeXt主干到yolov5中](https://www.bilibili.com/video/BV12v4y1H7E1/)\n27. [添加aLRPLoss到yolov5中](https://www.bilibili.com/video/BV1YV4y1Z7rV/)\n28. [结合Res2Net提出具有多尺度提取能力的C3模块](https://www.bilibili.com/video/BV13X4y167VB/)\n29. [添加(2022年)FocalNet(transformer)主干到yolov5中](https://www.bilibili.com/video/BV1ch411L7Dk/)\n30. [添加(2023年)EMO(transformer)主干到yolov5中](https://www.bilibili.com/video/BV1Dh4y1J7SV/)\n31. [添加(2022年)EfficientFormerV2(transformer)主干到yolov5中](https://www.bilibili.com/video/BV1da4y1g7KT/)\n32. [添加(2022年CVPR)PoolFormer(transformer)主干到yolov5中](https://www.bilibili.com/video/BV1eh411c7bz/)\n33. [添加(2023年)EfficientViT(transformer)主干到yolov5中](https://www.bilibili.com/video/BV1xk4y1L7Gu/)\n34. [添加ContextAggregation到yolov5中](https://www.bilibili.com/video/BV1Yk4y1s7Kx/)\n35. [添加(2023年)VanillaNet主干到yolov5中](https://www.bilibili.com/video/BV1os4y1v7Du/)\n36. [添加(2022年)NextViT主干到yolov5中](https://www.bilibili.com/video/BV1im4y1i7Ht/)\n37. [添加(2023年)RIFormer主干到yolov5中](https://www.bilibili.com/video/BV1bW4y1X7Lo/)\n38. [Scale-Aware RFE与C3结合而成的C3RFEM添加到yolov5中](https://www.bilibili.com/video/BV1Gj411D7Pf/)\n39. 
[把重参数结构DiverseBranchBlock与C3融合成C3-DBB添加到yolov5中](https://www.bilibili.com/video/BV1sM4y177Cn/)\n40. [添加(2023CVPR)EfficientViT(transformer)主干到yolov5中](https://www.bilibili.com/video/BV1xk4y1L7Gu/)\n41. [添加(2023旋转目标检测SOTA)LSKNet主干到yolov5中](https://www.bilibili.com/video/BV1xk4y1L7Gu/)\n42. [添加(2023最新IoU度量算法)MPDiou到yolov5中.](https://www.bilibili.com/video/BV19P41147gJ/)\n43. [添加Yolo-Face-V2中SlideLoss的到yolov5中](https://www.bilibili.com/video/BV1W14y1i79U/)\n44. [添加RepViT(transformer)主干到yolov5中](https://www.bilibili.com/video/BV1PH4y1S7mf/)\n45. [利用华为2023最新GOLD-YOLO中的Gatherand-Distribute进行改进YOLOV5中的特征融合模](https://www.bilibili.com/video/BV1PH4y1S7mf/)\n46. [利用动态蛇形卷积改进YOLOV5](https://www.bilibili.com/video/BV1Qu411K7Hw/)\n47. [利用带有位置信息编码的AIFI自注意力机制改进YOLOV5](https://www.bilibili.com/video/BV1nu4y1h7eS/)\n48. [添加UniRepLKNet主干到yolov5中](https://www.bilibili.com/video/BV1PH4y1S7mf/)\n49. [添加Attentional Scale Sequence Fusion到yolov5中](https://www.bilibili.com/video/BV1PH4y1S7mf/)\n50. [添加cross-scale feature-fusion到yolov5中](https://www.bilibili.com/video/BV1Tb4y1P7yd/)\n51. [添加对小目标有效的BiFormer注意力机制到yolov5中](https://www.bilibili.com/video/BV15g4y1g7bM/)\n52. [引入最新SOTA(YOLOV9)中的RepNCSPELAN模块](https://www.bilibili.com/video/BV17y421z73k/)\n#### YOLOV7\n1. [添加EIOU，SIOU，ALPHA-IOU, FocalEIOU到yolov5的box_iou中](https://www.bilibili.com/video/BV1zx4y177EF/)\n2. [Wise-IoU](https://www.bilibili.com/video/BV1yv4y147kf/)\n3. [添加Deformable convolution V2到yolov7中](https://www.bilibili.com/video/BV17R4y1q7vr/)\n4. [添加SAC到yolov7中](https://www.bilibili.com/video/BV1xD4y1u7NU/)\n5. [添加CoordConv到yolov7中](https://www.bilibili.com/video/BV1K54y1g7ye/)\n6. [添加soft-nms(IoU,GIoU,DIoU,CIoU,EIoU,SIoU)到yolov7中](https://www.bilibili.com/video/BV1ZY41167iC/)\n7. [添加DSConv到yolov7中](https://www.bilibili.com/video/BV1724y1b7PD/)\n8. [添加DCNV3到yolov7中.](https://www.bilibili.com/video/BV1mk4y1h7us/)\n9. [添加Normalized Gaussian Wasserstein Distance到yolov7中](https://www.bilibili.com/video/BV1kM411H7g1/)\n10. 
[Add Efficient-DecoupledHead with implicit knowledge learning to yolov7](https://www.bilibili.com/video/BV1tg4y1x7ha/)\n11. [Add PConv from FasterNet to yolov7](https://www.bilibili.com/video/BV1Z84y137oi/)\n12. [Add the lightweight upsampling operator CARAFE to yolov7](https://www.bilibili.com/video/BV1yc411p7wL/)\n13. [Add the attention-based detection head (DYHEAD) to yolov7](https://www.bilibili.com/video/BV1Ph4y1s7i9/)\n14. [Add Omni-Dimensional Dynamic Convolution to yolov7](https://www.bilibili.com/video/BV1vh411j71Z/)\n15. [Add the EVC-Block from CFPNet to yolov7](https://www.bilibili.com/video/BV12u4y1f7np/)\n16. [Add P2 and P6 detection layers to YOLOV7](https://www.bilibili.com/video/BV1LX4y1a72m/)\n17. [Lighten the yolov7 Neck with VOVGSCSP](https://www.bilibili.com/video/BV14m4y147PC/)\n18. [Add the SwinTransformer-Tiny backbone to yolov5](https://www.bilibili.com/video/BV1WX4y1a7ea/)\n19. [Add Scale-Aware RFE to yolov7](https://www.bilibili.com/video/BV1hW4y1D7gQ/)\n20. [Add the re-parameterized DiverseBranchBlock to yolov7](https://www.bilibili.com/video/BV14u411b7kL/)\n21. [Add MPDIoU (the latest IoU metric of 2023) to yolov7](https://www.bilibili.com/video/BV1Qh4y1r7D3/)\n22. [Improve the feature fusion module of YOLOV7 with Gather-and-Distribute from Huawei's 2023 GOLD-YOLO](https://www.bilibili.com/video/BV14V411c7H1/)\n23. [Improve YOLOV7 with Dynamic Snake Convolution](https://www.bilibili.com/video/BV1Wj411x7fq/)\n24. [Improve YOLOV7 with the AIFI self-attention mechanism with positional encoding](https://www.bilibili.com/video/BV1rj411a7s4/)\n25. [Add Attentional Scale Sequence Fusion to yolov7](https://www.bilibili.com/video/BV1PH4y1S7mf/)\n26. [Introduce the RepNCSPELAN module from the latest SOTA (YOLOV9)](https://www.bilibili.com/video/BV1UA4m137hz/)\n#### YOLOV8\n1. [Add EIOU, SIOU, ALPHA-IOU, FocalEIOU to box_iou in yolov5 and yolov8](https://www.bilibili.com/video/BV1PY4y1o7Hm/)\n2. [Wise-IoU](https://www.bilibili.com/video/BV1De4y1N7Mb/)\n3. [Add Deformable convolution V2 to yolov8](https://www.bilibili.com/video/BV1Fo4y1i7Mm/)\n4. [New: a hands-on YOLOV8 tutorial on adding attention mechanisms via config files, easy to follow!](https://www.bilibili.com/video/BV1RH4y1D7CY/)\n5. [YOLOV8 improvement: a hands-on guide to advanced use of attention mechanisms](https://www.bilibili.com/video/BV1ZQ4y1J7oC/)\n6. 
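[YOLOV8 visualization: visualize and count True Positives, False Positives, and False Negatives per image](https://www.bilibili.com/video/BV1RA4m1L79K/)\n7. [YOLOV8: hyperparameter tuning experiments for the TaskAlignedAssigner task-aligned assignment strategy on VisDrone](https://www.bilibili.com/video/BV1XJ4m1x7eJ/)\n8. [YOLOV8: not sure how to merge several improvements into one yaml config file? Take a look at this! A hands-on walkthrough, from easy to hard, of merging three yaml files](https://www.bilibili.com/video/BV15H4y1Y7a2/)\n9. [YOLOV8 downstream task series: a step-by-step debugging walkthrough of object counting](https://www.bilibili.com/video/BV17H4y1J7DD/)\n10. [YOLOV8 improvement: analyzing the V8 detection head and redesigning 10 lightweight detection head structures](https://www.bilibili.com/video/BV1cu411K7FE/)\n11. [Analyzing the effective receptive field based on CVPR2022-RepLKNet, with a script and explanation for visualizing the receptive field in YOLOV8](https://www.bilibili.com/video/BV1Gx4y1v7ZZ/)\n12. [YOLOV8: not sure how to save PR curve data and plot it on a single figure? No worries, here is a hands-on tutorial](https://www.bilibili.com/video/BV1uC41177oE/)\n13. [How well does NMS-Free work on YOLOV8? Experiments on the Visdrone2019 dataset show good results, with a post-processing time of 0.0ms!](https://www.bilibili.com/video/BV1bt421N7ob/)\n14. [YOLOV8-NMSFree | Tests on more public datasets: VisDrone, VOC, PCB](https://www.bilibili.com/video/BV1nZ421x7jr/)\n15. [A detailed explanation of the YOLOV8 model (including how to improve YOLOV8) (a must-watch for beginners who need to improve YOLOV8!)](https://www.bilibili.com/video/BV1Ms421u7VH/)\n#### YOLOV9\n1. [YOLOV9-VisDrone comparison results are here! The YOLOV9-C model reaches an accuracy of 39.7 on the VisDrone test set. Come in for the details!](https://www.bilibili.com/video/BV1Yy42187A3/)\n2. [What YOLOV9 adds over YOLOV7, analyzed from the source code](https://www.bilibili.com/video/BV1v1421f7rN/)\n3. [YOLOV9n vs. YOLOV8n: a 2.4-point accuracy gain on the VisDrone dataset!](https://www.bilibili.com/video/BV16m411f78L/)\n4. [YOLOV9 improvement: switching to the lightweight champion MobilenetV4 backbone](https://www.bilibili.com/video/BV1Ax4y1B7Ln/)\n5. [YOLOV9 improvement: CVPR2024 StarNet and DRepCSPELAN](https://www.bilibili.com/video/BV1BU411o7rz/)\n6. [YOLOV9 improvement: CVPR2023 FasterNet and improvements based on its FasterBlock and PConv](https://www.bilibili.com/video/BV18y411a74y/)\n7. [YOLOV9 improvement: DySnakeConv (Dynamic Snake Convolution) for elongated, irregular objects](https://www.bilibili.com/video/BV1gi421S77X/)\n#### YOLOV11\n1. [An immersive walkthrough of Ultralytics 8.3.0: a detailed code-level analysis of YOLOV11](https://www.bilibili.com/video/BV19XxxeXEma/)\n2. [A beginner-friendly, detailed YOLOV11 tutorial video: environment setup, dataset introduction, training, validation, and inference. After watching it, running YOLOV11 is no problem](https://www.bilibili.com/video/BV1VA11YBELB/)\n3. 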
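[A detailed analysis of YOLOV11 improvements (a must-watch before you start): what can be improved in each part (Backbone, Neck, Head, ...)? Avoid the three classic beginner mistakes when making improvements!](https://www.bilibili.com/video/BV1GKCdYbEuz/)\n#### YOLOV13\n1. [Oh come on! Yet another new YOLO release: YOLOV13 is here! Let's see what YOLOV13 improves and what it means for those currently working on YOLO improvements](https://www.bilibili.com/video/BV1jqKbzGEua/)\n#### D-Fine-ICLR2025\n1. [How well does D-Fine, which beats CVPR2024-RTDETR, actually perform? Let's train it and find out](https://www.bilibili.com/video/BV1aE6aYHEer/)\n#### DEIM-CVPR2025\n1. [CVPR2025-DEIM | The new-generation object detection SOTA | A must-have baseline for publishing in top-tier journals in 2025 | A series covering training, testing, a dozen-plus episodes of basic improvement lessons, and plotting tutorials](https://space.bilibili.com/286900343/lists/4909499)"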
  },
  {
    "path": "cv-attention/A2Attention.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\nfrom torch.nn import functional as F\n\n\n\nclass DoubleAttention(nn.Module):\n\n    def __init__(self, in_channels,c_m=128,c_n=128,reconstruct = True):\n        super().__init__()\n        self.in_channels=in_channels\n        self.reconstruct = reconstruct\n        self.c_m=c_m\n        self.c_n=c_n\n        self.convA=nn.Conv2d(in_channels,c_m,1)\n        self.convB=nn.Conv2d(in_channels,c_n,1)\n        self.convV=nn.Conv2d(in_channels,c_n,1)\n        if self.reconstruct:\n            self.conv_reconstruct = nn.Conv2d(c_m, in_channels, kernel_size = 1)\n        self.init_weights()\n\n\n    def init_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                init.kaiming_normal_(m.weight, mode='fan_out')\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n            elif isinstance(m, nn.BatchNorm2d):\n                init.constant_(m.weight, 1)\n                init.constant_(m.bias, 0)\n            elif isinstance(m, nn.Linear):\n                init.normal_(m.weight, std=0.001)\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        b, c, h,w=x.shape\n        assert c==self.in_channels\n        A=self.convA(x) #b,c_m,h,w\n        B=self.convB(x) #b,c_n,h,w\n        V=self.convV(x) #b,c_n,h,w\n        tmpA=A.view(b,self.c_m,-1)\n        # NOTE: pass dim explicitly; the implicit default for a 3D tensor is dim=0 (the batch axis)\n        attention_maps=F.softmax(B.view(b,self.c_n,-1), dim=-1) # softmax over the h*w spatial positions\n        attention_vectors=F.softmax(V.view(b,self.c_n,-1), dim=1) # softmax over the c_n descriptors at each position\n        # step 1: feature gathering\n        global_descriptors=torch.bmm(tmpA,attention_maps.permute(0,2,1)) #b,c_m,c_n\n        # step 2: feature distribution\n        tmpZ = global_descriptors.matmul(attention_vectors) #b,c_m,h*w\n        tmpZ=tmpZ.view(b,self.c_m,h,w) #b,c_m,h,w\n        if self.reconstruct:\n            tmpZ=self.conv_reconstruct(tmpZ)\n\n        return 
tmpZ \n\n\nif __name__ == '__main__':\n    input=torch.randn(50,512,7,7)\n    a2 = DoubleAttention(512)\n    output=a2(input)\n    print(output.shape)"
  },
  {
    "path": "cv-attention/BAM.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\n\ndef autopad(k, p=None, d=1):  # kernel, padding, dilation\n    \"\"\"Pad to 'same' shape outputs.\"\"\"\n    if d > 1:\n        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size\n    if p is None:\n        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad\n    return p\n\nclass Flatten(nn.Module):\n    def forward(self, x):\n        return x.view(x.shape[0], -1)\n\n\nclass ChannelAttention(nn.Module):\n    def __init__(self, channel, reduction=16, num_layers=3):\n        super().__init__()\n        self.avgpool = nn.AdaptiveAvgPool2d(1)\n        gate_channels = [channel]\n        gate_channels += [channel // reduction] * num_layers\n        gate_channels += [channel]\n\n        self.ca = nn.Sequential()\n        self.ca.add_module('flatten', Flatten())\n        for i in range(len(gate_channels) - 2):\n            self.ca.add_module('fc%d' % i, nn.Linear(gate_channels[i], gate_channels[i + 1]))\n            self.ca.add_module('bn%d' % i, nn.BatchNorm1d(gate_channels[i + 1]))\n            self.ca.add_module('relu%d' % i, nn.ReLU())\n        self.ca.add_module('last_fc', nn.Linear(gate_channels[-2], gate_channels[-1]))\n\n    def forward(self, x):\n        res = self.avgpool(x)\n        res = self.ca(res)\n        res = res.unsqueeze(-1).unsqueeze(-1).expand_as(x)\n        return res\n\n\nclass SpatialAttention(nn.Module):\n    def __init__(self, channel, reduction=16, num_layers=3, dia_val=2):\n        super().__init__()\n        self.sa = nn.Sequential()\n        self.sa.add_module('conv_reduce1',\n                           nn.Conv2d(kernel_size=1, in_channels=channel, out_channels=channel // reduction))\n        self.sa.add_module('bn_reduce1', nn.BatchNorm2d(channel // reduction))\n        self.sa.add_module('relu_reduce1', nn.ReLU())\n        for i in range(num_layers):\n            
self.sa.add_module('conv_%d' % i, nn.Conv2d(kernel_size=3, in_channels=channel // reduction,\n                                                        out_channels=channel // reduction, padding=autopad(3, None, dia_val), dilation=dia_val))\n            self.sa.add_module('bn_%d' % i, nn.BatchNorm2d(channel // reduction))\n            self.sa.add_module('relu_%d' % i, nn.ReLU())\n        self.sa.add_module('last_conv', nn.Conv2d(channel // reduction, 1, kernel_size=1))\n\n    def forward(self, x):\n        res = self.sa(x)\n        res = res.expand_as(x)\n        return res\n\n\nclass BAMBlock(nn.Module):\n    def __init__(self, channel=512, reduction=16, dia_val=2):\n        super().__init__()\n        self.ca = ChannelAttention(channel=channel, reduction=reduction)\n        self.sa = SpatialAttention(channel=channel, reduction=reduction, dia_val=dia_val)\n        self.sigmoid = nn.Sigmoid()\n\n    def init_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                init.kaiming_normal_(m.weight, mode='fan_out')\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n            elif isinstance(m, nn.BatchNorm2d):\n                init.constant_(m.weight, 1)\n                init.constant_(m.bias, 0)\n            elif isinstance(m, nn.Linear):\n                init.normal_(m.weight, std=0.001)\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        b, c, _, _ = x.size()\n        sa_out = self.sa(x)\n        ca_out = self.ca(x)\n        weight = self.sigmoid(sa_out + ca_out)\n        out = (1 + weight) * x\n        return out\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    bam = BAMBlock(channel=512, reduction=16, dia_val=2)\n    output = bam(input)\n    print(output.shape)\n"
  },
  {
    "path": "cv-attention/Biformer.py",
    "content": "\"\"\"\nCore of BiFormer, Bi-Level Routing Attention.\n\nTo be refactored.\n\nauthor: ZHU Lei\ngithub: https://github.com/rayleizhu\nemail: ray.leizhu@outlook.com\n\nThis source code is licensed under the license found in the\nLICENSE file in the root directory of this source tree.\n\"\"\"\nfrom typing import Tuple, Optional\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom einops import rearrange\nfrom torch import Tensor, LongTensor\n\n\nclass TopkRouting(nn.Module):\n    \"\"\"\n    differentiable topk routing with scaling\n    Args:\n        qk_dim: int, feature dimension of query and key\n        topk: int, the 'topk'\n        qk_scale: float or None, temperature (multiply) of softmax activation\n        param_routing: bool, whether to incorporate learnable params in routing unit\n        diff_routing: bool, whether to make routing differentiable\n    NOTE: soft routing (multiplying the output values by routing weights) is not an argument here; it is applied in KVGather.\n    \"\"\"\n    def __init__(self, qk_dim, topk=4, qk_scale=None, param_routing=False, diff_routing=False):\n        super().__init__()\n        self.topk = topk\n        self.qk_dim = qk_dim\n        self.scale = qk_scale or qk_dim ** -0.5\n        self.diff_routing = diff_routing\n        # TODO: norm layer before/after linear?\n        self.emb = nn.Linear(qk_dim, qk_dim) if param_routing else nn.Identity()\n        # routing activation\n        self.routing_act = nn.Softmax(dim=-1)\n    \n    def forward(self, query:Tensor, key:Tensor)->Tuple[Tensor, Tensor]:\n        \"\"\"\n        Args:\n            q, k: (n, p^2, c) tensor\n        Return:\n            r_weight, topk_index: (n, p^2, topk) tensor\n        \"\"\"\n        if not self.diff_routing:\n            query, key = query.detach(), key.detach()\n        query_hat, key_hat = self.emb(query), self.emb(key) # per-window pooling -> (n, p^2, c) \n        attn_logit = (query_hat*self.scale) @ key_hat.transpose(-2, -1) # (n, p^2, p^2)\n        
topk_attn_logit, topk_index = torch.topk(attn_logit, k=self.topk, dim=-1) # (n, p^2, k), (n, p^2, k)\n        r_weight = self.routing_act(topk_attn_logit) # (n, p^2, k)\n        \n        return r_weight, topk_index\n        \n\nclass KVGather(nn.Module):\n    def __init__(self, mul_weight='none'):\n        super().__init__()\n        assert mul_weight in ['none', 'soft', 'hard']\n        self.mul_weight = mul_weight\n\n    def forward(self, r_idx:Tensor, r_weight:Tensor, kv:Tensor):\n        \"\"\"\n        r_idx: (n, p^2, topk) tensor\n        r_weight: (n, p^2, topk) tensor\n        kv: (n, p^2, w^2, c_kq+c_v)\n\n        Return:\n            (n, p^2, topk, w^2, c_kq+c_v) tensor\n        \"\"\"\n        # select kv according to routing index\n        n, p2, w2, c_kv = kv.size()\n        topk = r_idx.size(-1)\n        # print(r_idx.size(), r_weight.size())\n        # FIXME: gather consumes much memory (topk times redundancy), write cuda kernel? \n        topk_kv = torch.gather(kv.view(n, 1, p2, w2, c_kv).expand(-1, p2, -1, -1, -1), # (n, p^2, p^2, w^2, c_kv) without mem cpy\n                                dim=2,\n                                index=r_idx.view(n, p2, topk, 1, 1).expand(-1, -1, -1, w2, c_kv) # (n, p^2, k, w^2, c_kv)\n                               )\n\n        if self.mul_weight == 'soft':\n            topk_kv = r_weight.view(n, p2, topk, 1, 1) * topk_kv # (n, p^2, k, w^2, c_kv)\n        elif self.mul_weight == 'hard':\n            raise NotImplementedError('differentiable hard routing TBA')\n        # else: #'none'\n        #     topk_kv = topk_kv # do nothing\n\n        return topk_kv\n\nclass QKVLinear(nn.Module):\n    def __init__(self, dim, qk_dim, bias=True):\n        super().__init__()\n        self.dim = dim\n        self.qk_dim = qk_dim\n        self.qkv = nn.Linear(dim, qk_dim + qk_dim + dim, bias=bias)\n    \n    def forward(self, x):\n        q, kv = self.qkv(x).split([self.qk_dim, self.qk_dim+self.dim], dim=-1)\n        return q, 
kv\n        # q, k, v = self.qkv(x).split([self.qk_dim, self.qk_dim, self.dim], dim=-1)\n        # return q, k, v\n\nclass BiLevelRoutingAttention(nn.Module):\n    \"\"\"\n    n_win: number of windows in one side (so the actual number of windows is n_win*n_win)\n    kv_per_win: for kv_downsample_mode='ada_xxxpool' only, number of key/values per window. Similar to n_win, the actual number is kv_per_win*kv_per_win.\n    topk: topk for window filtering\n    param_attention: 'qkvo': linear layers for q, k, v and o; 'qkv': linear layers for q, k, v only (identity output projection)\n    param_routing: extra linear for routing\n    diff_routing: whether to make routing differentiable\n    soft_routing: whether to multiply soft routing weights \n    \"\"\"\n    def __init__(self, dim, n_win=7, num_heads=8, qk_dim=None, qk_scale=None,\n                 kv_per_win=4, kv_downsample_ratio=4, kv_downsample_kernel=None, kv_downsample_mode='identity',\n                 topk=4, param_attention=\"qkvo\", param_routing=False, diff_routing=False, soft_routing=False, side_dwconv=3,\n                 auto_pad=True):\n        super().__init__()\n        # local attention setting\n        self.dim = dim\n        self.n_win = n_win  # Wh, Ww\n        self.num_heads = num_heads\n        self.qk_dim = qk_dim or dim\n        assert self.qk_dim % num_heads == 0 and self.dim % num_heads==0, 'qk_dim and dim must be divisible by num_heads!'\n        self.scale = qk_scale or self.qk_dim ** -0.5\n\n\n        ################side_dwconv (i.e. 
LCE in ShuntedTransformer)###########\n        self.lepe = nn.Conv2d(dim, dim, kernel_size=side_dwconv, stride=1, padding=side_dwconv//2, groups=dim) if side_dwconv > 0 else \\\n                    lambda x: torch.zeros_like(x)\n        \n        ################ global routing setting #################\n        self.topk = topk\n        self.param_routing = param_routing\n        self.diff_routing = diff_routing\n        self.soft_routing = soft_routing\n        # router\n        assert not (self.param_routing and not self.diff_routing) # cannot be param_routing=True and diff_routing=False\n        self.router = TopkRouting(qk_dim=self.qk_dim,\n                                  qk_scale=self.scale,\n                                  topk=self.topk,\n                                  diff_routing=self.diff_routing,\n                                  param_routing=self.param_routing)\n        if self.soft_routing: # soft routing, always differentiable (if no detach)\n            mul_weight = 'soft'\n        elif self.diff_routing: # hard differentiable routing\n            mul_weight = 'hard'\n        else:  # hard non-differentiable routing\n            mul_weight = 'none'\n        self.kv_gather = KVGather(mul_weight=mul_weight)\n\n        # qkv mapping (shared by both global routing and local attention)\n        self.param_attention = param_attention\n        if self.param_attention == 'qkvo':\n            self.qkv = QKVLinear(self.dim, self.qk_dim)\n            self.wo = nn.Linear(dim, dim)\n        elif self.param_attention == 'qkv':\n            self.qkv = QKVLinear(self.dim, self.qk_dim)\n            self.wo = nn.Identity()\n        else:\n            raise ValueError(f'param_attention mode {self.param_attention} is not supported!')\n        \n        self.kv_downsample_mode = kv_downsample_mode\n        self.kv_per_win = kv_per_win\n        self.kv_downsample_ratio = kv_downsample_ratio\n        self.kv_downsample_kenel = kv_downsample_kernel\n        if 
self.kv_downsample_mode == 'ada_avgpool':\n            assert self.kv_per_win is not None\n            self.kv_down = nn.AdaptiveAvgPool2d(self.kv_per_win)\n        elif self.kv_downsample_mode == 'ada_maxpool':\n            assert self.kv_per_win is not None\n            self.kv_down = nn.AdaptiveMaxPool2d(self.kv_per_win)\n        elif self.kv_downsample_mode == 'maxpool':\n            assert self.kv_downsample_ratio is not None\n            self.kv_down = nn.MaxPool2d(self.kv_downsample_ratio) if self.kv_downsample_ratio > 1 else nn.Identity()\n        elif self.kv_downsample_mode == 'avgpool':\n            assert self.kv_downsample_ratio is not None\n            self.kv_down = nn.AvgPool2d(self.kv_downsample_ratio) if self.kv_downsample_ratio > 1 else nn.Identity()\n        elif self.kv_downsample_mode == 'identity': # no kv downsampling\n            self.kv_down = nn.Identity()\n        elif self.kv_downsample_mode == 'fracpool':\n            # assert self.kv_downsample_ratio is not None\n            # assert self.kv_downsample_kenel is not None\n            # TODO: fracpool\n            # 1. kernel size should be input size dependent\n            # 2. 
there is a random factor, need to avoid independent sampling for k and v \n            raise NotImplementedError('fracpool policy is not implemented yet!')\n        elif kv_downsample_mode == 'conv':\n            # TODO: need to consider the case where k != v so that need two downsample modules\n            raise NotImplementedError('conv policy is not implemented yet!')\n        else:\n            raise ValueError(f'kv_downsample_mode {self.kv_downsample_mode} is not supported!')\n\n        # softmax for local attention\n        self.attn_act = nn.Softmax(dim=-1)\n\n        self.auto_pad=auto_pad\n\n    def forward(self, x, ret_attn_mask=False):\n        \"\"\"\n        x: NCHW tensor (rearranged to NHWC internally)\n\n        Return:\n            NCHW tensor; if ret_attn_mask is True, (out, r_weight, r_idx, attn_weight) with out in NHWC\n        \"\"\"\n        x = rearrange(x, \"n c h w -> n h w c\")\n         # NOTE: use padding for semantic segmentation\n        ###################################################\n        if self.auto_pad:\n            N, H_in, W_in, C = x.size()\n\n            pad_l = pad_t = 0\n            pad_r = (self.n_win - W_in % self.n_win) % self.n_win\n            pad_b = (self.n_win - H_in % self.n_win) % self.n_win\n            x = F.pad(x, (0, 0, # dim=-1\n                          pad_l, pad_r, # dim=-2\n                          pad_t, pad_b)) # dim=-3\n            _, H, W, _ = x.size() # padded size\n        else:\n            N, H, W, C = x.size()\n            assert H%self.n_win == 0 and W%self.n_win == 0 #\n        ###################################################\n\n\n        # patchify, (n, p^2, w, w, c), keep 2d window as we need 2d pooling to reduce kv size\n        x = rearrange(x, \"n (j h) (i w) c -> n (j i) h w c\", j=self.n_win, i=self.n_win)\n\n        #################qkv projection###################\n        # q: (n, p^2, w, w, c_qk)\n        # kv: (n, p^2, w, w, c_qk+c_v)\n        # NOTE: separate kv if there is a memory leak issue caused by gather\n        q, kv = self.qkv(x) \n\n        # pixel-wise qkv\n        # 
q_pix: (n, p^2, w^2, c_qk)\n        # kv_pix: (n, p^2, h_kv*w_kv, c_qk+c_v)\n        q_pix = rearrange(q, 'n p2 h w c -> n p2 (h w) c')\n        kv_pix = self.kv_down(rearrange(kv, 'n p2 h w c -> (n p2) c h w'))\n        kv_pix = rearrange(kv_pix, '(n j i) c h w -> n (j i) (h w) c', j=self.n_win, i=self.n_win)\n\n        q_win, k_win = q.mean([2, 3]), kv[..., 0:self.qk_dim].mean([2, 3]) # window-wise qk, (n, p^2, c_qk), (n, p^2, c_qk)\n\n        ##################side_dwconv(lepe)##################\n        # NOTE: call contiguous to avoid gradient warning when using ddp\n        lepe = self.lepe(rearrange(kv[..., self.qk_dim:], 'n (j i) h w c -> n c (j h) (i w)', j=self.n_win, i=self.n_win).contiguous())\n        lepe = rearrange(lepe, 'n c (j h) (i w) -> n (j h) (i w) c', j=self.n_win, i=self.n_win)\n\n        ############ gather q dependent k/v #################\n\n        r_weight, r_idx = self.router(q_win, k_win) # both are (n, p^2, topk) tensors\n\n        kv_pix_sel = self.kv_gather(r_idx=r_idx, r_weight=r_weight, kv=kv_pix) #(n, p^2, topk, h_kv*w_kv, c_qk+c_v)\n        k_pix_sel, v_pix_sel = kv_pix_sel.split([self.qk_dim, self.dim], dim=-1)\n        # kv_pix_sel: (n, p^2, topk, h_kv*w_kv, c_qk)\n        # v_pix_sel: (n, p^2, topk, h_kv*w_kv, c_v)\n        \n        ######### do attention as normal ####################\n        k_pix_sel = rearrange(k_pix_sel, 'n p2 k w2 (m c) -> (n p2) m c (k w2)', m=self.num_heads) # flatten to BMLC, (n*p^2, m, topk*h_kv*w_kv, c_kq//m) transpose here?\n        v_pix_sel = rearrange(v_pix_sel, 'n p2 k w2 (m c) -> (n p2) m (k w2) c', m=self.num_heads) # flatten to BMLC, (n*p^2, m, topk*h_kv*w_kv, c_v//m)\n        q_pix = rearrange(q_pix, 'n p2 w2 (m c) -> (n p2) m w2 c', m=self.num_heads) # to BMLC tensor (n*p^2, m, w^2, c_qk//m)\n\n        # param-free multihead attention\n        attn_weight = (q_pix * self.scale) @ k_pix_sel # (n*p^2, m, w^2, c) @ (n*p^2, m, c, topk*h_kv*w_kv) -> (n*p^2, m, w^2, topk*h_kv*w_kv)\n        
attn_weight = self.attn_act(attn_weight)\n        out = attn_weight @ v_pix_sel # (n*p^2, m, w^2, topk*h_kv*w_kv) @ (n*p^2, m, topk*h_kv*w_kv, c) -> (n*p^2, m, w^2, c)\n        out = rearrange(out, '(n j i) m (h w) c -> n (j h) (i w) (m c)', j=self.n_win, i=self.n_win,\n                        h=H//self.n_win, w=W//self.n_win)\n\n        out = out + lepe\n        # output linear\n        out = self.wo(out)\n\n        # NOTE: use padding for semantic segmentation\n        # crop padded region\n        if self.auto_pad and (pad_r > 0 or pad_b > 0):\n            out = out[:, :H_in, :W_in, :].contiguous()\n\n        if ret_attn_mask:\n            return out, r_weight, r_idx, attn_weight\n        else:\n            return rearrange(out, \"n h w c -> n c h w\")\n\nclass Attention(nn.Module):\n    \"\"\"\n    vanilla attention\n    \"\"\"\n    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0.):\n        super().__init__()\n        self.num_heads = num_heads\n        head_dim = dim // num_heads\n        # NOTE scale factor was wrong in my original version, can set manually to be compat with prev weights\n        self.scale = qk_scale or head_dim ** -0.5\n\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n    def forward(self, x):\n        \"\"\"\n        args:\n            x: NCHW tensor\n        return:\n            NCHW tensor\n        \"\"\"\n        _, _, H, W = x.size()\n        x = rearrange(x, 'n c h w -> n (h w) c')\n        \n        #######################################\n        B, N, C = x.shape        \n        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)\n        q, k, v = qkv[0], qkv[1], qkv[2]   # make torchscript happy (cannot use tensor as tuple)\n\n        attn = (q @ k.transpose(-2, -1)) * self.scale\n        
attn = attn.softmax(dim=-1)\n        attn = self.attn_drop(attn)\n\n        x = (attn @ v).transpose(1, 2).reshape(B, N, C)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        #######################################\n\n        x = rearrange(x, 'n (h w) c -> n c h w', h=H, w=W)\n        return x\n\nclass AttentionLePE(nn.Module):\n    \"\"\"\n    vanilla attention\n    \"\"\"\n    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., side_dwconv=5):\n        super().__init__()\n        self.num_heads = num_heads\n        head_dim = dim // num_heads\n        # NOTE scale factor was wrong in my original version, can set manually to be compat with prev weights\n        self.scale = qk_scale or head_dim ** -0.5\n\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n        self.lepe = nn.Conv2d(dim, dim, kernel_size=side_dwconv, stride=1, padding=side_dwconv//2, groups=dim) if side_dwconv > 0 else \\\n                    lambda x: torch.zeros_like(x)\n\n    def forward(self, x):\n        \"\"\"\n        args:\n            x: NCHW tensor\n        return:\n            NCHW tensor\n        \"\"\"\n        _, _, H, W = x.size()\n        x = rearrange(x, 'n c h w -> n (h w) c')\n        \n        #######################################\n        B, N, C = x.shape        \n        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)\n        q, k, v = qkv[0], qkv[1], qkv[2]   # make torchscript happy (cannot use tensor as tuple)\n\n        lepe = self.lepe(rearrange(x, 'n (h w) c -> n c h w', h=H, w=W))\n        lepe = rearrange(lepe, 'n c h w -> n (h w) c')\n\n        attn = (q @ k.transpose(-2, -1)) * self.scale\n        attn = attn.softmax(dim=-1)\n        attn = self.attn_drop(attn)\n\n        x = (attn @ v).transpose(1, 2).reshape(B, 
N, C)\n        x = x + lepe\n\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        #######################################\n\n        x = rearrange(x, 'n (h w) c -> n c h w', h=H, w=W)\n        return x\n\ndef _grid2seq(x:Tensor, region_size:Tuple[int], num_heads:int):\n    \"\"\"\n    Args:\n        x: BCHW tensor\n        region size: int\n        num_heads: number of attention heads\n    Return:\n        out: rearranged x, has a shape of (bs, nhead, nregion, reg_size, head_dim)\n        region_h, region_w: number of regions per col/row\n    \"\"\"\n    B, C, H, W = x.size()\n    region_h, region_w =  H//region_size[0],  W//region_size[1]\n    x = x.view(B, num_heads, C//num_heads, region_h, region_size[0], region_w, region_size[1])\n    x = torch.einsum('bmdhpwq->bmhwpqd', x).flatten(2, 3).flatten(-3, -2) # (bs, nhead, nregion, reg_size, head_dim)\n    return x, region_h, region_w\n\n\ndef _seq2grid(x:Tensor, region_h:int, region_w:int, region_size:Tuple[int]):\n    \"\"\"\n    Args: \n        x: (bs, nhead, nregion, reg_size^2, head_dim)\n    Return:\n        x: (bs, C, H, W)\n    \"\"\"\n    bs, nhead, nregion, reg_size_square, head_dim = x.size()\n    x = x.view(bs, nhead, region_h, region_w, region_size[0], region_size[1], head_dim)\n    x = torch.einsum('bmhwpqd->bmdhpwq', x).reshape(bs, nhead*head_dim,\n        region_h*region_size[0], region_w*region_size[1])\n    return x\n\n\ndef regional_routing_attention_torch(\n    query:Tensor, key:Tensor, value:Tensor, scale:float,\n    region_graph:LongTensor, region_size:Tuple[int],\n    kv_region_size:Optional[Tuple[int]]=None,\n    auto_pad=True)->Tensor:\n    \"\"\"\n    Args:\n        query, key, value: (B, C, H, W) tensor\n        scale: the scale/temperature for dot product attention\n        region_graph: (B, nhead, h_q*w_q, topk) tensor, topk <= h_k*w_k\n        region_size: region/window size for queries, (rh, rw)\n        key_region_size: optional, if None, key_region_size=region_size\n     
   auto_pad: required to be true if the input sizes are not divisible by the region_size\n    Return:\n        output: (B, C, H, W) tensor\n        attn: (bs, nhead, q_nregion, reg_size, topk*kv_region_size) attention matrix\n    \"\"\"\n    kv_region_size = kv_region_size or region_size\n    bs, nhead, q_nregion, topk = region_graph.size()\n    \n    # Auto pad to deal with any input size \n    q_pad_b, q_pad_r, kv_pad_b, kv_pad_r = 0, 0, 0, 0\n    if auto_pad:\n        _, _, Hq, Wq = query.size()\n        q_pad_b = (region_size[0] - Hq % region_size[0]) % region_size[0]\n        q_pad_r = (region_size[1] - Wq % region_size[1]) % region_size[1]\n        if (q_pad_b > 0 or q_pad_r > 0):\n            query = F.pad(query, (0, q_pad_r, 0, q_pad_b)) # zero padding\n\n        _, _, Hk, Wk = key.size()\n        kv_pad_b = (kv_region_size[0] - Hk % kv_region_size[0]) % kv_region_size[0]\n        kv_pad_r = (kv_region_size[1] - Wk % kv_region_size[1]) % kv_region_size[1]\n        if (kv_pad_r > 0 or kv_pad_b > 0):\n            key = F.pad(key, (0, kv_pad_r, 0, kv_pad_b)) # zero padding\n            value = F.pad(value, (0, kv_pad_r, 0, kv_pad_b)) # zero padding\n    \n    # to sequence format, i.e. 
(bs, nhead, nregion, reg_size, head_dim)\n    query, q_region_h, q_region_w = _grid2seq(query, region_size=region_size, num_heads=nhead)\n    key, _, _ = _grid2seq(key, region_size=kv_region_size, num_heads=nhead)\n    value, _, _ = _grid2seq(value, region_size=kv_region_size, num_heads=nhead)\n\n    # gather key and values.\n    # TODO: is seperate gathering slower than fused one (our old version) ?\n    # torch.gather does not support broadcasting, hence we do it manually\n    bs, nhead, kv_nregion, kv_region_size, head_dim = key.size()\n    broadcasted_region_graph = region_graph.view(bs, nhead, q_nregion, topk, 1, 1).\\\n        expand(-1, -1, -1, -1, kv_region_size, head_dim)\n    key_g = torch.gather(key.view(bs, nhead, 1, kv_nregion, kv_region_size, head_dim).\\\n        expand(-1, -1, query.size(2), -1, -1, -1), dim=3,\n        index=broadcasted_region_graph) # (bs, nhead, q_nregion, topk, kv_region_size, head_dim)\n    value_g = torch.gather(value.view(bs, nhead, 1, kv_nregion, kv_region_size, head_dim).\\\n        expand(-1, -1, query.size(2), -1, -1, -1), dim=3,\n        index=broadcasted_region_graph) # (bs, nhead, q_nregion, topk, kv_region_size, head_dim)\n    \n    # token-to-token attention\n    # (bs, nhead, q_nregion, reg_size, head_dim) @ (bs, nhead, q_nregion, head_dim, topk*kv_region_size)\n    # -> (bs, nhead, q_nregion, reg_size, topk*kv_region_size)\n    # TODO: mask padding region\n    attn = (query * scale) @ key_g.flatten(-3, -2).transpose(-1, -2)\n    attn = torch.softmax(attn, dim=-1)\n    # (bs, nhead, q_nregion, reg_size, topk*kv_region_size) @ (bs, nhead, q_nregion, topk*kv_region_size, head_dim)\n    # -> (bs, nhead, q_nregion, reg_size, head_dim)\n    output = attn @ value_g.flatten(-3, -2)\n\n    # to BCHW format\n    output = _seq2grid(output, region_h=q_region_h, region_w=q_region_w, region_size=region_size)\n\n    # remove paddings if needed\n    if auto_pad and (q_pad_b > 0 or q_pad_r > 0):\n        output = output[:, :, :Hq, 
:Wq]\n\n    return output, attn\n\nclass BiLevelRoutingAttention_nchw(nn.Module):\n    \"\"\"Bi-Level Routing Attention that takes nchw input\n\n    Compared to the legacy version, this implementation:\n    * removes unused args and components\n    * uses nchw input format to avoid frequent permutation\n\n    When the size of the inputs is not divisible by the region size, the outputs also differ numerically\n    from the legacy implementation, due to:\n    * a different way to pad the input feature map (padding after linear projection)\n    * different pooling behavior (count_include_pad=False)\n\n    The current implementation is more reasonable, hence we do not keep backward numerical compatibility\n    \"\"\"\n    def __init__(self, dim, num_heads=8, n_win=7, qk_scale=None, topk=4,  side_dwconv=3, auto_pad=False, attn_backend='torch'):\n        super().__init__()\n        # local attention setting\n        self.dim = dim\n        self.num_heads = num_heads\n        assert self.dim % num_heads == 0, 'dim must be divisible by num_heads!'\n        self.head_dim = self.dim // self.num_heads\n        self.scale = qk_scale or self.dim ** -0.5 # NOTE: to be consistent with old models.\n\n        ################side_dwconv (i.e. 
LCE in Shunted Transformer)###########\n        self.lepe = nn.Conv2d(dim, dim, kernel_size=side_dwconv, stride=1, padding=side_dwconv//2, groups=dim) if side_dwconv > 0 else \\\n                    lambda x: torch.zeros_like(x)\n        \n        ################ regional routing setting #################\n        self.topk = topk\n        self.n_win = n_win  # number of windows per row/col\n\n        ##########################################\n\n        self.qkv_linear = nn.Conv2d(self.dim, 3*self.dim, kernel_size=1)\n        self.output_linear = nn.Conv2d(self.dim, self.dim, kernel_size=1)\n\n        if attn_backend == 'torch':\n            self.attn_fn = regional_routing_attention_torch\n        else:\n            raise ValueError('CUDA implementation is not available yet. Please stay tuned.')\n\n    def forward(self, x:Tensor, ret_attn_mask=False):\n        \"\"\"\n        Args:\n            x: NCHW tensor, better to be channel_last (https://pytorch.org/tutorials/intermediate/memory_format_tutorial.html)\n        Return:\n            NCHW tensor\n        \"\"\"\n        N, C, H, W = x.size()\n        region_size = (H//self.n_win, W//self.n_win)\n\n        # STEP 1: linear projection\n        qkv = self.qkv_linear.forward(x) # ncHW\n        q, k, v = qkv.chunk(3, dim=1) # ncHW\n       \n        # STEP 2: region-to-region routing\n        # NOTE: ceil_mode=True, count_include_pad=False = auto padding\n        # NOTE: gradients backward through token-to-token attention. 
See Appendix A for the intuition.\n        q_r = F.avg_pool2d(q.detach(), kernel_size=region_size, ceil_mode=True, count_include_pad=False)\n        k_r = F.avg_pool2d(k.detach(), kernel_size=region_size, ceil_mode=True, count_include_pad=False) # nchw\n        q_r:Tensor = q_r.permute(0, 2, 3, 1).flatten(1, 2) # n(hw)c\n        k_r:Tensor = k_r.flatten(2, 3) # nc(hw)\n        a_r = q_r @ k_r # n(hw)(hw), adj matrix of regional graph\n        _, idx_r = torch.topk(a_r, k=self.topk, dim=-1) # n(hw)k long tensor\n        idx_r:LongTensor = idx_r.unsqueeze_(1).expand(-1, self.num_heads, -1, -1) \n\n        # STEP 3: token to token attention (non-parametric function)\n        output, attn_mat = self.attn_fn(query=q, key=k, value=v, scale=self.scale,\n                                        region_graph=idx_r, region_size=region_size\n                                       )\n        \n        output = output + self.lepe(v) # ncHW\n        output = self.output_linear(output) # ncHW\n\n        if ret_attn_mask:\n            return output, attn_mat\n\n        return output"
  },
  {
    "path": "cv-attention/CAA.py",
    "content": "import torch.nn as nn\n\ndef autopad(k, p=None, d=1):  # kernel, padding, dilation\n    \"\"\"Pad to 'same' shape outputs.\"\"\"\n    if d > 1:\n        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size\n    if p is None:\n        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad\n    return p\n\n\nclass Conv(nn.Module):\n    \"\"\"Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation).\"\"\"\n\n    default_act = nn.SiLU()  # default activation\n\n    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):\n        \"\"\"Initialize Conv layer with given arguments including activation.\"\"\"\n        super().__init__()\n        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)\n        self.bn = nn.BatchNorm2d(c2)\n        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()\n\n    def forward(self, x):\n        \"\"\"Apply convolution, batch normalization and activation to input tensor.\"\"\"\n        return self.act(self.bn(self.conv(x)))\n\n    def forward_fuse(self, x):\n        \"\"\"Perform transposed convolution of 2D data.\"\"\"\n        return self.act(self.conv(x))\n\nclass CAA(nn.Module):\n    def __init__(self, ch, h_kernel_size = 11, v_kernel_size = 11) -> None:\n        super().__init__()\n        \n        self.avg_pool = nn.AvgPool2d(7, 1, 3)\n        self.conv1 = Conv(ch, ch)\n        self.h_conv = nn.Conv2d(ch, ch, (1, h_kernel_size), 1, (0, h_kernel_size // 2), 1, ch)\n        self.v_conv = nn.Conv2d(ch, ch, (v_kernel_size, 1), 1, (v_kernel_size // 2, 0), 1, ch)\n        self.conv2 = Conv(ch, ch)\n        self.act = nn.Sigmoid()\n    \n    def forward(self, x):\n        attn_factor = self.act(self.conv2(self.v_conv(self.h_conv(self.conv1(self.avg_pool(x))))))\n        return attn_factor * x"
  },
  {
    "path": "cv-attention/CBAM.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\n\n\nclass ChannelAttention(nn.Module):\n    def __init__(self, channel, reduction=16):\n        super().__init__()\n        self.maxpool = nn.AdaptiveMaxPool2d(1)\n        self.avgpool = nn.AdaptiveAvgPool2d(1)\n        self.se = nn.Sequential(\n            nn.Conv2d(channel, channel // reduction, 1, bias=False),\n            nn.ReLU(),\n            nn.Conv2d(channel // reduction, channel, 1, bias=False)\n        )\n        self.sigmoid = nn.Sigmoid()\n\n    def forward(self, x):\n        max_result = self.maxpool(x)\n        avg_result = self.avgpool(x)\n        max_out = self.se(max_result)\n        avg_out = self.se(avg_result)\n        output = self.sigmoid(max_out + avg_out)\n        return output\n\n\nclass SpatialAttention(nn.Module):\n    def __init__(self, kernel_size=7):\n        super().__init__()\n        self.conv = nn.Conv2d(2, 1, kernel_size=kernel_size, padding=kernel_size // 2)\n        self.sigmoid = nn.Sigmoid()\n\n    def forward(self, x):\n        max_result, _ = torch.max(x, dim=1, keepdim=True)\n        avg_result = torch.mean(x, dim=1, keepdim=True)\n        result = torch.cat([max_result, avg_result], 1)\n        output = self.conv(result)\n        output = self.sigmoid(output)\n        return output\n\n\nclass CBAMBlock(nn.Module):\n\n    def __init__(self, channel=512, reduction=16, kernel_size=7):\n        super().__init__()\n        self.ca = ChannelAttention(channel=channel, reduction=reduction)\n        self.sa = SpatialAttention(kernel_size=kernel_size)\n\n    def init_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                init.kaiming_normal_(m.weight, mode='fan_out')\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n            elif isinstance(m, nn.BatchNorm2d):\n                init.constant_(m.weight, 1)\n                
init.constant_(m.bias, 0)\n            elif isinstance(m, nn.Linear):\n                init.normal_(m.weight, std=0.001)\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        b, c, _, _ = x.size()\n        out = x * self.ca(x)\n        out = out * self.sa(out)\n        return out\n\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    kernel_size = input.shape[2]\n    cbam = CBAMBlock(channel=512, reduction=16, kernel_size=kernel_size)\n    output = cbam(input)\n    print(output.shape)\n"
  },
  {
    "path": "cv-attention/CPCA.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nclass CPCA_ChannelAttention(nn.Module):\n\n    def __init__(self, input_channels, internal_neurons):\n        super(CPCA_ChannelAttention, self).__init__()\n        self.fc1 = nn.Conv2d(in_channels=input_channels, out_channels=internal_neurons, kernel_size=1, stride=1, bias=True)\n        self.fc2 = nn.Conv2d(in_channels=internal_neurons, out_channels=input_channels, kernel_size=1, stride=1, bias=True)\n        self.input_channels = input_channels\n\n    def forward(self, inputs):\n        x1 = F.adaptive_avg_pool2d(inputs, output_size=(1, 1))\n        x1 = self.fc1(x1)\n        x1 = F.relu(x1, inplace=True)\n        x1 = self.fc2(x1)\n        x1 = torch.sigmoid(x1)\n        x2 = F.adaptive_max_pool2d(inputs, output_size=(1, 1))\n        x2 = self.fc1(x2)\n        x2 = F.relu(x2, inplace=True)\n        x2 = self.fc2(x2)\n        x2 = torch.sigmoid(x2)\n        x = x1 + x2\n        x = x.view(-1, self.input_channels, 1, 1)\n        return inputs * x\n\nclass CPCA(nn.Module):\n    def __init__(self, channels, channelAttention_reduce=4):\n        super().__init__()\n\n        self.ca = CPCA_ChannelAttention(input_channels=channels, internal_neurons=channels // channelAttention_reduce)\n        self.dconv5_5 = nn.Conv2d(channels,channels,kernel_size=5,padding=2,groups=channels)\n        self.dconv1_7 = nn.Conv2d(channels,channels,kernel_size=(1,7),padding=(0,3),groups=channels)\n        self.dconv7_1 = nn.Conv2d(channels,channels,kernel_size=(7,1),padding=(3,0),groups=channels)\n        self.dconv1_11 = nn.Conv2d(channels,channels,kernel_size=(1,11),padding=(0,5),groups=channels)\n        self.dconv11_1 = nn.Conv2d(channels,channels,kernel_size=(11,1),padding=(5,0),groups=channels)\n        self.dconv1_21 = nn.Conv2d(channels,channels,kernel_size=(1,21),padding=(0,10),groups=channels)\n        self.dconv21_1 = 
nn.Conv2d(channels,channels,kernel_size=(21,1),padding=(10,0),groups=channels)\n        self.conv = nn.Conv2d(channels,channels,kernel_size=(1,1),padding=0)\n        self.act = nn.GELU()\n\n    def forward(self, inputs):\n        #   Global Perceptron\n        inputs = self.conv(inputs)\n        inputs = self.act(inputs)\n        \n        inputs = self.ca(inputs)\n\n        x_init = self.dconv5_5(inputs)\n        x_1 = self.dconv1_7(x_init)\n        x_1 = self.dconv7_1(x_1)\n        x_2 = self.dconv1_11(x_init)\n        x_2 = self.dconv11_1(x_2)\n        x_3 = self.dconv1_21(x_init)\n        x_3 = self.dconv21_1(x_3)\n        x = x_1 + x_2 + x_3 + x_init\n        spatial_att = self.conv(x)\n        out = spatial_att * inputs\n        out = self.conv(out)\n        return out"
  },
  {
    "path": "cv-attention/CloAttention.py",
    "content": "import torch\nimport torch.nn as nn\nfrom efficientnet_pytorch.model import MemoryEfficientSwish\n\nclass AttnMap(nn.Module):\n    def __init__(self, dim):\n        super().__init__()\n        self.act_block = nn.Sequential(\n                            nn.Conv2d(dim, dim, 1, 1, 0),\n                            MemoryEfficientSwish(),\n                            nn.Conv2d(dim, dim, 1, 1, 0)\n                         )\n    def forward(self, x):\n        return self.act_block(x)\n\nclass EfficientAttention(nn.Module):\n    def __init__(self, dim, num_heads=8, group_split=[4, 4], kernel_sizes=[5], window_size=4, \n                 attn_drop=0., proj_drop=0., qkv_bias=True):\n        super().__init__()\n        assert sum(group_split) == num_heads\n        assert len(kernel_sizes) + 1 == len(group_split)\n        self.dim = dim\n        self.num_heads = num_heads\n        self.dim_head = dim // num_heads\n        self.scalor = self.dim_head ** -0.5\n        self.kernel_sizes = kernel_sizes\n        self.window_size = window_size\n        self.group_split = group_split\n        convs = []\n        act_blocks = []\n        qkvs = []\n        #projs = []\n        for i in range(len(kernel_sizes)):\n            kernel_size = kernel_sizes[i]\n            group_head = group_split[i]\n            if group_head == 0:\n                continue\n            convs.append(nn.Conv2d(3*self.dim_head*group_head, 3*self.dim_head*group_head, kernel_size,\n                         1, kernel_size//2, groups=3*self.dim_head*group_head))\n            act_blocks.append(AttnMap(self.dim_head*group_head))\n            qkvs.append(nn.Conv2d(dim, 3*group_head*self.dim_head, 1, 1, 0, bias=qkv_bias))\n            #projs.append(nn.Linear(group_head*self.dim_head, group_head*self.dim_head, bias=qkv_bias))\n        if group_split[-1] != 0:\n            self.global_q = nn.Conv2d(dim, group_split[-1]*self.dim_head, 1, 1, 0, bias=qkv_bias)\n            self.global_kv = nn.Conv2d(dim, 
group_split[-1]*self.dim_head*2, 1, 1, 0, bias=qkv_bias)\n            #self.global_proj = nn.Linear(group_split[-1]*self.dim_head, group_split[-1]*self.dim_head, bias=qkv_bias)\n            self.avgpool = nn.AvgPool2d(window_size, window_size) if window_size!=1 else nn.Identity()\n\n        self.convs = nn.ModuleList(convs)\n        self.act_blocks = nn.ModuleList(act_blocks)\n        self.qkvs = nn.ModuleList(qkvs)\n        self.proj = nn.Conv2d(dim, dim, 1, 1, 0, bias=qkv_bias)\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n    def high_fre_attntion(self, x: torch.Tensor, to_qkv: nn.Module, mixer: nn.Module, attn_block: nn.Module):\n        '''\n        x: (b c h w)\n        '''\n        b, c, h, w = x.size()\n        qkv = to_qkv(x) #(b (3 m d) h w)\n        qkv = mixer(qkv).reshape(b, 3, -1, h, w).transpose(0, 1).contiguous() #(3 b (m d) h w)\n        q, k, v = qkv #(b (m d) h w)\n        attn = attn_block(q.mul(k)).mul(self.scalor)\n        attn = self.attn_drop(torch.tanh(attn))\n        res = attn.mul(v) #(b (m d) h w)\n        return res\n        \n    def low_fre_attention(self, x : torch.Tensor, to_q: nn.Module, to_kv: nn.Module, avgpool: nn.Module):\n        '''\n        x: (b c h w)\n        '''\n        b, c, h, w = x.size()\n        \n        q = to_q(x).reshape(b, -1, self.dim_head, h*w).transpose(-1, -2).contiguous() #(b m (h w) d)\n        kv = avgpool(x) #(b c h w)\n        kv = to_kv(kv).view(b, 2, -1, self.dim_head, (h*w)//(self.window_size**2)).permute(1, 0, 2, 4, 3).contiguous() #(2 b m (H W) d)\n        k, v = kv #(b m (H W) d)\n        attn = self.scalor * q @ k.transpose(-1, -2) #(b m (h w) (H W))\n        attn = self.attn_drop(attn.softmax(dim=-1))\n        res = attn @ v #(b m (h w) d)\n        res = res.transpose(2, 3).reshape(b, -1, h, w).contiguous()\n        return res\n\n    def forward(self, x: torch.Tensor):\n        '''\n        x: (b c h w)\n        '''\n        res = []\n       
 for i in range(len(self.kernel_sizes)):\n            if self.group_split[i] == 0:\n                continue\n            res.append(self.high_fre_attntion(x, self.qkvs[i], self.convs[i], self.act_blocks[i]))\n        if self.group_split[-1] != 0:\n            res.append(self.low_fre_attention(x, self.global_q, self.global_kv, self.avgpool))\n        return self.proj_drop(self.proj(torch.cat(res, dim=1)))"
  },
  {
    "path": "cv-attention/CoTAttention.py",
    "content": "import numpy as np\nimport torch\nfrom torch import flatten, nn\nfrom torch.nn import init\nfrom torch.nn.modules.activation import ReLU\nfrom torch.nn.modules.batchnorm import BatchNorm2d\nfrom torch.nn import functional as F\n\n\nclass CoTAttention(nn.Module):\n\n    def __init__(self, dim=512, kernel_size=3):\n        super().__init__()\n        self.dim = dim\n        self.kernel_size = kernel_size\n\n        self.key_embed = nn.Sequential(\n            nn.Conv2d(dim, dim, kernel_size=kernel_size, padding=kernel_size // 2, groups=4, bias=False),\n            nn.BatchNorm2d(dim),\n            nn.ReLU()\n        )\n        self.value_embed = nn.Sequential(\n            nn.Conv2d(dim, dim, 1, bias=False),\n            nn.BatchNorm2d(dim)\n        )\n\n        factor = 4\n        self.attention_embed = nn.Sequential(\n            nn.Conv2d(2 * dim, 2 * dim // factor, 1, bias=False),\n            nn.BatchNorm2d(2 * dim // factor),\n            nn.ReLU(),\n            nn.Conv2d(2 * dim // factor, kernel_size * kernel_size * dim, 1)\n        )\n\n    def forward(self, x):\n        bs, c, h, w = x.shape\n        k1 = self.key_embed(x)  # bs,c,h,w\n        v = self.value_embed(x).view(bs, c, -1)  # bs,c,h,w\n\n        y = torch.cat([k1, x], dim=1)  # bs,2c,h,w\n        att = self.attention_embed(y)  # bs,c*k*k,h,w\n        att = att.reshape(bs, c, self.kernel_size * self.kernel_size, h, w)\n        att = att.mean(2, keepdim=False).view(bs, c, -1)  # bs,c,h*w\n        k2 = F.softmax(att, dim=-1) * v\n        k2 = k2.view(bs, c, h, w)\n\n        return k1 + k2\n\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    cot = CoTAttention(dim=512, kernel_size=3)\n    output = cot(input)\n    print(output.shape)\n"
  },
  {
    "path": "cv-attention/CoordAttention.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass h_sigmoid(nn.Module):\n    def __init__(self, inplace=True):\n        super(h_sigmoid, self).__init__()\n        self.relu = nn.ReLU6(inplace=inplace)\n\n    def forward(self, x):\n        return self.relu(x + 3) / 6\n\n\nclass h_swish(nn.Module):\n    def __init__(self, inplace=True):\n        super(h_swish, self).__init__()\n        self.sigmoid = h_sigmoid(inplace=inplace)\n\n    def forward(self, x):\n        return x * self.sigmoid(x)\n\n\nclass CoordAtt(nn.Module):\n    def __init__(self, inp, reduction=32):\n        super(CoordAtt, self).__init__()\n        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))\n        self.pool_w = nn.AdaptiveAvgPool2d((1, None))\n\n        mip = max(8, inp // reduction)\n\n        self.conv1 = nn.Conv2d(inp, mip, kernel_size=1, stride=1, padding=0)\n        self.bn1 = nn.BatchNorm2d(mip)\n        self.act = h_swish()\n\n        self.conv_h = nn.Conv2d(mip, inp, kernel_size=1, stride=1, padding=0)\n        self.conv_w = nn.Conv2d(mip, inp, kernel_size=1, stride=1, padding=0)\n\n    def forward(self, x):\n        identity = x\n\n        n, c, h, w = x.size()\n        x_h = self.pool_h(x)\n        x_w = self.pool_w(x).permute(0, 1, 3, 2)\n\n        y = torch.cat([x_h, x_w], dim=2)\n        y = self.conv1(y)\n        y = self.bn1(y)\n        y = self.act(y)\n\n        x_h, x_w = torch.split(y, [h, w], dim=2)\n        x_w = x_w.permute(0, 1, 3, 2)\n\n        a_h = self.conv_h(x_h).sigmoid()\n        a_w = self.conv_w(x_w).sigmoid()\n\n        out = identity * a_w * a_h\n\n        return out\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    pna = CoordAtt(inp=512)\n    output = pna(input)\n    print(output.shape)"
  },
  {
    "path": "cv-attention/DAttention.py",
    "content": "import torch, einops\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\nfrom timm.models.layers import trunc_normal_\n\nclass LayerNormProxy(nn.Module):\n    def __init__(self, dim):\n        super().__init__()\n        self.norm = nn.LayerNorm(dim)\n\n    def forward(self, x):\n        x = einops.rearrange(x, 'b c h w -> b h w c')\n        x = self.norm(x)\n        return einops.rearrange(x, 'b h w c -> b c h w')\n\nclass DAttention(nn.Module):\n    # Vision Transformer with Deformable Attention CVPR2022\n    # fixed_pe=True need adujust 640x640\n    def __init__(\n        self, channel, q_size, n_heads=8, n_groups=4,\n        attn_drop=0.0, proj_drop=0.0, stride=1, \n        offset_range_factor=4, use_pe=True, dwc_pe=True,\n        no_off=False, fixed_pe=False, ksize=3, log_cpb=False, kv_size=None\n    ):\n        super().__init__()\n        n_head_channels = channel // n_heads\n        self.dwc_pe = dwc_pe\n        self.n_head_channels = n_head_channels\n        self.scale = self.n_head_channels ** -0.5\n        self.n_heads = n_heads\n        self.q_h, self.q_w = q_size\n        # self.kv_h, self.kv_w = kv_size\n        self.kv_h, self.kv_w = self.q_h // stride, self.q_w // stride\n        self.nc = n_head_channels * n_heads\n        self.n_groups = n_groups\n        self.n_group_channels = self.nc // self.n_groups\n        self.n_group_heads = self.n_heads // self.n_groups\n        self.use_pe = use_pe\n        self.fixed_pe = fixed_pe\n        self.no_off = no_off\n        self.offset_range_factor = offset_range_factor\n        self.ksize = ksize\n        self.log_cpb = log_cpb\n        self.stride = stride\n        kk = self.ksize\n        pad_size = kk // 2 if kk != stride else 0\n\n        self.conv_offset = nn.Sequential(\n            nn.Conv2d(self.n_group_channels, self.n_group_channels, kk, stride, pad_size, groups=self.n_group_channels),\n            LayerNormProxy(self.n_group_channels),\n            
nn.GELU(),\n            nn.Conv2d(self.n_group_channels, 2, 1, 1, 0, bias=False)\n        )\n        if self.no_off:\n            for m in self.conv_offset.parameters():\n                m.requires_grad_(False)\n\n        self.proj_q = nn.Conv2d(\n            self.nc, self.nc,\n            kernel_size=1, stride=1, padding=0\n        )\n\n        self.proj_k = nn.Conv2d(\n            self.nc, self.nc,\n            kernel_size=1, stride=1, padding=0\n        )\n\n        self.proj_v = nn.Conv2d(\n            self.nc, self.nc,\n            kernel_size=1, stride=1, padding=0\n        )\n\n        self.proj_out = nn.Conv2d(\n            self.nc, self.nc,\n            kernel_size=1, stride=1, padding=0\n        )\n\n        self.proj_drop = nn.Dropout(proj_drop, inplace=True)\n        self.attn_drop = nn.Dropout(attn_drop, inplace=True)\n\n        if self.use_pe and not self.no_off:\n            if self.dwc_pe:\n                self.rpe_table = nn.Conv2d(\n                    self.nc, self.nc, kernel_size=3, stride=1, padding=1, groups=self.nc)\n            elif self.fixed_pe:\n                self.rpe_table = nn.Parameter(\n                    torch.zeros(self.n_heads, self.q_h * self.q_w, self.kv_h * self.kv_w)\n                )\n                trunc_normal_(self.rpe_table, std=0.01)\n            elif self.log_cpb:\n                # Borrowed from Swin-V2\n                self.rpe_table = nn.Sequential(\n                    nn.Linear(2, 32, bias=True),\n                    nn.ReLU(inplace=True),\n                    nn.Linear(32, self.n_group_heads, bias=False)\n                )\n            else:\n                self.rpe_table = nn.Parameter(\n                    torch.zeros(self.n_heads, self.q_h * 2 - 1, self.q_w * 2 - 1)\n                )\n                trunc_normal_(self.rpe_table, std=0.01)\n        else:\n            self.rpe_table = None\n\n    @torch.no_grad()\n    def _get_ref_points(self, H_key, W_key, B, dtype, device):\n\n        ref_y, ref_x = 
torch.meshgrid(\n            torch.linspace(0.5, H_key - 0.5, H_key, dtype=dtype, device=device),\n            torch.linspace(0.5, W_key - 0.5, W_key, dtype=dtype, device=device),\n            indexing='ij'\n        )\n        ref = torch.stack((ref_y, ref_x), -1)\n        ref[..., 1].div_(W_key - 1.0).mul_(2.0).sub_(1.0)\n        ref[..., 0].div_(H_key - 1.0).mul_(2.0).sub_(1.0)\n        ref = ref[None, ...].expand(B * self.n_groups, -1, -1, -1) # B * g H W 2\n\n        return ref\n    \n    @torch.no_grad()\n    def _get_q_grid(self, H, W, B, dtype, device):\n\n        ref_y, ref_x = torch.meshgrid(\n            torch.arange(0, H, dtype=dtype, device=device),\n            torch.arange(0, W, dtype=dtype, device=device),\n            indexing='ij'\n        )\n        ref = torch.stack((ref_y, ref_x), -1)\n        ref[..., 1].div_(W - 1.0).mul_(2.0).sub_(1.0)\n        ref[..., 0].div_(H - 1.0).mul_(2.0).sub_(1.0)\n        ref = ref[None, ...].expand(B * self.n_groups, -1, -1, -1) # B * g H W 2\n\n        return ref\n\n    def forward(self, x):\n\n        B, C, H, W = x.size()\n        dtype, device = x.dtype, x.device\n\n        q = self.proj_q(x)\n        q_off = einops.rearrange(q, 'b (g c) h w -> (b g) c h w', g=self.n_groups, c=self.n_group_channels)\n        offset = self.conv_offset(q_off).contiguous()  # B * g 2 Hg Wg\n        Hk, Wk = offset.size(2), offset.size(3)\n        n_sample = Hk * Wk\n\n        if self.offset_range_factor >= 0 and not self.no_off:\n            offset_range = torch.tensor([1.0 / (Hk - 1.0), 1.0 / (Wk - 1.0)], device=device).reshape(1, 2, 1, 1)\n            offset = offset.tanh().mul(offset_range).mul(self.offset_range_factor)\n\n        offset = einops.rearrange(offset, 'b p h w -> b h w p')\n        reference = self._get_ref_points(Hk, Wk, B, dtype, device)\n\n        if self.no_off:\n            offset = offset.fill_(0.0)\n\n        if self.offset_range_factor >= 0:\n            pos = offset + reference\n        else:\n            
pos = (offset + reference).clamp(-1., +1.)\n\n        if self.no_off:\n            x_sampled = F.avg_pool2d(x, kernel_size=self.stride, stride=self.stride)\n            assert x_sampled.size(2) == Hk and x_sampled.size(3) == Wk, f\"Size is {x_sampled.size()}\"\n        else:\n            pos = pos.type(x.dtype)\n            x_sampled = F.grid_sample(\n                input=x.reshape(B * self.n_groups, self.n_group_channels, H, W), \n                grid=pos[..., (1, 0)], # y, x -> x, y\n                mode='bilinear', align_corners=True) # B * g, Cg, Hg, Wg\n                \n\n        x_sampled = x_sampled.reshape(B, C, 1, n_sample)\n\n        q = q.reshape(B * self.n_heads, self.n_head_channels, H * W)\n        k = self.proj_k(x_sampled).reshape(B * self.n_heads, self.n_head_channels, n_sample)\n        v = self.proj_v(x_sampled).reshape(B * self.n_heads, self.n_head_channels, n_sample)\n\n        attn = torch.einsum('b c m, b c n -> b m n', q, k) # B * h, HW, Ns\n        attn = attn.mul(self.scale)\n\n        if self.use_pe and (not self.no_off):\n\n            if self.dwc_pe:\n                residual_lepe = self.rpe_table(q.reshape(B, C, H, W)).reshape(B * self.n_heads, self.n_head_channels, H * W)\n            elif self.fixed_pe:\n                rpe_table = self.rpe_table\n                attn_bias = rpe_table[None, ...].expand(B, -1, -1, -1)\n                attn = attn + attn_bias.reshape(B * self.n_heads, H * W, n_sample)\n            elif self.log_cpb:\n                q_grid = self._get_q_grid(H, W, B, dtype, device)\n                displacement = (q_grid.reshape(B * self.n_groups, H * W, 2).unsqueeze(2) - pos.reshape(B * self.n_groups, n_sample, 2).unsqueeze(1)).mul(4.0) # d_y, d_x [-8, +8]\n                displacement = torch.sign(displacement) * torch.log2(torch.abs(displacement) + 1.0) / np.log2(8.0)\n                attn_bias = self.rpe_table(displacement) # B * g, H * W, n_sample, h_g\n                attn = attn + einops.rearrange(attn_bias, 
'b m n h -> (b h) m n', h=self.n_group_heads)\n            else:\n                rpe_table = self.rpe_table\n                rpe_bias = rpe_table[None, ...].expand(B, -1, -1, -1)\n                q_grid = self._get_q_grid(H, W, B, dtype, device)\n                displacement = (q_grid.reshape(B * self.n_groups, H * W, 2).unsqueeze(2) - pos.reshape(B * self.n_groups, n_sample, 2).unsqueeze(1)).mul(0.5)\n                attn_bias = F.grid_sample(\n                    input=einops.rearrange(rpe_bias, 'b (g c) h w -> (b g) c h w', c=self.n_group_heads, g=self.n_groups),\n                    grid=displacement[..., (1, 0)],\n                    mode='bilinear', align_corners=True) # B * g, h_g, HW, Ns\n\n                attn_bias = attn_bias.reshape(B * self.n_heads, H * W, n_sample)\n                attn = attn + attn_bias\n\n        attn = F.softmax(attn, dim=2)\n        attn = self.attn_drop(attn)\n\n        out = torch.einsum('b m n, b c n -> b c m', attn, v)\n\n        # residual_lepe is only computed when offsets are enabled (no_off=False)\n        if self.use_pe and (not self.no_off) and self.dwc_pe:\n            out = out + residual_lepe\n        out = out.reshape(B, C, H, W)\n\n        y = self.proj_drop(self.proj_out(out))\n\n        return y"
  },
  {
    "path": "cv-attention/ECA.py",
    "content": "import torch, math\nfrom torch import nn\n\nclass EfficientChannelAttention(nn.Module):           # Efficient Channel Attention module\n    def __init__(self, c, b=1, gamma=2):\n        super(EfficientChannelAttention, self).__init__()\n        t = int(abs((math.log(c, 2) + b) / gamma))\n        k = t if t % 2 else t + 1\n\n        self.avg_pool = nn.AdaptiveAvgPool2d(1)\n        self.conv1 = nn.Conv1d(1, 1, kernel_size=k, padding=int(k/2), bias=False)\n        self.sigmoid = nn.Sigmoid()\n\n    def forward(self, x):\n        out = self.avg_pool(x)\n        out = self.conv1(out.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)\n        out = self.sigmoid(out)\n        return out * x\n\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    eca = EfficientChannelAttention(c=512)\n    output = eca(input)\n    print(output.shape)"
  },
  {
    "path": "cv-attention/ELA.py",
    "content": "import torch.nn as nn\n\nclass ELA(nn.Module):\n    def __init__(self, channels) -> None:\n        super().__init__()\n        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))\n        self.pool_w = nn.AdaptiveAvgPool2d((1, None))\n        self.conv1x1 = nn.Sequential(\n            nn.Conv1d(channels, channels, 1),\n            nn.GroupNorm(16, channels),\n            nn.Sigmoid()\n        )\n    \n    def forward(self, x):\n        b, c, h, w = x.size()\n        x_h = self.conv1x1(self.pool_h(x).reshape((b, c, h))).reshape((b, c, h, 1))\n        x_w = self.conv1x1(self.pool_w(x).reshape((b, c, w))).reshape((b, c, 1, w))\n        return x * x_h * x_w"
  },
  {
    "path": "cv-attention/EMA.py",
    "content": "import torch\nfrom torch import nn\n\nclass EMA(nn.Module):\n    def __init__(self, channels, factor=8):\n        super(EMA, self).__init__()\n        self.groups = factor\n        assert channels // self.groups > 0\n        self.softmax = nn.Softmax(-1)\n        self.agp = nn.AdaptiveAvgPool2d((1, 1))\n        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))\n        self.pool_w = nn.AdaptiveAvgPool2d((1, None))\n        self.gn = nn.GroupNorm(channels // self.groups, channels // self.groups)\n        self.conv1x1 = nn.Conv2d(channels // self.groups, channels // self.groups, kernel_size=1, stride=1, padding=0)\n        self.conv3x3 = nn.Conv2d(channels // self.groups, channels // self.groups, kernel_size=3, stride=1, padding=1)\n\n    def forward(self, x):\n        b, c, h, w = x.size()\n        group_x = x.reshape(b * self.groups, -1, h, w)  # b*g,c//g,h,w\n        x_h = self.pool_h(group_x)\n        x_w = self.pool_w(group_x).permute(0, 1, 3, 2)\n        hw = self.conv1x1(torch.cat([x_h, x_w], dim=2))\n        x_h, x_w = torch.split(hw, [h, w], dim=2)\n        x1 = self.gn(group_x * x_h.sigmoid() * x_w.permute(0, 1, 3, 2).sigmoid())\n        x2 = self.conv3x3(group_x)\n        x11 = self.softmax(self.agp(x1).reshape(b * self.groups, -1, 1).permute(0, 2, 1))\n        x12 = x2.reshape(b * self.groups, c // self.groups, -1)  # b*g, c//g, hw\n        x21 = self.softmax(self.agp(x2).reshape(b * self.groups, -1, 1).permute(0, 2, 1))\n        x22 = x1.reshape(b * self.groups, c // self.groups, -1)  # b*g, c//g, hw\n        weights = (torch.matmul(x11, x12) + torch.matmul(x21, x22)).reshape(b * self.groups, 1, h, w)\n        return (group_x * weights.sigmoid()).reshape(b, c, h, w)"
  },
  {
    "path": "cv-attention/EffectiveSE.py",
    "content": "import torch\nfrom torch import nn as nn\nfrom timm.models.layers.create_act import create_act_layer\n\n\nclass EffectiveSEModule(nn.Module):\n    def __init__(self, channels, add_maxpool=False, gate_layer='hard_sigmoid'):\n        super(EffectiveSEModule, self).__init__()\n        self.add_maxpool = add_maxpool\n        self.fc = nn.Conv2d(channels, channels, kernel_size=1, padding=0)\n        self.gate = create_act_layer(gate_layer)\n\n    def forward(self, x):\n        x_se = x.mean((2, 3), keepdim=True)\n        if self.add_maxpool:\n            # experimental codepath, may remove or change\n            x_se = 0.5 * x_se + 0.5 * x.amax((2, 3), keepdim=True)\n        x_se = self.fc(x_se)\n        return x * self.gate(x_se)\n\nif __name__ == '__main__':\n    input=torch.randn(50,512,7,7)\n    Ese = EffectiveSEModule(512)\n    output=Ese(input)\n    print(output.shape)"
  },
  {
    "path": "cv-attention/GAM.py",
    "content": "import torch.nn as nn\nimport torch\n \nclass GAM_Attention(nn.Module):\n    def __init__(self, in_channels, rate=4):\n        super(GAM_Attention, self).__init__()\n \n        self.channel_attention = nn.Sequential(\n            nn.Linear(in_channels, int(in_channels / rate)),\n            nn.ReLU(inplace=True),\n            nn.Linear(int(in_channels / rate), in_channels)\n        )\n \n        self.spatial_attention = nn.Sequential(\n            nn.Conv2d(in_channels, int(in_channels / rate), kernel_size=7, padding=3),\n            nn.BatchNorm2d(int(in_channels / rate)),\n            nn.ReLU(inplace=True),\n            nn.Conv2d(int(in_channels / rate), in_channels, kernel_size=7, padding=3),\n            nn.BatchNorm2d(in_channels)\n        )\n \n    def forward(self, x):\n        b, c, h, w = x.shape\n        x_permute = x.permute(0, 2, 3, 1).view(b, -1, c)\n        x_att_permute = self.channel_attention(x_permute).view(b, h, w, c)\n        x_channel_att = x_att_permute.permute(0, 3, 1, 2).sigmoid()\n \n        x = x * x_channel_att\n \n        x_spatial_att = self.spatial_attention(x).sigmoid()\n        out = x * x_spatial_att\n \n        return out\n \nif __name__ == '__main__':\n    x = torch.randn(1, 64, 20, 20)\n    b, c, h, w = x.shape\n    net = GAM_Attention(in_channels=c)\n    y = net(x)\n    print(y.size())"
  },
  {
    "path": "cv-attention/GC.py",
    "content": "import torch\nfrom torch import nn as nn\nimport torch.nn.functional as F\nfrom timm.models.layers.create_act import create_act_layer, get_act_layer\nfrom timm.models.layers import make_divisible\nfrom timm.models.layers.mlp import ConvMlp\nfrom timm.models.layers.norm import LayerNorm2d\n\n\nclass GlobalContext(nn.Module):\n\n    def __init__(self, channels, use_attn=True, fuse_add=False, fuse_scale=True, init_last_zero=False,\n                 rd_ratio=1./8, rd_channels=None, rd_divisor=1, act_layer=nn.ReLU, gate_layer='sigmoid'):\n        super(GlobalContext, self).__init__()\n        act_layer = get_act_layer(act_layer)\n\n        self.conv_attn = nn.Conv2d(channels, 1, kernel_size=1, bias=True) if use_attn else None\n\n        if rd_channels is None:\n            rd_channels = make_divisible(channels * rd_ratio, rd_divisor, round_limit=0.)\n        if fuse_add:\n            self.mlp_add = ConvMlp(channels, rd_channels, act_layer=act_layer, norm_layer=LayerNorm2d)\n        else:\n            self.mlp_add = None\n        if fuse_scale:\n            self.mlp_scale = ConvMlp(channels, rd_channels, act_layer=act_layer, norm_layer=LayerNorm2d)\n        else:\n            self.mlp_scale = None\n\n        self.gate = create_act_layer(gate_layer)\n        self.init_last_zero = init_last_zero\n        self.reset_parameters()\n\n    def reset_parameters(self):\n        if self.conv_attn is not None:\n            nn.init.kaiming_normal_(self.conv_attn.weight, mode='fan_in', nonlinearity='relu')\n        if self.mlp_add is not None:\n            nn.init.zeros_(self.mlp_add.fc2.weight)\n\n    def forward(self, x):\n        B, C, H, W = x.shape\n\n        if self.conv_attn is not None:\n            attn = self.conv_attn(x).reshape(B, 1, H * W)  # (B, 1, H * W)\n            attn = F.softmax(attn, dim=-1).unsqueeze(3)  # (B, 1, H * W, 1)\n            context = x.reshape(B, C, H * W).unsqueeze(1) @ attn\n            context = context.view(B, C, 1, 1)\n        
else:\n            context = x.mean(dim=(2, 3), keepdim=True)\n\n        if self.mlp_scale is not None:\n            mlp_x = self.mlp_scale(context)\n            x = x * self.gate(mlp_x)\n        if self.mlp_add is not None:\n            mlp_x = self.mlp_add(context)\n            x = x + mlp_x\n\n        return x\n\nif __name__ == '__main__':\n    input=torch.randn(50,512,7,7)\n    gc = GlobalContext(512)\n    output=gc(input)\n    print(output.shape)"
  },
  {
    "path": "cv-attention/GE.py",
    "content": "import math, torch\nfrom torch import nn as nn\nimport torch.nn.functional as F\n\nfrom timm.models.layers.create_act import create_act_layer, get_act_layer\nfrom timm.models.layers.create_conv2d import create_conv2d\nfrom timm.models.layers import make_divisible\nfrom timm.models.layers.mlp import ConvMlp\n\n\nclass GatherExcite(nn.Module):\n    def __init__(\n            self, channels, feat_size=None, extra_params=False, extent=0, use_mlp=True,\n            rd_ratio=1./16, rd_channels=None,  rd_divisor=1, add_maxpool=False,\n            act_layer=nn.ReLU, norm_layer=nn.BatchNorm2d, gate_layer='sigmoid'):\n        super(GatherExcite, self).__init__()\n        self.add_maxpool = add_maxpool\n        act_layer = get_act_layer(act_layer)\n        self.extent = extent\n        if extra_params:\n            self.gather = nn.Sequential()\n            if extent == 0:\n                assert feat_size is not None, 'spatial feature size must be specified for global extent w/ params'\n                self.gather.add_module(\n                    'conv1', create_conv2d(channels, channels, kernel_size=feat_size, stride=1, depthwise=True))\n                if norm_layer:\n                    self.gather.add_module(f'norm1', nn.BatchNorm2d(channels))\n            else:\n                assert extent % 2 == 0\n                num_conv = int(math.log2(extent))\n                for i in range(num_conv):\n                    self.gather.add_module(\n                        f'conv{i + 1}',\n                        create_conv2d(channels, channels, kernel_size=3, stride=2, depthwise=True))\n                    if norm_layer:\n                        self.gather.add_module(f'norm{i + 1}', nn.BatchNorm2d(channels))\n                    if i != num_conv - 1:\n                        self.gather.add_module(f'act{i + 1}', act_layer(inplace=True))\n        else:\n            self.gather = None\n            if self.extent == 0:\n                self.gk = 0\n                
self.gs = 0\n            else:\n                assert extent % 2 == 0\n                self.gk = self.extent * 2 - 1\n                self.gs = self.extent\n\n        if not rd_channels:\n            rd_channels = make_divisible(channels * rd_ratio, rd_divisor, round_limit=0.)\n        self.mlp = ConvMlp(channels, rd_channels, act_layer=act_layer) if use_mlp else nn.Identity()\n        self.gate = create_act_layer(gate_layer)\n\n    def forward(self, x):\n        size = x.shape[-2:]\n        if self.gather is not None:\n            x_ge = self.gather(x)\n        else:\n            if self.extent == 0:\n                # global extent\n                x_ge = x.mean(dim=(2, 3), keepdims=True)\n                if self.add_maxpool:\n                    # experimental codepath, may remove or change\n                    x_ge = 0.5 * x_ge + 0.5 * x.amax((2, 3), keepdim=True)\n            else:\n                x_ge = F.avg_pool2d(\n                    x, kernel_size=self.gk, stride=self.gs, padding=self.gk // 2, count_include_pad=False)\n                if self.add_maxpool:\n                    # experimental codepath, may remove or change\n                    x_ge = 0.5 * x_ge + 0.5 * F.max_pool2d(x, kernel_size=self.gk, stride=self.gs, padding=self.gk // 2)\n        x_ge = self.mlp(x_ge)\n        if x_ge.shape[-1] != 1 or x_ge.shape[-2] != 1:\n            x_ge = F.interpolate(x_ge, size=size)\n        return x * self.gate(x_ge)\n\nif __name__ == '__main__':\n    input=torch.randn(50,512,7,7)\n    GE = GatherExcite(512)\n    output=GE(input)\n    print(output.shape)"
  },
  {
    "path": "cv-attention/LSKA.py",
    "content": "import torch.nn as nn\n\nclass LSKA(nn.Module):\n    # Large-Separable-Kernel-Attention\n    # https://github.com/StevenLauHKHK/Large-Separable-Kernel-Attention/tree/main\n    def __init__(self, dim, k_size=7):\n        super().__init__()\n\n        self.k_size = k_size\n\n        if k_size == 7:\n            self.conv0h = nn.Conv2d(dim, dim, kernel_size=(1, 3), stride=(1,1), padding=(0,(3-1)//2), groups=dim)\n            self.conv0v = nn.Conv2d(dim, dim, kernel_size=(3, 1), stride=(1,1), padding=((3-1)//2,0), groups=dim)\n            self.conv_spatial_h = nn.Conv2d(dim, dim, kernel_size=(1, 3), stride=(1,1), padding=(0,2), groups=dim, dilation=2)\n            self.conv_spatial_v = nn.Conv2d(dim, dim, kernel_size=(3, 1), stride=(1,1), padding=(2,0), groups=dim, dilation=2)\n        elif k_size == 11:\n            self.conv0h = nn.Conv2d(dim, dim, kernel_size=(1, 3), stride=(1,1), padding=(0,(3-1)//2), groups=dim)\n            self.conv0v = nn.Conv2d(dim, dim, kernel_size=(3, 1), stride=(1,1), padding=((3-1)//2,0), groups=dim)\n            self.conv_spatial_h = nn.Conv2d(dim, dim, kernel_size=(1, 5), stride=(1,1), padding=(0,4), groups=dim, dilation=2)\n            self.conv_spatial_v = nn.Conv2d(dim, dim, kernel_size=(5, 1), stride=(1,1), padding=(4,0), groups=dim, dilation=2)\n        elif k_size == 23:\n            self.conv0h = nn.Conv2d(dim, dim, kernel_size=(1, 5), stride=(1,1), padding=(0,(5-1)//2), groups=dim)\n            self.conv0v = nn.Conv2d(dim, dim, kernel_size=(5, 1), stride=(1,1), padding=((5-1)//2,0), groups=dim)\n            self.conv_spatial_h = nn.Conv2d(dim, dim, kernel_size=(1, 7), stride=(1,1), padding=(0,9), groups=dim, dilation=3)\n            self.conv_spatial_v = nn.Conv2d(dim, dim, kernel_size=(7, 1), stride=(1,1), padding=(9,0), groups=dim, dilation=3)\n        elif k_size == 35:\n            self.conv0h = nn.Conv2d(dim, dim, kernel_size=(1, 5), stride=(1,1), padding=(0,(5-1)//2), groups=dim)\n            self.conv0v = 
nn.Conv2d(dim, dim, kernel_size=(5, 1), stride=(1,1), padding=((5-1)//2,0), groups=dim)\n            self.conv_spatial_h = nn.Conv2d(dim, dim, kernel_size=(1, 11), stride=(1,1), padding=(0,15), groups=dim, dilation=3)\n            self.conv_spatial_v = nn.Conv2d(dim, dim, kernel_size=(11, 1), stride=(1,1), padding=(15,0), groups=dim, dilation=3)\n        elif k_size == 41:\n            self.conv0h = nn.Conv2d(dim, dim, kernel_size=(1, 5), stride=(1,1), padding=(0,(5-1)//2), groups=dim)\n            self.conv0v = nn.Conv2d(dim, dim, kernel_size=(5, 1), stride=(1,1), padding=((5-1)//2,0), groups=dim)\n            self.conv_spatial_h = nn.Conv2d(dim, dim, kernel_size=(1, 13), stride=(1,1), padding=(0,18), groups=dim, dilation=3)\n            self.conv_spatial_v = nn.Conv2d(dim, dim, kernel_size=(13, 1), stride=(1,1), padding=(18,0), groups=dim, dilation=3)\n        elif k_size == 53:\n            self.conv0h = nn.Conv2d(dim, dim, kernel_size=(1, 5), stride=(1,1), padding=(0,(5-1)//2), groups=dim)\n            self.conv0v = nn.Conv2d(dim, dim, kernel_size=(5, 1), stride=(1,1), padding=((5-1)//2,0), groups=dim)\n            self.conv_spatial_h = nn.Conv2d(dim, dim, kernel_size=(1, 17), stride=(1,1), padding=(0,24), groups=dim, dilation=3)\n            self.conv_spatial_v = nn.Conv2d(dim, dim, kernel_size=(17, 1), stride=(1,1), padding=(24,0), groups=dim, dilation=3)\n        else:\n            # fail fast instead of raising AttributeError later in forward()\n            raise ValueError(f'unsupported k_size: {k_size}')\n\n        self.conv1 = nn.Conv2d(dim, dim, 1)\n\n    def forward(self, x):\n        u = x.clone()\n        attn = self.conv0h(x)\n        attn = self.conv0v(attn)\n        attn = self.conv_spatial_h(attn)\n        attn = self.conv_spatial_v(attn)\n        attn = self.conv1(attn)\n        return u * attn"
  },
  {
    "path": "cv-attention/LSKBlock.py",
    "content": "import torch\nimport torch.nn as nn\n\nclass LSKblock(nn.Module):\n    def __init__(self, dim):\n        super().__init__()\n        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)\n        self.conv_spatial = nn.Conv2d(dim, dim, 7, stride=1, padding=9, groups=dim, dilation=3)\n        self.conv1 = nn.Conv2d(dim, dim//2, 1)\n        self.conv2 = nn.Conv2d(dim, dim//2, 1)\n        self.conv_squeeze = nn.Conv2d(2, 2, 7, padding=3)\n        self.conv = nn.Conv2d(dim//2, dim, 1)\n\n    def forward(self, x):   \n        attn1 = self.conv0(x)\n        attn2 = self.conv_spatial(attn1)\n\n        attn1 = self.conv1(attn1)\n        attn2 = self.conv2(attn2)\n        \n        attn = torch.cat([attn1, attn2], dim=1)\n        avg_attn = torch.mean(attn, dim=1, keepdim=True)\n        max_attn, _ = torch.max(attn, dim=1, keepdim=True)\n        agg = torch.cat([avg_attn, max_attn], dim=1)\n        sig = self.conv_squeeze(agg).sigmoid()\n        attn = attn1 * sig[:,0,:,:].unsqueeze(1) + attn2 * sig[:,1,:,:].unsqueeze(1)\n        attn = self.conv(attn)\n        return x * attn"
  },
  {
    "path": "cv-attention/MHSA.py",
    "content": "import torch\nimport torch.nn as nn\n\nclass MHSA(nn.Module):\n    def __init__(self, n_dims, width=14, height=14, heads=4, pos_emb=False):\n        super(MHSA, self).__init__()\n\n        self.heads = heads\n        self.query = nn.Conv2d(n_dims, n_dims, kernel_size=1)\n        self.key = nn.Conv2d(n_dims, n_dims, kernel_size=1)\n        self.value = nn.Conv2d(n_dims, n_dims, kernel_size=1)\n        self.pos = pos_emb\n        if self.pos:\n            self.rel_h_weight = nn.Parameter(torch.randn([1, heads, (n_dims) // heads, 1, int(height)]),\n                                             requires_grad=True)\n            self.rel_w_weight = nn.Parameter(torch.randn([1, heads, (n_dims) // heads, int(width), 1]),\n                                             requires_grad=True)\n        self.softmax = nn.Softmax(dim=-1)\n\n    def forward(self, x):\n        n_batch, C, width, height = x.size()\n        q = self.query(x).view(n_batch, self.heads, C // self.heads, -1)\n        k = self.key(x).view(n_batch, self.heads, C // self.heads, -1)\n        v = self.value(x).view(n_batch, self.heads, C // self.heads, -1)\n        content_content = torch.matmul(q.permute(0, 1, 3, 2), k)  # 1,C,h*w,h*w\n        c1, c2, c3, c4 = content_content.size()\n        if self.pos:\n            content_position = (self.rel_h_weight + self.rel_w_weight).view(1, self.heads, C // self.heads, -1).permute(\n                0, 1, 3, 2)  # 1,4,1024,64\n\n            content_position = torch.matmul(content_position, q)  # ([1, 4, 1024, 256])\n            content_position = content_position if (\n                    content_content.shape == content_position.shape) else content_position[:, :, :c3, ]\n            assert (content_content.shape == content_position.shape)\n            energy = content_content + content_position\n        else:\n            energy = content_content\n        attention = self.softmax(energy)\n        out = torch.matmul(v, attention.permute(0, 1, 3, 2))  # 
1,4,256,64\n        out = out.view(n_batch, C, width, height)\n        return out\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    mhsa = MHSA(n_dims=512)\n    output = mhsa(input)\n    print(output.shape)"
  },
  {
    "path": "cv-attention/MLCA.py",
    "content": "import math, torch\nfrom torch import nn\nimport torch.nn.functional as F\n\nclass MLCA(nn.Module):\n    def __init__(self, in_size, local_size=5, gamma = 2, b = 1,local_weight=0.5):\n        super(MLCA, self).__init__()\n\n        # ECA 计算方法\n        self.local_size=local_size\n        self.gamma = gamma\n        self.b = b\n        t = int(abs(math.log(in_size, 2) + self.b) / self.gamma)   # eca  gamma=2\n        k = t if t % 2 else t + 1\n\n        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=(k - 1) // 2, bias=False)\n        self.conv_local = nn.Conv1d(1, 1, kernel_size=k, padding=(k - 1) // 2, bias=False)\n\n        self.local_weight=local_weight\n\n        self.local_arv_pool = nn.AdaptiveAvgPool2d(local_size)\n        self.global_arv_pool=nn.AdaptiveAvgPool2d(1)\n\n    def forward(self, x):\n        local_arv=self.local_arv_pool(x)\n        global_arv=self.global_arv_pool(local_arv)\n\n        b,c,m,n = x.shape\n        b_local, c_local, m_local, n_local = local_arv.shape\n\n        # (b,c,local_size,local_size) -> (b,c,local_size*local_size) -> (b,local_size*local_size,c) -> (b,1,local_size*local_size*c)\n        temp_local= local_arv.view(b, c_local, -1).transpose(-1, -2).reshape(b, 1, -1)\n        # (b,c,1,1) -> (b,c,1) -> (b,1,c)\n        temp_global = global_arv.view(b, c, -1).transpose(-1, -2)\n\n        y_local = self.conv_local(temp_local)\n        y_global = self.conv(temp_global)\n\n        # (b,c,local_size,local_size) <- (b,c,local_size*local_size)<-(b,local_size*local_size,c) <- (b,1,local_size*local_size*c)\n        y_local_transpose=y_local.reshape(b, self.local_size * self.local_size,c).transpose(-1,-2).view(b, c, self.local_size , self.local_size)\n        # (b,1,c) -> (b,c,1) -> (b,c,1,1)\n        y_global_transpose = y_global.transpose(-1,-2).unsqueeze(-1)\n\n        # 反池化\n        att_local = y_local_transpose.sigmoid()\n        att_global = F.adaptive_avg_pool2d(y_global_transpose.sigmoid(),[self.local_size, 
self.local_size])\n        att_all = F.adaptive_avg_pool2d(att_global*(1-self.local_weight)+(att_local*self.local_weight), [m, n])\n\n        x = x * att_all\n        return x\n\nif __name__ == '__main__':\n    attention = MLCA(in_size=256)\n    inputs = torch.randn((2, 256, 16, 16))\n    result = attention(inputs)\n    print(result.size())"
  },
  {
    "path": "cv-attention/MobileViTAttention.py",
    "content": "from torch import nn\nimport torch\nfrom einops import rearrange\n\n\nclass PreNorm(nn.Module):\n    def __init__(self, dim, fn):\n        super().__init__()\n        self.ln = nn.LayerNorm(dim)\n        self.fn = fn\n\n    def forward(self, x, **kwargs):\n        return self.fn(self.ln(x), **kwargs)\n\n\nclass FeedForward(nn.Module):\n    def __init__(self, dim, mlp_dim, dropout):\n        super().__init__()\n        self.net = nn.Sequential(\n            nn.Linear(dim, mlp_dim),\n            nn.SiLU(),\n            nn.Dropout(dropout),\n            nn.Linear(mlp_dim, dim),\n            nn.Dropout(dropout)\n        )\n\n    def forward(self, x):\n        return self.net(x)\n\n\nclass Attention(nn.Module):\n    def __init__(self, dim, heads, head_dim, dropout):\n        super().__init__()\n        inner_dim = heads * head_dim\n        project_out = not (heads == 1 and head_dim == dim)\n\n        self.heads = heads\n        self.scale = head_dim ** -0.5\n\n        self.attend = nn.Softmax(dim=-1)\n        self.to_qkv = nn.Linear(dim, inner_dim * 3, bias=False)\n\n        self.to_out = nn.Sequential(\n            nn.Linear(inner_dim, dim),\n            nn.Dropout(dropout)\n        ) if project_out else nn.Identity()\n\n    def forward(self, x):\n        qkv = self.to_qkv(x).chunk(3, dim=-1)\n        q, k, v = map(lambda t: rearrange(t, 'b p n (h d) -> b p h n d', h=self.heads), qkv)\n        dots = torch.matmul(q, k.transpose(-1, -2)) * self.scale\n        attn = self.attend(dots)\n        out = torch.matmul(attn, v)\n        out = rearrange(out, 'b p h n d -> b p n (h d)')\n        return self.to_out(out)\n\n\nclass Transformer(nn.Module):\n    def __init__(self, dim, depth, heads, head_dim, mlp_dim, dropout=0.):\n        super().__init__()\n        self.layers = nn.ModuleList([])\n        for _ in range(depth):\n            self.layers.append(nn.ModuleList([\n                PreNorm(dim, Attention(dim, heads, head_dim, dropout)),\n                
PreNorm(dim, FeedForward(dim, mlp_dim, dropout))\n            ]))\n\n    def forward(self, x):\n        out = x\n        for att, ffn in self.layers:\n            out = out + att(out)\n            out = out + ffn(out)\n        return out\n\n\nclass MobileViTAttention(nn.Module):\n    def __init__(self, in_channel=3, dim=512, kernel_size=3, patch_size=7):\n        super().__init__()\n        self.ph, self.pw = patch_size, patch_size\n        self.conv1 = nn.Conv2d(in_channel, in_channel, kernel_size=kernel_size, padding=kernel_size // 2)\n        self.conv2 = nn.Conv2d(in_channel, dim, kernel_size=1)\n\n        self.trans = Transformer(dim=dim, depth=3, heads=8, head_dim=64, mlp_dim=1024)\n\n        self.conv3 = nn.Conv2d(dim, in_channel, kernel_size=1)\n        self.conv4 = nn.Conv2d(2 * in_channel, in_channel, kernel_size=kernel_size, padding=kernel_size // 2)\n\n    def forward(self, x):\n        y = x.clone()  # bs,c,h,w\n\n        ## Local Representation\n        y = self.conv2(self.conv1(x))  # bs,dim,h,w\n\n        ## Global Representation\n        _, _, h, w = y.shape\n        y = rearrange(y, 'bs dim (nh ph) (nw pw) -> bs (ph pw) (nh nw) dim', ph=self.ph, pw=self.pw)  # bs,h,w,dim\n        y = self.trans(y)\n        y = rearrange(y, 'bs (ph pw) (nh nw) dim -> bs dim (nh ph) (nw pw)', ph=self.ph, pw=self.pw, nh=h // self.ph,\n                      nw=w // self.pw)  # bs,dim,h,w\n\n        ## Fusion\n        y = self.conv3(y)  # bs,dim,h,w\n        y = torch.cat([x, y], 1)  # bs,2*dim,h,w\n        y = self.conv4(y)  # bs,c,h,w\n\n        return y\n\n\nif __name__ == '__main__':\n    m = MobileViTAttention(in_channel=512)\n    input = torch.randn(1, 512, 49, 49)\n    output = m(input)\n    print(output.shape)\n"
  },
  {
    "path": "cv-attention/ParNetAttention.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\n\n\nclass ParNetAttention(nn.Module):\n\n    def __init__(self, channel=512):\n        super().__init__()\n        self.sse = nn.Sequential(\n            nn.AdaptiveAvgPool2d(1),\n            nn.Conv2d(channel, channel, kernel_size=1),\n            nn.Sigmoid()\n        )\n\n        self.conv1x1 = nn.Sequential(\n            nn.Conv2d(channel, channel, kernel_size=1),\n            nn.BatchNorm2d(channel)\n        )\n        self.conv3x3 = nn.Sequential(\n            nn.Conv2d(channel, channel, kernel_size=3, padding=1),\n            nn.BatchNorm2d(channel)\n        )\n        self.silu = nn.SiLU()\n\n    def forward(self, x):\n        b, c, _, _ = x.size()\n        x1 = self.conv1x1(x)\n        x2 = self.conv3x3(x)\n        x3 = self.sse(x) * x\n        y = self.silu(x1 + x2 + x3)\n        return y\n\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    pna = ParNetAttention(channel=512)\n    output = pna(input)\n    print(output.shape)"
  },
  {
    "path": "cv-attention/PolarizedSelfAttention.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\n\n\n\nclass ParallelPolarizedSelfAttention(nn.Module):\n\n    def __init__(self, channel=512):\n        super().__init__()\n        self.ch_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))\n        self.ch_wq=nn.Conv2d(channel,1,kernel_size=(1,1))\n        self.softmax_channel=nn.Softmax(1)\n        self.softmax_spatial=nn.Softmax(-1)\n        self.ch_wz=nn.Conv2d(channel//2,channel,kernel_size=(1,1))\n        self.ln=nn.LayerNorm(channel)\n        self.sigmoid=nn.Sigmoid()\n        self.sp_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))\n        self.sp_wq=nn.Conv2d(channel,channel//2,kernel_size=(1,1))\n        self.agp=nn.AdaptiveAvgPool2d((1,1))\n\n    def forward(self, x):\n        b, c, h, w = x.size()\n\n        #Channel-only Self-Attention\n        channel_wv=self.ch_wv(x) #bs,c//2,h,w\n        channel_wq=self.ch_wq(x) #bs,1,h,w\n        channel_wv=channel_wv.reshape(b,c//2,-1) #bs,c//2,h*w\n        channel_wq=channel_wq.reshape(b,-1,1) #bs,h*w,1\n        channel_wq=self.softmax_channel(channel_wq)\n        channel_wz=torch.matmul(channel_wv,channel_wq).unsqueeze(-1) #bs,c//2,1,1\n        channel_weight=self.sigmoid(self.ln(self.ch_wz(channel_wz).reshape(b,c,1).permute(0,2,1))).permute(0,2,1).reshape(b,c,1,1) #bs,c,1,1\n        channel_out=channel_weight*x\n\n        #Spatial-only Self-Attention\n        spatial_wv=self.sp_wv(x) #bs,c//2,h,w\n        spatial_wq=self.sp_wq(x) #bs,c//2,h,w\n        spatial_wq=self.agp(spatial_wq) #bs,c//2,1,1\n        spatial_wv=spatial_wv.reshape(b,c//2,-1) #bs,c//2,h*w\n        spatial_wq=spatial_wq.permute(0,2,3,1).reshape(b,1,c//2) #bs,1,c//2\n        spatial_wq=self.softmax_spatial(spatial_wq)\n        spatial_wz=torch.matmul(spatial_wq,spatial_wv) #bs,1,h*w\n        spatial_weight=self.sigmoid(spatial_wz.reshape(b,1,h,w)) #bs,1,h,w\n        spatial_out=spatial_weight*x\n        out=spatial_out+channel_out\n        
return out\n\n\nif __name__ == '__main__':\n    input=torch.randn(1,512,7,7)\n    psa = ParallelPolarizedSelfAttention(channel=512)\n    output=psa(input)\n    print(output.shape)\n"
  },
  {
    "path": "cv-attention/S2Attention.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\n\n\ndef spatial_shift1(x):\n    b, w, h, c = x.size()\n    x[:, 1:, :, :c // 4] = x[:, :w - 1, :, :c // 4]\n    x[:, :w - 1, :, c // 4:c // 2] = x[:, 1:, :, c // 4:c // 2]\n    x[:, :, 1:, c // 2:c * 3 // 4] = x[:, :, :h - 1, c // 2:c * 3 // 4]\n    x[:, :, :h - 1, 3 * c // 4:] = x[:, :, 1:, 3 * c // 4:]\n    return x\n\n\ndef spatial_shift2(x):\n    b, w, h, c = x.size()\n    x[:, :, 1:, :c // 4] = x[:, :, :h - 1, :c // 4]\n    x[:, :, :h - 1, c // 4:c // 2] = x[:, :, 1:, c // 4:c // 2]\n    x[:, 1:, :, c // 2:c * 3 // 4] = x[:, :w - 1, :, c // 2:c * 3 // 4]\n    x[:, :w - 1, :, 3 * c // 4:] = x[:, 1:, :, 3 * c // 4:]\n    return x\n\n\nclass SplitAttention(nn.Module):\n    def __init__(self, channel=512, k=3):\n        super().__init__()\n        self.channel = channel\n        self.k = k\n        self.mlp1 = nn.Linear(channel, channel, bias=False)\n        self.gelu = nn.GELU()\n        self.mlp2 = nn.Linear(channel, channel * k, bias=False)\n        self.softmax = nn.Softmax(1)\n\n    def forward(self, x_all):\n        b, k, h, w, c = x_all.shape\n        x_all = x_all.reshape(b, k, -1, c)  # bs,k,n,c\n        a = torch.sum(torch.sum(x_all, 1), 1)  # bs,c\n        hat_a = self.mlp2(self.gelu(self.mlp1(a)))  # bs,kc\n        hat_a = hat_a.reshape(b, self.k, c)  # bs,k,c\n        bar_a = self.softmax(hat_a)  # bs,k,c\n        attention = bar_a.unsqueeze(-2)  # #bs,k,1,c\n        out = attention * x_all  # #bs,k,n,c\n        out = torch.sum(out, 1).reshape(b, h, w, c)\n        return out\n\n\nclass S2Attention(nn.Module):\n\n    def __init__(self, channels=512):\n        super().__init__()\n        self.mlp1 = nn.Linear(channels, channels * 3)\n        self.mlp2 = nn.Linear(channels, channels)\n        self.split_attention = SplitAttention()\n\n    def forward(self, x):\n        b, c, w, h = x.size()\n        x = x.permute(0, 2, 3, 1)\n        x = self.mlp1(x)\n      
  x1 = spatial_shift1(x[:, :, :, :c])\n        x2 = spatial_shift2(x[:, :, :, c:c * 2])\n        x3 = x[:, :, :, c * 2:]\n        x_all = torch.stack([x1, x2, x3], 1)\n        a = self.split_attention(x_all)\n        x = self.mlp2(a)\n        x = x.permute(0, 3, 1, 2)\n        return x\n\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    s2att = S2Attention(channels=512)\n    output = s2att(input)\n    print(output.shape)"
  },
  {
    "path": "cv-attention/SE.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\n\n\n\nclass SEAttention(nn.Module):\n\n    def __init__(self, channel=512,reduction=16):\n        super().__init__()\n        self.avg_pool = nn.AdaptiveAvgPool2d(1)\n        self.fc = nn.Sequential(\n            nn.Linear(channel, channel // reduction, bias=False),\n            nn.ReLU(inplace=True),\n            nn.Linear(channel // reduction, channel, bias=False),\n            nn.Sigmoid()\n        )\n\n\n    def init_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                init.kaiming_normal_(m.weight, mode='fan_out')\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n            elif isinstance(m, nn.BatchNorm2d):\n                init.constant_(m.weight, 1)\n                init.constant_(m.bias, 0)\n            elif isinstance(m, nn.Linear):\n                init.normal_(m.weight, std=0.001)\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        b, c, _, _ = x.size()\n        y = self.avg_pool(x).view(b, c)\n        y = self.fc(y).view(b, c, 1, 1)\n        return x * y.expand_as(x)\n\n\nif __name__ == '__main__':\n    input=torch.randn(50,512,7,7)\n    se = SEAttention(channel=512,reduction=8)\n    output=se(input)\n    print(output.shape)\n"
  },
  {
    "path": "cv-attention/SGE.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\n\nclass SpatialGroupEnhance(nn.Module):\n    def __init__(self, groups=8):\n        super().__init__()\n        self.groups=groups\n        self.avg_pool = nn.AdaptiveAvgPool2d(1)\n        self.weight=nn.Parameter(torch.zeros(1,groups,1,1))\n        self.bias=nn.Parameter(torch.zeros(1,groups,1,1))\n        self.sig=nn.Sigmoid()\n        self.init_weights()\n\n    def init_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                init.kaiming_normal_(m.weight, mode='fan_out')\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n            elif isinstance(m, nn.BatchNorm2d):\n                init.constant_(m.weight, 1)\n                init.constant_(m.bias, 0)\n            elif isinstance(m, nn.Linear):\n                init.normal_(m.weight, std=0.001)\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        b, c, h,w=x.shape\n        x=x.view(b*self.groups,-1,h,w) #bs*g,dim//g,h,w\n        xn=x*self.avg_pool(x) #bs*g,dim//g,h,w\n        xn=xn.sum(dim=1,keepdim=True) #bs*g,1,h,w\n        t=xn.view(b*self.groups,-1) #bs*g,h*w\n\n        t=t-t.mean(dim=1,keepdim=True) #bs*g,h*w\n        std=t.std(dim=1,keepdim=True)+1e-5\n        t=t/std #bs*g,h*w\n        t=t.view(b,self.groups,h,w) #bs,g,h*w\n        \n        t=t*self.weight+self.bias #bs,g,h*w\n        t=t.view(b*self.groups,1,h,w) #bs*g,1,h*w\n        x=x*self.sig(t)\n        x=x.view(b,c,h,w)\n        return x \n\n\nif __name__ == '__main__':\n    input=torch.randn(50,512,7,7)\n    sge = SpatialGroupEnhance(groups=8)\n    output=sge(input)\n    print(output.shape)"
  },
  {
    "path": "cv-attention/SK.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\nfrom collections import OrderedDict\n\n\nclass SKAttention(nn.Module):\n\n    def __init__(self, channel=512, kernels=[1, 3, 5, 7], reduction=16, group=1, L=32):\n        super().__init__()\n        self.d = max(L, channel // reduction)\n        self.convs = nn.ModuleList([])\n        for k in kernels:\n            self.convs.append(\n                nn.Sequential(OrderedDict([\n                    ('conv', nn.Conv2d(channel, channel, kernel_size=k, padding=k // 2, groups=group)),\n                    ('bn', nn.BatchNorm2d(channel)),\n                    ('relu', nn.ReLU())\n                ]))\n            )\n        self.fc = nn.Linear(channel, self.d)\n        self.fcs = nn.ModuleList([])\n        for i in range(len(kernels)):\n            self.fcs.append(nn.Linear(self.d, channel))\n        self.softmax = nn.Softmax(dim=0)\n\n    def forward(self, x):\n        bs, c, _, _ = x.size()\n        conv_outs = []\n        ### split\n        for conv in self.convs:\n            conv_outs.append(conv(x))\n        feats = torch.stack(conv_outs, 0)  # k,bs,channel,h,w\n\n        ### fuse\n        U = sum(conv_outs)  # bs,c,h,w\n\n        ### reduction channel\n        S = U.mean(-1).mean(-1)  # bs,c\n        Z = self.fc(S)  # bs,d\n\n        ### calculate attention weight\n        weights = []\n        for fc in self.fcs:\n            weight = fc(Z)\n            weights.append(weight.view(bs, c, 1, 1))  # bs,channel\n        attention_weughts = torch.stack(weights, 0)  # k,bs,channel,1,1\n        attention_weughts = self.softmax(attention_weughts)  # k,bs,channel,1,1\n\n        ### fuse\n        V = (attention_weughts * feats).sum(0)\n        return V\n\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    se = SKAttention(channel=512, reduction=8)\n    output = se(input)\n    print(output.shape)\n"
  },
  {
    "path": "cv-attention/SequentialSelfAttention.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\n\nclass SequentialPolarizedSelfAttention(nn.Module):\n\n    def __init__(self, channel=512):\n        super().__init__()\n        self.ch_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))\n        self.ch_wq=nn.Conv2d(channel,1,kernel_size=(1,1))\n        self.softmax_channel=nn.Softmax(1)\n        self.softmax_spatial=nn.Softmax(-1)\n        self.ch_wz=nn.Conv2d(channel//2,channel,kernel_size=(1,1))\n        self.ln=nn.LayerNorm(channel)\n        self.sigmoid=nn.Sigmoid()\n        self.sp_wv=nn.Conv2d(channel,channel//2,kernel_size=(1,1))\n        self.sp_wq=nn.Conv2d(channel,channel//2,kernel_size=(1,1))\n        self.agp=nn.AdaptiveAvgPool2d((1,1))\n\n    def forward(self, x):\n        b, c, h, w = x.size()\n\n        #Channel-only Self-Attention\n        channel_wv=self.ch_wv(x) #bs,c//2,h,w\n        channel_wq=self.ch_wq(x) #bs,1,h,w\n        channel_wv=channel_wv.reshape(b,c//2,-1) #bs,c//2,h*w\n        channel_wq=channel_wq.reshape(b,-1,1) #bs,h*w,1\n        channel_wq=self.softmax_channel(channel_wq)\n        channel_wz=torch.matmul(channel_wv,channel_wq).unsqueeze(-1) #bs,c//2,1,1\n        channel_weight=self.sigmoid(self.ln(self.ch_wz(channel_wz).reshape(b,c,1).permute(0,2,1))).permute(0,2,1).reshape(b,c,1,1) #bs,c,1,1\n        channel_out=channel_weight*x\n\n        #Spatial-only Self-Attention\n        spatial_wv=self.sp_wv(channel_out) #bs,c//2,h,w\n        spatial_wq=self.sp_wq(channel_out) #bs,c//2,h,w\n        spatial_wq=self.agp(spatial_wq) #bs,c//2,1,1\n        spatial_wv=spatial_wv.reshape(b,c//2,-1) #bs,c//2,h*w\n        spatial_wq=spatial_wq.permute(0,2,3,1).reshape(b,1,c//2) #bs,1,c//2\n        spatial_wq=self.softmax_spatial(spatial_wq)\n        spatial_wz=torch.matmul(spatial_wq,spatial_wv) #bs,1,h*w\n        spatial_weight=self.sigmoid(spatial_wz.reshape(b,1,h,w)) #bs,1,h,w\n        spatial_out=spatial_weight*channel_out\n        return 
spatial_out\n\nif __name__ == '__main__':\n    input=torch.randn(1,512,7,7)\n    psa = SequentialPolarizedSelfAttention(channel=512)\n    output=psa(input)\n    print(output.shape)\n"
  },
  {
    "path": "cv-attention/ShuffleAttention.py",
    "content": "import numpy as np\nimport torch\nfrom torch import nn\nfrom torch.nn import init\nfrom torch.nn.parameter import Parameter\n\n\nclass ShuffleAttention(nn.Module):\n\n    def __init__(self, channel=512, reduction=16, G=8):\n        super().__init__()\n        self.G = G\n        self.channel = channel\n        self.avg_pool = nn.AdaptiveAvgPool2d(1)\n        self.gn = nn.GroupNorm(channel // (2 * G), channel // (2 * G))\n        self.cweight = Parameter(torch.zeros(1, channel // (2 * G), 1, 1))\n        self.cbias = Parameter(torch.ones(1, channel // (2 * G), 1, 1))\n        self.sweight = Parameter(torch.zeros(1, channel // (2 * G), 1, 1))\n        self.sbias = Parameter(torch.ones(1, channel // (2 * G), 1, 1))\n        self.sigmoid = nn.Sigmoid()\n\n    def init_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                init.kaiming_normal_(m.weight, mode='fan_out')\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n            elif isinstance(m, nn.BatchNorm2d):\n                init.constant_(m.weight, 1)\n                init.constant_(m.bias, 0)\n            elif isinstance(m, nn.Linear):\n                init.normal_(m.weight, std=0.001)\n                if m.bias is not None:\n                    init.constant_(m.bias, 0)\n\n    @staticmethod\n    def channel_shuffle(x, groups):\n        b, c, h, w = x.shape\n        x = x.reshape(b, groups, -1, h, w)\n        x = x.permute(0, 2, 1, 3, 4)\n\n        # flatten\n        x = x.reshape(b, -1, h, w)\n\n        return x\n\n    def forward(self, x):\n        b, c, h, w = x.size()\n        # group into subfeatures\n        x = x.view(b * self.G, -1, h, w)  # bs*G,c//G,h,w\n\n        # channel_split\n        x_0, x_1 = x.chunk(2, dim=1)  # bs*G,c//(2*G),h,w\n\n        # channel attention\n        x_channel = self.avg_pool(x_0)  # bs*G,c//(2*G),1,1\n        x_channel = self.cweight * x_channel + self.cbias  # 
bs*G,c//(2*G),1,1\n        x_channel = x_0 * self.sigmoid(x_channel)\n\n        # spatial attention\n        x_spatial = self.gn(x_1)  # bs*G,c//(2*G),h,w\n        x_spatial = self.sweight * x_spatial + self.sbias  # bs*G,c//(2*G),h,w\n        x_spatial = x_1 * self.sigmoid(x_spatial)  # bs*G,c//(2*G),h,w\n\n        # concatenate along channel axis\n        out = torch.cat([x_channel, x_spatial], dim=1)  # bs*G,c//G,h,w\n        out = out.contiguous().view(b, -1, h, w)\n\n        # channel shuffle\n        out = self.channel_shuffle(out, 2)\n        return out\n\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    se = ShuffleAttention(channel=512, G=8)\n    output = se(input)\n    print(output.shape)\n"
  },
  {
    "path": "cv-attention/SimAM.py",
    "content": "import torch\nimport torch.nn as nn\n\n\nclass SimAM(torch.nn.Module):\n    def __init__(self, e_lambda=1e-4):\n        super(SimAM, self).__init__()\n\n        self.activaton = nn.Sigmoid()\n        self.e_lambda = e_lambda\n\n    def __repr__(self):\n        s = self.__class__.__name__ + '('\n        s += ('lambda=%f)' % self.e_lambda)\n        return s\n\n    @staticmethod\n    def get_module_name():\n        return \"simam\"\n\n    def forward(self, x):\n        b, c, h, w = x.size()\n\n        n = w * h - 1\n\n        x_minus_mu_square = (x - x.mean(dim=[2, 3], keepdim=True)).pow(2)\n        y = x_minus_mu_square / (4 * (x_minus_mu_square.sum(dim=[2, 3], keepdim=True) / n + self.e_lambda)) + 0.5\n\n        return x * self.activaton(y)\n\n\nif __name__ == '__main__':\n    input = torch.randn(3, 64, 7, 7)\n    model = SimAM()\n    outputs = model(input)\n    print(outputs.shape)\n"
  },
  {
    "path": "cv-attention/TripletAttention.py",
    "content": "import torch\nimport torch.nn as nn\n\n\nclass BasicConv(nn.Module):\n    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1, relu=True,\n                 bn=True, bias=False):\n        super(BasicConv, self).__init__()\n        self.out_channels = out_planes\n        self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size, stride=stride, padding=padding,\n                              dilation=dilation, groups=groups, bias=bias)\n        self.bn = nn.BatchNorm2d(out_planes, eps=1e-5, momentum=0.01, affine=True) if bn else None\n        self.relu = nn.ReLU() if relu else None\n\n    def forward(self, x):\n        x = self.conv(x)\n        if self.bn is not None:\n            x = self.bn(x)\n        if self.relu is not None:\n            x = self.relu(x)\n        return x\n\n\nclass ZPool(nn.Module):\n    def forward(self, x):\n        return torch.cat((torch.max(x, 1)[0].unsqueeze(1), torch.mean(x, 1).unsqueeze(1)), dim=1)\n\n\nclass AttentionGate(nn.Module):\n    def __init__(self):\n        super(AttentionGate, self).__init__()\n        kernel_size = 7\n        self.compress = ZPool()\n        self.conv = BasicConv(2, 1, kernel_size, stride=1, padding=(kernel_size - 1) // 2, relu=False)\n\n    def forward(self, x):\n        x_compress = self.compress(x)\n        x_out = self.conv(x_compress)\n        scale = torch.sigmoid_(x_out)\n        return x * scale\n\n\nclass TripletAttention(nn.Module):\n    def __init__(self, no_spatial=False):\n        super(TripletAttention, self).__init__()\n        self.cw = AttentionGate()\n        self.hc = AttentionGate()\n        self.no_spatial = no_spatial\n        if not no_spatial:\n            self.hw = AttentionGate()\n\n    def forward(self, x):\n        x_perm1 = x.permute(0, 2, 1, 3).contiguous()\n        x_out1 = self.cw(x_perm1)\n        x_out11 = x_out1.permute(0, 2, 1, 3).contiguous()\n        x_perm2 = x.permute(0, 3, 2, 1).contiguous()\n     
   x_out2 = self.hc(x_perm2)\n        x_out21 = x_out2.permute(0, 3, 2, 1).contiguous()\n        if not self.no_spatial:\n            x_out = self.hw(x)\n            x_out = 1 / 3 * (x_out + x_out11 + x_out21)\n        else:\n            x_out = 1 / 2 * (x_out11 + x_out21)\n        return x_out\n\n\nif __name__ == '__main__':\n    input = torch.randn(50, 512, 7, 7)\n    triplet = TripletAttention()\n    output = triplet(input)\n    print(output.shape)\n"
  },
  {
    "path": "cv-attention/readme.md",
    "content": "# CV-Attention\n关于CV的一些经典注意力机制代码。  \n目前代码格式主要用于yolov3,yolov5,yolov7,yolov8.\n\n# Supports\n| name | need_chaneel | paper |\n| :----:| :----: | :----: |\n| BAM | True | https://arxiv.org/pdf/1807.06514.pdf |\n| CBAM | True | https://openaccess.thecvf.com/content_ECCV_2018/papers/Sanghyun_Woo_Convolutional_Block_Attention_ECCV_2018_paper.pdf |\n| SE | True | https://arxiv.org/abs/1709.01507 |\n| CoTAttention | True | https://arxiv.org/abs/2107.12292 |\n| MobileViTAttention | True | https://arxiv.org/abs/2110.02178 |\n| SimAM | False | http://proceedings.mlr.press/v139/yang21o/yang21o.pdf |\n| SK | True | https://arxiv.org/pdf/1903.06586.pdf |\n| ShuffleAttention | True | https://arxiv.org/pdf/2102.00240.pdf |\n| S2Attention | True | https://arxiv.org/abs/2108.01072 |\n| TripletAttention | False | https://arxiv.org/abs/2010.03045 |\n| ECA | True | https://arxiv.org/pdf/1910.03151.pdf |\n| ParNetAttention | True | https://arxiv.org/abs/2110.07641 |\n| CoordAttention | True | https://arxiv.org/abs/2103.02907 |\n| MHSA<br>Multi-Head-Self-Attention | True | https://wuch15.github.io/paper/EMNLP2019-NRMS.pdf |\n| SGE | False | https://arxiv.org/pdf/1905.09646.pdf |\n| A2Attention | True | https://arxiv.org/pdf/1810.11579.pdf |\n| GC<br>Global Context Attention | True | https://arxiv.org/abs/1904.11492 |\n| EffectiveSE<br>Effective Squeeze-Excitation | True | https://arxiv.org/abs/1911.06667 |\n| GE<br>Gather-Excite Attention | True | https://arxiv.org/abs/1810.12348 |\n| CrissCrossAttention | True | https://arxiv.org/abs/1811.11721 |\n| Polarized Self-Attention | True | https://arxiv.org/abs/2107.00782 |\n| Sequential Self-Attention | True | https://arxiv.org/abs/2107.00782 |\n| GAM | True | https://arxiv.org/pdf/2112.05561v1.pdf |\n| Biformer | True | https://arxiv.org/abs/2303.08810 |\n| EMA | True | https://arxiv.org/abs/2305.13563v2 |\n| CloAttention | True | https://arxiv.org/abs/2303.17803 |\n| LSKBlock | True | https://arxiv.org/pdf/2303.09030.pdf 
|\n| MLCA | True | https://www.sciencedirect.com/science/article/pii/S0952197623006267 |\n| LSKA | True | https://arxiv.org/abs/2309.01439 |\n| DAttention | True | https://openaccess.thecvf.com/content/CVPR2022/html/Xia_Vision_Transformer_With_Deformable_Attention_CVPR_2022_paper.html |\n| ELA | True | https://arxiv.org/abs/2403.01123 |\n| CAA | True | https://arxiv.org/pdf/2403.06258 |\n| CPCA | True | https://arxiv.org/abs/2306.05196 |\n\n# Install\nInstall command: pip install timm einops efficientnet_pytorch -i https://pypi.tuna.tsinghua.edu.cn/simple\n\n# Course\n1. [Bilibili video tutorial: adding attention to yolov5](https://www.bilibili.com/video/BV1s84y1775U) [Bilibili video tutorial: adding attention to yolov5 (supplementary notes)](https://www.bilibili.com/video/BV1hG4y1M71X)\n2. [Bilibili video tutorial: adding attention to yolov7](https://www.bilibili.com/video/BV1pd4y1H7BK)\n3. [Bilibili video tutorial: adding attention to yolov8](https://www.bilibili.com/video/BV1ZQ4y1J7oC/) [Bilibili video tutorial: adding attention to yolov8 (advanced)](https://www.bilibili.com/video/BV1ZQ4y1J7oC/)\n\n# Reference\nhttps://github.com/xmu-xiaoma666/External-Attention-pytorch  \nhttps://github.com/rwightman/pytorch-image-models  \nhttps://github.com/rayleizhu/BiFormer  \nhttps://github.com/XiaLiPKU/EMANet  \nhttps://github.com/qhfan/CloFormer/tree/main  \nhttps://github.com/zcablii/LSKNet  \nhttps://github.com/wandahangFY/MLCA  \nhttps://github.com/StevenLauHKHK/Large-Separable-Kernel-Attention  \nhttps://github.com/LeapLabTHU/DAT  \nhttps://github.com/NUST-Machine-Intelligence-Laboratory/PKINet  \nhttps://github.com/Cuthbert-Huang/CPCANet  "
  },
  {
    "path": "cvpr2025-deim-project.md",
    "content": "# 2025-SOTA目标检测模型项目(2026发论文必备项目)\n\n鉴于目前YOLO系列模型反映的拒稿率越来越高且YOLO模型确实非常泛滥，无论是不是计算机专业、是不是小白都基本可以快速上手YOLO模型，导致计算机专业和有期刊级别要求的小伙伴日益难受，简单来说就是YOLO在学术界的红利已经基本吃透，目前开始越来越多人转CVPR2024-RTDETR，而且目前研究生毕业一年比一年难，不像以前随便结合点深度学习就可以毕业，就像越来越多人反馈，导师已经明确禁止不能用YOLO，再加上这么多年来YOLO对学术的灌水已经让审稿人出现视觉疲劳，带上了”有色”眼镜看待YOLO，所以结合以上众多原因，因此我们需要一个有一定上手难度且是顶会的模型来支撑我们后续的大小论文的工作。\nPS:20250614版本更新后，本项目的dfine和cvpr2025-deimv1已经支持Ultralytics同款的配置文件形式，大大降低上手难度！[B站介绍链接](https://www.bilibili.com/video/BV1Q4MHzXEdd/)\n\n### 1. 这个项目包含什么模型？\n\n这个项目的源代码来自：[DEIM](https://github.com/ShihuaHuang95/DEIM)  \n其内部可以跑以下模型(以下模型支持目标检测，DFine、DEIM支持实例分割，不支持姿态检测、旋转目标检测)：\n1. CVPR2025-DEIM\n2. ICLR2025-DFine\n3. RTDETRV2\n4. DEIMV2\n\n选择这个课程，这些模型都可以改进，不限于DEIM，这些都是顶会的模型，不要说2025，就算是2026、2027都不落后！还有一个重点就是像CVPR2024-RTDETR，最小的模型也有50GFLOPs，但是现在的DEIM和DFine都有像YOLO一样的Nano大小版本的模型，变相降低了训练成本和设备要求！(建议最低12G显存的显卡起步)\n\n### 2. 这个项目会以什么形式开展？\n\n1. 这个项目跟以往区别比较大，我们其他改进项目都是直接提供好修改好的代码，用户不需要懂代码的情况下也可以开始做实验，甚至可以做完实验，但是这样也有一个不好的点，就是会大幅度降低上手门槛，这特别对计算机专业的同学来说是非常不利的，因此这个项目在代码工程方面，这个项目我们会有教程教大家怎么去调试程序、修改代码、添加模块。\n2. 这个项目会**不定时(直播时间到时候会群里进行通知，没有硬性规定多久一次，不方便看的会有录播)**有**直播**，详细直播内容请看第三大点。\n3. 这个项目会持续更新创新点，如果创新点是来源于现有的模型，还会提供对应的论文及其中文翻译版本（假设像FasterNet中的FasterBlock，会提供好对应的py文件、原论文及其中文翻译版本），用户可以根据从本课程学习到的缝合模块（代指第一点）去定制或者创新自己的网络。\n4. 附带答疑群，答疑群主要答疑的内容是实验、代码操作、代码报错等相关问题(经过YOLO、RTDETR大量的经验，我没法保证每一个问题都能回复到大家，只能保证遇到过的问题会给大家提供建议和方向，当然群内的一些高频问题，我也会收集起来挑出部分出视频或者直播给大家进行解答)。\n5. 如果后续有剪枝、蒸馏，不需要额外付费，本项目会包含在内，所以性价比真的非常高，YOLO改进剪枝蒸馏三件套也要200多了。\n\n### 3. 直播内容\n\n1. 解答群内一些高频疑问，比如很多人都会遇到的报错、或者注意点。\n2. 教大家如何去做二次创新(PS:这个不是口头给大家说怎么二次创新，而是从代码的层面带大家去实践二次创新。可能这里会有同学问，那自研创新呢？你会自研模块的前提是必须要懂如何二次创新，首先这是一个过程，然后我有很多自研模块是突然有的想法或者看论文看到某些结构与之前看到的论文联合后有新的想法，所以也很难描述我为什么就想到这个结构，大多数情况下，只需要会有一定复杂度的二次创新就足够，当然自研模块有机会我也会去讲)\n3. 给大家从浅到深解说一些我认为比较经典的模块，提高自己能创新新模块的能力和基础，因为很多模块都是相通的，本质没有变，只是模块上的组合体替换。(有不少人私聊我说，能不能出些你是如何结合一些现有的模块去创新的，虽然现在B站上也有不少讲创新点的，但是他们的感觉就是从头到尾读一篇代码，我看了几次之后觉得我把代码扔给GPT给我打上注释的感觉是一样的，看的时候感觉哦哦哦这样，看完后就不知所然)\n\n### 3. 入手本项目需要注意些什么？\n\n1. 
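Unlike the earlier YOLO projects, this project is not plug-and-play, so it has a certain level of difficulty. It is not recommended for people with the following traits. (Some may ask why DEIM, DFine, and RTDETRV2 are not all ported to Ultralytics. The uncertainty is too great: DETR-type models are very sensitive to parameters, and a slightly unsuitable parameter can degrade results substantially, while a one-to-one port of the whole pipeline is hard to guarantee for models this complex.) \n- Complete beginners with no background at all (not a problem if you are willing to learn)\n- Those who do not want to spend much time learning and only want to churn out a paper for a venue with no requirements\n- Those with no problem-solving ability (not a problem if you are willing to learn)\n- Those who never read user guides or instructions (strongly discouraged from purchasing)  \n- This project takes time to get started with; it is not a good fit if you just want to run it without thinking  \nOne last note! If you have the traits above but still need a reasonably reputable journal or are not allowed to use YOLO, get CVPR2024-RTDETR as early as possible. If you missed out last year, do not wait any longer this year; the model dividend will not wait for anyone.\n2. Before purchasing, you can watch [the tutorials in the CVPR2025-DEIM collection](https://space.bilibili.com/286900343/lists/4909499) on Bilibili. At the very least, get the original DEIM model running and follow the videos through training and testing, then go through the basic lessons in the collection to lay a foundation for what follows.\n3. I do not think any of this is out of reach; it comes down to whether you want to graduate. Where there is a will, there is a way.\nPS: Since the 20250614 update, dfine and deim in this project support Ultralytics-style configuration files, which greatly lowers the barrier to entry! [Bilibili introduction link](https://www.bilibili.com/video/BV1Q4MHzXEdd/)\n\n### 4. Price\n\n1. This project costs 288, with no time limit. (Rather than paying 150 or 200 for a pure YOLO model-improvement column, pay 288 for a 2025-SOTA column; at least you will not have to worry about spending the money only to find your YOLO paper cannot be published and you cannot graduate.)\n2. Digital products are non-refundable and non-exchangeable once sold, so think carefully before purchasing. If this is your first time buying one of my projects and you are unsure whether I am reliable, consider starting with the YOLO or RTDETR project.\n\n### 5. Using the project\n\n1. Every buyer receives a unique password for extracting the corresponding 7z archive. Keep this password safe and do not share it with anyone.\n2. All videos and livestream replays for this project are encrypted. Each buyer receives an activation code, located inside their personal 7z archive.\n\n### 6. Project changelog\n\n- 20250330\n\n    1. Initial release.\n\n- 20250413\n\n    1. Added several improvement modules and module overviews, located in engine/extre_module/module_images.\n    2. Added progress bars for the training and testing stages.\n    3. Improved how metric names are displayed in tensorboard.\n    4. Improved output: important information is now shown in color.\n    5. Added the plot_train_batch_freq parameter, which controls how many epochs apart the augmented images of the first batch are saved; the default is 12.\n    6. Current parameters are now saved automatically to args.json in output_dir.\n    7. Improved the output_dir saving logic: if the output_dir path already exists, the suffix is incremented by 1 automatically to avoid overwriting the existing contents.\n\n- 20250419\n\n    1. Added the verbose_type parameter, which selects between the default output and progress-bar output; the default is the official output format.\n    2. Added thop as a way to compute model FLOPs, avoiding the unsupported-operator errors that calflops raises for some operators.\n    3. Improved each module's py file to print FLOPs, parameter counts, and similar values, making later debugging easier.\n    4. Set pin_memory to True in the DataLoader, which can speed up training when data loading is the bottleneck.\n    5. Fixed known issues reported by users.\n    6. Added several improvement modules.\n\n- 20250429\n\n    1. Fixed the engine/extre_module/custom_nn/attention/SEAM.py module; it should be MutilSEAM.\n    2. Added some advanced course videos.\n    3. Added several improvement modules.\n    4. Fixed known issues reported by users.\n    5. Fixed an issue where resuming training created a new save path.\n    6. Fixed an issue where some processes could not find the weight file during multi-GPU Stage2 training.\n\n- 20250514\n\n    1. Added some advanced course videos.\n    2. 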
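Added several improvement modules.\n    3. Fixed known issues reported by users.\n\n- 20250526\n\n    1. Added some advanced course videos.\n    2. Added several improvement modules.\n    3. Added the cache_ram parameter; see the userguide for details.\n    4. Fixed a NotImplementedError that occurred under torch2.7.0.\n\n- 20250609\n\n    1. Fixed abnormal accuracy when training on the COCO dataset after the cache_ram feature was added.\n    2. Fixed a bug in plotting augmented data when training on the COCO dataset.\n    3. Added several improvement modules.\n    4. Added some advanced course videos.\n    5. Fixed known issues reported by users.\n\n- 20250614\n\n    1. Added Ultralytics-style configuration files, which greatly reduces the difficulty of making improvements.\n    2. Added some advanced course videos on <Ultralytics-style configuration files>.\n    3. Added several improvement modules.\n\n- 20250617\n\n    1. Fixed incorrect layer indices in the configuration files.\n\n- 20250619\n\n    1. Fixed incorrect layer indices in the configuration files.\n    2. Added several improvement modules.\n    3. Added some advanced course videos on <Ultralytics-style configuration files>.\n\n- 20250625\n\n    1. Fixed an issue where best_stg2 was saved incorrectly.\n    2. Added the HyperACE module from YOLOV13.\n    3. Added several advanced course videos on <Ultralytics-style configuration files>.\n\n- 20250705\n\n    1. Added several improvement modules.\n    2. Added several advanced course videos on <Ultralytics-style configuration files>.\n    3. Added the replay link for the 20250704 basic Q&A livestream.\n\n- 20250714\n\n    1. Added several improvement modules.\n    2. Added several advanced course videos on <Ultralytics-style configuration files>.\n    3. Added the group-session livestream replay for topic 1 of the small-object detection network architecture series.\n\n- 20250726\n\n    1. In test-only mode, per-class 'mAP', 'mAP_50', 'mAP_75', 'mAP_s', 'mAP_m', 'mAP_l' are now printed.\n    2. Added several improvement modules.\n    3. Fixed known issues reported by users.\n    4. Added a script for JSON-format datasets. (It prints the number of classes, the class ids, and the instance count of each class.)\n\n- 20250817\n\n    1. Added support for distillation. Distillation supports resuming from a checkpoint, used the same way as in normal training.\n    2. Distillation supports three modes: feature distillation, logit distillation, and feature + logit distillation.\n    3. Models defined with Ultralytics-style configuration files and models defined the original code-based way can distill from each other.\n    4. The distillation epochs can be controlled, for example distilling only during the first 50 epochs and disabling distillation for the last 50.\n    5. For more details, see the advanced course <Knowledge distillation tutorial videos>.\n    6. YOLO metrics (Precision, Recall, F1-Score, mAP50, mAP75, mAP50-95) can now be printed; see the userguide for details.\n    7. Added several improvement modules.\n    8. Added the link for topic 2 of the small-object detection network architecture series.\n\n- 20250823\n\n    1. Fixed a bug where the YOLO metrics raised an error when some images had no ground-truth labels.\n    2. Logit distillation is now available; the project includes a corresponding lesson.\n    3. Added several improvement modules.\n    4. Added the advanced course <Knowledge distillation tutorial videos>.\n\n- 20250907\n\n    1. Added several improvement modules.\n    2. Fixed incorrect teacher information in the distillation output.\n\n- 20250921\n\n    1. Added an export script (export.py) that supports exporting onnx and tensorrt models.\n    2. Reworked most of the output to include the time, file, function, and line number, so users can locate the source quickly.\n    3. Added the replay link for the 20250910 livestream.\n    4. Fixed some known bugs.\n    5. Improved the inference scripts for onnx and tensorrt models.\n    6. onnx and tensorrt models can now be used for validation in train.py test-only mode.\n    7. Added tutorial videos on <Model export>.\n    8. Added several improvement modules.\n    9. 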
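DINOV3 (ConvNext, ViT) can now be used as a backbone for fine-tuning. <The tutorial is item 5 of the innovation topics on Baidu Cloud>\n\n- 20251012\n\n    1. Ported DEIMV2 to this project; for now only the original code-based modification approach is supported.\n    2. Updated the UserGuide.\n    3. Added <DEIMV2 explanation video>.\n    4. Fixed some known issues.\n\n- 20251025\n\n    1. Added modules from DQ-DETR.\n    2. Added several improvement modules.\n    3. Added tutorial videos on <DQ-DETR improvements>.\n    4. Fixed some known issues.\n\n- 20251102\n\n    1. Added tutorial videos on <DQ-DETR improvements>.\n    2. Fixed some known issues.\n\n- 20251115\n\n    1. Added the DensityMap-driven innovation course [DFINE with Density-aware Query Selection].\n    2. Fixed some known issues.\n\n- 20251207\n\n    1. In test-only mode, yolo-metrice can now save the confusion matrix.\n    2. Added instance segmentation for DFine and DEIM; for usage, see the instance segmentation part of the advanced tutorials.\n    3. Updated the dataset/coco_analyzer.py script to report more information about the dataset, making it easier to analyze its characteristics.\n    4. Added the tools/visualization/tp_fp_fn_analysis.py script for analyzing tp, fp, and fn in detection results.\n    5. Added several improvement modules.\n    6. Fixed some known issues.\n    7. Added <TGRS2025-HighFrequencyDirectionInjection innovation-ideas course>.\n    8. Added ByteTrack-based object tracking; for the tutorial, see <Using ByteTrack for object tracking> in the advanced tutorials.\n\n- 20251213\n\n    1. Refactored the instance segmentation head code with reference to CVPR2022-MaskDINO.\n    2. Fixed a bug in the instance segmentation dataset code when ram_cache is enabled.\n    3. Re-recorded the advanced videos for the instance segmentation part.\n\n- 20251224\n\n    1. Added several improvement modules.\n    2. Fixed known issues in the instance segmentation part.\n    3. Added the DensityMap-driven instance segmentation head [DFINESeg with Density-aware Query Selection].\n    4. Added a video tutorial on using [DFINESeg with Density-aware Query Selection].\n    5. Updated the explanation of the instance segmentation implementation.\n\n- 20251226\n\n    1. Fixed some known issues.\n    2. Added COCO-Tiny metrics, including per-class COCO-Tiny metrics; for details, see <Notes on some extra yml parameters in this project> in UserGuide.md.\n\n- 20260109\n\n    1. Fixed some known issues.\n    2. Added the <ES-MoE> dynamic routing network module.\n    3. Updated video links.\n\n- 20260128\n\n    1. Fixed some known issues.\n    2. Added several improvement modules.\n    3. Added tutorial videos for the <ES-MoE> dynamic routing network.\n    4. Added tutorial videos for MSBlock and GQL from <TPAMI2025 YOLO-MS>.\n\n- 20260224\n\n    1. Fixed some known issues.\n    2. Added several improvement modules.\n    3. The compiled modules in compile_module now support 50-series GPUs.\n    4. For compatibility with 50-series users, the environment for the new version is standardized on torch2.8.0; users of older versions are not affected.\n\n- 20260310\n\n    1. Added diou, ciou, eiou, siou, shapeiou, piou, piou2.\n    2. Backbones from TIMM can now be used for training.\n    3. The DINOV3 version now supports Ultralytics-style training.\n    4. Added the AAAI2026-SPJFB module.\n    5. Added the TGRS2025-GLSS2D module.\n    6. Added the TIP2025-CAFM module.\n    7. Added the TIP2025-DWM_MSA module.\n    8. Added the DynamicERF module.\n    9. Added a how-to video on using other IoU variants.\n    10. Added a how-to video on TIMM backbones.\n    11. 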
The default of the yolo_metrice parameter changed from False to True, so both YOLO and COCO metrics are printed during training.\n\n### 7. Currently available modules\n\n- engine/extre_module/custom_nn/attention \n\n    1. engine/extre_module/custom_nn/attention/SEAM.py\n    2. CVPR2021|engine/extre_module/custom_nn/attention/ca.py\n    3. ICASSP2023|engine/extre_module/custom_nn/attention/ema.py\n    4. ICML2021|engine/extre_module/custom_nn/attention/simam.py\n    5. ICCV2023|engine/extre_module/custom_nn/attention/lsk.py\n    6. WACV2024|engine/extre_module/custom_nn/attention/DeformableLKA.py\n    7. engine/extre_module/custom_nn/attention/mlca.py\n    8. BIBM2024|engine/extre_module/custom_nn/attention/FSA.py\n    9. AAAI2025|engine/extre_module/custom_nn/attention/CDFA.py\n    10. engine/extre_module/custom_nn/attention/GLSA.py\n    11. TGRS2025|engine/extre_module/custom_nn/attention/MCA.py\n    12. CVPR2025|engine/extre_module/custom_nn/attention/CASAB.py \n    13. NN2025|engine/extre_module/custom_nn/attention/KSFA.py\n    14. TPAMI2025|engine/extre_module/custom_nn/attention/GQL.py\n    15. TGRS2025|engine/extre_module/custom_nn/attention/ACA.py\n    16. TGRS2025|engine/extre_module/custom_nn/attention/DHPF.py\n    17. TGRS2025|engine/extre_module/custom_nn/attention/ACAB.py\n\n- engine/extre_module/custom_nn/block\n\n    1. engine/extre_module/custom_nn/block/RepHMS.py\n    2. Self-developed module|engine/extre_module/custom_nn/block/rgcspelan.py\n    3. TPAMI2025|engine/extre_module/custom_nn/block/MANet.py\n\n- engine/extre_module/custom_nn/conv_module\n\n    1. CVPR2021|engine/extre_module/custom_nn/conv_module/dbb.py\n    2. IEEETIP2024|engine/extre_module/custom_nn/conv_module/deconv.py\n    3. ICCV2023|engine/extre_module/custom_nn/conv_module/dynamic_snake_conv.py\n    4. CVPR2023|engine/extre_module/custom_nn/conv_module/pconv.py\n    5. AAAI2025|engine/extre_module/custom_nn/conv_module/psconv.py\n    6. CVPR2025|engine/extre_module/custom_nn/conv_module/ShiftwiseConv.py\n    7. engine/extre_module/custom_nn/conv_module/wdbb.py\n    8. 
engine/extre_module/custom_nn/conv_module/deepdbb.py\n    9. ECCV2024|engine/extre_module/custom_nn/conv_module/wtconv2d.py\n    10. CVPR2023|engine/extre_module/custom_nn/conv_module/ScConv.py\n    11. engine/extre_module/custom_nn/conv_module/dcnv2.py\n    12. CVPR2024|engine/extre_module/custom_nn/conv_module/DilatedReparamConv.py\n    13. engine/extre_module/custom_nn/conv_module/gConv.py\n    14. CVPR2024|engine/extre_module/custom_nn/conv_module/IDWC.py\n    15. engine/extre_module/custom_nn/conv_module/DSA.py\n    16. CVPR2025|engine/extre_module/custom_nn/conv_module/FDConv.py\n    17. CVPR2023|engine/extre_module/custom_nn/conv_module/dcnv3.py\n    18. CVPR2024|engine/extre_module/custom_nn/conv_module/dcnv4.py\n    19. CVPR2024|engine/extre_module/custom_nn/conv_module/DynamicConv.py\n    20. CVPR2024|engine/extre_module/custom_nn/conv_module/FADC.py\n    21. CVPR2023|engine/extre_module/custom_nn/conv_module/SMPConv.py\n    22. MIA2025|engine/extre_module/custom_nn/conv_module/FourierConv.py\n    23. CVPR2024|engine/extre_module/custom_nn/conv_module/SFSConv.py\n    24. ICCV2025|engine/extre_module/custom_nn/conv_module/MBRConv.py\n    25. ICCV2025|engine/extre_module/custom_nn/conv_module/ConvAttn.py\n    26. ICCV2025|engine/extre_module/custom_nn/conv_module/Converse2D.py\n    27. CVPR2025|engine/extre_module/custom_nn/conv_module/gcconv.py\n    28. ACCV2024|engine/extre_module/custom_nn/conv_module/RMBC.py\n\n- engine/extre_module/custom_nn/upsample\n\n    1. CVPR2024|engine/extre_module/custom_nn/upsample/eucb.py\n    2. CVPR2024|engine/extre_module/custom_nn/upsample/eucb_sc.py\n    3. engine/extre_module/custom_nn/upsample/WaveletUnPool.py\n    4. ICCV2019|engine/extre_module/custom_nn/upsample/CARAFE.py\n    5. ICCV2023|engine/extre_module/custom_nn/upsample/DySample.py\n    6. ICCV2025|engine/extre_module/custom_nn/upsample/Converse2D_Up.py\n    7. 
CVPR2025|engine/extre_module/custom_nn/upsample/DSUB.py\n\n- engine/extre_module/custom_nn/downsample\n\n    1. IEEETIP2020|engine/extre_module/custom_nn/downsample/gcnet.py\n    2. Self-developed module|engine/extre_module/custom_nn/downsample/lawds.py \n    3. engine/extre_module/custom_nn/downsample/WaveletPool.py\n    4. engine/extre_module/custom_nn/downsample/ADown.py\n    5. engine/extre_module/custom_nn/downsample/YOLOV7Down.py\n    6. engine/extre_module/custom_nn/downsample/SPDConv.py\n    7. engine/extre_module/custom_nn/downsample/HWD.py\n    8. engine/extre_module/custom_nn/downsample/DRFD.py\n    9. TGRS2025|engine/extre_module/custom_nn/conv_module/FSConv.py\n\n- engine/extre_module/custom_nn/stem\n\n    1. engine/extre_module/custom_nn/stem/SRFD.py\n    2. engine/extre_module/custom_nn/stem/LoG.py\n    3. ICCV2023|engine/extre_module/custom_nn/stem/RepStem.py\n\n- engine/extre_module/custom_nn/featurefusion\n\n    1. Self-developed module|engine/extre_module/custom_nn/featurefusion/cgfm.py\n    2. BMVC2024|engine/extre_module/custom_nn/featurefusion/msga.py\n    3. CVPR2024|engine/extre_module/custom_nn/featurefusion/mfm.py\n    4. IEEETIP2023|engine/extre_module/custom_nn/featurefusion/CSFCN.py\n    5. BIBM2024|engine/extre_module/custom_nn/featurefusion/mpca.py\n    6. ACMMM2024|engine/extre_module/custom_nn/featurefusion/wfu.py\n    7. CVPR2025|engine/extre_module/custom_nn/featurefusion/GDSAFusion.py\n    8. engine/extre_module/custom_nn/featurefusion/PST.py\n    9. TGRS2025|engine/extre_module/custom_nn/featurefusion/MSAM.py\n    10. INFFUS2025|engine/extre_module/custom_nn/featurefusion/DPCF.py\n    11. CVPR2025|engine/extre_module/custom_nn/featurefusion/LCA.py\n    12. TGRS2025|engine/extre_module/custom_nn/featurefusion/HFFE.py\n    13. TGRS2025|engine/extre_module/custom_nn/featurefusion/MFPM.py\n    14. TGRS2025|engine/extre_module/custom_nn/featurefusion/ERM.py\n    15. 
TIP2025|engine/extre_module/custom_nn/featurefusion/CAFM.py\n\n- engine/extre_module/custom_nn/module\n\n    1. AAAI2025|engine/extre_module/custom_nn/module/APBottleneck.py\n    2. CVPR2025|engine/extre_module/custom_nn/module/efficientVIM.py\n    3. CVPR2023|engine/extre_module/custom_nn/module/fasterblock.py\n    4. CVPR2024|engine/extre_module/custom_nn/module/starblock.py\n    5. engine/extre_module/custom_nn/module/DWR.py\n    6. CVPR2024|engine/extre_module/custom_nn/module/UniRepLKBlock.py\n    7. CVPR2025|engine/extre_module/custom_nn/module/mambaout.py\n    8. AAAI2024|engine/extre_module/custom_nn/module/DynamicFilter.py\n    9. engine/extre_module/custom_nn/module/StripBlock.py\n    10. TGRS2024|engine/extre_module/custom_nn/module/elgca.py\n    11. CVPR2024|engine/extre_module/custom_nn/module/LEGM.py\n    12. ICCV2023|engine/extre_module/custom_nn/module/iRMB.py\n    13. TPAMI2025|engine/extre_module/custom_nn/module/MSBlock.py\n    14. ICLR2024|engine/extre_module/custom_nn/module/FATBlock.py\n    15. CVPR2024|engine/extre_module/custom_nn/module/MSCB.py\n    16. engine/extre_module/custom_nn/module/LEGBlock.py\n    17. CVPR2025|engine/extre_module/custom_nn/module/RCB.py\n    18. ECCV2024|engine/extre_module/custom_nn/module/JDPM.py\n    19. CVPR2025|engine/extre_module/custom_nn/module/vHeat.py\n    20. CVPR2025|engine/extre_module/custom_nn/module/EBlock.py\n    21. CVPR2025|engine/extre_module/custom_nn/module/DBlock.py\n    22. ECCV2024|engine/extre_module/custom_nn/module/FMB.py\n    23. CVPR2024|engine/extre_module/custom_nn/module/IDWB.py\n    24. ECCV2022|engine/extre_module/custom_nn/module/LFE.py\n    25. AAAI2025|engine/extre_module/custom_nn/module/FCM.py\n    26. CVPR2024|engine/extre_module/custom_nn/module/RepViTBlock.py\n    27. CVPR2024|engine/extre_module/custom_nn/module/PKIModule.py\n    28. CVPR2024|engine/extre_module/custom_nn/module/camixer.py\n    29. ICCV2025|engine/extre_module/custom_nn/module/ESC.py\n    30. 
CVPR2025|engine/extre_module/custom_nn/module/nnWNet.py\n    31. TGRS2025|engine/extre_module/custom_nn/module/ARF.py\n    32. AAAI2024|engine/extre_module/custom_nn/module/CFBlock.py\n    33. IJCV2024|engine/extre_module/custom_nn/module/FMA.py\n    34. engine/extre_module/custom_nn/module/LWGA.py\n    35. TGRS2025|engine/extre_module/custom_nn/module/CSSC.py\n    36. TGRS2025|engine/extre_module/custom_nn/module/CNCM.py\n    37. ICCV2025|engine/extre_module/custom_nn/module/HFRB.py\n    38. ICIP2025|engine/extre_module/custom_nn/module/EVA.py\n    39. CVPR2025|engine/extre_module/custom_nn/module/IEL.py\n    40. MICCAI2023|engine/extre_module/custom_nn/module/MFEBlock.py\n    41. AAAI2026|engine/extre_module/custom_nn/module/PartialNetBlock.py\n    42. TGRS2025|engine/extre_module/custom_nn/module/DRG.py\n    43. engine/extre_module/custom_nn/module/Wave2D.py\n    44. TGRS2025|engine/extre_module/custom_nn/module/GLGM.py\n    45. TGRS2025|engine/extre_module/custom_nn/module/MAC.py\n    46. AAAI2026|engine/extre_module/custom_nn/module/SPJFB.py\n\n- engine/extre_module/custom_nn/neck\n\n    1. Self-developed module|engine/extre_module/custom_nn/neck/FDPN.py\n\n- engine/extre_module/custom_nn/neck_module\n\n    1. TPAMI2025|engine/extre_module/custom_nn/neck_module/HyperCompute.py\n    2. engine/extre_module/custom_nn/neck_module/HyperACE.py\n    3. engine/extre_module/custom_nn/neck_module/GoldYOLO.py\n    4. AAAI2025|engine/extre_module/custom_nn/neck_module/HS_FPN.py\n\n- engine/extre_module/custom_nn/norm\n\n    1. ICML2024|engine/extre_module/custom_nn/transformer/repbn.py\n    2. CVPR2025|engine/extre_module/custom_nn/transformer/dyt.py\n    3. engine/extre_module/custom_nn/norm/derf.py\n\n- engine/extre_module/custom_nn/transformer\n\n    1. ICLR2025|engine/extre_module/custom_nn/transformer/PolaLinearAttention.py\n    2. CVPR2023|engine/extre_module/custom_nn/transformer/biformer.py\n    3. 
CVPR2023|engine/extre_module/custom_nn/transformer/CascadedGroupAttention.py\n    4. CVPR2022|engine/extre_module/custom_nn/transformer/DAttention.py\n    5. ICLR2022|engine/extre_module/custom_nn/transformer/DPBAttention.py\n    6. CVPR2024|engine/extre_module/custom_nn/transformer/AdaptiveSparseSA.py\n    7. engine/extre_module/custom_nn/transformer/GSA.py\n    8. engine/extre_module/custom_nn/transformer/RSA.py\n    9. ECCV2024|engine/extre_module/custom_nn/transformer/FSSA.py\n    10. AAAI2025|engine/extre_module/custom_nn/transformer/DilatedGCSA.py\n    11. AAAI2025|engine/extre_module/custom_nn/transformer/DilatedMWSA.py\n    12. CVPR2024|engine/extre_module/custom_nn/transformer/SHSA.py\n    13. IJCAI2024|engine/extre_module/custom_nn/transformer/CTA.py\n    14. IJCAI2024|engine/extre_module/custom_nn/transformer/SFA.py\n    15. engine/extre_module/custom_nn/transformer/MSLA.py\n    16. ACMMM2025|engine/extre_module/custom_nn/transformer/CPIA_SA.py\n    17. NN2025|engine/extre_module/custom_nn/transformer/TokenSelectAttention.py\n    18. CVPR2025|engine/extre_module/custom_nn/transformer/TAB.py\n    19. TPAMI2025|engine/extre_module/custom_nn/transformer/LRSA.py\n    20. ICCV2025|engine/extre_module/custom_nn/transformer/MALA.py\n    21. ICML2023|engine/extre_module/custom_nn/transformer/MUA.py\n    22. ACMMM2025|engine/extre_module/custom_nn/transformer/EGSA.py\n    23. ACMMM2025|engine/extre_module/custom_nn/transformer/SWSA.py\n    24. AAAI2026|engine/extre_module/custom_nn/transformer/DHOGSA.py\n    25. NeurIPS2025|engine/extre_module/custom_nn/transformer/CBSA.py\n    26. TGRS2025|engine/extre_module/custom_nn/transformer/DPWA.py\n    27. TIP2025|engine/extre_module/custom_nn/transformer/DWM_MSA.py\n\n- engine/extre_module/custom_nn/mlp\n\n    1. CVPR2024|engine/extre_module/custom_nn/mlp/ConvolutionalGLU.py\n    2. IJCAI2024|engine/extre_module/custom_nn/mlp/DFFN.py\n    3. ICLR2024|engine/extre_module/custom_nn/mlp/FMFFN.py\n    4. 
CVPR2024|engine/extre_module/custom_nn/mlp/FRFN.py\n    5. ECCV2024|engine/extre_module/custom_nn/mlp/EFFN.py \n    6. WACV2025|engine/extre_module/custom_nn/mlp/SEFN.py\n    7. ICLR2025|engine/extre_module/custom_nn/mlp/KAN.py\n    8. CVPR2025|engine/extre_module/custom_nn/mlp/EDFFN.py\n    9. IJCV2024|engine/extre_module/custom_nn/mlp/DML.py\n    10. AAAI2026|engine/extre_module/custom_nn/mlp/DIFF.py\n\n- engine/extre_module/custom_nn/mamba\n\n    1. AAAI2025|engine/extre_module/custom_nn/mamba/SS2D.py\n    2. CVPR2025|engine/extre_module/custom_nn/mamba/ASSM.py\n    3. CVPR2025|engine/extre_module/custom_nn/mamba/SAVSS.py\n    4. CVPR2025|engine/extre_module/custom_nn/mamba/MobileMamba/mobilemamba.py\n    5. CVPR2025|engine/extre_module/custom_nn/mamba/MaIR.py\n    6. TGRS2025|engine/extre_module/custom_nn/mamba/GLVSS.py\n    7. ICCV2025|engine/extre_module/custom_nn/mamba/VSSD.py\n    8. ICCV2025|engine/extre_module/custom_nn/mamba/TinyViM.py\n    9. INFFUS2025|engine/extre_module/custom_nn/mamba/CSI.py\n    10. TIP2025|engine/extre_module/custom_nn/mamba/SFMB.py\n    11. TGRS2025|engine/extre_module/custom_nn/mamba/GLSS.py\n    12. TGRS2025|engine/extre_module/custom_nn/mamba/GLSS2D.py\n\n- engine/extre_module/custom_nn/moe\n\n    1. engine/extre_module/custom_nn/moe/moe_module.py\n\n- engine/extre_module/custom_nn/featurepreprocess\n\n    1. TGRS2025|engine/extre_module/custom_nn/featurepreprocess/FAENet.py\n\n- Building-block modules; example tutorial: engine/extre_module/custom_nn/module/example.py\n\n    1. YOLOV5|C3\n    2. YOLOV8|C2f\n    3. YOLO11|C3k2\n    4. TPAMI2025|MANet\n    5. TPAMI2024|MetaFormer_Block\n    6. TPAMI2024+CVPR2025|MetaFormer_Mona\n    7. TPAMI2024+CVPR2025+WACV2025|MetaFormer_SEFN\n    8. TPAMI2024+CVPR2025+WACV2025|MetaFormer_Mona_SEFN\n\n- Innovation course code <the label indicates which course the code belongs to; see the corresponding course video for details>\n\n    1. Course on the Partial idea in top-conference papers|engine/extre_module/innovate/CVPR2020_GhostConv.py\n    2. Course on the Partial idea in top-conference papers|engine/extre_module/innovate/CVPR2023_PartialConv.py\n    3. 
Secondary innovation on the Long-Range WTB-Mamba from CVPR2025-MobileMamba|engine/extre_module/innovate/CVPR2025_MobileMamba.py\n    4. Course on TGRS2025-HighFrequencyDirectionInjection innovation ideas|engine/extre_module/innovate/TGRS2025_HFDI.py"
  },
  {
    "path": "damo-yolo/Annotations/ReadMe.md",
    "content": "# Folder for VOC-format annotation files"
  },
  {
    "path": "damo-yolo/JPEGImages/ReadMe.md",
    "content": "# Folder for images"
  },
  {
    "path": "damo-yolo/readme.md",
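    "content": "# Dataset processing scripts for DAMO-YOLO\nThe scripts in this directory prepare datasets for DAMO-YOLO. Supported features:\n1. Convert VOC-format annotations to COCO format and generate train.json, val.json, and test.json.\n\n# Usage\n1. Put the images in JPEGImages. All images must share the same extension (for example, all jpg or all png); mixed extensions are not supported.\n2. Put the VOC-format XML annotation files in Annotations.\n3. Run voc2coco.py. The postfix parameter is the image extension used in JPEGImages, train_ratio is the training split ratio, val_ratio is the validation split ratio, and the remainder becomes the test split."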
  },
  {
    "path": "damo-yolo/voc2coco.py",
    "content": "import os\nimport glob\nimport json\nimport shutil\nimport numpy as np\nimport xml.etree.ElementTree as ET\n \nSTART_BOUNDING_BOX_ID = 1\n\ndef find_classes(path):\n    classes = []\n    for i in os.listdir(path):\n        try:\n            in_file = open(os.path.join(path, i), encoding='utf-8')\n            tree=ET.parse(in_file)\n            root = tree.getroot()\n\n            for obj in root.iter('object'):\n                difficult = 0 \n                if obj.find('difficult')!=None:\n                    difficult = obj.find('difficult').text\n                cls = obj.find('name').text\n                if cls not in classes:\n                    classes.append(cls)\n        except Exception as e:\n            print(os.path.join(path, i), e)\n    return classes\n\ndef get(root, name):\n    return root.findall(name)\n \n \ndef get_and_check(root, name, length):\n    vars = root.findall(name)\n    if len(vars) == 0:\n        raise NotImplementedError('Can not find %s in %s.'%(name, root.tag))\n    if length > 0 and len(vars) != length:\n        raise NotImplementedError('The size of %s is supposed to be %d, but is %d.'%(name, length, len(vars)))\n    if length == 1:\n        vars = vars[0]\n    return vars\n \n \ndef convert(xml_list, json_file):\n    json_dict = {\"info\":['none'], \"license\":['none'], \"images\": [], \"annotations\": [], \"categories\": []}\n    categories = pre_define_categories.copy()\n    bnd_id = START_BOUNDING_BOX_ID\n    all_categories = {}\n    for index, line in enumerate(xml_list):\n        # print(\"Processing %s\"%(line))\n        xml_f = line\n        tree = ET.parse(xml_f)\n        root = tree.getroot()\n        \n        filename = os.path.basename(xml_f)[:-4] + f\".{postfix}\"\n            \n        image_id = index\n        \n        size = get_and_check(root, 'size', 1)\n        width = int(get_and_check(size, 'width', 1).text)\n        height = int(get_and_check(size, 'height', 1).text)\n        image = 
{'file_name': filename, 'height': height, 'width': width, 'id':image_id}\n        json_dict['images'].append(image)\n        ## Currently we do not support segmentation\n        #  segmented = get_and_check(root, 'segmented', 1).text\n        #  assert segmented == '0'\n        for obj in get(root, 'object'):\n            category = get_and_check(obj, 'name', 1).text\n            if category in all_categories:\n                all_categories[category] += 1\n            else:\n                all_categories[category] = 1\n            if category not in categories:\n                if only_care_pre_define_categories:\n                    continue\n                new_id = len(categories) + 1\n                print(\"[warning] category '{}' not in 'pre_define_categories'({}), create new id: {} automatically\".format(category, pre_define_categories, new_id))\n                categories[category] = new_id\n            category_id = categories[category]\n            bndbox = get_and_check(obj, 'bndbox', 1)\n            xmin = int(float(get_and_check(bndbox, 'xmin', 1).text))\n            ymin = int(float(get_and_check(bndbox, 'ymin', 1).text))\n            xmax = int(float(get_and_check(bndbox, 'xmax', 1).text))\n            ymax = int(float(get_and_check(bndbox, 'ymax', 1).text))\n            # if (xmax > xmin) or (ymax > ymin):\n            #     continue\n            # assert(xmax > xmin), \"xmax <= xmin, {}\".format(line)\n            # assert(ymax > ymin), \"ymax <= ymin, {}\".format(line)\n            o_width = abs(xmax - xmin)\n            o_height = abs(ymax - ymin)\n            ann = {'area': o_width*o_height, 'iscrowd': 0, 'image_id':\n                   image_id, 'bbox':[xmin, ymin, o_width, o_height],\n                   'category_id': category_id, 'id': bnd_id, 'ignore': 0,\n                   'segmentation': []}\n            json_dict['annotations'].append(ann)\n            bnd_id = bnd_id + 1\n \n    for cate, cid in categories.items():\n        cat = 
{'supercategory': 'none', 'id': cid, 'name': cate}\n        json_dict['categories'].append(cat)\n    # Write the COCO dictionary; the context manager closes the file even on error\n    with open(json_file, 'w') as json_fp:\n        json.dump(json_dict, json_fp)\n    print(\"------------create {} done--------------\".format(json_file))\n    print(\"find {} categories: {} -->>> your pre_define_categories {}: {}\".format(len(all_categories), all_categories.keys(), len(pre_define_categories), pre_define_categories.keys()))\n    print(\"category: id --> {}\".format(categories))\n    print(categories.keys())\n    print(categories.values())\n \n \nif __name__ == '__main__':\n    postfix = 'jpg'\n    # Folder containing the XML annotation files\n    xml_dir = './datasets/Annotations'\n    # JSON file for the training split\n    save_json_train = './datasets/train.json'\n    # JSON file for the validation split\n    save_json_val = './datasets/val.json'\n    # JSON file for the test split\n    save_json_test = './datasets/test.json'\n    # Class names; for multiple classes, add the names to classes, e.g. ['dog', 'person', 'cat']\n    classes = []\n    \n    # Whether to scan all XML files first to collect the classes\n    get_data_classes = True\n    # Whether to keep only the categories listed in classes\n    only_care_pre_define_categories = False\n\n    if get_data_classes:\n        classes = find_classes(xml_dir)\n        only_care_pre_define_categories = False\n\n    pre_define_categories = {}\n    for i, cls in enumerate(classes):\n        pre_define_categories[cls] = i + 1\n    print(pre_define_categories)\n\n    # Training and validation split ratios; the remainder becomes the test split\n    train_ratio = 0.7\n    val_ratio = 0.1\n    print('xml_dir is {}'.format(xml_dir))\n    xml_list = glob.glob(xml_dir + \"/*.xml\")  \n    xml_list = np.sort(xml_list)\n#     print('xml_list is {}'.format(xml_list))\n    np.random.seed(100)\n    np.random.shuffle(xml_list)\n \n    train_num = int(len(xml_list)*train_ratio)\n    val_num = int(len(xml_list)*val_ratio)\n    print('Number of training samples: {}'.format(train_num))\n    print('Number of validation samples: {}'.format(val_num))\n    print('Number of test samples: {}'.format(len(xml_list) - train_num - val_num))\n    xml_list_val = xml_list[:val_num]\n    xml_list_train = 
xml_list[val_num:train_num+val_num]\n    xml_list_test = xml_list[train_num+val_num:]  \n    # 对训练数据集对应的xml进行coco转换   \n    convert(xml_list_train, save_json_train)\n    # 对验证数据集的xml进行coco转换\n    convert(xml_list_val, save_json_val)\n    # 对测试数据集的xml进行coco转换\n    convert(xml_list_test, save_json_test)"
  },
  {
    "path": "data-offline-aug/object_detection_data_aug.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nimport os, shutil, cv2, tqdm\nimport numpy as np\nimport albumentations as A\nfrom PIL import Image\nfrom multiprocessing import Pool\nfrom typing import Callable, Dict, List, Union\n\n# https://github.com/albumentations-team/albumentations\n# https://albumentations.ai/docs/api_reference/augmentations/geometric/transforms/#geometric-transforms-augmentationsgeometrictransforms:~:text=Contributing%20to%20Albumentations-,Geometric%20transforms%20(augmentations.geometric.transforms),-%C2%B6\n\nIMAGE_PATH = 'dataset/object_detection/images'\nLABEL_PATH = 'dataset/object_detection/labels'\nAUG_IMAGE_PATH = 'dataset/object_detection/images_aug'\nAUG_LABEL_PATH = 'dataset/object_detection/labels_aug'\nSHOW_SAVE_PATH = 'results'\nCLASSES = ['head', 'person']\n\nENHANCEMENT_LOOP = 1\nENHANCEMENT_STRATEGY = A.Compose([\n    A.Compose([\n        A.Affine(scale=[0.5, 1.5], translate_percent=[0.0, 0.3], rotate=[-360, 360], shear=[-45, 45], keep_ratio=True, p=0.5), # Augmentation to apply affine transformations to images.\n        A.BBoxSafeRandomCrop(erosion_rate=0.2, p=0.1), # Crop a random part of the input without loss of bboxes.\n        A.D4(p=0.1), # Applies one of the eight possible D4 dihedral group transformations to a square-shaped input, maintaining the square shape. These transformations correspond to the symmetries of a square, including rotations and reflections.\n        A.ElasticTransform(p=0.1), # Elastic deformation of images as described in [Simard2003]_ (with modifications).\n        A.Flip(p=0.1), # Flip the input either horizontally, vertically or both horizontally and vertically.\n        A.GridDistortion(p=0.1), # Applies grid distortion augmentation to images, masks, and bounding boxes. 
This technique involves dividing the image into a grid of cells and randomly displacing the intersection points of the grid, resulting in localized distortions.\n        A.Perspective(p=0.1), # Perform a random four point perspective transform of the input.\n    ], p=1.0),\n    \n    A.Compose([\n        A.GaussNoise(p=0.1), # Apply Gaussian noise to the input image.\n        A.ISONoise(p=0.1), # Apply camera sensor noise.\n        A.ImageCompression(quality_lower=50, quality_upper=100, p=0.1), # Decreases image quality by Jpeg, WebP compression of an image.\n        A.RandomBrightnessContrast(p=0.1), # Randomly change brightness and contrast of the input image.\n        A.RandomFog(p=0.1), # Simulates fog for the image.\n        A.RandomRain(p=0.1), # Adds rain effects to an image.\n        A.RandomSnow(p=0.1), # Bleach out some pixel values imitating snow.\n        A.RandomShadow(p=0.1), # Simulates shadows for the image\n        A.RandomSunFlare(p=0.1), # Simulates Sun Flare for the image\n        A.ToGray(p=0.1), # Convert the input RGB image to grayscale\n    ], p=1.0)\n    \n    # A.OneOf([\n    #     A.GaussNoise(p=1.0), # Apply Gaussian noise to the input image.\n    #     A.ISONoise(p=1.0), # Apply camera sensor noise.\n    #     A.ImageCompression(quality_lower=50, quality_upper=100, p=1.0), # Decreases image quality by Jpeg, WebP compression of an image.\n    #     A.RandomBrightnessContrast(p=1.0), # Randomly change brightness and contrast of the input image.\n    #     A.RandomFog(p=1.0), # Simulates fog for the image.\n    #     A.RandomRain(p=1.0), # Adds rain effects to an image.\n    #     A.RandomSnow(p=1.0), # Bleach out some pixel values imitating snow.\n    #     A.RandomShadow(p=1.0), # Simulates shadows for the image\n    #     A.RandomSunFlare(p=1.0), # Simulates Sun Flare for the image\n    #     A.ToGray(p=1.0), # Convert the input RGB image to grayscale\n    # ], p=1.0),\n], bbox_params=A.BboxParams(format='yolo', min_visibility=0.1, 
label_fields=['class_labels']))\n\ndef parallelise(function: Callable, data: List, chunksize=100, verbose=True, num_workers=os.cpu_count()) -> List:\n    num_workers = 1 if num_workers < 1 else num_workers  # Pool needs to have at least 1 worker.\n    pool = Pool(processes=num_workers)\n    results = list(\n        tqdm.tqdm(pool.imap(function, data, chunksize), total=len(data), disable=not verbose)\n    )\n    pool.close()\n    pool.join()\n    return results\n\ndef draw_detections(box, name, img):\n    height, width, _ = img.shape\n    xmin, ymin, xmax, ymax = list(map(int, list(box)))\n    \n    # Scale the box line width and the text size with the image size\n    line_thickness = max(1, int(min(height, width) / 200))\n    font_scale = min(height, width) / 500\n    font_thickness = max(1, int(min(height, width) / 200))\n    # Scale the vertical text offset with the image size\n    text_offset_y = int(min(height, width) / 50)\n    \n    cv2.rectangle(img, (xmin, ymin), (xmax, ymax), (0, 0, 255), line_thickness)\n    cv2.putText(img, str(name), (xmin, ymin - text_offset_y), cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0, 255, 0), font_thickness, lineType=cv2.LINE_AA)\n    return img\n\ndef show_labels(images_base_path, labels_base_path):\n    if os.path.exists(SHOW_SAVE_PATH):\n        shutil.rmtree(SHOW_SAVE_PATH)\n    os.makedirs(SHOW_SAVE_PATH, exist_ok=True)\n    \n    for images_name in tqdm.tqdm(os.listdir(images_base_path)):\n        file_heads, _ = os.path.splitext(images_name)\n        # images_path = f'{images_base_path}/{images_name}'\n        images_path = os.path.join(images_base_path, images_name)\n        # labels_path = f'{labels_base_path}/{file_heads}.txt'\n        labels_path = os.path.join(labels_base_path, f'{file_heads}.txt')\n        if os.path.exists(labels_path):\n            with open(labels_path) as f:\n                labels = np.array(list(map(lambda x:np.array(x.strip().split(), dtype=np.float64), f.readlines())), dtype=np.float64)\n            images = cv2.imread(images_path)\n            height, width, _ = 
images.shape\n            for cls, x_center, y_center, w, h in labels:\n                x_center *= width\n                y_center *= height\n                w *= width\n                h *= height\n                draw_detections([x_center - w // 2, y_center - h // 2, x_center + w // 2, y_center + h // 2], CLASSES[int(cls)], images)\n            # cv2.imwrite(f'{SHOW_SAVE_PATH}/{images_name}', images)\n            cv2.imwrite(os.path.join(SHOW_SAVE_PATH, images_name), images)\n            print(f'{SHOW_SAVE_PATH}/{images_name} save success...')\n        else:\n            print(f'{labels_path} label file not found...')\n\ndef data_aug_single(images_name):\n    file_heads, postfix = os.path.splitext(images_name)\n    # images_path = f'{IMAGE_PATH}/{images_name}'\n    images_path = os.path.join(IMAGE_PATH, images_name)\n    # labels_path = f'{LABEL_PATH}/{file_heads}.txt'\n    labels_path = os.path.join(LABEL_PATH, f'{file_heads}.txt')\n    if os.path.exists(labels_path):\n        with open(labels_path) as f:\n            labels = np.array(list(map(lambda x:np.array(x.strip().split(), dtype=np.float64), f.readlines())), dtype=np.float64)\n        images = Image.open(images_path)\n        for i in range(ENHANCEMENT_LOOP):\n            # new_images_name = f'{AUG_IMAGE_PATH}/{file_heads}_{i:0>3}{postfix}'\n            new_images_name = os.path.join(AUG_IMAGE_PATH, f'{file_heads}_{i:0>3}{postfix}')\n            # new_labels_name = f'{AUG_LABEL_PATH}/{file_heads}_{i:0>3}.txt'\n            new_labels_name = os.path.join(AUG_LABEL_PATH, f'{file_heads}_{i:0>3}.txt')\n            try:\n                transformed = ENHANCEMENT_STRATEGY(image=np.array(images), bboxes=np.minimum(np.maximum(labels[:, 1:], 0), 1), class_labels=labels[:, 0])\n            except:\n                continue\n            transformed_image = transformed['image']\n            transformed_bboxes = transformed['bboxes']\n            transformed_class_labels = transformed['class_labels']\n            \n  
          cv2.imwrite(new_images_name, cv2.cvtColor(transformed_image, cv2.COLOR_RGB2BGR))\n            with open(new_labels_name, 'w+') as f:\n                for bbox, cls in zip(transformed_bboxes, transformed_class_labels):\n                    f.write(f'{cls} {bbox[0]} {bbox[1]} {bbox[2]} {bbox[3]}\\n')\n            print(f'{new_images_name} and {new_labels_name} save success...')\n    else:\n        print(f'{labels_path} label file not found...')\n\ndef data_aug():\n    if os.path.exists(AUG_IMAGE_PATH):\n        shutil.rmtree(AUG_IMAGE_PATH)\n    if os.path.exists(AUG_LABEL_PATH):\n        shutil.rmtree(AUG_LABEL_PATH)\n        \n    os.makedirs(AUG_IMAGE_PATH, exist_ok=True)\n    os.makedirs(AUG_LABEL_PATH, exist_ok=True)\n\n    for images_name in tqdm.tqdm(os.listdir(IMAGE_PATH)):\n        data_aug_single(images_name)\n    \nif __name__ == '__main__':\n    # data_aug()\n    \n    # show_labels(IMAGE_PATH, LABEL_PATH)\n    show_labels(AUG_IMAGE_PATH, AUG_LABEL_PATH)\n    "
  },
  {
    "path": "data-offline-aug/readme.md",
    "content": "# data-offline-aug\n\n### Environment\n\n    pip install -i https://pypi.tuna.tsinghua.edu.cn/simple albumentations\n\n### 1. object_detection_data_aug.py\n\n    Offline data augmentation script for object detection datasets in YOLO format.\n    Video tutorial: https://www.bilibili.com/video/BV1bT421k7iq/\n\n### 2. segment_data_aug.py\n\n    Offline data augmentation script for semantic segmentation.\n    Video tutorial: https://www.bilibili.com/video/BV1xi421a7Gb/\n\n# Reference\nhttps://github.com/albumentations-team/albumentations  "
  },
  {
    "path": "data-offline-aug/segment_data_aug.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nimport os, shutil, cv2, tqdm\nimport numpy as np\nnp.random.seed(0)\nimport albumentations as A\nfrom PIL import Image\nfrom multiprocessing import Pool\nfrom typing import Callable, Dict, List, Union\n\n# https://github.com/albumentations-team/albumentations\n\ndef generate_color_map(num_classes):\n    hsv_colors = [(i * 180 // num_classes, 255, 255) for i in range(num_classes)]\n    rgb_colors = [[0, 0, 0]] + [cv2.cvtColor(np.uint8([[color]]), cv2.COLOR_HSV2BGR)[0][0] for color in hsv_colors]\n    return np.array(rgb_colors, dtype=np.uint8)\n\nIMAGE_PATH = 'dataset/segment/images'\nLABEL_PATH = 'dataset/segment/labels'\nAUG_IMAGE_PATH = 'dataset/segment/images_aug'\nAUG_LABEL_PATH = 'dataset/segment/labels_aug'\nSHOW_SAVE_PATH = 'results'\nCOLORS = generate_color_map(20)\n\nENHANCEMENT_LOOP = 1\nENHANCEMENT_STRATEGY = A.Compose([\n    A.Compose([\n        A.Affine(scale=[0.5, 1.5], translate_percent=[0.0, 0.3], rotate=[-360, 360], shear=[-45, 45], keep_ratio=True, cval_mask=0, p=0.5), # Augmentation to apply affine transformations to images.\n        A.BBoxSafeRandomCrop(erosion_rate=0.2, p=0.1), # Crop a random part of the input without loss of bboxes.\n        A.D4(p=0.1), # Applies one of the eight possible D4 dihedral group transformations to a square-shaped input, maintaining the square shape. These transformations correspond to the symmetries of a square, including rotations and reflections.\n        A.ElasticTransform(p=0.1), # Elastic deformation of images as described in [Simard2003]_ (with modifications).\n        A.Flip(p=0.1), # Flip the input either horizontally, vertically or both horizontally and vertically.\n        A.GridDistortion(p=0.1), # Applies grid distortion augmentation to images, masks, and bounding boxes. 
This technique involves dividing the image into a grid of cells and randomly displacing the intersection points of the grid, resulting in localized distortions.\n        A.Perspective(p=0.1), # Perform a random four point perspective transform of the input.\n    ], p=1.0),\n    \n    A.Compose([\n        A.GaussNoise(p=0.1), # Apply Gaussian noise to the input image.\n        A.ISONoise(p=0.1), # Apply camera sensor noise.\n        A.ImageCompression(quality_lower=50, quality_upper=100, p=0.1), # Decreases image quality by Jpeg, WebP compression of an image.\n        A.RandomBrightnessContrast(p=0.1), # Randomly change brightness and contrast of the input image.\n        A.RandomFog(p=0.1), # Simulates fog for the image.\n        A.RandomRain(p=0.1), # Adds rain effects to an image.\n        A.RandomSnow(p=0.1), # Bleach out some pixel values imitating snow.\n        A.RandomShadow(p=0.1), # Simulates shadows for the image\n        A.RandomSunFlare(p=0.1), # Simulates Sun Flare for the image\n        A.ToGray(p=0.1), # Convert the input RGB image to grayscale\n    ], p=1.0)\n    \n    # A.OneOf([\n    #     A.GaussNoise(p=1.0), # Apply Gaussian noise to the input image.\n    #     A.ISONoise(p=1.0), # Apply camera sensor noise.\n    #     A.ImageCompression(quality_lower=50, quality_upper=100, p=1.0), # Decreases image quality by Jpeg, WebP compression of an image.\n    #     A.RandomBrightnessContrast(p=1.0), # Randomly change brightness and contrast of the input image.\n    #     A.RandomFog(p=1.0), # Simulates fog for the image.\n    #     A.RandomRain(p=1.0), # Adds rain effects to an image.\n    #     A.RandomSnow(p=1.0), # Bleach out some pixel values imitating snow.\n    #     A.RandomShadow(p=1.0), # Simulates shadows for the image\n    #     A.RandomSunFlare(p=1.0), # Simulates Sun Flare for the image\n    #     A.ToGray(p=1.0), # Convert the input RGB image to grayscale\n    # ], p=1.0),\n], is_check_shapes=False)\n\ndef draw_segments(image, mask):\n    
blended_image = cv2.addWeighted(image, 0.7, COLORS[mask], 0.3, 0)\n    return blended_image\n\ndef show_labels(images_base_path, labels_base_path):\n    if os.path.exists(SHOW_SAVE_PATH):\n        shutil.rmtree(SHOW_SAVE_PATH)\n    os.makedirs(SHOW_SAVE_PATH, exist_ok=True)\n    \n    for images_name in tqdm.tqdm(os.listdir(images_base_path)):\n        file_heads, _ = os.path.splitext(images_name)\n        # images_path = f'{images_base_path}/{images_name}'\n        images_path = os.path.join(images_base_path, images_name)\n        # labels_path = f'{labels_base_path}/{file_heads}.png'\n        labels_path = os.path.join(labels_base_path, f'{file_heads}.png')\n        if os.path.exists(labels_path):\n            images = cv2.imread(images_path)\n            masks = np.array(Image.open(labels_path))\n            print(np.unique(masks))\n            images = draw_segments(images, masks)\n            cv2.imwrite(f'{SHOW_SAVE_PATH}/{images_name}', images)\n            print(f'{SHOW_SAVE_PATH}/{images_name} save success...')\n        else:\n            print(f'{labels_path} label file not found...')\n\ndef data_aug_single(images_name):\n    file_heads, postfix = os.path.splitext(images_name)\n    # images_path = f'{IMAGE_PATH}/{images_name}'\n    images_path = os.path.join(IMAGE_PATH, images_name)\n    # labels_path = f'{LABEL_PATH}/{file_heads}.jpg'\n    labels_path = os.path.join(LABEL_PATH, f'{file_heads}.jpg')\n    if os.path.exists(labels_path):\n        images = Image.open(images_path)\n        masks = np.array(Image.open(labels_path))\n        for i in range(ENHANCEMENT_LOOP):\n            # new_images_name = f'{AUG_IMAGE_PATH}/{file_heads}_{i:0>3}{postfix}'\n            new_images_name = os.path.join(AUG_IMAGE_PATH, f'{file_heads}_{i:0>3}{postfix}')\n            # new_labels_name = f'{AUG_LABEL_PATH}/{file_heads}_{i:0>3}.png'\n            new_labels_name = os.path.join(AUG_LABEL_PATH, f'{file_heads}_{i:0>3}.png')\n            try:\n                transformed = 
ENHANCEMENT_STRATEGY(image=np.array(images), masks=[masks])\n            except Exception:\n                # Skip this sample if the augmentation pipeline fails on it\n                continue\n            transformed_image = transformed['image']\n            transformed_masks = transformed['masks'][0]\n            \n            cv2.imwrite(new_images_name, cv2.cvtColor(transformed_image, cv2.COLOR_RGB2BGR))\n            Image.fromarray(np.array(transformed_masks)).save(new_labels_name)\n            print(f'{new_images_name} and {new_labels_name} save success...')\n    else:\n        print(f'{labels_path} label file not found...')\n\ndef data_aug():\n    if os.path.exists(AUG_IMAGE_PATH):\n        shutil.rmtree(AUG_IMAGE_PATH)\n    if os.path.exists(AUG_LABEL_PATH):\n        shutil.rmtree(AUG_LABEL_PATH)\n        \n    os.makedirs(AUG_IMAGE_PATH, exist_ok=True)\n    os.makedirs(AUG_LABEL_PATH, exist_ok=True)\n\n    for images_name in tqdm.tqdm(os.listdir(IMAGE_PATH)):\n        data_aug_single(images_name)\n\nif __name__ == '__main__':\n    show_labels(IMAGE_PATH, LABEL_PATH)\n    # show_labels(AUG_IMAGE_PATH, AUG_LABEL_PATH)\n    \n    # data_aug()"
  },
  {
    "path": "mmdet-course/config/atss_r50_fpn_dyhead_1x_visdrone.py",
    "content": "_base_ = 'atss_r50_fpn_dyhead_1x_coco.py'\n\nmodel = dict(\n    bbox_head=dict(\n        num_classes=10\n    )\n)\n\n# Override the dataset settings\ndata_root = '/home/hjj/Desktop/dataset/dataset_visdrone/'\nmetainfo = {\n    'classes': ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'),\n    # 'palette': [\n    #     (220, 20, 60),\n    # ]\n}\ntrain_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-train/annotations/train.json',\n        data_prefix=dict(img='VisDrone2019-DET-train/images/')))\nval_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-val/annotations/val.json',\n        data_prefix=dict(img='VisDrone2019-DET-val/images/')))\ntest_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-test-dev/annotations/test.json',\n        data_prefix=dict(img='VisDrone2019-DET-test-dev/images/')))\n\n# Override the evaluator settings\nval_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-val/annotations/val.json')\ntest_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-test-dev/annotations/test.json')\n\n# optim_wrapper = dict(type='AmpOptimWrapper')\n\ndefault_hooks = dict(logger=dict(type='LoggerHook', interval=200))\n\nload_from='atss_r50_fpn_dyhead_4x4_1x_coco_20211219_023314-eaa620c6.pth'\n\n# nohup python tools/train.py configs/dyhead/atss_r50_fpn_dyhead_1x_visdrone.py > atss-dyhead-visdrone.log 2>&1 & tail -f atss-dyhead-visdrone.log\n# python tools/test.py configs/dyhead/atss_r50_fpn_dyhead_1x_visdrone.py work_dirs/tood_r50_fpn_1x_visdrone/epoch_12.pth --show --show-dir test_save\n# python tools/test.py configs/dyhead/atss_r50_fpn_dyhead_1x_visdrone.py 
work_dirs/tood_r50_fpn_1x_visdrone/epoch_12.pth --tta "
  },
  {
    "path": "mmdet-course/config/cascade-rcnn_r50_fpn_1x_visdrone.py",
    "content": "_base_ = './cascade-rcnn_r50_fpn_1x_coco.py'\n\n# Also change num_classes in each head to match the number of classes in the dataset\nmodel = dict(\n    roi_head=dict(\n        bbox_head=[\n            dict(\n                type='Shared2FCBBoxHead',\n                num_classes=10\n            ),\n            dict(\n                type='Shared2FCBBoxHead',\n                num_classes=10\n            ),\n            dict(\n                type='Shared2FCBBoxHead',\n                num_classes=10\n            ),\n        ]\n    )\n)\n\n# Override the dataset settings\ndata_root = '/home/hjj/Desktop/dataset/dataset_visdrone/'\nmetainfo = {\n    'classes': ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'),\n    # 'palette': [\n    #     (220, 20, 60),\n    # ]\n}\ntrain_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-train/annotations/train.json',\n        data_prefix=dict(img='VisDrone2019-DET-train/images/')))\nval_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-val/annotations/val.json',\n        data_prefix=dict(img='VisDrone2019-DET-val/images/')))\ntest_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-test-dev/annotations/test.json',\n        data_prefix=dict(img='VisDrone2019-DET-test-dev/images/')))\n\n# Override the evaluator settings\nval_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-val/annotations/val.json')\ntest_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-test-dev/annotations/test.json')\n\n# optim_wrapper = dict(type='AmpOptimWrapper')\n\ndefault_hooks = dict(logger=dict(type='LoggerHook', interval=200))\n\nload_from='cascade_rcnn_r50_fpn_1x_coco_20200316-3dc56deb.pth'\n\n# nohup 
python tools/train.py configs/cascade_rcnn/cascade-rcnn_r50_fpn_1x_visdrone.py > cascade-rcnn-visdrone.log 2>&1 & tail -f cascade-rcnn-visdrone.log\n# python tools/test.py configs/cascade_rcnn/cascade-rcnn_r50_fpn_1x_visdrone.py work_dirs/cascade-rcnn_r50_fpn_1x_visdrone/epoch_12.pth --show --show-dir test_save\n# python tools/test.py configs/cascade_rcnn/cascade-rcnn_r50_fpn_1x_visdrone.py work_dirs/cascade-rcnn_r50_fpn_1x_visdrone/epoch_12.pth --tta "
  },
  {
    "path": "mmdet-course/config/ddq-detr-4scale_r50_8xb2-12e_visdrone.py",
    "content": "_base_ = 'ddq-detr-4scale_r50_8xb2-12e_coco.py'\n\nmodel = dict(\n    bbox_head=dict(\n        type='DDQDETRHead',\n        num_classes=10\n    )\n)\n\n# Override the dataset settings\ndata_root = '/home/hjj/Desktop/dataset/dataset_visdrone/'\nmetainfo = {\n    'classes': ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'),\n    # 'palette': [\n    #     (220, 20, 60),\n    # ]\n}\ntrain_dataloader = dict(\n    batch_size=2,\n    num_workers=2,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-train/annotations/train.json',\n        data_prefix=dict(img='VisDrone2019-DET-train/images/')))\nval_dataloader = dict(\n    batch_size=2,\n    num_workers=2,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-val/annotations/val.json',\n        data_prefix=dict(img='VisDrone2019-DET-val/images/')))\ntest_dataloader = dict(\n    batch_size=2,\n    num_workers=2,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-test-dev/annotations/test.json',\n        data_prefix=dict(img='VisDrone2019-DET-test-dev/images/')))\n\n# Override the evaluator settings\nval_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-val/annotations/val.json')\ntest_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-test-dev/annotations/test.json')\n\n# optim_wrapper = dict(type='AmpOptimWrapper')\n\ndefault_hooks = dict(logger=dict(type='LoggerHook', interval=1000))\n\nload_from='ddq-detr-4scale_r50_8xb2-12e_coco_20230809_170711-42528127.pth'\n\n# nohup python tools/train.py configs/ddq/ddq-detr-4scale_r50_8xb2-12e_visdrone.py > ddq-visdrone.log 2>&1 & tail -f ddq-visdrone.log\n# python tools/test.py configs/ddq/ddq-detr-4scale_r50_8xb2-12e_visdrone.py work_dirs/faster-rcnn_r50_fpn_ciou_1x_visdrone/epoch_12.pth --show --show-dir test_save\n# python tools/test.py 
configs/ddq/ddq-detr-4scale_r50_8xb2-12e_visdrone.py work_dirs/faster-rcnn_r50_fpn_ciou_1x_visdrone/epoch_12.pth --tta "
  },
  {
    "path": "mmdet-course/config/dino-4scale_r50_8xb2-12e_visdrone.py",
    "content": "_base_ = 'dino-4scale_r50_8xb2-12e_coco.py'\n\nmodel = dict(\n    bbox_head=dict(\n        type='DINOHead',\n        num_classes=10,\n    )\n)\n\n# Override the dataset settings\ndata_root = '/home/hjj/Desktop/dataset/dataset_visdrone/'\nmetainfo = {\n    'classes': ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'),\n    # 'palette': [\n    #     (220, 20, 60),\n    # ]\n}\ntrain_dataloader = dict(\n    batch_size=4,\n    num_workers=4,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-train/annotations/train.json',\n        data_prefix=dict(img='VisDrone2019-DET-train/images/')))\nval_dataloader = dict(\n    batch_size=4,\n    num_workers=4,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-val/annotations/val.json',\n        data_prefix=dict(img='VisDrone2019-DET-val/images/')))\ntest_dataloader = dict(\n    batch_size=4,\n    num_workers=4,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-test-dev/annotations/test.json',\n        data_prefix=dict(img='VisDrone2019-DET-test-dev/images/')))\n\n# Override the evaluator settings\nval_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-val/annotations/val.json')\ntest_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-test-dev/annotations/test.json')\n\n# optim_wrapper = dict(type='AmpOptimWrapper')\n\ndefault_hooks = dict(logger=dict(type='LoggerHook', interval=500))\n\nload_from='dino-4scale_r50_8xb2-12e_coco_20221202_182705-55b2bba2.pth'\n\n# nohup python tools/train.py configs/dino/dino-4scale_r50_8xb2-12e_visdrone.py > dino-visdrone.log 2>&1 & tail -f dino-visdrone.log\n# python tools/test.py configs/dino/dino-4scale_r50_8xb2-12e_visdrone.py work_dirs/tood_r50_fpn_1x_visdrone/epoch_12.pth --show --show-dir test_save\n# python tools/test.py 
configs/dino/dino-4scale_r50_8xb2-12e_visdrone.py work_dirs/dino-4scale_r50_8xb2-12e_visdrone/epoch_12.pth --tta "
  },
  {
    "path": "mmdet-course/config/faster-rcnn_r50_fpn_ciou_1x_visdrone.py",
    "content": "_base_ = 'faster-rcnn_r50_fpn_ciou_1x_coco.py'\n\n# The head's num_classes must also be changed to match the number of classes in the dataset\nmodel = dict(\n    roi_head=dict(\n        bbox_head=dict(\n            type='Shared2FCBBoxHead',\n            num_classes=10\n        )\n    )\n)\n\n# Override the dataset settings\ndata_root = '/home/hjj/Desktop/dataset/dataset_visdrone/'\nmetainfo = {\n    'classes': ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'),\n    # 'palette': [\n    #     (220, 20, 60),\n    # ]\n}\ntrain_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-train/annotations/train.json',\n        data_prefix=dict(img='VisDrone2019-DET-train/images/')))\nval_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-val/annotations/val.json',\n        data_prefix=dict(img='VisDrone2019-DET-val/images/')))\ntest_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-test-dev/annotations/test.json',\n        data_prefix=dict(img='VisDrone2019-DET-test-dev/images/')))\n\n# Override the evaluator settings\nval_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-val/annotations/val.json')\ntest_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-test-dev/annotations/test.json')\n\n# optim_wrapper = dict(type='AmpOptimWrapper')\n\ndefault_hooks = dict(logger=dict(type='LoggerHook', interval=200))\n\nload_from='faster_rcnn_r50_fpn_giou_1x_coco-0eada910.pth'\n\n# nohup python tools/train.py configs/faster_rcnn/faster-rcnn_r50_fpn_ciou_1x_visdrone.py > faster-rcnn-visdrone.log 2>&1 & tail -f faster-rcnn-visdrone.log\n# python tools/test.py configs/faster_rcnn/faster-rcnn_r50_fpn_ciou_1x_visdrone.py 
work_dirs/faster-rcnn_r50_fpn_ciou_1x_visdrone/epoch_12.pth --show --show-dir test_save\n# python tools/test.py configs/faster_rcnn/faster-rcnn_r50_fpn_ciou_1x_visdrone.py work_dirs/faster-rcnn_r50_fpn_ciou_1x_visdrone/epoch_12.pth --tta "
  },
  {
    "path": "mmdet-course/config/gfl_r50_fpn_1x_visdrone.py",
    "content": "_base_ = 'gfl_r50_fpn_1x_coco.py'\n\n# The head's num_classes must also be changed to match the number of classes in the dataset\nmodel = dict(\n    bbox_head=dict(\n        num_classes=10\n    )\n)\n\n# Override the dataset settings\ndata_root = '/home/hjj/Desktop/dataset/dataset_visdrone/'\nmetainfo = {\n    'classes': ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'),\n    # 'palette': [\n    #     (220, 20, 60),\n    # ]\n}\ntrain_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-train/annotations/train.json',\n        data_prefix=dict(img='VisDrone2019-DET-train/images/')))\nval_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-val/annotations/val.json',\n        data_prefix=dict(img='VisDrone2019-DET-val/images/')))\ntest_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-test-dev/annotations/test.json',\n        data_prefix=dict(img='VisDrone2019-DET-test-dev/images/')))\n\n# Override the evaluator settings\nval_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-val/annotations/val.json')\ntest_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-test-dev/annotations/test.json')\n\n# optim_wrapper = dict(type='AmpOptimWrapper')\n\ndefault_hooks = dict(logger=dict(type='LoggerHook', interval=200))\nload_from='gfl_r50_fpn_1x_coco_20200629_121244-25944287.pth'\n\n# nohup python tools/train.py configs/gfl/gfl_r50_fpn_1x_visdrone.py > gfl-visdrone.log 2>&1 & tail -f gfl-visdrone.log\n# python tools/test.py configs/gfl/gfl_r50_fpn_1x_visdrone.py work_dirs/gfl_r50_fpn_1x_visdrone/epoch_12.pth --show --show-dir test_save\n# python tools/test.py configs/gfl/gfl_r50_fpn_1x_visdrone.py 
work_dirs/gfl_r50_fpn_1x_visdrone/epoch_12.pth --tta \n# python tools/analysis_tools/get_flops.py configs/gfl/gfl_r50_fpn_1x_visdrone.py"
  },
  {
    "path": "mmdet-course/config/retinanet_r50_fpn_1x_visdrone.py",
    "content": "_base_ = 'retinanet_r50_fpn_1x_coco.py'\n\n# The head's num_classes must also be changed to match the number of classes in the dataset\nmodel = dict(\n    bbox_head=dict(\n        num_classes=10\n    )\n)\n\n# Override the dataset settings\ndata_root = '/home/hjj/Desktop/dataset/dataset_visdrone/'\nmetainfo = {\n    'classes': ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'),\n    # 'palette': [\n    #     (220, 20, 60),\n    # ]\n}\ntrain_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-train/annotations/train.json',\n        data_prefix=dict(img='VisDrone2019-DET-train/images/')))\nval_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-val/annotations/val.json',\n        data_prefix=dict(img='VisDrone2019-DET-val/images/')))\ntest_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-test-dev/annotations/test.json',\n        data_prefix=dict(img='VisDrone2019-DET-test-dev/images/')))\n\n# Override the evaluator settings\nval_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-val/annotations/val.json')\ntest_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-test-dev/annotations/test.json')\n\n# optim_wrapper = dict(type='AmpOptimWrapper')\n\ndefault_hooks = dict(logger=dict(type='LoggerHook', interval=200))\n\nload_from='retinanet_r50_fpn_1x_coco_20200130-c2398f9e.pth'\n\n# nohup python tools/train.py configs/retinanet/retinanet_r50_fpn_1x_visdrone.py > retinanet-visdrone.log 2>&1 & tail -f retinanet-visdrone.log\n# python tools/test.py configs/retinanet/retinanet_r50_fpn_1x_visdrone.py work_dirs/retinanet_r50_fpn_1x_visdrone/epoch_12.pth --show --show-dir test_save\n# python tools/test.py 
configs/retinanet/retinanet_r50_fpn_1x_visdrone.py work_dirs/retinanet_r50_fpn_1x_visdrone/epoch_12.pth --tta \n# python tools/analysis_tools/get_flops.py configs/retinanet/retinanet_r50_fpn_1x_visdrone.py"
  },
  {
    "path": "mmdet-course/config/rtmdet_tiny_8xb32-300e_visdrone.py",
    "content": "_base_ = 'rtmdet_tiny_8xb32-300e_coco.py'\n\nmodel = dict(\n    bbox_head=dict(\n        num_classes=10\n    )\n)\n\n# Override the dataset settings\ndata_root = '/home/hjj/Desktop/dataset/dataset_visdrone/'\nmetainfo = {\n    'classes': ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'),\n    # 'palette': [\n    #     (220, 20, 60),\n    # ]\n}\ntrain_dataloader = dict(\n    batch_size=16,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-train/annotations/train.json',\n        data_prefix=dict(img='VisDrone2019-DET-train/images/')))\nval_dataloader = dict(\n    batch_size=16,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-val/annotations/val.json',\n        data_prefix=dict(img='VisDrone2019-DET-val/images/')))\ntest_dataloader = dict(\n    batch_size=16,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-test-dev/annotations/test.json',\n        data_prefix=dict(img='VisDrone2019-DET-test-dev/images/')))\n\n# Override the evaluator settings\nval_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-val/annotations/val.json')\ntest_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-test-dev/annotations/test.json')\n\n# optim_wrapper = dict(type='AmpOptimWrapper')\n\ndefault_hooks = dict(logger=dict(type='LoggerHook', interval=200))\nload_from='rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth'\n\n# nohup python tools/train.py configs/rtmdet/rtmdet_tiny_8xb32-300e_visdrone.py > rtmdet-tiny-visdrone.log 2>&1 & tail -f rtmdet-tiny-visdrone.log\n# python tools/test.py configs/rtmdet/rtmdet_tiny_8xb32-300e_visdrone.py work_dirs/rtmdet_tiny_8xb32-300e_visdrone/epoch_300.pth --show --show-dir test_save\n# python tools/test.py configs/rtmdet/rtmdet_tiny_8xb32-300e_visdrone.py 
work_dirs/rtmdet_tiny_8xb32-300e_visdrone/epoch_300.pth --tta \n# python tools/analysis_tools/get_flops.py configs/rtmdet/rtmdet_tiny_8xb32-300e_visdrone.py"
  },
  {
    "path": "mmdet-course/config/tood_r50_fpn_1x_visdrone.py",
    "content": "_base_ = './tood_r50_fpn_1x_coco.py'\n\n# The head's num_classes must also be changed to match the number of classes in the dataset\nmodel = dict(\n    bbox_head=dict(\n        num_classes=10\n    )\n)\n\n# Override the dataset settings\ndata_root = '/home/hjj/Desktop/dataset/dataset_visdrone/'\nmetainfo = {\n    'classes': ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'),\n    # 'palette': [\n    #     (220, 20, 60),\n    # ]\n}\ntrain_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-train/annotations/train.json',\n        data_prefix=dict(img='VisDrone2019-DET-train/images/')))\nval_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-val/annotations/val.json',\n        data_prefix=dict(img='VisDrone2019-DET-val/images/')))\ntest_dataloader = dict(\n    batch_size=8,\n    num_workers=8,\n    dataset=dict(\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-test-dev/annotations/test.json',\n        data_prefix=dict(img='VisDrone2019-DET-test-dev/images/')))\n\n# Override the evaluator settings\nval_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-val/annotations/val.json')\ntest_evaluator = dict(ann_file=data_root + 'VisDrone2019-DET-test-dev/annotations/test.json')\n\n# optim_wrapper = dict(type='AmpOptimWrapper')\n\ndefault_hooks = dict(logger=dict(type='LoggerHook', interval=200))\n\nload_from='tood_r50_fpn_1x_coco_20211210_103425-20e20746.pth'\n\n# nohup python tools/train.py configs/tood/tood_r50_fpn_1x_visdrone.py > tood-visdrone.log 2>&1 & tail -f tood-visdrone.log\n# python tools/test.py configs/tood/tood_r50_fpn_1x_visdrone.py work_dirs/tood_r50_fpn_1x_visdrone/epoch_12.pth --show --show-dir test_save\n# python tools/test.py configs/tood/tood_r50_fpn_1x_visdrone.py 
work_dirs/tood_r50_fpn_1x_visdrone/epoch_12.pth --tta "
  },
  {
    "path": "mmdet-course/config/yolox_tiny_8xb8-300e_visdrone.py",
    "content": "_base_ = './yolox_tiny_8xb8-300e_coco.py'\n\n# The head's num_classes must also be changed to match the number of classes in the dataset\nmodel = dict(\n    bbox_head=dict(\n        num_classes=10\n    )\n)\n\n# Override the dataset settings\ndata_root = '/home/hjj/Desktop/dataset/dataset_visdrone/'\ndataset_type = 'CocoDataset'\nmetainfo = {\n    'classes': ('pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor'),\n    # 'palette': [\n    #     (220, 20, 60),\n    # ]\n}\n\n# Example to use different file client\n# Method 1: simply set the data root and let the file I/O module\n# automatically infer from prefix (not support LMDB and Memcache yet)\n\n# data_root = 's3://openmmlab/datasets/detection/coco/'\n\n# Method 2: Use `backend_args`, `file_client_args` in versions before 3.0.0rc6\n# backend_args = dict(\n#     backend='petrel',\n#     path_mapping=dict({\n#         './data/': 's3://openmmlab/datasets/detection/',\n#         'data/': 's3://openmmlab/datasets/detection/'\n#     }))\nbackend_args = None\n\nimg_scale = (640, 640)  # width, height\n\ntrain_pipeline = [\n    dict(type='Mosaic', img_scale=img_scale, pad_val=114.0),\n    dict(\n        type='RandomAffine',\n        scaling_ratio_range=(0.1, 2),\n        # img_scale is (width, height)\n        border=(-img_scale[0] // 2, -img_scale[1] // 2)),\n    dict(\n        type='MixUp',\n        img_scale=img_scale,\n        ratio_range=(0.8, 1.6),\n        pad_val=114.0),\n    dict(type='YOLOXHSVRandomAug'),\n    dict(type='RandomFlip', prob=0.5),\n    # According to the official implementation, multi-scale\n    # training is not considered here but in the\n    # 'mmdet/models/detectors/yolox.py'.\n    # Resize and Pad are for the last 15 epochs when Mosaic,\n    # RandomAffine, and MixUp are closed by YOLOXModeSwitchHook.\n    dict(type='Resize', scale=img_scale, keep_ratio=True),\n    dict(\n        type='Pad',\n        pad_to_square=True,\n        # If the image is three-channel, the pad value 
needs\n        # to be set separately for each channel.\n        pad_val=dict(img=(114.0, 114.0, 114.0))),\n    dict(type='FilterAnnotations', min_gt_bbox_wh=(1, 1), keep_empty=False),\n    dict(type='PackDetInputs')\n]\n\ntrain_dataset = dict(\n    # use MultiImageMixDataset wrapper to support mosaic and mixup\n    type='MultiImageMixDataset',\n    dataset=dict(\n        type=dataset_type,\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-train/annotations/train.json',\n        data_prefix=dict(img='VisDrone2019-DET-train/images/'),\n        pipeline=[\n            dict(type='LoadImageFromFile', backend_args=backend_args),\n            dict(type='LoadAnnotations', with_bbox=True)\n        ],\n        filter_cfg=dict(filter_empty_gt=False, min_size=32),\n        backend_args=backend_args),\n    pipeline=train_pipeline)\n\ntest_pipeline = [\n    dict(type='LoadImageFromFile', backend_args=backend_args),\n    dict(type='Resize', scale=img_scale, keep_ratio=True),\n    dict(\n        type='Pad',\n        pad_to_square=True,\n        pad_val=dict(img=(114.0, 114.0, 114.0))),\n    dict(type='LoadAnnotations', with_bbox=True),\n    dict(\n        type='PackDetInputs',\n        meta_keys=('img_id', 'img_path', 'ori_shape', 'img_shape',\n                   'scale_factor'))\n]\n\ntrain_dataloader = dict(\n    batch_size=16,\n    num_workers=8,\n    persistent_workers=True,\n    sampler=dict(type='DefaultSampler', shuffle=True),\n    dataset=train_dataset)\nval_dataloader = dict(\n    batch_size=16,\n    num_workers=8,\n    persistent_workers=True,\n    drop_last=False,\n    sampler=dict(type='DefaultSampler', shuffle=False),\n    dataset=dict(\n        type=dataset_type,\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-val/annotations/val.json',\n        data_prefix=dict(img='VisDrone2019-DET-val/images/'),\n        test_mode=True,\n        pipeline=test_pipeline,\n        
backend_args=backend_args))\ntest_dataloader = dict(\n    batch_size=16,\n    num_workers=8,\n    persistent_workers=True,\n    drop_last=False,\n    sampler=dict(type='DefaultSampler', shuffle=False),\n    dataset=dict(\n        type=dataset_type,\n        data_root=data_root,\n        metainfo=metainfo,\n        ann_file='VisDrone2019-DET-test-dev/annotations/test.json',\n        data_prefix=dict(img='VisDrone2019-DET-test-dev/images/'),\n        test_mode=True,\n        pipeline=test_pipeline,\n        backend_args=backend_args))\n\nval_evaluator = dict(\n    type='CocoMetric',\n    ann_file=data_root + 'VisDrone2019-DET-val/annotations/val.json',\n    metric='bbox',\n    backend_args=backend_args)\ntest_evaluator = dict(\n    type='CocoMetric',\n    ann_file=data_root + 'VisDrone2019-DET-test-dev/annotations/test.json',\n    metric='bbox',\n    backend_args=backend_args)\n\ndefault_hooks = dict(logger=dict(type='LoggerHook', interval=200))\n\nload_from='yolox_tiny_8x8_300e_coco_20211124_171234-b4047906.pth'\n\n# nohup python tools/train.py configs/yolox/yolox_tiny_8xb8-300e_visdrone.py > yolox-tiny-visdrone.log 2>&1 & tail -f yolox-tiny-visdrone.log\n# python tools/test.py configs/yolox/yolox_tiny_8xb8-300e_visdrone.py work_dirs/yolox_tiny_8xb8-300e_visdrone/epoch_300.pth --show --show-dir test_save\n# python tools/test.py configs/yolox/yolox_tiny_8xb8-300e_visdrone.py work_dirs/yolox_tiny_8xb8-300e_visdrone/epoch_300.pth --tta \n# python tools/analysis_tools/get_flops.py configs/yolox/yolox_tiny_8xb8-300e_visdrone.py"
  },
  {
    "path": "mmdet-course/mmdet2yolo.py",
    "content": "import os, torch, cv2, math, tqdm, time, shutil, argparse, json, pickle\nimport numpy as np\nfrom prettytable import PrettyTable\n\ndef clip_boxes(boxes, shape):\n    # Clip boxes (xyxy) to image shape (height, width)\n    if isinstance(boxes, torch.Tensor):  # faster individually\n        boxes[..., 0].clamp_(0, shape[1])  # x1\n        boxes[..., 1].clamp_(0, shape[0])  # y1\n        boxes[..., 2].clamp_(0, shape[1])  # x2\n        boxes[..., 3].clamp_(0, shape[0])  # y2\n    else:  # np.array (faster grouped)\n        boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2\n        boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2\n\ndef scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None):\n    # Rescale boxes (xyxy) from img1_shape to img0_shape\n    if ratio_pad is None:  # calculate from img0_shape\n        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain  = old / new\n        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding\n    else:\n        gain = ratio_pad[0][0]\n        pad = ratio_pad[1]\n\n    boxes[..., [0, 2]] -= pad[0]  # x padding\n    boxes[..., [1, 3]] -= pad[1]  # y padding\n    boxes[..., :4] /= gain\n    clip_boxes(boxes, img0_shape)\n    return boxes\n\ndef box_iou(box1, box2, eps=1e-7):\n    \"\"\"\n    Calculate intersection-over-union (IoU) of boxes. Both sets of boxes are expected to be in (x1, y1, x2, y2) format.\n    Based on https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py\n\n    Args:\n        box1 (torch.Tensor): A tensor of shape (N, 4) representing N bounding boxes.\n        box2 (torch.Tensor): A tensor of shape (M, 4) representing M bounding boxes.\n        eps (float, optional): A small value to avoid division by zero. 
Defaults to 1e-7.\n\n    Returns:\n        (torch.Tensor): An NxM tensor containing the pairwise IoU values for every element in box1 and box2.\n    \"\"\"\n\n    # NOTE: Need .float() to get accurate iou values\n    # inter(N,M) = (rb(N,M,2) - lt(N,M,2)).clamp(0).prod(2)\n    (a1, a2), (b1, b2) = box1.float().unsqueeze(1).chunk(2, 2), box2.float().unsqueeze(0).chunk(2, 2)\n    inter = (torch.min(a2, b2) - torch.max(a1, b1)).clamp_(0).prod(2)\n\n    # IoU = inter / (area1 + area2 - inter)\n    return inter / ((a2 - a1).prod(2) + (b2 - b1).prod(2) - inter + eps)\n\ndef process_batch(detections, labels, iouv):\n    \"\"\"\n    Return correct prediction matrix\n    Arguments:\n        detections (array[N, 6]), x1, y1, x2, y2, conf, class\n        labels (array[M, 5]), class, x1, y1, x2, y2\n    Returns:\n        correct (array[N, 10]), for 10 IoU levels\n    \"\"\"\n    correct = np.zeros((detections.shape[0], iouv.shape[0])).astype(bool)\n    iou = box_iou(labels[:, 1:], detections[:, :4])\n    correct_class = labels[:, 0:1] == detections[:, 5]\n    for i in range(len(iouv)):\n        x = torch.where((iou >= iouv[i]) & correct_class)  # IoU > threshold and classes match\n        if x[0].shape[0]:\n            matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()  # [label, detect, iou]\n            if x[0].shape[0] > 1:\n                matches = matches[matches[:, 2].argsort()[::-1]]\n                matches = matches[np.unique(matches[:, 1], return_index=True)[1]]\n                # matches = matches[matches[:, 2].argsort()[::-1]]\n                matches = matches[np.unique(matches[:, 0], return_index=True)[1]]\n            correct[matches[:, 1].astype(int), i] = True\n    return torch.tensor(correct, dtype=torch.bool, device=iouv.device)\n\ndef smooth(y, f=0.05):\n    # Box filter of fraction f\n    nf = round(len(y) * f * 2) // 2 + 1  # number of filter elements (must be odd)\n    p = np.ones(nf // 2)  # ones padding\n    yp = 
np.concatenate((p * y[0], y, p * y[-1]), 0)  # y padded\n    return np.convolve(yp, np.ones(nf) / nf, mode='valid')  # y-smoothed\n\n\ndef ap_per_class(tp, conf, pred_cls, target_cls, plot=False, save_dir='.', names=(), eps=1e-16, prefix=''):\n    \"\"\" Compute the average precision, given the recall and precision curves.\n    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.\n    # Arguments\n        tp:  True positives (nparray, nx1 or nx10).\n        conf:  Objectness value from 0-1 (nparray).\n        pred_cls:  Predicted object classes (nparray).\n        target_cls:  True object classes (nparray).\n        plot:  Plot precision-recall curve at mAP@0.5\n        save_dir:  Plot save directory\n    # Returns\n        The average precision as computed in py-faster-rcnn.\n    \"\"\"\n\n    # Sort by objectness\n    i = np.argsort(-conf)\n    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]\n\n    # Find unique classes\n    unique_classes, nt = np.unique(target_cls, return_counts=True)\n    nc = unique_classes.shape[0]  # number of classes, number of detections\n\n    # Create Precision-Recall curve and compute AP for each class\n    px, py = np.linspace(0, 1, 1000), []  # for plotting\n    ap, p, r = np.zeros((nc, tp.shape[1])), np.zeros((nc, 1000)), np.zeros((nc, 1000))\n    for ci, c in enumerate(unique_classes):\n        i = pred_cls == c\n        n_l = nt[ci]  # number of labels\n        n_p = i.sum()  # number of predictions\n        if n_p == 0 or n_l == 0:\n            continue\n\n        # Accumulate FPs and TPs\n        fpc = (1 - tp[i]).cumsum(0)\n        tpc = tp[i].cumsum(0)\n\n        # Recall\n        recall = tpc / (n_l + eps)  # recall curve\n        r[ci] = np.interp(-px, -conf[i], recall[:, 0], left=0)  # negative x, xp because xp decreases\n\n        # Precision\n        precision = tpc / (tpc + fpc)  # precision curve\n        p[ci] = np.interp(-px, -conf[i], precision[:, 0], left=1)  # p at pr_score\n\n        # AP from 
recall-precision curve\n        for j in range(tp.shape[1]):\n            ap[ci, j], mpre, mrec = compute_ap(recall[:, j], precision[:, j])\n            if plot and j == 0:\n                py.append(np.interp(px, mrec, mpre))  # precision at mAP@0.5\n\n    # Compute F1 (harmonic mean of precision and recall)\n    f1 = 2 * p * r / (p + r + eps)\n\n    i = smooth(f1.mean(0), 0.1).argmax()  # max F1 index\n    p, r, f1 = p[:, i], r[:, i], f1[:, i]\n    tp = (r * nt).round()  # true positives\n    fp = (tp / (p + eps) - tp).round()  # false positives\n    return tp, fp, p, r, f1, ap, unique_classes.astype(int)\n\n\ndef compute_ap(recall, precision):\n    \"\"\" Compute the average precision, given the recall and precision curves\n    # Arguments\n        recall:    The recall curve (list)\n        precision: The precision curve (list)\n    # Returns\n        Average precision, precision curve, recall curve\n    \"\"\"\n\n    # Append sentinel values to beginning and end\n    mrec = np.concatenate(([0.0], recall, [1.0]))\n    mpre = np.concatenate(([1.0], precision, [0.0]))\n\n    # Compute the precision envelope\n    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))\n\n    # Integrate area under curve\n    method = 'interp'  # methods: 'continuous', 'interp'\n    if method == 'interp':\n        x = np.linspace(0, 1, 101)  # 101-point interp (COCO)\n        ap = np.trapz(np.interp(x, mrec, mpre), x)  # integrate\n    else:  # 'continuous'\n        i = np.where(mrec[1:] != mrec[:-1])[0]  # points where x axis (recall) changes\n        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])  # area under curve\n\n    return ap, mpre, mrec\n\ndef parse_opt():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--label_coco', type=str, default='/home/hjj/Desktop/dataset/dataset_visdrone/test_coco.json', help='label coco path')\n    parser.add_argument('--pred_coco', type=str, default='runs/val/exp/predictions.json', help='pred coco path')\n    # 
parser.add_argument('--pred_coco', type=str, default='/home/hjj/Desktop/github_code/mmdetection-visdrone/work_dirs/dino-4scale_r50_8xb2-12e_visdrone/test/prediction.pickle', help='pred coco path')\n    parser.add_argument('--iou', type=float, default=0.7, help='iou threshold')\n    parser.add_argument('--conf', type=float, default=0.001, help='conf threshold')\n    opt = parser.parse_known_args()[0]\n    return opt\n    \nif __name__ == '__main__':\n    opt = parse_opt()\n    \n    iouv = torch.linspace(0.5, 0.95, 10)  # iou vector for mAP@0.5:0.95\n    niou = iouv.numel()\n    stats = []\n    \n    label_coco_json_path, pred_coco_json_path = opt.label_coco, opt.pred_coco\n    with open(label_coco_json_path) as f:\n        label = json.load(f)\n    \n    classes = []\n    for data in label['categories']:\n        classes.append(data['name'])\n    \n    image_id_hw_dict = {}\n    for data in label['images']:\n        image_id_hw_dict[data['id']] = [data['height'], data['width']]\n    \n    label_id_dict = {}\n    for data in tqdm.tqdm(label['annotations'], desc='Process label...'):\n        if data['image_id'] not in label_id_dict:\n            label_id_dict[data['image_id']] = []\n        \n        category_id = data['category_id']\n        x_min, y_min, w, h = data['bbox'][0], data['bbox'][1], data['bbox'][2], data['bbox'][3]\n        x_max, y_max = x_min + w, y_min + h\n        label_id_dict[data['image_id']].append(np.array([int(category_id), x_min, y_min, x_max, y_max]))\n    \n    if pred_coco_json_path.endswith('json'):\n        with open(pred_coco_json_path) as f:\n            pred = json.load(f)\n        pred_id_dict = {}\n        for data in tqdm.tqdm(pred, desc='Process pred...'):\n            if data['image_id'] not in pred_id_dict:\n                pred_id_dict[data['image_id']] = []\n            \n            score = data['score']\n            category_id = data['category_id']\n            x_min, y_min, w, h = data['bbox'][0], data['bbox'][1], 
data['bbox'][2], data['bbox'][3]\n            x_max, y_max = x_min + w, y_min + h\n            \n            pred_id_dict[data['image_id']].append(np.array([x_min, y_min, x_max, y_max, float(score), int(category_id)]))\n    else:\n        with open(pred_coco_json_path, 'rb') as f:\n            pred = pickle.load(f)\n        pred_id_dict = {}\n        for data in tqdm.tqdm(pred, desc='Process pred...'):\n            image_id = os.path.splitext(os.path.basename(data['img_path']))[0]\n            if image_id not in pred_id_dict:\n                pred_id_dict[image_id] = []\n            \n            for i in range(data['pred_instances']['labels'].size(0)):\n                score = data['pred_instances']['scores'][i]\n                category_id = data['pred_instances']['labels'][i]\n                bboxes = data['pred_instances']['bboxes'][i]\n                \n                x_min, y_min, x_max, y_max = bboxes.cpu().detach().numpy()\n                # x_min, x_max = x_min / data['scale_factor'][0], x_max / data['scale_factor'][0]\n                # y_min, y_max = y_min / data['scale_factor'][1], y_max / data['scale_factor'][1]\n                \n                pred_id_dict[image_id].append(np.array([x_min, y_min, x_max, y_max, float(score), int(category_id)]))\n    \n    for idx, image_id in enumerate(tqdm.tqdm(list(image_id_hw_dict.keys()), desc=\"Cal mAP...\")):\n        label = np.array(label_id_dict[image_id])\n        \n        if image_id not in pred_id_dict:\n            pred = np.empty((0, 6))\n        else:\n            pred = torch.from_numpy(np.array(pred_id_dict[image_id]))\n        \n        nl, npr = label.shape[0], pred.shape[0]\n        correct = torch.zeros(npr, niou, dtype=torch.bool)\n        if npr == 0:\n            if nl:\n                stats.append((correct, *torch.zeros((2, 0)), torch.from_numpy(label[:, 0])))\n            continue\n        \n        if nl:\n            correct = process_batch(pred, torch.from_numpy(label), iouv)\n        
stats.append((correct, pred[:, 4], pred[:, 5], torch.from_numpy(label[:, 0])))\n    \n    stats = [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)]\n    tp, fp, p, r, f1, ap, ap_class = ap_per_class(*stats)\n    print(f'precision:{p}')\n    print(f'recall:{r}')\n    print(f'mAP@0.5:{ap[:, 0]}')\n    \n    table = PrettyTable()\n    table.title = \"Metrics\"\n    table.field_names = [\"Classes\", 'Precision', 'Recall', 'mAP50', 'mAP50-95']\n    table.add_row(['all', f'{np.mean(p):.3f}', f'{np.mean(r):.3f}', f'{np.mean(ap[:, 0]):.3f}', f'{np.mean(ap):.3f}'])\n    # Use a separate loop variable so the `classes` list is not overwritten while iterating\n    for cls_idx, cls_name in enumerate(classes):\n        table.add_row([cls_name, f'{p[cls_idx]:.3f}', f'{r[cls_idx]:.3f}', f'{ap[cls_idx, 0]:.3f}', f'{ap[cls_idx, :].mean():.3f}'])\n    print(table)"
  },
  {
    "path": "mmdet-course/readme.md",
    "content": "# mmdet Tutorial\n\n### mmdet tutorial commands\n\n1. conda create -n mmdet_py39 python=3.9 anaconda\n2. https://mmdetection.readthedocs.io/en/latest/get_started.html\n3. https://pytorch.org/get-started/previous-versions/  \npip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121\n4. https://mmdetection.readthedocs.io/zh-cn/latest/user_guides/train.html#id7\n\n### mmdet run commands\n\n1. Training\n\n        python tools/train.py <your-config-file>\n2. Testing  \n\n        python tools/test.py <your-config-file> <your-model-weights-file> --out <save-pickle-path>\n3. Script for computing FLOPs and parameter count  \n\n        python tools/analysis_tools/get_flops.py <your-config-file>\n4. Script for measuring inference time, FPS, and GPU memory  \n\n        python tools/analysis_tools/benchmark.py <your-config-file> --checkpoint <your-model-weights-file> --task inference --fuse-conv-bn\n5. Script for plotting curves  \n\n        python tools/analysis_tools/analyze_logs.py plot_curve <train-json-file> --keys <keys> --legend <legend> --out <save-path>\n6. Script for analyzing results  \n\n        python tools/analysis_tools/analyze_results.py <your-config-file> <test-pickle-path> <save-path>\n\n### mmdet video tutorial links (can be watched in order)\n\n1. [One library for all object detection comparison experiments! A hands-on mmdetection tutorial on environment setup, training, and testing](https://www.bilibili.com/video/BV1xA4m1c7H8/)\n2. [One library for all object detection comparison experiments! A hands-on mmdetection tutorial on parameter count, FLOPs, FPS, and plotting logs](https://www.bilibili.com/video/BV17C41137dW/)\n3. [One library for all object detection comparison experiments! Converting mmdetection metrics to YOLO metrics](https://www.bilibili.com/video/BV1AWtCesEc6/)\n\n### mmdet experimental results (all metrics are COCO metrics)\n\nSoftware environment for the results below:  \npython:3.9.19  \ntorch:2.1.0+cu121  \ntorchvision:0.16.0  \nmmdet:3.3.0  \nmmcv:2.1.0  \nmmengine:0.10.3  \nHardware environment:  \nPlatform:Ubuntu  \nCPU:i7-12700K  \nRAM:32G  \nGPU:RTX3090  \n\n#### VisDrone2019-testset\n\n| model | Input Shape | GFlops | Params | coco/bbox_mAP | coco/bbox_mAP_50 | coco/bbox_mAP_s | coco/bbox_mAP_m | coco/bbox_mAP_l |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| Faster-RCNN-R50-FPN-CIOU | (768, 1344) | 208G | 41.39M | 0.194 | 0.329 | 0.095 | 0.309 | 0.429 |\n| Cascade-RCNN-R50-FPN | (768, 1344) | 236G | 69.29M | 0.197 | 0.326 | 0.099 | 0.309 | 0.406 |\n| ATSS-R50-FPN-DyHead | (768, 1344) | 110G | 38.91M | 0.204 | 0.338 | 0.100 | 0.317 | 0.485 |\n| TOOD-R50 | (768, 1344) | 199G | 32.04M | 0.204 | 0.339 | 0.102 | 0.317 | 0.403 |\n| DINO | (750, 1333) | 274G | 47.56M | 0.253 | 0.445 | 0.150 | 0.371 | 0.503 |\n| DDQ | (768, 1333) | - | - | 0.268 | 0.463 | 0.159 | 0.390 | 0.526 |\n| YOLOX-Tiny | (640, 640) | 7.578G | 5.035M | 0.148 | 0.278 | 0.076 | 0.221 | 0.278 |\n| GFL | (768, 1344) | 206G | 32.279M | 0.193 | 0.321 | 0.094 | 0.300 | 0.409 |\n| RTMDet-Tiny | (640, 640) | 8.033G | 4.876M | 0.184 | 0.312 | 0.077 | 0.288 | 0.445 |\n| RetinaNet-R50-FPN | (768, 1344) | 210G | 36.517M | 0.164 | 0.276 | 0.060 | 0.274 | 0.427 |"
  },
  {
    "path": "mmdet-course/yolo2coco.py",
    "content": "import os\nimport cv2\nimport json\nfrom tqdm import tqdm\nfrom sklearn.model_selection import train_test_split\nimport argparse\n\n# python yolo2coco.py --root_dir VisDrone2019-DET-train --save_path train.json\n# python yolo2coco.py --root_dir VisDrone2019-DET-val --save_path val.json\n# python yolo2coco.py --root_dir VisDrone2019-DET-test-dev --save_path test.json\n\nparser = argparse.ArgumentParser()\nparser.add_argument('--root_dir', default='./dataset/valid', type=str, help='root path of images and labels; must contain ./images, ./labels and classes.txt')\nparser.add_argument('--save_path', type=str, default='./valid.json', help='if the dataset is not split, name of the output json file')\nparser.add_argument('--random_split', action='store_true', help='randomly split the dataset; default ratio is 8:1:1')\nparser.add_argument('--split_by_file', action='store_true', help='split the dataset as defined by ./train.txt, ./val.txt and ./test.txt')\n\narg = parser.parse_args()\n\ndef train_test_val_split_random(img_paths, ratio_train=0.8, ratio_test=0.1, ratio_val=0.1):\n    # The split ratios can be changed here.\n    # Compare with a tolerance: int() truncation fails for sums such as 0.9999999999999999.\n    assert abs(ratio_train + ratio_test + ratio_val - 1.0) < 1e-6\n    train_img, middle_img = train_test_split(img_paths, test_size=1 - ratio_train, random_state=233)\n    # test_size is the share of the held-out images that goes to the test set.\n    ratio = ratio_test / (1 - ratio_train)\n    val_img, test_img = train_test_split(middle_img, test_size=ratio, random_state=233)\n    print('NUMS of train:val:test = {}:{}:{}'.format(len(train_img), len(val_img), len(test_img)))\n    return train_img, val_img, test_img\n\ndef train_test_val_split_by_files(img_paths, root_dir):\n    # Define the train, val and test sets from train.txt, val.txt and test.txt\n    # (each file lists the image names of its set).\n    phases = ['train', 'val', 'test']\n    img_split = []\n    for p in phases:\n        define_path = os.path.join(root_dir, f'{p}.txt')\n        print(f'Read {p} dataset definition from {define_path}')\n        assert os.path.exists(define_path)\n        with open(define_path, 'r') as f:\n            # Strip trailing newlines so the names match os.listdir() entries.\n            img_paths = {line.strip() for line in f if line.strip()}\n            # img_paths = {os.path.split(p)[1] for p in img_paths}  # NOTE: uncomment to accept absolute paths.\n            img_split.append(img_paths)\n    return img_split[0], img_split[1], img_split[2]\n\n\ndef yolo2coco(arg):\n    root_path = arg.root_dir\n    print('Loading data from ', root_path)\n\n    assert os.path.exists(root_path)\n    originLabelsDir = os.path.join(root_path, 'labels')\n    originImagesDir = os.path.join(root_path, 'images')\n    with open(os.path.join(root_path, 'classes.txt')) as f:\n        classes = f.read().strip().split()\n    # image file names\n    indexes = os.listdir(originImagesDir)\n\n    if arg.random_split or arg.split_by_file:\n        # Hold the image and annotation info of each split.\n        train_dataset = {'categories': [], 'annotations': [], 'images': []}\n        val_dataset = {'categories': [], 'annotations': [], 'images': []}\n        test_dataset = {'categories': [], 'annotations': [], 'images': []}\n\n        # Map class names to numeric ids; category ids start at 0.\n        for i, cls in enumerate(classes, 0):\n            train_dataset['categories'].append({'id': i, 'name': cls, 'supercategory': 'mark'})\n            val_dataset['categories'].append({'id': i, 'name': cls, 'supercategory': 'mark'})\n            test_dataset['categories'].append({'id': i, 'name': cls, 'supercategory': 'mark'})\n\n        if arg.random_split:\n            print('splitting mode: random split')\n            train_img, val_img, test_img = train_test_val_split_random(indexes, 0.8, 0.1, 0.1)\n        elif arg.split_by_file:\n            print('splitting mode: split by files')\n            train_img, val_img, test_img = train_test_val_split_by_files(indexes, root_path)\n    else:\n        dataset = {'categories': [], 'annotations': [], 'images': []}\n        for i, cls in enumerate(classes, 0):\n            dataset['categories'].append({'id': i, 'name': cls, 'supercategory': 'mark'})\n\n    # annotation id counter\n    ann_id_cnt = 0\n    for k, index in enumerate(tqdm(indexes)):\n        # The label file shares the base name of the image (jpg, png, ...).\n        txtFile = os.path.splitext(index)[0] + '.txt'\n        # Read the image width and height.\n        im = cv2.imread(os.path.join(originImagesDir, index))\n        if im is None:\n            # Skip files that OpenCV cannot decode.\n            continue\n        height, width = im.shape[:2]\n        if arg.random_split or arg.split_by_file:\n            # Point `dataset` at the split this image belongs to.\n            if index in train_img:\n                dataset = train_dataset\n            elif index in val_img:\n                dataset = val_dataset\n            elif index in test_img:\n                dataset = test_dataset\n            else:\n                # The image is not listed in any split.\n                continue\n        # Add the image info.\n        dataset['images'].append({'file_name': index,\n                                    'id': k,\n                                    'width': width,\n                                    'height': height})\n        if not os.path.exists(os.path.join(originLabelsDir, txtFile)):\n            # No label file: keep only the image info.\n            continue\n        with open(os.path.join(originLabelsDir, txtFile), 'r') as fr:\n            labelList = fr.readlines()\n            for label in labelList:\n                label = label.strip().split()\n                if len(label) < 5:\n                    # Skip blank or malformed lines.\n                    continue\n                x = float(label[1])\n                y = float(label[2])\n                w = float(label[3])\n                h = float(label[4])\n\n                # Convert normalized x,y,w,h to pixel x1,y1,x2,y2.\n                x1 = (x - w / 2) * width\n                y1 = (y - h / 2) * height\n                x2 = (x + w / 2) * width\n                y2 = (y + h / 2) * height\n                # Class ids start at 0; the non-contiguous ids of COCO 2017 are ignored here.\n                cls_id = int(label[0])\n                box_w = max(0, x2 - x1)\n                box_h = max(0, y2 - y1)\n                dataset['annotations'].append({\n                    'area': box_w * box_h,\n                    'bbox': [x1, y1, box_w, box_h],\n                    'category_id': cls_id,\n                    'id': ann_id_cnt,\n                    'image_id': k,\n                    'iscrowd': 0,\n                    # mask: the four rectangle corners, clockwise from the top-left\n                    'segmentation': [[x1, y1, x2, y1, x2, y2, x1, y2]]\n                })\n                ann_id_cnt += 1\n\n    # Save the results.\n    folder = os.path.join(root_path, 'annotations')\n    os.makedirs(folder, exist_ok=True)\n    if arg.random_split or arg.split_by_file:\n        for phase in ['train', 'val', 'test']:\n            json_name = os.path.join(root_path, 'annotations/{}.json'.format(phase))\n            with open(json_name, 'w') as f:\n                if phase == 'train':\n                    json.dump(train_dataset, f)\n                elif phase == 'val':\n                    json.dump(val_dataset, f)\n                elif phase == 'test':\n                    json.dump(test_dataset, f)\n            print('Save annotation to {}'.format(json_name))\n    else:\n        json_name = os.path.join(root_path, 'annotations/{}'.format(arg.save_path))\n        with open(json_name, 'w') as f:\n            json.dump(dataset, f)\n            print('Save annotation to {}'.format(json_name))\n\nif __name__ == '__main__':\n    yolo2coco(arg)"
  },
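  {
    "path": "module-info/CVPR2023-SMPConv.md",
    "content": "# SMPConv Module Summary https://arxiv.org/pdf/2304.02330\n\n## 1. Background\n\n### The rise of continuous convolution\nContinuous convolution has drawn attention for its ability to handle irregularly sampled data and to model long-range dependencies[1]. As large convolution kernels showed strong experimental results, continuous convolution gained further traction because it can build large kernels efficiently[1].\n\n### Limitations of existing methods\nThe mainstream way to implement continuous convolution is to use a multilayer perceptron (MLP) as a neural field that generates the kernel values[1][2]. This approach has several key problems:\n\n- **High computational cost**: every training iteration needs multiple MLP forward and backward passes to generate the kernels and update the parameters[1][2]\n- **Complex hyperparameter tuning**: many architectural choices, such as activation function, width, and depth, must be tuned[2][3]\n- **Limited filter expressiveness**: the filters are strongly affected by architectural priors[2][3]\n- **Spectral bias**: the spectral bias that arises in MLP training hurts performance[3]\n\n### Challenges at large scale\nBecause of their high computational complexity, MLP-based methods are hard to apply to large-scale problems such as ImageNet[1][2].\n\n## 2. Module Principle\n\n### Core design idea\nSMPConv implements the continuous function with a **self-moving point representation** and an **interpolation scheme**, avoiding neural networks entirely[3][6].\n\n### Mathematical formulation\nSMPConv defines the continuous kernel function as:\n\n```\nSMP(x; φ) = (1/|N(x)|) Σ g(x, pi, ri)wi\n```\n\nwhere:\n- `φ = {{pi}, {wi}, {ri}}` is the set of learnable parameters[7]\n- `pi ∈ Rd` is the coordinate of a self-moving point[7]\n- `wi ∈ RNc` is the weight parameter of the point[7]\n- `ri ∈ R+` is a learnable radius[7]\n\n### Distance function\nNeighborhood influence is defined with the L1 distance:\n```\ng(x, pi, ri) = 1 - ||x - pi||1/ri\n```\nOnly points within a certain distance affect the query point[7].\n\n### Key properties\n\n#### Self-moving mechanism\n- **Learnable coordinates**: the point coordinates `{pi}` are updated during training, which is what makes the points \"move\"[7]\n- **Adaptive distribution**: more points can gather in high-frequency regions, while a few points suffice for low-frequency components[7]\n- **Parameter efficiency**: a single point may be enough to approximate a unimodal function[3]\n\n#### Continuity through interpolation\n- Output vectors are generated as a weighted average of the neighboring point representations[7]\n- Interpolation at arbitrary query positions gives unlimited resolution[3]\n\n### Parameter-sharing strategy\nIn a convolution layer, all channels of a filter share the position parameters but have their own weight parameters[7][8]. This provides a reasonable prior: a convolution filter can focus on specific regions of the input domain[8].\n\n## 3. Problems Solved\n\n### 3.1 Computational efficiency\n\n**Problem**: MLP-based methods require many forward and backward passes[1][2]\n\n**Solution**:\n- Uses only point representations and interpolation, with no neural network[3][4]\n- Trains more than 7× faster than FlexConv[9]\n- 2.5× faster than Deformable Conv[9]\n\n### 3.2 Parameter efficiency\n\n**Problem**: the parameter count of a conventional discrete convolution grows with the square of the kernel size[9]\n\n**Solution**:\n- The parameter count is `(1 + d + C)Np`, independent of the kernel resolution[9]\n- `Np ≪ N²` points represent a kernel of any size[9]\n- Large kernels can be built under a fixed parameter budget[3][5]\n\n### 3.3 Spectral bias\n\n**Problem**: spectral bias in MLP training degrades performance[3][4]\n\n**Solution**:\n- Each point representation covers a local region of the input domain[3]\n- Points are updated independently and do not affect the whole input domain[3]\n- Neighboring points with very different values can easily express high-frequency components[3]\n\n### 3.4 Architectural complexity\n\n**Problem**: MLP-based methods need a complex hyperparameter search[2][3]\n\n**Solution**:\n- Removes the burden of tuning the hyperparameters of a newly introduced neural network[4]\n- Works as a plug-and-play replacement in existing frameworks[3]\n- Minimizes architectural priors[3]\n\n### 3.5 Large-scale application\n\n**Problem**: existing continuous-convolution methods cannot handle ImageNet-scale data[2][5]\n\n**Solution**:\n- First successful application of continuous convolution to ImageNet[5][13]\n- Shows improvements over prior art in the large-scale setting[1]\n\n### 3.6 Limited expressiveness\n\n**Problem**: the filter expressiveness of existing methods is limited[2][3]\n\n**Solution**:\n- Each filter has its own parameters, giving more degrees of freedom[7][8]\n- Points can move freely to their optimal positions[7]\n- Can learn adaptive, large receptive fields[15]\n\nWith these innovations, SMPConv moves continuous convolution from proof of concept to practical large-scale use, offering an efficient and practical alternative to standard convolution in deep learning."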
  },
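  {
    "path": "module-info/CVPR2024-DCMPNet.md",
    "content": "# LEGM and MFM Modules: Detailed Summary https://arxiv.org/pdf/2403.01105\n\n## LEGM (Local Feature-embedded Global Feature Extraction Module)\n\n### 1. Background\nIn image dehazing, conventional convolutional neural networks are good at extracting local features but limited in handling global information and long-range dependencies[7]. To fuse local and global feature information effectively and strengthen the feature representation of the dehazing network, the authors designed the LEGM module.\n\n### 2. Module Principle\nThe core component of LEGM is a self-attention block[7], whose inputs include:\n- The U-Net output after a 1×1 convolution\n- Features after a 3×3 convolution  \n- Features from the depth estimation network (DE) after DRDB processing\n\n**How it works**:\n- A convolution layer combined with a self-attention block is named LEGM[7]\n- In depth-assisted dehazing, only the first LEGM receives the depth information of the hazy image[7]\n- The encoder of the dehazing network contains three LEGMs, whose outputs are integrated by MSAAM to prevent the loss of shallow features[7]\n\n### 3. Problems Solved\n- **Local-global feature fusion**: combines the local feature extraction of convolutional networks with the global modeling of self-attention\n- **Stronger feature representation**: improves the feature representation of the network; the ablation study shows a PSNR gain of 4.72dB over the baseline model[13]\n- **Depth information integration**: provides a suitable feature-fusion mechanism for exploiting depth information\n\n---\n\n## MFM (Modulation Fusion Module)\n\n### 1. Background\nDuring decoding, the dehazing network has to fuse features from different levels and different sources. Conventional fusion methods (such as plain addition or concatenation) cannot adaptively adjust the importance of each feature, so key information may be diluted or lost[8].\n\n### 2. Module Principle\nMFM uses a feature-fusion strategy based on dynamic weight modulation[8]:\n\n**Input processing**:\n- The inputs of the first MFM are F̂¹ₗₑₘ and the feature F¹ᵣc produced by a 3×3 convolution\n- F̂¹ₗₑₘ and F¹ᵣc are added and then passed through GAP (global average pooling), an MLP, and Softmax to obtain the weight matrix A¹ᵣ,c[8]\n\n**Feature modulation**:\nThe values in the weight matrix A¹ᵣ,c express how important F̂¹ₗₑₘ and F¹ᵣc are for reconstructing the dehazed image. The modulation by A¹ᵣ,c is[8]:\n```\nF̃¹ᵣc = A¹ᵣ,c ⊙ F̂¹ₗₑₘ + A¹ᵣ,c ⊙ F¹ᵣc\n```\n\n**Feature integration**:\n- F̃¹ᵣc and F̂¹ₗₑₘ are concatenated to reinforce the information they share\n- The concatenated result passes through a convolution layer, producing the output of the first LEGM with FMI[8]\n\n### 3. Problems Solved\n- **Adaptive feature fusion**: dynamically adjusts the fusion weights to emphasize the features that contribute most to dehazed reconstruction\n- **Stronger feature representation**: improves the feature representation of the network; the ablation study shows a further gain on top of LEGM[13]\n- **Information preservation**: the weight-modulation mechanism ensures that important features are kept and strengthened during fusion\n- **Cross-channel feature interaction**: promotes interaction between channels and improves overall feature expressiveness[14]\n\n---\n\n## How the Modules Work Together\n\nLEGM and MFM complement each other within the overall architecture:\n- **LEGM** extracts and fuses local and global features\n- **MFM** adaptively fuses and modulates different features\n- Together the two modules form the core components of the decoder of the dehazing network, enabling high-quality feature reconstruction and image restoration[8]"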
  },
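  {
    "path": "module-info/CVPR2024-FADC.md",
    "content": "### **FADC Module Summary** https://arxiv.org/pdf/2403.05369\n\n#### **1. Background**\nDilated convolution enlarges the receptive field by inserting gaps between kernel elements and is widely used in semantic segmentation and object detection. Conventional dilated convolution, however, has the following problems:\n- **Loss of high-frequency information**: a larger dilation rate lowers the frequency response of the kernel, which limits its ability to capture high-frequency detail.[1][3][7]\n- **Artifacts**: when the high-frequency components of the feature map exceed the sampling rate of the dilated convolution, gridding artifacts appear.[1][6][16]\n- **Limits of a fixed dilation rate**: conventional methods use one global, fixed dilation rate that cannot adapt to local variation in the input features, so receptive field and bandwidth are poorly balanced.[1][4][7]\n\nTo address these problems, the authors propose **Frequency-Adaptive Dilated Convolution (FADC)**, which improves dilated convolution from the viewpoint of spectral analysis.\n\n---\n\n#### **2. Module Principle**\nFADC consists of three core modules, which improve the dilation rate, the kernel weights, and the balance of frequency components respectively:\n\n1. **Adaptive Dilation Rate (AdaDR)**  \n   - **Dynamic dilation rate**: the dilation rate is assigned dynamically according to the local frequency of the feature map. High-frequency regions (such as boundaries) get a small dilation rate to capture more detail; low-frequency regions (such as background) get a large dilation rate to widen the receptive field.[3][7][8]\n   - **Optimization goal**: balance dilation rate against frequency bandwidth by maximizing the receptive field while minimizing the loss of high-frequency information.[7][8]\n\n2. **Adaptive Kernel (AdaKern)**  \n   - **Kernel decomposition**: the kernel weights are decomposed into a low-frequency part (the mean) and a high-frequency part (the residual).[9]\n   - **Dynamic weighting**: a lightweight module (global pooling plus convolution layers) dynamically adjusts the ratio of high-frequency to low-frequency components, which strengthens the capture of high-frequency features and raises the effective bandwidth.[9][15]\n\n3. **Frequency Selection (FreqSelect)**  \n   - **Frequency decomposition**: the feature map is decomposed into frequency bands (from low to high), and binary masks extract the corresponding frequency components.[9][15]\n   - **Spatial reweighting**: the weights of the bands are adjusted dynamically according to the frequency distribution of the input features. Suppressing high-frequency components in the background and at object centers encourages the network to learn larger dilation rates and therefore a wider receptive field.[9][16]\n\n---\n\n#### **3. Problems Solved**\n1. **Loss of high-frequency information**  \n   - AdaDR adjusts the dilation rate dynamically and keeps more detail in high-frequency regions, avoiding the loss of high-frequency information.\n   - AdaKern strengthens the convolution response to high-frequency components and improves the capture of high-frequency information in the feature map.[3][9]\n\n2. **Gridding artifacts**  \n   - By adjusting the dilation rate dynamically, FADC avoids cases where the feature frequency exceeds the sampling rate, which effectively mitigates artifacts.[1][7][16]\n\n3. **Poor balance between receptive field and bandwidth**  \n   - AdaDR assigns dilation rates locally and dynamically, improving the balance between receptive field and bandwidth.\n   - FreqSelect spatially reweights the frequency components, which further widens the receptive field while keeping the key high-frequency information.[7][15][16]\n\n4. **Limited adaptability and generality**  \n   - FADC needs no global, fixed dilation rate and adapts to local variation in the input features, which makes the network more adaptable.\n   - The modules are lightweight and can replace existing convolution layers directly, so they suit many tasks such as semantic segmentation and object detection.[13][14] \n\n--- \n\nFADC improves dilated convolution from a frequency perspective. Its three modules help it capture high-frequency detail, widen the receptive field, and reduce artifacts, and they improve performance on semantic segmentation and object detection."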
  },
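  {
    "path": "module-info/CVPR2024-PKINet.md",
    "content": "### **PKI Module Summary** https://openaccess.thecvf.com/content/CVPR2024/papers/Cai_Poly_Kernel_Inception_Network_for_Remote_Sensing_Detection_CVPR_2024_paper.pdf\n\n#### **1. Background**\nIn remote sensing object detection, object scale varies widely (for example small vehicles versus large buildings), backgrounds are complex, and context is diverse. Existing methods widen the receptive field in the following ways:\n- **Large-kernel convolution**: captures more context but easily introduces background noise, which hurts small-object detection.\n- **Dilated convolution**: widens the receptive field but can make the feature representation too sparse and lose detail.\n\nThese methods do not handle the variation in object scale well while keeping local texture features intact.[1][3]\n\n---\n\n#### **2. Module Principle**\nThe PKI Module is an **Inception-style** module designed to capture multi-scale texture features. It consists of the following parts:\n1. **Local information extraction**:\n   - A small-kernel convolution (such as 3×3) extracts local texture features so that object detail is captured.\n   - Formula:  \n     \\[ L_{l-1,n} = \\text{Conv}_{k_s \\times k_s}(X_{l-1,n}) \\]\n     where \\( k_s \\) is the small kernel size (such as 3×3).\n\n2. **Multi-scale feature extraction**:\n   - Several parallel **depth-wise convolution kernels** (kernel sizes such as 5×5, 7×7, 9×9) capture context at different scales.\n   - Formula:  \n     \\[ Z_{l-1,n}^{(m)} = \\text{DWConv}_{k(m) \\times k(m)}(L_{l-1,n}) \\]\n     where \\( k(m) = (m+1) \\times 2 + 1 \\) gives the kernel size at each scale.\n\n3. **Feature fusion**:\n   - A 1×1 convolution fuses the local features and the multi-scale features across channels, integrating the multi-scale information.\n   - Formula:  \n     \\[ P_{l-1,n} = \\text{Conv}_{1 \\times 1}(L_{l-1,n} + \\sum_{m=1}^{4} Z_{l-1,n}^{(m)}) \\]\n   - This fusion captures rich context at different scales while keeping local texture features intact.[6][7]\n\n---\n\n#### **3. Problems Solved**\nWith its multi-scale kernel design, the PKI Module addresses the following problems:\n1. **Variation in object scale**:\n   - Kernels of different sizes capture object features from small to large, matching the wide scale range of objects in remote sensing images.\n\n2. **Background noise**:\n   - Avoids large-kernel convolution, which reduces the interference of background noise with small-object detection.\n\n3. **Sparse features**:\n   - Uses no dilated convolution, which avoids the loss of detail caused by a sparse feature representation and keeps features dense and complete.\n\nWith this design, the PKI Module captures **local and multi-scale context** effectively and improves object detection on remote sensing images.[3][7][18]"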
  },
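  {
    "path": "module-info/CVPR2024-ParameterNet.md",
    "content": "# DynamicConv Module Summary https://arxiv.org/pdf/2306.14525v2\n\n## 1. Background\n\n### Problem\nIn large-scale visual pretraining, researchers observed the \"low FLOPs pitfall\": low-FLOPs models gain nothing from large-scale pretraining data, whereas high-FLOPs models benefit significantly[1][2]. The conventional remedy is to scale up the model, but that increases both the parameter count and the computational cost (FLOPs), which does not suit resource-constrained settings such as mobile devices[1].\n\n### Design requirements\nFor low-FLOPs models to benefit from large-scale pretraining as well, a technique is needed that:\n- **Greatly increases the parameter count** to raise model capacity\n- **Adds almost no FLOPs** to keep computation efficient\n- Suits resource-constrained environments[2][6]\n\n## 2. Module Principle\n\n### Core idea\nDynamicConv reaches the goal of \"many parameters, little computation\" through a **parameter augmentation function**:\n```\nW' = f(W)\n```\nThe function must follow two basic rules: 1) low computational cost; 2) a large increase in model capacity[6].\n\n### Implementation\n\n**Standard convolution**:\n```\nY = X * W\n```\nwhere X ∈ R^(Cin×H×W) is the input feature and W ∈ R^(Cout×Cin×K×K) is the weight tensor[6].\n\n**Dynamic convolution**:\n```\nY = X * W'\nW' = Σ(i=1 to M) αi * Wi\n```\nwhere:\n- Wi is the i-th convolution weight tensor (M experts in total)\n- αi is the corresponding dynamic coefficient\n- The coefficients are generated dynamically for each input sample[6][7]\n\n### Generating the dynamic coefficients\n```\nα = softmax(MLP(Pool(X)))\n```\nSteps:\n1. Apply **global average pooling** to the input X to aggregate information\n2. Pass the result through a **two-layer MLP**\n3. Apply a **softmax activation** to produce the dynamic coefficients α ∈ R^M[7]\n\n### Complexity analysis\n\n**Parameter count**:\n- Standard convolution: Cout · Cin · K · K\n- Dynamic convolution: C²in + CinM + M · Cout · Cin · K · K\n- **Parameter ratio**: ≈ 1/K² + M ≈ M (when M ≪ CoutK² and Cin ≈ Cout)[8]\n\n**FLOPs**:\n- Coefficient generation: C²in + CinM (negligible)\n- Weight fusion: M · Cout · Cin · K · K\n- Convolution: H' · W' · Cout · Cin · K · K\n- **FLOPs ratio**: ≈ 1 (when M ≪ H'W')[8]\n\n## 3. Problems Solved\n\n### Core problems\n\n1. **Low FLOPs pitfall**: lets low-FLOPs models benefit from large-scale pretraining, removing the limitation that \"low-compute models cannot exploit big data\"[2][10]\n\n2. **Parameter-compute trade-off**: increases the parameter count greatly (by about M times) while leaving the computation almost unchanged, which decouples parameters from FLOPs where conventional methods tie them together[8]\n\n### Measured results\n\n**Performance**:\n- ParameterNet-600M reaches 81.6% accuracy on ImageNet-1K, above the 80.9% of Swin Transformer\n- Its FLOPs are only 0.6G, far below the 4.5G of Swin-T[2]\n- ImageNet-22K pretraining gains about 2% over training on ImageNet-1K[10]\n\n**Comparison with an alternative**:\nCompared with re-parameterized convolution (RepConv), DynamicConv has these advantages:\n- RepConv adds parameters during training, but its parameters and FLOPs at inference are unchanged, so model capacity does not really grow\n- DynamicConv keeps the added parameters at inference, so model capacity really grows and the model can benefit from large-scale pretraining[13]\n\n### Practical value\nDynamicConv offers a new option for mobile and edge-computing scenarios: resource-constrained environments can also enjoy the performance gains of large-scale pretraining, with a strong accuracy-latency trade-off[11][12]."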
  },
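  {
    "path": "module-info/CVPR2024-RMT.md",
    "content": "# RMT Block: Detailed Analysis https://arxiv.org/pdf/2309.11523\n\n## 1. Background\n\n### Limitations of the Vision Transformer\nThe conventional Vision Transformer (ViT) has two core problems:\n- **No explicit spatial prior**: self-attention has no built-in awareness of spatial position[1]\n- **Quadratic complexity**: when modeling global information, the cost of self-attention grows quadratically with the number of tokens[1][2]\n\n### Shortcomings of existing solutions\nExisting methods, such as Swin Transformer with window operations and NAT with a changed receptive-field shape, solve part of the problem, but all of them break the integrity of the spatial prior[2][6].\n\n### Inspiration from RetNet\nIn NLP, RetNet uses a distance-based temporal decay matrix to give one-dimensional, unidirectional text data an explicit temporal prior, which inspired the improvement for vision[2][3].\n\n## 2. Module Principle\n\n### Overall architecture of the RMT Block\nAs shown in Figure 3 of the paper, the RMT Block contains the following core components[7]:\n- **Layer Normalization (LN)**\n- **Manhattan Self-Attention (MaSA)**\n- **Depth-wise Convolution (DWConv 3×3)**\n- **Feed-Forward Network (FFN)**\n\n### Core principle of Manhattan Self-Attention (MaSA)\n\n#### Spatial decay matrix\nThe core of MaSA is a two-dimensional, bidirectional spatial decay matrix based on the Manhattan distance:\n```\nD²d_nm = γ^(|xn-xm|+|yn-ym|)\n```\nwhere:\n- `(xn, yn)` is the two-dimensional coordinate of the n-th token\n- `γ` is the decay parameter, which controls how strongly attention decays with distance\n- The farther apart two tokens are, the more the attention weight decays[5]\n\n#### MaSA formula\n```\nMaSA(X) = (Softmax(QK^T) ⊙ D²d)V\n```\nHere `⊙` is element-wise multiplication; the spatial decay matrix modulates the attention weights directly[5].\n\n#### Attention decomposition\nTo reduce computational complexity, MaSA uses a form decomposed along the two image axes:\n```\nAttnH = Softmax(QHK^T_H) ⊙ DH\nAttnW = Softmax(QWK^T_W) ⊙ DW\nMaSA(X) = AttnH(AttnWV)^T\n```\nwhere:\n- `DH_nm = γ^|yn-ym|` is the decay over vertical distance\n- `DW_nm = γ^|xn-xm|` is the decay over horizontal distance[6][7]\n\n### Local Context Enhancement (LCE)\nTo further strengthen local expressiveness, the RMT Block integrates a local context enhancement module:\n```\nXout = MaSA(X) + LCE(V)\n```\nLCE uses a 5×5 depth-wise convolution to enhance local features[7].\n\n### Decay parameters across attention heads\nDifferent attention heads use different γ values to control the receptive field, so the model perceives information at multiple scales. For the i-th head:\n```\nγi = 1 - 2^(-a - (b-a)i/N)\n```\nwhere a and b control the receptive-field range and N is the total number of heads[19].\n\n## 3. Problems Solved\n\n### Problem 1: no explicit spatial prior\n**Solution**: the Manhattan-distance spatial decay matrix gives every token explicit awareness of spatial position.\n- Nearby tokens receive higher attention weights\n- The attention weights of distant tokens decay with distance\n- This provides a richer spatial prior than conventional positional encoding[3][5]\n\n### Problem 2: quadratic complexity\n**Solution**: attention decomposition reduces the complexity from O(N²) to O(N).\n- Attention is computed separately along the horizontal and vertical directions\n- The receptive-field shape stays the same as in the original MaSA\n- The integrity of the spatial decay matrix is preserved[6][7]\n\n### Problem 3: balancing global and local information\n**Solution**: different forms of MaSA are used at different stages to reach the best balance.\n- The first three stages use decomposed MaSA to handle the large number of tokens\n- The last stage uses full MaSA for fine-grained modeling\n- The LCE module supplements local feature expression[7]\n\n### Experimental validation\nAblation studies confirm the contribution of each component:\n- **MaSA vs. vanilla attention**: classification accuracy rises by 0.8% and detection AP by 2.5%[15]\n- **Efficiency of the decomposed form**: FLOPs drop significantly while performance is maintained[16]\n- **Strong results across tasks**: state-of-the-art results on image classification, object detection, instance segmentation, and semantic segmentation[8][10][13][14]\n\nWith these designs, the RMT Block extends the temporal modeling of RetNet to the spatial domain and gives vision Transformers a core module that is both efficient and strongly aware of spatial structure."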
  },
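  {
    "path": "module-info/CVPR2024-RepVIT.md",
    "content": "### RepViT Block Summary https://arxiv.org/pdf/2307.09283\n\n#### 1. Background\n\n**Original problem**:\n- MobileNetV3 uses the conventional inverted residual bottleneck, in which the token mixer (spatial information fusion) and the channel mixer (channel interaction) are coupled[6]\n- Specifically, a MobileNetV3 block contains a 1×1 expansion convolution, a 3×3 depth-wise (DW) convolution, and a 1×1 projection layer, so spatial and channel processing are mixed together[6]\n- The success of lightweight ViTs is largely attributed to the MetaFormer architecture, which separates the token mixer from the channel mixer and has proven effective[6]\n\n**Design motivation**:\n- Studies show that the effectiveness of ViTs comes mainly from the general token mixer and channel mixer architecture (the MetaFormer architecture), not from a specific token mixer[6]\n- For lightweight CNNs to enjoy the same architectural advantage, the token mixer and channel mixer in MobileNetV3 must be separated[6]\n\n#### 2. Module Principle\n\n**Structure**:\n- **Separation**: the originally coupled token mixer and channel mixer are split apart\n  - Token mixer: 3×3 depth-wise (DW) convolution, responsible for fusing spatial information\n  - Channel mixer: 1×1 convolution layers, responsible for interaction between channels[6][7]\n\n- **Layer reordering**:\n  - The 3×3 DW convolution is moved forward so that it processes spatial information on its own\n  - The SE layer (if present) moves forward with it and sits after the DW convolution, because the SE layer depends on spatial information interaction[7]\n\n- **Structural re-parameterization**:\n  - The widely used structural re-parameterization technique is applied to the DW layer: a multi-branch structure during training strengthens learning\n  - At inference the branches can be merged into a single convolution, which removes the compute and memory overhead of the skip connection[7]\n\n**Structure comparison**:\n- **MobileNetV3 Block**: 1×1 expansion → 3×3 DW → SE (optional) → 1×1 projection\n- **RepViT Block**: 3×3 DW → SE (optional) → 1×1 expansion → 1×1 projection[7]\n\n#### 3. Problems Solved\n\n**Performance**:\n- **Architecture**: separating the token mixer from the channel mixer lets the model handle spatial and channel information better and improves its expressiveness[6][7]\n\n**Efficiency**:\n- **Lower latency**: the RepViT block reduces the latency of MobileNetV3-L from 1.01ms to 0.81ms[7]\n- **Inference**: structural re-parameterization removes the compute overhead of the skip connection at inference, which particularly benefits mobile devices[7]\n\n**Training**:\n- **Learning capacity**: structural re-parameterization provides a multi-branch structure during training, which strengthens learning, while inference keeps the efficiency of a single branch[7]\n\n**Architectural consistency**:\n- **Consistent design**: brings the architecture of lightweight CNNs in line with successful lightweight ViTs and provides a good basis for later optimization[6]\n\n**Note**: although the RepViT block improves latency significantly, it initially causes a temporary accuracy drop (from 71.5% to 68.3%), which is recovered by the later adjustment of the expansion ratio and the increase in network width[7]"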
  },
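  {
    "path": "module-info/CVPR2024-Rewrite the Stars.md",
    "content": "# StarBlocks Module Summary https://arxiv.org/pdf/2403.19967\n\n## 1. Background\n\n### Limitations of conventional network design\nThroughout the history of deep learning, most networks have been built from **linear projections (convolution and linear layers) combined with nonlinear activation functions**[1]. Self-attention performs well in NLP and computer vision, but its quadratic complexity limits efficiency[1].\n\n### The rise of element-wise multiplication\nIn recent years, the learning paradigm of **fusing features from different subspaces by element-wise multiplication** has attracted growing attention[1]. Related work such as FocalNet, HorNet, and VAN all use this \"star operation\" but lack an in-depth theoretical analysis[1][2].\n\n### Shortcomings of existing explanations\nExisting explanations of the star operation rest mainly on intuition and assumptions[2]:\n- FocalNet holds that the star operation acts as a modulation or gating mechanism\n- HorNet holds that the advantage lies in using high-order features  \n- VAN and Monarch Mixer attribute it to convolutional attention\n\nThese explanations lack comprehensive analysis and strong evidence[2].\n\n## 2. Module Principle\n\n### Core design\nStarBlocks follow a simple design philosophy[12][13]:\n\n```\nInput → depth-wise conv (DW-Conv) → fully connected layer 1 (FC) → fully connected layer 2 (FC) → ReLU6 activation → star operation (*) → fully connected layer 3 (FC) → depth-wise conv (DW-Conv) → batch normalization (BN) → Output\n```\n\n### Mathematical principle\nThe star operation is written as: **(W₁ᵀX + B₁) * (W₂ᵀX + B₂)**[5]\n\nRewriting it gives:\n```\nw₁ᵀx * w₂ᵀx = Σᵢ₌₁^(d+1) Σⱼ₌₁^(d+1) wᵢ¹wⱼ²xᵢxⱼ\n```\n\nThis yields **(d+2)(d+1)/2 ≈ (d/√2)² distinct terms**, each a nonlinear combination of the inputs[6].\n\n### Effect of stacking layers\nStacking l layers brings the implicit feature dimension to **(d/√2)^(2^l)**[7][8]:\n- Layer 1: R^((d/√2)^(2^1))\n- Layer 2: R^((d/√2)^(2^2))  \n- Layer l: R^((d/√2)^(2^l))\n\nFor example, a network of depth 10 and width 128 obtains an implicit feature space of about **90^1024 dimensions**[8].\n\n### Relation to kernel functions\nThe star operation resembles a **polynomial kernel function**[5]:\n- Polynomial kernel: k(x₁,x₂) = (γx₁·x₂ + c)^d\n- Both map the input to a high-dimensional nonlinear space\n- Visualizations of the decision boundary confirm the similarity[10]\n\n## 3. Problems Solved\n\n### 3.1 High-dimensional feature representation\n**Limits of the conventional solution**:\n- Conventional networks obtain high-dimensional features by increasing the network width (number of channels)[3]\n- This increases compute cost and parameter count\n\n**The StarBlocks solution**:\n- Obtains a **high-dimensional implicit feature representation within a low-dimensional compute space**[3]\n- Expands the dimension without increasing the network width[6]\n\n### 3.2 Balancing compute efficiency and performance\n**Problem**: the trade-off between performance and computational complexity in efficient network design\n\n**Results**[14][15]:\n- StarNet-S4 is 0.9% more accurate than EdgeViT-XS and 3× faster\n- At the same latency, StarNet-S1 is 2.1% more accurate than MobileOne-S0\n- This shows that the star operation suits efficient network design particularly well[3]\n\n### 3.3 Dependence on activation functions\n**Conventional view**: activation functions are an indispensable part of neural networks\n\n**What StarBlocks show**[10][11]:\n- With all activation functions removed, accuracy drops by only 1.2% (from 71.7% to 70.5%)\n- Under the same condition, the conventional summation operation drops sharply, by 33.8%\n- This opens a new research direction: **activation-free networks**\n\n### 3.4 Complexity of network design\n**Problem with conventional efficient networks**: they need intricate design tricks and careful tuning[3]\n\n**Advantages of StarBlocks**[13]:\n- The design is very simple and minimizes manual intervention\n- No complex re-parameterization, attention integration, or similar techniques are needed\n- Strong performance comes from the intrinsic advantage of the star operation\n\n### 3.5 Missing theoretical understanding\n**Existing problem**: there is no in-depth theoretical explanation of why element-wise multiplication works[2]\n\n**Contributions of StarBlocks**:\n- Provides a **mathematically rigorous theoretical analysis**[5][6][7]\n- Verifies the analysis through experiments, theory, and visualization[9][10]\n- Provides a **guiding framework** for network design that avoids blind trial and error[4]\n\n## Summary\n\nWith a simple design and a solid theoretical insight, the StarBlocks module addresses key problems of conventional networks in high-dimensional feature representation, compute efficiency, and dependence on activation functions, and it provides a new paradigm and theoretical basis for efficient network design."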
  },
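  {
    "path": "module-info/CVPR2024-SFSConv.md",
    "content": "# SFS-Conv Module: Detailed Summary https://openaccess.thecvf.com/content/CVPR2024/papers/Li_Unleashing_Channel_Potential_Space-Frequency_Selection_Convolution_for_SAR_Object_Detection_CVPR_2024_paper.pdf\n\n## 1. Background\n\n### 1.1 Existing problems\nConventional deep convolutional neural networks have the following key problems in SAR object detection[1][2]:\n- **Severe feature redundancy**: many feature maps extracted within a single convolution layer show similar patterns and are largely redundant[1][4]\n- **Heavy compute consumption**: the success of deep networks relies on intensive compute and storage resources, which makes deployment in resource-constrained environments difficult[1]\n- **Generic convolutions do not fit SAR characteristics**: existing group convolution, point-wise convolution, and similar operators were not designed specifically for SAR object detection[2]\n\n### 1.2 Particularities of SAR images\nSAR images have distinctive imaging characteristics[2]:\n- **High-resolution overhead view**: most objects are small and often obscured by speckle noise\n- **Reliance on surrounding context**: objects are hard to recognize by appearance alone, so contextual cues such as object shape and orientation are needed\n- **Importance of frequency-domain information**: SAR imaging is based on the interaction between the radar system and the target, and frequency-domain analysis can decompose the scattering characteristics of the echo signal\n\n### 1.3 Design priors\nBased on an analysis of SAR images, two important design priors are proposed[2]:\n- **Object-adaptive receptive field**: object scales in SAR images are diverse, and a detector with a fixed receptive field may misclassify\n- **Key role of frequency features**: SAR imaging is easily disturbed by complex backgrounds, and spatial information alone can hardly separate object features from clutter noise\n\n## 2. Module Principle\n\n### 2.1 Overall architecture\nSFS-Conv follows a three-step **shunt-perceive-select** strategy[2][6]:\n\n```\nInput features → Shunt → Perceive → Select → Output features\n                   ↓         ↓          ↓\n         spatial/frequency  SPU/FPU   CSU fusion\n```\n\n### 2.2 Shunt strategy\nThe input feature map X ∈ R^(C×H×W) is split into two parts by the ratio α[6]:\n- **Spatial aspect**: X^s ∈ R^((1-α)C×H×W), which provides spatial information\n- **Frequency aspect**: X^f ∈ R^(αC×H×W), which supplements frequency characteristics\n\nTwo 1×1 point-wise convolutions adjust X^s and X^f separately so that they better suit the subsequent feature extraction along the spatial and frequency dimensions[6].\n\n### 2.3 Perceive strategy\n\n#### 2.3.1 Spatial Perception Unit (SPU)\n**Core idea**: dynamically model context at different scales[6]\n\n**Implementation**:\n- The spatial feature X^s is split evenly into n feature-map groups X^s_g\n- Each group has a kernel K_g of a different size, with increasing kernel size: k_(g+1) = k_g + 2, k_1 = 3[6]\n- Hierarchical residual connections widen the receptive field:\n\n```\nY^s_g = {\n  X^s_g * K_g,                    g = 1\n  (X^s_g + Y^s_(g-1)) * K_g,     1 < g ≤ n\n}\n```\n\n- Receptive-field growth: RF_(g+1) = RF_g + (k_(g+1) - 1)[6]\n\n#### 2.3.2 Frequency Perception Unit (FPU)\n**Core idea**: use the fractional Gabor transform to extract multi-scale, multi-orientation frequency features[7]\n\n**Fractional Gabor transform (FrGT)**:\n- Standard FrGT definition[8]:\n```\nG^α_s(p,q) = ∫ s(x)ḡ(x-q)B(p,x,α)dx\n```\nwhere B(x₁,x₂,α) is the transform kernel and α = Pπ/2 is the transform angle\n\n- **Convolutional fractional Gabor kernel (FrGK)**: an ordinary convolution kernel is modulated by the FrGT filter[8]:\n```\nK^v_(i,u) = K_(i,o) * G(u,v)\n```\n\n**Procedure**:\n- The frequency feature X^f is split into V groups X^f_v\n- Each group uses N = C/VU convolution kernels to generate the corresponding frequency features\n- All groups are concatenated at the end: Y^f = [Y^f_0, Y^f_1, ..., Y^f_(V-1)][8]\n\n### 2.4 Select strategy\n\n#### 2.4.1 Channel Selection Unit (CSU)\n**Goal**: adaptively fuse spatial and frequency features and select the most discriminative information[9]\n\n**Steps**:\n1. **Global average pooling**: collect global spatial and frequency information[9]\n```\nS^n = GAP(Y^n) = (1/(H×W)) ∑∑ Y^n_(i,j)\n```\n\n2. **Soft attention weights**[9]:\n```\nγ = e^(S^s)/(e^(S^s) + e^(S^f))\nβ = e^(S^f)/(e^(S^s) + e^(S^f))\n```\n\n3. **Feature fusion**[9]:\n```\nY = γY^s + βY^f\n```\n\n## 3. Problems Solved\n\n### 3.1 Feature redundancy\n**Problem**: conventional convolution produces many similar feature maps, which wastes compute[1][4]\n\n**Solution**:\n- The shunt strategy splits the features into two complementary aspects, spatial and frequency, which avoids extracting similar features repeatedly[2]\n- The multi-scale design of the SPU and the multi-orientation feature extraction of the FPU increase feature diversity[6][7]\n- Experiments show that the feature maps of SFS-Conv are more diverse and more discriminative than those of ordinary convolution[1]\n\n### 3.2 Fit to SAR image characteristics\n**Problem**: generic convolution designs do not consider the distinctive characteristics of SAR images[2]\n\n**Solution**:\n- **Spatial fit**: the dynamic receptive field of the SPU adapts to the diverse object scales in SAR images[6]\n- **Frequency fit**: the FPU specifically extracts the frequency-domain scattering characteristics produced by the SAR imaging mechanism[7][8]\n- **Noise suppression**: the fractional Gabor transform effectively suppresses speckle noise in SAR images[7]\n\n### 3.3 Compute efficiency\n**Problem**: existing methods improve performance by adding attention modules, which increases model complexity[2]\n\n**Solution**:\n- **Parameter efficiency**: the CSU fuses features without parameters and adds no extra parameters[9]\n- **Compute**: uses only 18% of the parameters and 24% of the FLOPs of YOLOv8s[3]\n- **Inference speed**: inference takes only 8.6ms, 39% less time than YOLOv8s[12]\n\n### 3.4 Balancing performance and efficiency\n**Problem**: existing methods either pursue a lightweight design at the cost of performance or improve performance at a high compute cost[2]\n\n**Solution**:\n- Achieves the best performance on three SAR datasets: HRSID (96.2%), SAR-AIRcraft-1.0 (89.7%), SSDD (99.6%)[3]\n- Keeps computational complexity and inference time very low at the same time[12]\n- Ablation studies confirm that each component is effective and necessary[15][16]\n\nWith its shunt-perceive-select strategy, the SFS-Conv module extracts and fuses spatial and frequency features within a single convolution layer and provides an efficient, lightweight solution for SAR object detection."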
  },
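  {
    "path": "module-info/CVPR2024-TransNext.md",
    "content": "# TransNeXt Core Modules Explained https://arxiv.org/pdf/2311.17132\n\n## I. Aggregated Attention\n\n### 1. Background\n\n#### Existing problems\n- **Depth degradation effect**: many efficient ViT models rely on stacked layers for information exchange, but the depth degradation effect in residual connections prevents sufficient information mixing[1]\n- **Difference from biological vision**: existing local attention and spatially downsampled attention differ markedly from how the biological visual system works[3]\n- **Window-partition artifacts**: methods based on window partitioning produce unnatural blocky traces that remain even after deep stacking[3]\n- **Computational complexity**: the quadratic complexity of global self-attention limits its use on high-resolution images[1]\n\n#### Inspiration from biological vision\nThe human visual system has a dichotomy between foveal vision (high acuity, covering 1-2 degrees of the visual field) and peripheral vision (large receptive field but lower precision). The eyeball processes information from several fields of view through rapid movements (saccades) and integrates it[20].\n\n### 2. Module Principle\n\n#### Core design: Pixel-focused Attention\nA **dual-path design** imitates the biological visual system:\n\n**Path 1: sliding-window attention** (imitates foveal vision)\n- Each query perceives its nearest-neighbor features at fine granularity\n- Uses a fixed k×k window (3×3 in the experiments)[5][6]\n\n**Path 2: pooling attention** (imitates peripheral vision)  \n- Each query perceives spatially downsampled features at coarse, global granularity\n- Global information is obtained through an \"activate and pool\" operation[6]\n\n**Formulas**:\n```\nS(i,j)~ρ(i,j) = Q(i,j)K^T_ρ(i,j)     # sliding-window path\nS(i,j)~σ(X) = Q(i,j)K^T_σ(X)         # pooling path\nA(i,j) = softmax(Concat(S(i,j)~ρ(i,j), S(i,j)~σ(X))/√d + B(i,j))\n```\n\n#### Enhancement mechanisms\n1. **Query embedding**: adds learnable query tokens, which diversifies the generation of the attention matrix[7]\n2. **Positional attention**: learnable tokens interact with the queries and provide a dynamic relative position bias[8]\n3. **Length-scaled cosine attention**: improves extrapolation to multi-scale inputs, λ = τ log N[9]\n\n### 3. Problems Solved\n\n1. **Avoids depth degradation**: does not rely on stacking for information exchange; a single layer achieves effective local-global modeling[1]\n2. **Natural visual perception**: removes the unnatural blocky artifacts of window partitioning and gives a perception pattern closer to biological vision[3]\n3. **Pixel-level translation equivariance**: imitates continuous eye movement and provides consistent foveal behavior for a pixel at any image position[3]\n4. **Linear complexity**: with a fixed pooling size, the computational complexity is linear in the input sequence length[10]\n5. **Multi-scale adaptation**: length-scaled cosine attention and the log-CPB position bias improve extrapolation to large images[9]\n\n---\n\n## II. Convolutional GLU\n\n### 1. Background\n\n#### The need for channel attention in the ViT era\n- **Limits of the SE mechanism**: in the ViT era a global receptive field is no longer scarce, and the global average pooling of the SE mechanism is too coarse, because all tokens share the same gating signal[11]\n- **ViTs lack channel attention**: studies have found that introducing the SE mechanism into the channel mixer effectively improves model robustness[11]\n- **Need for positional information**: the ViT structure needs a 3×3 depth-wise convolution to provide conditional positional encoding (CPE)[11]\n\n#### Advantages of GLU\nThe gated linear unit (GLU) outperforms the MLP on natural language processing tasks. It consists of two linear projections, one of which is activated by a gating function[11].\n\n### 2. Module Principle\n\n#### Design idea\nA **minimal 3×3 depth-wise convolution** is added before the activation function of the gating branch of the GLU, which makes it conform to the design of gated channel attention[11].\n\n#### Structure\n```\nConvGLU(X) = (XW1 + B1) ⊙ GELU(DWConv(XW2 + B2))\n```\n\nwhere:\n- `XW1 + B1`: value branch (keeps the same depth as the MLP and GLU)\n- `DWConv(XW2 + B2)`: gating branch (with the added 3×3 depth-wise convolution)\n- `⊙`: element-wise multiplication\n- `GELU`: activation function\n\n#### Key properties\n1. **Gating based on nearest-neighbor features**: each token has its own gating signal, based on the fine-grained features of its nearest neighbors[12]\n2. **Backpropagation-friendly**: the value branch keeps the same depth as the MLP[12]\n3. **Compute efficiency**: fewer FLOPs than ConvFFN at the same parameter count[12]\n\n### 3. Problems Solved\n\n1. **Fine-grained channel attention**: fixes the coarseness of the SE mechanism, since every token has its own gating signal[12]\n2. **Positional encoding**: provides the needed positional information to ViT models that have no positional encoding design[11]\n3. **Better robustness**: channel attention based on local features effectively improves model robustness[11]\n4. **Compute efficiency**: realizes an attention-style channel mixer while reducing compute cost[12]\n5. **Meets the varied needs of ViTs**: a simple, robust design that covers the different needs of ViTs[12]\n\n#### Ablation study\nExperiments on CIFAR-100 show that ConvGLU performs best among the variants (Type-1, Type-2, Type-3), which supports placing the depth-wise convolution before the activation function of the gating branch[27].\n\n---\n\n## Summary\n\nAggregated Attention and Convolutional GLU serve as the token mixer and the channel mixer respectively and together form the core of TransNeXt. The former uses a biomimetic vision design to address depth degradation and unnatural visual perception; the latter uses an improved gating mechanism to strengthen channel modeling and robustness. Combining the two modules lets TransNeXt reach state-of-the-art performance on a range of vision tasks[1][19]."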
  },
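  {
    "path": "module-info/CVPR2024-UniRepLKNet.md",
    "content": "# Dilated Reparam Block Module Summary https://arxiv.org/pdf/2311.15599\n\n## 1. Background\n\n### Limitations of conventional large-kernel design\nBefore UniRepLKNet, studies had shown that a large-kernel convolution should be used together with parallel small-kernel convolutions, because small kernels help capture small-scale patterns during training[5]. The conventional practice is to add the outputs of the large kernel and the small kernel after each has passed through its own batch normalization layer, and after training to merge the small kernel equivalently into the large kernel through structural re-parameterization, which removes the inference cost[5].\n\n### The need to capture sparse patterns\nThe authors observe that, beyond small-scale patterns, strengthening the ability of a large kernel to capture sparse patterns (a pixel on the feature map may be more related to some distant pixels than to its neighbors) may produce higher-quality features. This need matches the mechanism of dilated convolution: from a sliding-window perspective, a dilated convolution with dilation rate r scans the input channel to capture spatial patterns in which each pixel of interest is r-1 pixels away from its neighbor[5].\n\n## 2. Module Principle\n\n### Core design idea\nThe Dilated Reparam Block uses several parallel dilated small-kernel convolution layers to enhance a non-dilated large-kernel convolution layer[5]. Its hyperparameters are:\n- The large-kernel size K\n- The kernel sizes k of the parallel convolution layers  \n- The dilation rates r[5]\n\n### Equivalent conversion\n**Key innovation**: a dilated convolution is converted equivalently into a non-dilated, sparse large kernel[6].\n\n**Principle**: ignoring input pixels is equivalent to inserting extra zeros into the convolution kernel, so a dilated convolution layer with dilation rate r and kernel size k can be converted equivalently into a non-dilated layer with kernel size (k-1)r+1[5][6].\n\n**Implementation**: the conversion is implemented elegantly as a transposed convolution with stride r and an identity kernel I∈R^(1×1)[6]:\n```\nW' = conv_transpose2d(W, I, stride = r)\n```\n\n### Concrete examples\nIn the example with K=9, four parallel layers are used with k=(5,5,3,3) and r=(1,2,3,4), which gives equivalent kernel sizes of (5,9,7,9)[6].\n\nFor the default setting K=13, five layers are used with k=(5,7,3,3,3) and r=(1,2,3,4,5), which gives equivalent kernel sizes of (5,13,7,9,11)[6].\n\n### Merging at inference\nAt inference, each batch normalization layer is first merged into the preceding convolution layer, then every layer with dilation rate r>1 is converted with the conversion function, and finally all resulting kernels are added together with appropriate zero padding[6].\n\n## 3. Problems Solved\n\n### 1. Performance\n**Experimental validation**: compared with non-dilated variants that use the same number of parallel branches, the Dilated Reparam Block improves performance significantly. It reaches 81.63±0.02 ImageNet accuracy and 46.37±0.10 ADE20K mIoU, better than the other variants[9].\n\n### 2. Capturing sparse patterns\n**Core advantage**: the large kernel benefits from the ability of the parallel dilated convolution layers to capture sparse patterns, not merely from extra small kernels or a combination of different receptive fields[9]. This lets the model build long-range dependencies between a pixel and distant pixels.\n\n### 3. Inference efficiency\n**No extra cost**: through the equivalent conversion, the Dilated Reparam Block can be converted entirely into a single large-kernel convolution at inference, which gives better performance in training at no extra compute cost in inference[5][6].\n\n### 4. Architecture design\n**Design principle**: the module reflects the design philosophy that \"a large kernel should see wide without going deep\", decoupling three effects that are tied together in conventional ConvNets: enlarging the receptive field, raising the abstraction level of spatial patterns, and improving representational capacity[2][3].\n\nThe Dilated Reparam Block is the core innovation in the architecture of UniRepLKNet. It improves the performance of large-kernel convolution and, more importantly, offers a new approach to the architecture design of large-kernel ConvNets."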
  },
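  {
    "path": "module-info/CVPR2025-BHViT.md",
    "content": "# BHViT: Binarized Hybrid Vision Transformer, Paper Summary https://arxiv.org/pdf/2503.02394\n\n## Core Idea and Main Contributions\n\nThe paper proposes BHViT (Binarized Hybrid Vision Transformer), a hybrid vision Transformer architecture designed specifically for binarization. The study finds that applying existing binarized-CNN techniques directly to ViT models causes a significant performance drop: as shown in Figure 1 of the paper, ReActNet reaches 73.3% accuracy on a CNN architecture but only 49.5% on a ViT architecture[1].\n\nThe main contributions are:\n- An investigation of why current binarized ViT models degrade so severely[1][2]\n- Three novel modules that form a high-performance, binarization-friendly hybrid ViT framework[2]\n- An attention-matrix binarization scheme based on quantization decomposition (QD)[2]\n- A regularization loss that addresses the incompatibility between weight oscillation and the Adam optimizer[2]\n\n## Method Architecture\n\n### 1. Hybrid architecture\nBHViT uses a four-stage pyramid structure with different token mixers at different stages[5]:\n- **First two stages**: a multi-scale grouped dilated convolution module (MSGDC) processes the features with large spatial resolution[5]\n- **Last two stages**: a multi-scale multi-head attention module (MSMHA) fuses features at the token level[5]\n\n### 2. Key modules\n\n#### Multi-scale grouped dilated convolution (MSGDC)\nThree 3×3 grouped convolution layers with different dilation rates fuse multi-scale features and significantly reduce model parameters and computational complexity[6].\n\n#### Multi-scale multi-head attention (MSMHA)\nA variant of window attention: 7×7 average pooling yields high-scale features while the input features are also partitioned into 7×7 windows, which maintains global information interaction and lowers compute cost[7].\n\n#### Quantization decomposition (QD) for attention binarization\nA binary attention matrix cannot represent the differences in similarity between tokens accurately. To address this, the QD method uses a global scaling constant s=2^n-1 and obtains s binary attention matrices through logical operations[7][8].\n\n#### Enhanced binarized MLP\nA shift module, with horizontal, vertical, and mixed shift operations, reduces information loss and gradient error[9].\n\n## Three Important Observations\n\n### Observation 1: avoiding too many tokens helps binarized ViTs\nTheoretical analysis shows that as the number of tokens k grows, the information entropy of the attention matrix increases and the probability distribution approaches a uniform distribution, which weakens the attention mechanism[6][23][24][25].\n\n### Observation 2: adding a residual connection to every binarized layer helps\nLayer-wise residual connections effectively mitigate the vanishing activation gradients caused by stacking several binarized layers in a row[8][28][29].\n\n### Observation 3: the Adam optimizer amplifies weight oscillation in binary networks\nLate in training, the Adam optimizer amplifies weight oscillation, so many parameters cannot be updated effectively. An L1 regularization loss is proposed to counter this[10][30][31].\n\n## Experimental Results\n\n### Classification\nOn ImageNet-1K:\n- BHViT-Small† improves on the current SOTA method ReActNet by 20.6%[12]\n- It improves on BiViT, which uses the Swin Transformer architecture, by 11.5%[12]\n- On CIFAR-10, BHViT-Small reaches 95.0% accuracy[11]\n\n### Segmentation\nIn road segmentation, BHViT reaches 85.1% mIoU on the RS-LVF dataset, above the 77.8% of a full-precision ResNet-34[13]. It also achieves SOTA performance on ADE20K image segmentation[13].\n\n## Ablation Study\n\nExperiments confirm the contribution of each proposed module[14]:\n- Removing the regularization loss lowers performance by 2.9%\n- Removing the shift module lowers performance by 4.3%\n- Removing the QD method lowers performance by 6.1%\n\nAn analysis of the weight distribution shows that the regularization loss changes the latent weight distribution, moving it closer to ±1 and mitigating weight oscillation[15].\n\n## Conclusion\n\nBHViT addresses the key challenges of binarized ViTs. Through a hybrid architecture, a new attention binarization method, and an optimization strategy, it achieves SOTA performance on several benchmark datasets and offers an effective way to deploy efficient vision Transformers on edge devices[16]."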
  },
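  {
    "path": "module-info/CVPR2025-DarkIR.md",
    "content": "# EBlock and DBlock in DarkIR: Detailed Analysis https://arxiv.org/pdf/2412.13443\n\n## EBlock (encoder block): low-light enhancement encoder\n\n### 1. Background\nUnder low-light conditions, images mainly suffer from insufficient illumination. Studies show that low-light conditions are highly correlated with the amplitude information of an image in the frequency domain[4][5]. Conventional methods usually handle these problems in the spatial domain, but frequency-domain processing can enhance illumination more effectively.\n\n### 2. Module Principle\nEBlock is based on the Metaformer architecture and has two core components[4][5]:\n\n**Spatial attention module (SpAM)**:\n- Uses a structure similar to NAFBlock, with an inverted residual block and simplified channel attention (SCA)\n- Replaces the activation function with a simple gating mechanism\n- Extracts meaningful spatial information for frequency-domain enhancement\n\n**Frequency-domain MLP (Fre-MLP)**:\n- Applies the fast Fourier transform (FFT) to move the image into the frequency domain\n- **Operates only on the amplitude** and leaves the phase untouched\n- Uses the inverse fast Fourier transform (IFFT) to return to the spatial domain\n- An MLP that operates on the amplitude works better than one that operates in the spatial domain\n\n**Downsampling strategy**:\n- Strided convolution performs the downsampling\n- The feature resolution halves after each level, which allows more encoder blocks in the deep levels without a significant increase in operations\n\n### 3. Problems Solved\n- **Low-light illumination recovery**: enhancing the amplitude in the frequency domain improves image brightness directly[4][5]\n- **Multi-scale processing**: illumination and amplitude stay consistent across scales, so they can be estimated at low resolution and then upscaled[5]\n- **Compute efficiency**: the global nature of frequency-domain processing makes low-light enhancement more efficient[4]\n- **Intermediate supervision**: produces a low-resolution image estimate \\(\\hat{x}_{\\downarrow 8}\\), which regularizes training through an architecture-guiding loss[4]\n\n---\n\n## DBlock (decoder block): deblurring decoder\n\n### 1. Background\nImage deblurring usually needs a large receptive field to handle the many kinds of blur kernels. Conventional methods achieve this either through deep feature extraction and downsampling or through large-kernel convolution, but the latter raises computational complexity and memory requirements[4][6].\n\n### 2. Module Principle\nDBlock focuses on spatial transformations and also follows the Metaformer structure[6]:\n\n**Dilated spatial attention module (Di-SpAM)**:\n- Inspired by large kernel attention (LKA) but uses features from three different levels\n- Uses three dilated depth-wise convolutions with dilation factors of 1, 4, and 9\n- Combines the properties of the three branches and then applies simplified channel attention\n- Performs better than LKA with fewer parameters[10]\n\n**Gated feed-forward network (Gated-FFN)**:\n- Replaces the activation function with a simple gating mechanism\n- Follows a design idea similar to NAFNet\n\n**Processing assumption**:\n- The decoder input is a deep representation of \\(\\hat{x}_{\\downarrow 8}\\)\n- The illumination is assumed to have been corrected by the encoder, so the decoder focuses on upsampling and sharpening[5][6]\n\n### 3. Problems Solved\n- **Blur removal**: spatial attention with a large receptive field handles many blur types effectively[6]\n- **Detail recovery**: restores image sharpness and detail on top of the illumination enhancement\n- **Compute**: dilated convolution gives a better efficiency/performance balance than large-kernel convolution[10]\n- **Multi-scale feature fusion**: different dilation factors capture blur information at different scales[6]\n\n---\n\n## How the Modules Work Together\n\n### Division of labor\n- **EBlock**: handles illumination at low resolution, using the global nature of the frequency domain[4][5]\n- **DBlock**: handles blur at high resolution, using the local nature of the spatial domain[6]\n\n### Information flow\n- The encoder passes illumination-enhanced features to the decoder\n- The intermediate output \\(\\hat{x}_{\\downarrow 8}\\) guides the architecture[4]\n- The decoder focuses on upsampling and sharpening the enhanced low-resolution reconstruction[5]\n\n### Efficiency advantage\nThis asymmetric design allows fewer blocks, which significantly reduces parameter count and compute cost while keeping state-of-the-art performance[4][8]."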
  },
  {
    "path": "module-info/CVPR2025-EVSSM.md",
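    "content": "# Detailed Analysis of the EVS and EDFFN Modules https://arxiv.org/pdf/2405.14343\n\n## EVS (Efficient Visual Scan) Module\n\n### 1. Background\nTraditional state space models such as Mamba are designed for one-dimensional sequences. Applying them directly to vision tasks requires flattening the image into a 1D sequence, which destroys its spatial structure and makes it hard to capture local information from neighboring pixels[2].\n\nMost existing visual state space models rely on multi-directional scanning, which significantly increases computational cost. For example, VMamba costs 4 times as much as Mamba because it scans bidirectionally along both the vertical and horizontal directions[5].\n\n### 2. How the Module Works\nThe core idea of the EVS module is a **geometric transformation followed by a single-direction scan**[5]:\n\n**Geometric transformation strategy**:\n```\nG = {\n    Transpose(Fin)  if i % 2 = 0\n    Flip(Fin)       if i % 2 = 1\n}\n```\nHere i is the index of the i-th EVSS module in the network, and Flip flips the features along both the horizontal and vertical axes[6].\n\n**Scanning procedure**:\n1. Apply the geometric transformation to the input features\n2. Split the features into X1 and X2 with a 1×1 convolution\n3. Apply a depthwise convolution and the selective scan S6 to X1\n4. Apply an activation function to X2\n5. Fuse the results with a final 1×1 convolution[7]\n\n**Restoring the spatial structure**: the image features return to their original spatial layout automatically after every 4 EVSS modules. If the total number of modules is not divisible by 4, the corresponding inverse transformations restore the original layout[6].\n\n### 3. Problems Addressed\n- **Loss of spatial information**: the geometric transformations preserve the spatial structure of the image and avoid the information loss caused by naive flattening[5]\n- **Computational complexity**: compared with multi-directional scanning, the EVS module keeps the same parameter count and FLOPs while reducing runtime from 182.6ms to 88.7ms[12]\n- **Non-local information**: because the geometric transformation changes from module to module, each scan captures context from a different direction, which explores non-local information effectively[12]\n\n## EDFFN (Efficient Discriminative Frequency-Domain FFN) Module\n\n### 1. Background\nThe FFN is usually a core component of deep learning models and helps reconstruct the latent clear image[7]. FFTformer developed a discriminative frequency-domain FFN (DFFN) that adaptively decides which frequency information to keep, but its frequency-domain operations add computational cost[7].\n\n### 2. How the Module Works\nThe core design idea of EDFFN is to **move the frequency-domain screening to the end**[7]:\n\n**Difference from DFFN**:\n- DFFN: applies the frequency-domain operation in the middle of the FFN\n- EDFFN: performs the frequency-domain screening at the final stage of the FFN[7]\n\n**Module structure**:\n1. Normalize the input features\n2. Transform the features with a 1×1 convolution\n3. Apply a depthwise convolution and GELU activation\n4. Perform the frequency-domain screening at the final stage\n5. Produce the output features with a 1×1 convolution[4]\n\n### 3. Problems Addressed\n- **Computational efficiency**: moving the frequency-domain operation to the final stage of the FFN significantly reduces cost compared with DFFN, which applies it at an intermediate stage[7]\n- **Feature transformation efficiency**: transforms the features from the EVSS module effectively and efficiently, supporting latent clear image reconstruction[7]\n- **Frequency selection**: keeps the ability to adaptively select useful frequency information while improving efficiency[7]\n\n## Combined Effect of the Modules\n\nTogether, the EVS and EDFFN modules allow EVSSM to:\n1. **Process visual data efficiently**: the EVS module adapts SSMs to vision tasks through geometric transformations\n2. **Transform features effectively**: the EDFFN module processes and screens frequency-domain features efficiently\n3. **Improve overall performance**: working together, the two modules raise PSNR on the GoPro dataset by 0.14dB over the baseline while keeping the same parameter count and computational complexity[12]"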
  },
  {
    "path": "module-info/CVPR2025-EfficientViM.md",
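    "content": "# Detailed Analysis of the EfficientViM Module https://arxiv.org/pdf/2411.15241\n\n## 1. Background\n\n### Challenges of Existing Techniques\n- **Limitations of traditional CNNs**: convolutional networks extract local features well but fall short at capturing global dependencies[1]\n- **Vision Transformer bottleneck**: self-attention has quadratic complexity O(L²D), which is too expensive for long sequences[1]\n- **Opportunity for state space models**: SSMs offer global interaction at linear complexity, but the existing SSD layer has a computational bottleneck[2][3]\n\n### Computational Bottleneck of the SSD Layer\nThe main costs of the conventional NC-SSD layer come from:\n- The linear projection of the input sequence: O(LD²)[5]\n- The gating and output projections: O(LD²)[5]\n- Overall complexity is dominated by the linear projections, which limits scalability[5]\n\n## 2. How the Module Works\n\n### 2.1 Core Idea of the Hidden State Mixer (HSM-SSD)\n\n#### Reordering the Computation\n**Key insight**: NC-SSD can be decomposed into two steps[5]:\n1. Form a weighted linear combination of the input states B^T_i x_i using importance weights a∈R^L, which gives a shared global hidden state h∈R^(N×D)\n2. Project the hidden state with the corresponding C∈R^(L×N) to produce the output for each input\n\n#### Derivation\nOriginal operation: `h = (a1^T_N ⊙ B)^T(x_in W_in) = ((a1^T_N ⊙ B)^T x_in)W_in = h_in W_in`[5]\n\nComputing h_in first reduces the complexity of the linear projection from O(LD²) to O(ND²)[5][6]\n\n#### HSM Approximation\nThe original output: `x_out = f(y) = Linear(y ⊙ σ(z))`\nis approximated by: `x_out = C((h ⊙ σ(h_in W_z))W_out) = Cf(h)`[6]\n\n### 2.2 Key Technical Components\n\n#### Single-Head Design\n- **Problem**: memory-bound operations in the multi-head configuration become a bottleneck, taking about 1/4 of total runtime[8]\n- **Solution**: a single-head design removes the tensor manipulation overhead (reshape, copy, and so on)[8]\n- **Compensating for capacity**: state-wise importance weights A∈R^(L×N) mimic the ability of multiple heads to capture diverse relations[8]\n\n#### Multi-Stage Hidden State Fusion (MSF)\n- **Mechanism**: fuses the logits predicted from the hidden states of several network stages[7]\n- **Computation**:\n  1. For the hidden state h^(s) of each stage, compute a global representation: `ĥ^(s) = (1/N)∑h^(s)_i`[7]\n  2. Normalize and project it to produce the corresponding logits z^(s)[7]\n  3. Fuse with weights: `z = ∑β̂^(s)z^(s)`, where β̂^(s) are learnable weights[7]\n\n### 2.3 Algorithm\n```\nInput: x_in ∈ R^(L×D)\n1. B̂, C, Δ ← Linear(x_in)           // O(LND)\n2. B̂, C ← DWConv(B̂, C)             // O(LNK²D)  \n3. A, B ← Discretization(â, B̂, Δ)   // O(LD)\n4. h_in ← (A ⊙ B)^T x_in            // O(LND)\n5. h, z ← Linear(h_in)              // O(ND²)\n6. h ← Linear(h ⊙ σ(z))             // O(ND²)\n7. x_out ← Ch                       // O(LND)\n```\n\n## 3. Problems Addressed\n\n### 3.1 Computational Efficiency\n- **Lower complexity**: reduces the SSD layer complexity from O(LD²) to O(ND² + LND), a significant saving when N≪L[5][6]\n- **Measured speedup**: delivers a significant throughput gain over conventional methods, with EfficientViM-M2 reaching 17,005 img/s[10]\n\n### 3.2 Memory Efficiency  \n- **Fewer memory-bound operations**: the single-head design removes the memory access bottleneck of the multi-head configuration[8]\n- **Actual memory use**: despite having more parameters, peak memory use is only 1/3 of that of some lightweight models[16]\n\n### 3.3 Speed-Accuracy Trade-off\n- **SOTA performance**: sets a new speed-accuracy frontier on ImageNet-1K[1][10]\n- **Specific gains**: 0.6% higher accuracy than SHViT together with a 7% speedup[3]\n- **Versus conventional models**: 0.6% higher accuracy than MobileNetV3 together with an 80% speedup[3]\n\n### 3.4 Scalability\n- **High-resolution inputs**: scales well to very high-resolution image processing[20][21]\n- **Multiple tasks**: performs well on dense prediction tasks such as object detection, instance segmentation, and semantic segmentation[14][15][16]\n\n### 3.5 Practical Deployment\n- **Hardware-friendly**: prioritizes measured runtime over theoretical FLOPs, which suits real deployment better[3]\n- **Cross-device performance**: stays competitive on GPUs, CPUs, and mobile devices[23][24]\n\nWith these design choices, EfficientViM addresses the key challenges of existing vision models in efficiency, scalability, and practical deployment, and provides an efficient option for vision tasks in resource-constrained environments."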
  },
  {
    "path": "module-info/CVPR2025-FDConv.md",
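    "content": "# Detailed Summary of the FDConv Module https://arxiv.org/pdf/2503.18783\n\n## 1. Background\n\n### Development and Limitations of Traditional Dynamic Convolution\n- **Dynamic convolution (DY-Conv)** combines multiple parallel weights through an attention mechanism to adapt the weights to each sample, which makes it more adaptive than standard convolution[1][6].\n- **Main problems**:\n  - In traditional dynamic convolution methods (such as ODConv and CondConv), the parallel weights have highly similar frequency responses and lack diversity[1][2].\n  - Parameter cost rises sharply (typically by a factor of n, with n<10) while the gain in adaptability is limited[3].\n  - The cosine similarity between weights exceeds 0.88, which indicates severe parameter redundancy[13].\n\n## 2. How the Module Works\n\n### The Three Core Components of FDConv\n\n#### 2.1 Fourier Disjoint Weight (FDW)[7][8]\n- **Core idea**: learn spectral coefficients in the Fourier domain rather than in the spatial domain\n- **Steps**:\n  1. **Fourier disjoint grouping**: sort a fixed number of parameters by frequency from low to high and divide them evenly into n disjoint groups\n  2. **Fourier-to-spatial transform**: convert each group of parameters to the spatial domain with the inverse discrete Fourier transform (iDFT)\n  3. **Reassembly**: crop the transformed result into k×k patches and reassemble them into the standard weight shape\n\n#### 2.2 Kernel Spatial Modulation (KSM)[8][9]\n- **Purpose**: fine-grained modulation at the level of individual filters\n- **Structure**:\n  - **Local channel branch**: a lightweight 1D convolution captures local channel information and predicts a dense modulation matrix\n  - **Global channel branch**: a fully connected layer gathers global channel information and predicts modulation values along three dimensions\n- **Output**: a dense modulation matrix α of size k×k×Cin×Cout\n\n#### 2.3 Frequency Band Modulation (FBM)[9][10]\n- **Function**: spatially varying frequency modulation\n- **Workflow**:\n  1. **Kernel frequency decomposition**: decompose the convolution weights into different frequency bands (4 bands by default)\n  2. **Fourier-domain convolution**: perform the convolution in the frequency domain\n  3. **Spatially varying modulation**: predict a modulation value for each frequency band at each spatial location\n\n## 3. Problems Addressed\n\n### 3.1 Frequency Diversity\n- **Problem**: the parallel weights of traditional dynamic convolution have highly similar frequency responses[1][2]\n- **Solution**: FDW groups disjoint Fourier indices so that each weight has a distinct frequency response[3][7]\n- **Effect**: the cosine similarity between weights drops to 0, giving genuine frequency diversity[13]\n\n### 3.2 Parameter Efficiency\n- **Problem**: traditional methods multiply parameter cost by n (for example CondConv +90M, ODConv +65.1M)[11][12]\n- **Solution**: FDConv keeps a fixed parameter budget and generates many (n>10) diverse weights through grouping in the Fourier domain[3]\n- **Effect**: strong performance with only 3.6M additional parameters[11]\n\n### 3.3 Spatial Invariance\n- **Problem**: traditional dynamic convolution shares its weights across the whole feature map and cannot adapt to spatially varying content[9]\n- **Solution**: FBM applies location-specific frequency modulation and adjusts the frequency response dynamically according to local content[9][10]\n- **Effect**: specific frequency bands can be selectively emphasized or suppressed at different spatial locations, which captures complex image structures better[15]"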
  },
  {
    "path": "module-info/CVPR2025-GroupMamba.md",
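    "content": "# Detailed Summary of the GroupMamba Layer Module https://arxiv.org/pdf/2407.13772\n\n## 1. Background\n\n### Existing Problems\nTraditional Mamba models face several key challenges in computer vision tasks:\n\n**Stability**:\n- Mamba models, and the S6 algorithm in particular, are unstable in image classification, especially when scaled to large models[2][4]\n- For example, the SiMBA-L (MLP) model gives suboptimal classification results, with an accuracy of only 49%[4]\n\n**Computational efficiency**:\n- The visual state space (VSS) block contains many input and output projections and depthwise convolutions, whose parameters and computational complexity grow in proportion to the number of input channels[2]\n- The Mamba design is computationally inefficient when handling many channels[4][6]\n\n**Limited interaction**:\n- Existing models fall short in handling spatial dependencies and in modeling global and local information[2]\n\n## 2. How the Module Works\n\n### Overall Architecture\nThe GroupMamba Layer is modular and consists of three core components[5][6]:\n\n```\nXout = Xin + FFN(LN(XCAM))\nwhere:\nXGM = GroupedMamba(Xin, Θ)\nXCAM = CAM(XGM, Affinity(Xin))\n```\n\n### Core Components in Detail\n\n#### 2.1 Grouped Mamba Operator\n**Design idea**:\n- Inspired by group convolution, the input channels are divided into four groups of size C/4[6]\n- A VSSS block is applied to each group independently, scanning in a different spatial direction[6]\n\n**Four-direction scanning strategy**:\n- Left-to-Right\n- Right-to-Left \n- Top-to-Bottom\n- Bottom-to-Top[6][7]\n\n**Formulation**:\n```\nXGM = GroupedMamba(Xin, Θ) = Concat[\n    VSSS(XLR, ΘLR), \n    VSSS(XRL, ΘRL),\n    VSSS(XTB, ΘTB), \n    VSSS(XBT, ΘBT)\n]\n```\nThe input tensor for each direction has shape (B, H, W, C/4)[7]\n\n#### 2.2 Visual Single Selective Scan (VSSS) Block\n**Function**: acts as a token and channel mixer built on the Mamba operator[6]\n\n**Structure**:\n```\nZ'out = Zin + Mamba(LN(Zin))\nZout = Z'out + FFN(LN(Z'out))\n```\nIt contains a Mamba block and a feed-forward network, each preceded by LayerNorm[6]\n\n#### 2.3 Channel Affinity Modulation (CAM) Operator\n**Design goal**: overcome the limited cross-channel information exchange caused by the grouped operation[7]\n\n**Workflow**:\n1. **Channel statistics**:\n   ```\n   ChannelStat(Xin) = AvgPool(Xin)\n   ```\n\n2. **Affinity computation**:\n   ```\n   Affinity(Xin) = σ(W2δ(W1ChannelStat(Xin)))\n   ```\n\n3. **Feature recalibration**:\n   ```\n   XCAM = XGM · Affinity(Xin)\n   ```\n\n**Difference from the SE block**:\n- CAM is designed specifically for cross-channel attention across multiple group transformations\n- It allows information exchange between groups, overcoming the inherent limitation of the grouped Mamba operator[7][8]\n\n## 3. Key Problems Addressed\n\n### 3.1 Computational Efficiency\n**Solution**:\n- Splitting the channels into four groups significantly reduces parameter count and computational complexity[6]\n- Compared with VMamba-T, GroupMamba-T has 26% fewer parameters and 2.5 times the throughput[12]\n\n**Effect**:\n- GroupMamba-T: 23 million parameters, a significant gain in parameter efficiency over conventional methods[9]\n\n### 3.2 Stability  \n**Solution**:\n- A distillation-based training objective stabilizes the training of large models[8]\n- Joint loss function: `Ltotal = αLCE(Zs, y) + (1-α)LCE(Zs, yt)`[8]\n\n**Effect**:\n- Large-model training is more stable and the loss converges more smoothly[20][21]\n- The distillation loss raises the accuracy of GroupMamba-B by 1.3%[20]\n\n### 3.3 Limits of Spatial Modeling\n**Solution**:\n- The four-direction scanning strategy gives comprehensive spatial coverage[6][7]\n- It models the spatial dependencies of both local and global information effectively[2]\n\n**Effect**:\n- Four scan directions capture richer spatial cues than a single direction[20]\n- Achieves state-of-the-art performance on ImageNet-1K[9]\n\n### 3.4 Channel Interaction\n**Solution**:\n- The CAM operator strengthens cross-channel communication and compensates for the limits of the grouped operation[7]\n- Channel recalibration improves the representational power of the network[7]\n\n**Effect**:\n- The CAM module raises accuracy from 82.20% to 82.50%[12]\n- It effectively resolves the limited information exchange introduced by grouping[7][8]\n\n## Summary\n\nThrough its grouped design, multi-directional scanning, and channel modulation, the GroupMamba Layer addresses the efficiency, stability, and interaction problems of traditional Mamba models in vision tasks and offers a new way to build efficient visual state space models[1][2]."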
  },
  {
    "path": "module-info/CVPR2025-LSNet.md",
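    "content": "# Summary of the LS Block Module in LSNet https://arxiv.org/pdf/2503.23135\n\n## 1. Background\n\n### Limitations of Traditional Lightweight Networks\nExisting lightweight vision networks mainly rely on two kinds of token mixing:\n- **Self-attention**: uses global perception and global aggregation, but it spends redundant attention on less informative regions such as the background. Perception and aggregation also share the same mixing range, so extending the context raises computational complexity significantly[1][2]\n- **Convolution**: perceives through relative positions and aggregates with fixed kernel weights. Because relation modeling depends only on relative position, it adapts poorly to different contexts and has limited expressive power[2][6][7]\n\n### Inspiration from the Human Visual System\nThe human visual system has a dynamic heteroscale vision ability that follows a two-step mechanism:\n- **Peripheral vision**: perceives with a large field of view to capture a broad overview of the scene (\"see large\")\n- **Central vision**: aggregates within a small field of view to understand specific elements in detail (\"focus small\")\n\nThis mechanism stems from the different distribution and function of the two types of photoreceptor cells in the retina: rod cells are spread widely across the periphery and handle large-field perception, while cone cells are concentrated in the fovea and handle fine focus[3].\n\n## 2. How the Module Works\n\n### Core Design of LS Convolution\nThe core of the LS Block is the LS (Large-Small) convolution, which consists of two key steps:\n\n#### Large-Kernel Perception (LKP)\n- Uses a large-kernel bottleneck block design\n- First reduces the channel dimension to C/2 with a 1×1 convolution to cut computational cost\n- Then applies a KL×KL large-kernel depthwise convolution to capture large-field spatial context efficiently\n- Finally generates context-adaptive weights W∈R^(H×W×D) for the aggregation step with a 1×1 convolution[7][8]\n\nFormulation:\n```\nwi = Pls(xi, NKL(xi)) = PW(DWKL×KL(PW(NKL(xi))))\n```\n\n#### Small-Kernel Aggregation (SKA)\n- Uses a grouped dynamic convolution design\n- Splits the feature map channels into G groups of C/G channels each, and channels in the same group share aggregation weights to reduce memory overhead\n- Reshapes the weights wi produced by LKP into w*i∈R^(G×KS×KS)\n- Uses w*i to adaptively aggregate the highly correlated KS×KS neighborhood[8]\n\nFormulation:\n```\nyic = Als(w*ig, NKS(xic)) = w*ig ⊛ NKS(xic)\n```\n\n### Full Structure of the LS Block\nThe LS Block is built on the LS convolution and contains the following components:\n- **LS convolution**: performs effective token mixing\n- **Skip connection**: eases optimization\n- **An extra depthwise convolution and SE layer**: strengthen the model by introducing more local inductive bias\n- **Feed-forward network (FFN)**: performs channel mixing[9]\n\n## 3. Problems Addressed\n\n### 3.1 Computational Efficiency\n**Problem**: the complexity of conventional self-attention rises sharply as the perception range grows\n**Solution**:\n- With the heteroscale design, large-kernel perception uses efficient depthwise convolution and small-kernel aggregation is restricted to a small region\n- The total complexity is O(HWC/4(3C + 2K²L + (2G + 4)K²S)), which is linear in the input resolution[8]\n- Experiments show that LS convolution reaches higher accuracy at lower FLOPs than other methods[17]\n\n### 3.2 Limited Expressive Power\n**Problem**: in conventional convolution the aggregation weights are fixed kernel weights and do not adapt to different contexts\n**Solution**:\n- LKP models rich spatial relations through large-kernel perception\n- SKA aggregates dynamically and adaptively based on the perception result\n- Ablations show that LS convolution improves accuracy by 1.5% over a simple combination of large and small kernels[17]\n\n### 3.3 Balancing Perception Range and Aggregation Precision\n**Problem**: existing methods struggle to achieve both broad perception and precise aggregation under a limited compute budget\n**Solution**:\n- The \"see large, focus small\" strategy: large-range perception captures global context and small-range aggregation fuses features precisely\n- Visualizations show that LS convolution combines a focus on the central region with a broad peripheral view[33]\n- Visualizations of the aggregation weights show that semantically relevant regions are reinforced accurately[35]\n\n### 3.4 Performance Bottleneck of Lightweight Networks\n**Problem**: lightweight networks struggle to obtain enough expressive power with limited compute\n**Solution**:\n- The biologically inspired design makes feature representation more efficient\n- On ImageNet-1K, LSNet-T reaches 74.9% accuracy with only 0.31G FLOPs, significantly ahead of other methods at similar cost[11]\n- It performs well on several downstream tasks, which demonstrates good transferability[12][14][15]\n\nBy combining large-kernel perception with small-kernel aggregation, the LS Block addresses the key challenges of lightweight networks in efficiency, expressive power, and perception precision, and offers a new approach to lightweight vision network design."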
  },
  {
    "path": "module-info/CVPR2025-MambaIRV2.md",
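    "content": "# Summary of the Attentive State Space Group (ASSG) Module https://arxiv.org/pdf/2411.15269\n\n## 1. Background\n\n### Problem Setting\nThe core challenges of the conventional Mamba architecture in image restoration are:\n- **Causal modeling constraint**: Mamba's state space equation is causal, so each pixel can depend only on the pixels that precede it in the scan sequence and cannot use similar pixels globally[1][2]\n- **Need for local and global modeling**: image restoration needs both local detail features and global context[9]\n- **Computational efficiency**: complexity must stay under control without sacrificing performance, especially for high-resolution images[9]\n\n### Design Motivation\nAn analysis of the mathematical connection between attention and state space models shows that modifying the output matrix C of the state space equation gives an attention-like non-causal query ability[6]. In addition, the hierarchical nature of image restoration calls for a module architecture that handles local and global information together[9].\n\n## 2. How the Module Works\n\n### Overall Architecture\nASSG uses a hierarchical processing strategy and contains several Attentive State Space Blocks (ASSB), each of which performs progressive local-to-global modeling[9]:\n\n```\nASSG = {ASSB₁, ASSB₂, ..., ASSBₙ}\n```\n\n### Internal Structure of an ASSB\nEvery ASSB follows the same template[9]:\n- A basic **Norm + Token Mixer + Norm + FFN** structure\n- **Two token mixers**:\n  - Local part: window multi-head self-attention (Window MHSA) handles local interactions\n  - Global part: the Attentive State Space Module (ASSM) handles global dependencies\n- **Residual connections**: residual connections with learnable scaling factors[9]\n\n### How the Core Components Cooperate\n1. **Window MHSA**: handles local feature interaction within each window and uses self-attention to capture fine local structure[9]\n\n2. **ASSM (Attentive State Space Module)**:\n   - Contains ASE (Attentive State-space Equation) and SGN (Semantic Guided Neighboring)\n   - Performs global modeling with a single scan in the semantic space[7][8][9]\n\n3. **Hierarchical information fusion**: stacking several ASSGs forms a feature hierarchy from shallow to deep layers[9]\n\n## 3. Problems Addressed\n\n### 3.1 Balancing Local and Global Modeling\n**Problem**: conventional methods are either limited to a local receptive field (CNNs) or too expensive (global attention)\n\n**Solution**:\n- Window MHSA handles local interactions efficiently\n- ASSM provides compute-friendly global modeling\n- The progressive design passes information effectively from local to global[9]\n\n### 3.2 Computational Efficiency\n**Problem**: multi-directional scanning causes redundant computation and poor parameter utilization\n\n**Solution**:\n- Single-scan strategy: 43% fewer parameters and 50% less computation than the conventional 4-direction scan[19]\n- Reallocated parameter budget: the saved parameters are used to strengthen local modeling (Window MHSA)[9]\n\n### 3.3 Stronger Feature Representation\n**Problem**: Mamba's causality limits its use of global image information\n\n**Solution**:\n- ASE uses a prompt-learning mechanism to perform non-causal queries, so the model can \"see\" pixels that have not been scanned yet[7][8]\n- SGN reorders pixels by semantics to ease long-range decay[9]\n- Joint local-global modeling improves the overall feature representation\n\n### 3.4 Architectural Generality\n**Problem**: a general backbone is needed that adapts to many image restoration tasks\n\n**Solution**:\n- The modular design can be configured flexibly for different tasks\n- It achieves strong results on super-resolution, denoising, JPEG compression artifact removal, and other tasks[11][13][14][16][18][19]\n- Small, Base, and Large variants cover different application needs[10]\n\n### Performance Validation\nThe experiments support the ASSG design:\n- **Ablation study**: performance drops significantly when ASSM is removed, confirming the importance of global modeling[10]\n- **Efficiency comparison**: computational complexity is significantly lower than that of methods such as HAT at comparable performance[16][17]\n- **Generalization**: it performs well across datasets and tasks, demonstrating the generality of the architecture[11][13][14][16][18][19]\n\nThrough its joint local-global design, the ASSG module overcomes Mamba's key limitations in image restoration and marks an important step for state space models in computer vision[9]."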
  },
  {
    "path": "module-info/CVPR2025-MambaOut.md",
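    "content": "# Summary of the Gated CNN Block Module https://arxiv.org/pdf/2405.07992\n\n## 1. Background\n\n### Historical Background\nThe Gated CNN block was originally proposed by Dauphin et al. in 2017 for language modeling[18]. In this paper the authors observe that **the Mamba block is in fact built on the Gated CNN block**[9][10].\n\n### Relationship to Mamba\nA side-by-side comparison shows that **the main difference between the Mamba block and the Gated CNN block is only whether an SSM (state space model) component is included**[1][9]:\n- **Gated CNN block**: `TokenMixer(Z) = Conv(Z)`[10]\n- **Mamba block**: `TokenMixer(Z) = SSM(σ(Conv(Z)))`[10]\n\nThis observation led the authors to build the MambaOut model to test whether SSMs are necessary for vision tasks[9].\n\n## 2. How the Module Works\n\n### Overall Architecture\nThe Gated CNN block follows the MetaFormer meta-architecture[9] and is expressed as:\n```\nX' = Norm(X)                                    [9]\nY = (TokenMixer(X'W₁) ⊙ σ(X'W₂))W₃ + X        [9]\n```\n\n### Core Component Design\n\n**Token mixer**[10]:\n- Uses a **7×7 depthwise convolution** as the token mixer, following ConvNeXt\n- Uses a **partial-channel convolution** strategy: only some of the channels go through the depthwise convolution, which improves practical speed\n\n**Gating mechanism**[10]:\n- The `fc1` linear layer splits the input into three parts: `g` (gate), `i` (information), and `c` (convolution)\n- The gate part `g` passes through the activation function and is multiplied with the other parts, giving selective information flow\n- Formula: `output = fc2(act(g) * cat(i, conv(c)))`\n\n### Implementation Details\nAccording to the PyTorch code in Algorithm 1[10]:\n- **Expansion ratio**: 8/3 by default\n- **Kernel size**: 7×7\n- **Grouped convolution**: uses a depthwise convolution\n- **Residual connection**: a shortcut connection keeps gradients flowing\n\n## 3. Problems Addressed\n\n### Computational Efficiency\n**Advantage of linear complexity**[4][5]:\n- Convolution is more efficient than the quadratic complexity of attention\n- It is especially suitable for tasks that do not need global information exchange\n\n### Feature Selection\n**Advantage of gating**[10]:\n- The gating unit provides **selective feature passing**\n- The model can adaptively decide which information to keep or suppress\n- It is more expressive than plain convolution\n\n### Architectural Simplification\n**Occam's razor**[14]:\n- For vision tasks that do not need complex sequence modeling, **the Gated CNN is a simpler and effective solution**\n- Experiments show that on ImageNet image classification, the MambaOut model without SSM actually performs better\n\n### Practical Use\n**Engineering advantages**[10]:\n- The code is **simple and elegant**\n- It is easier to understand and debug than the more complex SSM mechanism\n- Where long-sequence modeling is not needed, it offers a better performance-complexity trade-off\n\n## Key Insight\n\nThe success of the Gated CNN block illustrates an important principle: **architecture design should match the characteristics of the task**[2]. For tasks such as image classification, which have neither long sequences nor autoregressive characteristics, a simple gated convolutional architecture is sufficient and the extra complexity of an SSM is unnecessary[3][14].\n\nThis is a useful lesson for future model design: **not every task needs the newest and most complex architecture, and a simpler solution is sometimes more effective**."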
  },
  {
    "path": "module-info/CVPR2025-MambaVision.md",
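    "content": "# Summary of the MambaVision Mixer Module https://arxiv.org/pdf/2407.08083\n\n## 1. Background\n\n### Limitations of the Original Mamba in Vision Tasks\nThe original Mamba architecture performs well in natural language processing but faces significant challenges in computer vision[2][3]:\n\n- **Sequential processing constraint**: Mamba's autoregressive nature suits sequential data, but image pixels have no strict sequential dependency. Spatial relations are mostly local and call for parallel, integrated processing[2]\n- **Insufficient global context**: autoregressive models process data step by step, which limits their ability to capture and use global context in a single forward pass[3]\n- **Directionality of causal convolution**: the original Mamba uses causal convolution, which restricts influence to one direction. This is unnecessary and restrictive for vision tasks[8]\n\n### Shortcomings of Existing Solutions\nMethods such as Vision Mamba (Vim) propose bidirectional SSMs to recover global context, but they add significant latency because the whole sequence must be processed before a prediction is made. The added complexity can also make training harder and increase the risk of overfitting[3].\n\n## 2. How the Module Works\n\n### Core Design Idea\nThe MambaVision Mixer redesigns the original Mamba block as a **symmetric two-branch architecture**, as shown in Figure 3[8][9]:\n\n### Architecture Components\n\n#### Branch 1: Modified SSM Branch\n```\nX1 = Scan(σ(Conv(Linear(C, C/2)(Xin))))\n```\n- Replaces the original **causal convolution with a regular convolution**, removing the one-directional constraint[8]\n- Keeps the selective scan (Scan) operation for sequence modeling[9]\n- Uses the SiLU activation function[9]\n\n#### Branch 2: Symmetric Non-SSM Branch\n```\nX2 = σ(Conv(Linear(C, C/2)(Xin)))\n```\n- A purely convolutional branch **without any SSM operation**[8]\n- Uses the same convolution and SiLU activation configuration[9]\n- Acts as a compensating path for content that may be lost because of the sequential constraint of the SSM[8]\n\n#### Feature Fusion\n```\nXout = Linear(C/2, C)(Concat(X1, X2))\n```\n- The two branch outputs are **concatenated (Concat)** rather than added[9]\n- A final linear layer projects back to the original embedding dimension[9]\n- Each branch outputs C/2 channels, which keeps the parameter count close to the original design[9]\n\n### Algorithm Implementation\nThe paper provides PyTorch-style pseudocode[7] for the full forward pass, including:\n- Input projection and dimension split\n- Parallel processing in the two branches\n- The selective scan operation\n- Feature concatenation and output projection\n\n## 3. Problems Addressed\n\n### 3.1 Efficient Spatial Information Processing\n**Problem**: the causal convolution of the original Mamba restricts the bidirectional flow of spatial information[8]\n\n**Solution**: regular convolution replaces causal convolution, so features can propagate freely in all spatial directions, which suits the 2D spatial structure of images better[8]\n\n### 3.2 Compensating for Information Loss\n**Problem**: the sequential constraint of the SSM can lose important spatial information[8]\n\n**Solution**: the symmetric non-SSM branch acts as a \"safety net\": even if the SSM branch loses some information, the purely convolutional path can compensate for it[8]\n\n### 3.3 Balancing Global and Local Features\n**Problem**: sequence dependencies and spatial context both need to be captured[9]\n\n**Solution**: with the two-branch design, the final representation fuses sequential information (from the SSM branch) with spatial information (from the convolution branch) and benefits from both kinds of processing[9]\n\n### 3.4 Performance Validation\nA systematic ablation study validates the design[14]:\n\n| Configuration | ImageNet Top-1 | COCO AP_box | COCO AP_mask | ADE20K mIoU |\n|------|----------------|-------------|--------------|-------------|\n| Original Mamba (causal conv1, no conv2) | 80.9% | 44.8 | 40.2 | 44.2% |\n| Regular conv1, no conv2 | 80.9% | 45.0 | 40.8 | 44.7% |\n| conv1 + conv2, no concatenation | 81.3% | 45.3 | 41.0 | 45.7% |\n| **Full MambaVision Mixer** | **82.3%** | **46.4** | **41.8** | **46.0%** |\n\nThe final concatenation brings a significant gain: ImageNet Top-1 accuracy +1.0%, COCO box AP +1.1, mask AP +0.8, ADE20K mIoU +0.9[14].\n\nThese results support the claim that the MambaVision Mixer, through its two-branch architecture and feature concatenation, overcomes the core limitations of the original Mamba in vision tasks and delivers richer feature representations, better generalization, and improved performance on computer vision tasks[9]."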
  },
  {
    "path": "module-info/CVPR2025-MobileMamba.md",
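    "content": "# Detailed Analysis of the MobileMamba Module https://arxiv.org/pdf/2411.15941\n\n## 1. Background\n\n### Limitations of Existing Methods\n- **CNN limitations**: lightweight CNN-based models such as MobileNets mainly use local receptive fields, struggle to capture long-range dependencies, and perform poorly on high-resolution downstream tasks[1][4]\n- **Transformer complexity**: Vision Transformers have a global receptive field and long-range modeling ability, but their quadratic complexity makes high-resolution inputs expensive[1][3]\n- **Shortcomings of existing Mamba models**: state space models have linear complexity, yet current lightweight Mamba models suffer from slow inference and weak performance[3]\n\n### Design Motivation\nThe authors find that existing Mamba structures have low FLOPs but are slow in practice and perform below expectations[3]. A new framework is therefore needed that keeps Mamba's linear complexity while significantly improving inference speed and accuracy.\n\n## 2. How the Module Works\n\n### Overall Architecture\nMobileMamba uses a **three-stage network** instead of the conventional four-stage design[6]. The three-stage network reduces the input image to H/16×W/16×C1 at the first downsampling step and finally outputs H/64×W/64×C4, which lowers computation and raises inference speed compared with a four-stage network[6].\n\n### Core Module: Multi-Receptive Field Feature Interaction (MRFFI)\nThe MRFFI module is the core contribution of MobileMamba. It splits the input features along the channel dimension into three parts that are processed in parallel[7]:\n\n#### 2.1 Long-Range Wavelet Transform-Enhanced Mamba (WTE-Mamba)\n**Function**: strengthens the extraction of high-frequency edge detail on top of global modeling[7]\n\n**How it works**:\n- The first part of the input features \\[x_{IG} \\in \\mathbb{R}^{h×w×ξc}\\] passes through a bidirectional-scan Mamba module to learn global information[7]\n- In parallel, a Haar wavelet transform is applied to the same feature map to obtain representations at different frequency scales \\[x_{Iw} \\in \\mathbb{R}^{h/2×w/2×4ξc}\\][7]\n- Local convolution extracts information, and an inverse wavelet transform restores the original feature map size[7]\n\n**Formulation**:\n```\nx_{Im1} = SSM(σ(Conv(Linear(x_{IG}[:ξc]))))\nx_{Im2} = σ(Linear(x_{IG}[ξc:]))\nx_{Om} = Linear(x_{Im1} ⊗ x_{Im2})\n```\n\nWavelet transform part:\n```\nx_{Iwt} = WT(x_{Iw}) = [f_{LL}, f_{LH}, f_{HL}, f_{HH}]\nx_{Ow} = IWT(Conv(x_{Iwt}))\n```\n\nFinal output: \\[x_{OG} = x_{Om} + x_{Ow}\\][7]\n\n#### 2.2 Efficient Multi-Kernel Depthwise Convolution (MK-DeConv)\n**Function**: extracts local information with different receptive fields to enable multi-receptive-field interaction[8]\n\n**How it works**:\n- The remaining features \\[x_{IL} \\in \\mathbb{R}^{h×w×μc}\\] are split into n parts[8]\n- Each part goes through a local convolution with a different kernel size: \\[x_{OLj} = Conv(x_{ILj}), k = (2j+1), j \\in \\{1,...,n\\}\\][8]\n- The results are concatenated into the output features: \\[x_{OL} = Concat([x_{OL1},...,x_{OLn}], dim=-1)\\][8]\n\n#### 2.3 Identity Mapping to Remove Redundancy\n**Function**: reduces feature redundancy in the high-dimensional space, lowers computational complexity, and speeds up processing[8][9]\n\n**Implementation**: an identity mapping is applied to the remaining \\[(1-ξ-μ)c\\] channels to avoid unnecessary computation[9]\n\n**Final output**:\n```\nx_O = Concat(x_{OG}, x_{OL}, x_I[(1-ξ-μ)c:])\n```\n\n### Training and Testing Strategies\n- **Knowledge distillation**: soft distillation with TResNet-L as the teacher model[10]\n- **Extended training**: training is extended from 300 to 1000 epochs[10]  \n- **Normalization layer fusion**: batch normalization layers are fused at test time to speed up inference[10]\n\n## 3. Key Problems Addressed\n\n### 3.1 Inference Speed\n**Problem**: existing Mamba models have low FLOPs but slow inference in practice[3]\n**Solution**:\n- The three-stage architecture reduces computation[6]\n- Identity mapping removes redundant computation[9]\n- Normalization layer fusion improves inference efficiency[10]\n**Effect**: 21 times faster than LocalVim and 3.3 times faster than EfficientVMamba[3]\n\n### 3.2 Limited Receptive Field  \n**Problem**: a single architecture struggles to obtain both global and multi-scale local receptive fields[1]\n**Solution**:\n- WTE-Mamba provides a global receptive field and high-frequency detail extraction[7]\n- MK-DeConv provides multi-scale local receptive fields[8]\n- The wavelet transform effectively enlarges the receptive field[7]\n**Effect**: a global ERF is obtained, while multi-kernel local convolution strengthens the extraction of neighboring information[3]\n\n### 3.3 Balancing Performance and Efficiency\n**Problem**: existing methods struggle to stay efficient while maintaining high performance[3]\n**Solution**:\n- A carefully designed channel allocation (the ratios ξ and μ)[9]\n- Progressive architecture optimization[17]\n- Several training strategies working together[10]\n**Effect**: reaches 83.6% Top-1 accuracy on ImageNet-1K while keeping high inference speed[12][13]\n\n### 3.4 Adapting to High-Resolution Tasks\n**Problem**: lightweight models perform poorly on high-resolution downstream tasks[4]\n**Solution**:\n- Linear complexity keeps high-resolution processing efficient[3]\n- The multi-receptive-field design improves detail capture[7][8]\n- Task-specific pretraining strategies[32]\n**Effect**: significant gains on high-resolution tasks such as object detection, instance segmentation, and semantic segmentation[14][15][16]\n\nWith these design choices, MobileMamba addresses the key problems of existing lightweight vision models in inference speed, receptive field coverage, and the balance between performance and efficiency, and offers a new option for lightweight vision model design."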
  },
  {
    "path": "module-info/CVPR2025-Mona.md",
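    "content": "# Detailed Analysis of the Mona Module https://arxiv.org/pdf/2408.08345\n\n## 1. Background\n\n### Limitations of Traditional Adapters\n- **Origin**: existing adapter designs for computer vision mostly reuse the linear adapter structure from NLP, built from linear filters (a down projection, a nonlinear activation, an up projection, and a skip connection)[3][5]\n- **Signal differences**: visual signals differ significantly from language signals and are naturally handled with 2D convolution, whereas traditional linear adapters are not optimized for them[3][5]\n- **Single cognitive dimension**: most existing adapters compress upstream features with a single linear layer and lack multi-scale cognition[3]\n\n### The Dilemma of Delta-Tuning\n- **Performance bottleneck**: existing visual delta-tuning methods cannot exceed the ceiling of full fine-tuning on challenging tasks such as object detection and segmentation[1][3]\n- **Frozen parameters**: in adapter tuning the frozen layers cannot be fine-tuned to match the data distribution of the new task, so the feature distribution passed to the adapter is biased[5]\n\n## 2. How the Module Works\n\n### Overall Architecture\nA Mona module is inserted after the MSA (multi-head self-attention) and after the MLP (multi-layer perceptron) of every SwinTransformer block. The pretrained layers are frozen and only the parameters in Mona are updated[5].\n\n### Core Components\n\n#### 2.1 Input Optimization\n**Scaled normalization layer**:\n- Adds a LayerNorm layer and two learnable weights s1 and s2 to adjust the input distribution[5]\n- Formula: `xnorm = s1 · |x0|LN + s2 · x0`[5]\n- **Role**: lets the adapter adjust the input distribution and the proportion of input coming from the frozen layers[5]\n\n#### 2.2 Multi-Cognitive Visual Filters\n**Multi-scale convolution structure**:\n- Uses three depthwise convolutions (DWConv) of different sizes: 3×3, 5×5, and 7×7[6]\n- **Design inspiration**: mimics the way the human eye processes visual signals at different scales and integrates them for better understanding[5][6]\n- **Parameter efficiency**: depthwise convolutions are used instead of standard convolutions to minimize the number of extra parameters[6]\n\n**Feature aggregation**:\n- Averages the outputs of the three filters[6]\n- Aggregates the features with a 1×1 convolution[6]\n- Formula: `fdw = x + avg(∑³ᵢ₌₁ ωⁱdw ⊗̂ x)`[6]\n\n#### 2.3 Skip Connections\n- Skip connections are added to both types of convolution to strengthen adaptation[6]\n- Pointwise convolution step: `fpw = x + ωpw ⊗ x`[6]\n\n#### 2.4 Full Computation\nThe whole Mona computation can be written as:\n`x = x0 + Ulσ(fpw(fdw(Dl(xnorm))))`[6]\nwhere Dl and Ul are the down projection and up projection of the l-th adapter and σ is the GeLU activation[6].\n\n### Parameter Analysis\nWith the symbols m and n used in the source formulas, the parameters of each Mona module are:\n- LayerNorm and scaling factors: 2m + 2\n- Two linear layers: 2mn + m + n  \n- DWConv layers: 83n (from 3² + 5² + 7² = 83)\n- Pointwise convolution: n²\n- **Total**: `(2n + 3)m + n² + 84n + 2`[7]\n\n## 3. Key Problems Addressed\n\n### 3.1 Mismatch with Visual Signal Processing\n**Problem**: traditional linear adapters were designed mainly for language signals and do not suit visual signals with 2D spatial structure[3][5]\n**Solution**:\n- Vision-friendly convolutional filters replace the linear filters[5]\n- Experiments show that convolutional filters transfer visual knowledge from the pretrained model to other tasks more effectively[3]\n\n### 3.2 Biased Input Distribution\n**Problem**: the frozen layers cannot be fine-tuned to match the data distribution of the new task, so the feature distribution passed to the adapter is biased[5]\n**Solution**:\n- The scaled normalization layer adjusts the input feature distribution[5]\n- LayerNorm helps stabilize the forward input distribution and the backpropagated gradients[5]\n\n### 3.3 Single Cognitive Dimension\n**Problem**: existing adapters mainly rely on a single linear layer to compress upstream features, which limits their cognitive ability[3]\n**Solution**:\n- Multi-scale convolutional filters process the upstream features from several cognitive perspectives[6]\n- This mimics the multi-scale cognition of the human visual system[5][6]\n\n### 3.4 Breaking the Performance Ceiling\n**Problem**: existing delta-tuning methods cannot surpass full fine-tuning on visual recognition tasks[1][3]\n**Solution**:\n- Mona is the first adapter method to surpass full fine-tuning on multiple vision tasks[3]\n- On COCO it improves mAP by 1% over full fine-tuning, which shows that the adapter-tuning paradigm can replace full fine-tuning[1][8]\n\nWith these design choices, the Mona module raises the performance of adapter tuning and provides a better option for efficient transfer learning on vision tasks[3][10]."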
  },
  {
    "path": "module-info/CVPR2025-OverLoCK.md",
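    "content": "# The OverLoCK Network Modules Explained https://arxiv.org/pdf/2502.20087\n\n## 1. BasicBlock Module\n\n### Background\nBasicBlock is the basic building block of the Base-Net and the Overview-Net in OverLoCK. These two sub-networks mainly encode low-level and mid-level features and quickly produce a coarse global context, so they need a relatively simple but effective module[6][7].\n\n### How the Module Works\nBasicBlock uses the following pipeline[7]:\n1. **Residual 3×3 depthwise convolution**: performs local perception on the input features first\n2. **Core processing block**:\n   - Layer Normalization: normalizes the features\n   - Dilated RepConv layer: a dilated re-parameterized convolution that strengthens feature representation\n   - SE Layer: channel attention that emphasizes important feature channels\n   - ConvFFN: a convolutional feed-forward network that processes the features further\n\n### Problems Addressed\n- **Efficient feature encoding**: the compact design encodes low-level and mid-level features quickly\n- **Controlled computational complexity**: gives the Base-Net and Overview-Net lightweight yet effective feature extraction\n- **Stronger local features**: the SE mechanism and dilated convolution strengthen local feature representation\n\n## 2. DynamicBlock Module\n\n### Background\nDynamicBlock is the core building block of the Focus-Net and must perform finer perception under top-down context guidance. Because the Focus-Net is responsible for \"looking closely\", it needs a more complex and powerful module to handle fine-grained features[7].\n\n### How the Module Works\nDynamicBlock contains the following key components[7]:\n1. **Residual 3×3 depthwise convolution**: basic local feature extraction\n2. **Gated Dynamic Spatial Aggregator (GDSA)**: the core dynamic feature processing module\n3. **ConvFFN**: a convolutional feed-forward network for final feature processing\n\n**Context flow mechanism**[7][8]:\n- The context prior Pi and the feature map Zi are fused by concatenation\n- Inside the block, guidance is applied at both the feature level and the weight level\n- The updated context prior and feature map are split apart again at the output\n\n### Problems Addressed\n- **Dynamic feature processing**: GDSA performs context-based dynamic feature aggregation\n- **Top-down guidance**: makes effective use of the context prior provided by the Overview-Net\n- **Fine-grained perception**: extracts fine-grained features more accurately under global context guidance\n\n## 3. GDSA (Gated Dynamic Spatial Aggregator) Module\n\n### Background\nGDSA is the core component of DynamicBlock and is designed for context-guided dynamic feature aggregation. Conventional static convolution cannot adapt to the input content, and GDSA addresses this by introducing a dynamic mechanism and a gating mechanism[7].\n\n### How the Module Works\nGDSA processes its input as follows[7]:\n\n1. **Context fusion**:\n   - Concatenate the context prior Pi and the feature map Zi\n   - Process the fused features with a 1×1 convolution followed by SiLU activation\n\n2. **Dynamic convolution**:\n   - Use ContMix (context-mixing dynamic convolution) as the core token mixer\n   - Compute the dynamic convolution kernel weights from the context prior Pi\n   - This gives weight-level context guidance\n\n3. **Gating mechanism**:\n   - Compute a dynamic gating signal to modulate the feature map\n   - Apply feature-level guidance through element-wise multiplication\n   - Remove context noise and strengthen useful information\n\n4. **Fusion of the parallel branch**:\n   - Multiply the gating signal element-wise with the output of the parallel branch\n   - This gives adaptive feature selection and enhancement\n\n### Problems Addressed\n\n1. **Long-range dependency modeling**:\n   - ContMix lets a fixed-size convolution kernel capture global information\n   - This overcomes the limited receptive field of conventional convolution[3][4]\n\n2. **Context noise filtering**:\n   - The gating mechanism effectively filters out irrelevant context information\n   - It strengthens useful semantic guidance signals\n\n3. **Adaptive feature aggregation**:\n   - The feature processing strategy adjusts dynamically to the input content\n   - This gives content-aware feature enhancement\n\n4. **Preserving inductive bias**:\n   - The local inductive bias of convolution is preserved while global modeling ability is gained\n   - Global and local feature representation are balanced\n\n## How the Modules Work Together\n\nThe three modules cooperate inside the OverLoCK architecture to implement the biomimetic \"overview first, look closely next\" vision mechanism:\n\n- **BasicBlock**: quickly encodes basic features and global context in the Base-Net and Overview-Net\n- **DynamicBlock + GDSA**: use context guidance in the Focus-Net for fine-grained feature processing\n- **Overall cooperation**: the context flow mechanism provides top-down semantic guidance and significantly improves the feature representation of the network[7][8]"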
  },
  {
    "path": "module-info/CVPR2025-SCSegamba.md",
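    "content": "# Detailed Summary of the SAVSS Module https://arxiv.org/pdf/2503.01113\n\n## 1. Background\n\n### Limitations of Existing Methods\nThe main challenges facing current crack segmentation methods include[1][2][3]:\n\n**Limits of CNN methods**:\n- CNNs such as ECSNet and SFIAN have strong local inductive properties, but their limited receptive field restricts their ability to model broad, irregular dependencies across the whole image[1]\n- This leads to discontinuous segmentation and weak suppression of background noise[1]\n- Even though dilated convolution enlarges the receptive field, its inherent inductive bias still cannot fully handle heavy background interference in complex crack patterns[1]\n\n**Limits of Transformer methods**:\n- Vision Transformers capture irregular pixel dependencies well, but the quadratic complexity of attention causes high memory use and training difficulties[2]\n- This limits deployment and practical use on resource-constrained edge devices[2][3]\n\n**Shortcomings of existing Mamba methods**:\n- Most Mamba methods process feature maps with linear layers, which limits their ability to selectively enhance or suppress crack features[3]\n- Common parallel or unidirectional diagonal scans struggle to maintain semantic continuity on irregular, multi-directional pixel topologies[3]\n- They often produce false detections or misses on multi-scenario crack images[3]\n\n## 2. How the Module Works\n\n### Overall Architecture\nThe SAVSS (Structure-Aware Visual State Space) module is the core component of SCSegamba and contains two key designs[5][6]:\n\n### 2.1 Gated Bottleneck Convolution (GBC)\n\n**Low-rank approximation**:\nGBC uses a bottleneck convolution structure to reduce parameters and computation significantly[7]. Assume the convolution response is:\n```\nz = Qs + c\n```\nwhere Q is a matrix of size f×(p²×d). With a low-rank approximation it is written as:\n```\nz = LM^T s + c'\n```\nThe computational complexity drops from O(fp²d) to O(f₀p²d) + O(ff₀)[7].\n\n**Gating mechanism**:\nThe input feature x is processed as follows[7]:\n1. Keep the residual connection: `x_residual = x`\n2. Generate the gating feature: `g1(x) = ReLU(Norm1(f1(x)))`\n3. Main branch: `x1 = ReLU(Norm2(BottConv2(g1(x))))`\n4. Gating branch: `g2(x) = ReLU(Norm3(BottConv3(x)))`\n5. Hadamard product fusion: `m(x) = x1 ⊙ g2(x)`\n6. Final output: `Output = ReLU(Norm4(BottConv4(m(x)))) + x_residual`\n\n### 2.2 Structure-Aware Scanning Strategy (SASS)\n\n**Four-path design**:\nSASS contains four scan paths[8]:\n- Two parallel snake-like paths\n- Two diagonal snake-like paths\n\n**Scan equations**:\nThe processing equations are[8]:\n```\nP = e^(ΔP)\nQ = (ΔP)^(-1)(e^(ΔP) - I) · ΔQ\nz_k = Pz_(k-1) + Qw_k\nu_k = Rz_k + Sw_k\n```\n\nwhere:\n- w ∈ R^(t×D) is the input\n- P ∈ R^(G×D) controls the hidden spatial state\n- z_k is the hidden state at time step k\n- u_k is the output at time step k\n\n**Pixel Attention-oriented Fusion (PAF)**:\nTo combine the initial sequence x effectively with the sequence processed by SS2D, PAF is integrated to strengthen SAVSS in capturing crack shape and texture detail[9].\n\n## 3. Key Problems Addressed\n\n### 3.1 Capturing Crack Morphology\n**Problem**: conventional methods struggle to model the morphological information and texture features of cracks[1]\n\n**Solution**:\n- GBC adjusts weights dynamically through gating, which makes the model more adaptable to diverse crack patterns and complex backgrounds[7]\n- The bottleneck convolution design preserves the basic crack features while dynamically refining the fine-grained representation of the main branch[7]\n\n### 3.2 Preserving Semantic Continuity\n**Problem**: existing scanning strategies struggle to maintain semantic continuity on irregular, multi-directional crack topologies[3][8]\n\n**Solution**:\n- The four-path design of SASS extracts continuous semantic information from regular crack regions effectively[8]\n- It also maintains texture continuity in multiple directions, which suits multi-scenario crack images with complex backgrounds[8]\n- Experiments show that SASS improves F1 and mIoU by 0.30% and 0.33% over other scanning strategies[17]\n\n### 3.3 Balancing Computational Efficiency and Performance\n**Problem**: existing methods struggle to keep segmentation quality high while keeping compute requirements low[3]\n\n**Solution**:\n- The low-rank approximation lowers computational complexity significantly, with only 2.80M parameters[14]\n- A four-layer SAVSS design gives the best balance between performance and compute requirements[21]\n- Ablations show that the full SAVSS configuration reaches an F1 of 0.8390 and an mIoU of 0.8479[16]\n\n### 3.4 Adapting to Complex Scenes\n**Problem**: segmentation is poor under complex interference such as heavy noise and low contrast[3][15]\n\n**Solution**:\n- SASS builds multi-directional adjacency relations, so the hidden state z_k can capture more complex topology and texture detail[8]\n- It performs well on complex crack topologies in plastic running tracks, noisy backgrounds in metal materials, and low-contrast scenes in underground pipes[15]\n- It suppresses irrelevant noise effectively and produces high-quality segmentation maps[15]\n\nWith these design choices, the SAVSS module addresses the key technical challenges of crack segmentation and provides an efficient, practical solution for real-world use."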
  },
  {
    "path": "module-info/CVPR2025-Transformers without Normalization.md",
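    "content": "# Detailed Summary of the DyT (Dynamic Tanh) Module https://arxiv.org/pdf/2503.10622\n\n## 1. Background\n\n### Ubiquity and Importance of Normalization Layers\n- **Historical role**: since Batch Normalization was introduced in 2015, normalization layers have become one of the most fundamental components of modern neural networks[1]\n- **Wide use**: Layer Normalization (LN) is used throughout Transformer architectures, and almost all modern networks contain normalization layers[1][3]\n- **Conventional belief**: normalization layers are considered **indispensable** for training deep networks effectively. This belief is so entrenched that recent architectures often replace attention or convolution layers but almost always keep the normalization layers[1]\n\n### Motivation\nBy analyzing trained networks, the authors made a key observation: **the input-output mapping of LN layers follows an S-shaped curve resembling tanh**[5]. This observation inspired the design of DyT.\n\n## 2. How the Module Works\n\n### Core Design Idea\nDyT is based on the observed behavior of normalization layers:\n- **S-shaped mapping**: LN layers produce tanh-like S-shaped input-output curves[5]\n- **Two effects**: LN layers both scale the input activations and squash extreme values[1]\n- **Nonlinearity**: extreme values are squashed nonlinearly, while values near the center are transformed almost linearly[5][6]\n\n### Definition\n```\nDyT(x) = γ * tanh(αx) + β\n```\nwhere:\n- **α**: a learnable scalar that adjusts the scaling dynamically according to the input range[7]\n- **γ**: a learnable per-channel vector for scaling[7]\n- **β**: a learnable per-channel vector for shifting[7]\n- **tanh**: provides the bounded S-shaped squashing[7]\n\n### Implementation Characteristics\n- **Drop-in replacement**: it can replace the normalization layers in existing architectures directly, with no changes to other components[2][7]\n- **No statistics**: unlike normalization layers, DyT does not need to compute activation statistics[1]\n- **Element-wise**: it operates on each element of the input tensor independently[7]\n\n### Parameter Initialization\n- **γ**: initialized to an all-ones vector[7]\n- **β**: initialized to an all-zeros vector[7]\n- **α**: initialized to 0.5 by default (except for LLM training)[7]\n\n## 3. Problems Addressed\n\n### Core Problems\n\n#### 3.1 Challenging the Conventional Belief\n- **Removing the dependency**: shows that Transformers can be trained stably **without normalization layers** and reach the same or better performance[1][21]\n- **Conceptual advance**: challenges the conventional view that \"normalization layers are indispensable for training modern neural networks\"[1]\n\n#### 3.2 Computational Efficiency\n- **Faster layers**: on LLaMA 7B, the time spent in the layers themselves drops by 52.4% at inference and by 42.2% in training when the normalization layers are replaced with DyT[12]\n- **Simpler computation**: avoids computing the statistics (mean and variance) that normalization layers require[1]\n\n#### 3.3 Architectural Simplification\n- **Simple to implement**: an extremely simple alternative that needs only a tanh function and a few learnable parameters[7]\n- **Easy to integrate**: it can replace the normalization layers in existing architectures directly without retuning training hyperparameters[2][7]\n\n#### 3.4 Preserving Performance\nExtensive experiments show that DyT matches or exceeds the original performance in several domains:\n- **Vision**: supervised learning, self-supervised learning, and diffusion models[8][9]\n- **Language models**: the LLaMA family[10]\n- **Speech**: the wav2vec 2.0 model[10][11]\n- **Biological sequences**: DNA sequence modeling[11]\n\n#### 3.5 Training Stability\n- **Stable training**: the bounded tanh function and the dynamically adjusted α keep training stable[12]\n- **Handling extremes**: squashes extreme activations effectively, preventing exploding or vanishing gradients[5][6]\n\n### Conceptual Contributions\n- **Understanding the mechanism**: offers a new perspective on how normalization layers work[21]\n- **Design guidance**: gives efficiency-oriented network design a new option[12]\n- **Research direction**: opens a new line of research on training neural networks without normalization[21]\n\nDyT is both a practical technique and a fundamental re-examination of the role of normalization layers in deep learning, and it suggests new possibilities for future network architecture design."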
  },
  {
    "path": "module-info/CVPR2025-vHeat.md",
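    "content": "# Summary of the vHeat Module https://arxiv.org/pdf/2405.16555\n\n## 1. Background\n\n### Limitations of Existing Vision Models\n- **Limits of CNNs**: convolutional networks rely on local receptive fields and fixed convolution operators, which constrains their ability to capture long-range and complex dependencies[1]\n- **Computational bottleneck of ViTs**: Vision Transformers based on self-attention benefit from global feature dependencies but face O(N²) complexity, which is very expensive for high-resolution images[5]\n- **Efficiency versus performance**: existing improvements such as window attention and linear attention gain efficiency but often at the cost of receptive field or nonlinearity[5]\n\n### Physics-Inspired Motivation\nThe authors draw on physical heat conduction. They observe that the role of spatial locality in heat transfer resembles the way visual semantics propagate across the spatial domain: neighboring image regions at a given scale tend to contain related information or share similar features[1].\n\n## 2. How the Module Works\n\n### The Physical Heat Equation\nvHeat is based on the classical physical heat equation in two-dimensional space[6]:\n```\n∂u/∂t = k(∂²u/∂x² + ∂²u/∂y²)\n```\nwhere:\n- u(x,y,t) is the temperature at position (x,y) at time t\n- k > 0 is the thermal diffusivity, which measures the rate of heat transfer in the material\n\n### Design of the Heat Conduction Operator (HCO)\n\n#### Core Implementation\nThe 2D temperature distribution u(x,y,t) is extended to multi-channel image features U(x,y,c,t), and the discrete implementation of HCO is[8]:\n```\nU^t = IDCT2D(DCT2D(U^0) × e^(-k(ωx²+ωy²)t))\n```\n\n#### Key Components\n1. **DCT2D/IDCT2D transforms**: the 2D discrete cosine transform replaces the Fourier transform. It is based on the Neumann boundary condition assumption and fits the rectangular boundaries of visual data[8]\n\n2. **Adaptive thermal diffusivity**:\n   - The thermal diffusivity k is predicted from frequency value embeddings (FVEs)[9]\n   - FVEs are similar to the absolute position embeddings in ViT but work in the frequency domain[9]\n   - This lets k adapt to the image content, giving non-uniform visual heat conduction[9]\n\n3. **Frequency-domain filtering**:\n   - The coefficient matrix e^(-k(ωx²+ωy²)t) acts as an adaptive filter in the frequency domain[10]\n   - Different frequency values correspond to different image patterns (high frequencies to edges and textures, low frequencies to flat regions)[10]\n\n### Integration into the Network\n- **Hierarchical design**: a 4-stage hierarchical architecture whose resolution decreases gradually from H/4×W/4 to H/32×W/32[7]\n- **Heat conduction layer**: similar to a ViT block, but HCO replaces the self-attention operator and the feed-forward network is kept[9]\n- **Depthwise convolution**: 3×3 depthwise convolution layers are added for feature extraction[9]\n\n## 3. Problems Addressed\n\n### Computational Complexity\n- **Lower complexity**: from the O(N²) of self-attention down to O(N^1.5), a large gain in efficiency[1][3]\n- **High-resolution advantage**: when the input resolution increases to 768×768, vHeat achieves 3 times the throughput of Swin-B, 80% less GPU memory, and 35% fewer FLOPs[3]\n\n### Unifying a Global Receptive Field with Efficiency\n- **Global perception**: through the frequency-domain operation, every DCT element contains information from all patches of the image, which gives a global receptive field[3]\n- **Efficient parallelism**: the DCT and IDCT operations are highly parallel, which improves training and test efficiency[3]\n\n### Interpretability\n- **Physical basis**: it rests on the interpretable principle of physical heat conduction and carries more physical meaning than self-attention based on token similarity[10]\n- **Intuition**: the temperature U(x,y,c,t) corresponds to visual features and the heat conduction process models information propagation, which gives a clear physical interpretation[10]\n\n### Performance Gains\nvHeat improves performance on several vision tasks[11][12][13]:\n- **Image classification**: vHeat-B reaches 84.0% accuracy on ImageNet-1K, 0.5% above Swin-B\n- **Object detection**: consistently outperforms baseline models on COCO\n- **Semantic segmentation**: achieves higher mIoU on ADE20K\n- **Generalization**: performs well in robustness evaluations and on low-level vision tasks[13][14]\n\n### Adaptive Feature Representation\nThe predicted thermal diffusivity k gives adaptive visual heat conduction: the information propagation pattern adjusts dynamically to the image content, which is more flexible and effective than methods with fixed parameters[15]."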
  },
  {
    "path": "module-info/ICLR2025-Pola.md",
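    "content": "# Summary of the Pola Module in PolaFormer https://arxiv.org/pdf/2501.15061\n\n## 1. Background\n\n### Limitations of Standard Attention\nThe standard Transformer self-attention mechanism has quadratic complexity O(N²), which is very expensive for long sequences or high-resolution images[1]. To address this, linear attention methods use kernelized feature maps to reduce the complexity to O(Nd²)[2].\n\n### Shortcomings of Existing Linear Attention\nExisting linear attention methods have two key problems[2]:\n1. **Severe information loss**: with non-negative feature maps such as ReLU or ELU+1, only positive-positive interactions are kept, and negative-negative and positive-negative interactions are discarded entirely\n2. **Overly uniform attention**: without the exponential scaling of softmax, the attention weights are spread evenly and the entropy is too high, so important and unimportant query-key pairs cannot be separated effectively\n\nAs shown in Figure 1, the attention maps produced by conventional linear attention are too uniform, whereas PolaFormer produces sharp attention distributions closer to those of softmax[1].\n\n## 2. How the Module Works\n\n### 2.1 Polarity-Aware Decomposition\nThe core of the Pola module is to decompose the query vector q and the key vector k by polarity[7]:\n\n```\nq = q⁺ - q⁻\nk = k⁺ - k⁻\n```\n\nwhere:\n- q⁺ᵢ = max(qᵢ, 0), q⁻ᵢ = max(-qᵢ, 0)\n- k⁺ᵢ = max(kᵢ, 0), k⁻ᵢ = max(-kᵢ, 0)\n\n### 2.2 Modeling All Interactions\nThe original query-key inner product decomposes into four types of interaction[7]:\n\n```\n⟨q, k⟩ = ⟨q⁺, k⁺⟩ + ⟨q⁻, k⁻⟩ - ⟨q⁺, k⁻⟩ - ⟨q⁻, k⁺⟩\n        └─────same-sign─────┘   └───opposite-sign───┘\n```\n\nConventional linear attention keeps only the first term, whereas the Pola module handles all four interactions explicitly.\n\n### 2.3 Learnable Polarity Mixing\nTo avoid the instability caused by direct subtraction, the Pola module uses a learnable mixing strategy[7]:\n\n1. **Splitting the value vector**: the value vector v is split in half along the channel dimension: v = [vₛ; vₒ]\n2. **Separate streams**:\n   - Same-sign stream: handles the ⟨q⁺, k⁺⟩ + ⟨q⁻, k⁻⟩ interactions and uses vₛ\n   - Opposite-sign stream: handles the ⟨q⁺, k⁻⟩ + ⟨q⁻, k⁺⟩ interactions and uses vₒ\n3. **Coefficient adjustment**: learnable matrices Gₛ and Gₒ adjust the contribution of each stream\n\n### 2.4 Entropy-Reducing Power Function\nBased on a theoretical analysis, the Pola module uses a learnable power function to lower the attention entropy[9]:\n\n```\np = 1 + α sigmoid(w₁, ..., wₐ)\ng(x; p) = (x₁^p₁, ..., xₐ^pₐ)\n```\n\n**Theoretical guarantee**: Theorem 1 proves that a function g with positive first and second derivatives reduces the positive sequence entropy (PSE)[9][26].\n\n## 3. Problems Addressed\n\n### 3.1 Information Completeness\n**Problem**: conventional linear attention loses the interactions that involve negative values, which limits its expressive power[2]\n\n**Solution**:\n- The polarity decomposition models all four query-key interaction types explicitly[7]\n- Experiments show that the polarity coefficients Gₛ and Gₒ learn a clear negative correlation, which demonstrates their complementarity[8]\n\n### 3.2 Attention Sharpness  \n**Problem**: linear attention weights are too uniform and the entropy is high, so the model cannot focus on important information[2]\n\n**Solution**:\n- A learnable power function, backed by theory, lowers the attention entropy effectively[9]\n- Visualizations show that the attention entropy of PolaFormer (H=2.30/2.45) is significantly lower than that of conventional linear attention (H=3.72)[31]\n\n### 3.3 Computational Efficiency\n**Problem**: improving performance while keeping linear complexity\n\n**Solution**:\n- The total complexity remains O(Nd²), so linearity is preserved[10]\n- Inference is 1.15×-1.32× faster[12]\n- Performance on ImageNet-1K improves by 2.4%-3.7% over the baselines[11][17]\n\n### 3.4 Low-Rank Degeneration\n**Problem**: the inherent low-rank property of the softmax matrix can lead to degenerate solutions[8]\n\n**Solution**:\n- Techniques such as depthwise convolution (DWC) are introduced to raise the matrix rank[8][14]\n- Ablations show that DWC works better than deformable convolution[14]\n\nWith these design choices, the Pola module significantly improves the expressive power and performance of linear attention while keeping linear complexity."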
  },
  {
    "path": "module-info/ICLR2025-ToST.md",
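    "content": "# Summary of the Token Statistics Self-Attention (TSSA) Module https://arxiv.org/pdf/2412.17810\n\n## 1. Background\n\n### Challenges of Conventional Attention\nThe self-attention mechanism of the standard Transformer has a significant computational bottleneck:\n- **Quadratic complexity**: similarities must be computed between all pairs of tokens, so compute and memory grow quadratically with the number of tokens [1]\n- **Dependence on pairwise similarity**: the core operation is scaled dot product attention, which uses the \"key\" and \"query\" parameter matrices to compute scaled dot-product similarities between token pairs [1]\n- **Heavy computation**: this design is very costly for long sequences and is a major obstacle to scaling [1][2]\n\n### Limits of Existing Solutions\nExisting efficient attention methods mainly:\n- Partition the tokens into blocks [2]\n- Use sliding window attention [2]  \n- Find a suitable low-rank projection [2]\n- Approximate the computation with the Nyström extension [2]\n\nAll of these still rely on, or approximate, pairwise similarity computation and do not break away from the design paradigm of conventional attention [2].\n\n### Theoretical Motivation\nSelf-attention is essentially a form of kernel regression: it takes a weighted average of \"similar\" input tokens under a learned similarity measure [2]. This suggests a more abstract view in which attention is a special case of a more general class of operators that produce outputs from statistics of the input tokens [2].\n\n## 2. How the Module Works\n\n### Mathematical Framework\n\n#### Variational Form of MCR2\nTSSA is based on a new variational form of the maximal coding rate reduction (MCR2) objective. The authors prove Theorem 1: for a concave function f, the following upper bound holds:\n```\nF(M) ≤ Σf((Q^T MQ)_ii)\n```\nThis allows the spectral function of a large matrix to be upper-bounded by scalar functions of the diagonal entries of a matrix product [7][8].\n\n#### Variational Objective\nBased on this result, the variational compression objective is:\n```\nR^var_c,f(Z,Π|{U_k}) = (1/2)Σ(n_k/n)Σf((1/n_k)(U_k^T Z Diag(π_k) Z^T U_k)_ii)\n```\nwhere U_k are orthogonal matrices and π_k is the group membership assignment vector [8].\n\n#### The TSSA Operator\nGradient descent on the variational objective gives the core TSSA update:\n```\nz_j^+ = z_j - (τ/n)Σ Π_jk U_k D(Z,π_k|U_k) U_k^T z_j\n```\n\nwhere:\n- **Π_jk**: the probability that token j belongs to group k\n- **U_k**: the projection matrix of the k-th attention head  \n- **D(Z,π_k|U_k)**: a diagonal matrix based on second-moment statistics [9][10]\n\n### Interpreting the Operation\n\n#### Computing the Statistics\nThe core of TSSA is the second-moment statistic of the projected token features:\n```\n(U_k^T Z)⊙2 π_k/⟨π_k,1⟩\n```\nThis estimates the second moment of U_k^T Z under the distribution π_k/⟨π_k,1⟩ [10].\n\n#### Data-Dependent Projection\nTSSA performs an approximate low-rank data-dependent projection [I - (τ/n)U_k D_k U_k^T]:\n- **High-power directions**: directions with a large second moment are preserved (the corresponding entries of D_k are close to 0)\n- **Low-power directions**: directions with a small second moment are suppressed (the corresponding entries of D_k are large)[10][11]\n\n#### Group Membership Assignment\nGroup membership is estimated as a posterior probability under a Gaussian mixture model:\n```\nΠ_jk ∝ exp((1/2η)||U_k^T z_j||_2^2)\n```\nwhere η is a learnable temperature parameter [12][13].\n\n### Implementation Details\n\n#### Complexity Advantage\n- **Time complexity**: O(pn), where p is the projection dimension and n is the number of tokens\n- **Space complexity**: O(p)\n- This is a significant improvement over the O(pn²) time and O(n²) space of conventional attention [13]\n\n#### Practical Adjustments\n1. **Relaxed orthogonality**: the orthogonality constraint on the U matrices is not strictly enforced in practice\n2. **L2 normalization**: the projected tokens are L2-normalized to stabilize training\n3. **Learnable parameters**: the constant coefficients in the theory are absorbed into learnable parameters [29][30]\n\n## 3. Problems Addressed\n\n### Computational Efficiency\n**Problem**: the O(n²) complexity of conventional self-attention is a bottleneck for long sequences\n**Solution**: TSSA has O(n) linear complexity, which improves efficiency significantly. Experiments show that for 10k tokens ToST is about 10 times faster than ViT and uses about 100 times less memory [1][35]\n\n### Memory Footprint  \n**Problem**: conventional attention must store an n×n attention matrix, so memory grows quadratically with sequence length\n**Solution**: TSSA only needs to store O(p) statistics, which greatly reduces memory use [13]\n\n### Scalability\n**Problem**: conventional Transformers scale poorly to long sequences\n**Solution**: linear complexity lets ToST handle long-sequence tasks efficiently. On the Long-Range Arena benchmark, ToST performs best among Transformer-type models [18]\n\n### Theoretical Understanding\n**Problem**: conventional attention lacks a clear mathematical explanation and interpretability\n**Solution**: TSSA has an explicit derivation from MCR2 theory, and every layer has a clear optimization objective. Visualization experiments confirm that the model optimizes the design objective layer by layer [16]\n\n### Design Paradigm\n**Problem**: pairwise similarity computation is conventionally regarded as essential to the success of Transformers\n**Solution**: TSSA shows that an attention mechanism without pairwise similarities is also effective, which challenges the conventional design paradigm. Experiments show that ToST matches conventional Transformers on several tasks [3][17]\n\n### Semantic Understanding\n**Problem**: conventional attention needs complex training strategies for semantic clustering and segmentation\n**Solution**: TSSA learns semantic clusters automatically through its statistics-driven grouping mechanism, without extra supervision. Visualizations show that ToST performs meaningful foreground segmentation automatically [16][17]\n\nOverall, by rethinking attention from a statistical point of view, TSSA addresses computational efficiency, provides a better theoretical foundation and interpretability, and opens a new direction for Transformer architectures."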
  },
  {
    "path": "module-info/TPAMI2025-HyperYOLO.md",
    "content": "# Mixed Aggregation Network (MANet) 模块总结\n\n## 1. 背景\n传统YOLO系列方法的骨干网络主要依赖单一的基础模块进行特征提取，如YOLOv8中的C2f模块。这种单一结构限制了信息流的多样性和特征提取能力[7]。为了增强骨干网络的特征辨别能力，需要设计更加丰富和多样化的特征聚合机制来提升基础网络的特征提取能力。\n\n## 2. 模块原理\nMANet通过协同融合三种典型的卷积变体来实现混合聚合[7]：\n\n### 核心组件\n- **1×1旁路卷积**：用于通道级特征重校准\n- **深度可分离卷积（DSConv）**：用于高效的空间特征处理  \n- **C2f模块**：用于增强特征层次集成\n\n### 计算流程[8]\n```\nXmid = Conv1(Xin)  // 输入通道扩展到2c\nX1 = Conv2(Xmid)   // 1×1卷积分支\nX2 = DSConv(Conv3(Xmid))  // 深度可分离卷积分支\nX3, X4 = Split(Xmid)  // 分割用于C2f处理\n// C2f模块的迭代处理\nX5 = ConvNeck1(X4) + X4\nX6 = ConvNeck2(X5) + X5\n...\nXout = Convo(X1||X2||...||X4+n)  // 特征融合和压缩\n```\n\n### 配置优化\n通过消融实验确定最优的卷积核尺寸配置[k2, k3, k4, k5] = [3, 5, 5, 3]，在性能和参数数量之间取得平衡[16]。\n\n## 3. 解决的问题\n\n### 信息流多样性不足\n- **问题**：单一的C2f模块限制了梯度流的丰富性和多样性\n- **解决**：通过三种不同的卷积结构产生更加多样化和丰富的梯度流，显著放大了基础特征在五个关键阶段中封装的语义深度[7]\n\n### 特征提取能力受限\n- **问题**：传统单一模块无法充分利用不同类型的特征表示\n- **解决**：混合聚合机制整合了三种经典结构，实现更丰富的信息流动。实验显示，在相同颈部网络下，MANet比C2f模块在所有指标上都表现更优，APval提升1.5个百分点[16]\n\n---\n\n# HyperC2Net 模块总结\n\n## 1. 背景\n传统YOLO模型的颈部设计存在显著局限性[2]：\n- **PANet局限**：主要局限于相邻层之间的特征融合，无法充分解决跨层级特征集成问题\n- **Gold-YOLO不足**：虽然促进了层间信息交换，但仍无法实现特征图内的跨位置交互\n- **高阶相关性缺失**：未能充分探索特征相互关系的潜力，特别是涉及高阶相关性的复杂非线性关系[2]\n\n## 2. 模块原理\n\n### HGC-SCS框架实现\nHyperC2Net是HGC-SCS框架的具体实例化，包含三个核心阶段[10]：\n\n#### 语义收集阶段\n```\nXmixed = B1||B2||B3||B4||B5\n```\n将来自骨干网络五个阶段的特征图{B1, B2, B3, B4, B5}进行通道级连接，合成跨层级视觉特征[9]。\n\n#### 超图构建与计算\n- **顶点构建**：将网格化的视觉特征解构为超图的顶点集合V\n- **超边构建**：使用距离阈值构建ε-球作为超边[9]\n  ```\n  E = {ball(v, ε) | v ∈ V}\n  ball(v, ε) = {u | ||xu - xv||d < ε, u ∈ V}\n  ```\n- **超图卷积**：采用空间域超图卷积进行高阶消息传递[10]\n  ```\n  HyperConv(X, H) = X + D⁻¹ᵥHD⁻¹ₑH^T XΘ\n  ```\n\n#### 语义散射阶段\n```\nN3, N4, N5 = ϕ(Xhyper, B3), ϕ(Xhyper, B4), ϕ(Xhyper, B5)\n```\n将高阶结构信息分散到最终的三个检测尺度[10]。\n\n### 关键技术特点\n- **五尺度融合**：操作跨越五个尺度，突破传统网格结构限制\n- **跨层级跨位置**：允许不同层级和位置之间的复杂高阶交互[3]\n\n## 3. 
解决的问题\n\n### 跨层级特征融合限制\n- **问题**：PANet仅能融合相邻层信息，这种邻接约束的融合模式限制了网络内信息集成的广度[11]\n- **解决**：HyperC2Net能够直接融合来自骨干网络的五层特征，实现更强大和多样化的信息流，缩小了不同深度特征之间的连接差距[11]\n\n### 跨位置交互缺失\n- **问题**：传统颈部设计不能实现特征图内的跨位置交互，Gold-YOLO虽然能跨层级但不支持跨位置[11]\n- **解决**：通过超图计算实现非网格约束的信息流动，支持跨层级和跨位置的高阶信息传播，突破了传统网格结构的限制[11]\n\n### 高阶相关性建模不足\n- **问题**：传统方法无法充分利用视觉数据中复杂的高阶相关性和非线性关系[3]\n- **解决**：通过超图计算捕获特征图中潜在的复杂高阶关联，生成的特征表示综合考虑了语义特征和高阶结构特征[11]\n\n### 性能提升验证\n消融实验显示高阶学习相比低阶学习APval提升0.4个百分点[16]。公平比较实验中，仅将YOLOv9的颈部替换为HyperC2Net：\n- Hyper-YOLOv1.1-T相比YOLOv9-T提升2.0 APval\n- Hyper-YOLOv1.1-S相比YOLOv9-S提升1.2 APval[15]\n\n这验证了高阶学习方法在目标检测任务中的有效性。"
  },
  {
    "path": "mutilmodel-project.md",
    "content": "# 2025-YOLO|RTDETR多模态目标检测项目\n对于当今的视觉任务来说，最简单入手的便是YOLO系列，通过ultralytics库的帮助下，无论是否来自计算机科班的同学基本都可以快速构建自己的目标检测模型。但是与简单方便相伴而来的是现在的YOLO系列模型的整体拒稿率越来越高，甚至与很多期刊或导师看到YOLO四个字便直接Reject，即使组合出性能优异的检测模型也难以发表到心仪的期刊上去，因此单靠单模态的YOLO发有点要求的期刊已经开始显得有些吃力。很多人尝试转向RT-DETR模型，对于从YOLO迁移过去的人来说一样简单好用，但是RTDETR的训练成本要比YOLO系列模型略高，因此对于部分没有服务器/自费服务器的同学来说可能有点难接受。虽然单模态的YOLO确实显得吃力，但是多模态的YOLO就不是这样了，从去年开始多模态就开始慢慢火起来，但由于缺乏相对应的教程，让很多人望而止步，从去年到今年，也越来越多人问，有没有多模态相关的YOLO改进项目？别急，它终于要来了，而且还不止YOLO，RTDETR的多模态也有！\n\n## 1. 这个项目包含什么内容？\n\n1. 这个项目主体思路是在尽可能的保证继承ultralytics库简单好用的基础上为YOLO与RT-DETR现阶段这两个最热门的目标检测器，提供出多模态的能力。<可以理解为YOLO｜RTDETR的多模态进阶版>  \n2. 这个项目的核心是在原有可见光（RGB图像的基础上）结合红外或深度图谱（以及其他对齐后的图张量数据）实现多模态信息结合的能力。\n3. 同时根据自身的工作经验，我们在项目中提供大量不同的多模态模型结构基础模型进行对应的实验选择。\n4. 在项目中我提供了灵活自由的模型配置方式<本项目基于Ultralytics的YOLO以及 RTDETR 模型进行对应的修改>通过使用不同的模型 yaml配置方式实现调用不同的模型配置结构，同时拥有几百个改进点的改进项目结合多模态直接起飞～  \n5. 当前阶段仅考虑支持目标检测，实例分割，旋转目标检测。不支持姿态检测。\n6. 项目内容提供深度模态，DEM 模态的生成。不提供红外模态的生成\n7. 本项目不提供非对齐多模态图像的支持，不提供模态配准的内容，不提供数据集。\n\n## 2. 这个项目会以什么形式开展？\n\n1. 本次项目核心目的在于为大开箱即用的完善的图图多模态目标检测项目，由于架构设计的内容如果魔导的其他Ultralytics项目内的改进点也可以迁移到多模态项目中(例如v8v10、v11v12、rtdetr改进项目中)。\n2. 项目内我将提供多种不同形式，融合思路的模型配置，大家可以在其中选择一个进行改进创建。同时未来也会在项目中提供一些模块方便大家组合实验。\n3. 这个项目会以未来持续更新的态势进行扩展，包括支持更多多模态基础模型以及不同的实验功能，还有专属于多模态项目以及通用的改进模块。考虑到工作与时间上的问题这会是一个持续更新的过程，大家也不用着急。\n4. 附带答疑群，群里主要是答疑实验，代码操作，代码报错等问题。考虑到个人空闲时间问题不一定每一个问题都能及时回答，也可以在群里询问其他大佬的帮助。一些反复出现的高频问题也会收集录制对应的答疑视频来给大家解答。我本人也会在群里给一些多模态写作投稿的思路与建议。\n\n## 3. 入手须知\n\n1.\t本项目毕竟是为YOLO以及RT-DETR系列做的扩展，因此建议在已经有了ultralytics库的使用经验后来使用本项目。同时为了达到最佳效果，强烈建议搭配魔导的相关改进项目来配合使用。\n以下人群非常不建议入手此项目：\n- 未入门、1000%计算机小白（可以考虑先补充相关的基础知识）。\n- 不想花时间学习，不想了解多模态结构，仅仅只想水论文。\n- 不喜欢看说明或使用文档的。\n- 没有跑过ultralytics 库经验的。\n2. 此项目不涉及多模态数据中的配准相关问题。\n3. 考虑到架构复杂性问题以及多模态结构的特殊性，所以不会考虑提供多模态的剪枝蒸馏在内。但是会考虑提供生成模态的办法作为数据集来源缺失的补充。(生成模态办法主要以深度方面，采用成熟深度学习代码包括一些顶会的工作进行相关模态生成。由于生成模态的作用因此可以在单一模态数据集上进行额外扩展，实现一集多用的办法同时避免配准的问题。)\n4. 本项目仅包含图像相关的多模态，不包含图像+文字的多模态。\n5. 本项目的环境建议在torch2.0以上版本跑。有一些专门的优化API调用。模型显存占用，体积会比单模态较大，但是不用担心，速度不会降低很多，依然是快速的训练。\n\n## 4. 价格\n\n1. 
本项目价格为288，购买过<YOLOV8V10改进项目>、<YOLO1112改进项目>、<RTDETR改进项目>其中之一的优惠50，优惠后价格为238。没有时效限制。\n2. 虚拟项目一经售出不退不换，需要入手前考虑清楚，如果你是初次入手我的项目，怕我不靠谱，可以先考虑入手个YOLO和RTDETR看下。\n3. 如果确定需要购买的话，请把以下的内容原封不动复制给汤圆，“确定2购买5多模态3项目”\n\n## 5. 项目使用问题\n\n1. 购买本项目的使用者都会得到一个独一无二的用于解压7z的密码，到时候用于解压对应的压缩包，此密码自己妥善保管，请勿告诉他人。\n2. 本项目的视频和直播回放统一都是加密视频，每个购买者都可以得到一个激活码，激活码在每个人专属的7z压缩文件内。\n\n## 6. 更新日志\n\n  2025年12月\n\n  - 多模态旋转框（OBB）支持：新增训练/验证/预测脚本与 OBB 模型 YAML\n  - 数据集加载修复：支持 .npz .npy 等文件形式加载\n  - 离线模态生成器：新增 DepthGen 深度图生成器、DEM 特征生成器、EdgeGen 边缘模态生成器\n  - 可视化系统增强：完善色彩空间与模态消融支持，增强分辨率控制与素材导出。\n  - 模块新增：新增三十余个模块与其对应配置文件\n\n  2025年11月\n\n  - 多模态路由：添加动态通道路由与预测器路由兼容性改进，并严格化单模态语义\n  - 网络与配置扩展：新增 LSCD 轻量化检测头、SOEP 小目标增强颈部模块、门控融合模块，C3k2，C2PSA等变体模块并补充大量多模态 YAML 配置\n  - 评估指标增强：移植/完善 COCO 评估，并扩展 COCO 尺寸分级 IoU 指标\n\n  2025年10月\n\n  - 修复RTDETR多模态预测器bbox坐标归一化偏移问题\n  - 修复RTDETRMM验证器tensor操作,完善RTDETRMM验证器的指标计算\n  - 优化残差融合架构并统一版本标识系统\n\n  2025年9月\n\n  - 多模态分割支持：实现YOLOMM多模态分割完整功能\n  - 可视化系统重构：重构为组件化Pipeline架构\n  - 性能优化：添加GFLOPs性能指标和统一profile接口\n  - 修复YOLOMM任务自动检测与类型兼容性\n  - YOLOv5/v9/v10多模态配置\n\n\n  2025年8月\n\n  - 高级融合模块：实现SOTA融合算法（CTF多头交叉注意力、FFN FCM等）\n    - FCM/FFN模块\n    - DEYOLO系列：DEA、DECA、DEPA、BiFocus、C2f_BiFocus\n    - CAM跨模态注意力机制\n    - CTF多头交叉注意力\n    - ICAFusion变体\n    - RD架构模块\n  - 对比学习系统：实现基础对比学习与特征捕获架构\n  - 多模态增强：完成IR专属增强和深度增强系统\n  - Wiki系统：构建项目内置文档说明系统\n  - 路由系统优化：统一MultiModalRouter接管软填充与消融\n  - 预测可视化重构：统一绘图组件与多模态输出\n  - 强化FP32数值稳定性与调试系统\n\n  2025年7月\n\n  - 可视化系统：实现完整Grad-CAM热力图和特征图可视化\n  - COCO验证功能：实现COCOMetrics类和YOLO到COCO格式转换器\n  - 可视化API统一：为YOLOMM和RTDETRMM添加vis()方法\n  - 支持多层独立可视化和letterbox预处理\n  - 修复多模态验证器参数显示问题\n"
  },
  {
    "path": "objectdetection-tricks/readme.md",
    "content": "# objectdetection-tricks\n这个项目主要是提供一些关于目标检测的tricks.\n\n# Explanation\n- **tricks_1**  \n    可视化并统计目标检测中的TP,FP,FN.  \n    视频教学地址：[可视化-哔哩哔哩](https://www.bilibili.com/video/BV18M411c7jN/).  [统计-哔哩哔哩](https://www.bilibili.com/video/BV1yM4y1d7Gp/).  \n- **tricks_2**  \n    深度学习小实验-卷积家族(fps,flops,param)对比实验.  \n    目前支持:Conv,DWConv,Ghost-Conv,GSConv,DSConv,PConv,DCNV2,DCNV3.  \n    视频教学地址：[3.8 哔哩哔哩](https://www.bilibili.com/video/BV15x4y1T7Ly/).  [3.19 哔哩哔哩](https://www.bilibili.com/video/BV1UL411R7Qr/).   \n- **tricks_3**  \n    yolov5中的FeatureMap可视化(热力图格式).  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1LV4y1R7w6/).  \n- **tricks_4**  \n    用于yolov5和v7中的yolo格式转换coco格式的脚本.(如何在v5和v7中输出ap_small,ap_middle,ap_large coco指标)  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV14T411s7Ts/).  \n- **tricks_5**  \n    Segment Anything演示代码.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1hv4y1H7eg/).  \n- **tricks_6**  \n    固定随机种子以便在同一个主机上进行复现结果.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1bh4y1n7Yc/).  \n- **tricks_7**  \n    计算yolov5推理时间和FPS的脚本.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1Uu4y1C714/).  \n- **tricks_8**  \n    计算yolov7推理时间和FPS的脚本.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV17p4y177Pe/).  \n- **tricks_9**  \n    深度学习小实验-YOLO-Block家族(fps,flops,param)对比实验.  \n    目前支持:C3(Yolov5),ELAN(Yolov7),C2f(Yolov8)RepNCSPELAN(Yolov9).  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV17H4y1V7s9/).  \n- **tricks_10**  \n    输出YOLOV8、RTDETR各个层的计算量和参数量.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1tb421b7aB/).  \n- **tricks_11**  \n    以YOLOV8为例，保存多个模型的PR曲线的数据并进行读取绘制到一张图上.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1uC41177oE/).  \n- **tricks_12**  \n    yolov5、v7、v8、v9、v10曲线对比图、推理时间vs精度对比图绘制手把手教程.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1yf421X7t5/).  \n- **tricks_13**  \n    YOLOV8-输出每一层的图特征图尺寸和通道数.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1Mz421B7xz/).  
\n- **tricks_14**  \n    YOLOV8V10V11V12更详细的输出精度结果.\n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1dBQDY6Ec5/).  \n- **tricks_15** \n    1. 统计YOLO格式数据集中每个类别的实例数和对应小中大目标的实例数。\n    2. 可视化YOLO格式数据集中的标签。\n    3. 去掉YOLO格式数据集中的部分类别并类别重新排序。  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1k2TizGEnH). \n- **tricks_16**  \n    用于调试生成COCO指标的文件.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1SdNizEE4X/).  "
  },
  {
    "path": "objectdetection-tricks/tricks_1.py",
    "content": "import os, cv2, tqdm, shutil\nimport numpy as np\n\ndef xywh2xyxy(box):\n    box[:, 0] = box[:, 0] - box[:, 2] / 2\n    box[:, 1] = box[:, 1] - box[:, 3] / 2\n    box[:, 2] = box[:, 0] + box[:, 2]\n    box[:, 3] = box[:, 1] + box[:, 3]\n    return box\n\ndef iou(box1, box2):\n    x11, y11, x12, y12 = np.split(box1, 4, axis=1)\n    x21, y21, x22, y22 = np.split(box2, 4, axis=1)\n \n    xa = np.maximum(x11, np.transpose(x21))\n    xb = np.minimum(x12, np.transpose(x22))\n    ya = np.maximum(y11, np.transpose(y21))\n    yb = np.minimum(y12, np.transpose(y22))\n \n    area_inter = np.maximum(0, (xb - xa + 1)) * np.maximum(0, (yb - ya + 1))\n \n    area_1 = (x12 - x11 + 1) * (y12 - y11 + 1)\n    area_2 = (x22 - x21 + 1) * (y22 - y21 + 1)\n    area_union = area_1 + np.transpose(area_2) - area_inter\n \n    iou = area_inter / area_union\n    return iou\n\ndef draw_box(img, box, color):\n    cv2.rectangle(img, (int(box[0]), int(box[1])), (int(box[2]), int(box[3])), color, thickness=2)\n    return img\n\nif __name__ == '__main__':\n    postfix = 'jpg'\n    img_path = 'image'\n    label_path = 'label'\n    predict_path = 'predict'\n    save_path = 'vis'\n    classes = ['train', 'diningtable', 'person', 'bus', 'pottedplant', 'chair', 'cat', 'tvmonitor', 'motorbike', 'sofa', 'cow', 'bottle', 'aeroplane', 'dog', 'horse', 'car', 'boat', 'sheep', 'bicycle', 'bird']\n    detect_color, missing_color, error_color  = (0, 255, 0), (0, 0, 255), (255, 0, 0)\n    iou_threshold = 0.45\n    \n    if os.path.exists(save_path):\n        shutil.rmtree(save_path)\n    os.makedirs(save_path, exist_ok=True)\n\n    all_right_num, all_missing_num, all_error_num = 0, 0, 0\n    with open('result.txt', 'w') as f_w:\n        for path in tqdm.tqdm(os.listdir(label_path)):\n            image = cv2.imread(f'{img_path}/{path[:-4]}.{postfix}')\n            if image is None:\n                print(f'image:{img_path}/{path[:-4]}.{postfix} not found.', file=f_w)\n            h, w = 
image.shape[:2]\n            \n            try:\n                with open(f'{predict_path}/{path}') as f:\n                    pred = np.array(list(map(lambda x:np.array(x.strip().split(), dtype=np.float32), f.readlines())))\n                    pred[:, 1:5] = xywh2xyxy(pred[:, 1:5])\n                    pred[:, [1, 3]] *= w\n                    pred[:, [2, 4]] *= h\n                    pred = list(pred)\n            except:\n                pred = []\n            \n            try:\n                with open(f'{label_path}/{path}') as f:\n                    label = np.array(list(map(lambda x:np.array(x.strip().split(), dtype=np.float32), f.readlines())))\n                    label[:, 1:] = xywh2xyxy(label[:, 1:])\n                    label[:, [1, 3]] *= w\n                    label[:, [2, 4]] *= h\n            except:\n                print(f'label path:{label_path}/{path} (not found or no target).', file=f_w)\n            \n            right_num, missing_num, error_num = 0, 0, 0\n            label_id, pred_id = list(range(label.shape[0])), [] if len(pred) == 0 else list(range(len(pred)))\n            for i in range(label.shape[0]):\n                if len(pred) == 0: break\n                ious = iou(label[i:i+1, 1:], np.array(pred)[:, 1:5])[0]\n                ious_argsort = ious.argsort()[::-1]\n                missing = True\n                for j in ious_argsort:\n                    if ious[j] < iou_threshold: break\n                    if label[i, 0] == pred[j][0]:\n                        image = draw_box(image, pred[j][1:5], detect_color)\n                        pred.pop(j)\n                        missing = False\n                        right_num += 1\n                        break\n                \n                if missing:\n                    image = draw_box(image, label[i][1:5], missing_color)\n                    missing_num += 1\n            \n            if len(pred):\n                for j in range(len(pred)):\n                    image = 
draw_box(image, pred[j][1:5], error_color)\n                    error_num += 1\n            \n            all_right_num, all_missing_num, all_error_num = all_right_num + right_num, all_missing_num + missing_num, all_error_num + error_num\n            cv2.imwrite(f'{save_path}/{path[:-4]}.{postfix}', image)\n            print(f'name:{path[:-4]} right:{right_num} missing:{missing_num} error:{error_num}', file=f_w)\n        print(f'all_result: right:{all_right_num} missing:{all_missing_num} error:{all_error_num}', file=f_w)\n"
  },
  {
    "path": "objectdetection-tricks/tricks_10.py",
    "content": "import torch, thop\nfrom thop import profile\nfrom ultralytics import YOLO, RTDETR\nfrom prettytable import PrettyTable\n\nif __name__ == '__main__':\n    batch_size, height, width = 1, 640, 640\n\n    model = YOLO(r'ultralytics/cfg/models/yolov8/yolov8n.yaml').model # select your model.pt path\n    # model = RTDETR(r'ultralytics/cfg/models/rt-detr/rtdetr-resnet50.yaml').model\n    model.fuse()\n    input = torch.randn(batch_size, 3, height, width)\n    total_flops, total_params, layers = profile(model, [input], verbose=True, ret_layer_info=True)\n    FLOPs, Params = thop.clever_format([total_flops * 2 / batch_size, total_params], \"%.3f\")\n    table = PrettyTable()\n    table.title = f'Model Flops:{FLOPs} Params:{Params}'\n    table.field_names = ['Layer ID', \"FLOPs\", \"Params\"]\n    for layer_id in layers['model'][2]:\n        data = layers['model'][2][layer_id]\n        FLOPs, Params = thop.clever_format([data[0] * 2 / batch_size, data[1]], \"%.3f\")\n        table.add_row([layer_id, FLOPs, Params])\n    print(table)"
  },
  {
    "path": "objectdetection-tricks/tricks_11.py",
    "content": "import numpy as np\nimport pandas as pd\nimport matplotlib.pyplot as plt\n\nif __name__ == '__main__':\n    file_list = ['a/face_Box.csv', 'b/face_Box.csv']\n    names = ['improve', 'baseline']\n    ap = ['0.673', '0.639']\n    \n    plt.figure(figsize=(6, 6))\n    for i in range(len(file_list)):\n        pr_data = pd.read_csv(file_list[i], header=None)\n        recall, precision = np.array(pr_data[0]), np.array(pr_data[1])\n        \n        plt.plot(recall, precision, label=f'{names[i]} ap:{ap[i]}')\n    plt.xlabel('Recall')\n    plt.ylabel('Precision')\n    plt.title('Precision-Recall Curve')\n    plt.legend()\n    plt.tight_layout()\n    plt.savefig('pr.png')"
  },
  {
    "path": "objectdetection-tricks/tricks_12.py",
    "content": "import pandas as pd\nimport numpy as np\nimport matplotlib.pylab as plt\n\ndef deal_yolov7_result(data_path):\n    with open(data_path) as f:\n        data = np.array(list(map(lambda x:np.array(x.strip().split()), f.readlines())))\n    return data\n\nif __name__ == '__main__':\n    epoch = 50\n    yolov5_result_csv = '/home/hjj/Desktop/github_code/yolov5/runs/train/yolov5n-crowdhuman/results.csv'\n    yolov7_result_csv = '/home/hjj/Desktop/github_code/yolov7/runs/train/yolov7-tiny-crowdhuman/results.txt'\n    yolov8_result_csv = '/home/hjj/Desktop/github_code/ultralytics/runs/train/yolov8n-crowdhuman/results.csv'\n    yolov9_result_csv = '/home/hjj/Desktop/github_code/yolov9/runs/train/yolov9s-corwdhuman/results.csv'\n    yolov10_result_csv = '/home/hjj/Desktop/github_code/yolov10/runs/train/yolov10n-crowdhuman/results.csv'\n    \n    yolov5_result_data = pd.read_csv(yolov5_result_csv)\n    yolov7_result_data = deal_yolov7_result(yolov7_result_csv)\n    yolov8_result_data = pd.read_csv(yolov8_result_csv)\n    yolov9_result_data = pd.read_csv(yolov9_result_csv)\n    yolov10_result_data = pd.read_csv(yolov10_result_csv)\n    \n    plt.figure(figsize=(10, 8))  # 调整图形大小\n    plt.plot(np.arange(epoch), yolov5_result_data['     metrics/mAP_0.5'], label='yolov5n', linewidth=2)\n    plt.plot(np.arange(epoch), np.array(yolov7_result_data[:, 11], dtype=float), label='yolov7-tiny', linewidth=2)\n    plt.plot(np.arange(epoch), yolov8_result_data['       metrics/mAP50(B)'], label='yolov8n', linewidth=2)\n    plt.plot(np.arange(epoch), yolov9_result_data['     metrics/mAP_0.5'], label='yolov9s', linewidth=2)\n    plt.plot(np.arange(epoch), yolov10_result_data['       metrics/mAP50(B)'], label='yolov10n', linewidth=2)\n    \n    plt.xlabel('Epoch', fontsize=14)  # 调整x轴标签字体大小\n    plt.ylabel('mAP@0.5', fontsize=14)  # 调整y轴标签字体大小\n    plt.legend(fontsize=20)  # 调整图例字体大小\n    plt.xticks(fontsize=12)  # 调整x轴刻度字体大小\n    plt.yticks(fontsize=12)  # 调整y轴刻度字体大小\n    
plt.title('YOLO CrowdHuman mAP50 Curve', fontsize=20)\n    plt.tight_layout()\n    plt.savefig('mAP50-curve.png')\n    \n    data_dict = {\n        'yolov5n':[0.672, 0.1+3.2+0.7, '+'], \n        'yolov7-tiny':[0.74, 4.0, '*'],\n        'yolov8n':[0.711, 4.5, 'x'],\n        'yolov9s':[0.772, 9.9, 'D'],\n        'yolov10n':[0.727, 5.3, '_']\n    }\n    \n    plt.figure(figsize=(10, 8))  # 调整图形大小\n    for model_name in data_dict:\n        print(data_dict[model_name][1], data_dict[model_name][0])\n        plt.scatter(data_dict[model_name][1], data_dict[model_name][0], label=model_name, marker=data_dict[model_name][2], s=500)\n    plt.xlabel('Inference Time(ms/img)', fontsize=14)  # 调整x轴标签字体大小\n    plt.ylabel('mAP@0.5', fontsize=14)  # 调整y轴标签字体大小\n    plt.legend(fontsize=20, loc=4)  # 调整图例字体大小\n    plt.xticks(fontsize=12)  # 调整x轴刻度字体大小\n    plt.yticks(fontsize=12)  # 调整y轴刻度字体大小\n    plt.title('inferencetimevsmAP50', fontsize=20)\n    plt.tight_layout()\n    plt.savefig('inferencetimevsmAP50.png')\n"
  },
  {
    "path": "objectdetection-tricks/tricks_13.py",
    "content": "if type(x) in {list, tuple}:\n    if idx == (len(self.model) - 1):\n        if type(x[1]) is dict:\n            print(f'layer id:{idx:>2} {m.type:>50} output shape:{\", \".join([str(x_.size()) for x_ in x[1][\"one2one\"]])}')\n        else:\n            print(f'layer id:{idx:>2} {m.type:>50} output shape:{\", \".join([str(x_.size()) for x_ in x[1]])}')\n    else:\n        print(f'layer id:{idx:>2} {m.type:>50} output shape:{\", \".join([str(x_.size()) for x_ in x if x_ is not None])}')\nelif type(x) is dict:\n    print(f'layer id:{idx:>2} {m.type:>50} output shape:{\", \".join([str(x_.size()) for x_ in x[\"one2one\"]])}')\nelse:\n    if not hasattr(m, 'backbone'):\n        print(f'layer id:{idx:>2} {m.type:>50} output shape:{x.size()}')"
  },
  {
    "path": "objectdetection-tricks/tricks_14.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nimport os\nimport numpy as np\nfrom prettytable import PrettyTable\nfrom ultralytics import YOLO\nfrom ultralytics.utils.torch_utils import model_info\n\n# BILIBILI UP 魔傀面具\n# 验证参数官方详解链接：https://docs.ultralytics.com/modes/val/#usage-examples:~:text=of%20each%20category-,Arguments%20for%20YOLO%20Model%20Validation,-When%20validating%20YOLO\n\ndef get_weight_size(path):\n    stats = os.stat(path)\n    return f'{stats.st_size / 1024 / 1024:.1f}'\n\nif __name__ == '__main__':\n    model_path = 'runs/train/exp/weights/best.pt'\n    model = YOLO(model_path) # 选择训练好的权重路径\n    result = model.val(data='/root/dataset/dataset_visdrone/data.yaml',\n                        split='val', # split可以选择train、val、test 根据自己的数据集情况来选择.\n                        imgsz=640,\n                        batch=16,\n                        project='runs/val',\n                        name='exp',\n                        )\n    \n    if model.task == 'detect': # 仅目标检测任务适用\n        length = result.box.p.size\n        model_names = list(result.names.values())\n        preprocess_time_per_image = result.speed['preprocess']\n        inference_time_per_image = result.speed['inference']\n        postprocess_time_per_image = result.speed['postprocess']\n        all_time_per_image = preprocess_time_per_image + inference_time_per_image + postprocess_time_per_image\n        \n        n_l, n_p, n_g, flops = model_info(model.model)\n        \n        print('-'*20 + '论文上的数据以以下结果为准' + '-'*20)\n        print('-'*20 + '论文上的数据以以下结果为准' + '-'*20)\n        print('-'*20 + '论文上的数据以以下结果为准' + '-'*20)\n        print('-'*20 + '论文上的数据以以下结果为准' + '-'*20)\n        print('-'*20 + '论文上的数据以以下结果为准' + '-'*20)\n\n        model_info_table = PrettyTable()\n        model_info_table.title = \"Model Info\"\n        model_info_table.field_names = [\"GFLOPs\", \"Parameters\", \"前处理时间/一张图\", \"推理时间/一张图\", \"后处理时间/一张图\", \"FPS(前处理+模型推理+后处理)\", \"FPS(推理)\", \"Model File 
Size\"]\n        model_info_table.add_row([f'{flops:.1f}', f'{n_p:,}', \n                                  f'{preprocess_time_per_image / 1000:.6f}s', f'{inference_time_per_image / 1000:.6f}s', \n                                  f'{postprocess_time_per_image / 1000:.6f}s', f'{1000 / all_time_per_image:.2f}', \n                                  f'{1000 / inference_time_per_image:.2f}', f'{get_weight_size(model_path)}MB'])\n        print(model_info_table)\n\n        model_metrice_table = PrettyTable()\n        model_metrice_table.title = \"Model Metrice\"\n        model_metrice_table.field_names = [\"Class Name\", \"Precision\", \"Recall\", \"F1-Score\", \"mAP50\", \"mAP75\", \"mAP50-95\"]\n        for idx in range(length):\n            model_metrice_table.add_row([\n                                        model_names[idx], \n                                        f\"{result.box.p[idx]:.4f}\", \n                                        f\"{result.box.r[idx]:.4f}\", \n                                        f\"{result.box.f1[idx]:.4f}\", \n                                        f\"{result.box.ap50[idx]:.4f}\", \n                                        f\"{result.box.all_ap[idx, 5]:.4f}\", # 50 55 60 65 70 75 80 85 90 95 \n                                        f\"{result.box.ap[idx]:.4f}\"\n                                    ])\n        model_metrice_table.add_row([\n                                    \"all(平均数据)\", \n                                    f\"{result.results_dict['metrics/precision(B)']:.4f}\", \n                                    f\"{result.results_dict['metrics/recall(B)']:.4f}\", \n                                    f\"{np.mean(result.box.f1[:length]):.4f}\", \n                                    f\"{result.results_dict['metrics/mAP50(B)']:.4f}\", \n                                    f\"{np.mean(result.box.all_ap[:length, 5]):.4f}\", # 50 55 60 65 70 75 80 85 90 95 \n                                    
f\"{result.results_dict['metrics/mAP50-95(B)']:.4f}\"\n                                ])\n        print(model_metrice_table)\n\n        with open(result.save_dir / 'paper_data.txt', 'w+') as f:\n            f.write(str(model_info_table))\n            f.write('\\n')\n            f.write(str(model_metrice_table))\n        \n        print('-'*20, f'结果已保存至{result.save_dir}/paper_data.txt...', '-'*20)\n        print('-'*20, f'结果已保存至{result.save_dir}/paper_data.txt...', '-'*20)\n        print('-'*20, f'结果已保存至{result.save_dir}/paper_data.txt...', '-'*20)\n        print('-'*20, f'结果已保存至{result.save_dir}/paper_data.txt...', '-'*20)\n        print('-'*20, f'结果已保存至{result.save_dir}/paper_data.txt...', '-'*20)"
  },
  {
    "path": "objectdetection-tricks/tricks_15.py",
    "content": "import os, glob, cv2, tqdm\nfrom prettytable import PrettyTable\n\nRED, GREEN, BLUE, YELLOW, ORANGE, RESET = \"\\033[91m\", \"\\033[92m\", \"\\033[94m\", \"\\033[93m\", \"\\033[38;5;208m\", \"\\033[0m\"\n\nimage_postfix = ['jpg', 'png', 'bmp', 'tif']\nimages_folder_path = ['/home/dataset/dataset_visdrone/VisDrone2019-DET-train/images', \n                      '/home/dataset/dataset_visdrone/VisDrone2019-DET-val/images',\n                      '/home/dataset/dataset_visdrone/VisDrone2019-DET-test-dev/images']\nlabels_folder_path = ['/home/dataset/dataset_visdrone/VisDrone2019-DET-train/labels',\n                      '/home/dataset/dataset_visdrone/VisDrone2019-DET-val/labels',\n                      '/home/dataset/dataset_visdrone/VisDrone2019-DET-test-dev/labels']\nclasses = ['pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor']\n# classes = ['people', 'bicycle', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor']\nobject_info = [32*32, 96*96]\nCOLOR_LIST = [\n    (255, 0, 0),         # 红色 (person)\n    (0, 255, 0),         # 绿色 (car)\n    (0, 0, 255),         # 蓝色 (bike)\n    (255, 165, 0),       # 橙色 (motorcycle)\n    (255, 255, 0),       # 黄色 (truck)\n    (0, 255, 255),       # 青色 (bus)\n    (255, 0, 255),       # 品红 (train)\n    (255, 255, 255),     # 白色 (airplane)\n    (128, 0, 0),         # 棕色 (dog)\n    (0, 128, 0),         # 深绿色 (cat)\n    (0, 0, 128),         # 深蓝色 (horse)\n    (128, 128, 0),       # 橄榄色 (sheep)\n    (0, 128, 128),       # 蓝绿色 (cow)\n    (128, 0, 128),       # 紫色 (elephant)\n    (192, 192, 192),     # 银色 (giraffe)\n    (255, 99, 71),       # 番茄色 (zebra)\n    (0, 255, 127),       # 春绿色 (monkey)\n    (255, 105, 180),     # 深粉色 (bird)\n    (70, 130, 180),      # 钢蓝色 (fish)\n]\n\ndef get_color_by_class(class_id):\n    # 根据类别的索引返回固定颜色\n    return COLOR_LIST[class_id % len(COLOR_LIST)]  # 确保索引不越界\n\ndef draw_detections(box, name, color, img):\n    height, width, _ = 
img.shape\n    xmin, ymin, xmax, ymax = list(map(int, list(box)))\n    \n    # 根据图像大小调整矩形框的线宽和文本的大小\n    line_thickness = max(1, int(min(height, width) / 400))\n    font_scale = min(height, width) / 1000\n    font_thickness = max(1, int(min(height, width) / 400))\n    # 根据图像大小调整文本的纵向位置\n    text_offset_y = int(min(height, width) / 100)\n    \n    cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, line_thickness)\n    cv2.putText(img, str(name), (xmin, ymin - text_offset_y), cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0, 255, 0), font_thickness, lineType=cv2.LINE_AA)\n    return img\n\ndef get_images_and_labels_path(images_folder_path, labels_folder_path):\n    labels_path_list, labels_filename = [], {}\n    for folder_path in labels_folder_path:\n        glob_list = glob.glob(os.path.join(folder_path, '*.txt'))\n        filename = {os.path.splitext(os.path.basename(i))[0]:i for i in glob_list}\n        labels_path_list.extend(glob_list)\n        labels_filename.update(filename)\n    \n    images_path_list, images_filename = [], {}\n    for folder_path in images_folder_path:\n        for p in image_postfix:\n            glob_list = glob.glob(os.path.join(folder_path, f'*.{p}'))\n            filename = {os.path.splitext(os.path.basename(i))[0]:i for i in glob_list}\n            images_path_list.extend(glob_list)\n            images_filename.update(filename)\n    \n    print(ORANGE + f'image_path_length:{len(images_filename)} label_path_length:{len(labels_filename)}')\n\n    image_label_dict = {}\n    for i in labels_filename:\n        if i in images_filename:\n            image_label_dict[labels_filename[i]] = images_filename[i]\n    \n    print(f'After matching. 
data_length:{len(image_label_dict)}' + RESET)\n\n    return image_label_dict, labels_path_list\n\ndef show_dataset_info(image_label_dict, visual_box=False, save_path='visual_box'):\n    if visual_box and not os.path.exists(save_path):\n        os.makedirs(save_path)\n\n    classes_dict = {cls:{'s':0, 'm':0, 'l':0, 'num':0} for cls in classes}\n    for label_path in tqdm.tqdm(image_label_dict):\n        image_path = image_label_dict[label_path]\n\n        image = cv2.imread(image_path)\n        try:\n            h, w = image.shape[:2]\n        except:\n            print(RED + f'{image_path} read failure. skip.' + RESET)\n        \n        with open(label_path) as f:\n            label = list(map(lambda x:x.strip().split(), f.readlines()))\n        \n        for cls_id,x_c,y_c,width,height in label:\n            classes_dict[classes[int(float(cls_id))]]['num'] += 1\n            width = float(width) * w\n            height = float(height) * h\n            obj_area = width * height\n\n            if obj_area < object_info[0]:\n                classes_dict[classes[int(float(cls_id))]]['s'] += 1\n            elif obj_area > object_info[1]:\n                classes_dict[classes[int(float(cls_id))]]['l'] += 1\n            else:\n                classes_dict[classes[int(float(cls_id))]]['m'] += 1\n            \n            if visual_box:\n                x_c, y_c = float(x_c) * w, float(y_c) * h\n                x_min, y_min, x_max, y_max = x_c - width / 2, y_c - height / 2, x_c + width / 2, y_c + height / 2\n                image = draw_detections([x_min, y_min, x_max, y_max], classes[int(float(cls_id))], get_color_by_class(int(float(cls_id))), image)\n                cv2.imwrite(os.path.join(save_path, os.path.basename(image_path)), image)\n    \n    # 统计总和\n    total_s = sum(v['s'] for v in classes_dict.values())\n    total_m = sum(v['m'] for v in classes_dict.values())\n    total_l = sum(v['l'] for v in classes_dict.values())\n    total_num = sum(v['num'] for v in 
classes_dict.values())\n\n    # 创建表格\n    table = PrettyTable()\n    table.field_names = [\"Category\", \"Small (s)\", \"Medium (m)\", \"Large (l)\", \"Total (num)\"]\n\n    # 添加每一行\n    for category, values in classes_dict.items():\n        s, m, l, num = values['s'], values['m'], values['l'], values['num']\n        row = [\n            category,\n            f\"{s} ({s/num:.1%})\",\n            f\"{m} ({m/num:.1%})\",\n            f\"{l} ({l/num:.1%})\",\n            num\n        ]\n        table.add_row(row)\n\n    # 添加总计行\n    row_total = [\n        \"All\",\n        f\"{total_s} ({total_s/total_num:.1%})\",\n        f\"{total_m} ({total_m/total_num:.1%})\",\n        f\"{total_l} ({total_l/total_num:.1%})\",\n        total_num\n    ]\n    table.add_row(row_total)\n\n    # 可选：左对齐类别列\n    table.align[\"Category\"] = \"l\"\n\n    # 打印表格\n    print(table)\n\ndef remap_yolo_dataset_class(labels_path_list, delete_label=[0, 1, 3, 5]):\n    classes = []\n    for label_path in tqdm.tqdm(labels_path_list, desc='scan dataset class'):\n        with open(label_path) as f:\n            label = list(map(lambda x:x.strip().split(), f.readlines()))\n            \n        for cls_id,x_c,y_c,width,height in label:\n            classes.append(int(float(cls_id)))\n    classes = sorted(list(set(classes)))\n    filter_classes = list(sorted(set(classes) - set(delete_label)))\n    print(ORANGE + f'now classes:{classes} delete classes:{delete_label} filter_classes:{filter_classes}' + RESET)\n\n    for label_path in tqdm.tqdm(labels_path_list, desc='process dataset class'):\n        with open(label_path) as f:\n            label = list(map(lambda x:x.strip().split(), f.readlines()))\n        \n        new_label = []\n        for cls_id,x_c,y_c,width,height in label:\n            if int(float(cls_id)) in delete_label:\n                continue\n\n            new_label.append(' '.join([str(filter_classes.index(int(float(cls_id)))),x_c,y_c,width,height]))\n        \n        with 
open(label_path, 'w+') as f:\n            f.write('\\n'.join(new_label))\n\nif __name__ == '__main__':\n    image_label_dict, labels_path_list = get_images_and_labels_path(images_folder_path, labels_folder_path)\n    \n    show_dataset_info(image_label_dict, visual_box=True)\n    # remap_yolo_dataset_class(labels_path_list, delete_label=[0, 3])"
  },
  {
    "path": "objectdetection-tricks/tricks_16.py",
    "content": "import json, tqdm, cv2, shutil, os\nimport numpy as np\nimport matplotlib.pyplot as plt\n\n# 1. 标签文件类别有问题，例如类别从1开始，不是从0开始。\n# 2. image_id不匹配。\n# 3. 标签的box异常。\n\nSAVE_PATH = 'coco_visual'\nLABEL_COCO_PATH = '/Users/moguimianju/Downloads/data.json'\nPRED_COCO_PATH = '/Users/moguimianju/Downloads/predictions.json'\nSCORE_THR = 0.2\nCOLOR_LIST = [\n    (255, 0, 0),         # 红色 (person)\n    (0, 255, 0),         # 绿色 (car)\n    (0, 0, 255),         # 蓝色 (bike)\n    (255, 165, 0),       # 橙色 (motorcycle)\n    (255, 255, 0),       # 黄色 (truck)\n    (0, 255, 255),       # 青色 (bus)\n    (255, 0, 255),       # 品红 (train)\n    (255, 255, 255),     # 白色 (airplane)\n    (128, 0, 0),         # 棕色 (dog)\n    (0, 128, 0),         # 深绿色 (cat)\n    (0, 0, 128),         # 深蓝色 (horse)\n    (128, 128, 0),       # 橄榄色 (sheep)\n    (0, 128, 128),       # 蓝绿色 (cow)\n    (128, 0, 128),       # 紫色 (elephant)\n    (192, 192, 192),     # 银色 (giraffe)\n    (255, 99, 71),       # 番茄色 (zebra)\n    (0, 255, 127),       # 春绿色 (monkey)\n    (255, 105, 180),     # 深粉色 (bird)\n    (70, 130, 180),      # 钢蓝色 (fish)\n]\n\ndef get_color_by_class(class_id):\n    # 根据类别的索引返回固定颜色\n    return COLOR_LIST[class_id % len(COLOR_LIST)]  # 确保索引不越界\n\ndef draw_detections(box, name, color, img):\n    height, width, _ = img.shape\n    xmin, ymin, xmax, ymax = list(map(int, list(box)))\n    \n    # 根据图像大小调整矩形框的线宽和文本的大小\n    line_thickness = max(1, int(min(height, width) / 400))\n    font_scale = min(height, width) / 1000\n    font_thickness = max(1, int(min(height, width) / 400))\n    # 根据图像大小调整文本的纵向位置\n    text_offset_y = int(min(height, width) / 100)\n    \n    cv2.rectangle(img, (xmin, ymin), (xmax, ymax), color, line_thickness)\n    cv2.putText(img, str(name), (xmin, ymin - text_offset_y), cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0, 255, 0), font_thickness, lineType=cv2.LINE_AA)\n    return img\n\nif __name__ == '__main__':\n    if os.path.exists(SAVE_PATH):\n        shutil.rmtree(SAVE_PATH)\n    
os.makedirs(SAVE_PATH)\n\n    with open(LABEL_COCO_PATH) as f:\n        label = json.load(f)\n\n    with open(PRED_COCO_PATH) as f:\n        predictions = json.load(f)\n\n    print(f'label json classes info:{label[\"categories\"]}')\n\n    label_dict = {}\n    for data in label['images']:\n        image_id = data['id']\n        label_dict[image_id] = {'file_name':data['file_name'], 'width':data['width'], 'height':data['height'], 'bbox_info':[]}\n    \n    for data in tqdm.tqdm(label['annotations'], desc='process annotations'):\n        image_id = data['image_id']\n        label_dict[image_id]['bbox_info'].append({'class_id':data['category_id'], 'bbox':data['bbox']})\n    \n    pred_classes_set = []\n    pred_dict = {}\n    for data in tqdm.tqdm(predictions, desc='process predictions'):\n        image_id = data['image_id']\n        if image_id not in pred_dict:\n            pred_dict[image_id] = []\n        if data['category_id'] not in pred_classes_set:\n            pred_classes_set.append(data['category_id'])\n        if data['score'] < SCORE_THR:\n            continue\n        pred_dict[image_id].append({'class_id':data['category_id'], 'bbox':data['bbox'], 'score':data['score']})\n\n    print(f'predictions json classes set:{sorted(pred_classes_set)}')\n\n    # print('-'*40 + 'label image_id' + '-'*40)\n    # print(label_dict.keys())\n    # print('-'*40 + 'pred image_id' + '-'*40)\n    # print(pred_dict.keys())\n\n    for image_id in tqdm.tqdm(label_dict, desc='process draw func'):\n        if image_id not in pred_dict:\n            print(f'image id:{image_id} not in predictions.json')\n            continue\n\n        label_img = np.ones((label_dict[image_id]['height'], label_dict[image_id]['width'], 3), dtype=np.uint8) * 255\n        pred_img = np.ones((label_dict[image_id]['height'], label_dict[image_id]['width'], 3), dtype=np.uint8) * 255\n\n        for bbox_info in label_dict[image_id]['bbox_info']:\n            class_id = bbox_info['class_id']\n            x, 
y, w, h = bbox_info['bbox']\n            # COCO bbox is [x_min, y_min, width, height] (top-left corner, not center)\n            x_min, y_min, x_max, y_max = x, y, x + w, y + h\n            draw_detections([x_min, y_min, x_max, y_max], f'{class_id}', get_color_by_class(class_id), label_img)\n        \n        for bbox_info in pred_dict[image_id]:\n            class_id = bbox_info['class_id']\n            score = bbox_info['score']\n            x, y, w, h = bbox_info['bbox']\n            # COCO bbox is [x_min, y_min, width, height] (top-left corner, not center)\n            x_min, y_min, x_max, y_max = x, y, x + w, y + h\n            draw_detections([x_min, y_min, x_max, y_max], f'{class_id} {score:.2f}', get_color_by_class(class_id), pred_img)\n        \n        plt.figure(figsize=(12, 8))\n\n        plt.subplot(1, 2, 1)\n        plt.imshow(cv2.cvtColor(label_img, cv2.COLOR_BGR2RGB))\n        plt.axis('off')\n        plt.title('label')\n\n        plt.subplot(1, 2, 2)\n        plt.imshow(cv2.cvtColor(pred_img, cv2.COLOR_BGR2RGB))\n        plt.axis('off')\n        plt.title('predictions')\n\n        plt.tight_layout()\n        plt.savefig(f'{SAVE_PATH}/{image_id}.png')\n        plt.close()"
  },
  {
    "path": "objectdetection-tricks/tricks_2.py",
    "content": "import torch, time, math, thop, tqdm, torchvision\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.nn.modules.conv import _ConvNd\nfrom torch.nn.modules.utils import _pair\nfrom torch.nn.parameter import Parameter\nfrom prettytable import PrettyTable\n\ndef time_synchronized():\n    # pytorch-accurate time\n    if torch.cuda.is_available():\n        torch.cuda.synchronize()\n    return time.time()\n\ndef autopad(k, p=None, d=1):  # kernel, padding, dilation\n    # Pad to 'same' shape outputs\n    if d > 1:\n        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size\n    if p is None:\n        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad\n    return p\n\nclass Conv2D(nn.Module):\n    def __init__(self, inc, ouc, kernel_size, g=1):\n        super().__init__()\n        \n        self.conv = nn.Conv2d(inc, ouc, kernel_size, padding=autopad(kernel_size), groups=g)\n        self.bn = nn.BatchNorm2d(num_features=ouc)\n        self.act = nn.ReLU(inplace=True)\n    \n    def forward(self, x):\n        return self.act(self.bn(self.conv(x)))\n\n    def __str__(self):\n        return 'Conv2D'\n\nclass DConv2D(nn.Module):\n    def __init__(self, inc, ouc, kernel_size):\n        super().__init__()\n        \n        self.pw = Conv2D(inc, ouc, 1)\n        self.dw = Conv2D(ouc, ouc, kernel_size, g=ouc)\n    \n    def forward(self, x):\n        return self.dw(self.pw(x))\n\n    def __str__(self):\n        return 'Depth-Conv2D'\n\nclass GhostConv2D(nn.Module):\n    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3):\n        super().__init__()\n        self.oup = oup\n        init_channels = math.ceil(oup / ratio)\n        new_channels = init_channels*(ratio-1)\n\n        self.primary_conv = Conv2D(inp, init_channels, kernel_size)\n        self.cheap_operation = Conv2D(init_channels, new_channels, dw_size, g=init_channels)\n\n    def forward(self, x):\n        x1 = 
self.primary_conv(x)\n        x2 = self.cheap_operation(x1)\n        out = torch.cat([x1,x2], dim=1)\n        return out[:,:self.oup,:,:]\n\n    def __str__(self):\n        return 'Ghost-Conv2D'\n\nclass GSConv(nn.Module):\n    # GSConv https://github.com/AlanLi1997/slim-neck-by-gsconv\n    def __init__(self, c1, c2, k=1, s=1, g=1):\n        super().__init__()\n        c_ = c2 // 2\n        self.cv1 = Conv2D(c1, c_, k, g)\n        self.cv2 = Conv2D(c_, c_, 5, c_)\n\n    def forward(self, x):\n        x1 = self.cv1(x)\n        x2 = torch.cat((x1, self.cv2(x1)), 1)\n        # shuffle\n        # y = x2.reshape(x2.shape[0], 2, x2.shape[1] // 2, x2.shape[2], x2.shape[3])\n        # y = y.permute(0, 2, 1, 3, 4)\n        # return y.reshape(y.shape[0], -1, y.shape[3], y.shape[4])\n\n        b, n, h, w = x2.data.size()\n        b_n = b * n // 2\n        y = x2.reshape(b_n, 2, h * w)\n        y = y.permute(1, 0, 2)\n        y = y.reshape(2, -1, n // 2, h, w)\n\n        return torch.cat((y[0], y[1]), 1)\n    \n    def __str__(self):\n        return 'GSConv2D'\n\nclass DSConv(_ConvNd):\n    def __init__(self, in_channels, out_channels, kernel_size, block_size=32, stride=1,\n                 padding=None, dilation=1, groups=1, padding_mode='zeros', bias=False, KDSBias=False, CDS=False):\n        padding = _pair(autopad(kernel_size, padding, dilation))\n        kernel_size = _pair(kernel_size)\n        stride = _pair(stride)\n        dilation = _pair(dilation)\n\n        blck_numb = math.ceil(((in_channels)/(block_size*groups)))\n        super(DSConv, self).__init__(\n            in_channels, out_channels, kernel_size, stride, padding, dilation,\n            False, _pair(0), groups, bias, padding_mode)\n\n        # KDS weight From Paper\n        self.intweight = torch.Tensor(out_channels, in_channels, *kernel_size)\n        self.alpha = torch.Tensor(out_channels, blck_numb, *kernel_size)\n\n        # KDS bias From Paper\n        self.KDSBias = KDSBias\n        self.CDS = CDS\n\n 
       if KDSBias:\n            self.KDSb = torch.Tensor(out_channels, blck_numb, *kernel_size)\n        if CDS:\n            self.CDSw = torch.Tensor(out_channels)\n            self.CDSb = torch.Tensor(out_channels)\n\n        self.reset_parameters()\n\n    def get_weight_res(self):\n        # Include expansion of alpha and multiplication with weights to include in the convolution layer here\n        alpha_res = torch.zeros(self.weight.shape).to(self.alpha.device)\n\n        # Include KDSBias\n        if self.KDSBias:\n            KDSBias_res = torch.zeros(self.weight.shape).to(self.alpha.device)\n\n        # Handy definitions:\n        nmb_blocks = self.alpha.shape[1]\n        total_depth = self.weight.shape[1]\n        bs = total_depth//nmb_blocks\n\n        llb = total_depth-(nmb_blocks-1)*bs\n\n        # Casting the Alpha values as same tensor shape as weight\n        for i in range(nmb_blocks):\n            length_blk = llb if i==nmb_blocks-1 else bs\n\n            shp = self.alpha.shape # Notice this is the same shape for the bias as well\n            to_repeat=self.alpha[:, i, ...].view(shp[0],1,shp[2],shp[3]).clone()\n            repeated = to_repeat.expand(shp[0], length_blk, shp[2], shp[3]).clone()\n            alpha_res[:, i*bs:(i*bs+length_blk), ...] = repeated.clone()\n\n            if self.KDSBias:\n                to_repeat = self.KDSb[:, i, ...].view(shp[0], 1, shp[2], shp[3]).clone()\n                repeated = to_repeat.expand(shp[0], length_blk, shp[2], shp[3]).clone()\n                KDSBias_res[:, i*bs:(i*bs+length_blk), ...] 
= repeated.clone()\n\n        if self.CDS:\n            to_repeat = self.CDSw.view(-1, 1, 1, 1)\n            repeated = to_repeat.expand_as(self.weight)\n            print(repeated.shape)\n\n        # Element-wise multiplication of alpha and weight\n        weight_res = torch.mul(alpha_res, self.weight)\n        if self.KDSBias:\n            weight_res = torch.add(weight_res, KDSBias_res)\n        return weight_res\n\n    def forward(self, input):\n        # Get resulting weight\n        #weight_res = self.get_weight_res()\n\n        # Returning convolution\n        return F.conv2d(input, self.weight, self.bias,\n                            self.stride, self.padding, self.dilation,\n                            self.groups)\n\nclass DSConv2D(Conv2D):\n    def __init__(self, inc, ouc, kernel_size, g=1):\n        super().__init__(inc, ouc, kernel_size, g)\n        self.conv = DSConv(inc, ouc, kernel_size)\n    \n    def __str__(self):\n        return 'DSConv2D'\n\nclass Partial_conv3(nn.Module):\n    def __init__(self, dim, kernel_size, n_div=4, forward='split_cat'):\n        super().__init__()\n        self.dim_conv3 = dim // n_div\n        self.dim_untouched = dim - self.dim_conv3\n        self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, kernel_size, 1, autopad(kernel_size), bias=False)\n\n        if forward == 'slicing':\n            self.forward = self.forward_slicing\n        elif forward == 'split_cat':\n            self.forward = self.forward_split_cat\n        else:\n            raise NotImplementedError\n\n    def forward_slicing(self, x):\n        # only for inference\n        x = x.clone()   # !!! 
Keep the original input intact for the residual connection later\n        x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])\n        return x\n\n    def forward_split_cat(self, x):\n        # for training/inference\n        x1, x2 = torch.split(x, [self.dim_conv3, self.dim_untouched], dim=1)\n        x1 = self.partial_conv3(x1)\n        x = torch.cat((x1, x2), 1)\n        return x\n\nclass PConv(Conv2D):\n    def __init__(self, inc, ouc, kernel_size, g=1):\n        super().__init__(inc, ouc, kernel_size, g)\n        self.conv = Partial_conv3(inc, kernel_size)\n    \n    def __str__(self):\n        return 'PConv2D-FasterNet'\n\nclass DCNV2(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size, stride=1,\n                 padding=1, groups=1, act=True, dilation=1, deformable_groups=1):\n        super(DCNV2, self).__init__()\n\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.kernel_size = (kernel_size, kernel_size)\n        self.stride = (stride, stride)\n        self.padding = (autopad(kernel_size, padding), autopad(kernel_size, padding))\n        self.dilation = (dilation, dilation)\n        self.groups = groups\n        self.deformable_groups = deformable_groups\n\n        self.weight = nn.Parameter(\n            torch.empty(out_channels, in_channels, *self.kernel_size)\n        )\n        self.bias = nn.Parameter(torch.empty(out_channels))\n\n        out_channels_offset_mask = (self.deformable_groups * 3 *\n                                    self.kernel_size[0] * self.kernel_size[1])\n        self.conv_offset_mask = nn.Conv2d(\n            self.in_channels,\n            out_channels_offset_mask,\n            kernel_size=self.kernel_size,\n            stride=self.stride,\n            padding=self.padding,\n            bias=True,\n        )\n        self.bn = nn.BatchNorm2d(out_channels)\n        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) 
else nn.Identity())\n        self.reset_parameters()\n\n    def forward(self, x):\n        offset_mask = self.conv_offset_mask(x)\n        o1, o2, mask = torch.chunk(offset_mask, 3, dim=1)\n        offset = torch.cat((o1, o2), dim=1)\n        mask = torch.sigmoid(mask)\n        x = torch.ops.torchvision.deform_conv2d(\n            x,\n            self.weight,\n            offset,\n            mask,\n            self.bias,\n            self.stride[0], self.stride[1],\n            self.padding[0], self.padding[1],\n            self.dilation[0], self.dilation[1],\n            self.groups,\n            self.deformable_groups,\n            True\n        )\n        x = self.bn(x)\n        x = self.act(x)\n        return x\n\n    def reset_parameters(self):\n        n = self.in_channels\n        for k in self.kernel_size:\n            n *= k\n        std = 1. / math.sqrt(n)\n        self.weight.data.uniform_(-std, std)\n        self.bias.data.zero_()\n        self.conv_offset_mask.weight.data.zero_()\n        self.conv_offset_mask.bias.data.zero_()\n\n    def __str__(self):\n        return 'DCNV2'\n\nfrom ops_dcnv3.modules import DCNv3\nclass DCNV3(Conv2D):\n    def __init__(self, inc, ouc, k=1, s=1, p=None, g=1, d=1, act=True):\n        super().__init__(inc, ouc, k, g)\n        self.conv = DCNv3(inc, kernel_size=k, stride=s, group=g, dilation=d)\n    \n    def __str__(self):\n        return 'DCNV3'\n\n    def forward(self, x):\n        x = x.permute(0, 2, 3, 1)\n        x = self.conv(x)\n        x = x.permute(0, 3, 1, 2)\n        return self.act(self.bn(x))\n    \nif __name__ == '__main__':\n    warmup, test_times = 1000, 3000\n    bs, h, w = 8, 256, 256\n    inc, ouc, kernel_size = 128, 128, 3\n    cuda, half = True, True\n    module_list = [\n                   Conv2D(inc, ouc, kernel_size), \n                   DConv2D(inc, ouc, kernel_size), \n                   GhostConv2D(inc, ouc, kernel_size=1, ratio=2, dw_size=kernel_size), \n                   GSConv(inc, ouc, 
kernel_size),\n                   DSConv2D(inc, ouc, kernel_size),\n                   PConv(inc, ouc, kernel_size),\n                   DCNV2(inc, ouc, kernel_size),\n                   DCNV3(inc, ouc, kernel_size)\n                   ]\n    \n    device = torch.device(\"cuda:0\") if cuda else torch.device(\"cpu\")\n    inputs = torch.randn((bs, inc, h, w)).to(device)\n    if half:\n        inputs = inputs.half()\n    table = PrettyTable()\n    table.title = 'Conv Family Speed'\n    table.field_names = ['Name', 'All_Time', 'Mean_Time', 'FPS', \"FLOPs\", \"Params\"]\n    for module in module_list:\n        module = module.to(device)\n        if half:\n            module = module.half()\n        for i in tqdm.tqdm(range(warmup), desc=f'{str(module)} Warmup....'):\n            module(inputs)\n        all_time = 0\n        for i in tqdm.tqdm(range(test_times), desc=f'{str(module)} Calculate Speed....'):\n            begin = time_synchronized()\n            module(inputs)\n            all_time += time_synchronized() - begin\n        FLOPs, Params = thop.profile(module, inputs=(inputs, ), verbose=False)\n        FLOPs, Params = thop.clever_format([FLOPs, Params], \"%.3f\")\n        # print(f'{str(module)} all_time:{all_time:.5f} mean_time:{all_time / test_times:.5f} fps:{1 / (all_time / test_times)} FLOPs:{FLOPs} Params:{Params}')\n        table.add_row([str(module), f'{all_time:.5f}', f'{all_time / test_times:.5f}', f'{1 / (all_time / test_times)}', f'{FLOPs}', f'{Params}'])\n    print(table)"
  },
  {
    "path": "objectdetection-tricks/tricks_3.py",
    "content": "def feature_visualization(x, module_type, stage, n=32, save_dir=Path('runs/detect/exp')):\n    \"\"\"\n    x:              Features to be visualized\n    module_type:    Module type\n    stage:          Module stage within model\n    n:              Maximum number of feature maps to plot\n    save_dir:       Directory to save results\n    \"\"\"\n    if 'Detect' not in module_type:\n        batch, channels, height, width = x.shape  # batch, channels, height, width\n        if height > 1 and width > 1:\n            f = save_dir / f\"stage{stage}_{module_type.split('.')[-1]}_features.png\"  # filename\n\n            blocks = torch.chunk(x[0].cpu(), channels, dim=0)  # select batch index 0, block by channels\n            n = min(n, channels)  # number of plots\n            fig, ax = plt.subplots(math.ceil(n / 8), 8, tight_layout=True)  # 8 rows x n/8 cols\n            ax = ax.ravel()\n            plt.subplots_adjust(wspace=0.05, hspace=0.05)\n            for i in range(n):\n                block = blocks[i].squeeze().detach().numpy()\n                block = (block - np.min(block)) / (np.max(block) - np.min(block))\n                temp = np.array(block * 255.0, dtype=np.uint8)\n                temp = cv2.applyColorMap(temp, cv2.COLORMAP_JET)\n                ax[i].imshow(temp, cmap=plt.cm.jet)  # cmap='gray'\n                ax[i].axis('off')\n\n            LOGGER.info(f'Saving {f}... ({n}/{channels})')\n            plt.savefig(f, dpi=300, bbox_inches='tight')\n            plt.close()\n            np.save(str(f.with_suffix('.npy')), x[0].cpu().numpy())  # npy save"
  },
  {
    "path": "objectdetection-tricks/tricks_4.py",
    "content": "import os\nimport cv2\nimport json\nfrom tqdm import tqdm\nfrom sklearn.model_selection import train_test_split\nimport argparse\n\nparser = argparse.ArgumentParser()\nparser.add_argument('--root_dir', default='/home/hjj/Desktop/dataset/dataset_seaship',type=str, help=\"root path of images and labels, include ./images and ./labels and classes.txt\")\nparser.add_argument('--save_path', type=str,default='instances_val2017.json', help=\"if not split the dataset, give a path to a json file\")\n\narg = parser.parse_args()\n\ndef yolo2coco(arg):\n    root_path = arg.root_dir\n    print(\"Loading data from \",root_path)\n\n    assert os.path.exists(root_path)\n    originLabelsDir = os.path.join(root_path, 'labels/test')                                        \n    originImagesDir = os.path.join(root_path, 'images/test')\n    with open(os.path.join(root_path, 'classes.txt')) as f:\n        classes = list(map(lambda x:x.strip(), f.readlines()))\n    # images dir name\n    indexes = os.listdir(originImagesDir)\n\n    dataset = {'categories': [], 'annotations': [], 'images': []}\n    for i, cls in enumerate(classes, 0):\n        dataset['categories'].append({'id': i, 'name': cls, 'supercategory': 'mark'})\n    \n    # 标注的id\n    ann_id_cnt = 0\n    for k, index in enumerate(tqdm(indexes)):\n        # 支持 png jpg 格式的图片。\n        txtFile = index.replace('images','txt').replace('.jpg','.txt').replace('.png','.txt')\n        # 读取图像的宽和高\n        im = cv2.imread(os.path.join(originImagesDir, index))\n        height, width, _ = im.shape\n        # 添加图像的信息\n        if not os.path.exists(os.path.join(originLabelsDir, txtFile)):\n            # 如没标签，跳过，只保留图片信息。\n            continue\n        dataset['images'].append({'file_name': index,\n                            'id': int(index[:-4]) if index[:-4].isnumeric() else index[:-4],\n                            'width': width,\n                            'height': height})\n        with open(os.path.join(originLabelsDir, 
txtFile), 'r') as fr:\n            labelList = fr.readlines()\n            for label in labelList:\n                label = label.strip().split()\n                x = float(label[1])\n                y = float(label[2])\n                w = float(label[3])\n                h = float(label[4])\n\n                # convert x,y,w,h to x1,y1,x2,y2\n                H, W, _ = im.shape\n                x1 = (x - w / 2) * W\n                y1 = (y - h / 2) * H\n                x2 = (x + w / 2) * W\n                y2 = (y + h / 2) * H\n                # Class ids start at 0; COCO2017's own category ids are non-contiguous, which is ignored here.\n                cls_id = int(label[0])   \n                width = max(0, x2 - x1)\n                height = max(0, y2 - y1)\n                dataset['annotations'].append({\n                    'area': width * height,\n                    'bbox': [x1, y1, width, height],\n                    'category_id': cls_id,\n                    'id': ann_id_cnt,\n                    'image_id': int(index[:-4]) if index[:-4].isnumeric() else index[:-4],\n                    'iscrowd': 0,\n                    # mask: the rectangle's four vertices, clockwise from the top-left corner\n                    'segmentation': [[x1, y1, x2, y1, x2, y2, x1, y2]]\n                })\n                ann_id_cnt += 1\n\n    # Save the result\n    with open(arg.save_path, 'w') as f:\n        json.dump(dataset, f)\n        print('Save annotation to {}'.format(arg.save_path))\n\nif __name__ == \"__main__\":\n    yolo2coco(arg)"
  },
  {
    "path": "objectdetection-tricks/tricks_5.py",
    "content": "import cv2\nimport numpy as np\nimport matplotlib.pylab as plt\nfrom segment_anything import SamPredictor, sam_model_registry\n\ndef show_mask(mask, ax, random_color=False):\n    if random_color:\n        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)\n    else:\n        color = np.array([30/255, 144/255, 255/255, 0.6])\n    h, w = mask.shape[-2:]\n    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)\n    ax.imshow(mask_image)\n    \ndef show_points(coords, labels, ax, marker_size=375):\n    pos_points = coords[labels==1]\n    neg_points = coords[labels==0]\n    ax.scatter(pos_points[:, 0], pos_points[:, 1], color='green', marker='*', s=marker_size, edgecolor='white', linewidth=1.25)\n    ax.scatter(neg_points[:, 0], neg_points[:, 1], color='red', marker='*', s=marker_size, edgecolor='white', linewidth=1.25)   \n    \ndef show_box(box, ax):\n    x0, y0 = box[0], box[1]\n    w, h = box[2] - box[0], box[3] - box[1]\n    ax.add_patch(plt.Rectangle((x0, y0), w, h, edgecolor='green', facecolor=(0,0,0,0), lw=2))    \n\nclass Select_RoI:\n    def __init__(self, img) -> None:\n        self.mouseWindowName = 'Select_RoI'\n        self.last_img, self.cur_img = img.copy(), img.copy()\n        \n        self.point_lefttop, self.point_rightbottom, self.center_point, self.count = [], [], [], 0\n        \n        cv2.namedWindow(self.mouseWindowName, cv2.WINDOW_NORMAL)\n        cv2.setMouseCallback(self.mouseWindowName, self.on_mouse)\n        while True:\n            cv2.imshow(self.mouseWindowName, self.cur_img)\n            key = cv2.waitKey(5)\n            if key == 13:  # 按回车键13表示完成绘制\n                break\n            elif key == 99:  # 按键盘c退回上一次的状态\n                self.clear()\n            elif key == 32:\n                self.confirm()\n        \n    def on_mouse(self, event, x, y, flags, param):\n        if event == cv2.EVENT_LBUTTONDOWN:\n            if len(self.point_lefttop) == len(self.point_rightbottom):\n        
        self.point_lefttop.append([x, y])\n                cv2.circle(self.cur_img, (x, y), 5, (0, 255, 0), -1)\n            else:\n                self.point_rightbottom.append([x, y])\n                cv2.circle(self.cur_img, (x, y), 5, (0, 255, 0), -1)\n                cv2.rectangle(self.cur_img, (tuple(self.point_lefttop[-1])), (tuple(self.point_rightbottom[-1])), (0, 0, 255), 3)\n            cv2.imshow(self.mouseWindowName, self.cur_img)\n        if event == cv2.EVENT_RBUTTONDOWN:\n            cv2.circle(self.cur_img, (x, y), 5, (255, 0, 0), -1)\n            self.center_point.append([x, y])\n    \n    def clear(self):\n        if len(self.center_point) == len(self.point_lefttop) == len(self.point_rightbottom):\n            min_len = len(self.center_point) - 1\n        else:\n            min_len = np.min([len(self.center_point), len(self.point_lefttop), len(self.point_rightbottom)])\n        \n        if len(self.center_point) > min_len:\n            self.center_point.pop(-1)\n        if len(self.point_lefttop) > min_len:\n            self.point_lefttop.pop(-1)\n        if len(self.point_rightbottom) > min_len:\n            self.point_rightbottom.pop(-1)\n        \n        if len(self.center_point) == len(self.point_lefttop) == len(self.point_rightbottom):\n            self.count = min_len\n            self.cur_img = self.last_img.copy()\n        else:\n            # Raising a bare string is a TypeError in Python 3; raise a real exception instead\n            raise ValueError(\"center_point point_lefttop point_rightbottom not equal.\")\n        print(f'point_lefttop:{self.point_lefttop}\\npoint_rightbottom:{self.point_rightbottom}\\ncenter_point:{self.center_point}\\ncount:{self.count}')\n    \n    def confirm(self):\n        self.last_img = self.cur_img.copy()\n        if len(self.center_point) == len(self.point_lefttop) == len(self.point_rightbottom):\n                self.count = len(self.center_point)\n        else:\n            raise ValueError(\"center_point point_lefttop point_rightbottom not equal.\")\n        
print(f'point_lefttop:{self.point_lefttop}\\npoint_rightbottom:{self.point_rightbottom}\\ncenter_point:{self.center_point}\\ncount:{self.count}')\n        \n    def get_result(self):\n        return np.array([np.array([*i, *j]) for i, j in zip(self.point_lefttop, self.point_rightbottom)]), np.array([np.array(i) for i in self.center_point])\n\nsam = sam_model_registry[\"vit_b\"](checkpoint=\"sam_vit_b_01ec64.pth\")\npredictor = SamPredictor(sam)\n\npath = '1.jpg'\nimage = cv2.imread(path)\nroi = Select_RoI(image.copy())\nbox, point = roi.get_result()\nlabel = np.array([0 for i in point])\npredictor.set_image(image)\nif point.shape[0] != 0:\n    masks, scores, logits = predictor.predict(box=box, point_coords=point, point_labels=label)\nelse:\n    masks, scores, logits = predictor.predict(box=box)\n\nimage = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\nfor i, (mask, score) in enumerate(zip(masks, scores)):\n    plt.figure(figsize=(10,10))\n    plt.imshow(image)\n    show_mask(mask, plt.gca())\n    if point.shape[0] != 0:\n        show_points(point, label, plt.gca())\n    plt.title(f\"Mask {i+1}, Score: {score:.3f}\", fontsize=18)\n    plt.axis('off')\n    plt.tight_layout()\n    plt.show()"
  },
  {
    "path": "objectdetection-tricks/tricks_6.py",
    "content": "import pkg_resources as pkg\ndef check_version(current='0.0.0', minimum='0.0.0', name='version ', pinned=False, hard=False, verbose=False):\n    # Check version vs. required version\n    current, minimum = (pkg.parse_version(x) for x in (current, minimum))\n    result = (current == minimum) if pinned else (current >= minimum)  # bool\n    return result\n\n\ndef set_seeds(seed=0, deterministic=False):\n    # Initialize random number generator (RNG) seeds https://pytorch.org/docs/stable/notes/randomness.html\n    random.seed(seed)\n    np.random.seed(seed)\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed(seed)\n    torch.cuda.manual_seed_all(seed)  # for Multi-GPU, exception safe\n    # torch.backends.cudnn.benchmark = True  # AutoBatch problem https://github.com/ultralytics/yolov5/issues/9287\n    if deterministic and check_version(torch.__version__, '1.12.0'):  # https://github.com/ultralytics/yolov5/pull/8213\n        torch.use_deterministic_algorithms(True)\n        torch.backends.cudnn.deterministic = True\n        os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'\n        os.environ['PYTHONHASHSEED'] = str(seed)"
  },
  {
    "path": "objectdetection-tricks/tricks_7.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nimport argparse\nimport logging\nimport math\nimport os\nimport random\nimport time\nimport sys\nfrom copy import deepcopy\nfrom pathlib import Path\nfrom threading import Thread\n\nimport numpy as np\nimport torch.distributed as dist\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport torch.optim.lr_scheduler as lr_scheduler\nimport torch.utils.data\nimport yaml\nfrom torch.cuda import amp\nfrom torch.nn.parallel import DistributedDataParallel as DDP\nfrom tqdm import tqdm\n\nfrom utils.torch_utils import select_device\nfrom models.common import DetectMultiBackend\n\ndef get_weight_size(path):\n    stats = os.stat(path)\n    return f'{stats.st_size / 1024 / 1024:.1f}'\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--weights', type=str, default='', help='trained weights path')\n    parser.add_argument('--batch', type=int, default=1, help='total batch size for all GPUs')\n    parser.add_argument('--imgs', nargs='+', type=int, default=[640, 640], help='[height, width] image sizes')\n    parser.add_argument('--device', default='', help='cuda device, i.e. 
0 or 0,1,2,3 or cpu')\n    parser.add_argument('--warmup', default=200, type=int, help='warmup time')\n    parser.add_argument('--testtime', default=1000, type=int, help='test time')\n    parser.add_argument('--half', action='store_true', default=False, help='fp16 mode.')\n    opt = parser.parse_args()\n    \n    device = select_device(opt.device, batch_size=opt.batch)\n    \n    # Model\n    weights = opt.weights\n    pretrained = weights.endswith('.pt')\n    if pretrained:\n        model = DetectMultiBackend(weights, device=device)\n        print(f'Loaded {weights}')  # report\n    else:\n        assert weights.endswith('.pt'), \"compress need weights.\"\n    \n    example_inputs = torch.randn((opt.batch, 3, *opt.imgs)).to(device)\n    \n    if opt.half:\n        model = model.half()\n        example_inputs = example_inputs.half()\n    \n    print('begin warmup...')\n    for i in tqdm(range(opt.warmup), desc='warmup....'):\n        model(example_inputs)\n    \n    print('begin test latency...')\n    time_arr = []\n    \n    for i in tqdm(range(opt.testtime), desc='test latency....'):\n        if device.type == 'cuda':\n            torch.cuda.synchronize()\n        start_time = time.time()\n        \n        model(example_inputs)\n        \n        if device.type == 'cuda':\n            torch.cuda.synchronize()\n        end_time = time.time()\n        time_arr.append(end_time - start_time)\n    \n    std_time = np.std(time_arr)\n    infer_time_per_image = np.sum(time_arr) / (opt.testtime * opt.batch)\n    \n    print(f'model weights:{opt.weights} size:{get_weight_size(opt.weights)}M (bs:{opt.batch})Latency:{infer_time_per_image:.5f}s +- {std_time:.5f}s fps:{1 / infer_time_per_image:.1f}')"
  },
  {
    "path": "objectdetection-tricks/tricks_8.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nimport argparse\nimport logging\nimport math\nimport os\nimport random\nimport time\nimport sys\nfrom copy import deepcopy\nfrom pathlib import Path\nfrom threading import Thread\n\nimport numpy as np\nimport torch.distributed as dist\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nimport torch.optim.lr_scheduler as lr_scheduler\nimport torch.utils.data\nimport yaml\nfrom torch.cuda import amp\nfrom torch.nn.parallel import DistributedDataParallel as DDP\nfrom torch.utils.tensorboard import SummaryWriter\nfrom tqdm import tqdm\n\nfrom models.experimental import attempt_load\nfrom models.yolo import Model\nfrom utils.torch_utils import select_device\n\ndef get_weight_size(path):\n    stats = os.stat(path)\n    return f'{stats.st_size / 1024 / 1024:.1f}'\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--weights', type=str, default='', help='trained weights path')\n    parser.add_argument('--batch', type=int, default=1, help='total batch size for all GPUs')\n    parser.add_argument('--imgs', nargs='+', type=int, default=[640, 640], help='[height, width] image sizes')\n    parser.add_argument('--device', default='', help='cuda device, i.e. 
0 or 0,1,2,3 or cpu')\n    parser.add_argument('--warmup', default=200, type=int, help='number of warmup iterations')\n    parser.add_argument('--testtime', default=1000, type=int, help='number of timed iterations')\n    parser.add_argument('--half', action='store_true', default=False, help='fp16 mode.')\n    opt = parser.parse_args()\n    \n    device = select_device(opt.device, batch_size=opt.batch)\n    \n    # Model\n    weights = opt.weights\n    pretrained = weights.endswith('.pt')\n    if pretrained:\n        model = torch.load(weights, map_location=device)\n        # prefer the EMA weights when the checkpoint stores them\n        if model['ema']:\n            model = model['ema'].float()\n        else:\n            model = model['model'].float()\n        model.fuse()\n        model.info(img_size=opt.imgs[0])\n        print(f'Loaded {weights}')  # report\n    else:\n        assert weights.endswith('.pt'), \"--weights must point to a .pt checkpoint.\"\n    \n    example_inputs = torch.randn((opt.batch, 3, *opt.imgs)).to(device)\n    \n    if opt.half:\n        model = model.half()\n        example_inputs = example_inputs.half()\n    \n    print('begin warmup...')\n    for i in tqdm(range(opt.warmup), desc='warmup....'):\n        model(example_inputs)\n    \n    print('begin test latency...')\n    time_arr = []\n    \n    for i in tqdm(range(opt.testtime), desc='test latency....'):\n        if device.type == 'cuda':\n            torch.cuda.synchronize()\n        start_time = time.time()\n        \n        model(example_inputs)\n        \n        if device.type == 'cuda':\n            torch.cuda.synchronize()\n        end_time = time.time()\n        time_arr.append(end_time - start_time)\n    \n    mean_time, std_time = np.mean(time_arr), np.std(time_arr)\n    \n    print(f'model weights:{opt.weights} size:{get_weight_size(opt.weights)}M Latency:{mean_time:.5f}s +- {std_time:.5f}s fps:{1 / mean_time:.1f}')"
  },
  {
    "path": "objectdetection-tricks/tricks_9.py",
    "content": "import torch, time, math, thop, tqdm, torchvision\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom prettytable import PrettyTable\nimport numpy as np\n\ndef time_synchronized():\n    # pytorch-accurate time\n    if torch.cuda.is_available():\n        torch.cuda.synchronize()\n    return time.time()\n\ndef fuse_conv_and_bn(conv, bn):\n    \"\"\"Fuse Conv2d() and BatchNorm2d() layers https://tehnokv.com/posts/fusing-batchnorm-and-conv/.\"\"\"\n    fusedconv = (\n        nn.Conv2d(\n            conv.in_channels,\n            conv.out_channels,\n            kernel_size=conv.kernel_size,\n            stride=conv.stride,\n            padding=conv.padding,\n            dilation=conv.dilation,\n            groups=conv.groups,\n            bias=True,\n        )\n        .requires_grad_(False)\n        .to(conv.weight.device)\n    )\n\n    # Prepare filters\n    w_conv = conv.weight.clone().view(conv.out_channels, -1)\n    w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))\n    fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape))\n\n    # Prepare spatial bias\n    b_conv = torch.zeros(conv.weight.shape[0], device=conv.weight.device) if conv.bias is None else conv.bias\n    b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps))\n    fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)\n\n    return fusedconv\n\ndef autopad(k, p=None, d=1):  # kernel, padding, dilation\n    \"\"\"Pad to 'same' shape outputs.\"\"\"\n    if d > 1:\n        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size\n    if p is None:\n        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad\n    return p\n\nclass Conv(nn.Module):\n    \"\"\"Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation).\"\"\"\n\n    default_act = nn.SiLU()  # default activation\n\n    def 
__init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):\n        \"\"\"Initialize Conv layer with given arguments including activation.\"\"\"\n        super().__init__()\n        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)\n        self.bn = nn.BatchNorm2d(c2)\n        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()\n\n    def forward(self, x):\n        \"\"\"Apply convolution, batch normalization and activation to input tensor.\"\"\"\n        return self.act(self.bn(self.conv(x)))\n\n    def forward_fuse(self, x):\n        \"\"\"Apply convolution and activation only; used once BatchNorm has been fused into the conv.\"\"\"\n        return self.act(self.conv(x))\n\nclass Bottleneck(nn.Module):\n    \"\"\"Standard bottleneck.\"\"\"\n\n    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):\n        \"\"\"Initializes a bottleneck module with given input/output channels, shortcut option, group, kernels, and\n        expansion.\n        \"\"\"\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, k[0], 1)\n        self.cv2 = Conv(c_, c2, k[1], 1, g=g)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        \"\"\"Apply the two convolutions, adding the residual shortcut when it is enabled.\"\"\"\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\n################################# YOLOV7-ELAN #################################\n\nclass ELAN(nn.Module):\n    def __init__(self, inc, ouc, hidc, act=True):\n        super(ELAN, self).__init__()\n        \n        self.conv1 = Conv(inc, hidc, k=1, act=act)\n        self.conv2 = Conv(inc, hidc, k=1, act=act)\n        self.conv3 = Conv(hidc, hidc, k=3, act=act)\n        self.conv4 = Conv(hidc, hidc, k=3, act=act)\n        self.conv5 = Conv(hidc * 4, ouc, k=1, act=act)\n        \n    def forward(self, x):\n        x1, x2 = self.conv1(x), self.conv2(x)\n        x3 = self.conv3(x2)\n      
  x4 = self.conv4(x3)\n        x_concat = torch.concat([x1, x2, x3, x4], dim=1)\n        x_final = self.conv5(x_concat)\n        return x_final\n\n    def __str__(self):\n        return 'ELAN'\n    \n################################# YOLOV8-C2f #################################\n\nclass C2f(nn.Module):\n    \"\"\"Faster Implementation of CSP Bottleneck with 2 convolutions.\"\"\"\n\n    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):\n        \"\"\"Initialize CSP bottleneck layer with two convolutions with arguments ch_in, ch_out, number, shortcut, groups,\n        expansion.\n        \"\"\"\n        super().__init__()\n        self.c = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, 2 * self.c, 1, 1)\n        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)\n        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))\n\n    def forward(self, x):\n        \"\"\"Forward pass through C2f layer.\"\"\"\n        y = list(self.cv1(x).chunk(2, 1))\n        y.extend(m(y[-1]) for m in self.m)\n        return self.cv2(torch.cat(y, 1))\n\n    def forward_split(self, x):\n        \"\"\"Forward pass using split() instead of chunk().\"\"\"\n        y = list(self.cv1(x).split((self.c, self.c), 1))\n        y.extend(m(y[-1]) for m in self.m)\n        return self.cv2(torch.cat(y, 1))\n    \n    def __str__(self):\n        return 'C2f'\n\n################################# YOLOV5-C3 #################################\n\nclass C3(nn.Module):\n    \"\"\"CSP Bottleneck with 3 convolutions.\"\"\"\n\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        \"\"\"Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values.\"\"\"\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = Conv(c1, c_, 1, 1)\n        self.cv3 = Conv(2 * c_, c2, 1)  # optional 
act=FReLU(c2)\n        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))\n\n    def forward(self, x):\n        \"\"\"Forward pass through the CSP bottleneck with 3 convolutions.\"\"\"\n        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))\n    \n    def __str__(self):\n        return 'C3'\n\n################################# YOLOV9-RepNCSPELAN4 #################################\n\nclass RepConvN(nn.Module):\n    \"\"\"RepConv is a basic rep-style block, including training and deploy status\n    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py\n    \"\"\"\n    default_act = nn.SiLU()  # default activation\n\n    def __init__(self, c1, c2, k=3, s=1, p=1, g=1, d=1, act=True, bn=False, deploy=False):\n        super().__init__()\n        assert k == 3 and p == 1\n        self.g = g\n        self.c1 = c1\n        self.c2 = c2\n        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()\n\n        self.bn = None\n        self.conv1 = Conv(c1, c2, k, s, p=p, g=g, act=False)\n        self.conv2 = Conv(c1, c2, 1, s, p=(p - k // 2), g=g, act=False)\n\n    def forward_fuse(self, x):\n        \"\"\"Deploy-mode forward: single fused conv (created by fuse_convs) plus activation.\"\"\"\n        return self.act(self.conv(x))\n\n    def forward(self, x):\n        \"\"\"Training-mode forward: sum of the 3x3 and 1x1 branches (plus optional identity BN), then activation.\"\"\"\n        id_out = 0 if self.bn is None else self.bn(x)\n        return self.act(self.conv1(x) + self.conv2(x) + id_out)\n\n    def get_equivalent_kernel_bias(self):\n        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1)\n        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2)\n        kernelid, biasid = self._fuse_bn_tensor(self.bn)\n        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid\n\n    def _avg_to_3x3_tensor(self, avgp):\n        channels = self.c1\n        groups = self.g\n        kernel_size = avgp.kernel_size\n        input_dim = 
channels // groups\n        k = torch.zeros((channels, input_dim, kernel_size, kernel_size))\n        k[np.arange(channels), np.tile(np.arange(input_dim), groups), :, :] = 1.0 / kernel_size ** 2\n        return k\n\n    def _pad_1x1_to_3x3_tensor(self, kernel1x1):\n        if kernel1x1 is None:\n            return 0\n        else:\n            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])\n\n    def _fuse_bn_tensor(self, branch):\n        if branch is None:\n            return 0, 0\n        if isinstance(branch, Conv):\n            kernel = branch.conv.weight\n            running_mean = branch.bn.running_mean\n            running_var = branch.bn.running_var\n            gamma = branch.bn.weight\n            beta = branch.bn.bias\n            eps = branch.bn.eps\n        elif isinstance(branch, nn.BatchNorm2d):\n            if not hasattr(self, 'id_tensor'):\n                input_dim = self.c1 // self.g\n                kernel_value = np.zeros((self.c1, input_dim, 3, 3), dtype=np.float32)\n                for i in range(self.c1):\n                    kernel_value[i, i % input_dim, 1, 1] = 1\n                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)\n            kernel = self.id_tensor\n            running_mean = branch.running_mean\n            running_var = branch.running_var\n            gamma = branch.weight\n            beta = branch.bias\n            eps = branch.eps\n        std = (running_var + eps).sqrt()\n        t = (gamma / std).reshape(-1, 1, 1, 1)\n        return kernel * t, beta - running_mean * gamma / std\n\n    def fuse_convs(self):\n        if hasattr(self, 'conv'):\n            return\n        kernel, bias = self.get_equivalent_kernel_bias()\n        self.conv = nn.Conv2d(in_channels=self.conv1.conv.in_channels,\n                              out_channels=self.conv1.conv.out_channels,\n                              kernel_size=self.conv1.conv.kernel_size,\n                              
stride=self.conv1.conv.stride,\n                              padding=self.conv1.conv.padding,\n                              dilation=self.conv1.conv.dilation,\n                              groups=self.conv1.conv.groups,\n                              bias=True).requires_grad_(False)\n        self.conv.weight.data = kernel\n        self.conv.bias.data = bias\n        for para in self.parameters():\n            para.detach_()\n        self.__delattr__('conv1')\n        self.__delattr__('conv2')\n        if hasattr(self, 'nm'):\n            self.__delattr__('nm')\n        if hasattr(self, 'bn'):\n            self.__delattr__('bn')\n        if hasattr(self, 'id_tensor'):\n            self.__delattr__('id_tensor')\n\nclass RepNBottleneck(nn.Module):\n    # Standard bottleneck\n    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5, act=True):  # ch_in, ch_out, shortcut, kernels, groups, expand\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = RepConvN(c1, c_, k[0], 1, act=act)\n        self.cv2 = Conv(c_, c2, k[1], 1, g=g, act=act)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\nclass RepNCSP(nn.Module):\n    # CSP Bottleneck with 3 convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, act=True):  # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1, act=act)\n        self.cv2 = Conv(c1, c_, 1, 1, act=act)\n        self.cv3 = Conv(2 * c_, c2, 1, act=act)  # optional act=FReLU(c2)\n        self.m = nn.Sequential(*(RepNBottleneck(c_, c_, shortcut, g, e=1.0, act=act) for _ in range(n)))\n\n    def forward(self, x):\n        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))\n\nclass RepNCSPELAN4(nn.Module):\n    # csp-elan\n    def __init__(self, c1, c2, c3, c4, 
c5=1, act=True):  # ch_in, ch_out, cv1 width (split in two), branch width, RepNCSP repeats\n        super().__init__()\n        self.c = c3//2\n        self.cv1 = Conv(c1, c3, 1, 1, act=act)\n        self.cv2 = nn.Sequential(RepNCSP(c3//2, c4, c5, act=act), Conv(c4, c4, 3, 1, act=act))\n        self.cv3 = nn.Sequential(RepNCSP(c4, c4, c5, act=act), Conv(c4, c4, 3, 1, act=act))\n        self.cv4 = Conv(c3+(2*c4), c2, 1, 1, act=act)\n\n    def forward(self, x):\n        y = list(self.cv1(x).chunk(2, 1))\n        y.extend((m(y[-1])) for m in [self.cv2, self.cv3])\n        return self.cv4(torch.cat(y, 1))\n\n    def forward_split(self, x):\n        y = list(self.cv1(x).split((self.c, self.c), 1))\n        y.extend(m(y[-1]) for m in [self.cv2, self.cv3])\n        return self.cv4(torch.cat(y, 1))\n    \n    def __str__(self):\n        return 'RepNCSPELAN'\n\nclass RepNCSPELAN4_Att(nn.Module):\n    # csp-elan\n    def __init__(self, c1, c2, c3, c4, c5=1, act=True):  # ch_in, ch_out, cv1 width (split in two), branch width, RepNCSP repeats\n        super().__init__()\n        self.c = c3//2\n        self.cv1 = Conv(c1, c3, 1, 1, act=act)\n        self.cv2 = nn.Sequential(RepNCSP(c3//2, c4, c5, act=act), Conv(c4, c4, 3, 1, act=act))\n        self.cv3 = nn.Sequential(RepNCSP(c4, c4, c5, act=act), Conv(c4, c4, 3, 1, act=act))\n        self.cv4 = Conv(c3+(2*c4), c2, 1, 1, act=act)\n\n    def forward(self, x):\n        y = list(self.cv1(x).chunk(2, 1))\n        y.extend((m(y[-1])) for m in [self.cv2, self.cv3])\n        return self.cv4(torch.cat(y, 1))\n\n    def forward_split(self, x):\n        y = list(self.cv1(x).split((self.c, self.c), 1))\n        y.extend(m(y[-1]) for m in [self.cv2, self.cv3])\n        return self.cv4(torch.cat(y, 1))\n    \n    def __str__(self):\n        return 'RepNCSPELAN_Att'\n\nif __name__ == '__main__':\n    warmup, test_times = 1000, 2000\n    bs, h, w = 1, 128, 128\n    channel = 256\n    cuda, half = True, False\n    module_list = [\n                   C3(channel, channel),\n          
         ELAN(channel, channel, channel // 2),\n                   C2f(channel, channel),\n                   RepNCSPELAN4(channel, channel, channel // 2, channel // 4, 1),\n                   ]\n    \n    device = torch.device(\"cuda:0\") if cuda else torch.device(\"cpu\")\n    inputs = torch.randn((bs, channel, h, w)).to(device)\n    if half:\n        inputs = inputs.half()\n    table = PrettyTable()\n    table.title = 'Yolo Block Family Speed'\n    table.field_names = ['Name', 'All_Time', 'Mean_Time', 'FPS', \"FLOPs\", \"Params\"]\n    for module in module_list:\n        for m in module.modules():\n            if isinstance(m, (Conv,)) and hasattr(m, \"bn\"):\n                    m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv\n                    delattr(m, \"bn\")  # remove batchnorm\n                    m.forward = m.forward_fuse  # update forward\n            if isinstance(m, RepConvN):\n                    m.fuse_convs()\n                    m.forward = m.forward_fuse  # update forward\n        \n        module = module.to(device)\n        if half:\n            module = module.half()\n        for i in tqdm.tqdm(range(warmup), desc=f'{str(module)} Warmup....'):\n            module(inputs)\n        all_time = 0\n        for i in tqdm.tqdm(range(test_times), desc=f'{str(module)} Calculate Speed....'):\n            begin = time_synchronized()\n            module(inputs)\n            all_time += time_synchronized() - begin\n        FLOPs, Params = thop.profile(module, inputs=(inputs, ), verbose=False)\n        FLOPs, Params = thop.clever_format([FLOPs, Params], \"%.3f\")\n        # print(f'{str(module)} all_time:{all_time:.5f} mean_time:{all_time / test_times:.5f} fps:{1 / (all_time / test_times)} FLOPs:{FLOPs} Params:{Params}')\n        table.add_row([str(module), f'{all_time:.5f}', f'{all_time / test_times:.5f}', f'{1 / (all_time / test_times)}', f'{FLOPs}', f'{Params}'])\n    print(table)"
  },
  {
    "path": "readme.md",
    "content": "# Object Detection Script\n这个项目主要是提供一些关于目标检测的代码和改进思路参考.\n\n### [BiliBili视频指南](https://github.com/z1069614715/objectdetection_script/blob/master/bilibili-guide.md)\n\n# Project <需要入手请加企鹅1615905974/1069614715,如添加不上可bilibili私聊直发企鹅号码,最好好友请求也设置不需要验证就可以加上>\n1. 基于Ultralytics的yolov8、yolov10改进项目.(69.9¥)\n    \n    [目前已有的改进方案和更新详细公告](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/yolov8v10-project.md)  \n    项目简单介绍，详情请看项目详解.\n    1. 提供修改好的代码和每个改进点的配置文件,相当于积木都给大家准备好,大家只需要做实验和搭积木(修改yaml配置文件组合创新点)即可,装好环境即可使用.\n    2. 后续的改进方案都会基于这个项目更新进行发布，在群公告进行更新百度云链接.\n    3. 购买了本项目的都会赠送yolov5-PAGCP通道剪枝算法代码和相关实验参数命令.\n    4. 购买后进YOLOV8V10交流群(代码视频均在群公告),群里可交流代码和论文相关,目前1群2群已满,现在进的是3群,气氛活跃.\n    5. 项目因为(价格问题)不附带一对一私人答疑服务,群里附带答疑服务,平时我有时间都会回复群里部分问题.\n    6. 里面配备使用说明(部分改进点使用复杂度高、二次创新、原创的模块都会有对应的视频进行说明)\n\n2. 基于Ultralytics的yolo11、yolo12改进项目.(69.9¥)\n    \n    [目前已有的改进方案和更新详细公告](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/yolov11-project.md)  \n    项目简单介绍，详情请看项目详解.\n    1. 提供修改好的代码和每个改进点的配置文件,相当于积木都给大家准备好,大家只需要做实验和搭积木(修改yaml配置文件组合创新点)即可,装好环境即可使用.\n    2. 后续的改进方案都会基于这个项目更新进行发布，在群公告进行更新百度云链接.\n    3. 购买了本项目的都会赠送yolov5-PAGCP通道剪枝算法代码和相关实验参数命令.\n    4. 购买后进YOLOV11交流群(代码视频均在群公告),群里可交流代码和论文相关,气氛活跃.\n    5. 项目因为(价格问题)不附带一对一私人答疑服务,群里附带答疑服务,平时我有时间都会回复群里部分问题.\n    6. 里面配备使用说明(部分改进点使用复杂度高、二次创新、原创的模块都会有对应的视频进行说明)。\n    7. 包含yolo12-目标检测、实例分割、关键点检测、旋转目标检测、分类配置文件，可以通过仅修改配置文件的方式改进yolo12。\n\n3. 基于YOLOV5,YOLOV7的(剪枝+知识蒸馏)项目.(129.9¥)[项目详解](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/yolov5v7-light.md)\n\n    1. 模型轻量化,部署必备之一!\n    2. 项目里面配套几个剪枝和蒸馏的示例,并且都配有视频讲解,供大家理解如何进行剪枝和蒸馏.\n    3. 购买后进YOLOV5V7轻量化交流群(代码视频均在群公告),轻量化问题都可在群交流,因为剪枝蒸馏问题比较困难,所以剪枝蒸馏问题可以群里提问,我都会群里回复相关问题.\n\n4. 基于Ultralytics的RT-DETR(CVPR2024)改进项目.(89.9¥)\n\n    [目前已有的改进方案和更新详细公告](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/rtdetr-project.md)  \n    项目简单介绍，详情请看项目详解.\n    1. 
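Modified code and a configuration file for each improvement are provided, so the building blocks are ready-made: you only need to run experiments and assemble the blocks (edit the yaml configuration files to combine innovations). It works as soon as the environment is installed.\n    2. All later improvements will be released as updates to this project; the updated Baidu Cloud link is posted in the group announcement.\n    3. Every purchase of the RT-DETR project includes the yolov5-PAGCP channel pruning code and the related experiment parameters and commands.\n    4. After purchase you join the RT-DETR discussion group (code and videos are in the group announcement), where code and papers can be discussed.\n    5. Because of the price, the project does not include one-on-one private Q&A. Q&A is provided in the group, and I answer some of the group's questions whenever I have time.\n    6. The RT-DETR project covers improvement schemes for several baseline models (RT-DETR-R18,RT-DETR-R50,RT-DETR-L,Yolov8-Detr,Yolov5-Detr); for details see [Current improvement schemes and detailed update announcements](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/rtdetr-project.md).\n    7. Usage instructions are included (improvements that are complex to use, secondary innovations, and original modules each have an accompanying video explanation)\n\n5. Pruning and distillation project for YOLOV8, V10, V11, and V12.  \n   \n    Note:\n    1. This project simply provides a few files, together with a tutorial; copy them into Project 1/2 and they run. In principle other versions should also work, but I developed against Project 1/2 (ultralytics versions v8.1.9, v8.2.50, v8.3.1). Nearby versions should work too, but I cannot verify each one, so please judge for yourself!\n    2. A clean official ultralytics (versions 8.1.9, 8.2.50, 8.3.1, 8.3.78) and its matching pruning and distillation code are also provided, for those who have not purchased Project 1/2.\n\n    Pruning: [Project details](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/yolov8-compress.md)(89.9¥)\n    1. A staple for model lightweighting, deployment, and building up thesis workload!\n    2. Pruning examples are included (the examples prune the improved code from Project 1/2; without Project 1/2 that code is not included, but this does not affect your understanding of the pruning procedure), each with a video explanation to help you understand how to prune.\n    3. After purchase you join the YOLOV8V10V11V12 pruning discussion group (code and videos are in the group announcement). Pruning is fairly difficult, so ask pruning questions in the group and I will answer them there.\n    4. Supports pruning for yolov8 object detection, instance segmentation, pose estimation, and oriented object detection; for yolov10 object detection; and for yolo11/12 (object detection, instance segmentation, pose estimation, oriented object detection).\n\n    Distillation: [Project details](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/yolov8-distill.md)(89.9¥)\n    1. A staple for model lightweighting, deployment, and building up thesis workload!\n    2. Distillation examples are included (some examples distill the improved code from Project 1/2; without Project 1/2 that code is not included, but this does not affect your understanding of the distillation procedure), each with a video explanation to help you understand how to distill.\n    3. After purchase you join the YOLOV8V10V11V12 distillation discussion group (code and videos are in the group announcement). Distillation is fairly difficult, so ask distillation questions in the group and I will answer them there.\n    4. Supports distillation for yolov8 object detection, instance segmentation, pose estimation, and oriented object detection; for yolov10 object detection; and for yolo11/12 (object detection, instance segmentation, pose estimation, oriented object detection).\n    5. Instance segmentation, pose estimation, and oriented object detection do not yet support the BCKD distillation method.\n\n6. Pruning and distillation project for RT-DETR (CVPR2024) based on Ultralytics.  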
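\n   \n    Note: the Ultralytics-based RT-DETR pruning and distillation project is developed on top of Project 4, so Project 4 is required in order to use it.\n\n    Pruning: [Project details](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/rtdetr-compress.md)(89.9¥)\n    1. A staple for model lightweighting, deployment, and building up thesis workload!\n    2. Pruning examples are included (with pruning tutorials for some of the improved models from Project 4), each with a video explanation to help you understand how to prune.\n    3. After purchase you join the RTDETR pruning discussion group (code and videos are in the group announcement). Pruning is fairly difficult, so ask pruning questions in the group and I will answer them there.\n    4. In my experiments so far, rtdetr is very hard to train sparsely, so this project currently contains no pruning methods that rely on sparse training. If you specifically need sparse-training-based pruning, think carefully before buying. The project currently includes 6 pruning methods that do not require sparse training.\n\n    Distillation: [Project details](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/rtdetr-distill.md)(69.9¥)  \n    1. A staple for model lightweighting, deployment, and building up thesis workload!\n    2. Distillation examples are included, each with a video explanation to help you understand how to distill.\n    3. After purchase you join the RTDETR distillation discussion group (code and videos are in the group announcement). Distillation is fairly difficult, so ask distillation questions in the group and I will answer them there.\n    4. Knowledge distillation is hard to modify end to end, which means few people use it; that rarity adds to the novelty of your paper!\n\n7. Improvement project based on CVPR2025-DEIM. (288¥)\n    \n    For a detailed introduction see [here](https://github.com/z1069614715/objectdetection_script/blob/master/cvpr2025-deim-project.md)\n    1. More analysis charts than the official code, covering nearly everything papers commonly use (YOLO metrics, FPS, model size, per-class tsml and other COCO metrics, heatmaps, feature maps, visualization of missed and false detections, and so on).\n    2. It is well known that the detection head of DETR-series models is very hard to modify and takes solid coding skills and background knowledge. This project nevertheless includes improvements to the DETR detection head, with a video explaining the overall implementation.\n    3. The project includes videos on model-innovation topics: I collect relatively new modules that leave room for innovation and explain them on video. Not to be missed if you want to learn module innovation.\n    4. Many bugs present in the official code have been fixed. How can you do research without a stable code framework?\n    5. Currently includes student-teacher knowledge distillation, model export (onnx, tensorrt), ByteTrack object tracking, and other content for building up workload, covering both short papers and theses.\n    6. Supports instance segmentation, giving those working on instance segmentation another very nice option.\n    7. Supports the DINOV3 backbone; even with little data, performance holds up thanks to DINOV3.\n    8. For more, click the link above.\n\n8. Multimodal object detection project based on YOLO|RTDETR. (Original price 288¥; 50¥ off, i.e. 238¥, if you have already purchased the yolo8101112 or rtdetr project)\n\n    For a detailed introduction see [here](https://github.com/z1069614715/objectdetection_script/blob/master/mutilmodel-project.md)\n\n9. Ultralytics-YOLO improvement project. (99¥)\n\n    For a detailed introduction see [here](https://github.com/z1069614715/objectdetection_script/blob/master/Ultralytics-YOLO-project.md)\n    1. This project integrates the full family of base models: YOLOv8, v10, v11, v12, and even the cutting-edge YOLO26. Whether you run side-by-side comparison experiments or improve a single version, there is no need to hunt for resources elsewhere; one project covers all your experimental needs!\n    2. The core code is highly modular and decoupled, optimized for beginners. You do not need to wrestle with complex low-level code: simply edit the YAML configuration files, like stacking building blocks, to combine the improvement modules freely.\n    3. 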
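With the YOLO track growing ever more competitive, simply “stitching” modules together no longer meets graduation requirements. This project provides ready-made innovations and also comes with an exclusive “secondary innovation” course that teaches you how to fish. We walk you step by step through the underlying logic of module design, helping you move from “imitator” to “creator” and design innovation modules of your own.\n    4. For those who can code but are held back by the complexity of the Ultralytics architecture, this project adopts the mature “everything can be fused” architectural idea from the DFine and DEIM projects. You need not worry about details such as module registration: follow the standard interface specification I provide, and your custom modules plug seamlessly into the YAML configuration and combine flexibly with all kinds of CSP variants.\n    5. Experiments are running, but you do not know how to write up the innovations? This project will regularly dissect high-scoring papers and teach writing techniques, showing you how to turn experimental results into a rigorous, high-quality academic paper with clear highlights.\n    6. Graduation project missing a polished demo interface? Don't worry: the project will include a general-purpose visualization interface based on PyQt or HTML that works out of the box, supplying the last piece of the thesis puzzle and helping you face your defense with confidence!\n    7. Purchase includes access to a dedicated technical discussion group, with Q&A support widely regarded as efficient and like-minded peers to exchange help with. No more working in isolation: let us help you avoid the pitfalls and get through efficiently!  \n    \n    **Note: some features may not yet be implemented in the early stage of the project and will be filled in as development continues.**\n\n10. Full-process paper guidance project based on YOLO and RT-DETR. (Original price 238¥; 50¥ off, i.e. 188¥, if you have already purchased the yolo8101112, rtdetr, or deim project)[Project details](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/paper.md)  \n    We currently offer a large number of code projects, close to the most complete, best-priced, and most cost-effective collection online. Even so, some students finish their experiments and still have no idea how to write the paper, or would rather avoid detours. Hence this full-process paper guidance project based on YOLO and RT-DETR, dedicated to helping students who are struggling badly with their papers. Combined with the improvement projects above and a little effort of your own, it should make graduation worry-free. Overview:\n\n    1. The live sessions cover every aspect of the paper framework needed for publication. Each session prioritizes the topics people most want to hear, decided by a vote on the course outline.\n    2. Live Q&A addresses everyone's questions: questions are collected online in an excel sheet before class and answered together during the session.\n    3. Replays are uploaded to Baidu Netdisk promptly. All videos are encrypted (one code per person per machine), and each part of the course outline is indexed to its replay link for easy lookup later. The Baidu Netdisk contents and the usage guide are kept up to date.\n    4. After purchase you join the paper guidance discussion group (videos are in the group announcement), where paper-related topics can be discussed.\n    5. The project does not include private Q&A. Q&A is provided in the group, and I answer some of the group's questions whenever I have time.\n    6. Feedback from group members is collected from time to time, and questions can be raised in the group at any moment; the course is refined step by step so that everyone can publish efficiently and quickly.\n    7. The project is valid for one year, counted from the day you pay and join the group. For example, if I join on May 2, 2024, access expires on May 2, 2025. One year is enough to resolve all paper-related questions.\n    8. Free trial lecture, Bilibili link 1: [80-minute live replay: <comparison experiments + ablation experiments + assessing paper workload and novelty + Q&A>](https://www.bilibili.com/video/BV1u5rCYmE4k/)\n    9. Free trial lecture, Bilibili link 2: [60-minute live replay: <moving from experiments to the paper + paper writing order + novelty assessment + open Q&A>](https://www.bilibili.com/video/BV1oJPueREfR/)\n    10. Free trial lecture, Bilibili link 3: [2-hour live session on efficient figure-making for papers: data visualization + model diagrams + experimental data analysis plots + Q&A](https://www.bilibili.com/video/BV1xEEEzZEUs)\n\n## Buying Guide\n\nNot sure which to choose? Match your goal to one of the cases below:\n\n### 1. Just need to graduate, no hard journal requirement\n- Recommended projects: **1, 2, 9 (Project 9 is recommended for the best value)**\n- Who it suits: those who want to get experiments running quickly, with “graduating safely” as the top goal.\n- Tags: `quick to start` `good value` `little coding effort` `fast training`\n\n### 2. Journal requirement, but no wish to dig deep into code\n- Recommended project: **4**\n- Who it suits: those who want experiments that stand out without spending much time on low-level code.\n- Tags: `quick to start` `DETR is publication-friendly` `little coding effort`\n\n### 3. 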
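Chasing hot topics, willing to learn the code, pursuing innovation, aiming for SCI\n- Recommended projects: **7, 8**\n- Who it suits: those willing to invest more time in method innovation, experimental analysis, and frontier exploration.\n- Tags: `frontier hot topics` `large room for innovation` `aiming for top-tier SCI`\n\n### 4. Thesis needs more workload + deployment requirements\n- Recommended projects: **5, 6**\n- Who it suits: those who want to cover the pruning, distillation, and deployment chain, filling out both thesis workload and practical deployment content.\n- Tags: `ample thesis workload` `deployment-oriented` `highly practical`\n- Note: Project 6 is built on Project 4 and must be used together with it.\n\n### 5. Experiments done, but no idea how to write the paper\n- Recommended project: **10**\n- Who it suits: experiments are complete, but you lack a method for paper structure, presenting innovations, organizing figures and tables, and the writing process.\n- Tags: `writing guidance` `Q&A-oriented` `good for finishing a paper`\n\n## If the projects above still do not meet your needs, we also offer professional custom AI algorithm development\n![Advertising Board](https://github.com/z1069614715/objectdetection_script/blob/master/Customization.png)\n\n## GPU Server Recommendations\nTo keep your research running smoothly, lower the initial learning curve, and reduce the cost of renting servers, I have partnered with several platforms to offer stable, fast, and cheap server rental. After several rounds of discussion, registering or topping up through my links gives you the following benefits:\n\n---------------------------------------- 智算云扉 ----------------------------------------\n1. Very favorable prices, close to the lowest online. 3090: 0.99/h, 4090d: from 1.18/h, 4090-24GB: from 1.78/h, 4090D-48G: 2.52/h, 4090-48GB: 3.19/h\n2. Topping up with my exclusive coupon code earns an extra 5% of compute points. For example, topping up 100 would normally give 100 compute points; with my coupon code you get 105! Order link: https://waas.aigate.cc/user/charge?channel=BLBLMGMJ&coupon=DLJGKNBEE1 or enter the coupon code manually: DLJGKNBEE1, then click verify. The coupon field is inside the top-up page.\n3. On the 智算云扉 platform I provide dedicated images for my own improvement projects. The environment is preconfigured in the image, and models that need compiling are already set up, so you can upload your dataset and code and start running immediately, a step ahead on experiments! Search for the keyword yolo in the image community / 云扉工坊 to find them.\n4. On the 智算云扉 platform I provide some commonly used datasets with formats already converted, including COCO2017,VOC2007+2012,CrowdHuman,Visdrone2019,BDD100K.\n5. Supports booting in GPU-free mode and binding a Baidu Cloud account, so netdisk contents transfer to the cloud disk in seconds, saving dataset upload time!\n6. Search for group number 798692951 on qq to join the 智算云扉 platform discussion group, where official 智算云扉 support staff answer platform-related questions!\n7. Bilibili video tutorial: https://www.bilibili.com/video/BV11DXTYiENS/\n8. Update 20260114: the dataset location has changed; see this video: https://www.bilibili.com/video/BV1TDrLBfEr7/\n\n---------------------------------------- DAModel ----------------------------------------\n1. On top of the DAModel platform's existing discounts, you get an additional discount (5% off on-demand, 3% off daily plans, 1% off monthly plans). Suppose a 4090 rents for 2.18 per hour and the platform's own discount is 15% off; with a further 5% off under my account, the final price is 2.18*0.85*0.95=1.76! (The discount currently applies only to 4090-related servers)\n2. On the DAModel platform I provide dedicated images for my own improvement projects. The environment is preconfigured in the image, and models that need compiling are already set up, so you can upload your dataset and code and start running immediately, a step ahead on experiments! Video reference: https://www.bilibili.com/video/BV1mg2SYGEGF/\n3. On the DAModel platform I provide some commonly used datasets with formats already converted, including COCO2017,VOC2007+2012,CrowdHuman,Visdrone2019,BDD100K. 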
Video reference: https://www.bilibili.com/video/BV1UV5qzuEGf/\n4. Remember: the benefits above apply only if you register through the following link! Registration link: https://damodel.com/register?source=47EC6199\n5. Search for group number 728938131 on qq to join the DAModel platform discussion group, where official DAModel support staff answer platform-related questions!\n\n# Explanation\n- **yolo**  \n    The yolo folder contains dataset processing scripts for yolov5, yolov7, and yolov8; see [readme.md](https://github.com/z1069614715/objectdetection_script/blob/master/yolo/readme.md).  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1tM411a7it/).  \n\n- **damo-yolo**  \n    The damo-yolo folder contains dataset processing scripts for DAMO-YOLO; see [readme.md](https://github.com/z1069614715/objectdetection_script/blob/master/damo-yolo/readme.md).  \n    Currently only voc-to-coco conversion is supported.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1M24y1v7Uf/).   \n\n- **yolo-improve**  \n    The yolo-improve folder provides source code for ideas on improving yolo-series models; see [readme.md](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/readme.md).   \n\n- **yolo-gradcam**  \n    The yolo-gradcam folder provides source code for visualizing heatmaps of yolo models; see [readme.md](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-gradcam/README.md).\n\n- **cv-attention**  \n    The cv-attention folder covers some classic attention mechanisms in CV; see [readme.md](https://github.com/z1069614715/objectdetection_script/blob/master/cv-attention/readme.md).\n\n- **objectdetection-tricks**  \n    The objectdetection-tricks folder covers assorted tricks for object detection; see [readme.md](https://github.com/z1069614715/objectdetection_script/blob/master/objectdetection-tricks/readme.md).\n\n- **mmdet-course**  \n    The mmdet-course folder provides materials for the mmdet tutorials; see [readme.md](https://github.com/z1069614715/objectdetection_script/blob/master/mmdet-course/readme.md)\n\n- **data-offline-aug**  \n    The data-offline-aug folder contains offline data augmentation scripts for image tasks; see [readme.md](https://github.com/z1069614715/objectdetection_script/blob/master/data-offline-aug/readme.md)\n\n[![Forkers repo roster for @z1069614715/objectdetection_script](https://reporoster.com/forks/z1069614715/objectdetection_script)](https://github.com/z1069614715/objectdetection_script/network/members)\n[![Stargazers repo roster for 
@z1069614715/objectdetection_script](https://reporoster.com/stars/z1069614715/objectdetection_script)](https://github.com/z1069614715/objectdetection_script/stargazers)\n\n# Star History\n\n[![Star History Chart](https://api.star-history.com/svg?repos=z1069614715/objectdetection_script&type=Date)](https://star-history.com/#z1069614715/objectdetection_script&Date)\n\n<a id=\"0\"></a>\n"
  },
  {
    "path": "visdrone2019-benchmark/readme.md",
    "content": "# VisDrone2019 Testset Benchmark\n### Visdrone2019 测试集(1610张图) COCO指标 (有需要使用对比实验数据的同学可以直接用)\n### Jetson Orin Nano 4G TensorRT(8.6.2) FP16 BatchSize=1\n### RTX4090D TensorRT(10.11.0) FP16 BatchSize=1\n\n![Visdrone2019 Benchmark](https://github.com/z1069614715/objectdetection_script/blob/master/visdrone2019-benchmark/visdrone_ap_gflops_params_bubble.svg)\n\n| model | Input Shape | GFlops | Params | Ap | Ap50 | APs | APm | APl | FPS(Jetson Orin Nano 4G) | FPS(RTX4090D) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| Faster-RCNN-R50-FPN-CIOU | (768, 1344) | 208G | 41.39M | 0.194 | 0.329 | 0.095 | 0.309 | 0.429 | - | - |\n| Cascade-RCNN-R50-FPN | (768, 1344) | 236G | 69.29M | 0.197 | 0.326 | 0.099 | 0.309 | 0.406 | - | - |\n| ATSS-R50-FPN-DyHead | (768, 1344) | 110G | 38.91M | 0.204 | 0.338 | 0.100 | 0.317 | 0.485 | - | - |\n| TOOD-R50 | (768, 1344) | 199G | 32.04M | 0.204 | 0.339 | 0.102 | 0.317 | 0.403 | - | - |\n| DINO | (750, 1333) | 274G | 47.56M | 0.253 | 0.445 | 0.150 | 0.371 | 0.503 | - | - |\n| DDQ | (768, 1333) | - | - | 0.268 | 0.463 | 0.159 | 0.390 | 0.526 | - | - |\n| YOLOX-Tiny | (640, 640) | 7.578G | 5.035M | 0.148 | 0.278 | 0.076 | 0.221 | 0.278 | - | - |\n| GFL | (768, 1344) | 206G | 32.279M | 0.193 | 0.321 | 0.094 | 0.300 | 0.409 | - | - |\n| RTMDet-Tiny | (640, 640) | 8.033G | 4.876M | 0.184 | 0.312 | 0.077 | 0.288 | 0.445 | - | - |\n| RetinaNet-R50-FPN | (768, 1344) | 210G | 36.517M | 0.164 | 0.276 | 0.060 | 0.274 | 0.427 | - | - |\n| RTDETR-R18(Ultralytics版本实现) | (640, 640) | 57G | 19.885M | 0.208 | 0.363 | 0.113 | 0.305 | 0.413 | 28.3 | 889.75 |\n| D-Fine-N | (640, 640) | 7.1238G | 3.73M | 0.183 | 0.334 | 0.093 | 0.270 | 0.442 | 53.5 | 924.63 |\n| D-Fine-S | (640, 640) | 24.8595G | 10.18M | 0.227 | 0.394 | 0.128 | 0.331 | 0.468 | 29.9 | 696.18 |\n| D-Fine-M | (640, 640) | 56.3726G | 19.19M | 0.239 | 0.416 | 0.136 | 0.346 | 0.464 | 18.2 | 480.95 |\n| D-Fine-L | (640, 640) | 
90.7205G | 30.67M | 0.244 | 0.421 | 0.137 | 0.353 | 0.522 | - | - |\n| D-Fine-L-4scale(P2345) | (640, 640) | 214.587G | 33.75M | 0.270 | 0.459 | 0.165 | 0.380 | 0.521 | - | - |\n| D-Fine-Dinov3(ConvNext-Tiny)-L | (640, 640) | 117.212G | 44.41M | 0.244 | 0.424 | 0.133 | 0.361 | 0.496 | 7.7 | 411.95 |\n| D-Fine-Dinov3(ConvNext-Tiny)-L-4scale(P2345) | (640, 640) | 152.504G | 41.18M | 0.284 | 0.480 | 0.178 | 0.398 | 0.526 | - | - |\n| DEIM-D-Fine-N | (640, 640) | 7.1238G | 3.73M | 0.177 | 0.322 | 0.090 | 0.262 | 0.376 | 53.5 | 924.63 |\n| DEIM-D-Fine-S | (640, 640) | 24.8595G | 10.18M | 0.219 | 0.384 | 0.122 | 0.321 | 0.397 | 29.9 | 696.18 |\n| DEIM-D-Fine-M | (640, 640) | 56.3726G | 19.19M | 0.242 | 0.417 | 0.139 | 0.344 | 0.485 | 18.2 | 480.95 |\n| DEIMV2-S | (640, 640) | 25.3903G | 9.67M | 0.204 | 0.363 | 0.109 | 0.299 | 0.451 | 16.5 | 569.92 |\n| RTDETR-R18 (official PyTorch implementation) | (640, 640) | 60G | 20M | 0.185 | 0.333 | 0.139 | 0.275 | 0.423 | - | - |\n| RTDETRV2-R18 (official PyTorch implementation) | (640, 640) | 60G | 20M | 0.222 | 0.391 | 0.127 | 0.321 | 0.456 | - | - |\n| YOLOV5n | (640, 640) | 4.2G | 1.77M | 0.099 | 0.205 | 0.046 | 0.154 | 0.231 | - | - |\n| YOLOV5s | (640, 640) | 15.8G | 7.04M | 0.130 | 0.257 | 0.062 | 0.201 | 0.259 | - | - |\n| YOLOV5m | (640, 640) | 48.0G | 20.89M | 0.152 | 0.288 | 0.073 | 0.233 | 0.306 | - | - |\n| YOLO8n | (640, 640) | 8.1G | 3.0M | 0.144 | 0.259 | 0.059 | 0.225 | 0.339 | - | 2114.04 |\n| YOLO8n | (960, 960) | 18.5G | 3.0M | 0.192 | 0.333 | 0.099 | 0.288 | 0.377 | - | 1506.86 |\n| YOLO8s | (640, 640) | 28.5G | 11.13M | 0.173 | 0.307 | 0.078 | 0.269 | 0.372 | - | 1607.19 |\n| YOLO8s | (960, 960) | 64.5G | 11.13M | 0.224 | 0.386 | 0.123 | 0.333 | 0.441 | - | 1128.2 |\n| YOLO8m | (640, 640) | 78.7G | 25.85M | 0.190 | 0.332 | 0.090 | 0.294 | 0.417 | - | 924.37 |\n| YOLO10n | (640, 640) | 6.5G | 2.28M | 0.142 | 0.261 | 0.063 | 0.224 | 0.292 | - | 1694.1 |\n| YOLO10s | (640, 640) | 21.4G | 7.22M | 0.179 | 0.323 | 0.086 | 0.278 | 0.361 | - | 1336.88 |\n| 
YOLO10m | (640, 640) | 58.9G | 15.32M | 0.195 | 0.345 | 0.097 | 0.300 | 0.414 | - | 842.27 |\n| YOLO11n | (640, 640) | 6.3G | 2.59M | 0.142 | 0.258 | 0.058 | 0.225 | 0.316 | 94.2 | 1425.91 |\n| YOLO11s | (640, 640) | 21.3G | 9.42M | 0.176 | 0.313 | 0.080 | 0.272 | 0.364 | 56.4 | 1171.25 |\n| YOLO11m | (640, 640) | 67.7G | 20.04M | 0.203 | 0.350 | 0.098 | 0.312 | 0.413 | 28.9 | 752.8 |\n| YOLO12n | (640, 640) | 6.3G | 2.56M | 0.142 | 0.259 | 0.057 | 0.224 | 0.346 | - | 1133.07 |\n| YOLO12s | (640, 640) | 21.2G | 9.23M | 0.176 | 0.312 | 0.081 | 0.274 | 0.356 | - | 901.36 |\n| YOLO12m | (640, 640) | 67.2G | 20.11M | 0.192 | 0.336 | 0.094 | 0.298 | 0.386 | - | 648.88 |\n| [FBRT-YOLO-N](https://arxiv.org/abs/2504.20670) | (640, 640) | 6.7G | 0.8M | 0.148 | 0.265 | 0.062 | 0.234 | 0.323 | - | - |\n| [FBRT-YOLO-S](https://arxiv.org/abs/2504.20670) | (640, 640) | 22.9G | 2.9M | 0.183 | 0.323 | 0.085 | 0.283 | 0.425 | - | - |\n| [FBRT-YOLO-M](https://arxiv.org/abs/2504.20670) | (640, 640) | 58.7G | 7.36M | 0.196 | 0.344 | 0.094 | 0.309 | 0.421 | - | - |\n| YOLO13n | (640, 640) | 6.2G | 2.45M | 0.133 | 0.244 | 0.055 | 0.210 | 0.317 | - | - |\n| YOLO13s | (640, 640) | 20.1G | 9.0M | 0.167 | 0.297 | 0.077 | 0.258 | 0.387 | - | - |\n| YOLO8m-worldv2 | (640, 640) | 88.1G | 28.36M | 0.186 | 0.326 | 0.085 | 0.288 | 0.419 | - | - |\n| YOLOE-11m | (640, 640) | 67.7G | 20.04M | 0.195 | 0.339 | 0.092 | 0.301 | 0.427 | - | - |\n| YOLO26n | (640, 640) | 5.2G | 2.38M | 0.135 | 0.249 | 0.063 | 0.203 | 0.291 | - | 1495.93 |\n| YOLO26n | (960, 960) | 11.7G | 2.38M | 0.185 | 0.322 | 0.100 | 0.271 | 0.377 | - | 1197 |\n| YOLO26s | (640, 640) | 20.5G | 9.47M | 0.160 | 0.294 | 0.082 | 0.240 | 0.362 | - | 1229.47 |\n| YOLO26m | (640, 640) | 67.9G | 20.36M | 0.186 | 0.332 | 0.096 | 0.281 | 0.361 | - | 866.74 |"
  },
  {
    "path": "yolo/data.yaml",
    "content": "# dataset path\ntrain: ./dataset/images/train\nval: ./dataset/images/val\ntest: ./dataset/images/test\n\n# number of classes\nnc: \n\n# class names\nnames: []"
  },
  {
    "path": "yolo/dataset/VOCdevkit/Annotations/ReadMe.md",
    "content": "# Folder for VOC-format annotation files"
  },
  {
    "path": "yolo/dataset/VOCdevkit/JPEGImages/ReadMe.md",
    "content": "# Folder for image files"
  },
  {
    "path": "yolo/dataset/VOCdevkit/txt/ReadMe.md",
    "content": "# Folder for YOLO-format annotation files"
  },
  {
    "path": "yolo/dataset/split_data.py",
    "content": "import os, shutil, random\nrandom.seed(0)\n\nval_size = 0.1   # fraction of the samples used for validation\ntest_size = 0.2  # fraction of the samples used for testing\npostfix = 'jpg'  # image extension; every image must share it\nimgpath = 'VOCdevkit/JPEGImages'\ntxtpath = 'VOCdevkit/txt'\n\nfor split in ('train', 'val', 'test'):\n    os.makedirs(f'images/{split}', exist_ok=True)\n    os.makedirs(f'labels/{split}', exist_ok=True)\n\n# Keep only label files; sort first so the seeded shuffle does not depend on os.listdir order.\nlistdir = sorted(i for i in os.listdir(txtpath) if i.endswith('.txt'))\nrandom.shuffle(listdir)\n\n# Slice boundaries: [0, n_train) -> train, [n_train, n_val) -> val, [n_val, end) -> test\nn_train = int(len(listdir) * (1 - val_size - test_size))\nn_val = int(len(listdir) * (1 - test_size))\ntrain, val, test = listdir[:n_train], listdir[n_train:n_val], listdir[n_val:]\nprint(f'train set size:{len(train)} val set size:{len(val)} test set size:{len(test)}')\n\n# Copy every image/label pair into its split directory.\nfor split, names in (('train', train), ('val', val), ('test', test)):\n    for i in names:\n        shutil.copy(f'{imgpath}/{i[:-4]}.{postfix}', f'images/{split}/{i[:-4]}.{postfix}')\n        shutil.copy(f'{txtpath}/{i}', f'labels/{split}/{i}')"
  },
  {
    "path": "yolo/dataset/xml2txt.py",
    "content": "import xml.etree.ElementTree as ET\nimport os, cv2\nimport numpy as np\n\nclasses = []  # class names, filled in order of first appearance\n\ndef convert(size, box):\n    # size is (width, height); box is (xmin, xmax, ymin, ymax) in pixels\n    dw = 1. / (size[0])\n    dh = 1. / (size[1])\n    # box centre; the -1 offset accounts for VOC's 1-based pixel coordinates\n    x = (box[0] + box[1]) / 2.0 - 1\n    y = (box[2] + box[3]) / 2.0 - 1\n    w = box[1] - box[0]\n    h = box[3] - box[2]\n    # normalise by the image size\n    x = x * dw\n    w = w * dw\n    y = y * dh\n    h = h * dh\n    return (x, y, w, h)\n\n\ndef convert_annotation(xmlpath, xmlname):\n    with open(xmlpath, \"r\", encoding='utf-8') as in_file:\n        txtname = xmlname[:-4] + '.txt'\n        txtfile = os.path.join(txtpath, txtname)\n        tree = ET.parse(in_file)\n        root = tree.getroot()\n        # read the image to get its real size (this form also works with non-ASCII paths)\n        img = cv2.imdecode(np.fromfile('{}/{}.{}'.format(imgpath, xmlname[:-4], postfix), np.uint8), cv2.IMREAD_COLOR)\n        h, w = img.shape[:2]\n        res = []\n        for obj in root.iter('object'):\n            cls = obj.find('name').text\n            if cls not in classes:\n                classes.append(cls)\n            cls_id = classes.index(cls)\n            xmlbox = obj.find('bndbox')\n            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),\n                 float(xmlbox.find('ymax').text))\n            bb = convert((w, h), b)\n            res.append(str(cls_id) + \" \" + \" \".join([str(a) for a in bb]))\n        # annotations without any object produce no label file\n        if len(res) != 0:\n            with open(txtfile, 'w+') as f:\n                f.write('\\n'.join(res))\n\n\nif __name__ == \"__main__\":\n    postfix = 'jpg'\n    imgpath = 'VOCdevkit/JPEGImages'\n    xmlpath = 'VOCdevkit/Annotations'\n    txtpath = 'VOCdevkit/txt'\n    \n    os.makedirs(txtpath, exist_ok=True)\n    \n    xml_files = os.listdir(xmlpath)\n    error_file_list = []\n    for xml_file in xml_files:\n        try:\n            path = os.path.join(xmlpath, xml_file)\n            if xml_file.lower().endswith('.xml'):\n                convert_annotation(path, xml_file)\n                print(f'file {xml_file} converted successfully.')\n            else:\n                print(f'file {xml_file} is not an xml file.')\n        except Exception as e:\n            print(f'file {xml_file} failed to convert.')\n            print(f'error message:\\n{e}')\n            error_file_list.append(xml_file)\n    print(f'files that failed to convert:\\n{error_file_list}')\n    print(f'Dataset Classes:{classes}')"
  },
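  {
    "path": "yolo/readme.md",
    "content": "# Dataset preparation scripts for YOLOV5, YOLOV7 and YOLOV8\nThe scripts in this directory prepare datasets for yolov5, v7 and v8. They support:\n1. Converting VOC-format annotations to YOLO-format annotations.\n2. Splitting a dataset into training, validation and test sets.\n\n# Usage with a VOC-format dataset\n1. Put the images in dataset/VOCdevkit/JPEGImages. All images must share one extension (all jpg, or all png, and so on); mixed extensions, such as some jpg and some png, are not supported.\n2. Put the VOC-format XML annotation files in dataset/VOCdevkit/Annotations.\n3. Run xml2txt.py. It converts the XML annotations in Annotations into YOLO-format label files in txt. The postfix parameter in xml2txt.py is the extension of the images in JPEGImages (default jpg); if all your images are png, for example, change postfix to png. While running, the script prints the classes found in your dataset. Copy that class list into names in data.yaml and set nc to the number of classes, that is, the number of entries in names.\n4. Run split_data.py. It splits the data into training, validation and test sets. You can change val_size (**validation set fraction**) and test_size (**test set fraction**) directly in split_data.py. Its postfix parameter is also the image extension (default jpg); change it if your images do not end in jpg.\n\n# Usage with a YOLO-format dataset\n1. Put the images in dataset/VOCdevkit/JPEGImages. All images must share one extension (all jpg, or all png, and so on); mixed extensions, such as some jpg and some png, are not supported.\n2. Put the YOLO-format TXT label files in dataset/VOCdevkit/txt.\n3. Run split_data.py. It splits the data into training, validation and test sets. You can change val_size (**validation set fraction**) and test_size (**test set fraction**) directly in split_data.py. Its postfix parameter is also the image extension (default jpg); change it if your images do not end in jpg.\n4. Set your classes in names in data.yaml, which is a list. For example, if 0 means face and 1 means body in your YOLO-format labels, data.yaml should contain names:['face', 'body'] and nc:2, where nc is the number of classes.\n"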
  },
  {
    "path": "yolo-gradcam/README.md",
    "content": "# yolo-gradcam\nGrad-CAM visualization for YOLO models.  \nPlug and play: no changes to the source code are required!\n\n## Bilibili video tutorials\n1. yolov5-[Bilibili link](https://www.bilibili.com/video/BV1F6421V77v/)\n2. yolov7-[Bilibili link](https://www.bilibili.com/video/BV1F6421V77v/)\n3. yolov8-[Bilibili link](https://www.bilibili.com/video/BV1T2N6eaEFD/)\n4. yolov9-[Bilibili link](https://www.bilibili.com/video/BV14H4y157MP/)\n5. yolov11-[Bilibili link](https://www.bilibili.com/video/BV1T2N6eaEFD/)\n\n## Environment\npip install grad-cam==1.4.8 -i https://pypi.tuna.tsinghua.edu.cn/simple\n\n## Notes\n1. The yolov5 script was written and tested against v7.0.\n2. The yolov7 script was written and tested against the 2023-10-01 version.\n3. The yolov8 script was written and tested against the 2024-01-31 version.\n4. The yolov9 script was written and tested against the 2024-03-07 version.\n5. Using a recent version is recommended; older versions may raise errors that you will need to resolve yourself.\n"
  },
  {
    "path": "yolo-gradcam/yolov11_heatmap.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nwarnings.simplefilter('ignore')\nimport torch, yaml, cv2, os, shutil, sys, copy\nimport numpy as np\nnp.random.seed(0)\nimport matplotlib.pyplot as plt\nfrom tqdm import trange\nfrom PIL import Image\nfrom ultralytics import YOLO\nfrom ultralytics.nn.tasks import attempt_load_weights\nfrom ultralytics.utils.torch_utils import intersect_dicts\nfrom ultralytics.utils.ops import xywh2xyxy, non_max_suppression\nfrom pytorch_grad_cam import GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM, KPCA_CAM, AblationCAM\nfrom pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image\nfrom pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients\n\ndef letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):\n    # Resize and pad image while meeting stride-multiple constraints\n    shape = im.shape[:2]  # current shape [height, width]\n    if isinstance(new_shape, int):\n        new_shape = (new_shape, new_shape)\n\n    # Scale ratio (new / old)\n    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])\n    if not scaleup:  # only scale down, do not scale up (for better val mAP)\n        r = min(r, 1.0)\n\n    # Compute padding\n    ratio = r, r  # width, height ratios\n    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))\n    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding\n    if auto:  # minimum rectangle\n        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding\n    elif scaleFill:  # stretch\n        dw, dh = 0.0, 0.0\n        new_unpad = (new_shape[1], new_shape[0])\n        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios\n\n    dw /= 2  # divide padding into 2 sides\n    dh /= 2\n\n    if shape[::-1] != new_unpad:  # resize\n        im = cv2.resize(im, new_unpad, 
interpolation=cv2.INTER_LINEAR)\n    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))\n    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))\n    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border\n    return im, ratio, (top, bottom, left, right)\n\nclass ActivationsAndGradients:\n    \"\"\" Class for extracting activations and\n    registering gradients from targetted intermediate layers \"\"\"\n\n    def __init__(self, model, target_layers, reshape_transform):\n        self.model = model\n        self.gradients = []\n        self.activations = []\n        self.reshape_transform = reshape_transform\n        self.handles = []\n        for target_layer in target_layers:\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_activation))\n            # Because of https://github.com/pytorch/pytorch/issues/61519,\n            # we don't use backward hook to record gradients.\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_gradient))\n\n    def save_activation(self, module, input, output):\n        activation = output\n\n        if self.reshape_transform is not None:\n            activation = self.reshape_transform(activation)\n        self.activations.append(activation.cpu().detach())\n\n    def save_gradient(self, module, input, output):\n        if not hasattr(output, \"requires_grad\") or not output.requires_grad:\n            # You can only register hooks on tensor requires grad.\n            return\n\n        # Gradients are computed in reverse order\n        def _store_grad(grad):\n            if self.reshape_transform is not None:\n                grad = self.reshape_transform(grad)\n            self.gradients = [grad.cpu().detach()] + self.gradients\n\n        output.register_hook(_store_grad)\n\n    def post_process(self, result):\n        if self.model.end2end:\n            logits_ = result[:, :, 4:]\n      
      boxes_ = result[:, :, :4]\n            sorted, indices = torch.sort(logits_[:, :, 0], descending=True)\n            return logits_[0][indices[0]], boxes_[0][indices[0]]\n        elif self.model.task == 'detect':\n            logits_ = result[:, 4:]\n            boxes_ = result[:, :4]\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]]\n        elif self.model.task == 'segment':\n            logits_ = result[0][:, 4:4 + self.model.nc]\n            boxes_ = result[0][:, :4]\n            mask_p, mask_nm = result[1][2].squeeze(), result[1][1].squeeze().transpose(1, 0)\n            c, h, w = mask_p.size()\n            mask = (mask_nm @ mask_p.view(c, -1))\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]], mask[indices[0]]\n        elif self.model.task == 'pose':\n            logits_ = result[:, 4:4 + self.model.nc]\n            boxes_ = result[:, :4]\n            poses_ = result[:, 4 + self.model.nc:]\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(poses_[0], dim0=0, dim1=1)[indices[0]]\n        elif self.model.task == 'obb':\n            logits_ = result[:, 4:4 + self.model.nc]\n            boxes_ = result[:, :4]\n            angles_ = result[:, 4 + self.model.nc:]\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(angles_[0], dim0=0, dim1=1)[indices[0]]\n        elif self.model.task == 'classify':\n    
        return result[0]\n  \n    def __call__(self, x):\n        self.gradients = []\n        self.activations = []\n        model_output = self.model(x)\n        if self.model.task == 'detect':\n            post_result, pre_post_boxes = self.post_process(model_output[0])\n            return [[post_result, pre_post_boxes]]\n        elif self.model.task == 'segment':\n            post_result, pre_post_boxes, pre_post_mask = self.post_process(model_output)\n            return [[post_result, pre_post_boxes, pre_post_mask]]\n        elif self.model.task == 'pose':\n            post_result, pre_post_boxes, pre_post_pose = self.post_process(model_output[0])\n            return [[post_result, pre_post_boxes, pre_post_pose]]\n        elif self.model.task == 'obb':\n            post_result, pre_post_boxes, pre_post_angle = self.post_process(model_output[0])\n            return [[post_result, pre_post_boxes, pre_post_angle]]\n        elif self.model.task == 'classify':\n            data = self.post_process(model_output)\n            return [data]\n\n    def release(self):\n        for handle in self.handles:\n            handle.remove()\n\nclass yolo_detect_target(torch.nn.Module):\n    def __init__(self, ouput_type, conf, ratio, end2end) -> None:\n        super().__init__()\n        self.ouput_type = ouput_type\n        self.conf = conf\n        self.ratio = ratio\n        self.end2end = end2end\n    \n    def forward(self, data):\n        post_result, pre_post_boxes = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if (self.end2end and float(post_result[i, 0]) < self.conf) or (not self.end2end and float(post_result[i].max()) < self.conf):\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                if self.end2end:\n                    result.append(post_result[i, 0])\n                else:\n                    result.append(post_result[i].max())\n            elif 
self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n        return sum(result)\n\nclass yolo_segment_target(yolo_detect_target):\n    def __init__(self, ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        post_result, pre_post_boxes, pre_post_mask = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if float(post_result[i].max()) < self.conf:\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                result.append(post_result[i].max())\n            elif self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n            elif self.ouput_type == 'segment' or self.ouput_type == 'all':\n                result.append(pre_post_mask[i].mean())\n        return sum(result)\n\nclass yolo_pose_target(yolo_detect_target):\n    def __init__(self, ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        post_result, pre_post_boxes, pre_post_pose = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if float(post_result[i].max()) < self.conf:\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                result.append(post_result[i].max())\n            elif self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n            elif self.ouput_type == 'pose' or self.ouput_type == 'all':\n                result.append(pre_post_pose[i].mean())\n        return sum(result)\n\nclass yolo_obb_target(yolo_detect_target):\n    def __init__(self, 
ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        post_result, pre_post_boxes, pre_post_angle = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if float(post_result[i].max()) < self.conf:\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                result.append(post_result[i].max())\n            elif self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n            elif self.ouput_type == 'obb' or self.ouput_type == 'all':\n                result.append(pre_post_angle[i])\n        return sum(result)\n\nclass yolo_classify_target(yolo_detect_target):\n    def __init__(self, ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        return data.max()\n\nclass yolo_heatmap:\n    def __init__(self, weight, device, method, layer, backward_type, conf_threshold, ratio, show_result, renormalize, task, img_size):\n        device = torch.device(device)\n        model_yolo = YOLO(weight)\n        model_names = model_yolo.names\n        print(f'model class info:{model_names}')\n        model = copy.deepcopy(model_yolo.model)\n        model.to(device)\n        model.info()\n        for p in model.parameters():\n            p.requires_grad_(True)\n        model.eval()\n        \n        model.task = task\n        if not hasattr(model, 'end2end'):\n            model.end2end = False\n        \n        if task == 'detect':\n            target = yolo_detect_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'segment':\n            target = yolo_segment_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'pose':\n            target = 
yolo_pose_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'obb':\n            target = yolo_obb_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'classify':\n            target = yolo_classify_target(backward_type, conf_threshold, ratio, model.end2end)\n        else:\n            raise Exception(f\"unsupported task({task}).\")\n        \n        target_layers = [model.model[l] for l in layer]\n        method = eval(method)(model, target_layers)\n        method.activations_and_grads = ActivationsAndGradients(model, target_layers, None)\n        \n        colors = np.random.uniform(0, 255, size=(len(model_names), 3)).astype(np.int32)\n        self.__dict__.update(locals())\n    \n    def post_process(self, result):\n        result = non_max_suppression(result, conf_thres=self.conf_threshold, iou_thres=0.65)[0]\n        return result\n\n    def draw_detections(self, box, color, name, img):\n        xmin, ymin, xmax, ymax = list(map(int, list(box)))\n        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), tuple(int(x) for x in color), 2) # draw the detection box\n        cv2.putText(img, str(name), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, tuple(int(x) for x in color), 2, lineType=cv2.LINE_AA)  # draw the class name and confidence\n        return img\n\n    def renormalize_cam_in_bounding_boxes(self, boxes, image_float_np, grayscale_cam):\n        \"\"\"Normalize the CAM to be in the range [0, 1] \n        inside every bounding box, and zero outside of the bounding boxes. \"\"\"\n        renormalized_cam = np.zeros(grayscale_cam.shape, dtype=np.float32)\n        for x1, y1, x2, y2 in boxes:\n            x1, y1 = max(x1, 0), max(y1, 0)\n            x2, y2 = min(grayscale_cam.shape[1] - 1, x2), min(grayscale_cam.shape[0] - 1, y2)\n            renormalized_cam[y1:y2, x1:x2] = scale_cam_image(grayscale_cam[y1:y2, x1:x2].copy())    \n        renormalized_cam = scale_cam_image(renormalized_cam)\n        eigencam_image_renormalized = show_cam_on_image(image_float_np, renormalized_cam, use_rgb=True)\n        return eigencam_image_renormalized\n    \n    def process(self, img_path, save_path):\n        # img process\n        try:\n            img = cv2.imdecode(np.fromfile(img_path, np.uint8), cv2.IMREAD_COLOR)\n        except Exception:\n            img = None\n        if img is None:  # cv2.imdecode returns None instead of raising when decoding fails\n            print(f\"Warning... {img_path} read failure.\")\n            return\n        img, _, (top, bottom, left, right) = letterbox(img, new_shape=(self.img_size, self.img_size), auto=True) # set auto=False to force a fixed square input\n        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n        img = np.float32(img) / 255.0\n        tensor = torch.from_numpy(np.transpose(img, axes=[2, 0, 1])).unsqueeze(0).to(self.device)\n        print(f'tensor size:{tensor.size()}')\n        \n        try:\n            grayscale_cam = self.method(tensor, [self.target])\n        except AttributeError as e:\n            print(f\"Warning... self.method(tensor, [self.target]) failure.\")\n            return\n        \n        grayscale_cam = grayscale_cam[0, :]\n        cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)\n        \n        pred = self.model_yolo.predict(tensor, conf=self.conf_threshold, iou=0.7)[0]\n        if self.renormalize and self.task in ['detect', 'segment', 'pose']:\n            cam_image = self.renormalize_cam_in_bounding_boxes(pred.boxes.xyxy.cpu().detach().numpy().astype(np.int32), img, grayscale_cam)\n        if self.show_result:\n            cam_image = pred.plot(img=cam_image,\n                                  conf=True, # show the confidence\n                                  font_size=None, # font size; None derives it from the current image size\n                                  line_width=None, # line width; None derives it from the current image size\n                                  labels=False, # whether to show labels\n                                  )\n        \n        # strip the letterbox padding\n        cam_image = cam_image[top:cam_image.shape[0] - bottom, left:cam_image.shape[1] - right]\n        cam_image = Image.fromarray(cam_image)\n        cam_image.save(save_path)\n    \n    def __call__(self, img_path, save_path):\n        # remove dir if exist\n        if os.path.exists(save_path):\n            shutil.rmtree(save_path)\n        # make dir if not exist\n        os.makedirs(save_path, exist_ok=True)\n\n        if os.path.isdir(img_path):\n            for img_path_ in os.listdir(img_path):\n                self.process(f'{img_path}/{img_path_}', f'{save_path}/{img_path_}')\n        else:\n            self.process(img_path, f'{save_path}/result.png')\n        \ndef get_params():\n    params = {\n        'weight': 'yolo11n.pt', # only the weights file is needed now; no cfg is required\n        'device': 'cuda:0',\n        'method': 'GradCAMPlusPlus', # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM, KPCA_CAM\n        'layer': [10, 12, 14, 16, 18],\n        'backward_type': 'all', # values checked by the target classes: detect:<class, box, all> segment:<class, box, segment, all> pose:<class, box, pose, all> obb:<class, box, obb, all> classify:<all>\n        'conf_threshold': 0.2, # 0.2\n        'ratio': 0.02, # 0.02-0.1\n        'show_result': True, # set to False to skip drawing the prediction results\n        'renormalize': False, # set to True to confine the heatmap to the boxes (detect, segment and pose only)\n        'task':'detect', # task (detect,segment,pose,obb,classify)\n        'img_size':640, # input image size\n    }\n    return params\n\n# pip install grad-cam==1.5.4 --no-deps\nif __name__ == '__main__':\n    model = yolo_heatmap(**get_params())\n    model(r'/home/hjj/Desktop/dataset/dataset_coco/coco/images/val2017/000000361238.jpg', 'result')\n    # model(r'/home/hjj/Desktop/dataset/dataset_coco/coco/images/val2017', 'result')"
  },
  {
    "path": "yolo-gradcam/yolov5_heatmap.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nwarnings.simplefilter('ignore')\nimport torch, yaml, cv2, os, shutil\nimport numpy as np\nnp.random.seed(0)\nimport matplotlib.pyplot as plt\nfrom tqdm import trange\nfrom PIL import Image\nfrom models.yolo import Model\nfrom utils.general import intersect_dicts\nfrom utils.augmentations import letterbox\nfrom utils.general import xywh2xyxy, non_max_suppression\nfrom models.experimental import attempt_load\nfrom pytorch_grad_cam import GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM\nfrom pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image\nfrom pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients\n\nclass ActivationsAndGradients:\n    \"\"\" Class for extracting activations and\n    registering gradients from targetted intermediate layers \"\"\"\n\n    def __init__(self, model, target_layers, reshape_transform):\n        self.model = model\n        self.gradients = []\n        self.activations = []\n        self.reshape_transform = reshape_transform\n        self.handles = []\n        for target_layer in target_layers:\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_activation))\n            # Because of https://github.com/pytorch/pytorch/issues/61519,\n            # we don't use backward hook to record gradients.\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_gradient))\n\n    def save_activation(self, module, input, output):\n        activation = output\n\n        if self.reshape_transform is not None:\n            activation = self.reshape_transform(activation)\n        self.activations.append(activation.cpu().detach())\n\n    def save_gradient(self, module, input, output):\n        if not hasattr(output, \"requires_grad\") or not output.requires_grad:\n            # You can only register hooks on tensor requires 
grad.\n            return\n\n        # Gradients are computed in reverse order\n        def _store_grad(grad):\n            if self.reshape_transform is not None:\n                grad = self.reshape_transform(grad)\n            self.gradients = [grad.cpu().detach()] + self.gradients\n\n        output.register_hook(_store_grad)\n\n    def post_process(self, result):\n        logits_ = result[..., 4:]\n        boxes_ = result[..., :4]\n        sorted, indices = torch.sort(logits_[..., 0], descending=True)\n        return logits_[0][indices[0]], xywh2xyxy(boxes_[0][indices[0]]).cpu().detach().numpy()\n  \n    def __call__(self, x):\n        self.gradients = []\n        self.activations = []\n        model_output = self.model(x)\n        post_result, pre_post_boxes = self.post_process(model_output[0])\n        return [[post_result, pre_post_boxes]]\n\n    def release(self):\n        for handle in self.handles:\n            handle.remove()\n\nclass yolov5_target(torch.nn.Module):\n    def __init__(self, ouput_type, conf, ratio) -> None:\n        super().__init__()\n        self.ouput_type = ouput_type\n        self.conf = conf\n        self.ratio = ratio\n    \n    def forward(self, data):\n        post_result, pre_post_boxes = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if float(post_result[i, 1:].max()) < self.conf:\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                result.append(post_result[i, 1:].max())\n            elif self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n        return sum(result)\n\nclass yolov5_heatmap:\n    def __init__(self, weight, device, method, layer, backward_type, conf_threshold, ratio, show_box, renormalize):\n        device = torch.device(device)\n        ckpt = torch.load(weight)\n        model_names = 
ckpt['model'].names\n        model = attempt_load(weight, device=device)\n        for p in model.parameters():\n            p.requires_grad_(True)\n        model.eval()\n        \n        target = yolov5_target(backward_type, conf_threshold, ratio)\n        target_layers = [model.model[l] for l in layer]\n        method = eval(method)(model, target_layers, use_cuda=device.type == 'cuda')\n        method.activations_and_grads = ActivationsAndGradients(model, target_layers, None)\n\n        colors = np.random.uniform(0, 255, size=(len(model_names), 3)).astype(np.int32)\n        self.__dict__.update(locals())\n\n    def post_process(self, result):\n        result = non_max_suppression(result, conf_thres=self.conf_threshold, iou_thres=0.65)[0]\n        return result\n    \n    def draw_detections(self, box, color, name, img):\n        xmin, ymin, xmax, ymax = list(map(int, list(box)))\n        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), tuple(int(x) for x in color), 2)\n        cv2.putText(img, str(name), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, tuple(int(x) for x in color), 2, lineType=cv2.LINE_AA)\n        return img\n\n    def renormalize_cam_in_bounding_boxes(self, boxes, image_float_np, grayscale_cam):\n        \"\"\"Normalize the CAM to be in the range [0, 1] \n        inside every bounding box, and zero outside of the bounding boxes. 
\"\"\"\n        renormalized_cam = np.zeros(grayscale_cam.shape, dtype=np.float32)\n        for x1, y1, x2, y2 in boxes:\n            x1, y1 = max(x1, 0), max(y1, 0)\n            x2, y2 = min(grayscale_cam.shape[1] - 1, x2), min(grayscale_cam.shape[0] - 1, y2)\n            renormalized_cam[y1:y2, x1:x2] = scale_cam_image(grayscale_cam[y1:y2, x1:x2].copy())    \n        renormalized_cam = scale_cam_image(renormalized_cam)\n        eigencam_image_renormalized = show_cam_on_image(image_float_np, renormalized_cam, use_rgb=True)\n        return eigencam_image_renormalized\n    \n    def process(self, img_path, save_path):\n        # img process\n        img = cv2.imread(img_path)\n        img = letterbox(img)[0]\n        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n        img = np.float32(img) / 255.0\n        tensor = torch.from_numpy(np.transpose(img, axes=[2, 0, 1])).unsqueeze(0).to(self.device)\n        \n        try:\n            grayscale_cam = self.method(tensor, [self.target])\n        except AttributeError as e:\n            return\n        \n        grayscale_cam = grayscale_cam[0, :]\n        cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)\n        \n        with torch.no_grad():\n            pred = self.model(tensor)[0]\n            pred = self.post_process(pred)\n        if self.renormalize:\n            cam_image = self.renormalize_cam_in_bounding_boxes(pred[:, :4].cpu().detach().numpy().astype(np.int32), img, grayscale_cam)\n        if self.show_box:\n            for data in pred:\n                data = data.cpu().detach().numpy()\n                cam_image = self.draw_detections(data[:4], self.colors[int(data[5])], f'{self.model_names[int(data[5])]} {float(data[4]):.2f}', cam_image)\n        \n        cam_image = Image.fromarray(cam_image)\n        cam_image.save(save_path)\n    \n    def __call__(self, img_path, save_path):\n        # remove dir if exist\n        if os.path.exists(save_path):\n            shutil.rmtree(save_path)\n      
  # make dir if not exist\n        os.makedirs(save_path, exist_ok=True)\n\n        if os.path.isdir(img_path):\n            for img_path_ in os.listdir(img_path):\n                self.process(f'{img_path}/{img_path_}', f'{save_path}/{img_path_}')\n        else:\n            self.process(img_path, f'{save_path}/result.png')\n\ndef get_params():\n    params = {\n        'weight': 'runs/train/yolov5n_lamp_exp3/weights/best.pt',\n        'device': 'cuda:0',\n        'method': 'XGradCAM', # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM\n        'layer': [16, 19, 21],\n        'backward_type': 'all', # class, box, all\n        'conf_threshold': 0.2, # 0.6\n        'ratio': 0.02, # 0.02-0.1\n        'show_box': False,\n        'renormalize': True\n    }\n    return params\n\nif __name__ == '__main__':\n    model = yolov5_heatmap(**get_params())\n    model(r'/home/hjj/Desktop/dataset/dataset_crowdhuman/images/test', 'result')"
  },
  {
    "path": "yolo-gradcam/yolov7_heatmap.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nwarnings.simplefilter('ignore')\nimport torch, yaml, cv2, os, shutil\nimport numpy as np\nnp.random.seed(0)\nimport matplotlib.pyplot as plt\nfrom tqdm import trange\nfrom PIL import Image\nfrom models.yolo import Model\nfrom utils.datasets import letterbox\nfrom utils.general import xywh2xyxy, non_max_suppression\nfrom models.experimental import attempt_load\nfrom pytorch_grad_cam import GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM\nfrom pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image\nfrom pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients\n\nclass ActivationsAndGradients:\n    \"\"\" Class for extracting activations and\n    registering gradients from targetted intermediate layers \"\"\"\n\n    def __init__(self, model, target_layers, reshape_transform):\n        self.model = model\n        self.gradients = []\n        self.activations = []\n        self.reshape_transform = reshape_transform\n        self.handles = []\n        for target_layer in target_layers:\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_activation))\n            # Because of https://github.com/pytorch/pytorch/issues/61519,\n            # we don't use backward hook to record gradients.\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_gradient))\n\n    def save_activation(self, module, input, output):\n        activation = output\n\n        if self.reshape_transform is not None:\n            activation = self.reshape_transform(activation)\n        self.activations.append(activation.cpu().detach())\n\n    def save_gradient(self, module, input, output):\n        if not hasattr(output, \"requires_grad\") or not output.requires_grad:\n            # You can only register hooks on tensor requires grad.\n            return\n\n        # Gradients are 
computed in reverse order\n        def _store_grad(grad):\n            if self.reshape_transform is not None:\n                grad = self.reshape_transform(grad)\n            self.gradients = [grad.cpu().detach()] + self.gradients\n\n        output.register_hook(_store_grad)\n\n    def post_process(self, result):\n        boxes_ = result[0][..., :4]\n        logits_ = []\n        for data in result[1]:\n            bs, n, w, h, _ = data.size()\n            logits_.append(data.reshape((bs, n * w * h, _)))\n        logits_ = torch.cat(logits_, dim=1)[..., 4:]\n        sorted, indices = torch.sort(logits_[..., 0], descending=True)\n        logits_ = logits_[0][indices[0]]\n        logits_[:, 0] = torch.sigmoid(logits_[:, 0])\n        return logits_, xywh2xyxy(boxes_[0][indices[0]]).cpu().detach().numpy()\n  \n    def __call__(self, x):\n        self.gradients = []\n        self.activations = []\n        model_output = self.model(x)\n        post_result, pre_post_boxes = self.post_process(model_output)\n        return [[post_result, pre_post_boxes]]\n\n    def release(self):\n        for handle in self.handles:\n            handle.remove()\n\nclass yolov7_target(torch.nn.Module):\n    def __init__(self, ouput_type, conf, ratio) -> None:\n        super().__init__()\n        self.ouput_type = ouput_type\n        self.conf = conf\n        self.ratio = ratio\n    \n    def forward(self, data):\n        post_result, pre_post_boxes = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if float(post_result[i, 1:].max()) < self.conf:\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                result.append(post_result[i, 1:].max())\n            elif self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n        return sum(result)\n\nclass yolov7_heatmap:\n    def __init__(self, 
weight, device, method, layer, backward_type, conf_threshold, ratio, show_box, renormalize):\n        device = torch.device(device)\n        ckpt = torch.load(weight)\n        model_names = ckpt['model'].names\n        model = attempt_load(weight, device)\n        for p in model.parameters():\n            p.requires_grad_(True)\n        model.eval()\n        \n        target = yolov7_target(backward_type, conf_threshold, ratio)\n        target_layers = [model.model[l] for l in layer]\n        method = eval(method)(model, target_layers, use_cuda=device.type == 'cuda')\n        method.activations_and_grads = ActivationsAndGradients(model, target_layers, None)\n\n        colors = np.random.uniform(0, 255, size=(len(model_names), 3)).astype(np.int32)\n        self.__dict__.update(locals())\n\n    def post_process(self, result):\n        result = non_max_suppression(result, conf_thres=self.conf_threshold, iou_thres=0.65)[0]\n        return result\n    \n    def draw_detections(self, box, color, name, img):\n        xmin, ymin, xmax, ymax = list(map(int, list(box)))\n        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), tuple(int(x) for x in color), 2)\n        cv2.putText(img, str(name), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, tuple(int(x) for x in color), 2, lineType=cv2.LINE_AA)\n        return img\n\n    def renormalize_cam_in_bounding_boxes(self, boxes, image_float_np, grayscale_cam):\n        \"\"\"Normalize the CAM to be in the range [0, 1] \n        inside every bounding box, and zero outside of the bounding boxes. 
\"\"\"\n        renormalized_cam = np.zeros(grayscale_cam.shape, dtype=np.float32)\n        for x1, y1, x2, y2 in boxes:\n            x1, y1 = max(x1, 0), max(y1, 0)\n            x2, y2 = min(grayscale_cam.shape[1] - 1, x2), min(grayscale_cam.shape[0] - 1, y2)\n            renormalized_cam[y1:y2, x1:x2] = scale_cam_image(grayscale_cam[y1:y2, x1:x2].copy())    \n        renormalized_cam = scale_cam_image(renormalized_cam)\n        eigencam_image_renormalized = show_cam_on_image(image_float_np, renormalized_cam, use_rgb=True)\n        return eigencam_image_renormalized\n    \n    def process(self, img_path, save_path):\n        # img process\n        img = cv2.imread(img_path)\n        img = letterbox(img)[0]\n        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n        img = np.float32(img) / 255.0\n        tensor = torch.from_numpy(np.transpose(img, axes=[2, 0, 1])).unsqueeze(0).to(self.device)\n        \n        try:\n            grayscale_cam = self.method(tensor, [self.target])\n        except AttributeError as e:\n            return\n        \n        grayscale_cam = grayscale_cam[0, :]\n        cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)\n        \n        with torch.no_grad():\n            pred = self.model(tensor)\n            pred = self.post_process(pred[0])\n        if self.renormalize:\n            cam_image = self.renormalize_cam_in_bounding_boxes(pred[:, :4].cpu().detach().numpy().astype(np.int32), img, grayscale_cam)\n        if self.show_box:\n            for data in pred:\n                data = data.cpu().detach().numpy()\n                cam_image = self.draw_detections(data[:4], self.colors[int(data[5])], f'{self.model_names[int(data[5])]} {float(data[4]):.2f}', cam_image)\n        \n        cam_image = Image.fromarray(cam_image)\n        cam_image.save(save_path)\n    \n    def __call__(self, img_path, save_path):\n        # remove dir if exist\n        if os.path.exists(save_path):\n            shutil.rmtree(save_path)\n      
  # make dir if not exist\n        os.makedirs(save_path, exist_ok=True)\n\n        if os.path.isdir(img_path):\n            for img_path_ in os.listdir(img_path):\n                self.process(f'{img_path}/{img_path_}', f'{save_path}/{img_path_}')\n        else:\n            self.process(img_path, f'{save_path}/result.png')\n\ndef get_params():\n    params = {\n        'weight': 'runs/train/yolov7_tiny_custom_fasternet_lamp_exp1/weights/best.pt',\n        'device': 'cuda:0',\n        'method': 'XGradCAM', # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM\n        'layer': [11, 14, 17],\n        'backward_type': 'all', # class, box, all\n        'conf_threshold': 0.2, # 0.6\n        'ratio': 0.02, # 0.02-0.1\n        'show_box': False,\n        'renormalize': True\n    }\n    return params\n\nif __name__ == '__main__':\n    model = yolov7_heatmap(**get_params())\n    model(r'/home/hjj/Desktop/dataset/dataset_crowdhuman/images/test', 'result')"
  },
  {
    "path": "yolo-gradcam/yolov8_heatmap.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nwarnings.simplefilter('ignore')\nimport torch, yaml, cv2, os, shutil, sys, copy\nimport numpy as np\nnp.random.seed(0)\nimport matplotlib.pyplot as plt\nfrom tqdm import trange\nfrom PIL import Image\nfrom ultralytics import YOLO\nfrom ultralytics.nn.tasks import attempt_load_weights\nfrom ultralytics.utils.torch_utils import intersect_dicts\nfrom ultralytics.utils.ops import xywh2xyxy, non_max_suppression\nfrom pytorch_grad_cam import GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM, KPCA_CAM, AblationCAM\nfrom pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image\nfrom pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients\n\ndef letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):\n    # Resize and pad image while meeting stride-multiple constraints\n    shape = im.shape[:2]  # current shape [height, width]\n    if isinstance(new_shape, int):\n        new_shape = (new_shape, new_shape)\n\n    # Scale ratio (new / old)\n    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])\n    if not scaleup:  # only scale down, do not scale up (for better val mAP)\n        r = min(r, 1.0)\n\n    # Compute padding\n    ratio = r, r  # width, height ratios\n    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))\n    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding\n    if auto:  # minimum rectangle\n        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding\n    elif scaleFill:  # stretch\n        dw, dh = 0.0, 0.0\n        new_unpad = (new_shape[1], new_shape[0])\n        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios\n\n    dw /= 2  # divide padding into 2 sides\n    dh /= 2\n\n    if shape[::-1] != new_unpad:  # resize\n        im = cv2.resize(im, new_unpad, 
interpolation=cv2.INTER_LINEAR)\n    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))\n    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))\n    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border\n    return im, ratio, (top, bottom, left, right)\n\nclass ActivationsAndGradients:\n    \"\"\" Class for extracting activations and\n    registering gradients from targetted intermediate layers \"\"\"\n\n    def __init__(self, model, target_layers, reshape_transform):\n        self.model = model\n        self.gradients = []\n        self.activations = []\n        self.reshape_transform = reshape_transform\n        self.handles = []\n        for target_layer in target_layers:\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_activation))\n            # Because of https://github.com/pytorch/pytorch/issues/61519,\n            # we don't use backward hook to record gradients.\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_gradient))\n\n    def save_activation(self, module, input, output):\n        activation = output\n\n        if self.reshape_transform is not None:\n            activation = self.reshape_transform(activation)\n        self.activations.append(activation.cpu().detach())\n\n    def save_gradient(self, module, input, output):\n        if not hasattr(output, \"requires_grad\") or not output.requires_grad:\n            # You can only register hooks on tensor requires grad.\n            return\n\n        # Gradients are computed in reverse order\n        def _store_grad(grad):\n            if self.reshape_transform is not None:\n                grad = self.reshape_transform(grad)\n            self.gradients = [grad.cpu().detach()] + self.gradients\n\n        output.register_hook(_store_grad)\n\n    def post_process(self, result):\n        if self.model.end2end:\n            logits_ = result[:, :, 4:]\n      
      boxes_ = result[:, :, :4]\n            sorted, indices = torch.sort(logits_[:, :, 0], descending=True)\n            return logits_[0][indices[0]], boxes_[0][indices[0]]\n        elif self.model.task == 'detect':\n            logits_ = result[:, 4:]\n            boxes_ = result[:, :4]\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]]\n        elif self.model.task == 'segment':\n            logits_ = result[0][:, 4:4 + self.model.nc]\n            boxes_ = result[0][:, :4]\n            mask_p, mask_nm = result[1][2].squeeze(), result[1][1].squeeze().transpose(1, 0)\n            c, h, w = mask_p.size()\n            mask = (mask_nm @ mask_p.view(c, -1))\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]], mask[indices[0]]\n        elif self.model.task == 'pose':\n            logits_ = result[:, 4:4 + self.model.nc]\n            boxes_ = result[:, :4]\n            poses_ = result[:, 4 + self.model.nc:]\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(poses_[0], dim0=0, dim1=1)[indices[0]]\n        elif self.model.task == 'obb':\n            logits_ = result[:, 4:4 + self.model.nc]\n            boxes_ = result[:, :4]\n            angles_ = result[:, 4 + self.model.nc:]\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(angles_[0], dim0=0, dim1=1)[indices[0]]\n        elif self.model.task == 'classify':\n    
        return result[0]\n  \n    def __call__(self, x):\n        self.gradients = []\n        self.activations = []\n        model_output = self.model(x)\n        if self.model.task == 'detect':\n            post_result, pre_post_boxes = self.post_process(model_output[0])\n            return [[post_result, pre_post_boxes]]\n        elif self.model.task == 'segment':\n            post_result, pre_post_boxes, pre_post_mask = self.post_process(model_output)\n            return [[post_result, pre_post_boxes, pre_post_mask]]\n        elif self.model.task == 'pose':\n            post_result, pre_post_boxes, pre_post_pose = self.post_process(model_output[0])\n            return [[post_result, pre_post_boxes, pre_post_pose]]\n        elif self.model.task == 'obb':\n            post_result, pre_post_boxes, pre_post_angle = self.post_process(model_output[0])\n            return [[post_result, pre_post_boxes, pre_post_angle]]\n        elif self.model.task == 'classify':\n            data = self.post_process(model_output)\n            return [data]\n\n    def release(self):\n        for handle in self.handles:\n            handle.remove()\n\nclass yolo_detect_target(torch.nn.Module):\n    def __init__(self, ouput_type, conf, ratio, end2end) -> None:\n        super().__init__()\n        self.ouput_type = ouput_type\n        self.conf = conf\n        self.ratio = ratio\n        self.end2end = end2end\n    \n    def forward(self, data):\n        post_result, pre_post_boxes = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if (self.end2end and float(post_result[i, 0]) < self.conf) or (not self.end2end and float(post_result[i].max()) < self.conf):\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                if self.end2end:\n                    result.append(post_result[i, 0])\n                else:\n                    result.append(post_result[i].max())\n            elif 
self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n        return sum(result)\n\nclass yolo_segment_target(yolo_detect_target):\n    def __init__(self, ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        post_result, pre_post_boxes, pre_post_mask = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if float(post_result[i].max()) < self.conf:\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                result.append(post_result[i].max())\n            elif self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n            elif self.ouput_type == 'segment' or self.ouput_type == 'all':\n                result.append(pre_post_mask[i].mean())\n        return sum(result)\n\nclass yolo_pose_target(yolo_detect_target):\n    def __init__(self, ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        post_result, pre_post_boxes, pre_post_pose = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if float(post_result[i].max()) < self.conf:\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                result.append(post_result[i].max())\n            elif self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n            elif self.ouput_type == 'pose' or self.ouput_type == 'all':\n                result.append(pre_post_pose[i].mean())\n        return sum(result)\n\nclass yolo_obb_target(yolo_detect_target):\n    def __init__(self, 
ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        post_result, pre_post_boxes, pre_post_angle = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if float(post_result[i].max()) < self.conf:\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                result.append(post_result[i].max())\n            elif self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n            elif self.ouput_type == 'obb' or self.ouput_type == 'all':\n                result.append(pre_post_angle[i])\n        return sum(result)\n\nclass yolo_classify_target(yolo_detect_target):\n    def __init__(self, ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        return data.max()\n\nclass yolo_heatmap:\n    def __init__(self, weight, device, method, layer, backward_type, conf_threshold, ratio, show_result, renormalize, task, img_size):\n        device = torch.device(device)\n        model_yolo = YOLO(weight)\n        model_names = model_yolo.names\n        print(f'model class info:{model_names}')\n        model = copy.deepcopy(model_yolo.model)\n        model.to(device)\n        model.info()\n        for p in model.parameters():\n            p.requires_grad_(True)\n        model.eval()\n        \n        model.task = task\n        if not hasattr(model, 'end2end'):\n            model.end2end = False\n        \n        if task == 'detect':\n            target = yolo_detect_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'segment':\n            target = yolo_segment_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'pose':\n            target = 
yolo_pose_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'obb':\n            target = yolo_obb_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'classify':\n            target = yolo_classify_target(backward_type, conf_threshold, ratio, model.end2end)\n        else:\n            raise Exception(f\"unsupported task ({task}).\")\n        \n        target_layers = [model.model[l] for l in layer]\n        method = eval(method)(model, target_layers)\n        method.activations_and_grads = ActivationsAndGradients(model, target_layers, None)\n        \n        colors = np.random.uniform(0, 255, size=(len(model_names), 3)).astype(np.int32)\n        self.__dict__.update(locals())\n    \n    def post_process(self, result):\n        result = non_max_suppression(result, conf_thres=self.conf_threshold, iou_thres=0.65)[0]\n        return result\n\n    def draw_detections(self, box, color, name, img):\n        xmin, ymin, xmax, ymax = list(map(int, list(box)))\n        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), tuple(int(x) for x in color), 2) # draw the detection box\n        cv2.putText(img, str(name), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, tuple(int(x) for x in color), 2, lineType=cv2.LINE_AA)  # draw the class name and confidence\n        return img\n\n    def renormalize_cam_in_bounding_boxes(self, boxes, image_float_np, grayscale_cam):\n        \"\"\"Normalize the CAM to be in the range [0, 1] \n        inside every bounding box, and zero outside of the bounding boxes. 
\"\"\"\n        renormalized_cam = np.zeros(grayscale_cam.shape, dtype=np.float32)\n        for x1, y1, x2, y2 in boxes:\n            x1, y1 = max(x1, 0), max(y1, 0)\n            x2, y2 = min(grayscale_cam.shape[1] - 1, x2), min(grayscale_cam.shape[0] - 1, y2)\n            renormalized_cam[y1:y2, x1:x2] = scale_cam_image(grayscale_cam[y1:y2, x1:x2].copy())    \n        renormalized_cam = scale_cam_image(renormalized_cam)\n        eigencam_image_renormalized = show_cam_on_image(image_float_np, renormalized_cam, use_rgb=True)\n        return eigencam_image_renormalized\n    \n    def process(self, img_path, save_path):\n        # img process\n        try:\n            img = cv2.imdecode(np.fromfile(img_path, np.uint8), cv2.IMREAD_COLOR)\n        except:\n            print(f\"Warning... {img_path} read failure.\")\n            return\n        img, _, (top, bottom, left, right) = letterbox(img, new_shape=(self.img_size, self.img_size), auto=True) # set auto=False to force a fixed square input (equal width and height)\n        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n        img = np.float32(img) / 255.0\n        tensor = torch.from_numpy(np.transpose(img, axes=[2, 0, 1])).unsqueeze(0).to(self.device)\n        print(f'tensor size:{tensor.size()}')\n        \n        try:\n            grayscale_cam = self.method(tensor, [self.target])\n        except AttributeError as e:\n            print(f\"Warning... 
self.method(tensor, [self.target]) failure.\")\n            return\n        \n        grayscale_cam = grayscale_cam[0, :]\n        cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)\n        \n        pred = self.model_yolo.predict(tensor, conf=self.conf_threshold, iou=0.7)[0]\n        if self.renormalize and self.task in ['detect', 'segment', 'pose']:\n            cam_image = self.renormalize_cam_in_bounding_boxes(pred.boxes.xyxy.cpu().detach().numpy().astype(np.int32), img, grayscale_cam)\n        if self.show_result:\n            cam_image = pred.plot(img=cam_image,\n                                  conf=True, # show confidence\n                                  font_size=None, # font size; None computes it from the current image size\n                                  line_width=None, # line width; None computes it from the current image size\n                                  labels=False, # show labels\n                                  )\n        \n        # strip the padding border\n        cam_image = cam_image[top:cam_image.shape[0] - bottom, left:cam_image.shape[1] - right]\n        cam_image = Image.fromarray(cam_image)\n        cam_image.save(save_path)\n    \n    def __call__(self, img_path, save_path):\n        # remove the directory if it exists\n        if os.path.exists(save_path):\n            shutil.rmtree(save_path)\n        # create the directory if it does not exist\n        os.makedirs(save_path, exist_ok=True)\n\n        if os.path.isdir(img_path):\n            for img_path_ in os.listdir(img_path):\n                self.process(f'{img_path}/{img_path_}', f'{save_path}/{img_path_}')\n        else:\n            self.process(img_path, f'{save_path}/result.png')\n        \ndef get_params():\n    params = {\n        'weight': 'yolo11n.pt', # only the weights need to be specified now; no cfg is required\n        'device': 'cuda:0',\n        'method': 'GradCAMPlusPlus', # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM, KPCA_CAM\n        'layer': [10, 12, 14, 16, 18],\n        'backward_type': 'all', # detect:<class, box, all> segment:<class, box, segment, 
all> pose:<box, keypoint, all> obb:<box, angle, all> classify:<all>\n        'conf_threshold': 0.2, # 0.2\n        'ratio': 0.02, # 0.02-0.1\n        'show_result': True, # set to False to skip drawing the results\n        'renormalize': False, # set to True to confine the heatmap to the boxes (only effective for detect, segment, pose)\n        'task':'detect', # task (detect, segment, pose, obb, classify)\n        'img_size':640, # image size\n    }\n    return params\n\n# pip install grad-cam==1.5.4 --no-deps\nif __name__ == '__main__':\n    model = yolo_heatmap(**get_params())\n    model(r'/home/hjj/Desktop/dataset/dataset_coco/coco/images/val2017/000000361238.jpg', 'result')\n    # model(r'/home/hjj/Desktop/dataset/dataset_coco/coco/images/val2017', 'result')"
  },
  {
    "path": "yolo-gradcam/yolov9_heatmap.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nwarnings.simplefilter('ignore')\nimport torch, yaml, cv2, os, shutil\nimport numpy as np\nnp.random.seed(0)\nimport matplotlib.pyplot as plt\nfrom tqdm import trange\nfrom PIL import Image\nfrom models.yolo import Model\nfrom utils.augmentations import letterbox\nfrom utils.general import xywh2xyxy, non_max_suppression\nfrom models.experimental import attempt_load\nfrom pytorch_grad_cam import GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM\nfrom pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image\nfrom pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients\n\nclass ActivationsAndGradients:\n    \"\"\" Class for extracting activations and\n    registering gradients from targetted intermediate layers \"\"\"\n\n    def __init__(self, model, target_layers, reshape_transform):\n        self.model = model\n        self.gradients = []\n        self.activations = []\n        self.reshape_transform = reshape_transform\n        self.handles = []\n        for target_layer in target_layers:\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_activation))\n            # Because of https://github.com/pytorch/pytorch/issues/61519,\n            # we don't use backward hook to record gradients.\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_gradient))\n\n    def save_activation(self, module, input, output):\n        activation = output\n\n        if self.reshape_transform is not None:\n            activation = self.reshape_transform(activation)\n        self.activations.append(activation.cpu().detach())\n\n    def save_gradient(self, module, input, output):\n        if not hasattr(output, \"requires_grad\") or not output.requires_grad:\n            # You can only register hooks on tensor requires grad.\n            return\n\n        # Gradients 
are computed in reverse order\n        def _store_grad(grad):\n            if self.reshape_transform is not None:\n                grad = self.reshape_transform(grad)\n            self.gradients = [grad.cpu().detach()] + self.gradients\n\n        output.register_hook(_store_grad)\n\n    def post_process(self, result):\n        logits_ = result[:, 4:]\n        boxes_ = result[:, :4]\n        # Rank candidate boxes by their best class score, highest first\n        _, indices = torch.sort(logits_.max(1)[0], descending=True)\n        return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]], xywh2xyxy(torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]]).cpu().detach().numpy()\n\n  \n    def __call__(self, x):\n        self.gradients = []\n        self.activations = []\n        model_output = self.model(x)\n        post_result, pre_post_boxes, post_boxes = self.post_process(model_output[0])\n        return [[post_result, pre_post_boxes]]\n\n    def release(self):\n        for handle in self.handles:\n            handle.remove()\n\nclass yolov9_target(torch.nn.Module):\n    def __init__(self, ouput_type, conf, ratio) -> None:\n        super().__init__()\n        self.ouput_type = ouput_type\n        self.conf = conf\n        self.ratio = ratio\n    \n    def forward(self, data):\n        post_result, pre_post_boxes = data\n        result = []\n        for i in trange(int(post_result.size(0) * self.ratio)):\n            if float(post_result[i].max()) < self.conf:\n                break\n            if self.ouput_type == 'class' or self.ouput_type == 'all':\n                result.append(post_result[i].max())\n            # Separate `if` (not `elif`) so that 'all' backpropagates both the class and the box terms\n            if self.ouput_type == 'box' or self.ouput_type == 'all':\n                for j in range(4):\n                    result.append(pre_post_boxes[i, j])\n        return sum(result)\n\nclass yolov9_heatmap:\n    def __init__(self, weight, device, method, layer, backward_type, conf_threshold, ratio, show_box, renormalize):\n        device = 
torch.device(device)\n        ckpt = torch.load(weight)\n        model_names = ckpt['model'].names\n        model = attempt_load(weight, device)\n        for p in model.parameters():\n            p.requires_grad_(True)\n        model.eval()\n        \n        target = yolov9_target(backward_type, conf_threshold, ratio)\n        target_layers = [model.model[l] for l in layer]\n        method = eval(method)(model, target_layers, use_cuda=device.type == 'cuda')\n        method.activations_and_grads = ActivationsAndGradients(model, target_layers, None)\n\n        # np.int was removed in NumPy 1.24; the builtin int is the drop-in replacement\n        colors = np.random.uniform(0, 255, size=(len(model_names), 3)).astype(int)\n        self.__dict__.update(locals())\n\n    def post_process(self, result):\n        result = non_max_suppression(result, conf_thres=self.conf_threshold, iou_thres=0.65)[0]\n        return result\n    \n    def draw_detections(self, box, color, name, img):\n        xmin, ymin, xmax, ymax = list(map(int, list(box)))\n        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), tuple(int(x) for x in color), 2)\n        cv2.putText(img, str(name), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, tuple(int(x) for x in color), 2, lineType=cv2.LINE_AA)\n        return img\n\n    def renormalize_cam_in_bounding_boxes(self, boxes, image_float_np, grayscale_cam):\n        \"\"\"Normalize the CAM to be in the range [0, 1] \n        inside every bounding box, and zero outside of the bounding boxes. 
\"\"\"\n        renormalized_cam = np.zeros(grayscale_cam.shape, dtype=np.float32)\n        for x1, y1, x2, y2 in boxes:\n            x1, y1 = max(x1, 0), max(y1, 0)\n            x2, y2 = min(grayscale_cam.shape[1] - 1, x2), min(grayscale_cam.shape[0] - 1, y2)\n            renormalized_cam[y1:y2, x1:x2] = scale_cam_image(grayscale_cam[y1:y2, x1:x2].copy())    \n        renormalized_cam = scale_cam_image(renormalized_cam)\n        eigencam_image_renormalized = show_cam_on_image(image_float_np, renormalized_cam, use_rgb=True)\n        return eigencam_image_renormalized\n    \n    def process(self, img_path, save_path):\n        # img process\n        img = cv2.imread(img_path)\n        img = letterbox(img)[0]\n        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n        img = np.float32(img) / 255.0\n        tensor = torch.from_numpy(np.transpose(img, axes=[2, 0, 1])).unsqueeze(0).to(self.device)\n        \n        try:\n            grayscale_cam = self.method(tensor, [self.target])\n        except AttributeError as e:\n            return\n        \n        grayscale_cam = grayscale_cam[0, :]\n        cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)\n        \n        with torch.no_grad():\n            pred = self.model(tensor)\n            pred = self.post_process(pred[0])\n        if self.renormalize:\n            cam_image = self.renormalize_cam_in_bounding_boxes(pred[:, :4].cpu().detach().numpy().astype(np.int32), img, grayscale_cam)\n        if self.show_box:\n            for data in pred:\n                data = data.cpu().detach().numpy()\n                cam_image = self.draw_detections(data[:4], self.colors[int(data[5])], f'{self.model_names[int(data[5])]} {float(data[4]):.2f}', cam_image)\n        \n        cam_image = Image.fromarray(cam_image)\n        cam_image.save(save_path)\n    \n    def __call__(self, img_path, save_path):\n        # remove dir if exist\n        if os.path.exists(save_path):\n            shutil.rmtree(save_path)\n      
  # make dir if not exist\n        os.makedirs(save_path, exist_ok=True)\n\n        if os.path.isdir(img_path):\n            for img_path_ in os.listdir(img_path):\n                self.process(f'{img_path}/{img_path_}', f'{save_path}/{img_path_}')\n        else:\n            self.process(img_path, f'{save_path}/result.png')\n\ndef get_params():\n    params = {\n        'weight': 'yolov9-c-converted.pt',\n        'device': 'cuda:0',\n        'method': 'XGradCAM', # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM\n        'layer': [11, 14, 17],\n        'backward_type': 'all', # class, box, all\n        'conf_threshold': 0.2, # 0.6\n        'ratio': 0.02, # 0.02-0.1\n        'show_box': True,\n        'renormalize': False\n    }\n    return params\n\nif __name__ == '__main__':\n    model = yolov9_heatmap(**get_params())\n    model(r'/root/data_ssd/coco17/images', 'result')"
  },
  {
    "path": "yolo-improve/CAM.py",
    "content": "class CAM(nn.Module):\n    def __init__(self, inc, fusion='weight'):\n        super().__init__()\n        \n        assert fusion in ['weight', 'adaptive', 'concat']\n        self.fusion = fusion\n        \n        self.conv1 = Conv(inc, inc, 3, 1, None, 1, 1)\n        self.conv2 = Conv(inc, inc, 3, 1, None, 1, 3)\n        self.conv3 = Conv(inc, inc, 3, 1, None, 1, 5)\n        \n        self.fusion_1 = Conv(inc, inc, 1)\n        self.fusion_2 = Conv(inc, inc, 1)\n        self.fusion_3 = Conv(inc, inc, 1)\n\n        if self.fusion == 'adaptive':\n            self.fusion_4 = Conv(inc * 3, 3, 1)\n    \n    def forward(self, x):\n        x1 = self.conv1(x)\n        x2 = self.conv2(x)\n        x3 = self.conv3(x)\n        \n        if self.fusion == 'weight':\n            return self.fusion_1(x1) + self.fusion_2(x2) + self.fusion_3(x3)\n        elif self.fusion == 'adaptive':\n            fusion = torch.softmax(self.fusion_4(torch.cat([self.fusion_1(x1), self.fusion_2(x2), self.fusion_3(x3)], dim=1)), dim=1)\n            x1_weight, x2_weight, x3_weight = torch.split(fusion, [1, 1, 1], dim=1)\n            return x1 * x1_weight + x2 * x2_weight + x3 * x3_weight\n        else:\n            return torch.cat([self.fusion_1(x1), self.fusion_2(x2), self.fusion_3(x3)], dim=1)\n\n\nelif m is CAM:\n    c1, c2 = ch[f], (ch[f] * 3 if args[0] == 'concat' else ch[f])\n    args = [c1, args[0]]\n\n\n### yolov5 cam yaml\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n  
 [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],  # 21\n   [10, 1, CAM, ['weight']],  # 22 (CAM applied to layer 10)\n   [[-2, -1], 1, Concat, [1]],  # 23 cat head P5 (layers 21 and 22)\n   [-1, 3, C3, [1024, False]],  # 24 (P5/32-large)\n\n   [[17, 20, 24], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/iou.py",
    "content": "import numpy as np\nimport torch, math\n\nclass WIoU_Scale:\n    ''' monotonous: {\n            None: origin v1\n            True: monotonic FM v2\n            False: non-monotonic FM v3\n        }\n        momentum: The momentum of running mean'''\n    \n    iou_mean = 1.\n    monotonous = False\n    _momentum = 1 - 0.5 ** (1 / 7000)\n    _is_train = True\n\n    def __init__(self, iou):\n        self.iou = iou\n        self._update(self)\n    \n    @classmethod\n    def _update(cls, self):\n        if cls._is_train: cls.iou_mean = (1 - cls._momentum) * cls.iou_mean + \\\n                                         cls._momentum * self.iou.detach().mean().item()\n    \n    @classmethod\n    def _scaled_loss(cls, self, gamma=1.9, delta=3):\n        if isinstance(self.monotonous, bool):\n            if self.monotonous:\n                return (self.iou.detach() / self.iou_mean).sqrt()\n            else:\n                beta = self.iou.detach() / self.iou_mean\n                alpha = delta * torch.pow(gamma, beta - delta)\n                return beta / alpha\n        return 1\n    \n\ndef bbox_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, SIoU=False, EIoU=False, WIoU=False, Focal=False, alpha=1, gamma=0.5, scale=False, eps=1e-7):\n    # Returns Intersection over Union (IoU) of box1(1,4) to box2(n,4)\n\n    # Get the coordinates of bounding boxes\n    if xywh:  # transform from xywh to xyxy\n        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)\n        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2\n        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_\n        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_\n    else:  # x1, y1, x2, y2 = box1\n        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)\n        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)\n        w1, h1 = b1_x2 - b1_x1, (b1_y2 - b1_y1).clamp(eps)\n        w2, h2 = b2_x2 - b2_x1, (b2_y2 - 
b2_y1).clamp(eps)\n\n    # Intersection area\n    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp(0) * \\\n            (b1_y2.minimum(b2_y2) - b1_y1.maximum(b2_y1)).clamp(0)\n\n    # Union Area\n    union = w1 * h1 + w2 * h2 - inter + eps\n    if scale:\n        self = WIoU_Scale(1 - (inter / union))\n\n    # IoU\n    # iou = inter / union # ori iou\n    iou = torch.pow(inter/(union + eps), alpha) # alpha iou\n    if CIoU or DIoU or GIoU or EIoU or SIoU or WIoU:\n        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width\n        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height\n        if CIoU or DIoU or EIoU or SIoU or WIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1\n            c2 = (cw ** 2 + ch ** 2) ** alpha + eps  # convex diagonal squared\n            rho2 = (((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4) ** alpha  # center dist ** 2\n            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47\n                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)\n                with torch.no_grad():\n                    alpha_ciou = v / (v - iou + (1 + eps))\n                if Focal:\n                    return iou - (rho2 / c2 + torch.pow(v * alpha_ciou + eps, alpha)), torch.pow(inter/(union + eps), gamma)  # Focal_CIoU\n                else:\n                    return iou - (rho2 / c2 + torch.pow(v * alpha_ciou + eps, alpha))  # CIoU\n            elif EIoU:\n                rho_w2 = ((b2_x2 - b2_x1) - (b1_x2 - b1_x1)) ** 2\n                rho_h2 = ((b2_y2 - b2_y1) - (b1_y2 - b1_y1)) ** 2\n                cw2 = torch.pow(cw ** 2 + eps, alpha)\n                ch2 = torch.pow(ch ** 2 + eps, alpha)\n                if Focal:\n                    return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2), torch.pow(inter/(union + eps), gamma) # Focal_EIou\n            
    else:\n                    return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2) # EIou\n            elif SIoU:\n                # SIoU Loss https://arxiv.org/pdf/2205.12740.pdf\n                s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5 + eps\n                s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5 + eps\n                sigma = torch.pow(s_cw ** 2 + s_ch ** 2, 0.5)\n                sin_alpha_1 = torch.abs(s_cw) / sigma\n                sin_alpha_2 = torch.abs(s_ch) / sigma\n                threshold = pow(2, 0.5) / 2\n                sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)\n                angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)\n                rho_x = (s_cw / cw) ** 2\n                rho_y = (s_ch / ch) ** 2\n                # Use a local name so the Focal exponent `gamma` (function argument) is not overwritten\n                gamma_s = angle_cost - 2\n                distance_cost = 2 - torch.exp(gamma_s * rho_x) - torch.exp(gamma_s * rho_y)\n                omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)\n                omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)\n                shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(1 - torch.exp(-1 * omiga_h), 4)\n                if Focal:\n                    return iou - torch.pow(0.5 * (distance_cost + shape_cost) + eps, alpha), torch.pow(inter/(union + eps), gamma) # Focal_SIou\n                else:\n                    return iou - torch.pow(0.5 * (distance_cost + shape_cost) + eps, alpha) # SIou\n            elif WIoU:\n                if Focal:\n                    raise RuntimeError(\"WIoU does not support Focal.\")\n                elif scale:\n                    return getattr(WIoU_Scale, '_scaled_loss')(self), (1 - iou) * torch.exp((rho2 / c2)), iou # WIoU https://arxiv.org/abs/2301.10051\n                else:\n                    return iou, torch.exp((rho2 / c2)) # WIoU v1\n            if Focal:\n                return iou - rho2 / c2, torch.pow(inter/(union + eps), gamma)  # Focal_DIoU\n            else:\n    
            return iou - rho2 / c2  # DIoU\n        c_area = cw * ch + eps  # convex area\n        if Focal:\n            return iou - torch.pow((c_area - union) / c_area + eps, alpha), torch.pow(inter/(union + eps), gamma)  # Focal_GIoU https://arxiv.org/pdf/1902.09630.pdf\n        else:\n            return iou - torch.pow((c_area - union) / c_area + eps, alpha)  # GIoU https://arxiv.org/pdf/1902.09630.pdf\n    if Focal:\n        return iou, torch.pow(inter/(union + eps), gamma)  # Focal_IoU\n    else:\n        return iou  # IoU\n\n### yolov8\nif type(iou) is tuple:\n    if len(iou) == 2:\n        loss_iou = ((1.0 - iou[0]) * iou[1].detach() * weight).sum() / target_scores_sum\n    else:\n        loss_iou = (iou[0] * iou[1] * weight).sum() / target_scores_sum\nelse:\n    loss_iou = ((1.0 - iou) * weight).sum() / target_scores_sum\n    \n### yolov5\niou = bbox_iou(pbox, tbox[i], CIoU=True)\nif type(iou) is tuple:\n    if len(iou) == 2:\n        lbox += (iou[1].detach().squeeze() * (1 - iou[0].squeeze())).mean()\n        iou = iou[0].squeeze()\n    else:\n        lbox += (iou[0] * iou[1]).mean()\n        iou = iou[2].squeeze()\nelse:\n    lbox += (1.0 - iou.squeeze()).mean()  # iou loss\n    iou = iou.squeeze()\n"
  },
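  {
    "path": "yolo-improve/paper.md",
    "content": "# End-to-End Paper-Writing Guidance Project Based on YOLO and RT-DETR <this project is led throughout by E导 (Mentor E)>\n\n### 1. After joining this project, members get a discount on optional one-on-one sessions, which are led by E导\n\n1. Experiment guidance: 268/h (members 248/h) -- (no filler, practical content only, straight to the pain points)\n2. Paper-writing guidance: 298/h (members 268/h) -- (no filler, practical content only, straight to the pain points)\n\n        Scope of one-on-one sessions\n        ① Any question at any stage of a thesis\n        ② Any question at any stage of a journal or conference paper\n        ③ Before submission: pre-review, polishing, and checking whether the paper meets the journal's submission requirements\n        ④ After submission: help with revisions in response to reviewer comments\n        ⑤ Other services available on request\n\n### 2. Lecture arrangements\n\n- 1. Before joining the group:\n- (1) Membership is valid for one year from the date you join the group (renew after one year if needed)\n- (2) Live lectures begin when the group is created in January and are released progressively as live sessions plus replays (the full course is not pre-recorded before you join)\n- (3) Format: QQ group classroom or Tencent Meeting live stream (announced in the group); late joiners and anyone who misses a session can watch the recording\n- (4) Each live session includes live Q&A and lasts about 1-2 hours\n- (5) At least one live lecture per week; the content of each session is announced following the process below\n- (6) The project does not include private Q&A; Q&A is provided in the group, and I answer some of the group's questions whenever I have time\n\n- 2. Q&A details:\n- (1) Q&A during live sessions: recent questions from the group are collected in an Excel sheet before class and answered during the session\n- (2) Day-to-day Q&A in the group: Q&A is provided in the group, and I answer some of the group's questions whenever I have time\n- 3. Lecture process:\n- (1) Before class\n- - Before class, first: announce the lecture time in advance && collect topic requests (group poll)\n- (The poll options are the syllabus items; the item with the most votes is covered in that session, and if there are no votes the items are covered in order)\n- - Before class, next: once the syllabus item is chosen, announce the lecture content\n- - Before class, last: collect questions on that content in an online Excel sheet and resolve them in class (the person who asked must be present when the question is answered)\n- (2) During class (about 1 hour per session)\n- - During class, first: live lecture (following the syllabus item chosen beforehand)\n- - During class, next: summarize the lecture\n- - During class, last: live Q&A (questions from the pre-class Excel sheet and from the live chat) -- participants can join by voice and interact during Q&A\n- (3) After class: the recording is posted to the group and the next lecture time is set later (two or more sessions per week depending on circumstances, with no upper limit)\n- (4) After class: free remote debugging help during a set time slot each week in the group (remote assistance via ToDesk is available)\n- (5) The project content will be refined over time and member benefits will be updated and extended, so stay tuned\n\n### 3. Paper project syllabus (each replay video maps to a syllabus item, with an index for later lookup)\n\n    1. Ways to search for papers\n    1.1 Google Scholar, Web of Science, IEEE, Springer, MDPI, ScienceDirect, and others\n    1.2 Some tips (accessing papers you cannot open, etc.)\n    2. How to draw on related papers; keyword search -- finding the reference papers you actually need\n    3. How to write a paper (writing logic and template for each part) (① Introduction ② Related Work ③ Method ④ Experiments ⑤ Conclusion)\n    3.1 Introduction ------- you can start writing as soon as the topic is fixed (leave the experimental parts blank for now)\n    3.1.1 Writing logic and approach\n    3.1.2 How to write it and what to include\n    3.2 Related Work --- may cover related work on datasets, the baseline model, and the three contributions\n    3.2.1 Writing logic and approach\n    3.2.2 How to write it and what to include\n    3.3 Method ------- overall framework + three to four contributions\n    3.3.0 Writing logic and approach\n    3.3.1 Figures (from beginner to near top-conference quality)\n    3.3.2 Equations (how to write equations, etc.)\n    3.3.3 Describing the contributions in text (there is a quick way and a thorough way)\n    3.4 Experiments\n    3.4.0 Writing logic and approach\n    3.4.1 Tables (which experiments to run, which metrics to report; giving you the fish and teaching you to fish)\n    3.5 Abstract and Conclusion\n    3.5.0 Writing logic and approach\n    3.5.1 Getting the summary sections right in one pass\n    3.6 References\n    3.6.1 How to cite; citation formats\n    4. Choosing where to submit (conference or journal)\n    4.1 EI papers\n    4.2 CCF papers\n    4.3 SCI papers --- how to shortlist the journals that suit your work\n    4.4 Chinese core journals, PKU core journals, or university journals \n    5. 
Paper standards\n    5.1 Aesthetics and formatting standards \n    5.2 Rigorous logic\n    5.3 Persuasive writing\n    5.4 Pre-review before submission\n    6. Distinctive tips and efficiency techniques (small tips are mixed in at random during lectures without much explanation; those who know will know)\n    7. Preparation before submission: checking, by journal tier, whether the paper meets publication requirements (one-on-one scope)\n    8. Writing a master's thesis\n    9. More to come........"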
  },
  {
    "path": "yolo-improve/readme.md",
    "content": "# YOLO-Improve\nThis project collects ideas for improving YOLO-series models. Results vary with the dataset and hyperparameters, so treat them as a reference only.  \n\n\n# Explanation\n- **iou**  \n    Adds EIOU, SIOU, ALPHA-IOU, FocalEIOU, and Wise-IOU to box_iou in yolov5 and yolov8.  \n    1. yolov5\n        Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1KM411b7Sz/).  \n        Blog post: [CSDN](https://blog.csdn.net/qq_37706472/article/details/128737484?spm=1001.2014.3001.5501).\n\n        #### 2023-2-8 update: added [Wise-IoU](https://arxiv.org/abs/2301.10051). Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1tG4y1N7Gk/). reference: [github](https://github.com/Instinct323/wiou)  \n    2. yolov8\n        Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1PY4y1o7Hm/).  \n        Blog post: [CSDN](https://blog.csdn.net/qq_37706472/article/details/128743012?spm=1001.2014.3001.5502).\n\n        #### 2023-2-7 update: added [Wise-IoU](https://arxiv.org/abs/2301.10051). Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1De4y1N7Mb/). reference: [github](https://github.com/Instinct323/wiou)   \n- **yolov5-GFPN**   \n    Replaces the Head in YOLOV5 with the GFPN from DAMO-YOLO.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1iR4y1a7bx/).  \n- **yolov5-C2F**  \n    Replaces the C3 module in yolov5 with the C2F module from yolov8. (The change is simple, so no code is provided; just follow the video.)  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1rx4y1g7xt/).  \n- **yolov7-iou**  \n    Adds EIOU, SIOU, ALPHA-IOU, FocalEIOU, and Wise-IOU to box_iou in yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1zx4y177EF/).  \n    Blog post: [CSDN](https://blog.csdn.net/qq_37706472/article/details/128780275?spm=1001.2014.3001.5502).  \n    #### 2023-2-11 update: added [Wise-IoU](https://arxiv.org/abs/2301.10051). Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1yv4y147kf/). reference: [github](https://github.com/Instinct323/wiou)  \n- **yolov5-OTA**  \n    Adds Optimal Transport Assignment to the yolov5 loss.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1xD4y1J76n/).  \n- **yolov5-DCN**  \n    Adds Deformable convolution V2 to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1rT411Q76q/).  
\n- **yolov8-DCN**  \n    Adds Deformable convolution V2 to yolov8.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Fo4y1i7Mm/).  \n- **yolov7-DCN**  \n    Adds Deformable convolution V2 to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV17R4y1q7vr/).  \n- **yolov5-AUX**  \n    Adds an auxiliary training branch to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Fo4y1v7bi/).  \n    Background reading: [Zhihu](https://zhuanlan.zhihu.com/p/588947172)\n- **CAM**  \n    Adds the context augmentation module to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV17b411d7ef/).  \n    paper: [link](https://openreview.net/pdf?id=q2ZaVU6bEsT)\n- **yolov5-SAConv**  \n    Adds SAC to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1xD4y1u7NU/).  \n    paper: [link](https://arxiv.org/pdf/2006.02334.pdf)  \n    reference: [link](https://github.com/joe-siyuan-qiao/DetectoRS)\n- **yolov7-SAConv**  \n    Adds SAC to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1xD4y1u7NU/).  \n    paper: [link](https://arxiv.org/pdf/2006.02334.pdf)  \n    reference: [link](https://github.com/joe-siyuan-qiao/DetectoRS)\n- **yolov5-CoordConv**  \n    Adds CoordConv to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1ng4y1E7rS/).   \n    reference: [link](https://blog.csdn.net/qq_35608277/article/details/125257225)\n- **yolov5-soft-nms**  \n    Adds soft-nms (IoU, GIoU, DIoU, CIoU, EIoU, SIoU) to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1cM41147Ry/).  \n- **yolov7-CoordConv**  \n    Adds CoordConv to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1K54y1g7ye/).   \n    reference: [link](https://blog.csdn.net/qq_35608277/article/details/125257225)\n- **yolov7-soft-nms**  \n    Adds soft-nms (IoU, GIoU, DIoU, CIoU, EIoU, SIoU) to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1ZY41167iC/). \n- **yolov5-DSConv**  \n    Adds DSConv to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1iT411a7Mi/).   
 \n    paper: [link](https://arxiv.org/abs/1901.01928)  \n    reference: [link](https://github.com/ActiveVisionLab/DSConv)\n- **yolov7-DSConv**  \n    Adds DSConv to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1724y1b7PD/).   \n    paper: [link](https://arxiv.org/abs/1901.01928)  \n    reference: [link](https://github.com/ActiveVisionLab/DSConv)\n- **yolov5-DCNV3**  \n    Adds DCNV3 to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1LY411z7iE/).   \n    Supplementary notes, video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Dv4y1j7ij/).   \n    paper: [link](https://arxiv.org/abs/2211.05778)  \n    reference: [link](https://github.com/OpenGVLab/InternImage)  \n- **yolov5-NWD**  \n    Adds Normalized Gaussian Wasserstein Distance to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1zY4y197UP/).   \n    paper: [link](https://arxiv.org/abs/2110.13389)  \n    reference: [link](https://github.com/jwwangchn/NWD)  \n- **yolov7-DCNV3**  \n    Adds DCNV3 to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1mk4y1h7us/).   \n    paper: [link](https://arxiv.org/abs/2211.05778)  \n    reference: [link](https://github.com/OpenGVLab/InternImage) \n- **yolov5-DecoupledHead**  \n    Adds Efficient-DecoupledHead to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1mk4y1h7us/).   \n    paper: [yolov6 link](https://arxiv.org/pdf/2301.05586.pdf)  \n    reference: [link](https://github.com/meituan/YOLOv6/blob/main/yolov6/models/effidehead.py) \n- **yolov5-FasterBlock**  \n    Adds the Faster-Block from FasterNet to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Bs4y1H7Ph/).   \n    paper: [link](https://arxiv.org/abs/2303.03667)  \n    reference: [link](https://github.com/JierunChen/FasterNet) \n- **yolov7-NWD**  \n    Adds Normalized Gaussian Wasserstein Distance to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1kM411H7g1/).   
 \n    paper: [link](https://arxiv.org/abs/2110.13389)  \n    reference: [link](https://github.com/jwwangchn/NWD)\n- **yolov7-DecoupledHead**  \n    Adds an Efficient-DecoupledHead with implicit knowledge learning to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1tg4y1x7ha/).   \n    paper: [yolov6 link](https://arxiv.org/pdf/2301.05586.pdf) [yolor link](https://arxiv.org/abs/2105.04206) [yolor reference blog](https://blog.csdn.net/AaronYKing/article/details/123804988)  \n    reference: [link](https://github.com/meituan/YOLOv6/blob/main/yolov6/models/effidehead.py) \n- **yolov5-backbone**  \n    Adds timm-supported backbones to yolov5.  \n    Requires the timm library. Command: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple timm  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Mx4y1A7jy/).   \n    reference: [link](https://github.com/huggingface/pytorch-image-models#:~:text=I%20missed%20anything.-,Models,-All%20model%20architecture)\n- **yolov7-PConv**  \n    Adds the PConv from FasterNet to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Z84y137oi/).   \n    paper: [link](https://arxiv.org/abs/2303.03667)  \n    reference: [link](https://github.com/JierunChen/FasterNet) \n- **yolov5-TSCODE**  \n    Adds Task-Specific Context Decoupling to yolov5.  \n    Requires the einops library. Command: pip install -i https://pypi.tuna.tsinghua.edu.cn/simple einops  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1mk4y1h7us/).   \n    paper: [link](https://arxiv.org/pdf/2303.01047v1.pdf)  \n- **yolov5-backbone/fasternet**  \n    Adds the FasterNet backbone to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1ra4y1K77u/).   \n    reference: [link](https://github.com/JierunChen/FasterNet)\n- **yolov5-backbone/ODConv**  \n    Adds the Omni-Dimensional Dynamic Convolution backbones (od_mobilenetv2, od_resnet) to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Jk4y1v7EW/).   \n    reference: [link](https://github.com/OSVAI/ODConv)  \n- **yolov5-backbone/ODConvFuse**  \n    Fuses the Conv and BN layers in the Omni-Dimensional Dynamic Convolution backbones (od_mobilenetv2, od_resnet).  
 \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Rs4y1N7fp/).   \n- **yolov5-CARAFE**  \n    Adds the lightweight upsampling operator CARAFE to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1kj411c72a/).  [yolov7 modification video on Bilibili](https://www.bilibili.com/video/BV1yc411p7wL/).  \n    reference: [link](https://github.com/XiaLiPKU/CARAFE)  \n- **yolov5-EVC**  \n    Adds the EVC-Block from CFPNet to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Pg4y1u7cM/).  \n    reference: [link](https://github.com/QY1994-0919/CFPNet)  \n- **yolov5-dyhead**  \n    Adds the attention-based detection head (DYHEAD) to yolov5.  \n    yolov7 version: [Bilibili](https://www.bilibili.com/video/BV1Ph4y1s7i9/).  \n    Install commands:\n\n        pip install -U openmim\n        mim install mmengine\n        mim install \"mmcv>=2.0.0\"\n\n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1qs4y117Mx/).  \n    reference: [link](https://github.com/open-mmlab/mmdetection/blob/main/mmdet/models/necks/dyhead.py)  \n    paper: [link](https://arxiv.org/abs/2106.08322)  \n- **yolov5-backbone/inceptionnext**  \n    Adds the InceptionNeXt backbone (new in 2023) to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV12v4y1H7E1/).   \n    reference: [link](https://github.com/sail-sg/inceptionnext)  \n    paper: [link](https://arxiv.org/pdf/2303.16900.pdf)  \n- **yolov5-aLRPLoss**  \n    Adds aLRPLoss to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1YV4y1Z7rV/).     \n    reference: [link](https://github.com/kemaloksuz/aLRPLoss)  \n    paper: [link](https://arxiv.org/abs/2009.13592)  \n- **yolov5-res2block**  \n    Builds a C3 module with multi-scale feature extraction based on Res2Net.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV13X4y167VB/).     \n    reference: [link](https://github.com/Res2Net/Res2Net-PretrainedModels)  \n    paper: [link](https://arxiv.org/pdf/1904.01169.pdf)  \n- **yolov7-odconv**  \n    Adds Omni-Dimensional Dynamic Convolution to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1vh411j71Z/).     
 \n    reference: [link](https://github.com/OSVAI/ODConv)  \n- **yolov5-backbone/FocalNet**  \n    Adds the FocalNet (2022, transformer) backbone to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1ch411L7Dk/).   \n    reference: [link](https://github.com/microsoft/FocalNet)  \n    paper: [link](https://arxiv.org/abs/2203.11926)  \n- **yolov5-backbone/EMO**  \n    Adds the EMO (2023, transformer) backbone to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Dh4y1J7SV/).   \n    reference: [link](https://github.com/zhangzjn/EMO)  \n    paper: [link](https://arxiv.org/pdf/2301.01146.pdf)  \n- **yolov5-backbone/EfficientFormerV2**  \n    Adds the EfficientFormerV2 (2022, transformer) backbone to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1da4y1g7KT/).   \n    reference: [link](https://github.com/snap-research/EfficientFormer)  \n    paper: [link](https://arxiv.org/pdf/2212.08059.pdf)  \n    weight_download: [Baidu Netdisk link](https://pan.baidu.com/s/1I0Ygc3-6fNf2LdIJe290kw?pwd=yvc8)\n- **yolov5-backbone/PoolFormer**  \n    Adds the PoolFormer (CVPR 2022, transformer) backbone to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1eh411c7bz/).   \n    reference: [link](https://github.com/sail-sg/poolformer)  \n    paper: [link](https://arxiv.org/abs/2111.11418)  \n- **yolov5-backbone/EfficientViT**  \n    Adds the EfficientViT (2023, transformer) backbone to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1xk4y1L7Gu/).   \n    reference: [link](https://github.com/mit-han-lab/efficientvit)  \n    paper: [link](https://arxiv.org/abs/2205.14756)  \n    weight_download: [Baidu Netdisk link](https://pan.baidu.com/s/1dvwuQQBnRCr7aGReY8IEVw?pwd=74ad)\n- **yolov5-ContextAggregation**  \n    Adds ContextAggregation to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1Yk4y1s7Kx/).     \n    reference: [link](https://github.com/yeliudev/CATNet)  \n    paper: [link](https://arxiv.org/abs/2111.11057)  \n- **yolov5-backbone/VanillaNet**  \n    Adds the VanillaNet (2023) backbone to yolov5.  
 \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1os4y1v7Du/).   \n    reference: [link](https://github.com/huawei-noah/VanillaNet)  \n    paper: [link](https://arxiv.org/abs/2305.12972)  \n    weight_download: [Baidu Netdisk link](https://pan.baidu.com/s/1EBAiOtDVMhvQqu2NWoFSIg?pwd=ofx9)  \n- **yolov7-EVC**  \n    Adds the EVC-Block from CFPNet to yolov7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV12u4y1f7np/).  \n    reference: [link](https://github.com/QY1994-0919/CFPNet)  \n- **yolov7-head**  \n    Adds P2 and P6 detection layers to YOLOV7.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1LX4y1a72m/).  \n- **yolov7-slimneck**  \n    Slims down the yolov7 Neck with VOVGSCSP.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV14m4y147PC/).  \n    reference: [link](https://github.com/AlanLi1997/slim-neck-by-gsconv)  \n- **yolov5-SwinTransformer**  \n    Adds the SwinTransformer-Tiny backbone to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1WX4y1a7ea/).  \n    reference: [link](https://github.com/microsoft/Swin-Transformer)  \n    weight_download: [SwinTransformer-Tiny Baidu Cloud link](https://pan.baidu.com/s/1vct0VYwwQQ8PYkBjwSSBZQ?pwd=swin)  \n- **yolov5-NextViT**  \n    Adds the NextViT (2022) backbone to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1im4y1i7Ht/).  \n    reference: [link](https://github.com/bytedance/Next-ViT)  \n    weight_download: [Baidu Cloud link](https://pan.baidu.com/s/18IHKssf9kN8Ej7zIWBKfcw?pwd=houj)  \n- **yolov5-ConvNextV2**  \n    Adds the ConvNextV2 (2023) backbone to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1es4y1e7b9/).  \n    reference: [link](https://github.com/facebookresearch/ConvNeXt-V2)  \n- **yolov5-RIFormer**  \n    Adds the RIFormer (2023) backbone to yolov5.  \n    Video tutorial: [Bilibili](https://www.bilibili.com/video/BV1bW4y1X7Lo/).  
\n    reference: [mmpretrain链接](https://github.com/open-mmlab/mmpretrain/blob/main/mmpretrain/models/backbones/riformer.py)  \n    weight_download: [mmpretrain链接](https://github.com/open-mmlab/mmpretrain/tree/main/configs/riformer)\n- **yolov5-C3RFEM**  \n    Scale-Aware RFE与C3结合而成的C3RFEM添加到yolov5中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1Gj411D7Pf/).  \n    reference: [链接](https://github.com/Krasjet-Yu/YOLO-FaceV2)  \n- **yolov7-RFEM**  \n    Scale-Aware RFE添加到yolov7中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1hW4y1D7gQ/).  \n    reference: [链接](https://github.com/Krasjet-Yu/YOLO-FaceV2)  \n- **yolov5-DBB**  \n    把重参数结构DiverseBranchBlock与C3融合成C3-DBB添加到yolov5中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1sM4y177Cn/).  \n    reference: [链接](https://github.com/DingXiaoH/DiverseBranchBlock)  \n- **yolov7-DBB**  \n    把重参数结构DiverseBranchBlock添加到yolov7中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV14u411b7kL/).  \n    reference: [链接](https://github.com/DingXiaoH/DiverseBranchBlock)  \n- **yolov5-backbone/CVPR2023-EfficientViT**  \n    添加(2023CVPR)EfficientViT(transformer)主干到yolov5中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1xk4y1L7Gu/).   \n    reference: [链接](https://github.com/microsoft/Cream/tree/main/EfficientViT)  \n    paper: [链接](https://arxiv.org/pdf/2305.07027.pdf)  \n    weight: [github链接](https://github.com/xinyuliu-jeffrey/EfficientViT_Model_Zoo/releases/tag/v1.0)\n- **yolov5-backbone/LSKNet**  \n    添加(2023旋转目标检测SOTA)LSKNet主干到yolov5中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1xk4y1L7Gu/).   \n    reference: [链接](https://github.com/zcablii/LSKNet)  \n    paper: [链接](https://arxiv.org/pdf/2303.09030.pdf)  \n- **yolov5-MPDiou**  \n    添加(2023最新IoU度量算法)MPDiou到yolov5中.(视频教学地址中为详细从头手把手教学,因此本项没有提供代码)  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV19P41147gJ/).   \n    paper: [链接](https://arxiv.org/pdf/2307.07662v1.pdf)  \n- **yolov7-MPDiou**  \n    添加(2023最新IoU度量算法)MPDiou到yolov7中.  
\n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1Qh4y1r7D3/).   \n    paper: [链接](https://arxiv.org/pdf/2307.07662v1.pdf)  \n- **yolov5-SlideLoss**  \n    添加Yolo-Face-V2中SlideLoss的到yolov5中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1W14y1i79U/).    \n    reference: [链接](https://github.com/Krasjet-Yu/YOLO-FaceV2/blob/master/utils/loss.py)  \n    paper: [链接](https://arxiv.org/abs/2208.02019)  \n- **yolov5-backbone/CVPR2023-RepViT**  \n    添加RepViT(transformer)主干到yolov5中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1PH4y1S7mf/).   \n    reference: [链接](https://github.com/THU-MIG/RepViT)  \n    paper: [链接](https://arxiv.org/abs/2307.09283)  \n- **yolov5-GOLDYOLO**  \n    利用华为2023最新GOLD-YOLO中的Gatherand-Distribute进行改进YOLOV5中的特征融合模块.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1PH4y1S7mf/).   \n    reference: [链接](https://github.com/huawei-noah/Efficient-Computing/tree/master/Detection/Gold-YOLO)  \n    paper: [链接](https://arxiv.org/abs/2309.11331)  \n- **yolov7-GOLDYOLO(文件在yolov5-GOLDYOLO的文件夹中)**  \n    利用华为2023最新GOLD-YOLO中的Gatherand-Distribute进行改进YOLOV7中的特征融合模块.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV14V411c7H1/).   \n    reference: [链接](https://github.com/huawei-noah/Efficient-Computing/tree/master/Detection/Gold-YOLO)  \n    paper: [链接](https://arxiv.org/abs/2309.11331)  \n- **yolov5-DySnakeConv**  \n    利用动态蛇形卷积改进YOLOV5.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1Qu411K7Hw/).   \n    reference: [链接](https://github.com/YaoleiQi/DSCNet)  \n    paper: [链接](https://arxiv.org/abs/2307.08388)  \n- **yolov7-DySnakeConv**  \n    利用动态蛇形卷积改进YOLOV7.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1Wj411x7fq/).   \n    reference: [链接](https://github.com/YaoleiQi/DSCNet)  \n    paper: [链接](https://arxiv.org/abs/2307.08388)  \n- **yolov5-AIFI**  \n    利用带有位置信息编码的AIFI自注意力机制改进YOLOV5.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1nu4y1h7eS/).   
\n    reference: [链接](https://github.com/lyuwenyu/RT-DETR)  \n    paper: [链接](https://arxiv.org/pdf/2304.08069.pdf)  \n- **yolov7-AIFI**  \n    利用带有位置信息编码的AIFI自注意力机制改进YOLOV7.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1rj411a7s4/).   \n    reference: [链接](https://github.com/lyuwenyu/RT-DETR)  \n    paper: [链接](https://arxiv.org/pdf/2304.08069.pdf)  \n- **yolov5-backbone/UniRepLKNet**  \n    添加UniRepLKNet主干到yolov5中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1PH4y1S7mf/).   \n    reference: [链接](https://github.com/AILab-CVC/UniRepLKNet)  \n    paper: [链接](https://arxiv.org/abs/2311.15599)  \n    weights-download: [百度云链接](https://pan.baidu.com/s/1Gk48Xa6cWKAVJgsF5cqk1g?pwd=b55v)\n- **yolov5-asf** \n    添加Attentional Scale Sequence Fusion到yolov5中.\n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1kN411V7VZ/).   \n    reference: [链接](https://github.com/mkang315/ASF-YOLO)  \n    paper: [链接](https://arxiv.org/abs/2312.06458)  \n- **yolov5-ccfm**\n    添加cross-scale feature-fusion到yolov5中.\n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1Tb4y1P7yd/).   \n    reference: [链接](https://github.com/ultralytics/ultralytics)  \n    paper: [链接](https://arxiv.org/pdf/2304.08069.pdf)  \n- **yolov7-asf** \n    添加Attentional Scale Sequence Fusion到yolov7中.\n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1PH4y1S7mf/).   \n    reference: [链接](https://github.com/mkang315/ASF-YOLO)  \n    paper: [链接](https://arxiv.org/abs/2312.06458)  \n- **yolov5-RepNCSPELAN**\n    添加yolov9中的RepNCSPELAN到yolov5中.\n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV17y421z73k/).   \n    reference: [链接](https://github.com/WongKinYiu/yolov9)  \n    paper: [链接](https://arxiv.org/abs/2402.13616)\n- **yolov7-RepNCSPELAN**\n    添加yolov9中的RepNCSPELAN到yolov7中.\n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1UA4m137hz/).   
\n    reference: [链接](https://github.com/WongKinYiu/yolov9)  \n    paper: [链接](https://arxiv.org/abs/2402.13616)\n- **yolov9-backbone**  \n    添加各种backbone到yolov9中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1Ax4y1B7Ln/).   \n- **yolov5-backbone/CVPR2024-StarNet**  \n    添加CVPR2024-StarNet到yolov5、yolov7、yolov9中.  \n    视频教学地址：[哔哩哔哩](https://www.bilibili.com/video/BV1Ax4y1B7Ln/).   "
  },
  {
    "path": "yolo-improve/rtdetr-compress.md",
    "content": "# RTDETR剪枝项目介绍\n\n## 对于群里的剪枝相关问题,我基本都会回复,对于一些剪枝问题,我都会给出建议。  \n\n### 首先剪枝是什么？  \n模型剪枝是深度学习中的一种技术，旨在通过减少神经网络中不必要的参数和连接，来优化模型的效率和性能。模型剪枝可以分为结构剪枝和参数剪枝两种类型。  \n\n### 为什么需要剪枝？  \n剪枝可以很好地衡量模型轻量化程度与精度的关系,是替换轻量化结构完全没办法比的,比如我模型剪枝可以压缩百分之30的计算量,精度只下降了百分之1,但是你通过换模块来达到压缩百分之30的计算量,一般时间就会变长,因为大部分轻量化模块都是由时间换空间,而且精度还会下降得比较多,但是剪枝可以很好地避免这个问题.\n\n### 目前剪枝项目包含以下剪枝方法：\n1. L1 \n2. Random \n3. Slim(需要稀疏训练)\n4. GroupSlim(需要稀疏训练)\n5. GroupNorm \n6. LAMP \n7. GroupSL(需要稀疏训练)\n8. GroupReg(需要稀疏训练)\n9. GroupHessian\n10. GroupTaylor\n\n# 对于RTDETR模型，稀疏训练比较难成功，就算能稀疏到模型，掉的精度都比较多，所以我不建议各位使用需要稀疏训练的方法去剪枝，本身RTDETR的训练速度就比较慢，稀疏训练会更加慢一点，所以买剪枝的目的之一一定要需要稀疏训练的方法，那你慎入！！！！！\n\n### 其中prune系列还有一些细节：\n1. 支持设定加速比例，模型会进行自动压缩，压缩到指定比例或者达到最大压缩次数后会自动进入finetune。\n\n### 剪枝的一些顾虑\n大家关心最多的一个问题就是，我的结构能不能剪之类的，剪枝对模型复杂度的要求比较高，目前剪枝都是基于Torch_Pruning库进行剪枝，prune系列的可以跳过一些不能剪枝的层(某些复杂的结构可能在构建动态图的时候失败,这些就只能换结构)，这个项目会有比较多的示例和视频教程教大家如何去剪自己的结构,注意点在哪里等等。这个剪枝项目是没办法保证所有的结构都能剪，有一定的风险，是否入手请自行考虑！\n\n### 那些人群建议入手剪枝\n1. 原始的算法精度很高,没办法再提升精度,只能走轻量化路线,这种建议配合一些轻量化模块+剪枝来增加你的工作量和创新度.\n2. 需要部署到嵌入式或者手机端等低算力设备,这类本身模型就不能太复杂,而且以轻量化为主,剪枝是非常适合的.\n3. 
You plan to work in deep learning, where model compression (distillation, quantization, pruning) is essentially a required skill.\n\n### RTDETR experiments GPU-Device:RTX4090D (Model Size is shown as x because the values I recorded at the time contained errors, so they are omitted)\n#### Dataset:VisDrone2019 Model:RTDETR-R18\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:8) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 19,884,600 | 57.0 | x | 0.377 | 0.219 | 0.00305s |\n| LAMP exp1 | 13,458,528(67.7%) | 36.6(64.2%) | x | 0.356(-0.021) | 0.205(-0.014) | 0.00247s(81%) |\n| LAMP exp2 | 12,279,364(61.7%) | 32.9(57.7%) | x | 0.347(-0.030) | 0.199(-0.020) | 0.00242s(79%) |\n| LAMP exp3 | 15,729,152(79.1%) | 43.6(76.5%) | x | 0.366(-0.011) | 0.211(-0.008) | 0.00277s(91%) |\n| LAMP exp4 | 14,321,866(72.0%) | 39.1(68.6%) | x | 0.363(-0.014) | 0.21(-0.009) | 0.00260s(85%) |\n\n#### Dataset:CrowdHuman Model:RTDETR-R18\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:8) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 19,874,328 | 56.9 | x | 0.848 | 0.552 | 0.00306s |\n| LAMP exp1 | 14,311,594(72.0%) | 39.1(68.7%) | x | 0.837(-0.011) | 0.543(-0.009) | 0.00259s(85%) |\n\n#### Dataset:Seaship 20%Training Data Model:RTDETR-R18\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:8) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 19,879,464 | 57.0 | x | 0.951 | 0.73 | 0.00304s |\n| LAMP | 7,091,768(35.7%) | 32.1(56.3%) | x | 0.934(-0.017) | 0.73(+0.000) | 0.00239s(79%) |\n| L1 | 7,712,000(38.8%) | 33.1(58.1%) | x | 0.935(-0.016) | 0.739(+0.009) | 0.00239s(79%) |\n| GROUP_TAYLOR | 13,160,368(66.2%) | 31.9(55.9%) | x | 0.942(-0.009) | 0.734(+0.004) | 0.00212s(70%) |\n| GROUP_NORM | 9,752,072(49.0%) | 31.7(55.6%) | x | 0.951(0.000) | 0.74(+0.010) | 0.00228s(75%) |\n| GROUP_HESSIAN | 11,405,392(57.4%) | 31.5(55.3%) | x | 0.94(-0.011) | 0.746(+0.016) | 0.00225s(74%) |"
  },
  {
    "path": "yolo-improve/rtdetr-distill.md",
    "content": "# RTDETR蒸馏项目介绍\n\n### 首先蒸馏是什么？  \n模型蒸馏（Model Distillation）是一种用于在计算机视觉中提高模型性能和效率的技术。在模型蒸馏中，通常存在两个模型，即“教师模型”和“学生模型”。\n\n### 为什么需要蒸馏？  \n1. 在不增加模型计算量和参数量的情况下提升精度，也即是可以无损提高精度。\n2. 论文中的保底手段，因为蒸馏的特殊性，其都不会增加参数量和计算量，可以在最后一个点上大幅度增加实验和工作量，因为本身蒸馏也需要做大量实验。\n3. 如果在模型改进过程中进行了轻量化，但是精度降低得有点多，可以尝试使用知识蒸馏来弥补轻量化带来的精度丢失问题。\n\n### 目前蒸馏方法包含：\n1. Logical\n    1. RTDETRLogicLoss(根据rtdetr的特点进行开发的逻辑蒸馏)\n    2. RTDETRMutilLogicLoss(根据rtdetr的特点进行开发的逻辑蒸馏)\n2. Feature\n    1. [Mimic](https://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Mimicking_Very_Efficient_CVPR_2017_paper.pdf)\n    2. [Masked Generative Distillation](https://link.zhihu.com/?target=https%3A//arxiv.org/pdf/2205.01529.pdf) (ECCV 2022)\n    3. [Channel-wise Distillation](https://arxiv.org/pdf/2011.13256.pdf) (ICCV 2021)\n    4. [ChSimLoss Distillation](https://openaccess.thecvf.com/content/ICCV2021/html/Liu_Exploring_Inter-Channel_Correlation_for_Diversity-Preserved_Knowledge_Distillation_ICCV_2021_paper.html) (ICCV2021)\n    5. [SPKDLoss Distillation](https://arxiv.org/pdf/1907.09682.pdf) (ICCV2019)\n\n### 知识蒸馏的一些细节(具体项目会提供视频讲解)\n1. Feature蒸馏可以自定义选择层进行蒸馏.\n2. 蒸馏损失支持常数,线性,余弦进行动调整.\n3. 支持Logical和Feature一起使用.\n4. 过程中会输出Logical和Feature的损失,让用户可以及时调整对应的损失系数.\n5. 支持正常训练模型时候进行蒸馏和剪枝后finetune蒸馏.\n6. 支持自蒸馏.\n7. 
Knowledge distillation can be used to compress the model.\n\n# Example experiment results (the commands, video tutorials, and experiment data for the examples below are all included in the project)\n#### Dataset:Visdrone (only 2,500 training images were used; the full validation and test sets were used). To speed up the experiments, yolov8s-detr is the teacher and yolov8n-detr is the student.\n\n| model | GFLOPs | mAP50(test set) | mAP50-95(test set) |\n| :----: | :----: | :----: | :----: |\n| yolov8n-detr | 11.7 | 0.266 | 0.146 |\n| yolov8s-detr | 27.3 | 0.286 | 0.161 |\n| yolov8n-detr logloss exp1 | 11.7 | 0.272(+0.006) | 0.153(+0.007) |\n| yolov8n-detr logloss exp2 | 11.7 | 0.278(+0.012) | 0.157(+0.011) |\n| yolov8n-detr logloss exp3 | 11.7 | 0.271(+0.005) | 0.154(+0.008) |\n| yolov8n-detr logloss exp4 | 11.7 | 0.282(+0.016) | 0.160(+0.014) |\n| yolov8n-detr cwd exp1 | 11.7 | 0.255(-0.011) | 0.139(-0.007) |\n| yolov8n-detr cwd exp2 | 11.7 | 0.267(+0.001) | 0.148(+0.002) |\n| yolov8n-detr cwd exp3 | 11.7 | 0.268(+0.002) | 0.149(+0.003) |\n| yolov8n-detr cwd exp4 | 11.7 | 0.261(-0.005) | 0.146(0.000) |\n| yolov8n-detr cwd exp5 | 11.7 | 0.266(0.000) | 0.147(+0.001) |\n| yolov8n-detr cwd exp6 | 11.7 | 0.264(-0.002) | 0.146(0.000) |\n| yolov8n-detr cwd exp7 | 11.7 | 0.260(-0.006) | 0.144(-0.002) |\n| yolov8n-detr cwd exp8 | 11.7 | 0.268(+0.002) | 0.148(+0.002) |\n| yolov8n-detr cwd exp9 | 11.7 | 0.269(+0.003) | 0.149(+0.003) |\n| yolov8n-detr cwd exp10 | 11.7 | 0.267(+0.001) | 0.147(+0.001) |\n| yolov8n-detr cwd exp11 | 11.7 | 0.257(-0.009) | 0.141(-0.005) |\n| yolov8n-detr mgd exp1 | 11.7 | 0.271(+0.005) | 0.152(+0.006) |\n| yolov8n-detr mgd exp2 | 11.7 | 0.265(-0.001) | 0.148(+0.002) |\n| yolov8n-detr mgd exp3 | 11.7 | 0.269(+0.003) | 0.150(+0.004) |\n| yolov8n-detr mgd exp4 | 11.7 | 0.265(-0.001) | 0.147(+0.001) |\n| yolov8n-detr mgd exp5 | 11.7 | 0.264(-0.002) | 0.146(0.000) |\n| yolov8n-detr mgd exp6 | 11.7 | 0.270(+0.004) | 0.151(+0.005) |\n| yolov8n-detr mgd exp7 | 11.7 | 0.260(-0.006) | 0.145(-0.001) |\n| yolov8n-detr mgd exp8 | 11.7 | 0.271(+0.005) | 0.152(+0.006) |\n| yolov8n-detr shsim exp1 | 11.7 | 0.264(-0.002) | 0.147(+0.001) |\n| yolov8n-detr shsim exp2 | 11.7 | 0.266(0.000) | 0.148(+0.002) 
|\n| yolov8n-detr shsim exp3 | 11.7 | 0.260(-0.006) | 0.143(-0.003) |\n| yolov8n-detr spkd exp1 | 11.7 | 0.259(-0.007) | 0.143(-0.003) |\n| yolov8n-detr spkd exp2 | 11.7 | 0.256(-0.010) | 0.142(-0.004) |\n| yolov8n-detr spkd exp3 | 11.7 | 0.262(-0.004) | 0.145(-0.001) |\n| yolov8n-detr logloss-mgd exp1 | 11.7 | 0.277(+0.011) | 0.157(+0.011) |\n| yolov8n-detr logloss-cwd exp1 | 11.7 | 0.274(+0.008) | 0.151(+0.005) |\n| yolov8n-detr logloss-cwd exp2 | 11.7 | 0.272(+0.006) | 0.153(+0.007) |"
  },
  {
    "path": "yolo-improve/rtdetr-project.md",
    "content": "# [基于Ultralytics的RT-DETR改进详细介绍](https://github.com/z1069614715/objectdetection_script)\n\n# 目前自带的一些改进方案(目前拥有合计320+个改进点！持续更新！)\n\n# 为了感谢各位对RTDETR项目的支持,本项目的赠品是yolov5-PAGCP通道剪枝算法.[具体使用教程](https://www.bilibili.com/video/BV1yh4y1Z7vz/)\n\n# 自带的一些文件说明\n1. train.py\n    训练模型的脚本\n2. main_profile.py\n    输出模型和模型每一层的参数,计算量的脚本(rtdetr-l和rtdetr-x因为thop库的问题,没办法正常输出每一层的参数和计算量和时间)\n3. val.py\n    使用训练好的模型计算指标的脚本\n4. detect.py\n    推理的脚本\n5. track.py\n    跟踪推理的脚本\n6. heatmap.py\n    生成热力图的脚本\n7. get_FPS.py\n    计算模型储存大小、模型推理时间、FPS的脚本\n8. get_COCO_metrice.py\n    计算COCO指标的脚本\n9. plot_result.py\n    绘制曲线对比图的脚本\n10. get_model_erf.py\n    绘制模型的有效感受野.[视频链接](https://www.bilibili.com/video/BV1Gx4y1v7ZZ/)\n11. export.py\n    导出模型脚本\n12. test_env.py\n    验证一些需要编译的或者难安装的(mmcv)是否成功的代码.[百度云链接](https://pan.baidu.com/s/1sWwvN4UC3blBRVe1twrJAg?pwd=bru5)\n13. get_all_yaml_param_and_flops.py\n    计算所有yaml的计算量并排序.[百度云链接](https://pan.baidu.com/s/1ZDzglU7EIzzfaUDhAhagBA?pwd=kg8k)\n\n# RT-DETR基准模型\n\n1. ultralytics/cfg/models/rt-detr/rtdetr-r18.yaml(有预训练权重COCO+Objects365,来自RTDETR-Pytorch版本的移植)\n\n    rtdetr-r18 summary: 421 layers, 20184464 parameters, 20184464 gradients, 58.6 GFLOPs\n2. ultralytics/cfg/models/rt-detr/rtdetr-r34.yaml(有预训练权重COCO,来自RTDETR-Pytorch版本的移植)\n\n    rtdetr-r34 summary: 525 layers, 31441668 parameters, 31441668 gradients, 90.6 GFLOPs\n3. ultralytics/cfg/models/rt-detr/rtdetr-r50-m.yaml(有预训练权重COCO,来自RTDETR-Pytorch版本的移植)\n\n    rtdetr-r50-m summary: 637 layers, 36647020 parameters, 36647020 gradients, 98.3 GFLOPs\n4. ultralytics/cfg/models/rt-detr/rtdetr-r50.yaml(有预训练权重COCO+Objects365,来自RTDETR-Pytorch版本的移植)\n\n    rtdetr-r50 summary: 629 layers, 42944620 parameters, 42944620 gradients, 134.8 GFLOPs\n5. ultralytics/cfg/models/rt-detr/rtdetr-r101.yaml\n\n    rtdetr-r101 summary: 867 layers, 76661740 parameters, 76661740 gradients, 257.7 GFLOPs\n6. 
ultralytics/cfg/models/rt-detr/rtdetr-l.yaml(有预训练权重)\n\n    rtdetr-l summary: 673 layers, 32970732 parameters, 32970732 gradients, 108.3 GFLOPs\n7. ultralytics/cfg/models/rt-detr/rtdetr-x.yaml(有预训练权重)\n\n    rtdetr-x summary: 867 layers, 67468108 parameters, 67468108 gradients, 232.7 GFLOPs\n# 专栏改进汇总\n\n### 二次创新系列\n1. ultralytics/cfg/models/rt-detr/rtdetr-DCNV2-Dynamic.yaml\n\n    使用自研可变形卷积DCNV2-Dynamic改进resnet18-backbone中的BasicBlock.(详细介绍请看百度云视频-MPCA与DCNV2_Dynamic的说明)\n2. ultralytics/cfg/models/rt-detr/rtdetr-iRMB-Cascaded.yaml\n\n    使用[EfficientViT CVPR2023](https://github.com/microsoft/Cream/tree/main/EfficientViT)中的CascadedGroupAttention对[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB进行二次创新来改进resnet18-backbone中的BasicBlock.(详细介绍请看百度云视频-20231119更新说明)\n3. ultralytics/cfg/models/rt-detr/rtdetr-PConv-Rep.yaml\n\n    使用[RepVGG CVPR2021](https://github.com/DingXiaoH/RepVGG)中的RepConv对[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的PConv进行二次创新后改进resnet18-backbone中的BasicBlock.\n4. ultralytics/cfg/models/rt-detr/rtdetr-Faster-Rep.yaml\n\n    使用[RepVGG CVPR2021](https://github.com/DingXiaoH/RepVGG)中的RepConv对[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block进行二次创新后改进resnet18-backbone中的BasicBlock.\n5. ultralytics/cfg/models/rt-detr/rtdetr-Faster-EMA.yaml\n\n    使用[EMA ICASSP2023](https://arxiv.org/abs/2305.13563v1)对[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block进行二次创新后改进resnet18-backbone中的BasicBlock.\n6. ultralytics/cfg/models/rt-detr/rtdetr-Faster-Rep-EMA.yaml\n    \n    使用[RepVGG CVPR2021](https://github.com/DingXiaoH/RepVGG)中的RepConv和[EMA ICASSP2023](https://arxiv.org/abs/2305.13563v1)对[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block进行二次创新后改进resnet18-backbone中的BasicBlock.\n7. 
ultralytics/cfg/models/rt-detr/rtdetr-DWRC3-DRB.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock对[DWRSeg](https://arxiv.org/abs/2212.01173)中的Dilation-wise Residual(DWR)进行二次创新改进rtdetr.\n8. ultralytics/cfg/models/rt-detr/rtdetr-ASF-P2.yaml\n\n    在ultralytics/cfg/models/rt-detr/rtdetr-ASF.yaml的基础上进行二次创新，引入P2检测层并对网络结构进行优化.\n9. ultralytics/cfg/models/rt-detr/rtdetr-slimneck-ASF.yaml\n\n    使用[SlimNeck](https://github.com/AlanLi1997/slim-neck-by-gsconv)中的VoVGSCSP\\VoVGSCSPC和GSConv和[ASF-YOLO](https://github.com/mkang315/ASF-YOLO)中的Attentional Scale Sequence Fusion改进rtdetr中的CCFM.\n10. ultralytics/cfg/models/rt-detr/rtdetr-goldyolo-asf.yaml\n\n    利用华为2023最新GOLD-YOLO中的Gatherand-Distribute和[ASF-YOLO](https://github.com/mkang315/ASF-YOLO)中的Attentional Scale Sequence Fusion进行改进特征融合模块.\n11. ultralytics/cfg/models/rt-detr/rtdetr-HSPAN.yaml\n\n    对[MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR)中的HS-FPN进行二次创新后得到HSPAN改进RTDETR中的CCFM.\n12. ultralytics/cfg/models/rt-detr/rtdetr-ASF-Dynamic.yaml\n\n    使用[ICCV2023 DySample](https://arxiv.org/abs/2308.15085)改进[ASF-YOLO](https://github.com/mkang315/ASF-YOLO)中的Attentional Scale Sequence Fusion的上采样模块得到Dynamic Sample Attentional Scale Sequence Fusion改进CCFM.\n13. ultralytics/cfg/models/rt-detr/rtdetr-iRMB-DRB.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock对[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB进行二次创新来改进resnet18-backbone中的BasicBlock.\n14. ultralytics/cfg/models/rt-detr/rtdetr-iRMB-SWC.yaml\n\n    使用[shift-wise conv](https://arxiv.org/abs/2401.12736)对[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB进行二次创新来改进resnet18-backbone中的BasicBlock.\n15. ultralytics/cfg/models/rt-detr/rtdetr-DBBNCSPELAN.yaml\n\n    在rtdetr-RepNCSPELAN.yaml使用[Diverse Branch Block CVPR2021](https://arxiv.org/abs/2103.13425)进行二次创新.(详细介绍请看百度云视频-20240225更新说明)\n\n16. 
ultralytics/cfg/models/rt-detr/rtdetr-OREPANCSPELAN.yaml\n\n    在rtdetr-RepNCSPELAN.yaml使用[Online Convolutional Re-parameterization (CVPR2022)](https://github.com/JUGGHM/OREPA_CVPR2022/tree/main)进行二次创新.(详细介绍请看百度云视频-20240225更新说明)\n\n17. ultralytics/cfg/models/rt-detr/rtdetr-DRBNCSPELAN.yaml\n\n    在rtdetr-RepNCSPELAN.yaml使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock进行二次创新.(详细介绍请看百度云视频-20240225更新说明)\n\n18. ultralytics/cfg/models/rt-detr/rtdetr-Conv3XCNCSPELAN.yaml\n\n    在rtdetr-RepNCSPELAN.yaml使用[Swift Parameter-free Attention Network](https://github.com/hongyuanyu/SPAN/tree/main)中的Conv3XC进行二次创新.(详细介绍请看百度云视频-20240225更新说明)\n\n19. ultralytics/cfg/models/rt-detr/rtdetr-ELA-HSFPN.yaml\n\n    使用[Efficient Local Attention](https://arxiv.org/abs/2403.01123)改进HSFPN.\n\n20. ultralytics/cfg/models/rt-detr/rtdetr-CA-HSFPN.yaml\n\n    使用[Coordinate Attention CVPR2021](https://github.com/houqb/CoordAttention)改进HSFPN.\n\n21. ultralytics/cfg/models/rt-detr/rtdetr-RepNCSPELAN-CAA.yaml\n\n    使用[CVPR2024 PKINet](https://github.com/PKINet/PKINet)中的CAA模块改进RepNCSPELAN.\n\n22. ultralytics/cfg/models/rt-detr/rtdetr-CAA-HSFPN.yaml\n\n    使用[CVPR2024 PKINet](https://github.com/PKINet/PKINet)中的CAA模块HSFPN.\n\n23. ultralytics/cfg/models/rt-detr/rtdetr-CAFMFusion.yaml\n\n    利用具有[HCANet](https://github.com/summitgao/HCANet)中的CAFM，其具有获取全局和局部信息的注意力机制进行二次改进content-guided attention fusion.\n\n24. ultralytics/cfg/models/rt-detr/rtdetr-faster-CGLU.yaml\n\n    使用[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU对CVPR2023中的FasterNet进行二次创新.\n\n25. ultralytics/cfg/models/rt-detr/rtdetr-bifpn-GLSA.yaml\n\n    使用[GLSA](https://github.com/Barrett-python/DuAT)模块对bifpn进行二次创新.\n\n26. ultralytics/cfg/models/rt-detr/rtdetr-BIMAFPN.yaml\n\n    利用BIFPN的思想对[MAF-YOLO](https://arxiv.org/pdf/2407.04381)的MAFPN进行二次改进得到BIMAFPN.\n\n27. 
ultralytics/cfg/models/rt-detr/rtdetr-C2f-AddutuveBlock-CGLU.yaml\n\n    使用[CAS-ViT](https://github.com/Tianfang-Zhang/CAS-ViT)中的AdditiveBlock和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU和CSP思想改进backbone.\n\n28. ultralytics/cfg/models/rt-detr/rtdetr-C2f-MSMHSA-CGLU.yaml\n\n    使用[CMTFNet](https://github.com/DrWuHonglin/CMTFNet/tree/main)中的M2SA和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU改进c2f.\n\n29. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SHSA-CGLU.yaml\n\n    使用[SHViT CVPR2024](https://github.com/ysj9909/SHViT)中的SHSABlock与[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的CGLU和CSP思想改进backbone.\n\n30. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SMAFB-CGLU.yaml\n\n    使用[SMAFormer BIBM2024](https://github.com/CXH-Research/SMAFormer)中的SMAFormerBlock与[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的CGLU改进与CSP思想改进backbone.\n\n31. ultralytics/cfg/models/rt-detr/rtdetr-MAN-Faster.yaml\n\n    使用[Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804)中的 Mixed Aggregation Network和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block进行二次创新改进rtdetr.\n\n32. ultralytics/cfg/models/rt-detr/rtdetr-MAN-FasterCGLU.yaml\n\n    使用[Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804)中的 Mixed Aggregation Network和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU进行二次创新改进rtdetr.\n\n33. ultralytics/cfg/models/rt-detr/rtdetr-MAN-Star.yaml\n\n    使用[Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804)中的 Mixed Aggregation Network和[StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main)中的StarBlock进行二次创新改进rtdetr.\n\n34. ultralytics/cfg/models/rt-detr/rtdetr-MutilBackbone-MSGA.yaml\n\n    使用[MSA^2 Net](https://github.com/xmindflow/MSA-2Net)中的Multi-Scale Adaptive Spatial Attention Gate对自研系列MutilBackbone再次创新.\n\n35. 
ultralytics/cfg/models/rt-detr/rtdetr-slimneck-WFU.yaml\n\n    使用[ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN)中的Wavelet Feature Upgrade对slimneck二次创新.\n\n36. ultralytics/cfg/models/rt-detr/rtdetr-CDFA.yaml\n\n    使用[ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN)中的WaveletConv与[AAAI2025 ConDSeg](https://github.com/Mengqi-Lei/ConDSeg)的ContrastDrivenFeatureAggregation结合改进rtdetr.\n\n37. ultralytics/cfg/models/rt-detr/rtdetr-C2f-StripCGLU.yaml\n\n    使用[Strip R-CNN](https://arxiv.org/pdf/2501.03775)中的StripBlock和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU与CSP结合改进backbone.\n\n38. ultralytics/cfg/models/rt-detr/rtdetr-C2f-ELGCA-CGLU.yaml\n\n    使用[ELGC-Net](https://github.com/techmn/elgcnet)中的ELGCA和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU与CSP结合改进backbone.\n\n39. ultralytics/cfg/models/rt-detr/rtdetr-C2f-Faster-KAN.yaml\n\n    使用[ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat)中的KAN对(CVPR2023)fasternet中的FastetBlock进行二次创新.\n\n40. ultralytics/cfg/models/11/yolo11-C3k2-DIMB-KAN.yaml\n\n    在ultralytics/cfg/models/rt-detr/rtdetr-C2f-DIMB.yaml的基础上把mlp模块换成[ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat)中的KAN.\n\n41. ultralytics/cfg/models/rt-detr/rtdetr-C2f-EfficientVIM-CGLU.yaml\n\n    使用[CVPR2025 EfficientViM](https://github.com/mlvlab/EfficientViM)中的EfficientViMBlock和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU与CSP结合改进backbone.\n\n42. ultralytics/cfg/models/rt-detr/rtdetr-EUCB-SC.yaml\n\n    使用[CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD)中的EUCB和[CVPR2025 BHViT](https://github.com/IMRL/BHViT)中的ShiftChannelMix改进rtdetr-r18的上采样.\n\n43. ultralytics/cfg/models/rt-detr/rtdetr-EMBSFPN-SC.yaml\n\n    在ultralytics/cfg/models/rt-detr/rtdetr-EMBSFPN.yaml方案上引入[CVPR2025 BHViT](https://github.com/IMRL/BHViT)中的ShiftChannelMix.\n\n44. 
ultralytics/cfg/models/rt-detr/rtdetr-Pola-CGLU.yaml\n\n    使用[ICLR2025 PolaFormer](https://github.com/ZacharyMeng/PolaFormer)中的PolaAttention与[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU进行二次创新.\n\n45. ultralytics/cfg/models/rt-detr/rtdetr-Pola-FMFFN.yaml\n\n    使用[ICLR2025 PolaFormer](https://github.com/ZacharyMeng/PolaFormer)中的PolaAttention与[ICLR2024-FTIC](https://github.com/qingshi9974/ICLR2024-FTIC)中的的FMFFN进行二次创新.\n\n46. ultralytics/cfg/models/rt-detr/rtdetr-MFMMAFPN.yaml\n\n    利用[CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet)中的MFM对[MAF-YOLO](https://arxiv.org/pdf/2407.04381)的MAFPN进行二次改进得到MFMMAFPN.\n\n47. ultralytics/cfg/models/rt-detr/rtdetr-HyperCompute-MFM.yaml\n\n    利用[CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet)中的MFM对[Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804)中的Hypergraph Computation in Semantic Space进行二次创新.\n\n48. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-ASSA-SEFN.yaml\n\n    使用[CVPR2024 Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Adapt_or_Perish_Adaptive_Sparse_Transformer_with_Attentive_Feature_Refinement_CVPR_2024_paper.pdf)中的Adaptive Sparse Self-Attention与[WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net)的Spatially-Enhanced Feedforward Network (SEFN)改进AIFI.\n\n49. ultralytics/cfg/models/rt-detr/rtdetr-Pola-SEFN.yaml\n\n    使用[ICLR2025 PolaFormer)](https://github.com/ZacharyMeng/PolaFormer)中的PolaAttention与[WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net)的Spatially-Enhanced Feedforward Network (SEFN)改进AIFI.\n\n50. 
ultralytics/cfg/models/rt-detr/rtdetr-AIFI-ASSA-SEFN-Mona.yaml\n\n    使用[CVPR2024 Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Adapt_or_Perish_Adaptive_Sparse_Transformer_with_Attentive_Feature_Refinement_CVPR_2024_paper.pdf)中的Adaptive Sparse Self-Attention与[WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net)的Spatially-Enhanced Feedforward Network (SEFN)和[CVPR2025 Mona](https://github.com/Leiyi-Hu/mona)的Mona改进AIFI.\n\n51. ultralytics/cfg/models/rt-detr/rtdetr-Pola-SEFN-Mona.yaml\n\n    使用[ICLR2025 PolaFormer)](https://github.com/ZacharyMeng/PolaFormer)中的PolaAttention与[WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net)的Spatially-Enhanced Feedforward Network (SEFN)和[CVPR2025 Mona](https://github.com/Leiyi-Hu/mona)的Mona改进AIFI.\n\n52. ultralytics/cfg/models/rt-detr/rtdetr-C2f-mambaout-LSConv.yaml\n\n    使用[CVPR2025 LSNet](https://github.com/THU-MIG/lsnet)的LSConv与[CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut)中的MambaOutBlock二次创新后改进C2f.\n\n53. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-ASSA-SEFN-Mona-DyT.yaml\n\n    使用[CVPR2024 Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Adapt_or_Perish_Adaptive_Sparse_Transformer_with_Attentive_Feature_Refinement_CVPR_2024_paper.pdf)中的Adaptive Sparse Self-Attention与[WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net)的Spatially-Enhanced Feedforward Network (SEFN)和[CVPR2025 Mona](https://github.com/Leiyi-Hu/mona)的Mona改进和[CVPR2025 DyT](https://github.com/jiachenzhu/DyT)中的DynamicTan改进AIFI.\n\n54. 
ultralytics/cfg/models/rt-detr/rtdetr-Pola-SEFN-Mona-DyT.yaml\n\n    使用[ICLR2025 PolaFormer)](https://github.com/ZacharyMeng/PolaFormer)中的PolaAttention与[WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net)的Spatially-Enhanced Feedforward Network (SEFN)和[CVPR2025 Mona](https://github.com/Leiyi-Hu/mona)的Mona改进和[CVPR2025 DyT](https://github.com/jiachenzhu/DyT)中的DynamicTan改进AIFI.\n\n55. ultralytics/cfg/models/rt-detr/rtdetr-Pola-SEFFN-Mona-DyT.yaml\n\n    使用[ICLR2025 PolaFormer)](https://github.com/ZacharyMeng/PolaFormer)中的PolaAttention与[TransMamba](https://github.com/sunshangquan/TransMamba)的SpectralEnhancedFFN和[CVPR2025 Mona](https://github.com/Leiyi-Hu/mona)的Mona改进和[CVPR2025 DyT](https://github.com/jiachenzhu/DyT)中的DynamicTan改进AIFI.\n\n56. ultralytics/cfg/models/rt-detr/rtdetr-Pola-EDFFN-Mona-DyT.yaml\n\n    使用[ICLR2025 PolaFormer)](https://github.com/ZacharyMeng/PolaFormer)中的PolaAttention与[CVPR2025 EVSSM](https://github.com/kkkls/EVSSM)中的EDFFN和[CVPR2025 Mona](https://github.com/Leiyi-Hu/mona)的Mona改进和[CVPR2025 DyT](https://github.com/jiachenzhu/DyT)中的DynamicTan改进AIFI.\n\n57. ultralytics/cfg/models/rt-detr/rtdetr-C2f-mambaout-FDConv.yaml\n\n    使用[CVPR2025 Frequency Dynamic Convolution for Dense Image Prediction](https://github.com/Linwei-Chen/FDConv)的FDConv和[CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut)中的MambaOutBlock二次创新后改进BackBone.\n\n58. ultralytics/cfg/models/rt-detr/rtdetr-C2f-PFDConv.yaml\n\n    使用[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的PConv与[CVPR2025 Frequency Dynamic Convolution for Dense Image Prediction](https://github.com/Linwei-Chen/FDConv)的FDConv二次创新后改进BackBone.\n\n59. ultralytics/cfg/models/rt-detr/rtdetr-C2f-FasterFDConv.yaml\n\n    使用[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的FasterBlock与[CVPR2025 Frequency Dynamic Convolution for Dense Image Prediction](https://github.com/Linwei-Chen/FDConv)的FDConv二次创新后改进BackBone.\n\n60. 
ultralytics/cfg/models/rt-detr/rtdetr-C2f-DSAN-EDFFN.yaml\n\n    使用[DSA: Deformable Spatial Attention](https://www.techrxiv.org/users/628671/articles/775010-deformable-spatial-attention-networks-enhancing-lightweight-convolutional-models-for-vision-tasks)中的Deformable Spatial Attention Block和[CVPR2025 EVSSM](https://github.com/kkkls/EVSSM)中的EDFFN进行二次创新后改进BackBone.\n\n61. ultralytics/cfg/models/rt-detr/rtdetr-C2f-mambaout-DSA.yaml\n\n    使用[DSA: Deformable Spatial Attention](https://www.techrxiv.org/users/628671/articles/775010-deformable-spatial-attention-networks-enhancing-lightweight-convolutional-models-for-vision-tasks)中的Deformable Spatial Attention Block与[CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut)中的MambaOutBlock二次创新后改进BackBone.\n\n62. ultralytics/cfg/models/rt-detr/rtdetr-SOEP-RFPN.yaml\n\n    使用[ECCV2024 rethinking-fpn](https://github.com/AlanLi1997/rethinking-fpn)的SNI和GSConvE对原创改进SOEP再次创新.\n\n63. ultralytics/cfg/models/rt-detr/rtdetr-SOEP-MFM.yaml\n\n    使用[CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet)中的MFM对原创改进SOEP再次创新.\n\n64. ultralytics/cfg/models/rt-detr/rtdetr-SOEP-MFM-RFPN.yaml\n\n    使用[ECCV2024 rethinking-fpn](https://github.com/AlanLi1997/rethinking-fpn)的SNI和GSConvE和[CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet)中的MFM对原创改进SOEP再次创新.\n\n65. ultralytics/cfg/models/rt-detr/rtdetr-C2f-mambaout-SFSC.yaml\n\n    使用[CVPR2024 SFSConv](https://github.com/like413/SFS-Conv)的SFSConv与[CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut)中的MambaOutBlock二次创新后改进C2f.\n\n66. ultralytics/cfg/models/rt-detr/rtdetr-C2f-PSFSConv.yaml\n\n    使用[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的PConv与[CVPR2024 SFSConv](https://github.com/like413/SFS-Conv)的SFSConv二次创新后改进C2f.\n\n67. ultralytics/cfg/models/rt-detr/rtdetr-C2f-FasterSFSConv.yaml\n\n    使用[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的FasterBlock与[CVPR2024 SFSConv](https://github.com/like413/SFS-Conv)的SFSConv二次创新后改进C2f.\n\n68. 
ultralytics/cfg/models/rt-detr/rtdetr-SOEP-PST.yaml \n\n    使用[Pyramid Sparse Transformer](https://arxiv.org/abs/2505.12772)中的Pyramid Sparse Transformer对原创改进SOEP进行创新.\n\n69. ultralytics/cfg/models/rt-detr/rtdetr-C2f-DIMB-HyperACE.yaml\n\n    使用[yolo13](https://github.com/iMoonLab/yolov13)中的HyperACE与自研模块DynamicInceptionDWConv2d的结合.\n\n70. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-SHSA-EPGO.yaml\n\n    使用[ACM MM 2025 CPRAformer](https://github.com/zs1314/CPRAformer)中的EPGO和[SHViT CVPR2024](https://github.com/ysj9909/SHViT)中的SHSA改进AIFI.\n\n71. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SHSA-EPGO.yaml\n\n    使用[SHViT CVPR2024](https://github.com/ysj9909/SHViT)中的SHSABlock与[ACM MM 2025 CPRAformer](https://github.com/zs1314/CPRAformer)中的EPGO和CSP思想改进backbone.\n\n72. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SHSA-EPGO-CGLU.yaml\n\n    使用[SHViT CVPR2024](https://github.com/ysj9909/SHViT)中的SHSABlock与[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的CGLU与[ACM MM 2025 CPRAformer](https://github.com/zs1314/CPRAformer)中的EPGO和CSP思想改进backbone.\n\n### 自研系列\n1. ultralytics/cfg/models/rt-detr/rtdetr-PACAPN.yaml\n\n    自研结构, Parallel Atrous Convolution Attention Pyramid Network, PAC-APN\n    1. 并行(上/下)采样分支可为网络提供多条特征提取途径，丰富特征表达的多样性、再结合gate机制对采样后的特征进行特征选择，强化更有意义的特征，抑制冗余或不相关的特征，提升特征表达的有效性。\n    2. PAC模块通过使用具有不同膨胀率的并行空洞卷积，能够有效地提取不同尺度的特征。这使得网络能够捕捉数据中局部和上下文信息，提高其表示复杂模式的能力。\n\n2. ultralytics/cfg/models/rt-detr/rtdetr-FDPN.yaml\n\n    自研特征聚焦扩散金字塔网络(Focusing Diffusion Pyramid Network)\n    1. 通过定制的特征聚焦模块与特征扩散机制，能让每个尺度的特征都具有详细的上下文信息，更有利于后续目标的检测与分类。\n    2. 定制的特征聚焦模块可以接受三个尺度的输入，其内部包含一个Inception-Style的模块，其利用一组并行深度卷积来捕获丰富的跨多个尺度的信息。\n    3. 通过扩散机制使具有丰富的上下文信息的特征进行扩散到各个检测尺度.\n\n3. ultralytics/cfg/models/rt-detr/rtdetr-FDPN-DASI.yaml\n\n    使用[HCFNet](https://github.com/zhengshuchen/HCFNet)中的Dimension-Aware Selective Integration Module对自研的Focusing Diffusion Pyramid Network再次创新.\n\n4. ultralytics/cfg/models/rt-detr/rtdetr-RGCSPELAN.yaml\n\n    自研RepGhostCSPELAN.\n    1. 
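Following the idea behind GhostNet (the intermediate feature maps of mainstream CNNs are widely redundant), cheap operations generate part of the redundant feature maps, which reduces computation and parameter count.\n    2. Drops the BottleNeck commonly used in yolov5 and yolov8. To offset the performance lost by removing the residual blocks, RepConv is used on the gradient-flow branch to strengthen feature extraction and gradient flow, and RepConv can be fused at inference time.\n    3. A scaling factor controls the size of RGCSPELAN, so it suits both small and large models.\n\n5. ultralytics/cfg/models/rt-detr/rtdetr-ContextGuideFPN.yaml\n\n    The Context Guide Fusion Module (CGFM) is a feature-fusion module designed to improve the feature pyramid network (FPN) in YOLOv8. Its design targets context guidance and adaptive adjustment during multi-scale feature fusion.\n    1. Effective fusion of context: an SE attention mechanism lets the module capture and use important context during feature fusion, which strengthens the feature representation and guides the model toward target-relevant information, improving detection accuracy.\n    2. Feature enhancement: a weighted feature-recombination operation strengthens important features and suppresses unimportant ones, improving the discriminative power of the feature maps.\n    3. Simple and efficient: the module is structurally simple and adds little computational overhead, so it suits real-time object detection.\n    Video walkthrough on Bilibili: https://www.bilibili.com/video/BV1Vx4y1n7hZ/\n\n6. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SMPCGLU.yaml\n\n    Improves C2f with Self-moving Point Convolutional GLU.\n    SMP comes from [CVPR2023-SMPConv](https://github.com/sangnekim/SMPConv); Convolutional GLU comes from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n    1. Ordinary convolution may fail to capture effective features when the data is diverse and complex, so SMPConv is used. Its adaptive point-movement mechanism captures local features better and makes feature extraction more flexible and accurate.\n    2. CGLU is added after SMPConv. Convolutional GLU combines convolution with a gating mechanism and selectively passes information through channels, which makes feature extraction more effective and flexible.\n\n7. Re-CalibrationFPN\n\n    Re-CalibrationFPN is introduced to strengthen the interaction between shallow and deep features.\n    P2345: ultralytics/cfg/models/v8/yolov8-ReCalibrationFPN-P2345.yaml (ReCalibrationFPN with a small-object detection head)\n    P345: ultralytics/cfg/models/v8/yolov8-ReCalibrationFPN-P345.yaml\n    P3456: ultralytics/cfg/models/v8/yolov8-ReCalibrationFPN-P3456.yaml (ReCalibrationFPN with a large-object detection head)\n    1. Shallow features carry less semantics but rich detail, with clearer boundaries and less distortion, while deep features carry rich semantic information. Fusing low-level and high-level features directly can therefore cause redundancy and inconsistency. To address this, the [SBA](https://github.com/Barrett-python/DuAT) module selectively aggregates boundary and semantic information to depict finer-grained object contours and recalibrate object positions.\n    2. Compared with a conventional FPN, the [SBA](https://github.com/Barrett-python/DuAT) module introduces bidirectional fusion between high- and low-resolution features, so information passes between them more fully and multi-scale feature fusion improves.\n    3. The [SBA](https://github.com/Barrett-python/DuAT) module uses an adaptive attention mechanism that adjusts feature weights according to the resolution and content of each feature map, capturing multi-scale object features better.\n\n8. 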
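ultralytics/cfg/models/rt-detr/rtdetr-SOEP.yaml\n\n    Small objects are hard to detect on the standard P3, P4, and P5 detection layers. The usual remedy is to add a P2 detection layer, but that brings its own problems, such as a much higher computational cost and slower post-processing, which motivates a feature pyramid designed for small objects. Building on the original PAFPN, we propose SmallObjectEnhancePyramid. Instead of adding a P2 detection layer, the P2 feature layer is passed through SPDConv to obtain features rich in small-object information, which are then fused into P3. The CSP design and [OmniKernel from AAAI2024](https://ojs.aaai.org/index.php/AAAI/article/view/27907) are combined into CSP-OmniKernel for feature integration. The OmniKernel module has three branches (global, large, and local) that learn feature representations from global to local, which improves small-object detection.\n\n9. ultralytics/cfg/models/rt-detr/rtdetr-CGRFPN.yaml\n\n    Context-Guided Spatial Feature Reconstruction Feature Pyramid Network.\n    1. Borrows the Rectangular Self-Calibration Module from [ECCV2024-CGRSeg](https://github.com/nizhenliang/CGRSeg) for spatial feature reconstruction and pyramid context extraction. It captures global context in the horizontal and vertical directions and uses the resulting axial global context to explicitly model rectangular key regions.\n    2. The PyramidContextExtraction module integrates feature information from different levels, improving the model's context awareness.\n    3. FuseBlockMulti and DynamicInterpolationFusion fuse multi-scale features through dynamic interpolation and multi-feature fusion, improving the multi-scale feature representation and the recognition of objects against complex backgrounds.\n\n10. ultralytics/cfg/models/rt-detr/rtdetr-EMBSFPN.yaml\n\n    Proposes the Efficient Multi-Branch&Scale FPN, based on BIFPN, [MAF-YOLO](https://arxiv.org/pdf/2407.04381), and [CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD).\n    Efficient Multi-Branch&Scale FPN offers a lightweight design, weighted multi-scale feature fusion, an efficient multi-scale convolution module, an efficient upsampling module, and a global heterogeneous kernel selection mechanism.\n    1. Efficient multi-scale convolution module and global heterogeneous kernel selection: the Trident network study shows that networks with larger receptive fields suit larger objects, while smaller objects benefit from smaller receptive fields. In the FPN stage, feature layers at different scales therefore use different multi-scale kernels to progressively gather multi-scale receptive-field information.\n    2. Borrows the weighted multi-scale feature fusion from BIFPN: replacing Concat with Add reduces parameters and computation, while features are fused with weights selected adaptively according to the importance of each scale.\n    3. The efficient upsampling module is EUCB from CVPR2024-EMCAD, which stays efficient while preserving reasonable accuracy.\n\n11. ultralytics/cfg/models/rt-detr/rtdetr-CSP-PMSFA.yaml\n\n    In-house module: CSP-Partial Multi-Scale Feature Aggregation.\n    1. Partial multi-scale feature extraction: following the ideas of CVPR2020-GhostNet and CVPR2023-FasterNet, an efficient PartialConv extracts features at multiple scales from the input, but only on part of the channels rather than all of them, which improves computational efficiency.\n    2. Enhanced feature fusion: the final 1x1 convolution fuses the features from different scales, and a residual connection adds the input features to the processed ones. This preserves the original information while introducing new multi-scale information, improving the model's expressive power.\n\n12. 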
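ultralytics/cfg/models/rt-detr/rtdetr-MutilBackbone-DAF.yaml\n\n    In-house MutilBackbone-DynamicAlignFusion.\n    1. To avoid spending too much computation on shallow feature maps, the backbones in MutilBackbone share a single stem, which keeps computation and inference time down.\n    2. To avoid spatial misalignment between features from different backbones during fusion, DynamicAlignFusion first fuses the features learned by the two modules, then generates a DynamicAlignWeight to adjust each of them, and finally applies a learnable channel weight that dynamically adjusts the weights of the two paths according to the input features, improving adaptability to different features.\n\n13. ultralytics/cfg/models/rt-detr/rtdetr-CSP-MutilScaleEdgeInformationEnhance.yaml\n\n    In-house CSP-MutilScaleEdgeInformationEnhance.\n    The MutilScaleEdgeInformationEnhance module combines multi-scale feature extraction, edge enhancement, and convolution. It extracts features at different scales, highlights edge information, integrates the multi-scale features, and outputs the enhanced features through a convolution layer. Built on feature extraction and edge enhancement, the module has strong representational capacity.\n    1. Multi-scale feature extraction: nn.AdaptiveAvgPool2d pools at multiple scales to extract local information of different sizes, which helps capture multi-level image features.\n    2. Edge enhancement: the EdgeEnhancer module extracts edge information, making the network more sensitive to edges, which matters for many vision tasks such as object detection and semantic segmentation.\n    3. Feature fusion: features extracted at different scales are aligned to a common scale by interpolation, concatenated, and fused by a convolution layer into a unified representation, improving the model's awareness of multi-scale features.\n\n14. ultralytics/cfg/models/rt-detr/rtdetr-CSP-FreqSpatial.yaml\n\n    FreqSpatial is a convolutional neural network (CNN) module that fuses spatial-domain and frequency-domain features. By extracting features in both domains, it captures spatial and frequency information at different levels to improve robustness and representational capacity on image data. Its main feature is combining the Scharr operator (for edge detection) with spatial-domain and frequency-domain convolutions to capture image structure from several perspectives.\n    1. Spatial-domain feature extraction: extracts features based on spatial structure from the original image, mainly details and edges.\n    2. Frequency-domain feature extraction: extracts frequency-related patterns, capturing the low- and high-frequency components of the image, which helps the model extract information at both global and local scales.\n    3. Feature fusion: the spatial- and frequency-domain features are combined by a weighted sum to produce the output feature map, so the model considers spatial structure and frequency information together and performs better across varied scenes.\n\n15. ultralytics/cfg/models/rt-detr/rtdetr-CSP-MutilScaleEdgeInformationSelect.yaml\n\n    Builds on the in-house CSP-MutilScaleEdgeInformationEnhance.\n    The multi-scale edge information selection module (MutilScaleEdgeInformationSelect) efficiently selects, from multi-scale edge information, the key features most relevant to the target task. To do so, it uses an attention mechanism that focuses on more important regions: [ICCV2023 DualDomainSelectionMechanism, DSM](https://github.com/c-yn/FocalNet). By focusing on the more important regions of the image (such as complex edges and high-frequency regions), the mechanism adaptively filters the multi-scale features for those with higher task relevance, improving the precision of feature selection and overall model performance.\n\n16. GlobalEdgeInformationTransfer\n\n    Bounding-box localization depends heavily on object edge information, yet a conventional object detection network has no component that raises its attention to object edges. Edge information therefore needs to be fused into the features extracted at every scale. The GlobalEdgeInformationTransfer (GEIT) module passes the edge information extracted from shallow features to the whole backbone and fuses it with features at different scales.\n    1. 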
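Because the original image contains a large amount of background information, extracting edge information directly from it and passing it to the whole backbone would add noise to training, whereas the shallow convolution layers filter out unnecessary background. A module named MutilScaleEdgeInfoGenetator is therefore placed in the shallow part of the network. It uses the shallow feature layer to generate edge feature maps at multiple scales and injects them into the corresponding scales of the backbone for fusion.\n    2. Downsampling has to be chosen carefully. The goal is to preserve and strengthen edge information while downsampling, so MaxPool is the better fit: it keeps the strongest response in each local region and represents edges better. AvgPool suits scenarios that need smoothing or averaging, but it preserves detail and edges less well than MaxPool.\n    3. For fusion, ConvEdgeFusion combines edge information with ordinary convolutional features through cross-channel fusion. First, conv_channel_fusion fuses the edge information and the ordinary convolutional features across channels, helping the model integrate features from different sources. Then conv_3x3_feature_extract extracts features from the fused result to better capture local detail. Finally, conv_1x1 adjusts the output feature dimension.\n\n17. ultralytics/cfg/models/rt-detr/rtdetr-C2f-DIMB.yaml\n\n    In-house module DynamicInceptionDWConv2d. (See 使用教程.md in the project for more details.)\n\n18. ultralytics/cfg/models/rt-detr/rtdetr-HAFB-1.yaml\n\n    In-house module Hierarchical Attention Fusion Block. (See 使用教程.md in the project for more details.)\n\n19. ultralytics/cfg/models/rt-detr/rtdetr-HAFB-2.yaml\n\n    An alternative way of using HAFB.\n\n20. ultralytics/cfg/models/rt-detr/rtdetr-MutilBackbone-HAFB.yaml\n\n    Adds HAFB (Hierarchical Attention Fusion Block) to rtdetr-MutilBackbone-DAF.yaml.\n\n### Backbone series\n1. ultralytics/cfg/models/rt-detr/rt-detr-timm.yaml\n\n    Replaces the rtdetr backbone with backbones from the [timm](https://github.com/huggingface/pytorch-image-models) library. (Supports most existing CNN models.)\n2. ultralytics/cfg/models/rt-detr/rt-detr-fasternet.yaml\n\n    Replaces the rtdetr backbone with [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet).\n3. ultralytics/cfg/models/rt-detr/rt-detr-EfficientViT.yaml\n\n    Replaces the rtdetr backbone with [EfficientViT CVPR2023](https://github.com/microsoft/Cream/tree/main/EfficientViT).\n4. ultralytics/cfg/models/rt-detr/rtdetr-convnextv2.yaml\n\n    Replaces the rtdetr backbone with [ConvNextV2 2023](https://github.com/facebookresearch/ConvNeXt-V2).\n5. ultralytics/cfg/models/rt-detr/rtdetr-EfficientFormerv2.yaml\n\n    Replaces the rtdetr backbone with [EfficientFormerv2 2022](https://github.com/snap-research/EfficientFormer).\n6. ultralytics/cfg/models/rt-detr/rtdetr-repvit.yaml\n\n    Replaces the rtdetr backbone with [RepViT ICCV2023](https://github.com/THU-MIG/RepViT).\n7. 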
ultralytics/cfg/models/rt-detr/rtdetr-CSwomTramsformer.yaml\n\n    Replaces the rtdetr backbone with [CSwinTransformer CVPR2022](https://github.com/microsoft/CSWin-Transformer).\n8. ultralytics/cfg/models/rt-detr/rtdetr-VanillaNet.yaml\n\n    Replaces the rtdetr backbone with [VanillaNet 2023](https://github.com/huawei-noah/VanillaNet).\n9. ultralytics/cfg/models/rt-detr/rtdetr-SwinTransformer.yaml\n\n    Replaces the rtdetr backbone with [SwinTransformer ICCV2021](https://github.com/microsoft/Swin-Transformer).\n10. ultralytics/cfg/models/rt-detr/rtdetr-lsknet.yaml\n\n    Replaces the rtdetr backbone with [LSKNet ICCV2023](https://github.com/zcablii/LSKNet).\n11. ultralytics/cfg/models/rt-detr/rt-detr-unireplknet.yaml\n\n    Replaces the rtdetr backbone with [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main).\n12. ultralytics/cfg/models/rt-detr/rtdetr-TransNeXt.yaml\n\n    Improves the rtdetr backbone with [TransNeXt](https://github.com/DaiShiResearch/TransNeXt).\n13. ultralytics/cfg/models/rt-detr/rtdetr-RepNCSPELAN.yaml\n\n    Improves RTDETR-R18 with RepNCSPELAN and ADown from [YOLOV9](https://github.com/WongKinYiu/yolov9).\n14. ultralytics/cfg/models/rt-detr/rtdetr-rmt.yaml\n\n    Improves the rtdetr backbone with [CVPR2024 RMT](https://arxiv.org/abs/2309.11523).\n15. ultralytics/cfg/models/rt-detr/rtdetr-C2f-PKI.yaml\n\n    Improves the backbone with PKIModule and the CAA module from [CVPR2024 PKINet](https://github.com/PKINet/PKINet), combined with C2f.\n16. ultralytics/cfg/models/rt-detr/rtdetr-C2f-PPA.yaml\n\n    Improves C2f with the Parallelized Patch-Aware Attention Module from [HCFNet](https://github.com/zhengshuchen/HCFNet).\n17. ultralytics/cfg/models/rt-detr/rtdetr-mobilenetv4.yaml\n\n    Improves the rtdetr backbone with [MobileNetV4](https://github.com/jaiwei98/MobileNetV4-pytorch/tree/main).\n18. ultralytics/cfg/models/rt-detr/rtdetr-starnet.yaml\n\n    Improves the rtdetr backbone with [StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main).\n\n19. ultralytics/cfg/models/rt-detr/rtdetr-C2f-vHeat.yaml\n\n    Improves the backbone with HeatBlock from [vHeat](https://github.com/MzeroMiko/vHeat/tree/main), combined with C2f.\n\n20. 
ultralytics/cfg/models/rt-detr/rtdetr-C2f-FMB.yaml\n\n    Improves C2f with the Feature Modulation block from [ECCV2024 SMFANet](https://github.com/Zheng-MJ/SMFANet/tree/main).\n\n21. ultralytics/cfg/models/rt-detr/rtdetr-C2f-gConv.yaml\n\n    Improves C2f with gConvblock from [Rethinking Performance Gains in Image Dehazing Networks](https://arxiv.org/abs/2209.11448).\n\n22. ultralytics/cfg/models/rt-detr/rtdetr-C2f-AddutuveBlock.yaml\n\n    Improves the backbone with AdditiveBlock from [CAS-ViT](https://github.com/Tianfang-Zhang/CAS-ViT) and the CSP design.\n\n23. ultralytics/cfg/models/rt-detr/rtdetr-C2f-MogaBlock.yaml\n\n    Improves the backbone by combining MogaBlock from [MogaNet ICLR2024](https://github.com/Westlake-AI/MogaNet) with the CSP design.\n\n24. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SHSA.yaml\n\n    Improves the backbone with SHSABlock from [SHViT CVPR2024](https://github.com/ysj9909/SHViT) and the CSP design.\n\n25. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SMAFB.yaml\n\n    Improves the backbone with SMAFormerBlock from [SMAFormer BIBM2024](https://github.com/CXH-Research/SMAFormer) and the CSP design.\n\n26. ultralytics/cfg/models/rt-detr/rtdetr-C2f-FFCM.yaml\n\n    Improves the rtdetr backbone by combining Fused_Fourier_Conv_Mixer from [Efficient Frequency-Domain Image Deraining with Contrastive Regularization ECCV2024](https://github.com/deng-ai-lab/FADformer) with the CSP design.\n\n27. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SFHF.yaml\n\n    Improves the rtdetr backbone by combining the block from [SFHformer ECCV2024](https://github.com/deng-ai-lab/SFHformer) with the CSP design.\n\n28. ultralytics/cfg/models/rt-detr/rtdetr-C2f-MSM.yaml\n\n    Improves the rtdetr backbone by combining MSM from [Revitalizing Convolutional Network for Image Restoration TPAMI2024](https://zhuanlan.zhihu.com/p/720777160) with the CSP design.\n\n29. ultralytics/cfg/models/rt-detr/rtdetr-C2f-HDRAB.yaml\n\n    Improves the backbone by combining HDRAB (hybrid dilated residual attention block) from [Pattern Recognition 2024|DRANet](https://github.com/WenCongWu/DRANet) with the CSP design.\n\n30. ultralytics/cfg/models/rt-detr/rtdetr-C2f-RAB.yaml\n\n    Improves the backbone by combining RAB (residual attention block) from [Pattern Recognition 2024|DRANet](https://github.com/WenCongWu/DRANet) with the CSP design.\n\n31. 
ultralytics/cfg/models/rt-detr/rtdetr-C2f-FCA.yaml\n\n    Improves the backbone by combining Frequency-aware Cascade Attention from [FreqFormer](https://github.com/JPWang-CS/FreqFormer) with CSP.\n\n32. ultralytics/cfg/models/rt-detr/rtdetr-C2f-CAMixer.yaml\n\n    Improves the backbone by combining CAMixer from [CAMixerSR CVPR2024](https://github.com/icandle/CAMixerSR) with CSP.\n\n33. ultralytics/cfg/models/rt-detr/rtdetr-C2f-HFERB.yaml\n\n    Improves the backbone by combining the high-frequency enhancement residual block from [ICCV2023 CRAFT-SR](https://github.com/AVC2-UESTC/CRAFT-SR) with CSP.\n\n34. ultralytics/cfg/models/rt-detr/rtdetr-C2f-DTAB.yaml\n\n    Improves the backbone by combining DTAB from [AAAI2025 TBSN](https://github.com/nagejacob/TBSN) with CSP.\n\n35. ultralytics/cfg/models/rt-detr/rtdetr-C2f-JDPM.yaml\n\n    Improves the backbone by combining the joint domain perception module from [ECCV2024 FSEL](https://github.com/CSYSI/FSEL) with CSP.\n\n36. ultralytics/cfg/models/rt-detr/rtdetr-C2f-ETB.yaml\n\n    Improves the backbone by combining the entanglement transformer block from [ECCV2024 FSEL](https://github.com/CSYSI/FSEL) with CSP.\n\n37. ultralytics/cfg/models/rt-detr/rtdetr-C2f-FDT.yaml\n\n    Improves the backbone by combining the Full-domain Transformer from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN) with CSP.\n\n38. ultralytics/cfg/models/rt-detr/rtdetr-C2f-AP.yaml\n\n    Improves rtdetr with the Asymmetric Padding bottleneck from [AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection](https://github.com/JN-Yang/PConv-SDloss-Data).\n\n39. ultralytics/cfg/models/rt-detr/rtdetr-C2f-ELGCA.yaml\n\n    Improves the backbone by combining ELGCA from [ELGC-Net](https://github.com/techmn/elgcnet) with CSP.\n\n40. ultralytics/cfg/models/rt-detr/rtdetr-C2f-Strip.yaml\n\n    Improves the backbone by combining StripBlock from [Strip R-CNN](https://arxiv.org/pdf/2501.03775) with CSP.\n\n41. ultralytics/cfg/models/rt-detr/rtdetr-C2f-KAT.yaml\n\n    Improves the backbone by combining KAT from [ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat) with CSP.\n\n42. 
ultralytics/cfg/models/rt-detr/rtdetr-C2f-GlobalFilter.yaml\n\n    Improves the rtdetr backbone with GlobalFilterBlock from [T-PAMI Global Filter Networks for Image Classification](https://github.com/raoyongming/GFNet), Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt), and CSP.\n\n43. ultralytics/cfg/models/rt-detr/rtdetr-C2f-DynamicFilter.yaml\n\n    Improves the rtdetr backbone with DynamicFilter from [AAAI2024 FFT-Based Dynamic Token Mixer for Vision](https://github.com/okojoalg/dfformer) and CSP.\n\n44. ultralytics/cfg/models/rt-detr/rtdetr-RepHMS.yaml\n\n    Improves rtdetr with RepHMS from [MHAF-YOLO](https://github.com/yang-0201/MHAF-YOLO).\n\n45. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SAVSS.yaml\n\n    Improves the backbone by combining the Structure-Aware Scanning Strategy from [CVPR2025 SCSegamba](https://github.com/Karl1109/SCSegamba) with CSP.\n\n46. ultralytics/cfg/models/rt-detr/rtdetr-mambaout.yaml\n\n    Replaces the backbone with MambaOut from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut).\n\n47. ultralytics/cfg/models/rt-detr/rtdetr-C2f-mambaout.yaml\n\n    Improves the backbone by combining MambaOut from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut) with CSP.\n\n48. ultralytics/cfg/models/rt-detr/rtdetr-C2f-EfficientVIM.yaml\n\n    Improves the backbone by combining EfficientViMBlock from [CVPR2025 EfficientViM](https://github.com/mlvlab/EfficientViM) with CSP.\n\n49. ultralytics/cfg/models/rt-detr/rtdetr-C2f-IEL.yaml\n\n    Improves the rtdetr backbone with the Intensity Enhancement Layer from [CVPR2025 HVI](https://github.com/Fediory/HVI-CIDNet) and CSP.\n\n50. ultralytics/cfg/models/rt-detr/rtdetr-overlock.yaml\n\n    Replaces the rtdetr-r18 backbone with overlock-backbone from [CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087).\n\n51. ultralytics/cfg/models/rt-detr/rtdetr-C2f-RCB.yaml\n\n    Improves the rtdetr-r18 backbone with RepConvBlock from [CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087) and CSP.\n\n52. ultralytics/cfg/models/rt-detr/rtdetr-C2f-LEGM.yaml\n\n    Improves the rtdetr-r18 backbone with LEGM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet) and CSP.\n\n53. 
ultralytics/cfg/models/rt-detr/rtdetr-C2f-FAT.yaml\n\n    Improves the rtdetr-r18 backbone with FATBlock from [ICLR2024-FTIC](https://github.com/qingshi9974/ICLR2024-FTIC) and CSP.\n\n54. ultralytics/cfg/models/rt-detr/rtdetr-C2f-MobileMamba.yaml\n\n    Improves the backbone with MobileMambaBlock from [CVPR2025 MobileMamba](https://github.com/lewandofskee/MobileMamba) and the CSP design.\n\n55. ultralytics/cfg/models/rt-detr/rtdetr-MobileMamba.yaml\n\n    Improves the backbone with MobileMamba from [CVPR2025 MobileMamba](https://github.com/lewandofskee/MobileMamba).\n\n56. ultralytics/cfg/models/rt-detr/rtdetr-C2f-LFEM.yaml\n\n    Improves the backbone with LFEModule from [LEGNet](https://github.com/lwCVer/LEGNet) and the CSP design.\n\n57. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SBSM.yaml\n\n    Improves the backbone with Snake Bi-Directional Sequence Modelling (SBSM) from [WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net) and the CSP design.\n\n58. ultralytics/cfg/models/rt-detr/rtdetr-lsnet.yaml\n\n    Replaces the backbone with LSNet from [CVPR2025 LSNet](https://github.com/THU-MIG/lsnet).\n\n59. ultralytics/cfg/models/rt-detr/rtdetr-C2f-LSBlock.yaml\n\n    Improves C2f with LSBlock from [CVPR2025 LSNet](https://github.com/THU-MIG/lsnet).\n\n60. ultralytics/cfg/models/rt-detr/rtdetr-C2f-TransMamba.yaml\n\n    Improves the backbone with TransMamba from [TransMamba](https://github.com/sunshangquan/TransMamba) and the CSP design.\n\n61. ultralytics/cfg/models/rt-detr/rtdetr-C2f-EVS.yaml\n\n    Improves the backbone with EVS from [CVPR2025 EVSSM](https://github.com/kkkls/EVSSM) and the CSP design.\n\n62. ultralytics/cfg/models/rt-detr/rtdetr-C2f-EBlock.yaml\n\n    Improves the backbone with EBlock from [CVPR2025 DarkIR](https://github.com/cidautai/DarkIR) and the CSP design.\n\n63. ultralytics/cfg/models/rt-detr/rtdetr-C2f-DBlock.yaml\n\n    Improves the backbone with DBlock from [CVPR2025 DarkIR](https://github.com/cidautai/DarkIR) and the CSP design.\n\n64. ultralytics/cfg/models/rt-detr/rtdetr-C2f-FDConv.yaml\n\n    Improves the backbone with FDConv from [CVPR2025 Frequency Dynamic Convolution for Dense Image Prediction](https://github.com/Linwei-Chen/FDConv) and the CSP design.\n\n65. 
ultralytics/cfg/models/rt-detr/rtdetr-C2f-DSAN.yaml\n\n    Improves the backbone with the Deformable Spatial Attention Block from [DSA: Deformable Spatial Attention](https://www.techrxiv.org/users/628671/articles/775010-deformable-spatial-attention-networks-enhancing-lightweight-convolutional-models-for-vision-tasks) and CSP.\n\n66. ultralytics/cfg/models/rt-detr/rtdetr-C2f-DSA.yaml\n\n    Improves the backbone with Deformable Spatial Attention from [DSA: Deformable Spatial Attention](https://www.techrxiv.org/users/628671/articles/775010-deformable-spatial-attention-networks-enhancing-lightweight-convolutional-models-for-vision-tasks) and CSP.\n\n67. ultralytics/cfg/models/rt-detr/rtdetr-C2f-RMB.yaml\n\n    Improves the backbone with the Residual Mamba Block from [CVPR2025 MaIR](https://github.com/XLearning-SCU/2025-CVPR-MaIR) and the CSP design.\n\n68. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SFSConv.yaml\n\n    Improves C2f with SFSConv from [CVPR2024 SFSConv](https://github.com/like413/SFS-Conv).\n\n69. ultralytics/cfg/models/rt-detr/rtdetr-C2f-GroupMamba.yaml\n\n    Improves the backbone with GroupMambaLayer from [CVPR2025 GroupMamba](https://github.com/Amshaker/GroupMamba) and the CSP design.\n\n70. ultralytics/cfg/models/rt-detr/rtdetr-C2f-GroupMambaBlock.yaml\n\n    Improves the backbone with GroupMambaBlock from [CVPR2025 GroupMamba](https://github.com/Amshaker/GroupMamba) and the CSP design.\n\n71. ultralytics/cfg/models/rt-detr/rtdetr-C2f-MambaVision.yaml\n\n    Improves the backbone with MambaVision from [CVPR2025 MambaVision](https://github.com/NVlabs/MambaVision) and the CSP design.\n\n72. ultralytics/cfg/models/rt-detr/rtdetr-FCM.yaml\n\n    Improves rtdetr with the modules from [AAAI2025 FBRT-YOLO](https://github.com/galaxy-oss/FCM).\n\n73. ultralytics/cfg/models/rt-detr/rtdetr-C2f-FourierConv.yaml\n\n    Improves C2f with FourierConv from [MIA2025 Fourier Convolution Block with global receptive field for MRI reconstruction](https://www.sciencedirect.com/science/article/abs/pii/S1361841524002743).\n\n74. ultralytics/cfg/models/rt-detr/rtdetr-C2f-wConv.yaml\n\n    Improves C2f with wConv2d from [weightedConvolution2.0](https://github.com/cammarasana123/weightedConvolution2.0).\n\n75. 
ultralytics/cfg/models/rt-detr/rtdetr-C2f-GLVSS.yaml\n\n    Improves the backbone with GLVSS from [TGRS2025 UMFormer](https://github.com/takeyoutime/UMFormer) and CSP.\n\n76. ultralytics/cfg/models/rt-detr/rtdetr-C2f-ESC.yaml\n\n    Improves the backbone with ESC from [ICCV2025 ESC: Emulating Self-attention with Convolution for Efficient Image Super-Resolution](https://github.com/dslisleedh/ESC) and CSP.\n\n77. ultralytics/cfg/models/rt-detr/rtdetr-C2f-MBRConv3.yaml\n\n    Improves the backbone with MBRConv3 from [ICCV2025 MobileIE](https://github.com/AVC2-UESTC/MobileIE) and CSP.\n\n78. ultralytics/cfg/models/rt-detr/rtdetr-C2f-MBRConv5.yaml\n\n    Improves the backbone with MBRConv5 from [ICCV2025 MobileIE](https://github.com/AVC2-UESTC/MobileIE) and CSP.\n\n79. ultralytics/cfg/models/rt-detr/rtdetr-C2f-MBRConv3.yaml\n\n    Improves the backbone with MBRConv3 from [ICCV2025 MobileIE](https://github.com/AVC2-UESTC/MobileIE) and CSP.\n\n80. ultralytics/cfg/models/rt-detr/rtdetr-C2f-VSSD.yaml\n\n    Improves the backbone with VSSD from [ICCV2025 VSSD](https://github.com/YuHengsss/VSSD) and CSP.\n\n81. ultralytics/cfg/models/rt-detr/rtdetr-C2f-TVIM.yaml\n\n    Improves the backbone with TinyVIMBlock from [ICCV2025 TinyVIM](https://arxiv.org/abs/2411.17473) and CSP.\n\n82. ultralytics/cfg/models/rt-detr/rtdetr-C2f-CSI.yaml\n\n    Improves the backbone with CSI from [INFFUS2025 SAMamba](https://arxiv.org/pdf/2505.23214) and C2f.\n\n83. ultralytics/cfg/models/rt-detr/rtdetr-C2f-ConvAttn.yaml\n\n    Improves the backbone with ConvAttn from [ICCV2025 ESC: Emulating Self-attention with Convolution for Efficient Image Super-Resolution](https://github.com/dslisleedh/ESC) and CSP.\n\n84. ultralytics/cfg/models/rt-detr/rtdetr-C2f-UniConvBlock.yaml\n\n    Improves the backbone with UniConvBlock from [ICCV2025 UniConvBlock](https://github.com/ai-paperwithcode/UniConvNet) and the CSP design.\n\n85. ultralytics/cfg/models/rt-detr/rtdetr-C2f-LGLB.yaml\n\n    Improves the backbone with LGLBBlock from [ACM MM 2025 Mobile U-ViT](https://github.com/FengheTan9/Mobile-U-ViT) and the CSP design.\n\n86. 
ultralytics/cfg/models/rt-detr/rtdetr-C2f-ConverseB.yaml\n\n    Improves the backbone with ConverseBlock from [ICCV2025 ConverseBNet](https://github.com/cszn/ConverseNet) and the CSP design.\n\n87. ultralytics/cfg/models/rt-detr/rtdetr-C2f-Converse2D.yaml\n\n    Improves the backbone with Converse2D from [ICCV2025 ConverseBNet](https://github.com/cszn/ConverseNet) and the CSP design.\n\n88. ultralytics/cfg/models/rt-detr/rtdetr-C2f-GCConv.yaml\n\n    Improves the backbone with GCConv from [CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet) and CSP.\n\n89. ultralytics/cfg/models/rt-detr/rtdetr-C2f-CFBlock.yaml\n\n    Improves the backbone with CFBlock from [AAAI2024 SCTNet](https://arxiv.org/pdf/2312.17071) and CSP.\n\n90. ultralytics/cfg/models/rt-detr/rtdetr-C2f-FMABlock.yaml\n\n    Improves the backbone with FMABlock from [IJCV2024 SRConvNet](https://github.com/lifengcs/SRConvNet) and the CSP design.\n\n91. ultralytics/cfg/models/rt-detr/rtdetr-C2f-LWGA.yaml\n\n    Improves the backbone with LWGABlock from [LWGANet](https://github.com/lwCVer/LWGANet) and the CSP design.\n\n92. ultralytics/cfg/models/rt-detr/rtdetr-C2f-CSSC.yaml\n\n    Improves the backbone with CSSC from [TGRS2025 ASCNet](https://ieeexplore.ieee.org/document/10855453) and CSP.\n\n93. ultralytics/cfg/models/rt-detr/rtdetr-C2f-CNCM.yaml\n\n    Improves the backbone with CNCM from [TGRS2025 ASCNet](https://ieeexplore.ieee.org/document/10855453) and CSP.\n\n94. ultralytics/cfg/models/rt-detr/rtdetr-C2f-HFRB.yaml\n\n    Improves the backbone with HFRB from [ICCV2025 HFRB](https://arxiv.org/pdf/2507.10689) and CSP.\n\n95. ultralytics/cfg/models/rt-detr/rtdetr-C2f-EVA.yaml\n\n    Improves the backbone with EVA from [ICIP2025 BEVANET](https://arxiv.org/pdf/2508.07300) and CSP.\n\n96. ultralytics/cfg/models/rt-detr/rtdetr-C2f-RMBC.yaml\n\n    Improves the backbone with RepMBConv from [PlainUSR](https://arxiv.org/pdf/2409.13435) and CSP.\n\n97. ultralytics/cfg/models/rt-detr/rtdetr-C2f-RMBC-LA.yaml\n\n    Improves the backbone with RepMBConv and Local Importance-based Attention from [PlainUSR](https://arxiv.org/pdf/2409.13435) and CSP.\n\n98. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SFMB.yaml\n\n    Improves the backbone with SFMB from [TIP2025 SFMB](https://arxiv.org/pdf/2511.06593v1) and CSP.\n\n99. 
ultralytics/cfg/models/rt-detr/rtdetr-ESMoE.yaml\n\n    Improves RTDETR with the ES-MoE module from [YOLO-Master](https://github.com/isLinXu/YOLO-Master).\n\n100. ultralytics/cfg/models/rt-detr/rtdetr-FAENet.yaml\n\n    Enhances the features of the input image with FAENet from [TGRS2025 MASFNet](https://ieeexplore.ieee.org/document/10955257).\n\n101. ultralytics/cfg/models/rt-detr/rtdetr-C2f-MFEB.yaml\n\n    Improves the backbone with MFEB from [MICCAI2023 SHISRCNet](https://arxiv.org/abs/2306.14119) and CSP.\n\n102. ultralytics/cfg/models/rt-detr/rtdetr-C2f-PartialNetBlock.yaml\n\n    Improves the backbone with PartialNetBlock from [AAAI2026 Partial Channel Network](https://arxiv.org/pdf/2502.01303) and CSP.\n\n103. ultralytics/cfg/models/rt-detr/rtdetr-C2f-DGR.yaml\n\n    Improves the backbone with DRG from [TGRS2025 DRPCA-Net](https://arxiv.org/pdf/2507.09541) and CSP.\n\n104. ultralytics/cfg/models/rt-detr/rtdetr-C2f-GLGM.yaml\n\n    Improves the backbone with GLGM from [TGRS2025 ISGLNet](https://ieeexplore.ieee.org/document/11232501) and CSP.\n\n105. ultralytics/cfg/models/rt-detr/rtdetr-C2f-MAC.yaml\n\n    Improves the backbone with MAC from [TGRS2025 HDNet](https://ieeexplore.ieee.org/document/11232501) and CSP.\n\n106. ultralytics/cfg/models/rt-detr/rtdetr-C2f-SPJFB.yaml\n\n    Improves the backbone with SPJFBlock from [AAAI2026 SPJFNet](https://arxiv.org/pdf/2508.04041) and CSP.\n\n107. ultralytics/cfg/models/rt-detr/rtdetr-C2f-GLSS2D.yaml\n\n    Improves the backbone with GLSS2D from [TGRS2025 GLVMamba](https://ieeexplore.ieee.org/document/11014226) and CSP.\n\n108. ultralytics/cfg/models/rt-detr/rtdetr-C2f-DEGConv.yaml\n\n    Improves the backbone with DEGConv from [CVPR2026 MixerCSeg](https://arxiv.org/pdf/2603.01361) and CSP.\n\n109. ultralytics/cfg/models/rt-detr/rtdetr-C2f-TransMixer.yaml\n\n    Improves the backbone with TransMixer from [CVPR2026 MixerCSeg](https://arxiv.org/pdf/2603.01361) and CSP.\n\n### AIFI series\n1. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-LPE.yaml\n\n    Improves positional-encoding generation in AIFI with LearnedPositionalEncoding. (See the Baidu Cloud video for the 20231119 update notes for details.)\n2. 
ultralytics/cfg/models/rt-detr/rtdetr-CascadedGroupAttention.yaml\n\n    Improves AIFI in rtdetr with CascadedGroupAttention from [EfficientViT CVPR2023](https://github.com/microsoft/Cream/tree/main/EfficientViT). (See the Baidu Cloud video on rtdetr-CascadedGroupAttention for details.)\n3. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-DAttention.yaml\n\n    Improves AIFI with DAttention from [Vision Transformer with Deformable Attention CVPR2022](https://github.com/LeapLabTHU/DAT).\n4. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-HiLo.yaml\n\n    Further improves AIFI with the efficient attention from [LITv2](https://github.com/ziplab/LITv2), which extracts high- and low-frequency information.\n5. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-EfficientAdditive.yaml\n\n    Improves AIFI with EfficientAdditiveAttention from [ICCV2023 SwiftFormer](https://github.com/Amshaker/SwiftFormer/tree/main).\n\n6. ultralytics/cfg/models/rt-detr/rtdetr-AIFIRepBN.yaml\n\n    Improves AIFI with RepBN from [ICML-2024 SLAB](https://github.com/xinghaochen/SLAB).\n\n7. ultralytics/cfg/models/rt-detr/rtdetr-AdditiveTokenMixer.yaml\n\n    Improves AIFI with AdditiveBlock from [CAS-ViT](https://github.com/Tianfang-Zhang/CAS-ViT).\n\n8. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-MSMHSA.yaml\n\n    Improves AIFI with M2SA from [CMTFNet](https://github.com/DrWuHonglin/CMTFNet/tree/main).\n\n9. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-DHSA.yaml\n\n    Improves AIFI with Dynamic-range Histogram Self-Attention from [Histoformer ECCV2024](https://github.com/sunshangquan/Histoformer).\n\n10. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-DPB.yaml\n\n    Improves AIFI with DynamicPosBias-Attention from [CrossFormer](https://arxiv.org/pdf/2108.00154).\n\n11. ultralytics/cfg/models/rt-detr/rtdetr-DTAB.yaml\n\n    Replaces AIFI with DTAB from [AAAI2025 TBSN](https://github.com/nagejacob/TBSN).\n\n12. ultralytics/cfg/models/rt-detr/rtdetr-ETB.yaml\n\n    Replaces AIFI with the entanglement transformer block from [ECCV2024 FSEL](https://github.com/CSYSI/FSEL).\n\n13. ultralytics/cfg/models/rt-detr/rtdetr-FDT.yaml\n\n    Replaces AIFI with the Full-domain Transformer from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN).\n\n14. 
ultralytics/cfg/models/rt-detr/rtdetr-Pola.yaml\n\n    Improves AIFI with PolaAttention from [ICLR2025 PolaFormer](https://github.com/ZacharyMeng/PolaFormer).\n\n15. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-TSSA.yaml\n\n    Improves AIFI with Token Statistics Self-Attention from [Token Statistics Transformer](https://github.com/RobinWu218/ToST).\n\n16. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-ASSA.yaml\n\n    Improves AIFI with Adaptive Sparse Self-Attention from [CVPR2024 Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Adapt_or_Perish_Adaptive_Sparse_Transformer_with_Attentive_Feature_Refinement_CVPR_2024_paper.pdf).\n\n17. ultralytics/cfg/models/rt-detr/rtdetr-ASSR.yaml\n\n    Improves rtdetr with the Attentive State Space Group from [CVPR2025 MambaIR](https://github.com/csguoh/MambaIR).\n\n18. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-SEFN.yaml\n\n    Improves AIFI with the Spatially-Enhanced Feedforward Network (SEFN) from [WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net).\n\n19. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-DyT.yaml\n\n    Improves AIFI with DynamicTan from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT).\n\n20. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-SEFFN.yaml\n\n    Improves AIFI with SpectralEnhancedFFN from [TransMamba](https://github.com/sunshangquan/TransMamba).\n\n21. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-EDFFN.yaml\n\n    Improves AIFI with EDFFN from [CVPR2025 EVSSM](https://github.com/kkkls/EVSSM).\n\n22. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-MSLA.yaml\n\n    Improves AIFI with [MSLA](https://arxiv.org/pdf/2505.18823).\n\n23. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-EPGO.yaml\n\n    Improves AIFI with EPGO from [ACM MM 2025 CPRAformer](https://github.com/zs1314/CPRAformer).\n\n24. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-SHSA.yaml\n\n    Improves AIFI with SHSA from [SHViT CVPR2024](https://github.com/ysj9909/SHViT).\n\n25. 
ultralytics/cfg/models/rt-detr/rtdetr-AIFI-DML.yaml\n\n    Improves AIFI with DMI from [IJCV2024 SRConvNet](https://github.com/lifengcs/SRConvNet).\n\n26. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-LRSA.yaml\n\n    Improves AIFI with LRSA from [TPAMI2025 LRFormer](https://mmcheng.net/wp-content/uploads/2025/06/25PAMI_LRFormer.pdf).\n\n27. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-MALA.yaml\n\n    Improves AIFI with MALA from [ICCV2025 Rectifying Magnitude Neglect in Linear Attention](https://arxiv.org/pdf/2507.00698).\n\n28. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-EGSA.yaml\n\n    Improves AIFI with EGSA from [ACMMM2025 FlickCD](https://dl.acm.org/doi/epdf/10.1145/3746027.3755657).\n\n29. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-SWSA.yaml\n\n    Improves AIFI with SWSA from [ACMMM2025 FlickCD](https://dl.acm.org/doi/epdf/10.1145/3746027.3755657).\n\n30. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-DWMMSA.yaml\n\n    Improves AIFI with DWMMSA from [TIP2025 DSMT](https://ieeexplore.ieee.org/document/10955125).\n\n31. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-BinaryAttn.yaml\n\n    Improves AIFI with BinaryAttention from [CVPR2026 BinaryAttention](https://arxiv.org/abs/2602.00701).\n\n32. ultralytics/cfg/models/rt-detr/rtdetr-AIFI-WCA.yaml\n\n    Improves AIFI with WCA from [CVPR2025 Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection](https://openaccess.thecvf.com/content/CVPR2025/papers/Yan_Wavelet_and_Prototype_Augmented_Query-based_Transformer_for_Pixel-level_Surface_Defect_CVPR_2025_paper.pdf).\n\n### Neck series\n1. ultralytics/cfg/models/rt-detr/rtdetr-ASF.yaml\n\n    Improves rtdetr with Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO).\n2. ultralytics/cfg/models/rt-detr/rtdetr-slimneck.yaml\n\n    Improves CCFM in rtdetr with VoVGSCSP\\VoVGSCSPC and GSConv from [SlimNeck](https://github.com/AlanLi1997/slim-neck-by-gsconv).\n3. ultralytics/cfg/models/rt-detr/rtdetr-SDI.yaml\n\n    Improves the feature fusion in CCFM with the Semantics and Detail Infusion Module from [U-NetV2](https://github.com/yaoppeng/U-Net_v2).\n4. 
ultralytics/cfg/models/rt-detr/rtdetr-goldyolo.yaml\n\n    Improves the feature-fusion module with the Gather-and-Distribute mechanism from Huawei's GOLD-YOLO (2023).\n5. ultralytics/cfg/models/rt-detr/rtdetr-HSFPN.yaml\n\n    Improves CCFM in RTDETR with HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR).\n6. ultralytics/cfg/models/rt-detr/rtdetr-bifpn.yaml\n\n    Adds BIFPN to rtdetr-r18.  \n    BIFPN has three optional parameters:\n    1. Fusion  \n        The Fusion module in BIFPN supports five modes: weight, adaptive, concat, bifpn (default), and SDI.  \n        weight, adaptive, and concat come from [Figure 3 of this paper](https://openreview.net/pdf?id=q2ZaVU6bEsT); SDI comes from [U-NetV2](https://github.com/yaoppeng/U-Net_v2).\n    2. node_mode  \n        Selects the block module; see the Baidu Cloud video for the 20240302 update notice for details.\n    3. head_channel  \n        Number of channels in BIFPN; defaults to 256.\n7. ultralytics/cfg/models/rt-detr/rtdetr-CSFCN.yaml\n\n    Improves the rtdetr neck with the Context and Spatial Feature Calibration module from [Context and Spatial Feature Calibration for Real-Time Semantic Segmentation](https://github.com/kaigelee/CSFCN/tree/main).\n8. ultralytics/cfg/models/rt-detr/rtdetr-CGAFusion.yaml\n\n    Improves the rtdetr neck with content-guided attention fusion from [DEA-Net](https://github.com/cecret3350/DEA-Net).\n9. ultralytics/cfg/models/rt-detr/rtdetr-SDFM.yaml\n\n    Improves the rtdetr neck with the superficial detail fusion module from [PSFusion](https://github.com/Linfeng-Tang/PSFusion).\n\n10. ultralytics/cfg/models/rt-detr/rtdetr-PSFM.yaml\n\n    Improves the rtdetr neck with the profound semantic fusion module from [PSFusion](https://github.com/Linfeng-Tang/PSFusion).\n\n11. ultralytics/cfg/models/rt-detr/rtdetr-GLSA.yaml\n\n    Improves the rtdetr neck with the [GLSA](https://github.com/Barrett-python/DuAT) module.\n\n12. ultralytics/cfg/models/rt-detr/rtdetr-CTrans.yaml\n\n    Improves the rtdetr neck with ChannelTransformer from [[AAAI2022] UCTransNet](https://github.com/McGregorWwww/UCTransNet/tree/main).\n\n13. ultralytics/cfg/models/rt-detr/rtdetr-p6-CTrans.yaml\n\n    Improves the rtdetr neck with ChannelTransformer from [[AAAI2022] UCTransNet](https://github.com/McGregorWwww/UCTransNet/tree/main). (P6 version.)\n\n14. 
ultralytics/cfg/models/rt-detr/rtdetr-MAFPN.yaml\n\n    使用[MAF-YOLO](https://arxiv.org/pdf/2407.04381)的MAFPN改进Neck.\n\n15. Cross-Layer Feature Pyramid Transformer.   \n\n    P345:ultralytics/cfg/models/rt-detr/rtdetr-CFPT.yaml\n    P3456:ultralytics/cfg/models/rt-detr/rtdetr-CFPT-P3456.yaml\n    使用[CFPT](https://github.com/duzw9311/CFPT/tree/main)改进neck.\n\n16. ultralytics/cfg/models/rt-detr/rtdetr-FreqFFPN.yaml\n\n    使用[FreqFusion TPAMI2024](https://github.com/Linwei-Chen/FreqFusion)中的FreqFusion改进Neck.(这个需要python3.10,不然最后保存模型会出错.)\n\n17. ultralytics/cfg/models/rt-detr/rtdetr-msga.yaml\n\n    使用[MSA^2 Net](https://github.com/xmindflow/MSA-2Net)中的Multi-Scale Adaptive Spatial Attention Gate改进rtdetr-neck.\n\n18. ultralytics/cfg/models/rt-detr/rtdetr-WFU.yaml\n\n    使用[ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN)中的Wavelet Feature Upgrade改进rtdetr-neck.\n\n19. ultralytics/cfg/models/rt-detr/rtdetr-mpcafsa.yaml\n\n    使用[BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation](https://github.com/nkicsl/SF-UNet)的Frequency-Spatial Attention和Multi-scale Progressive Channel Attention改进rtdetr-neck.\n\n20. ultralytics/cfg/models/rt-detr/rtdetr-fsa.yaml\n\n    使用[BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation](https://github.com/nkicsl/SF-UNet)的Frequency-Spatial Attention改进rtdetr.\n\n21. ultralytics/cfg/models/rt-detr/rtdetr-CAB.yaml\n\n    使用[CVPR2025 HVI](https://github.com/Fediory/HVI-CIDNet)中的CAB改进rtdetr中的特征融合.\n\n22. ultralytics/cfg/models/rt-detr/rtdetr-MFM.yaml\n\n    使用[CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet)中的MFM改进neck.\n\n23. ultralytics/cfg/models/rt-detr/rtdetr-GDSAFusion.yaml\n\n    使用[CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087)中的GDSAFusion改进Fusion.\n\n24. ultralytics/cfg/models/rt-detr/rtdetr-PST.yaml \n\n    使用[Pyramid Sparse Transformer](https://arxiv.org/abs/2505.12772)中的Pyramid Sparse Transformer改进rtdetr-r18.\n\n25. 
ultralytics/cfg/models/rt-detr/rtdetr-HS-FPN.yaml\n\n    使用[AAAI2025 HS-FPN](https://github.com/ShiZican/HS-FPN/tree/main)中的HFP和SDP改进rtdetr-neck.\n\n26. ultralytics/cfg/models/rt-detr/rtdetr-HyperACE.yaml\n\n    使用[yolo13](https://github.com/iMoonLab/yolov13)中的HyperACE改进rtdetr-neck.\n\n27. ultralytics/cfg/models/rt-detr/rtdetr-DPCF.yaml\n\n    使用[INFFUS2025 SAMamba](https://arxiv.org/pdf/2505.23214)中的DPCF改进rtdetr-neck.\n\n28. ultralytics/cfg/models/rt-detr/rtdetr-RFPN.yaml\n\n    使用[ECCV2024 rethinking-fpn](https://github.com/AlanLi1997/rethinking-fpn)的SNI和GSConvE改进rtdetr-neck.\n\n29. ultralytics/cfg/models/rt-detr/rtdetr-LCA.yaml\n\n    使用[CVPR2025 HVI](https://arxiv.org/pdf/2502.20272)中的LCA改进rtdetr-neck.\n\n30. ultralytics/cfg/models/rt-detr/rtdetr-HFFE.yaml\n\n    使用[TGRS2025 HAFNet](https://ieeexplore.ieee.org/document/11154006)中的HFFE改进rtdetr-neck.\n\n31. ultralytics/cfg/models/rt-detr/rtdetr-MFPM.yaml\n\n    使用[TGRS2025 ISGLNet](https://ieeexplore.ieee.org/document/11232501)中的MFPM改进特征融合.\n\n32. ultralytics/cfg/models/rt-detr/rtdetr-ERM.yaml\n\n    使用[TGRS2025 ISGLNet](https://ieeexplore.ieee.org/document/11232501)中的ERM改进特征融合.\n\n33. ultralytics/cfg/models/rt-detr/rtdetr-CAFM.yaml\n    \n    使用[TIP2025 DSMT](https://ieeexplore.ieee.org/document/10955125)中的CAFM改进rtdetr-neck.\n\n### Head系列\n1. ultralytics/cfg/models/rt-detr/rtdetr-p2.yaml\n\n    添加小目标检测头P2到TransformerDecoderHead中.\n\n### RepC3改进系列\n1. ultralytics/cfg/models/rt-detr/rtdetr-DWRC3.yaml\n\n    使用[DWRSeg](https://arxiv.org/abs/2212.01173)中的Dilation-wise Residual(DWR)模块构建DWRC3改进rtdetr.\n2. ultralytics/cfg/models/rt-detr/rtdetr-Conv3XCC3.yaml\n\n    使用[Swift Parameter-free Attention Network](https://github.com/hongyuanyu/SPAN/tree/main)中的Conv3XC改进RepC3.\n3. ultralytics/cfg/models/rt-detr/rtdetr-DRBC3.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock改进RepC3.\n4. 
ultralytics/cfg/models/rt-detr/rtdetr-DBBC3.yaml\n\n    使用[DiverseBranchBlock CVPR2021](https://github.com/DingXiaoH/DiverseBranchBlock)改进RepC3.\n5. ultralytics/cfg/models/rt-detr/rtdetr-DGCST.yaml\n\n    使用[Lightweight Object Detection](https://arxiv.org/abs/2403.01736)中的Dynamic Group Convolution Shuffle Transformer改进rtdetr-r18.\n6. ultralytics/cfg/models/rt-detr/rtdetr-DGCST2.yaml\n\n    使用[Lightweight Object Detection](https://arxiv.org/abs/2403.01736)中的Dynamic Group Convolution Shuffle Transformer与Dynamic Group Convolution Shuffle Module进行结合改进rtdetr-r18.\n7. ultralytics/cfg/models/rt-detr/rtdetr-RetBlockC3.yaml\n\n    使用[CVPR2024 RMT](https://arxiv.org/abs/2309.11523)中的RetBlock改进RepC3.\n8. ultralytics/cfg/models/rt-detr/rtdetr-KANC3.yaml\n\n    使用[Pytorch-Conv-KAN](https://github.com/IvanDrokin/torch-conv-kan)的KAN卷积算子改进RepC3.\n    目前支持:\n    1. FastKANConv2DLayer\n    2. KANConv2DLayer\n    3. KALNConv2DLayer\n    4. KACNConv2DLayer\n    5. KAGNConv2DLayer\n9. ultralytics/cfg/models/rt-detr/rtdetr-gConvC3.yaml\n\n    使用[Rethinking Performance Gains in Image Dehazing Networks](https://arxiv.org/abs/2209.11448)的gConvblock改进RepC3.\n\n10. ultralytics/cfg/models/rt-detr/rtdetr-LFEC3.yaml\n\n    使用[Efficient Long-Range Attention Network for Image Super-resolution ECCV2022](https://github.com/xindongzhang/ELAN)中的Local feature extraction改进RepC3.\n\n11. ultralytics/cfg/models/rt-detr/rtdetr-IELC3.yaml\n\n    使用[CVPR2025 HVI](https://github.com/Fediory/HVI-CIDNet)中的Intensity Enhancement Layer改进rtdetr中的RepC3.\n\n12. ultralytics/cfg/models/rt-detr/rtdetr-FDConvC3.yaml\n\n    使用[CVPR2025 Frequency Dynamic Convolution for Dense Image Prediction](https://github.com/Linwei-Chen/FDConv)的FDConv改进RepC3.\n\n13. ultralytics/cfg/models/rt-detr/rtdetr-MBRConv3C3.yaml\n\n    使用[ICCV2025 MobileIE](https://github.com/AVC2-UESTC/MobileIE)中的MBRConv3改进RepC3.\n\n14. 
ultralytics/cfg/models/rt-detr/rtdetr-MBRConv5C3.yaml\n\n    使用[ICCV2025 MobileIE](https://github.com/AVC2-UESTC/MobileIE)中的MBRConv5改进RepC3.\n\n15. ultralytics/cfg/models/rt-detr/rtdetr-Converse2DC3.yaml\n\n    使用[ICCV2025 ConverseBNet](https://github.com/cszn/ConverseNet)中的Converse2D改进RepC3.\n\n### ResNet主干中的BasicBlock/BottleNeck改进系列(以下改进BottleNeck基本都有,就不再重复标注)\n1. ultralytics/cfg/models/rt-detr/rtdetr-Ortho.yaml\n\n    使用[OrthoNets](https://github.com/hady1011/OrthoNets/tree/main)中的正交通道注意力改进resnet18-backbone中的BasicBlock.(详细介绍请看百度云视频-20231119更新说明)\n2. ultralytics/cfg/models/rt-detr/rtdetr-DCNV2.yaml\n\n    使用可变形卷积DCNV2改进resnet18-backbone中的BasicBlock.\n3. ultralytics/cfg/models/rt-detr/rtdetr-DCNV3.yaml\n\n    使用可变形卷积[DCNV3 CVPR2023](https://github.com/OpenGVLab/InternImage)改进resnet18-backbone中的BasicBlock.(安装教程请看百度云视频-20231119更新说明)\n4. ultralytics/cfg/models/rt-detr/rtdetr-iRMB.yaml\n\n    使用[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB改进resnet18-backbone中的BasicBlock.(详细介绍请看百度云视频-20231119更新说明)\n5. ultralytics/cfg/models/rt-detr/rtdetr-DySnake.yaml\n\n    添加[DySnakeConv](https://github.com/YaoleiQi/DSCNet)到resnet18-backbone中的BasicBlock中.\n6. ultralytics/cfg/models/rt-detr/rtdetr-PConv.yaml\n\n    使用[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的PConv改进resnet18-backbone中的BasicBlock.\n7. ultralytics/cfg/models/rt-detr/rtdetr-Faster.yaml\n\n    使用[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block改进resnet18-backbone中的BasicBlock.\n8. ultralytics/cfg/models/rt-detr/rtdetr-AKConv.yaml\n\n    使用[AKConv 2023](https://github.com/CV-ZhangXin/AKConv)改进resnet18-backbone中的BasicBlock.\n\n9. ultralytics/cfg/models/rt-detr/rtdetr-RFAConv.yaml\n\n    使用[RFAConv 2023](https://github.com/Liuchen1997/RFAConv)改进resnet18-backbone中的BasicBlock.\n\n10. ultralytics/cfg/models/rt-detr/rtdetr-RFCAConv.yaml\n\n    使用[RFCAConv 2023](https://github.com/Liuchen1997/RFAConv)改进resnet18-backbone中的BasicBlock.\n\n11. 
ultralytics/cfg/models/rt-detr/rtdetr-RFCBAMConv.yaml\n\n    使用[RFCBAMConv 2023](https://github.com/Liuchen1997/RFAConv)改进resnet18-backbone中的BasicBlock.\n12. ultralytics/cfg/models/rt-detr/rtdetr-Conv3XC.yaml\n\n    使用[Swift Parameter-free Attention Network](https://github.com/hongyuanyu/SPAN/tree/main)中的Conv3XC改进resnet18-backbone中的BasicBlock.\n13. ultralytics/cfg/models/rt-detr/rtdetr-DRB.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock改进resnet18-backbone中的BasicBlock.\n14. ultralytics/cfg/models/rt-detr/rtdetr-DBB.yaml\n\n    使用[DiverseBranchBlock CVPR2021](https://github.com/DingXiaoH/DiverseBranchBlock)改进resnet18-backbone中的BasicBlock.\n15. ultralytics/cfg/models/rt-detr/rtdetr-DualConv.yaml\n\n    使用[DualConv](https://github.com/ChipsGuardian/DualConv)改进resnet18-backbone中的BasicBlock.\n16. ultralytics/cfg/models/rt-detr/rtdetr-AggregatedAtt.yaml\n\n    使用[TransNeXt](https://github.com/DaiShiResearch/TransNeXt)中的聚合感知注意力改进resnet18中的BasicBlock.(百度云视频-20240106更新说明)\n17. ultralytics/cfg/models/rt-detr/rtdetr-DCNV4.yaml\n\n    使用[DCNV4](https://github.com/OpenGVLab/DCNv4)改进resnet18中的BasicBlock.\n18. ultralytics/cfg/models/rt-detr/rtdetr-SWC.yaml\n\n    使用[shift-wise conv](https://arxiv.org/abs/2401.12736)改进resnet18中的BasicBlock.\n19. ultralytics/cfg/models/rt-detr/rtdetr-VSS.yaml\n\n    使用最新的Mamba架构[Mamba-UNet中的VSS](https://github.com/ziyangwang007/Mamba-UNet)改进resnet18-backbone中的BasicBlock.\n20. ultralytics/cfg/models/rt-detr/rtdetr-ContextGuided.yaml\n\n    使用[CGNet](https://github.com/wutianyiRosun/CGNet/tree/master)中的Light-weight Context Guided和Light-weight Context Guided DownSample改进rtdetr-r18.\n21. ultralytics/cfg/models/rt-detr/rtdetr-fadc.yaml\n\n    使用[CVPR2024 Frequency-Adaptive Dilated Convolution](https://github.com/Linwei-Chen/FADC)改进resnet18-basicblock.\n22. 
ultralytics/cfg/models/rt-detr/rtdetr-Star.yaml\n\n    使用[StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main)中的StarBlock改进resnet18-basicblock.\n23. ultralytics/cfg/models/rt-detr/rtdetr-KAN.yaml\n\n    使用[Pytorch-Conv-KAN](https://github.com/IvanDrokin/torch-conv-kan)的KAN卷积算子改进resnet18-basicblock.\n    目前支持:\n    1. FastKANConv2DLayer\n    2. KANConv2DLayer\n    3. KALNConv2DLayer\n    4. KACNConv2DLayer\n    5. KAGNConv2DLayer\n24. ultralytics/cfg/models/rt-detr/rtdetr-DEConv.yaml\n\n    使用[DEA-Net](https://github.com/cecret3350/DEA-Net)中的detail-enhanced convolution改进resnet18-basicblock.\n    关于DEConv在运行的时候重参数化后比重参数化前的计算量还要大的问题:是因为重参数化前thop库其计算不准的问题,看重参数化后的参数即可.\n\n25. ultralytics/cfg/models/rt-detr/rtdetr-WTConv.yaml\n\n    使用[ECCV2024 Wavelet Convolutions for Large Receptive Fields](https://github.com/BGU-CS-VIL/WTConv)中的WTConv改进BasicBlock.\n\n26. ultralytics/cfg/models/rt-detr/rtdetr-WDBB.yaml\n\n    使用[YOLO-MIF](https://github.com/wandahangFY/YOLO-MIF)中的WDBB改进BasicBlock.\n\n27. ultralytics/cfg/models/rt-detr/rtdetr-DeepDBB.yaml\n\n    使用[YOLO-MIF](https://github.com/wandahangFY/YOLO-MIF)中的DeepDBB改进BasicBlock.\n\n28. ultralytics/cfg/models/rt-detr/rtdetr-GCConvC3.yaml\n\n    使用[CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet)中的GCConv改进RepC3.\n\n### 上下采样算子系列\n1. ultralytics/cfg/models/rt-detr/rtdetr-DySample.yaml\n\n    使用[ICCV2023 DySample](https://arxiv.org/abs/2308.15085)改进CCFM中的上采样.\n2. ultralytics/cfg/models/rt-detr/rtdetr-CARAFE.yaml\n\n    使用[ICCV2019 CARAFE](https://arxiv.org/abs/1905.02188)改进CCFM中的上采样.\n3. ultralytics/cfg/models/rt-detr/rtdetr-HWD.yaml\n\n    使用[Haar wavelet downsampling](https://www.sciencedirect.com/science/article/abs/pii/S0031320323005174)改进CCFM的下采样.\n4. ultralytics/cfg/models/rt-detr/rtdetr-ContextGuidedDown.yaml\n\n    使用[CGNet](https://github.com/wutianyiRosun/CGNet/tree/master)中的Light-weight Context Guided DownSample改进rtdetr-r18.\n5. 
ultralytics/cfg/models/rt-detr/rtdetr-SRFD.yaml\n\n    使用[A Robust Feature Downsampling Module for Remote Sensing Visual Tasks](https://ieeexplore.ieee.org/document/10142024)改进rtdetr的下采样.\n\n6. ultralytics/cfg/models/rt-detr/rtdetr-WaveletPool.yaml\n\n    使用[Wavelet Pooling](https://openreview.net/forum?id=rkhlb8lCZ)改进RTDETR的上采样和下采样。\n\n7. ultralytics/cfg/models/rt-detr/rtdetr-LDConv.yaml\n\n    使用[LDConv](https://github.com/CV-ZhangXin/LDConv/tree/main)改进下采样.\n\n8. ultralytics/cfg/models/rt-detr/rtdetr-PSConv.yaml\n\n    使用[AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection](https://github.com/JN-Yang/PConv-SDloss-Data)中的Pinwheel-shaped Convolution改进rtdetr.\n\n9. ultralytics/cfg/models/rt-detr/rtdetr-EUCB.yaml\n\n    使用[CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD)中的EUCB改进rtdetr-r18的上采样.\n\n10. ultralytics/cfg/models/rt-detr/rtdetr-LoGStem.yaml\n\n    使用[LEGNet](https://github.com/lwCVer/LEGNet)中的LoGStem改进Stem.\n\n11. ultralytics/cfg/models/rt-detr/rtdetr-FourierConv.yaml\n\n    使用[MIA2025 Fourier Convolution Block with global receptive field for MRI reconstruction](https://www.sciencedirect.com/science/article/abs/pii/S1361841524002743)中的FourierConv改进Conv.\n\n12. ultralytics/cfg/models/rt-detr/rtdetr-wConv.yaml\n\n    使用[weightedConvolution2.0](https://github.com/cammarasana123/weightedConvolution2.0)中的wConv2d改进rtdetr.\n\n13. ultralytics/cfg/models/rt-detr/rtdetr-Converse2D.yaml\n\n    使用[ICCV2025 ConverseBNet](https://github.com/cszn/ConverseNet)中的Converse2D改进neck中的上采样.\n\n14. ultralytics/cfg/models/rt-detr/rtdetr-RepStem.yaml\n\n    使用[ICCV2023 FastVit](https://arxiv.org/pdf/2303.14189)中的RepStem改进rtdetr下采样.\n\n15. ultralytics/cfg/models/rt-detr/rtdetr-GCConv.yaml\n\n    使用[CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet)中的GCConv改进下采样.\n\n16. 
ultralytics/cfg/models/rt-detr/rtdetr-FSConv.yaml\n\n    使用[TGRS2025 Think Locally and Act Globally](https://ieeexplore.ieee.org/document/11175146)中的FSConv改进下采样.\n\n### RT-DETR-L改进系列\n1. ultralytics/cfg/models/rt-detr/rtdetr-l-GhostHGNetV2.yaml\n\n    使用GhostConv改进HGNetV2.(详细介绍请看百度云视频-20231109更新说明)\n\n2. ultralytics/cfg/models/rt-detr/rtdetr-l-RepHGNetV2.yaml\n\n    使用RepConv改进HGNetV2.(详细介绍请看百度云视频-20231109更新说明)\n\n3. ultralytics/cfg/models/rt-detr/rtdetr-l-attention.yaml\n\n    添加注意力模块到HGBlock中.(手把手教程请看百度云视频-手把手添加注意力教程)\n\n### RT-DETR-Mamba\n    集成Mamba-YOLO,并把head改为RTDETR-Head.(需要编译，请看百度云视频)\n    ultralytics/cfg/models/rt-detr/rtdetr-mamba-T.yaml\n    ultralytics/cfg/models/rt-detr/rtdetr-mamba-B.yaml\n    ultralytics/cfg/models/rt-detr/rtdetr-mamba-L.yaml\n\n### 注意力系列\n1. EMA\n2. SimAM\n3. SpatialGroupEnhance\n4. BiLevelRoutingAttention, BiLevelRoutingAttention_nchw\n5. TripletAttention\n6. CoordAtt\n7. CBAM\n8. BAMBlock\n9. EfficientAttention(CloFormer中的注意力)\n10. LSKBlock\n11. SEAttention\n12. CPCA\n13. deformable_LKA\n14. EffectiveSEModule\n15. LSKA\n16. SegNext_Attention\n17. DAttention(Vision Transformer with Deformable Attention CVPR2022)\n18. FocusedLinearAttention(ICCV2023)\n19. MLCA\n20. TransNeXt_AggregatedAttention\n21. HiLo\n22. LocalWindowAttention(EfficientViT中的CascadedGroupAttention注意力)\n23. Efficient Local Attention\n24. CAA(CVPR2024 PKINet中的注意力)\n25. CAFM\n\n### IoU系列\n1. IoU,GIoU,DIoU,CIoU,EIoU,SIoU(百度云视频-20231125更新说明)\n2. MPDIoU[论文链接](https://arxiv.org/pdf/2307.07662.pdf)(百度云视频-20231125更新说明)\n3. Inner-IoU,Inner-GIoU,Inner-DIoU,Inner-CIoU,Inner-EIoU,Inner-SIoU[论文链接](https://arxiv.org/abs/2311.02877)(百度云视频-20231125更新说明)\n4. Inner-MPDIoU(利用Inner-Iou与MPDIou进行二次创新)(百度云视频-20231125更新说明)\n5. Normalized Gaussian Wasserstein Distance.[论文链接](https://arxiv.org/abs/2110.13389)(百度云视频-20231125更新说明)\n6. Shape-IoU,Inner-Shape-IoU[论文链接](https://arxiv.org/abs/2110.13389)(百度云视频-20240106更新说明)\n7. 
SlideLoss,EMASlideLoss[创新思路](https://www.bilibili.com/video/BV1W14y1i79U/?vd_source=c8452371e7ca510979593165c8d7ac27).[Yolo-Face V2](https://github.com/Krasjet-Yu/YOLO-FaceV2/blob/master/utils/loss.py)(百度云视频-20240113更新说明)\n8. Wise-IoU(v1,v2,v3)系列(IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU)(百度云视频-20240113更新说明)\n9. Inner-Wise-IoU(v1,v2,v3)系列(IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU)(百度云视频-20240113更新说明)\n10. Focaler-IoU,Focaler-GIoU,Focaler-DIoU,Focaler-CIoU,Focaler-EIoU,Focaler-SIoU,Focaler-Shape-IoU,Focaler-MPDIoU[论文链接](https://arxiv.org/abs/2401.10525)(百度云视频-20240128更新说明)\n11. Focaler-Wise-IoU(v1,v2,v3)(IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU)[论文链接](https://arxiv.org/abs/2401.10525)(百度云视频-20240128更新说明)\n12. Powerful-IoU,Powerful-IoUV2,Inner-Powerful-IoU,Inner-Powerful-IoUV2,Focaler-Powerful-IoU,Focaler-Powerful-IoUV2,Wise-Powerful-IoU(v1,v2,v3),Wise-Powerful-IoUV2(v1,v2,v3)[论文链接](https://www.sciencedirect.com/science/article/abs/pii/S0893608023006640)\n13. SlideVarifocalLoss,EMASlideVarifocalLoss[创新思路](https://www.bilibili.com/video/BV1W14y1i79U/?vd_source=c8452371e7ca510979593165c8d7ac27).[Yolo-Face V2](https://github.com/Krasjet-Yu/YOLO-FaceV2/blob/master/utils/loss.py)(百度云视频-20240302更新说明)\n14. CVPR2025-DEIM-MAL.(百度云视频-20240315更新说明)\n15. Gaussian Combined Distance[论文链接](https://arxiv.org/pdf/2510.27649)(百度云视频-20251122更新说明)\n\n### 以Yolov8为基准模型的改进方案\n1. ultralytics/cfg/models/yolo-detr/yolov8-detr.yaml\n\n    使用RT-DETR中的TransformerDecoderHead改进yolov8.\n\n2. ultralytics/cfg/models/yolo-detr/yolov8-detr-DWR.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[DWRSeg](https://arxiv.org/abs/2212.01173)中的Dilation-wise Residual(DWR)模块改进yolov8.\n\n3. ultralytics/cfg/models/yolo-detr/yolov8-detr-fasternet.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)改进yolov8.(支持替换其他主干,请看百度云视频-替换主干示例教程)\n\n4. 
ultralytics/cfg/models/yolo-detr/yolov8-detr-AIFI-LPE.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和LearnedPositionalEncoding改进yolov8.(详细介绍请看百度云视频-20231119更新说明)\n\n5. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-DCNV2.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和可变形卷积DCNV2改进yolov8.\n\n6. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-DCNV3.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和可变形卷积[DCNV3 CVPR2023](https://github.com/OpenGVLab/InternImage)改进yolov8.(安装教程请看百度云视频-20231119更新说明)\n\n7. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-DCNV2-Dynamic.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和自研可变形卷积DCNV2-Dynamic改进yolov8.(详细介绍请看百度云视频-MPCA与DCNV2_Dynamic的说明)\n\n8. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-Ortho.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[OrthoNets](https://github.com/hady1011/OrthoNets/tree/main)中的正交通道注意力改进yolov8.(详细介绍请看百度云视频-20231119更新说明)\n\n9. ultralytics/cfg/models/yolo-detr/yolov8-detr-attention.yaml\n\n    添加注意力到基于RTDETR-Head中的yolov8中.(手把手教程请看百度云视频-手把手添加注意力教程)\n\n10. ultralytics/cfg/models/yolo-detr/yolov8-detr-p2.yaml\n\n    添加小目标检测头P2到TransformerDecoderHead中.\n\n11. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-DySnake.yaml\n\n    [DySnakeConv](https://github.com/YaoleiQi/DSCNet)与C2f融合.  \n\n12. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-Faster.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block改进yolov8.\n\n13. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-Faster-Rep.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中与[RepVGG CVPR2021](https://github.com/DingXiaoH/RepVGG)中的RepConv二次创新后的Faster-Block-Rep改进yolov8.\n\n14. 
ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-Faster-EMA.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and Faster-Block-EMA, a variant that combines the Faster-Block from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) with [EMA ICASSP2023](https://arxiv.org/abs/2305.13563v1), to improve yolov8.\n\n15. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-Faster-Rep-EMA.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and a Faster-Block variant that combines the Faster-Block from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) with RepConv from [RepVGG CVPR2021](https://github.com/DingXiaoH/RepVGG) and [EMA ICASSP2023](https://arxiv.org/abs/2305.13563v1), to improve yolov8.\n\n16. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-AKConv.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and [AKConv 2023](https://github.com/CV-ZhangXin/AKConv) to improve yolov8.\n\n17. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-RFAConv.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and [RFAConv 2023](https://github.com/Liuchen1997/RFAConv) to improve yolov8.\n\n18. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-RFAConv.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and [RFCAConv 2023](https://github.com/Liuchen1997/RFAConv) to improve yolov8.\n\n19. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-RFAConv.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and [RFCBAMConv 2023](https://github.com/Liuchen1997/RFAConv) to improve yolov8.\n\n20. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-Conv3XC.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and Conv3XC from [Swift Parameter-free Attention Network](https://github.com/hongyuanyu/SPAN/tree/main) to improve yolov8.\n\n21. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-SPAB.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and SPAB from [Swift Parameter-free Attention Network](https://github.com/hongyuanyu/SPAN/tree/main) to improve yolov8.\n\n22. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-DRB.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and DilatedReparamBlock from [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main) to improve yolov8.\n\n23. 
ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-UniRepLKNetBlock.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的UniRepLKNetBlock改进yolov8.\n\n24. ultralytics/cfg/models/yolo-detr/yolov8-detr-DWR-DRB.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock对[DWRSeg](https://arxiv.org/abs/2212.01173)中的Dilation-wise Residual(DWR)进行二次创新改进yolov8.\n\n25. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-DBB.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[DiverseBranchBlock CVPR2021](https://github.com/DingXiaoH/DiverseBranchBlock)改进yolov8.\n\n26. ultralytics/cfg/models/yolo-detr/yolov8-detr-CSP-EDLAN.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[DualConv](https://github.com/ChipsGuardian/DualConv)打造CSP Efficient Dual Layer Aggregation Networks改进yolov8.\n\n27. ultralytics/cfg/models/yolo-detr/yolov8-detr-ASF.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[ASF-YOLO](https://github.com/mkang315/ASF-YOLO)中的Attentional Scale Sequence Fusion改进yolov8.\n\n28. ultralytics/cfg/models/yolo-detr/yolov8-detr-ASF-P2.yaml\n\n    在ultralytics/cfg/models/yolo-detr/yolov8-detr-ASF.yaml的基础上进行二次创新，引入P2检测层并对网络结构进行优化.\n\n29. ultralytics/cfg/models/yolo-detr/yolov8-detr-slimneck.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[SlimNeck](https://github.com/AlanLi1997/slim-neck-by-gsconv)中VoVGSCSP\\VoVGSCSPC和GSConv改进yolov8的neck.\n\n30. ultralytics/cfg/models/yolo-detr/yolov8-detr-slimneck-asf.yaml\n\n    在ultralytics/cfg/models/yolo-detr/yolov8-detr-slimneck.yaml使用[ASF-YOLO](https://github.com/mkang315/ASF-YOLO)中的Attentional Scale Sequence Fusion进行二次创新.\n\n31. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-AggregatedAtt.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[TransNeXt](https://github.com/DaiShiResearch/TransNeXt)中的聚合感知注意力改进C2f.(百度云视频-20240106更新说明)\n\n32. 
ultralytics/cfg/models/yolo-detr/yolov8-detr-SDI.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and the Semantics and Detail Infusion Module from [U-NetV2](https://github.com/yaoppeng/U-Net_v2) to improve the feature fusion in yolov8.\n\n33. ultralytics/cfg/models/yolo-detr/yolov8-detr-goldyolo.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and the Gather-and-Distribute mechanism from Huawei's GOLD-YOLO (2023) to improve the feature fusion module.\n\n34. ultralytics/cfg/models/yolo-detr/yolov8-detr-goldyolo-asf.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR, the Gather-and-Distribute mechanism from Huawei's GOLD-YOLO (2023), and the Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO) to improve the feature fusion module.\n\n35. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-DCNV4.yaml\n\n    Uses [DCNV4](https://github.com/OpenGVLab/DCNv4) to improve C2f.\n\n36. ultralytics/cfg/models/yolo-detr/yolov8-detr-HSFPN.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR) to improve the PAN in YOLOV8.\n\n37. ultralytics/cfg/models/yolo-detr/yolov8-detr-HSPAN.yaml\n\n    Uses the TransformerDecoderHead from RT-DETR and HSPAN, a variant derived from the HS-FPN in [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR), to improve the PAN in YOLOV8.\n\n38. ultralytics/cfg/models/yolo-detr/yolov8-detr-Dysample.yaml\n\n    Uses [ICCV2023 DySample](https://arxiv.org/abs/2308.15085) to improve the upsampling in the yolov8-detr neck.\n\n39. ultralytics/cfg/models/yolo-detr/yolov8-detr-CARAFE.yaml\n\n    Uses [ICCV2019 CARAFE](https://arxiv.org/abs/1905.02188) to improve the upsampling in the yolov8-detr neck.\n\n40. ultralytics/cfg/models/yolo-detr/yolov8-detr-HWD.yaml\n\n    Uses [Haar wavelet downsampling](https://www.sciencedirect.com/science/article/abs/pii/S0031320323005174) to improve the downsampling in the yolov8-detr neck.\n\n41. ultralytics/cfg/models/yolo-detr/yolov8-detr-ASF-Dynamic.yaml\n\n    Replaces the upsampling module of the Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO) with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085), yielding Dynamic Sample Attentional Scale Sequence Fusion, which is used to improve the neck in yolov8-detr.\n\n42. 
ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-SWC.yaml\n\n    使用[shift-wise conv](https://arxiv.org/abs/2401.12736)改进yolov8-detr中的C2f.\n\n43. ultralytics/cfg/models/yolo-detr/yolov8-detr-iRMB-DRB.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock对[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB进行二次创新来改进yolov8-detr中的C2f.\n\n44. ultralytics/cfg/models/yolo-detr/yolov8-detr-iRMB-SWC.yaml\n\n    使用[shift-wise conv](https://arxiv.org/abs/2401.12736)对[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB进行二次创新来改进yolov8-detr中的C2f.\n\n45. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-VSS.yaml\n\n    使用最新的Mamba架构[Mamba-UNet中的VSS](https://github.com/ziyangwang007/Mamba-UNet)对C2f中的BottleNeck进行改进,使其能更有效地捕获图像中的复杂细节和更广泛的语义上下文.\n\n46. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-LVMB.yaml\n\n    使用最新的Mamba架构[Mamba-UNet中的VSS](https://github.com/ziyangwang007/Mamba-UNet)与Cross Stage Partial进行结合,使其能更有效地捕获图像中的复杂细节和更广泛的语义上下文.\n\n47. ultralytics/cfg/models/yolo-detr/yolov8-detr-RepNCSPELAN.yaml\n\n    使用[YOLOV9](https://github.com/WongKinYiu/yolov9)中的RepNCSPELAN进行改进yolov8-detr.\n\n48. ultralytics/cfg/models/yolo-detr/yolov8-detr-bifpn.yaml\n\n    添加BIFPN到yolov8中.  \n    其中BIFPN中有三个可选参数：\n    1. Fusion  \n        其中BIFPN中的Fusion模块支持五种: weight, adaptive, concat, bifpn(default), SDI  \n        其中weight, adaptive, concat出自[paper链接-Figure 3](https://openreview.net/pdf?id=q2ZaVU6bEsT), SDI出自[U-NetV2](https://github.com/yaoppeng/U-Net_v2)\n    2. node_mode  \n        block模块选择,具体可看对应百度云视频-20240302更新公告.\n    3. head_channel  \n        BIFPN中的通道数,默认设置为256.\n\n49. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-ContextGuided.yaml\n\n    使用[CGNet](https://github.com/wutianyiRosun/CGNet/tree/master)中的Light-weight Context Guided和Light-weight Context Guided DownSample改进yolov8-detr.\n\n50. ultralytics/cfg/models/yolo-detr/yolov8-detr-PACAPN.yaml\n\n    自研结构, Parallel Atrous Convolution Attention Pyramid Network, PAC-APN\n\n51. 
ultralytics/cfg/models/yolo-detr/yolov8-detr-DGCST.yaml\n\n    使用[Lightweight Object Detection](https://arxiv.org/abs/2403.01736)中的Dynamic Group Convolution Shuffle Transformer改进yolov8-detr.\n\n52. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-RetBlock.yaml\n\n    使用[CVPR2024 RMT](https://arxiv.org/abs/2309.11523)中的RetBlock改进C2f.\n\n53. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-PKI.yaml\n\n    使用[CVPR2024 PKINet](https://github.com/PKINet/PKINet)中的PKIModule和CAA模块改进C2f.\n\n54. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-fadc.yaml\n\n    使用[CVPR2024 Frequency-Adaptive Dilated Convolution](https://github.com/Linwei-Chen/FADC)改进C2f.\n\n55. ultralytics/cfg/models/yolo-detr/yolov8-detr-FDPN.yaml\n\n    自研特征聚焦扩散金字塔网络(Focusing Diffusion Pyramid Network)\n    1. 通过定制的特征聚焦模块与特征扩散机制，能让每个尺度的特征都具有详细的上下文信息，更有利于后续目标的检测与分类。\n    2. 定制的特征聚焦模块可以接受三个尺度的输入，其内部包含一个Inception-Style的模块，其利用一组并行深度卷积来捕获丰富的跨多个尺度的信息。\n    3. 通过扩散机制使具有丰富的上下文信息的特征进行扩散到各个检测尺度.\n\n56. ultralytics/cfg/models/yolo-detr/yolov8-detr-FDPN-DASI.yaml\n\n    使用[HCFNet](https://github.com/zhengshuchen/HCFNet)中的Dimension-Aware Selective Integration Module对自研的Focusing Diffusion Pyramid Network再次创新.\n\n57. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-PPA.yaml\n\n    使用[HCFNet](https://github.com/zhengshuchen/HCFNet)中的Parallelized Patch-Aware Attention Module改进C2f.\n\n58. ultralytics/cfg/models/yolo-detr/yolov8-detr-SRFD.yaml\n\n    使用[A Robust Feature Downsampling Module for Remote Sensing Visual Tasks](https://ieeexplore.ieee.org/document/10142024)改进yolov8的下采样.\n\n59. ultralytics/cfg/models/yolo-detr/yolov8-detr-CSFCN.yaml\n\n    使用[Context and Spatial Feature Calibration for Real-Time Semantic Segmentation](https://github.com/kaigelee/CSFCN/tree/main)中的Context and Spatial Feature Calibration模块改进yolov8.\n\n60. ultralytics/cfg/models/yolo-detr/yolov8-detr-CGAFusion.yaml\n\n    使用[DEA-Net](https://github.com/cecret3350/DEA-Net)中的content-guided attention fusion改进yolov8-neck.\n\n61. 
ultralytics/cfg/models/yolo-detr/yolov8-detr-CAFMFusion.yaml\n\n    利用具有[HCANet](https://github.com/summitgao/HCANet)中的CAFM，其具有获取全局和局部信息的注意力机制进行二次改进content-guided attention fusion.\n \n62. ultralytics/cfg/models/yolo-detr/yolov8-detr-RGCSPELAN.yaml\n\n    自研RepGhostCSPELAN.\n    1. 参考GhostNet中的思想(主流CNN计算的中间特征映射存在广泛的冗余)，采用廉价的操作生成一部分冗余特征图，以此来降低计算量和参数量。\n    2. 舍弃yolov5与yolov8中常用的BottleNeck，为了弥补舍弃残差块所带来的性能损失，在梯度流通分支上使用RepConv，以此来增强特征提取和梯度流通的能力，并且RepConv可以在推理的时候进行融合，一举两得。\n    3. 可以通过缩放因子控制RGCSPELAN的大小，使其可以兼顾小模型和大模型。\n\n63. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-Faster-CGLU.yaml\n\n    使用[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU对CVPR2023中的FasterNet进行二次创新.\n\n64. ultralytics/cfg/models/yolo-detr/yolov8-detr-SDFM.yaml\n\n    使用[PSFusion](https://github.com/Linfeng-Tang/PSFusion)中的superficial detail fusion module改进yolov8-neck.\n\n65. ultralytics/cfg/models/yolo-detr/yolov8-detr-PSFM.yaml\n\n    使用[PSFusion](https://github.com/Linfeng-Tang/PSFusion)中的profound semantic fusion module改进yolov8-neck.\n\n66. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-Star.yaml\n\n    使用[StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main)中的StarBlock改进C2f.\n\n67. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-Star-CAA.yaml\n\n    使用[StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main)中的StarBlock和[CVPR2024 PKINet](https://github.com/PKINet/PKINet)中的CAA改进C2f.\n\n68. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-KAN.yaml\n\n    使用[Pytorch-Conv-KAN](https://github.com/IvanDrokin/torch-conv-kan)的KAN卷积算子改进C2f.\n    目前支持:\n    1. FastKANConv2DLayer\n    2. KANConv2DLayer\n    3. KALNConv2DLayer\n    4. KACNConv2DLayer\n    5. KAGNConv2DLayer\n\n69. ultralytics/cfg/models/yolo-detr/yolov8-detr-ContextGuideFPN.yaml\n\n    Context Guide Fusion Module（CGFM）是一个创新的特征融合模块，旨在改进YOLOv8中的特征金字塔网络（FPN）。该模块的设计考虑了多尺度特征融合过程中上下文信息的引导和自适应调整。\n    1. 
上下文信息的有效融合：通过SE注意力机制，模块能够在特征融合过程中捕捉并利用重要的上下文信息，从而增强特征表示的有效性，并有效引导模型学习检测目标的信息，从而提高模型的检测精度。\n    2. 特征增强：通过权重化的特征重组操作，模块能够增强重要特征，同时抑制不重要特征，提升特征图的判别能力。\n    3. 简单高效：模块结构相对简单，不会引入过多的计算开销，适合在实时目标检测任务中应用。\n    这期视频讲解在B站:https://www.bilibili.com/video/BV1Vx4y1n7hZ/\n\n70. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-DEConv.yaml\n\n    使用[DEA-Net](https://github.com/cecret3350/DEA-Net)中的detail-enhanced convolution改进C2f.\n    关于DEConv在运行的时候重参数化后比重参数化前的计算量还要大的问题:是因为重参数化前thop库其计算不准的问题,看重参数化后的参数即可.\n\n71. ultralytics/cfg/models/yolo-detr/yolov8-detr-C2f-SMPCGLU.yaml\n\n    Self-moving Point Convolutional GLU模型改进C2f.\n    SMP来源于[CVPR2023-SMPConv](https://github.com/sangnekim/SMPConv),Convolutional GLU来源于[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n    1. 普通的卷积在面对数据中的多样性和复杂性时，可能无法捕捉到有效的特征，因此我们采用了SMPConv，其具备最新的自适应点移动机制，从而更好地捕捉局部特征，提高特征提取的灵活性和准确性。\n    2. 在SMPConv后添加CGLU，Convolutional GLU 结合了卷积和门控机制，能够选择性地通过信息通道，提高了特征提取的有效性和灵活性。\n\n### 以Yolov5为基准模型的改进方案\n1. ultralytics/cfg/models/yolo-detr/yolov5-detr.yaml\n\n    使用RT-DETR中的TransformerDecoderHead改进yolov5.\n\n2. ultralytics/cfg/models/yolo-detr/yolov5-detr-DWR.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[DWRSeg](https://arxiv.org/abs/2212.01173)中的Dilation-wise Residual(DWR)模块改进yolov5.\n\n3. ultralytics/cfg/models/yolo-detr/yolov5-detr-fasternet.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)改进yolov5.(支持替换其他主干,请看百度云视频-替换主干示例教程)\n\n4. ultralytics/cfg/models/yolo-detr/yolov5-detr-AIFI-LPE.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和LearnedPositionalEncoding改进yolov5.(详细介绍请看百度云视频-20231119更新说明)\n\n5. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-DCNV2.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和可变形卷积DCNV2改进yolov5.\n\n6. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-DCNV3.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和可变形卷积[DCNV3 CVPR2023](https://github.com/OpenGVLab/InternImage)改进yolov5.(安装教程请看百度云视频-20231119更新说明)\n\n7. 
ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-DCNV2-Dynamic.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和自研可变形卷积DCNV2-Dynamic改进yolov5.(详细介绍请看百度云视频-MPCA与DCNV2_Dynamic的说明)\n\n8. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-Ortho.yaml(详细介绍请看百度云视频-20231119更新说明)\n\n    使用RT-DETR中的TransformerDecoderHead和[OrthoNets](https://github.com/hady1011/OrthoNets/tree/main)中的正交通道注意力改进yolov5.\n\n9. ultralytics/cfg/models/yolo-detr/yolov5-detr-attention.yaml\n\n    添加注意力到基于RTDETR-Head中的yolov5中.(手把手教程请看百度云视频-手把手添加注意力教程)\n\n10. ultralytics/cfg/models/yolo-detr/yolov5-detr-p2.yaml\n\n    添加小目标检测头P2到TransformerDecoderHead中.\n\n11. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-DySnake.yaml\n\n    [DySnakeConv](https://github.com/YaoleiQi/DSCNet)与C3融合.  \n\n12. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-Faster.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block改进yolov5.\n\n13. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-Faster-Rep.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中与[RepVGG CVPR2021](https://github.com/DingXiaoH/RepVGG)中的RepConv二次创新后的Faster-Block-Rep改进yolov5.\n\n14. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-Faster-EMA.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中与[EMA ICASSP2023](https://arxiv.org/abs/2305.13563v1)二次创新后的Faster-Block-EMA的Faster-Block-EMA改进yolov5.\n\n15. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-Faster-Rep-EMA.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中与[RepVGG CVPR2021](https://github.com/DingXiaoH/RepVGG)中的RepConv、[EMA ICASSP2023](https://arxiv.org/abs/2305.13563v1)二次创新后的Faster-Block改进yolov5.\n\n16. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-AKConv.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[AKConv 2023](https://github.com/CV-ZhangXin/AKConv)改进yolov5.\n\n17. 
ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-RFAConv.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[RFAConv 2023](https://github.com/Liuchen1997/RFAConv)改进yolov5.\n\n18. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-RFAConv.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[RFCAConv 2023](https://github.com/Liuchen1997/RFAConv)改进yolov5.\n\n19. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-RFAConv.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[RFCBAMConv 2023](https://github.com/Liuchen1997/RFAConv)改进yolov5.\n\n20. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-Conv3XC.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[Swift Parameter-free Attention Network](https://github.com/hongyuanyu/SPAN/tree/main)中的Conv3XC改进yolov5.\n\n21. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-SPAB.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[Swift Parameter-free Attention Network](https://github.com/hongyuanyu/SPAN/tree/main)中的SPAB改进yolov5.\n\n22. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-DRB.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock改进改进yolov5.\n\n23. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-UniRepLKNetBlock.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的UniRepLKNetBlock改进改进yolov5.\n\n24. ultralytics/cfg/models/yolo-detr/yolov5-detr-DWR-DRB.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock对[DWRSeg](https://arxiv.org/abs/2212.01173)中的Dilation-wise Residual(DWR)进行二次创新改进yolov5.\n\n25. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-DBB.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[DiverseBranchBlock CVPR2021](https://github.com/DingXiaoH/DiverseBranchBlock)改进yolov5.\n\n26. 
ultralytics/cfg/models/yolo-detr/yolov5-detr-CSP-EDLAN.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[DualConv](https://github.com/ChipsGuardian/DualConv)打造CSP Efficient Dual Layer Aggregation Networks改进yolov5.\n\n27. ultralytics/cfg/models/yolo-detr/yolov5-detr-ASF.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[ASF-YOLO](https://github.com/mkang315/ASF-YOLO)中的Attentional Scale Sequence Fusion改进yolov5.\n\n28. ultralytics/cfg/models/yolo-detr/yolov5-detr-ASF-P2.yaml\n\n    在ultralytics/cfg/models/yolo-detr/yolov5-detr-ASF.yaml的基础上进行二次创新，引入P2检测层并对网络结构进行优化.\n\n29. ultralytics/cfg/models/yolo-detr/yolov5-detr-slimneck.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[SlimNeck](https://github.com/AlanLi1997/slim-neck-by-gsconv)中VoVGSCSP\\VoVGSCSPC和GSConv改进yolov5的neck.\n\n30. ultralytics/cfg/models/yolo-detr/yolov5-detr-slimneck-asf.yaml\n\n    在ultralytics/cfg/models/yolo-detr/yolov5-detr-slimneck.yaml使用[ASF-YOLO](https://github.com/mkang315/ASF-YOLO)中的Attentional Scale Sequence Fusion进行二次创新.\n\n31. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-AggregatedAtt.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[TransNeXt](https://github.com/DaiShiResearch/TransNeXt)中的聚合感知注意力改进C3.(百度云视频-20240106更新说明)\n\n32. ultralytics/cfg/models/yolo-detr/yolov5-detr-SDI.yaml\n\n    使用RT-DETR中的TransformerDecoderHead和[U-NetV2](https://github.com/yaoppeng/U-Net_v2)中的 Semantics and Detail Infusion Module对yolov5中的feature fusion进行改进.\n\n33. ultralytics/cfg/models/yolo-detr/yolov5-detr-goldyolo.yaml\n\n    利用RT-DETR中的TransformerDecoderHead和华为2023最新GOLD-YOLO中的Gatherand-Distribute进行改进特征融合模块.\n\n34. ultralytics/cfg/models/yolo-detr/yolov5-detr-goldyolo-asf.yaml\n\n    利用RT-DETR中的TransformerDecoderHead和华为2023最新GOLD-YOLO中的Gatherand-Distribute和[ASF-YOLO](https://github.com/mkang315/ASF-YOLO)中的Attentional Scale Sequence Fusion进行改进特征融合模块.\n\n35. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-DCNV4.yaml\n\n    使用[DCNV4](https://github.com/OpenGVLab/DCNv4)改进C3.\n\n36. 
ultralytics/cfg/models/yolo-detr/yolov5-detr-HSFPN.yaml\n\n    利用RT-DETR中的TransformerDecoderHead和使用[MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR)中的HS-FPN改进YOLOV5中的PAN.\n\n37. ultralytics/cfg/models/yolo-detr/yolov5-detr-HSPAN.yaml\n\n    利用RT-DETR中的TransformerDecoderHead和对[MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR)中的HS-FPN进行二次创新后得到HSPAN改进YOLOV5中的PAN.\n\n38. ultralytics/cfg/models/yolo-detr/yolov8-detr-Dysample.yaml\n\n    使用[ICCV2023 DySample](https://arxiv.org/abs/2308.15085)改进yolov8-detr neck中的上采样.\n\n39. ultralytics/cfg/models/yolo-detr/yolov8-detr-CARAFE.yaml\n\n    使用[ICCV2019 CARAFE](https://arxiv.org/abs/1905.02188)改进yolov8-detr neck中的上采样.\n\n40. ultralytics/cfg/models/yolo-detr/yolov8-detr-HWD.yaml\n\n    使用[Haar wavelet downsampling](https://www.sciencedirect.com/science/article/abs/pii/S0031320323005174)改进yolov8-detr neck的下采样.\n\n41. ultralytics/cfg/models/yolo-detr/yolov5-detr-ASF-Dynamic.yaml\n\n    使用[ICCV2023 DySample](https://arxiv.org/abs/2308.15085)改进[ASF-YOLO](https://github.com/mkang315/ASF-YOLO)中的Attentional Scale Sequence Fusion的上采样模块得到Dynamic Sample Attentional Scale Sequence Fusion改进yolov5-detr中的neck.\n\n42. ultralytics/cfg/models/yolo-detr/yolov5-detr-SWC.yaml\n\n    使用[shift-wise conv](https://arxiv.org/abs/2401.12736)改进yolov5-detr中的C3.\n\n43. ultralytics/cfg/models/yolo-detr/yolov5-detr-iRMB-DRB.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock对[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB进行二次创新来改进yolov5-detr中的C2f.\n\n44. ultralytics/cfg/models/yolo-detr/yolov5-detr-iRMB-SWC.yaml\n\n    使用[shift-wise conv](https://arxiv.org/abs/2401.12736)对[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB进行二次创新来改进yolov5-detr中的C2f.\n\n45. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-VSS.yaml\n\n    使用最新的Mamba架构[Mamba-UNet中的VSS](https://github.com/ziyangwang007/Mamba-UNet)对C3中的BottleNeck进行改进,使其能更有效地捕获图像中的复杂细节和更广泛的语义上下文.\n\n46. 
ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-LVMB.yaml\n\n    使用最新的Mamba架构[Mamba-UNet中的VSS](https://github.com/ziyangwang007/Mamba-UNet)与Cross Stage Partial进行结合,使其能更有效地捕获图像中的复杂细节和更广泛的语义上下文.\n\n47. ultralytics/cfg/models/yolo-detr/yolov5-detr-RepNCSPELAN.yaml\n\n    使用[YOLOV9](https://github.com/WongKinYiu/yolov9)中的RepNCSPELAN进行改进yolov5-detr.\n\n48. ultralytics/cfg/models/yolo-detr/yolov5-detr-bifpn.yaml\n\n    添加BIFPN到yolov8中.  \n    其中BIFPN中有三个可选参数：\n    1. Fusion  \n        其中BIFPN中的Fusion模块支持五种: weight, adaptive, concat, bifpn(default), SDI  \n        其中weight, adaptive, concat出自[paper链接-Figure 3](https://openreview.net/pdf?id=q2ZaVU6bEsT), SDI出自[U-NetV2](https://github.com/yaoppeng/U-Net_v2)\n    2. node_mode  \n        block模块选择,具体可看对应百度云视频-20240302更新公告.\n    3. head_channel  \n        BIFPN中的通道数,默认设置为256.\n\n49. ultralytics/cfg/models/yolo-detr/yolov5-detr-C2f-ContextGuided.yaml\n\n    使用[CGNet](https://github.com/wutianyiRosun/CGNet/tree/master)中的Light-weight Context Guided和Light-weight Context Guided DownSample改进yolov5-detr.\n\n50. ultralytics/cfg/models/yolo-detr/yolov5-detr-PACAPN.yaml\n\n    自研结构, Parallel Atrous Convolution Attention Pyramid Network, PAC-APN\n\n51. ultralytics/cfg/models/yolo-detr/yolov5-detr-DGCST.yaml\n\n    使用[Lightweight Object Detection](https://arxiv.org/abs/2403.01736)中的Dynamic Group Convolution Shuffle Transformer改进yolov5-detr.\n\n52. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-RetBlock.yaml\n\n    使用[CVPR2024 RMT](https://arxiv.org/abs/2309.11523)中的RetBlock改进C3.\n\n53. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-PKI.yaml\n\n    使用[CVPR2024 PKINet](https://github.com/PKINet/PKINet)中的PKIModule和CAA模块改进C3.\n\n54. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-fadc.yaml\n\n    使用[CVPR2024 Frequency-Adaptive Dilated Convolution](https://github.com/Linwei-Chen/FADC)改进C3.\n\n55. ultralytics/cfg/models/yolo-detr/yolov5-detr-FDPN.yaml\n\n    自研特征聚焦扩散金字塔网络(Focusing Diffusion Pyramid Network)\n    1. 
通过定制的特征聚焦模块与特征扩散机制，能让每个尺度的特征都具有详细的上下文信息，更有利于后续目标的检测与分类。\n    2. 定制的特征聚焦模块可以接受三个尺度的输入，其内部包含一个Inception-Style的模块，其利用一组并行深度卷积来捕获丰富的跨多个尺度的信息。\n    3. 通过扩散机制使具有丰富的上下文信息的特征进行扩散到各个检测尺度.\n\n56. ultralytics/cfg/models/yolo-detr/yolov5-detr-FDPN-DASI.yaml\n\n    使用[HCFNet](https://github.com/zhengshuchen/HCFNet)中的Dimension-Aware Selective Integration Module对自研的Focusing Diffusion Pyramid Network再次创新.\n\n57. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-PPA.yaml\n\n    使用[HCFNet](https://github.com/zhengshuchen/HCFNet)中的Parallelized Patch-Aware Attention Module改进C3.\n\n58. ultralytics/cfg/models/yolo-detr/yolov5-detr-SRFD.yaml\n\n    使用[A Robust Feature Downsampling Module for Remote Sensing Visual Tasks](https://ieeexplore.ieee.org/document/10142024)改进yolov5的下采样.\n\n59. ultralytics/cfg/models/yolo-detr/yolov5-detr-CSFCN.yaml\n\n    使用[Context and Spatial Feature Calibration for Real-Time Semantic Segmentation](https://github.com/kaigelee/CSFCN/tree/main)中的Context and Spatial Feature Calibration模块改进yolov5.\n\n60. ultralytics/cfg/models/yolo-detr/yolov5-detr-CGAFusion.yaml\n\n    使用[DEA-Net](https://github.com/cecret3350/DEA-Net)中的content-guided attention fusion改进yolov5-neck.\n\n61. ultralytics/cfg/models/yolo-detr/yolov5-detr-CAFMFusion.yaml\n\n    利用具有[HCANet](https://github.com/summitgao/HCANet)中的CAFM，其具有获取全局和局部信息的注意力机制进行二次改进content-guided attention fusion.\n \n62. ultralytics/cfg/models/yolo-detr/yolov5-detr-RGCSPELAN.yaml\n\n    自研RepGhostCSPELAN.\n    1. 参考GhostNet中的思想(主流CNN计算的中间特征映射存在广泛的冗余)，采用廉价的操作生成一部分冗余特征图，以此来降低计算量和参数量。\n    2. 舍弃yolov5与yolov8中常用的BottleNeck，为了弥补舍弃残差块所带来的性能损失，在梯度流通分支上使用RepConv，以此来增强特征提取和梯度流通的能力，并且RepConv可以在推理的时候进行融合，一举两得。\n    3. 可以通过缩放因子控制RGCSPELAN的大小，使其可以兼顾小模型和大模型。\n\n63. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-Faster-CGLU.yaml\n\n    使用[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU对CVPR2023中的FasterNet进行二次创新.\n\n64. 
ultralytics/cfg/models/yolo-detr/yolov5-detr-SDFM.yaml\n\n    使用[PSFusion](https://github.com/Linfeng-Tang/PSFusion)中的superficial detail fusion module改进yolov5-neck.\n\n65. ultralytics/cfg/models/yolo-detr/yolov5-detr-PSFM.yaml\n\n    使用[PSFusion](https://github.com/Linfeng-Tang/PSFusion)中的profound semantic fusion module改进yolov5-neck.\n\n66. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-Star.yaml\n\n    使用[StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main)中的StarBlock改进C3.\n\n67. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-Star-CAA.yaml\n\n    使用[StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main)中的StarBlock和[CVPR2024 PKINet](https://github.com/PKINet/PKINet)中的CAA改进C3.\n\n68. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-KAN.yaml\n\n    使用[Pytorch-Conv-KAN](https://github.com/IvanDrokin/torch-conv-kan)的KAN卷积算子改进C3.\n    目前支持:\n    1. FastKANConv2DLayer\n    2. KANConv2DLayer\n    3. KALNConv2DLayer\n    4. KACNConv2DLayer\n    5. KAGNConv2DLayer\n\n69. ultralytics/cfg/models/yolo-detr/yolov5-detr-ContextGuideFPN.yaml\n\n    Context Guide Fusion Module（CGFM）是一个创新的特征融合模块，旨在改进YOLOv8中的特征金字塔网络（FPN）。该模块的设计考虑了多尺度特征融合过程中上下文信息的引导和自适应调整。\n    1. 上下文信息的有效融合：通过SE注意力机制，模块能够在特征融合过程中捕捉并利用重要的上下文信息，从而增强特征表示的有效性，并有效引导模型学习检测目标的信息，从而提高模型的检测精度。\n    2. 特征增强：通过权重化的特征重组操作，模块能够增强重要特征，同时抑制不重要特征，提升特征图的判别能力。\n    3. 简单高效：模块结构相对简单，不会引入过多的计算开销，适合在实时目标检测任务中应用。\n    这期视频讲解在B站:https://www.bilibili.com/video/BV1Vx4y1n7hZ/\n\n70. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-DEConv.yaml\n\n    使用[DEA-Net](https://github.com/cecret3350/DEA-Net)中的detail-enhanced convolution改进C3.\n    关于DEConv在运行的时候重参数化后比重参数化前的计算量还要大的问题:是因为重参数化前thop库其计算不准的问题,看重参数化后的参数即可.\n\n71. ultralytics/cfg/models/yolo-detr/yolov5-detr-C3-SMPCGLU.yaml\n\n    Self-moving Point Convolutional GLU模型改进C3.\n    SMP来源于[CVPR2023-SMPConv](https://github.com/sangnekim/SMPConv),Convolutional GLU来源于[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n    1. 
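Ordinary convolutions may fail to capture effective features when faced with diverse and complex data, so we adopt SMPConv, whose adaptive point-moving mechanism captures local features better and improves the flexibility and accuracy of feature extraction.\n    2. CGLU is added after SMPConv. Convolutional GLU combines convolution with a gating mechanism, letting information pass through channels selectively, which improves the effectiveness and flexibility of feature extraction.\n\n# Changelog\n- **20231105-rtdetr-v1.0**\n    1. Initial release of the project.\n\n- **20231109-rtdetr-v1.1**\n    1. Fixed a bug that prevented resuming training from a checkpoint.\n    2. Optimized the model import method in get_FPS.py.\n    3. Added support for replacing the head of the yolov5 and yolov8 baseline models with the RTDETR head; improvements for yolov5-detr and yolov8-detr will follow.\n    4. Added Baidu Cloud videos: the 20231109 update walkthrough and the backbone-replacement walkthrough.\n    5. Added GhostHGNetV2 and RepHGNetV2; see the RT-DETR improvement schemes in the usage tutorial for details.\n    6. Added the Dilation-wise Residual (DWR) module from DWRSeg, which strengthens feature extraction from the expandable receptive fields in the network's higher layers; see the RT-DETR improvement schemes in the usage tutorial for details.\n\n- **20231119-rtdetr-v1.2**\n    1. Added DCNV2, DCNV3, and DCNV2-Dynamic, applied to several baseline models (RTDETR-R18, RTDETR-R50, YOLOV5-Detr, YOLOV8-Detr); see the RT-DETR improvement schemes in the usage tutorial for details.\n    2. Used the orthogonal channel attention from CVPR2022-OrthoNets to improve BasicBlock in resnet18-backbone, BottleNeck in resnet50-backbone, yolov8-C2f, and yolov5-C3; see the RT-DETR improvement schemes in the usage tutorial for details.\n    3. Used LearnedPositionalEncoding to improve positional-encoding generation in AIFI; see the RT-DETR improvement schemes in the usage tutorial for details.\n    4. Added the iRMB module from the EMO model and applied CascadedAttention from EfficientViT-CVPR2023 to it as a secondary innovation, yielding iRMB_Cascaded; see the RT-DETR improvement schemes in the usage tutorial for details.\n    5. Added the 1119 update walkthrough and a step-by-step video tutorial on adding attention mechanisms to the Baidu Cloud videos.\n    6. Updated the usage tutorial.\n\n- **20231126-rtdetr-v1.3**\n    1. Added support for IoU, GIoU, DIoU, CIoU, EIoU, SIoU.\n    2. Added support for MPDIoU, Inner-IoU, Inner-MPDIoU.\n    3. Added support for Normalized Gaussian Wasserstein Distance.\n    4. Added support for the P2 small-object detection layer.\n    5. Added support for DySnakeConv.\n    6. Added Pconv and PConv-Rep (secondary innovation) to optimize rtdetr-r18 and rtdetr-r50.\n    7. Added Faster-Block, Faster-Block-Rep (secondary innovation), Faster-Block-EMA (secondary innovation), and Faster-Block-Rep-EMA (secondary innovation) to optimize rtdetr-r18, rtdetr-r50, yolov5-detr, and yolov8-detr.\n    8. Updated the usage tutorial.\n    9. Added the 1126 update walkthrough to the Baidu Cloud videos.\n\n- **20231202-rtdetr-v1.4**\n    1. Added support for AKConv (a convolution kernel with arbitrary sampling shapes and an arbitrary number of parameters).\n    2. Added support for RFAConv, RFCAConv, RFCBAMConv (receptive-field attention convolutions).\n    3. Added support for UniRepLKNet (the direct successor to the large-kernel CNN RepLK).\n    4. Used CVPR2022 DAttention to improve AIFI.\n    5. Updated the usage tutorial.\n    6. Added the 1202 update walkthrough to the Baidu Cloud videos.\n    7. Fixed best.pt failing to save properly when metrics become NaN during training.\n\n- **20231210-rtdetr-v1.5**\n    1. Added support for the re-parameterized Conv3XC module from Swift Parameter-free Attention Network.\n    2. Added support for DilatedReparamBlock from UniRepLKNet.\n    3. 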
支持UniRepLKNet中的DilatedReparamBlock对DWRSeg中的Dilation-wise Residual(DWR)模块进行二次创新的DWR_DRB.\n    4. 使用ICCV2023 FLatten Transformer中的FocusedLinearAttention改进AIFI.\n    5. 更新使用教程.\n    6. 百度云视频增加1210更新说明.\n\n- **20231214-rtdetr-v1.6**\n    1. 支持DiverseBranchBlock.\n    2. 利用DualConv打造CSP Efficient Dual Layer Aggregation Networks(仅支持yolov5-detr和yolov8-detr).\n    3. 使用Swift Parameter-free Attention Network中的重参数化Conv3XC和DiverseBranchBlock改进RepC3.\n    4. 支持最新的ASF-YOLO中的Attentional Scale Sequence Fusion.\n    5. 更新使用教程.\n    6. 百度云视频增加1214更新说明.\n\n- **20231223-rtdetr-v1.7**\n    1. 增加rtdetr-r18-asf-p2.yaml,使用ASF-YOLO中的Attentional Scale Sequence Fusion与Small Object Detection Head进行二次创新.\n    2. 新增rtdetr-slimneck.yaml和rtdetr-slimneck-ASF.yaml.\n    3. 新增yolov8-detr-slimneck.yaml,yolov8-detr-slimneck-asf.yaml.\n    4. 新增yolov5-detr-slimneck.yaml,yolov5-detr-slimneck-asf.yaml.\n    5. 修正热力图计算中预处理.\n    6. 更新使用教程.\n    7. 百度云视频增加1223更新说明.\n\n- **20240106-rtdetr-v1.8**\n    1. 新增Shape-IoU,Inner-Shape-IoU.\n    2. 新增支持TransNeXt主干和TransNeXt中的聚焦感知注意力机制.\n    3. 新增U-NetV2中的Semantics and Detail Infusion Module对RTDETR的CCFM进行创新.\n    4. ASF系列支持attention_add.\n    5. 更新使用教程.\n    6. 百度云视频增加20240106更新说明.\n\n- **20240113-rtdetr-v1.9**\n    1. 支持Wise-IoU(v1,v2,v3)系列(IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU).\n    2. 支持Inner-Wise-IoU(v1,v2,v3)系列(IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU).\n    3. 支持SlideLoss,EMASlideLoss(利用Exponential Moving Average优化mean iou,可当自研创新模块).\n    4. 使用华为2023最新GOLD-YOLO中的Gatherand-Distribute进行改进特征融合模块.\n    5. 使用ASF-YOLO中Attentional Scale Sequence Fusion与GOLD-YOLO中的Gatherand-Distribute进行二次创新结合.\n    6. 修正rtdetr-r34中检测头参数错误的问题,增加rtdetr-r34,rtdetr-r50-m的预训练权重.\n    7. 更新使用教程.\n    8. 百度云视频增加20240113更新说明.\n\n- **20240120-rtdetr-v1.10**\n    1. 新增DCNV4.\n    2. 使用[LITv2](https://github.com/ziplab/LITv2)中具有提取高低频信息的高效注意力对AIFI进行二次改进.\n    3. 使用[MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR)中的HS-FPN改进RTDETR中的CCFM和YOLOV5-DETR、YOLOV8-DETR中的Neck.\n    4. 
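Applied a secondary innovation to HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR) to obtain HSPAN, which improves the CCFM in RTDETR and the Neck in YOLOV5-DETR and YOLOV8-DETR.\n    5. Fixed a bug when resuming training from a checkpoint without wiou.\n    6. Fixed garbled characters in the result plots drawn by plot_result.py.\n    7. Updated the usage tutorial.\n    8. Added the 20240120 update walkthrough to the Baidu Cloud videos.\n\n- **20240128-rtdetr-v1.11**\n    1. Added the lightweight CARAFE upsampling operator.\n    2. Added the DySample (ICCV2023) dynamic upsampling operator.\n    3. Added the Haar wavelet downsampling operator.\n    4. Added Focaler-IoU, Focaler-GIoU, Focaler-DIoU, Focaler-CIoU, Focaler-EIoU, Focaler-SIoU, Focaler-Shape-IoU, Focaler-MPDIoU.\n    5. Added Focaler-Wise-IoU (v1, v2, v3) (IoU, WIoU, EIoU, GIoU, DIoU, CIoU, SIoU, MPDIoU, ShapeIoU).\n    6. Applied the DySample (ICCV2023) dynamic upsampling operator as a secondary innovation on Attentional Scale Sequence Fusion from ASF-YOLO.\n    7. Updated the usage tutorial.\n    8. Added the 20240128 update walkthrough to the Baidu Cloud videos.\n\n- **20240206-rtdetr-v1.12**\n    1. Added Shift-ConvNets related improvements (rtdetr-SWC.yaml, rtdetr-R50-SWC.yaml, yolov8-detr-C2f-SWC.yaml, yolov5-detr-C3-SWC.yaml).\n    2. Applied DilatedReparamBlock from UniRepLKNet as a secondary innovation on iRMB from EMO.\n    3. Applied the shift-operation convolution from Shift-ConvNets as a secondary innovation on iRMB from EMO.\n    4. Updated the usage tutorial.\n    5. Added the 20240206 update walkthrough to the Baidu Cloud videos.\n\n- **20240219-rtdetr-v1.13**\n    1. Used the recent Mamba architecture (billed as a new architecture that surpasses Transformer) to improve rtdetr-r18, rtdetr-r50, yolov5-detr, and yolov8-detr.\n    2. Added the Powerful-IoU, Powerful-IoUV2, Inner-Powerful-IoU, Inner-Powerful-IoUV2, Focaler-Powerful-IoU, Focaler-Powerful-IoUV2, Wise-Powerful-IoU (v1, v2, v3), and Wise-Powerful-IoUV2 (v1, v2, v3) series.\n    3. Updated the heatmap script; for usage, see the most recently released yolov5v7-gradcam video.\n    4. Updated the COCO script to output additional metrics.\n    5. Updated the usage tutorial.\n    6. Added the 20240219 update walkthrough to the Baidu Cloud videos.\n\n- **20240225-rtdetr-v1.14**\n    1. Added the RepNCSPELAN module from YOLOV9.\n    2. Applied DBB, OREPA, DilatedReparamBlock, and Conv3XC as secondary innovations on the RepNCSPELAN module from YOLOV9.\n    3. Updated the usage tutorial.\n    4. Added the 20240225 update walkthrough to the Baidu Cloud videos.\n\n- **20240302-rtdetr-v1.15**\n    1. Added the Light-weight Context Guided and Light-weight Context Guided DownSample modules from CGNet.\n    2. Added BIFPN to the Neck module, with further innovation that supports swapping in different blocks.\n    3. Customized SlideVarifocalLoss and EMASlideVarifocalLoss for RTDETR.\n    4. Updated the usage tutorial.\n    5. Added the 20240302 update walkthrough to the Baidu Cloud videos.\n\n- **20240307-rtdetr-v1.16**\n    1. 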
新增自研Neck结构Parallel Atrous Convolution Attention Pyramid Network, PAC-APN.附带模块内结构图\n    2. 复现Lightweight Object Detection中的Dynamic Group Convolution Shuffle Transformer.\n    3. 更新使用教程.\n    4. 百度云视频增加20240307更新说明.\n\n- **20240321-rtdetr-v1.17**\n    1. 新增CVPR2024-RMT主干,并支持RetBlock改进RepC3.\n    2. 新增2024年新出的Efficient Local Attention,并用其对HSFPN进行二次创新.\n    3. 使用CVPR2021-CoordAttention对HSFPN进行二次创新.\n    4. 更新使用教程,增加多个常见疑问解答.\n    5. 百度云视频增加20240321更新说明.\n\n- **20240404-rtdetr-v1.18**\n    1. 新增CVPR2024 PKINet主干.\n    2. 新增CVPR2024 PKINet中的PKIModule和CAA模块,提出C2f-PKI.\n    3. 使用CVPR2024 PKINet中的Context Anchor Attention改进RepNCSPELAN、HSFPN.\n    4. 新增CVPR2024 Frequency-Adaptive Dilated Convolution.\n    5. 增加有效感受野可视化脚本.\n    6. 更新使用教程\n    7. 百度云视频增加20240404更新说明.\n\n- **20240412-rtdetr-v1.19**\n    1. 新增自研Focusing Diffusion Pyramid Network.\n    2. 新增HCFNet针对小目标分割的Parallelized Patch-Aware Attention Module改进C2f.\n    3. 新增HCFNet针对小目标分割的Dimension-Aware Selective Integration Module对自研Focusing Diffusion Pyramid Network再次进行创新.\n    4. 更新使用教程.\n    5. 百度云视频增加20240412更新说明.\n\n- **20240427-rtdetr-v1.20**\n    1. 新增mobilenetv4-backbone.\n    2. 新增A Robust Feature Downsampling Module for Remote Sensing Visual Tasks中的下采样.\n    3. 新增Context and Spatial Feature Calibration for Real-Time Semantic Segmentation中的Context and Spatial Feature Calibration.\n    4. 更新使用教程.\n    5. 百度云视频增加20240427更新说明.\n\n- **20240502-rtdetr-v1.21**\n    1. 新增支持content-guided attention fusion改进rtdetr-neck.\n    2. 新增支持使用CAFM对CGAFusion进行二次改进,得到CAFMFusion改进rtdetr-neck.\n    3. get_FPS.py脚本新增可以通过yaml测试推理速度.\n    4. 新增自研RGCSPELAN,其比C3、ELAN、C2f、RepNCSPELAN更低参数量和计算量更快推理速度.\n    5. 更新使用教程.\n    6. 百度云视频增加20240502更新说明.\n\n- **20240518-rtdetr-v1.22**\n    1. 新增CVPR2024-StarNet-Backbone以及其衍生的改进(C3-Star、C3-Star-CAA、C2f-Star、C2f-Star-CAA、BasicBlock_Star、BottleNeck_Star).\n    2. 
使用CVPR2024-TransNext中的Convolutional GLU对CVPR2023-FasterBlock进行二次创新(C3_Faster_CGLU, C2f_Faster_CGLU, BasicBlock_Faster_Block_CGLU, BottleNeck_Faster_Block_CGLU).\n    3. 新增PSFusion中的superficial detail fusion module、profound semantic fusion module.\n    4. 更新使用教程.\n    5. 百度云视频增加20240518更新说明.\n\n- **20240525-rtdetr-v1.23**\n    1. KAN In! Mamba Out!,集成pytorch-kan-conv，支持多种KAN变种！\n    2. 同步DCNV4-CVPR2024最新代码.\n    3. 更新使用教程.\n    4. 百度云视频增加20240525更新说明.\n\n- **20240608-rtdetr-v1.24**\n    1. 新增自研ContextGuideFPN.\n    2. 新增detail-enhanced convolution改进RTDETR.\n    3. 新增自研SMPCGLU，里面的模块分别来自CVPR2023和CVPR2024.\n    4. 更新使用教程.\n    5. 百度云视频增加20240608更新说明.\n\n- **20240618-rtdetr-v1.25**\n    1. 新增支持物理传热启发的视觉表征模型vHeat中的vHeatBlock.\n    2. 新增自研重校准特征金字塔网络(Re-CalibrationFPN),推出多个版本(P2345,P345,P3456).\n    3. 新增WaveletPool改进上采样和下采样.\n    4. 更新使用教程.\n    5. 百度云视频增加20240618更新说明.\n\n- **20240622-rtdetr-v1.26**\n    1. 新增RtDetr-Mamba.\n    2. 新增GLSA改进rtdetr-neck.\n    3. 新增GLSA对BIFPN进行二次创新.\n    4. 更新使用教程.\n    5. 百度云视频增加20240622更新说明.\n\n- **20240703-rtdetr-v1.27**\n    1. 新增UCTransNet中的ChannelTransformer改进rtdetr-neck.\n    2. 新增自研SmallObjectEnhancePyramid.\n    3. 新增SwiftFormer的EfficientAdditiveAttention改进AIFI.\n    4. 更新使用教程.\n    5. 百度云视频增加20240703更新说明.\n\n- **20240715-rtdetr-v1.28**\n    1. 新增自研Context-Guided Spatial Feature Reconstruction Feature Pyramid Network.\n    2. 新增Wavelet Convolutions for Large Receptive Fields中的WTConv改进BasicBlock.\n    3. 新增UBRFC-Net中的Adaptive Fine-Grained Channel Attention.\n    4. 更新使用教程.\n    5. 百度云视频增加20240715更新说明.\n\n- **20240725-rtdetr-v1.29**\n    1. 新增ECCV2024-SMFANet中的Feature Modulation block.\n    2. 新增Rethinking Performance Gains in Image Dehazing Networks中的gConvblock.\n    3. 更新使用教程.\n    4. 百度云视频增加20240725更新说明.\n\n- **20240802-rtdetr-v1.30**\n    1. 新增LDConv.\n    2. 新增MAF-YOLO中的MAFPN，并利用BIFPN的思想对MAFPN进行二次创新得到BIMAFPN.\n    3. 更新使用教程.\n    4. 百度云视频增加20240802更新说明.\n\n- **20240815-rtdetr-v1.31**\n    1. 
新增YOLO-MIF中的WDBB、DeepDBB的重参数化模块.\n    2. 新增SLAB中的RepBN改进AIFI.\n    3. 更新使用教程.\n    4. 百度云视频增加20240815更新说明.\n\n- **20240825-rtdetr-v1.32**\n    1. 新增CAS-ViT中的AdditiveBlock和CSP思想改进backbone.\n    2. 新增CAS-ViT中的AdditiveBlock改进AIFI.\n    3. 新增自研Efficient Multi-Branch&Scale FPN.\n    4. 更新使用教程.\n    5. 百度云视频增加20240825更新说明.\n\n- **20240902-rtdetr-v1.33**\n    1. 新增CMTFUnet和TransNext的二次创新模块.\n    2. 新增自研CSP-Partial Multi-Scale Feature Aggregation.\n    3. 更新使用教程.\n    4. 百度云视频增加20240902更新说明.\n\n- **20240912-rtdetr-v1.34**\n    1. 新增Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images中的CFPT.\n    2. 新增ICLR2024中的MogaBlock.\n    3. 更新使用教程.\n    4. 百度云视频增加20240912更新说明.\n\n- **20240926-rtdetr-v1.35**\n    1. 新增CVPR2024-SHViT中的SHSABlock和其的二次创新.\n    2. 新增BIBM2024-SMAFormer中的SMAFormerBlock和其的二次创新.\n    3. 新增TPAMI2024-FreqFusion中的FreqFusion改进Neck.\n    4. 新增自研MutilBackBone-DynamicAlignFusion.\n    5. 更新使用教程.\n    6. 百度云视频增加20240926更新说明.\n\n- **20241020-rtdetr-v1.36**\n    1. 新增Histoformer ECCV2024中的Dynamic-range Histogram Self-Attention改进AIFI.\n    2. 新增自研CSP-MutilScaleEdgeInformationEnhance.\n    3. 新增Efficient Frequency-Domain Image Deraining with Contrastive Regularization ECCV2024中的Fused_Fourier_Conv_Mixer与CSP思想结合改进rtdetr-backbone.\n    4. 更新使用教程.\n    5. 百度云视频增加20241020更新说明.\n\n- **20241106-rtdetr-v1.37**\n    1. 新增自研CSP-FreqSpatial.\n    2. 新增SFHformer ECCV2024中的block与CSP思想结合改进 rtdetr-backbone.\n    3. 新增Revitalizing Convolutional Network for Image Restoration TPAMI2024中的MSM与CSP思想结合改进rtdetr-backbone.\n    4. 更新使用教程.\n    5. 百度云视频增加20241106更新说明.\n\n- **20241118-rtdetr-v1.38**\n    1. 基于自研CSP-MutilScaleEdgeInformationEnhance再次创新得到CSP-MutilScaleEdgeInformationSelect.\n    2. 新增Pattern Recognition 2024|DRANet中的HDRAB和RAB模块与CSP思想结合改进rtdetr-backbone.\n    3. 新增ECCV2022-ELAN中的Local feature extraction改进RepC3.\n    4. 更新使用教程.\n    5. 百度云视频增加20241118更新说明.\n\n- **20241130-rtdetr-v1.39**\n    1. 新增自研GlobalEdgeInformationTransfer.\n    2. 
新增FreqFormer的Frequency-aware Cascade Attention与CSP结合改进backbone.\n    3. 更新使用教程.\n    4. 百度云视频增加20241130更新说明.\n\n- **20241215-rtdetr-v1.40**\n    1. 新增CrossFormer中的DynamicPosBias-Attention改进AIFI.\n    2. 新增CAMixerSR中的CAMixer与CSP结合改进backbone.\n    3. 修改保存模型规则,原本为fp16变成fp32,详细请看本期更新视频.\n    4. 百度云视频增加20241215更新说明.\n\n- **20241216-rtdetr-v1.41**\n    1. 新增Hyper-YOLO中的Hypergraph Computation in Semantic Space和Mixed Aggregation Network改进rtdetr.\n    2. 修复已知bug.\n    3. 更新使用教程.\n    4. 百度云视频增加20241216更新说明.\n\n- **20241228-rtdetr-v1.42**\n    1. 新增基于Hyper-YOLO中的Mixed Aggregation Network三个二次改进系列.\n    2. 新增使用MSA^2 Net中的Multi-Scale Adaptive Spatial Attention Gate改进rtdetr-neck.\n    3. 新增使用MSA^2 Net中的Multi-Scale Adaptive Spatial Attention Gate改进自研系列的MutilBackbone.\n    4. 更新使用教程.\n    5. 百度云视频增加20241228更新说明.\n\n- **20250111-rtdetr-v1.43**\n    1. 新增CRAFT-SR中的high-frequency enhancement residual block与CSP结合改进backbone.\n    2. 新增AAAI2025-TBSN中的DTAB改进backbone、AIFI.\n    3. 新增ECCV2024-FSEL中的多个模块改进rtdetr.\n    4. 新增ACMMM2024-WFEN中的多个模块改进rtdetr.\n    5. 更新使用教程.\n    6. 百度云视频增加20250111更新说明.\n\n- **20250119-rtdetr-v1.44**\n    1. 新增AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection中的Pinwheel-shaped Convolution类型改进.\n    2. 新增AAAI2025 ConDSeg中的ContrastDrivenFeatureAggregation与ACMMM2024 WFEN中的小波变换进行创新.\n    3. 更新使用教程.\n    4. 百度云视频增加20250119更新说明.\n\n- **20250204-rtdetr-v1.45**\n    1. 新增ELGC-Net的改进及其二次创新.\n    2. 新增ICLR2025 PolaFormer中的PolaAttention改进AIFI.\n    3. 新增遥感目标检测Strip R-CNN中的StripBlock及其二次创新.\n    4. 新增BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation中的Frequency-Spatial Attention和Multi-scale Progressive Channel Attention.\n    5. 更新使用教程.\n    6. 百度云视频增加20250204更新说明.\n\n- **20250206-rtdetr-v1.46**\n    1. 新增ICLR2025 Kolmogorov-Arnold Transformer中的KAT及其配合FasterBlock的二次创新.<此模块需要编译>\n    2. 更新使用教程.\n    3. 百度云视频增加20250206更新说明.\n\n- **20250216-rtdetr-v1.47**\n    1. 
新增自研模块DynamicInceptionDWConv2d.\n    2. 新增GlobalFilter和DynamicFilter.\n    3. 更新使用教程.\n    4. 百度云视频增加20250216更新说明.\n\n- **20250303-rtdetr-v1.48**\n    1. 新增自研模块Hierarchical Attention Fusion并提供多种使用方式.\n    2. 新增ICLR2025-Token Statistics Transformer中的TSSA改进AIFI.\n    3. 新增MHAF-YOLO中的RepHMS.<这个是YOLO群内的一个博士新作品>\n    4. 更新使用教程.\n    5. 百度云视频增加20250303更新说明.\n\n- **20250315-rtdetr-v1.49**\n    1. 新增CVPR2024-Adaptive Sparse Transformer的模块改进aifi.\n    2. 新增CVPR2025-MambaIR的模块.\n    3. 新增CVPR2025-SCSegamba中的模块.\n    4. 新增CVPR2025-MambaOut中的模块.\n    5. 新增CVPR2025-DEIM MAL损失函数.\n    6. 更新使用教程.\n    7. 百度云视频增加20250315更新说明.\n\n- **20250403-rtdetr-v1.50**\n    1. 新增CVPR2025-MambaOut与CVPR2024-UniRepLKNet二次创新后的模块.\n    2. 新增CVPR2025-EfficientViM和其与CVPR2024-TransNeXt的二次创新后的模块.\n    3. 新增CVPR2024-EMCAD中的EUCB.\n    4. 新增CVPR2025-BHViT中的ShiftChannelMix和CVPR2024-EMCAD中的EUCB二次创新模块.\n    5. 新增rtdetr-EMBSFPN.yaml方案上引入[CVPR2025 BHViT](https://github.com/IMRL/BHViT)中的ShiftChannelMix.\n    6. 新增CVPR2025-HVI中的Intensity Enhancement Layer.\n    7. 新增CVPR2025-OverLock中的模块.\n    8. 更新使用教程.\n    9. 百度云视频增加20250403更新说明.\n\n- **20250420-rtdetr-v1.51**\n    1. 新增ICLR2024-FTIC中的多个模块、以及其与ICLR2025-PolaFormer的二次创新模块.\n    2. 新增CVPR2024-DCMPNet中的多个模块.\n    3. 新增ICLR2025-PolaFormer与CVPR2024-TransNext的二次创新模块.\n    4. 新增CVPR2025-OverLock中的GDSAFusion.\n    5. 新增统计配置文件的计算量和参数量并排序的脚本.\n    6. 更新使用教程.\n    7. 百度云视频增加20250420更新说明.\n\n- **20250508-rtdetr-v1.52**\n    1. 新增CVPR2025-MobileMamba的相关改进.\n    2. 新增LEGNet中的LFEModule和LoGStem改进.\n    3. 新增WACV2025-SEMNet中的Snake Bi-Directional Sequence Modelling (SBSM)和Spatially-Enhanced Feedforward Network (SEFN)的多个改进，并含有二次创新相关内容.\n    4. 新增CVPR2025-LSNet中的多个改进，并含有二次创新相关内容.\n    5. 新增CVPR2025-DynamicTan中的多个改进，并含有二次创新相关内容.\n    6. 更新使用教程.\n    7. 百度云视频增加20250508更新说明.\n\n- **20250523-rtdetr-v1.53**\n    1. 新增TransMamba中的多个改进.\n    2. 新增CVPR2025-EVSSM中的多个改进.\n    3. 新增CVPR2025-DarkIR中的多个改进.\n    4. 更新使用教程.\n    5. 百度云视频增加20250523更新说明.\n\n- **20250606-rtdetr-v1.54**\n    1. 
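Added improvements based on CVPR2025-FDConv, along with several secondary-innovation modules.\n    2. Added improvements based on DSA: Deformable Spatial Attention, along with several secondary-innovation modules.\n    3. Added the Residual Mamba Block from CVPR2025-MaIR.\n    4. Updated the usage tutorial.\n    5. Added the 20250606 update walkthrough to the Baidu Cloud videos.\n\n- **20250622-rtdetr-v1.55**\n    1. Added modules from ECCV2024-rethinkingfpn and used them to further innovate on the original SOEP improvement.\n    2. Added improvements based on CVPR2024-SFSConv, along with several secondary-innovation modules.\n    3. Added modules from CVPR2025-GroupMamba.\n    4. Added modules from CVPR2025-MambaVision.\n    5. Added modules from AAAI2025-FBRTYOLO.\n    6. Updated the usage tutorial.\n    7. Added the 20250622 update walkthrough to the Baidu Cloud videos.\n    8. Fixed model loading failures on torch 2.6.0 and later.\n\n- **20250711-rtdetr-v1.56**\n    1. Added Pyramid Sparse Transformer to improve rtdetr-neck.\n    2. Added a further innovation on SOEP using Pyramid Sparse Transformer.\n    3. Added weightedConvolution2.0.\n    4. Added MIA2025-FourierConv.\n    5. Added HS-FPN from AAAI2025.\n    6. Updated the usage tutorial.\n    7. Added the 20250711 update walkthrough to the Baidu Cloud videos.\n\n- **20250727-rtdetr-v1.57**\n    1. Added modules from ICCV2025-ESC.\n    2. Added modules from ICCV2025-MobileIE.\n    3. Added modules from ICCV2025-VSSD.\n    4. Added modules from ICCV2025-TinyVIM.\n    5. Added MSLA.\n    6. Added modules from INFFUS2025-SAMamba.\n    7. Added modules from TGRS2025-UMFormer.\n    8. Updated the usage tutorial.\n    9. Added the 20250727 update walkthrough to the Baidu Cloud videos.\n\n- **20250815-rtdetr-v1.58**\n    1. Added several improvements based on EPGO from CPRAformer.\n    2. Added improvements based on ConvAttn from ICCV2025-ESC.\n    3. Updated the usage tutorial.\n    4. Added the 20250815 update walkthrough to the Baidu Cloud videos.\n\n- **20250829-rtdetr-v1.59**\n    1. Added modules from ICCV2025-UniConvBlock.\n    2. Added modules from ICCV2025-ConverseBNet.\n    3. Added modules from ACM MM 2025-Mobile U-ViT.\n    4. Updated the usage tutorial.\n    5. Added the 20250829 update walkthrough to the Baidu Cloud videos.\n\n- **20250914-rtdetr-v1.60**\n    1. Added the CVPR2025-GCConv module.\n    2. Added the AAAI2024-CFBlock module.\n    3. Added the RepStem module from ICCV2023-FastViT.\n    4. Updated the usage tutorial.\n    5. Added the 20250914 update walkthrough to the Baidu Cloud videos.\n\n- **20251008-rtdetr-v1.61**\n    1. Added modules from IJCV2024-SRConvNet.\n    2. Added modules from LWGANet.\n    3. Updated the usage tutorial.\n    4. Added the 20251008 update walkthrough to the Baidu Cloud videos.\n\n- **20251028-rtdetr-v1.62**\n    1. Added modules from TGRS2025-ASCNet.\n    2. Added the ICCV2025-HFRB module.\n    3. Added modules from ICIP2025-BEVANET.\n    4. Added modules from TPAMI2025-LRFormer.\n    5. Added modules from ICCV2025-Rectifying Magnitude Neglect in Linear Attention.\n    6. Updated the usage tutorial.\n    7. Added the 20251028 update walkthrough to the Baidu Cloud videos.\n\n- **20251122-rtdetr-v1.63**\n    1. 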
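Added GRSL2025-Gaussian Combined Distance; see LOSS改进系列.md for details.\n    2. Added modules from ACCV2024-PlainUSR.\n    3. Updated the usage tutorial.\n    4. Added the 20251122 update walkthrough to the Baidu Cloud videos.\n\n- **20251219-rtdetr-v1.64**\n    1. Added the LCA module from CVPR2025-HVI.\n    2. Added the TIP2025-SFMB module.\n    3. Added the HFFE module from TGRS2025-HAFNet.\n    4. Updated the usage tutorial.\n    5. Added the 20251219 update walkthrough to the Baidu Cloud videos.\n\n- **20260114-rtdetr-v1.65**\n    1. Added the MoE module from YOLO-Master.\n    2. Added modules from ACMMM2025-FlickCD.\n    3. Updated the usage tutorial.\n    4. Added the 20260114 update walkthrough to the Baidu Cloud videos.\n\n- **20260203-rtdetr-v1.66**\n    1. Added modules from TGRS2025-Think Locally and Act Globally.\n    2. Added several modules from TGRS2025-ISGLNet.\n    3. Added modules from TGRS2025-MASFNet.\n    4. Updated the usage tutorial.\n    5. Added the 20260203 update walkthrough to the Baidu Cloud videos.\n\n- **20260224-rtdetr-v1.67**\n    1. Added modules from MICCAI2023-SHISRCNet.\n    2. Added modules from AAAI2026-Partial Channel Network.\n    3. Added modules from TGRS2025-DRPCANet.\n    4. Added modules from TGRS2025-ISGLNet.\n    5. Added modules from TGRS2025-HDNet.\n    6. Updated the usage tutorial.\n    7. Added the 20260223 update walkthrough to the Baidu Cloud videos.\n\n- **20260307-rtdetr-v1.68**\n    1. Added mAP75 output during training.\n    2. Improved the feature-map saving mechanism in detect.py so that it can save each channel's feature map individually as well as the feature map summed over all channels.\n\n- **20260321-rtdetr-v1.69**\n    1. Added the AAAI2026-SPJFBlock module.\n    2. Added the GLSS2D module from TGRS2025-GLVMamba.\n    3. Added the CAFM module from TIP2025-DSMT.\n    4. Added the DWMMSA module from TGRS2025-USTNet.\n    5. Added the DEGConv module from CVPR2026-MixerCSeg.\n    6. Added modules from CVPR2026-BinaryAttention.\n    7. Added the CVPR2026-TransMixer module.\n    8. Added the WCA module from CVPR2025-Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection.\n    9. Updated the usage tutorial.\n    10. Added the 20260321 update walkthrough to the Baidu Cloud videos.\n    11. Fixed some broken links."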
  },
  {
    "path": "yolo-improve/ultralytics-yolo/get_COCO_metrice.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nimport argparse\nfrom pycocotools.coco import COCO\nfrom pycocotools.cocoeval import COCOeval\nfrom tidecv import TIDE, datasets\n\n# If the COCO metrics fail to generate, see this video for troubleshooting: https://www.bilibili.com/video/BV1SdNizEE4X/\n# If you hit a missing 'info' key error, install pycocotools==2.0.8\n\ndef parse_opt():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--anno_json', type=str, default='data.json', help='label coco json path') # COCO-format JSON label file of the dataset\n    parser.add_argument('--pred_json', type=str, default='', help='pred coco json path') # COCO-format JSON file of the model predictions\n    \n    return parser.parse_known_args()[0]\n\nif __name__ == '__main__':\n    opt = parse_opt()\n    anno_json = opt.anno_json\n    pred_json = opt.pred_json\n    \n    anno = COCO(anno_json)  # init annotations api\n    pred = anno.loadRes(pred_json)  # init predictions api\n    eval = COCOeval(anno, pred, 'bbox')\n    eval.evaluate()\n    eval.accumulate()\n    eval.summarize()\n\n    tide = TIDE()\n    tide.evaluate_range(datasets.COCO(anno_json), datasets.COCOResult(pred_json), mode=TIDE.BOX)\n    tide.summarize()\n    tide.plot(out_dir='tide_result')"
  },
  {
    "path": "yolo-improve/ultralytics-yolo/heatmap.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nwarnings.simplefilter('ignore')\nimport torch, yaml, cv2, os, shutil, sys, copy\ntorch.autograd.set_detect_anomaly(True)\nimport numpy as np\nnp.random.seed(0)\nimport matplotlib.pyplot as plt\nfrom tqdm import trange\nfrom PIL import Image\nfrom ultralytics import YOLO\nfrom ultralytics.nn.modules.head import Pose, Pose26\nfrom ultralytics.utils.nms import non_max_suppression\nfrom ultralytics.utils import LOGGER\nfrom pytorch_grad_cam import GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM, KPCA_CAM, AblationCAM\nfrom pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image\nfrom pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients\n\nRED, GREEN, BLUE, YELLOW, ORANGE, CYAN, MAGENTA, BOLD, RESET = \"\\033[91m\", \"\\033[92m\", \"\\033[94m\", \"\\033[93m\", \"\\033[38;5;208m\", \"\\033[96m\", \"\\033[95m\", \"\\033[1m\", \"\\033[0m\"\n\ndef patch_pose_classes_for_gradcam():\n    \"\"\"修复 Pose 和 Pose26 类使其兼容 Grad-CAM，移除 inplace 操作\"\"\"\n    \n    # 修复 Pose 类\n    def pose_kpts_decode_no_inplace(self, kpts: torch.Tensor) -> torch.Tensor:\n        \"\"\"Decode keypoints from predictions (no inplace operations).\"\"\"\n        ndim = self.kpt_shape[1]\n        bs = kpts.shape[0]\n        if self.export:\n            y = kpts.view(bs, *self.kpt_shape, -1)\n            a = (y[:, :, :2] * 2.0 + (self.anchors - 0.5)) * self.strides\n            if ndim == 3:\n                a = torch.cat((a, y[:, :, 2:3].sigmoid()), 2)\n            return a.view(bs, self.nk, -1)\n        else:\n            y = kpts.clone()\n            if ndim == 3:\n                # 强制使用非 inplace 操作\n                y[:, 2::ndim] = y[:, 2::ndim].sigmoid()\n            y[:, 0::ndim] = (y[:, 0::ndim] * 2.0 + (self.anchors[0] - 0.5)) * self.strides\n            y[:, 1::ndim] = (y[:, 1::ndim] * 2.0 + (self.anchors[1] - 0.5)) * self.strides\n            return y\n   
 \n    # 修复 Pose26 类\n    def pose26_kpts_decode_no_inplace(self, kpts: torch.Tensor) -> torch.Tensor:\n        \"\"\"Decode keypoints from predictions (no inplace operations).\"\"\"\n        ndim = self.kpt_shape[1]\n        bs = kpts.shape[0]\n        if self.export:\n            y = kpts.view(bs, *self.kpt_shape, -1)\n            # NCNN fix\n            a = (y[:, :, :2] + self.anchors) * self.strides\n            if ndim == 3:\n                a = torch.cat((a, y[:, :, 2:3].sigmoid()), 2)\n            return a.view(bs, self.nk, -1)\n        else:\n            y = kpts.clone()\n            if ndim == 3:\n                # 强制使用非 inplace 操作\n                y[:, 2::ndim] = y[:, 2::ndim].sigmoid()\n            y[:, 0::ndim] = (y[:, 0::ndim] + self.anchors[0]) * self.strides\n            y[:, 1::ndim] = (y[:, 1::ndim] + self.anchors[1]) * self.strides\n            return y\n    \n    # 应用补丁\n    Pose.kpts_decode = pose_kpts_decode_no_inplace\n    Pose26.kpts_decode = pose26_kpts_decode_no_inplace\n\npatch_pose_classes_for_gradcam()\n\ndef letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):\n    # Resize and pad image while meeting stride-multiple constraints\n    shape = im.shape[:2]  # current shape [height, width]\n    if isinstance(new_shape, int):\n        new_shape = (new_shape, new_shape)\n\n    # Scale ratio (new / old)\n    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])\n    if not scaleup:  # only scale down, do not scale up (for better val mAP)\n        r = min(r, 1.0)\n\n    # Compute padding\n    ratio = r, r  # width, height ratios\n    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))\n    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding\n    if auto:  # minimum rectangle\n        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding\n    elif scaleFill:  # stretch\n        dw, dh = 0.0, 0.0\n        new_unpad = (new_shape[1], 
new_shape[0])\n        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios\n\n    dw /= 2  # divide padding into 2 sides\n    dh /= 2\n\n    if shape[::-1] != new_unpad:  # resize\n        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)\n    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))\n    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))\n    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border\n    return im, ratio, (top, bottom, left, right)\n\nclass ActivationsAndGradients:\n    \"\"\" Class for extracting activations and\n    registering gradients from targetted intermediate layers \"\"\"\n\n    def __init__(self, model, target_layers, reshape_transform):\n        self.model = model\n        self.gradients = []\n        self.activations = []\n        self.reshape_transform = reshape_transform\n        self.handles = []\n        for target_layer in target_layers:\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_activation))\n            # Because of https://github.com/pytorch/pytorch/issues/61519,\n            # we don't use backward hook to record gradients.\n            self.handles.append(\n                target_layer.register_forward_hook(self.save_gradient))\n\n    def save_activation(self, module, input, output):\n        activation = output\n\n        if self.reshape_transform is not None:\n            activation = self.reshape_transform(activation)\n        self.activations.append(activation.cpu().detach())\n\n    def save_gradient(self, module, input, output):\n        if not hasattr(output, \"requires_grad\") or not output.requires_grad:\n            # You can only register hooks on tensor requires grad.\n            return\n\n        # Gradients are computed in reverse order\n        def _store_grad(grad):\n            if self.reshape_transform is not None:\n                grad = 
self.reshape_transform(grad)\n            self.gradients = [grad.cpu().detach()] + self.gradients\n\n        output.register_hook(_store_grad)\n\n    def post_process(self, result):\n        if self.model.end2end:\n            logits_ = result[:, :, 4:]\n            boxes_ = result[:, :, :4]\n            sorted, indices = torch.sort(logits_[:, :, 0], descending=True)\n            return logits_[0][indices[0]], boxes_[0][indices[0]]\n        elif self.model.task == 'detect':\n            logits_ = result[:, 4:]\n            boxes_ = result[:, :4]\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]]\n        elif self.model.task == 'segment':\n            logits_ = result[0][0][:, 4:4 + self.model.nc]\n            boxes_ = result[0][0][:, :4]\n            mask_p, mask_nm = result[0][1].squeeze(), result[0][0][:, 4 + self.model.nc:].squeeze().transpose(1, 0)\n            c, h, w = mask_p.size()\n            mask = (mask_nm @ mask_p.view(c, -1))\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]], mask[indices[0]]\n        elif self.model.task == 'pose':\n            logits_ = result[:, 4:4 + self.model.nc]\n            boxes_ = result[:, :4]\n            poses_ = result[:, 4 + self.model.nc:]\n            sorted, indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(poses_[0], dim0=0, dim1=1)[indices[0]]\n        elif self.model.task == 'obb':\n            logits_ = result[:, 4:4 + self.model.nc]\n            boxes_ = result[:, :4]\n            angles_ = result[:, 4 + self.model.nc:]\n            sorted, 
indices = torch.sort(logits_.max(1)[0], descending=True)\n            return torch.transpose(logits_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(boxes_[0], dim0=0, dim1=1)[indices[0]], torch.transpose(angles_[0], dim0=0, dim1=1)[indices[0]]\n        elif self.model.task == 'classify':\n            return result[0]\n  \n    def __call__(self, x):\n        self.gradients = []\n        self.activations = []\n        model_output = self.model(x)\n        if self.model.task == 'detect':\n            post_result, pre_post_boxes = self.post_process(model_output[0])\n            return [[post_result, pre_post_boxes]]\n        elif self.model.task == 'segment':\n            post_result, pre_post_boxes, pre_post_mask = self.post_process(model_output)\n            return [[post_result, pre_post_boxes, pre_post_mask]]\n        elif self.model.task == 'pose':\n            post_result, pre_post_boxes, pre_post_pose = self.post_process(model_output[0])\n            return [[post_result, pre_post_boxes, pre_post_pose]]\n        elif self.model.task == 'obb':\n            post_result, pre_post_boxes, pre_post_angle = self.post_process(model_output[0])\n            return [[post_result, pre_post_boxes, pre_post_angle]]\n        elif self.model.task == 'classify':\n            data = self.post_process(model_output)\n            return [data]\n\n    def release(self):\n        for handle in self.handles:\n            handle.remove()\n\nclass yolo_detect_target(torch.nn.Module):\n    def __init__(self, ouput_type, conf, ratio, end2end) -> None:\n        super().__init__()\n        self.ouput_type = ouput_type\n        self.conf = conf\n        self.ratio = ratio\n        self.end2end = end2end\n\n    @staticmethod\n    def _accumulate(acc, value):\n        return value if acc is None else acc + value\n\n    @staticmethod\n    def _zero_scalar_like(tensor):\n        # Keep the zero target connected to autograd graph so Grad-CAM layers receive zero (not None) gradients.\n        
return tensor.sum() * 0.0\n    \n    def forward(self, data):\n        post_result, pre_post_boxes = data\n        acc = None\n        loop_count = min(int(post_result.size(0) * self.ratio), post_result.size(0))\n        for i in trange(loop_count):\n            if (self.end2end and float(post_result[i, 0]) < self.conf) or (not self.end2end and float(post_result[i].max()) < self.conf):\n                break\n            if self.ouput_type in (\"class\", \"all\"):\n                acc = self._accumulate(acc, post_result[i, 0] if self.end2end else post_result[i].max())\n            if self.ouput_type in (\"box\", \"all\"):\n                for j in range(4):\n                    acc = self._accumulate(acc, pre_post_boxes[i, j])\n        return acc if acc is not None else self._zero_scalar_like(post_result)\n\nclass yolo_segment_target(yolo_detect_target):\n    def __init__(self, ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        post_result, pre_post_boxes, pre_post_mask = data\n        acc = None\n        loop_count = min(int(post_result.size(0) * self.ratio), post_result.size(0))\n        for i in trange(loop_count):\n            if float(post_result[i].max()) < self.conf:\n                break\n            if self.ouput_type in (\"class\", \"all\"):\n                acc = self._accumulate(acc, post_result[i].max())\n            if self.ouput_type in (\"box\", \"all\"):\n                for j in range(4):\n                    acc = self._accumulate(acc, pre_post_boxes[i, j])\n            if self.ouput_type in (\"segment\", \"all\"):\n                acc = self._accumulate(acc, pre_post_mask[i].mean())\n        return acc if acc is not None else self._zero_scalar_like(post_result)\n\nclass yolo_pose_target(yolo_detect_target):\n    def __init__(self, ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, 
data):\n        post_result, pre_post_boxes, pre_post_pose = data\n        acc = None\n        loop_count = min(int(post_result.size(0) * self.ratio), post_result.size(0))\n        for i in trange(loop_count):\n            if float(post_result[i].max()) < self.conf:\n                break\n            if self.ouput_type in (\"class\", \"all\"):\n                acc = self._accumulate(acc, post_result[i].max())\n            if self.ouput_type in (\"box\", \"all\"):\n                for j in range(4):\n                    acc = self._accumulate(acc, pre_post_boxes[i, j])\n            if self.ouput_type in (\"pose\", \"all\"):\n                acc = self._accumulate(acc, pre_post_pose[i].mean())\n        return acc if acc is not None else self._zero_scalar_like(post_result)\n\nclass yolo_obb_target(yolo_detect_target):\n    def __init__(self, ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        post_result, pre_post_boxes, pre_post_angle = data\n        acc = None\n        loop_count = min(int(post_result.size(0) * self.ratio), post_result.size(0))\n        for i in trange(loop_count):\n            if float(post_result[i].max()) < self.conf:\n                break\n            if self.ouput_type in (\"class\", \"all\"):\n                acc = self._accumulate(acc, post_result[i].max())\n            if self.ouput_type in (\"box\", \"all\"):\n                for j in range(4):\n                    acc = self._accumulate(acc, pre_post_boxes[i, j])\n            if self.ouput_type in (\"obb\", \"all\"):\n                acc = self._accumulate(acc, pre_post_angle[i])\n        return acc if acc is not None else self._zero_scalar_like(post_result)\n\nclass yolo_classify_target(yolo_detect_target):\n    def __init__(self, ouput_type, conf, ratio, end2end):\n        super().__init__(ouput_type, conf, ratio, end2end)\n    \n    def forward(self, data):\n        return data.max()\n\nclass 
yolo_heatmap:\n    def __init__(self, weight, device, method, layer, backward_type, conf_threshold, ratio, show_result, renormalize, task, img_size, letterbox_auto):\n        device = torch.device(device)\n        model_yolo = YOLO(weight)\n        model_names = model_yolo.names\n        LOGGER.info(f'{ORANGE}model class info:{model_names}{RESET}')\n        model = copy.deepcopy(model_yolo.model)\n        model.to(device)\n        model.info()\n        for p in model.parameters():\n            p.requires_grad_(True)\n        model.eval()\n        \n        model.task = task\n        if not hasattr(model, 'end2end'):\n            model.end2end = False\n        if model.end2end:\n            model.end2end = False\n        \n        if task == 'detect':\n            target = yolo_detect_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'segment':\n            target = yolo_segment_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'pose':\n            target = yolo_pose_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'obb':\n            target = yolo_obb_target(backward_type, conf_threshold, ratio, model.end2end)\n        elif task == 'classify':\n            target = yolo_classify_target(backward_type, conf_threshold, ratio, model.end2end)\n        else:\n            raise Exception(f\"not support task({task}).\")\n        \n        target_layers = [model.model[l] for l in layer]\n        cam_methods = {\n            \"GradCAMPlusPlus\": GradCAMPlusPlus,\n            \"GradCAM\": GradCAM,\n            \"XGradCAM\": XGradCAM,\n            \"EigenCAM\": EigenCAM,\n            \"HiResCAM\": HiResCAM,\n            \"LayerCAM\": LayerCAM,\n            \"RandomCAM\": RandomCAM,\n            \"EigenGradCAM\": EigenGradCAM,\n            \"KPCA_CAM\": KPCA_CAM,\n            \"AblationCAM\": AblationCAM,\n        }\n        if method not in cam_methods:\n            raise 
ValueError(f\"Unsupported CAM method '{method}'. Available methods: {', '.join(cam_methods)}\")\n        method = cam_methods[method](model, target_layers)\n        method.activations_and_grads = ActivationsAndGradients(model, target_layers, None)\n        \n        colors = np.random.uniform(0, 255, size=(len(model_names), 3)).astype(np.int32)\n        self.__dict__.update(locals())\n    \n    def post_process(self, result):\n        result = non_max_suppression(result, conf_thres=self.conf_threshold, iou_thres=0.65)[0]\n        return result\n\n    def draw_detections(self, box, color, name, img):\n        xmin, ymin, xmax, ymax = list(map(int, list(box)))\n        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), tuple(int(x) for x in color), 2) # 绘制检测框\n        cv2.putText(img, str(name), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, tuple(int(x) for x in color), 2, lineType=cv2.LINE_AA)  # 绘制类别、置信度\n        return img\n\n    def renormalize_cam_in_bounding_boxes(self, boxes, image_float_np, grayscale_cam):\n        \"\"\"Normalize the CAM to be in the range [0, 1] \n        inside every bounding boxes, and zero outside of the bounding boxes. 
\"\"\"\n        renormalized_cam = np.zeros(grayscale_cam.shape, dtype=np.float32)\n        for x1, y1, x2, y2 in boxes:\n            x1, y1 = max(x1, 0), max(y1, 0)\n            x2, y2 = min(grayscale_cam.shape[1] - 1, x2), min(grayscale_cam.shape[0] - 1, y2)\n            renormalized_cam[y1:y2, x1:x2] = scale_cam_image(grayscale_cam[y1:y2, x1:x2].copy())    \n        renormalized_cam = scale_cam_image(renormalized_cam)\n        eigencam_image_renormalized = show_cam_on_image(image_float_np, renormalized_cam, use_rgb=True)\n        return eigencam_image_renormalized\n    \n    def process(self, img_path, save_path):\n        # img process\n        try:\n            img = cv2.imdecode(np.fromfile(img_path, np.uint8), cv2.IMREAD_COLOR)\n        except Exception:\n            LOGGER.error(f\"{RED}{img_path} read failure.{RESET}\")\n            return False\n        if img is None:\n            LOGGER.error(f\"{RED}{img_path} decode failure (not an image or corrupted file).{RESET}\")\n            return False\n        img, _, (top, bottom, left, right) = letterbox(img, new_shape=(self.img_size, self.img_size), auto=self.letterbox_auto)\n        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n        img = np.float32(img) / 255.0\n        tensor = torch.from_numpy(np.transpose(img, axes=[2, 0, 1])).unsqueeze(0).to(self.device)\n        LOGGER.info(f'{BOLD}{ORANGE}tensor size:{tensor.size()}{RESET}')\n        \n        try:\n            grayscale_cam = self.method(tensor, [self.target])\n        except AttributeError:\n            LOGGER.warning(f\"{CYAN}self.method(tensor, [self.target]) failure.{RESET}\")\n            return False\n        \n        grayscale_cam = grayscale_cam[0, :]\n        cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)\n        \n        pred = self.model_yolo.predict(tensor, conf=self.conf_threshold, iou=0.7, verbose=False)[0]\n        if self.renormalize and self.task in ['detect', 'segment', 'pose']:\n            cam_image = 
self.renormalize_cam_in_bounding_boxes(pred.boxes.xyxy.cpu().detach().numpy().astype(np.int32), img, grayscale_cam)\n        if self.show_result:\n            cam_image = pred.plot(img=cam_image,\n                                  conf=True, # 显示置信度\n                                  font_size=None, # 字体大小，None为根据当前image尺寸计算\n                                  line_width=None, # 线条宽度，None为根据当前image尺寸计算\n                                  labels=False, # 显示标签\n                                  )\n        \n        # 去掉padding边界\n        cam_image = cam_image[top:cam_image.shape[0] - bottom, left:cam_image.shape[1] - right]\n        cam_image = Image.fromarray(cam_image)\n        cam_image.save(save_path)\n        return True\n    \n    def __call__(self, img_path, save_path):\n        # remove dir if exist\n        if os.path.exists(save_path):\n            shutil.rmtree(save_path)\n        # make dir if not exist\n        os.makedirs(save_path, exist_ok=True)\n\n        if os.path.isdir(img_path):\n            success, failed = 0, 0\n            for img_path_ in os.listdir(img_path):\n                ok = self.process(f'{img_path}/{img_path_}', f'{save_path}/{img_path_}')\n                success += int(ok)\n                failed += int(not ok)\n            LOGGER.info(f\"{BOLD}{ORANGE}processed images: success={success}, failed={failed}{RESET}\")\n        else:\n            ok = self.process(img_path, f'{save_path}/result.png')\n            if not ok:\n                LOGGER.error(f\"{RED}failed to process input image: {img_path}{RESET}\")\n        \n        LOGGER.info(f'{BOLD}{MAGENTA}进度条不满是正常现象,只要进度条不是0,都可以进行出图.{RESET}')\n        \ndef get_params():\n    params = {\n        'weight': 'yolo26n.pt', # 现在只需要指定权重即可,不需要指定cfg\n        'device': 'cuda:0',\n        'method': 'GradCAMPlusPlus', # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM, KPCA_CAM\n        'layer': [16, 19, 22],\n        'backward_type': 'all', # 
detect:<class, box, all> segment:<class, box, segment, all> pose:<class, box, pose, all> obb:<class, box, obb, all> classify:<all>\n        'conf_threshold': 0.2, # 0.2\n        'ratio': 0.02, # 0.02-0.1\n        'show_result': True, # set to False if you do not want the results drawn\n        'renormalize': False, # set to True to confine the heatmap to the boxes (only effective for detect, segment, pose)\n        'task':'detect', # task (detect, segment, pose, obb, classify)\n        'img_size':640, # image size\n        'letterbox_auto': True # set to False to force equal width and height; some modified models require square input and raise an error otherwise\n    }\n    return params\n\n# pip install grad-cam==1.5.5 --no-deps\nif __name__ == '__main__':\n    model = yolo_heatmap(**get_params())\n    model(r'/root/dataset/coco/images/val2017/000000361238.jpg', 'heatmap_result')\n    # model(r'/root/dataset/coco/images/val2017', 'heatmap_result')\n    # model(r'/root/code/project/datasets/DOTAv1.5/images/test', 'heatmap_result')"
  },
  {
    "path": "yolo-improve/ultralytics-yolo/requirements.txt",
    "content": "PyYAML\ntensorboard\nscipy\nthop\ntransformers\neinops\nprettytable\nPyWavelets\npolars"
  },
  {
    "path": "yolo-improve/ultralytics-yolo/train.py",
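    "content": "import warnings, os, sys\nsys.path.append(os.path.dirname(os.path.abspath(__file__)))\nwarnings.filterwarnings('ignore')\nfrom ultralytics import YOLO\n\n# BILIBILI UP 魔傀面具\n# Official reference for the training arguments: https://docs.ultralytics.com/modes/train/#resuming-interrupted-trainings:~:text=a%20training%20run.-,Train%20Settings,-The%20training%20settings\n\nif __name__ == '__main__':\n    yaml_path = 'ultralytics/cfg/models/26/yolo26n.yaml'\n\n    # Initialize the YOLO model, building the network from the yaml config file\n    model = YOLO(yaml_path)\n    # model.load('yolo26n.pt') # load pretrained weights; generally not recommended\n    model.train(data='/root/dataset/dataset_visdrone/data.yaml', # path to the dataset config file\n                cache=False, # whether to cache images to speed up training. False = no caching, True = cache in RAM (memory-hungry; use with care on low-memory machines), 'disk' = cache on disk (uses disk space)\n                imgsz=640, # input image size (pixels)\n                epochs=300, # total number of training epochs\n                batch=16, # batch size\n                close_mosaic=0, # number of final epochs with Mosaic augmentation disabled. 0 keeps Mosaic on for the whole run\n                workers=4, # number of data-loading workers. On Windows, try 0 if you see stalls or odd errors\n                device='0', # training device. '0' is the first GPU, 'cpu' is the CPU, '0,1,2' is multi-GPU\n                optimizer='MuSGD' if 'yolo26' in yaml_path else 'SGD', # optimizer. YOLO26 uses the officially recommended MuSGD; other models use SGD\n                patience=50, # early-stopping patience. Training stops after 50 consecutive epochs without validation improvement. Set 0 to disable early stopping\n                # resume=True, # resume an interrupted run; requires loading the last.pt weights when initializing YOLO\n                amp=True, # whether to enable Automatic Mixed Precision training, default True | disable amp if the loss becomes nan\n                # fraction=0.2, # 0.2 trains on only 20% of the data\n                cos_lr=False, # whether to use the cosine annealing learning-rate scheduler, default False\n                save_period=-1, # save a checkpoint every N epochs (default -1 disables this; only best and last are saved)\n                project='train', # project directory where training results are saved\n                name='exp', # name of this experiment (if it already exists, exp2, exp3... are created automatically)\n                )"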
  },
  {
    "path": "yolo-improve/ultralytics-yolo/val.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nimport os\nimport numpy as np\nfrom prettytable import PrettyTable\nfrom ultralytics import YOLO\nfrom ultralytics.utils.torch_utils import model_info\n\n# BILIBILI UP 魔傀面具\n# 验证参数官方详解链接：https://docs.ultralytics.com/modes/val/#usage-examples:~:text=of%20each%20category-,Arguments%20for%20YOLO%20Model%20Validation,-When%20validating%20YOLO\n\n# 最终论文的参数量和计算量统一以这个脚本运行出来的为准\n\ndef get_weight_size(path):\n    stats = os.stat(path)\n    return f'{stats.st_size / 1024 / 1024:.1f}'\n\nif __name__ == '__main__':\n    model_path = ''\n    model = YOLO(model_path) # 选择训练好的权重路径\n    result = model.val(data='data.yaml',\n                        split='test', # split可以选择train、val、test 根据自己的数据集情况来选择.\n                        imgsz=640,\n                        batch=16,\n                        # iou=0.7,\n                        project='val',\n                        name='exp',\n                        # end2end=False # 如果训练的是NMSFree类型的模型，不想用一对一的头可以设置False\n                        )\n    \n    if model.task == 'detect': # 仅目标检测任务适用 需要改别的任务可以看：https://www.bilibili.com/video/BV1dBQDY6Ec5/\n        length = result.box.p.size\n        model_names = list(result.names.values())\n        preprocess_time_per_image = result.speed['preprocess']\n        inference_time_per_image = result.speed['inference']\n        postprocess_time_per_image = result.speed['postprocess']\n        all_time_per_image = preprocess_time_per_image + inference_time_per_image + postprocess_time_per_image\n        \n        n_l, n_p, n_g, flops = model_info(model.model)\n        \n        print('-'*20 + '论文上的数据以以下结果为准' + '-'*20)\n        print('-'*20 + '论文上的数据以以下结果为准' + '-'*20)\n        print('-'*20 + '论文上的数据以以下结果为准' + '-'*20)\n        print('-'*20 + '论文上的数据以以下结果为准' + '-'*20)\n        print('-'*20 + '论文上的数据以以下结果为准' + '-'*20)\n\n        model_info_table = PrettyTable()\n        model_info_table.title = \"Model Info\"\n        
model_info_table.field_names = [\"GFLOPs\", \"Parameters\", \"前处理时间/一张图\", \"推理时间/一张图\", \"后处理时间/一张图\", \"FPS(前处理+模型推理+后处理)\", \"FPS(推理)\", \"Model File Size\"]\n        model_info_table.add_row([f'{flops:.1f}', f'{n_p:,}', \n                                  f'{preprocess_time_per_image / 1000:.6f}s', f'{inference_time_per_image / 1000:.6f}s', \n                                  f'{postprocess_time_per_image / 1000:.6f}s', f'{1000 / all_time_per_image:.2f}', \n                                  f'{1000 / inference_time_per_image:.2f}', f'{get_weight_size(model_path)}MB'])\n        print(model_info_table)\n\n        model_metrice_table = PrettyTable()\n        model_metrice_table.title = \"Model Metrice\"\n        model_metrice_table.field_names = [\"Class Name\", \"Precision\", \"Recall\", \"F1-Score\", \"mAP50\", \"mAP75\", \"mAP50-95\"]\n        for idx in range(length):\n            model_metrice_table.add_row([\n                                        model_names[idx], \n                                        f\"{result.box.p[idx]:.4f}\", \n                                        f\"{result.box.r[idx]:.4f}\", \n                                        f\"{result.box.f1[idx]:.4f}\", \n                                        f\"{result.box.ap50[idx]:.4f}\", \n                                        f\"{result.box.all_ap[idx, 5]:.4f}\", # 50 55 60 65 70 75 80 85 90 95 \n                                        f\"{result.box.ap[idx]:.4f}\"\n                                    ])\n        model_metrice_table.add_row([\n                                    \"all(平均数据)\", \n                                    f\"{result.results_dict['metrics/precision(B)']:.4f}\", \n                                    f\"{result.results_dict['metrics/recall(B)']:.4f}\", \n                                    f\"{np.mean(result.box.f1[:length]):.4f}\", \n                                    f\"{result.results_dict['metrics/mAP50(B)']:.4f}\", \n                                    
f\"{np.mean(result.box.all_ap[:length, 5]):.4f}\", # 50 55 60 65 70 75 80 85 90 95 \n                                    f\"{result.results_dict['metrics/mAP50-95(B)']:.4f}\"\n                                ])\n        print(model_metrice_table)\n\n        with open(result.save_dir / 'paper_data.txt', 'w+', errors=\"ignore\", encoding=\"utf-8\") as f:\n            f.write(str(model_info_table))\n            f.write('\\n')\n            f.write(str(model_metrice_table))\n        \n        print('-'*20, f'结果已保存至{result.save_dir}/paper_data.txt...', '-'*20)\n        print('-'*20, f'结果已保存至{result.save_dir}/paper_data.txt...', '-'*20)\n        print('-'*20, f'结果已保存至{result.save_dir}/paper_data.txt...', '-'*20)\n        print('-'*20, f'结果已保存至{result.save_dir}/paper_data.txt...', '-'*20)\n        print('-'*20, f'结果已保存至{result.save_dir}/paper_data.txt...', '-'*20)"
  },
  {
    "path": "yolo-improve/ultralytics-yolo/yolo2coco.py",
    "content": "import json\nimport os\nfrom pathlib import Path\nfrom PIL import Image\n\n\nclass YOLOtoCOCO:\n    def __init__(self, yolo_dir, image_dir, class_names, output_json='coco_annotations.json'):\n        \"\"\"\n        初始化YOLO到COCO转换器\n        \n        Args:\n            yolo_dir: YOLO标签文件目录\n            image_dir: 图片文件目录\n            class_names: 类别名称列表，索引对应YOLO的类别ID\n            output_json: 输出的COCO格式JSON文件路径\n        \"\"\"\n        self.yolo_dir = Path(yolo_dir)\n        self.image_dir = Path(image_dir)\n        self.class_names = class_names\n        self.output_json = output_json\n        \n        # COCO格式的基本结构\n        self.coco_format = {\n            \"images\": [],\n            \"annotations\": [],\n            \"categories\": []\n        }\n        \n        self.annotation_id = 0\n    \n    def create_categories(self):\n        \"\"\"创建类别信息\"\"\"\n        for i, class_name in enumerate(self.class_names):\n            category = {\n                \"id\": i,\n                \"name\": class_name,\n                \"supercategory\": \"object\"\n            }\n            self.coco_format[\"categories\"].append(category)\n    \n    def yolo_to_coco_bbox(self, yolo_bbox, img_width, img_height):\n        \"\"\"\n        将YOLO格式的bbox转换为COCO格式\n        \n        YOLO格式: [x_center, y_center, width, height] (归一化)\n        COCO格式: [x_min, y_min, width, height] (像素值)\n        \"\"\"\n        x_center, y_center, width, height = yolo_bbox\n        \n        # 转换为像素值\n        x_center *= img_width\n        y_center *= img_height\n        width *= img_width\n        height *= img_height\n        \n        # 转换为COCO格式 (左上角坐标 + 宽高)\n        x_min = x_center - width / 2\n        y_min = y_center - height / 2\n        \n        return [x_min, y_min, width, height]\n    \n    def bbox_to_segmentation(self, bbox):\n        \"\"\"\n        将bbox转换为segmentation格式\n        矩形四个顶点，从左上角开始顺时针\n        \n        Args:\n            bbox: [x_min, y_min, width, 
height]\n\n        Returns:\n            segmentation: [[x1, y1, x2, y2, x3, y3, x4, y4]]\n        \"\"\"\n        x_min, y_min, width, height = bbox\n\n        # Compute the four vertices (clockwise, starting at the top-left)\n        # Top-left\n        x1, y1 = x_min, y_min\n        # Top-right\n        x2, y2 = x_min + width, y_min\n        # Bottom-right\n        x3, y3 = x_min + width, y_min + height\n        # Bottom-left\n        x4, y4 = x_min, y_min + height\n\n        # COCO segmentation format: [[x1, y1, x2, y2, x3, y3, x4, y4]]\n        segmentation = [[x1, y1, x2, y2, x3, y3, x4, y4]]\n\n        return segmentation\n\n    def process_image(self, image_path, label_path):\n        \"\"\"Process a single image and its label file.\"\"\"\n        # Use the file name without its extension as image_id\n        image_id = image_path.stem\n\n        # Open the image to read its size; the context manager closes the file handle\n        try:\n            with Image.open(image_path) as img:\n                img_width, img_height = img.size\n        except Exception as e:\n            print(f\"Cannot read image {image_path}: {e}\")\n            return\n\n        # Add the image entry\n        image_info = {\n            \"id\": image_id,\n            \"file_name\": image_path.name,\n            \"width\": img_width,\n            \"height\": img_height\n        }\n        self.coco_format[\"images\"].append(image_info)\n\n        # Read the YOLO label file\n        if not label_path.exists():\n            print(f\"Label file not found: {label_path}\")\n            return\n\n        with open(label_path, 'r') as f:\n            lines = f.readlines()\n\n        # Process each annotation\n        for line in lines:\n            line = line.strip()\n            if not line:\n                continue\n\n            parts = line.split()\n            # Skip malformed lines that lack a class ID plus four bbox values\n            if len(parts) < 5:\n                print(f\"Skipping malformed label line in {label_path}: {line}\")\n                continue\n            class_id = int(parts[0])\n            bbox = [float(x) for x in parts[1:5]]\n\n            # Convert the bbox format\n            coco_bbox = self.yolo_to_coco_bbox(bbox, img_width, img_height)\n\n            # Compute the area\n            area = coco_bbox[2] * coco_bbox[3]\n\n            # 
Build the segmentation (four rectangle vertices)\n            segmentation = self.bbox_to_segmentation(coco_bbox)\n\n            # Create the annotation entry\n            annotation = {\n                \"id\": self.annotation_id,\n                \"image_id\": image_id,\n                \"category_id\": class_id,\n                \"bbox\": coco_bbox,\n                \"area\": area,\n                \"iscrowd\": 0,\n                \"segmentation\": segmentation\n            }\n            self.coco_format[\"annotations\"].append(annotation)\n            self.annotation_id += 1\n\n    def convert(self):\n        \"\"\"Run the conversion.\"\"\"\n        print(\"Converting YOLO format to COCO format...\")\n\n        # Create the category entries\n        self.create_categories()\n\n        # Collect all image files\n        image_extensions = ['.jpg', '.jpeg', '.png', '.bmp']\n        image_files = []\n        for ext in image_extensions:\n            image_files.extend(self.image_dir.glob(f'*{ext}'))\n            image_files.extend(self.image_dir.glob(f'*{ext.upper()}'))\n        # Deduplicate (on Windows both patterns can match the same file) and sort for a stable order\n        image_files = sorted(set(image_files))\n\n        print(f\"Found {len(image_files)} images\")\n\n        # Process each image\n        for image_path in image_files:\n            # Matching label file\n            label_path = self.yolo_dir / f\"{image_path.stem}.txt\"\n            self.process_image(image_path, label_path)\n\n        # Save as a JSON file\n        with open(self.output_json, 'w', encoding='utf-8') as f:\n            json.dump(self.coco_format, f, indent=2, ensure_ascii=False)\n\n        print(\"Conversion complete!\")\n        print(f\"Images: {len(self.coco_format['images'])}\")\n        print(f\"Annotations: {len(self.coco_format['annotations'])}\")\n        print(f\"Categories: {len(self.coco_format['categories'])}\")\n        print(f\"Output file: {self.output_json}\")\n\n\n# Usage example\nif __name__ == \"__main__\":\n    # Configuration\n    yolo_label_dir = \"/root/dataset/dataset_visdrone/VisDrone2019-DET-test-dev/labels\"  # Directory of YOLO label files\n    image_dir = \"/root/dataset/dataset_visdrone/VisDrone2019-DET-test-dev/images\"  # Image directory\n\n    # 
List of class names (each index is the YOLO class ID)\n    class_names = ['pedestrian', 'people', 'bicycle', 'car', 'van', 'truck', 'tricycle', 'awning-tricycle', 'bus', 'motor']\n\n    output_json = \"/root/dataset/dataset_visdrone/VisDrone2019-DET-test-dev/coco_annotations.json\"  # Output file path\n\n    # Create the converter and run the conversion\n    converter = YOLOtoCOCO(\n        yolo_dir=yolo_label_dir,\n        image_dir=image_dir,\n        class_names=class_names,\n        output_json=output_json\n    )\n\n    converter.convert()"
  },
  {
    "path": "yolo-improve/yolov11-project.md",
    "content": "# [YOLO11|YOLO12 improvement project based on Ultralytics (69.9¥)](https://github.com/z1069614715/objectdetection_script)\n#### Because YOLO11 and YOLO12 are structurally very similar, every YOLO12 config file can be adapted from its YOLO11 counterpart. Video links are provided inside the project!\n\n# Improvements currently included (420+ improvement points in total so far, with ongoing updates!)\n\n# As a thank-you for supporting this project, it ships with the yolov5-PAGCP channel pruning algorithm as a free gift. [Tutorial](https://www.bilibili.com/video/BV1yh4y1Z7vz/)\n\n# Summary of improvements\n\n## YOLO11 series\n### Secondary innovation series\n1. ultralytics/cfg/models/11/yolo11-RevCol.yaml\n\n    Redesigns the yolo11 backbone with (ICLR2023) Reversible Column Networks; the C3k2-Block inside can be swapped for different variants.\n2. EMASlideLoss\n\n    Combines the EMA idea with SlideLoss.\n3. ultralytics/cfg/models/11/yolo11-dyhead-DCNV3.yaml\n\n    Replaces DCNV2 in DyHead with [DCNV3](https://github.com/OpenGVLab/InternImage).\n4. ultralytics/cfg/models/11/yolo11-C3k2-EMBC.yaml\n\n    Improves C3k2 with MBConv from [Efficientnet](https://blog.csdn.net/weixin_43334693/article/details/131114618?spm=1001.2014.3001.5501) and EffectiveSE.\n5. ultralytics/cfg/models/11/yolo11-GhostHGNetV2.yaml\n\n    Uses Ghost_HGNetV2 as the YOLO11 backbone.\n6. ultralytics/cfg/models/11/yolo11-RepHGNetV2.yaml\n\n    Uses Rep_HGNetV2 as the YOLO11 backbone.\n7. ultralytics/cfg/models/11/yolo11-C3k2-DWR-DRB.yaml\n\n    Applies DilatedReparamBlock from [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main) as a secondary innovation on the Dilation-wise Residual (DWR) module from [DWRSeg](https://arxiv.org/abs/2212.01173), then uses the result to improve C3k2.\n8. ultralytics/cfg/models/11/yolo11-ASF-P2.yaml\n\n    A secondary innovation on top of ultralytics/cfg/models/11/yolo11-ASF.yaml that adds a P2 detection layer and optimizes the network structure.\n9. ultralytics/cfg/models/11/yolo11-CSP-EDLAN.yaml\n\n    Builds CSP Efficient Dual Layer Aggregation Networks with [DualConv](https://github.com/ChipsGuardian/DualConv) to improve yolo11.\n10. ultralytics/cfg/models/11/yolo11-bifpn-SDI.yaml\n\n    Secondary innovation on BIFPN using the Semantics and Detail Infusion Module from [U-NetV2](https://github.com/yaoppeng/U-Net_v2).\n11. ultralytics/cfg/models/11/yolo11-goldyolo-asf.yaml\n\n    Combines Gather-and-Distribute from Huawei's 2023 GOLD-YOLO with Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO) as a secondary innovation to improve the yolo11 neck.\n12. 
ultralytics/cfg/models/11/yolo11-dyhead-DCNV4.yaml\n\n    Secondary innovation on DyHead using [DCNV4](https://github.com/OpenGVLab/DCNv4). (Disable AMP for training; for the tutorial, see the 20240116 release notes.)\n13. ultralytics/cfg/models/11/yolo11-HSPAN.yaml\n\n    Secondary innovation on HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR), yielding HSPAN to improve the yolo11 neck.\n14. ultralytics/cfg/models/11/yolo11-GDFPN.yaml\n\n    Combines RepGFPN from [DAMO-YOLO](https://github.com/tinyvision/DAMO-YOLO) with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085) as a secondary innovation to improve the neck.\n15. ultralytics/cfg/models/11/yolo11-HSPAN-DySample.yaml\n\n    Builds further on HSPAN (the secondary innovation on HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR)) by improving its upsampling module with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085).\n16. ultralytics/cfg/models/11/yolo11-ASF-DySample.yaml\n\n    Combines Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO) with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085) to obtain Dynamic Sample Attentional Scale Sequence Fusion.\n\n17. ultralytics/cfg/models/11/yolo11-C3k2-DCNV2-Dynamic.yaml\n\n    Strengthens the offset and mask in DCNV2 with the in-house attention mechanism MPCA.\n\n18. ultralytics/cfg/models/11/yolo11-C3k2-iRMB-Cascaded.yaml\n\n    Secondary innovation on iRMB from [EMO ICCV2023](https://github.com/zhangzjn/EMO) using CascadedGroupAttention from [EfficientViT CVPR2023](https://github.com/microsoft/Cream/tree/main/EfficientViT), used to improve C3k2.\n\n19. ultralytics/cfg/models/11/yolo11-C3k2-iRMB-DRB.yaml\n\n    Secondary innovation on iRMB from [EMO ICCV2023](https://github.com/zhangzjn/EMO) using DilatedReparamBlock from [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main), used to improve C3k2.\n\n20. ultralytics/cfg/models/11/yolo11-C3k2-iRMB-SWC.yaml\n\n    Secondary innovation on iRMB from [EMO ICCV2023](https://github.com/zhangzjn/EMO) using [shift-wise conv](https://arxiv.org/abs/2401.12736), used to improve C3k2.\n\n21. ultralytics/cfg/models/11/yolo11-DBBNCSPELAN.yaml\n\n    Secondary innovation on RepNCSPELAN from [YOLOV9](https://github.com/WongKinYiu/yolov9) using [Diverse Branch Block CVPR2021](https://arxiv.org/abs/2103.13425), used to improve yolo11.\n\n22. 
ultralytics/cfg/models/11/yolo11-OREPANCSPELAN.yaml\n\n    Secondary innovation on RepNCSPELAN from [YOLOV9](https://github.com/WongKinYiu/yolov9) using [Online Convolutional Re-parameterization (CVPR2022)](https://github.com/JUGGHM/OREPA_CVPR2022/tree/main), used to improve yolo11.\n\n23. ultralytics/cfg/models/11/yolo11-DRBNCSPELAN.yaml\n\n    Secondary innovation on RepNCSPELAN from [YOLOV9](https://github.com/WongKinYiu/yolov9) using DilatedReparamBlock from [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main), used to improve yolo11.\n\n24. ultralytics/cfg/models/11/yolo11-DynamicHGNetV2.yaml\n\n    Secondary innovation on HGBlock from [CVPR2024 RTDETR](https://arxiv.org/abs/2304.08069) using DynamicConv from [CVPR2024 parameternet](https://arxiv.org/pdf/2306.14525v2.pdf).\n\n25. ultralytics/cfg/models/11/yolo11-C3k2-RVB-EMA.yaml\n\n    Improves C3k2 with RepViTBlock from [CVPR2024 RepViT](https://github.com/THU-MIG/RepViT/tree/main) and the EMA attention mechanism.\n\n26. ultralytics/cfg/models/11/yolo11-ELA-HSFPN.yaml\n\n    Improves HSFPN with [Efficient Local Attention](https://arxiv.org/abs/2403.01123).\n\n27. ultralytics/cfg/models/11/yolo11-CA-HSFPN.yaml\n\n    Improves HSFPN with [Coordinate Attention CVPR2021](https://github.com/houqb/CoordAttention).\n\n28. ultralytics/cfg/models/11/yolo11-CAA-HSFPN.yaml\n\n    Improves HSFPN with the CAA module from [CVPR2024 PKINet](https://github.com/PKINet/PKINet).\n\n29. ultralytics/cfg/models/11/yolo11-CSMHSA.yaml\n\n    Extends Multi-Head Self-Attention into Cross-Scale Multi-Head Self-Attention.\n    1. High-level (deep) features usually carry higher-level semantic information, while low-level features carry more detail. The high-level features therefore serve as the query and the low-level features as the key and value. Combining the two lets high-level features guide fine-grained filtering of low-level features, giving a more complete and richer feature representation.\n    2. Using the upsampled high-level features as the query captures global information about the target better, which helps the model recognize and localize targets.\n\n30. ultralytics/cfg/models/11/yolo11-CAFMFusion.yaml\n\n    Secondary improvement on content-guided attention fusion using CAFM from [HCANet](https://github.com/summitgao/HCANet), an attention mechanism that captures both global and local information.\n\n31. ultralytics/cfg/models/11/yolo11-C3k2-Faster-CGLU.yaml\n\n    Secondary innovation on FasterNet (CVPR2023) using Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n32. 
ultralytics/cfg/models/11/yolo11-C3k2-Star-CAA.yaml\n\n    Improves C3k2 with StarBlock from [StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main) and CAA from [CVPR2024 PKINet](https://github.com/PKINet/PKINet).\n\n33. ultralytics/cfg/models/11/yolo11-bifpn-GLSA.yaml\n\n    Secondary innovation on bifpn using the [GLSA](https://github.com/Barrett-python/DuAT) module.\n\n34. ultralytics/cfg/models/11/yolo11-BIMAFPN.yaml\n\n    Applies the BIFPN idea as a secondary improvement on MAFPN from [MAF-YOLO](https://arxiv.org/pdf/2407.04381), yielding BIMAFPN.\n\n35. ultralytics/cfg/models/11/yolo11-C3k2-AdditiveBlock-CGLU.yaml\n\n    Improves C3k2 with AdditiveBlock from [CAS-ViT](https://github.com/Tianfang-Zhang/CAS-ViT) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n36. ultralytics/cfg/models/11/yolo11-C3k2-MSMHSA-CGLU.yaml\n\n    Improves C3k2 with M2SA from [CMTFNet](https://github.com/DrWuHonglin/CMTFNet/tree/main) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n37. ultralytics/cfg/models/11/yolo11-C3k2-IdentityFormer-CGLU.yaml\n\n    Improves C3k2 with IdentityFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n38. ultralytics/cfg/models/11/yolo11-C3k2-RandomMixing-CGLU.yaml\n\n    Improves C3k2 with RandomMixing from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n39. ultralytics/cfg/models/11/yolo11-C3k2-PoolingFormer-CGLU.yaml\n\n    Improves C3k2 with PoolingFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n40. ultralytics/cfg/models/11/yolo11-C3k2-ConvFormer-CGLU.yaml\n\n    Improves C3k2 with ConvFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n41. 
ultralytics/cfg/models/11/yolo11-C3k2-CaFormer-CGLU.yaml\n\n    Improves C3k2 with CaFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n42. ultralytics/cfg/models/11/yolo11-MAN-Faster.yaml\n\n    Combines the Mixed Aggregation Network from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804) with Faster-Block from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) as a secondary innovation to improve yolo11.\n\n43. ultralytics/cfg/models/11/yolo11-MAN-FasterCGLU.yaml\n\n    Combines the Mixed Aggregation Network from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804), Faster-Block from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) as a secondary innovation to improve yolo11.\n\n44. ultralytics/cfg/models/11/yolo11-MAN-Star.yaml\n\n    Combines the Mixed Aggregation Network from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804) with StarBlock from [StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main) as a secondary innovation to improve yolo11.\n\n45. ultralytics/cfg/models/11/yolo11-MutilBackbone-MSGA.yaml\n\n    Further innovation on the in-house MutilBackbone series using the Multi-Scale Adaptive Spatial Attention Gate from [MSA^2 Net](https://github.com/xmindflow/MSA-2Net).\n\n46. ultralytics/cfg/models/11/yolo11-slimneck-WFU.yaml\n\n    Improves slimneck with Wavelet Feature Upgrade from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN).\n\n47. ultralytics/cfg/models/11/yolo11-MAN-FasterCGLU-WFU.yaml\n\n    Combines Wavelet Feature Upgrade from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN), the Mixed Aggregation Network from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804), Faster-Block from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) as a secondary innovation to improve yolo11.\n\n48. ultralytics/cfg/models/11/yolo11-CDFA.yaml\n\n    Improves yolo11 by combining WaveletConv from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN) with ContrastDrivenFeatureAggregation from [AAAI2025 ConDSeg](https://github.com/Mengqi-Lei/ConDSeg).\n\n49. 
ultralytics/cfg/models/11/yolo11-C3k2-Faster-KAN.yaml\n\n    Secondary innovation on FasterBlock from (CVPR2023) fasternet using KAN from [ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat).\n\n50. ultralytics/cfg/models/11/yolo11-C3k2-ELGCACGLU.yaml\n\n    Improves C3k2 with ELGCA from [ELGC-Net](https://github.com/techmn/elgcnet) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n51. ultralytics/cfg/models/11/yolo11-C3k2-StripCGLU.yaml\n\n    Improves C3k2 with StripBlock from [Strip R-CNN](https://arxiv.org/pdf/2501.03775) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n52. ultralytics/cfg/models/11/yolo11-C3k2-DIMB-KAN.yaml\n\n    Builds on ultralytics/cfg/models/11/yolo11-C3k2-DIMB.yaml by replacing the mlp module with KAN from [ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat).\n\n53. ultralytics/cfg/models/11/yolo11-C2TSSA-DYT.yaml\n\n    Improves C2PSA with DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT) and Token Statistics Self-Attention from [ICLR2025 Token Statistics Transformer](https://github.com/RobinWu218/ToST).\n\n54. ultralytics/cfg/models/11/yolo11-C2Pola-DYT.yaml\n\n    Improves C2PSA with DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT) and PolaAttention from [ICLR2025 PolaFormer](https://github.com/ZacharyMeng/PolaFormer).\n\n55. ultralytics/cfg/models/12/yolo12-A2C2f-CGLU-DYT.yaml\n\n    Improves A2C2f with DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n56. ultralytics/cfg/models/12/yolo12-A2C2f-DFFN-DYT.yaml\n\n    Improves A2C2f with DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT) and DFFN from [FreqFormer](https://github.com/JPWang-CS/FreqFormer).\n\n57. ultralytics/cfg/models/11/yolo11-C3k2-MambaOut-UniRepLK.yaml\n\n    Combines MambaOutBlock from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut) with DilatedReparamBlock from [CVPR2024 UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main) as a secondary innovation to improve C3k2.\n\n58. 
ultralytics/cfg/models/11/yolo11-C3k2-EfficientVIM-CGLU.yaml\n\n    Improves C3k2 with EfficientViMBlock from [CVPR2025 EfficientViM](https://github.com/mlvlab/EfficientViM) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n59. Localization Quality Estimation - Lightweight Shared Convolutional Detection Head\n\n    The Localization Quality Estimation module comes from [GFocalV2](https://arxiv.org/abs/2011.12885).\n    detect:ultralytics/cfg/models/11/yolo11-LSCD-LQE.yaml\n    seg:ultralytics/cfg/models/11/yolo11-seg-LSCD-LQE.yaml\n    pose:ultralytics/cfg/models/11/yolo11-pose-LSCD-LQE.yaml\n    obb:ultralytics/cfg/models/11/yolo11-obb-LSCD-LQE.yaml\n\n60. ultralytics/cfg/models/11/yolo11-EUCB-SC.yaml\n\n    Improves yolo11 upsampling with EUCB from [CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD) and ShiftChannelMix from [CVPR2025 BHViT](https://github.com/IMRL/BHViT).\n\n61. ultralytics/cfg/models/11/yolo11-EMBSFPN-SC.yaml\n\n    Adds ShiftChannelMix from [CVPR2025 BHViT](https://github.com/IMRL/BHViT) to the ultralytics/cfg/models/11/yolo11-EMBSFPN.yaml scheme.\n\n62. ultralytics/cfg/models/12/yolo12-A2C2f-FMFFN-DYT.yaml\n\n    Secondary innovation on A2C2f using FMFFN from [ICLR2024-FTIC](https://github.com/qingshi9974/ICLR2024-FTIC) and DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT).\n\n63. ultralytics/cfg/models/11/yolo11-MFMMAFPN.yaml\n\n    Secondary innovation on MAFPN from [MAF-YOLO](https://arxiv.org/pdf/2407.04381) using MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet).\n\n64. ultralytics/cfg/models/11/yolo11-MBSMFFPN.yaml\n\n    Further innovation on yolo11-EMBSFPN.yaml using MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet), yielding Multi-Branch&Scale Modulation-Fusion FPN.\n\n65. ultralytics/cfg/models/11/yolo11-hyper-MFM.yaml\n\n    Secondary innovation on Hypergraph Computation in Semantic Space from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804) using MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet).\n\n66. 
ultralytics/cfg/models/11/yolo11-C2TSSA-DYT-Mona-SEFN.yaml\n\n    Improves C2PSA with DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT), Token Statistics Self-Attention from [ICLR2025 Token Statistics Transformer](https://github.com/RobinWu218/ToST), Mona from [CVPR2025 Mona](https://github.com/Leiyi-Hu/mona) and the Spatially-Enhanced Feedforward Network (SEFN) from [WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net).\n\n67. ultralytics/cfg/models/11/yolo11-C2TSSA-DYT-Mona.yaml\n\n    Improves C2PSA with DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT), Token Statistics Self-Attention from [ICLR2025 Token Statistics Transformer](https://github.com/RobinWu218/ToST) and Mona from [CVPR2025 Mona](https://github.com/Leiyi-Hu/mona).\n\n68. ultralytics/cfg/models/12/yolo12-A2C2f-DFFN-DYT-Mona.yaml\n\n    Improves A2C2f with DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT), DFFN from [FreqFormer](https://github.com/JPWang-CS/FreqFormer) and Mona from [CVPR2025 Mona](https://github.com/Leiyi-Hu/mona).\n\n69. ultralytics/cfg/models/11/yolo11-C3k2-MambaOut-LSConv.yaml\n\n    Combines LSConv from [CVPR2025 LSNet](https://github.com/THU-MIG/lsnet) with MambaOutBlock from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut) as a secondary innovation to improve C3k2.\n\n70. ultralytics/cfg/models/11/yolo11-C2TSSA-DYT-Mona-SEFFN.yaml\n\n    Improves C2PSA with DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT), Token Statistics Self-Attention from [ICLR2025 Token Statistics Transformer](https://github.com/RobinWu218/ToST), Mona from [CVPR2025 Mona](https://github.com/Leiyi-Hu/mona) and SpectralEnhancedFFN from [TransMamba](https://github.com/sunshangquan/TransMamba).\n\n71. ultralytics/cfg/models/11/yolo11-C2TSSA-DYT-Mona-EDFFN.yaml\n\n    Improves C2PSA with DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT), Token Statistics Self-Attention from [ICLR2025 Token Statistics Transformer](https://github.com/RobinWu218/ToST), Mona from [CVPR2025 Mona](https://github.com/Leiyi-Hu/mona) and EDFFN from [CVPR2025 EVSSM](https://github.com/kkkls/EVSSM).\n\n72. 
ultralytics/cfg/models/11/yolo11-C3k2-MambaOut-FDConv.yaml\n\n    Combines FDConv from [CVPR2025 Frequency Dynamic Convolution for Dense Image Prediction](https://github.com/Linwei-Chen/FDConv) with MambaOutBlock from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut) as a secondary innovation to improve C3k2.\n\n73. ultralytics/cfg/models/11/yolo11-C3k2-PFDConv.yaml\n\n    Combines PConv from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) with FDConv from [CVPR2025 Frequency Dynamic Convolution for Dense Image Prediction](https://github.com/Linwei-Chen/FDConv) as a secondary innovation to improve C3k2.\n\n74. ultralytics/cfg/models/11/yolo11-C3k2-FasterFD.yaml\n\n    Combines FasterBlock from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) with FDConv from [CVPR2025 Frequency Dynamic Convolution for Dense Image Prediction](https://github.com/Linwei-Chen/FDConv) as a secondary innovation to improve C3k2.\n\n75. ultralytics/cfg/models/11/yolo11-C3k2-MambaOut-DSA.yaml\n\n    Combines the Deformable Spatial Attention Block from [DSA: Deformable Spatial Attention](https://www.techrxiv.org/users/628671/articles/775010-deformable-spatial-attention-networks-enhancing-lightweight-convolutional-models-for-vision-tasks) with MambaOutBlock from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut) as a secondary innovation to improve C3k2.\n\n76. ultralytics/cfg/models/11/yolo11-C3k2-DSAN-EDFFN.yaml\n\n    Combines the Deformable Spatial Attention Block from [DSA: Deformable Spatial Attention](https://www.techrxiv.org/users/628671/articles/775010-deformable-spatial-attention-networks-enhancing-lightweight-convolutional-models-for-vision-tasks) with EDFFN from [CVPR2025 EVSSM](https://github.com/kkkls/EVSSM) as a secondary innovation to improve C3k2.\n\n77. ultralytics/cfg/models/11/yolo11-SOEP-RFPN.yaml\n\n    Further innovation on the original SOEP improvement using SNI and GSConvE from [ECCV2024 rethinking-fpn](https://github.com/AlanLi1997/rethinking-fpn).\n\n78. ultralytics/cfg/models/11/yolo11-SOEP-MFM.yaml\n\n    Further innovation on the original SOEP improvement using MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet).\n\n79. 
ultralytics/cfg/models/11/yolo11-SOEP-RFPN-MFM.yaml\n\n    Further innovation on the original SOEP improvement using SNI and GSConvE from [ECCV2024 rethinking-fpn](https://github.com/AlanLi1997/rethinking-fpn) and MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet).\n\n80. ultralytics/cfg/models/11/yolo11-C3k2-MambaOut-SFSC.yaml\n\n    Combines SFSConv from [CVPR2024 SFSConv](https://github.com/like413/SFS-Conv) with MambaOutBlock from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut) as a secondary innovation to improve C3k2.\n\n81. ultralytics/cfg/models/11/yolo11-C3k2-PSFSConv.yaml\n\n    Combines PConv from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) with SFSConv from [CVPR2024 SFSConv](https://github.com/like413/SFS-Conv) as a secondary innovation to improve C3k2.\n\n82. ultralytics/cfg/models/11/yolo11-C3k2-FasterSFSC.yaml\n\n    Combines FasterBlock from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) with SFSConv from [CVPR2024 SFSConv](https://github.com/like413/SFS-Conv) as a secondary innovation to improve C3k2.\n\n83. ultralytics/cfg/models/11/yolo11-SOEP-PST.yaml\n\n    Innovation on the original SOEP improvement using the Pyramid Sparse Transformer from [Pyramid Sparse Transformer](https://arxiv.org/abs/2505.12772).\n\n84. ultralytics/cfg/models/11/yolo11-C3k2-SHSA-EPGO.yaml\n\n    Improves SHSABlock from [SHViT CVPR2024](https://github.com/ysj9909/SHViT) with EPGO from [ACM MM 2025 CPRAformer](https://github.com/zs1314/CPRAformer).\n\n85. ultralytics/cfg/models/11/yolo11-C3k2-SHSA-EPGO-CGLU.yaml\n\n    Joint innovation combining SHSABlock from [SHViT CVPR2024](https://github.com/ysj9909/SHViT), CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) and EPGO from [ACM MM 2025 CPRAformer](https://github.com/zs1314/CPRAformer).\n\n86. ultralytics/cfg/models/11/yolo11-MAN-GCConv.yaml\n\n    Improves the Mixed Aggregation Network from [Hyper-YOLO TPAMI2025](https://www.arxiv.org/pdf/2408.04804) with GCConv from [CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet).\n\n### In-house series\n1. ultralytics/cfg/models/11/yolo11-LAWDS.yaml\n\n    Light Adaptive-weight downsampling. In-house module; for a detailed explanation, see the video in the Baidu Cloud link.\n\n2. ultralytics/cfg/models/11/yolo11-C3k2-EMSC.yaml\n\n    Efficient Multi-Scale Conv. In-house module; for a detailed explanation, see the video in the Baidu Cloud link.\n\n3. 
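ultralytics/cfg/models/11/yolo11-C3k2-EMSCP.yaml\n\n    Efficient Multi-Scale Conv Plus. In-house module; for a detailed explanation, see the video in the Baidu Cloud link.\n\n4. Lightweight Shared Convolutional Detection Head\n\n    In-house lightweight detection head.\n    detect:ultralytics/cfg/models/11/yolo11-LSCD.yaml\n    seg:ultralytics/cfg/models/11/yolo11-seg-LSCD.yaml\n    pose:ultralytics/cfg/models/11/yolo11-pose-LSCD.yaml\n    obb:ultralytics/cfg/models/11/yolo11-obb-LSCD.yaml\n    1. The FCOS paper has shown that GroupNorm improves the localization and classification performance of the detection head.\n    2. Shared convolutions greatly reduce the number of parameters, making the model lighter, especially on resource-constrained devices.\n    3. Because each detection head handles targets at a different scale, a Scale layer rescales the features when shared convolutions are used.\n    Taken together, the head uses fewer parameters and less computation while keeping the accuracy loss as small as possible.\n\n5. Task Align Dynamic Detection Head\n\n    In-house task-aligned dynamic detection head.\n    detect:ultralytics/cfg/models/11/yolo11-TADDH.yaml\n    seg:ultralytics/cfg/models/11/yolo11-seg-TADDH.yaml\n    pose:ultralytics/cfg/models/11/yolo11-pose-TADDH.yaml\n    obb:ultralytics/cfg/models/11/yolo11-obb-TADDH.yaml\n    1. The FCOS paper has shown that GroupNorm improves the localization and classification performance of the detection head.\n    2. Shared convolutions greatly reduce the number of parameters, making the model lighter, especially on resource-constrained devices. Because each detection head handles targets at a different scale, a Scale layer rescales the features when shared convolutions are used.\n    3. Following the idea of TOOD, beyond task alignment in the label assignment strategy, we also build a task-aligned structure into the detection head. Existing detector heads usually use independent classification and localization branches, which leaves the two tasks without interaction. TADDH learns task-interactive features from multiple convolution layers through a feature extractor to obtain joint features. The localization branch uses DCNV2, with the interactive features generating the DCNV2 offset and mask, while the classification branch uses the interactive features for dynamic feature selection.\n\n6. ultralytics/cfg/models/11/yolo11-FDPN.yaml\n\n    In-house Focusing Diffusion Pyramid Network.\n    1. A custom feature focusing module and a feature diffusion mechanism give the features at every scale detailed context information, which benefits subsequent detection and classification.\n    2. The custom feature focusing module accepts inputs at three scales and contains an Inception-style module that uses a set of parallel depthwise convolutions to capture rich information across multiple scales.\n    3. The diffusion mechanism spreads the context-rich features to every detection scale.\n\n7. ultralytics/cfg/models/11/yolo11-FDPN-DASI.yaml\n\n    Further innovation on the in-house Focusing Diffusion Pyramid Network using the Dimension-Aware Selective Integration Module from [HCFNet](https://github.com/zhengshuchen/HCFNet).\n\n8. ultralytics/cfg/models/11/yolo11-RGCSPELAN.yaml\n\n    In-house RepGhostCSPELAN.\n    1. Following the idea of GhostNet (the intermediate feature maps computed by mainstream CNNs contain widespread redundancy), cheap operations generate part of the redundant feature maps to reduce computation and parameters.\n    2. 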
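Drops the BottleNeck commonly used in yolov5 and yolo11. To offset the performance loss from removing the residual block, RepConv is used on the gradient-flow branch to strengthen feature extraction and gradient flow. RepConv can also be fused at inference time, so it serves both purposes.\n    3. A scaling factor controls the size of RGCSPELAN, so it suits both small and large models.\n\n9. Lightweight Shared Convolutional Separated BN Detection Head\n\n    Built on the in-house lightweight detection head. Following the NASFPN design, GN is replaced with BN, and the BN layer parameters are not shared.\n    detect:ultralytics/cfg/models/11/yolo11-LSCSBD.yaml\n    seg:ultralytics/cfg/models/11/yolo11-seg-LSCSBD.yaml\n    pose:ultralytics/cfg/models/11/yolo11-pose-LSCSBD.yaml\n    obb:ultralytics/cfg/models/11/yolo11-obb-LSCSBD.yaml\n    1. Because feature statistics still differ across levels, a normalization layer remains necessary. Introducing BN directly into a head with shared parameters makes its running averages inaccurate, while GN adds inference overhead. We therefore follow NASFPN: the heads share the convolution layers, while BN is computed separately for each.\n\n10. ultralytics/cfg/models/11/yolo11-EIEStem.yaml\n\n    1. The SobelConv branch extracts edge information from the image. Because the Sobel filter detects abrupt intensity changes, it captures edge features well. These edge features matter in many computer vision tasks, such as image segmentation and object detection.\n    2. EIEStem also incorporates spatial information: in addition to edges, a pooling branch extracts and preserves important spatial information. Combining edge and spatial information helps the model understand image content better.\n    3. The Sobel operator is implemented efficiently with a 3D group convolution.\n\n11. ultralytics/cfg/models/11/yolo11-C3k2-EIEM.yaml\n\n    Proposes a new EIEStem module intended as an efficient front-end module for image recognition tasks. It combines a SobelConv branch that extracts edge information with a convolution branch that extracts spatial information, so it can learn a richer image feature representation.\n    1. Edge information learning: convolutional neural networks (CNNs) are usually good at learning spatial information but can fall short at extracting edges. The EIEStem module extracts edge features explicitly through the SobelConv branch. The Sobel filter is a classic edge detection filter that captures abrupt intensity changes effectively and so yields important edge information.\n    2. Spatial information preservation: spatial information matters as much as edge information. The EIEStem module extracts it through an additional convolution branch (conv_branch). Unlike the SobelConv branch, conv_branch extracts features of the original image and preserves rich spatial detail.\n    3. Feature fusion: the EIEStem module fuses (concatenates) the features from the SobelConv branch and conv_branch. The fused representation contains both rich edge information and spatial information, so it describes image content more completely.\n\n12. ultralytics/cfg/models/11/yolo11-ContextGuideFPN.yaml\n\n    The Context Guide Fusion Module (CGFM) is a feature fusion module designed to improve the feature pyramid network (FPN) in YOLO11. Its design addresses the guidance and adaptive adjustment of context information during multi-scale feature fusion.\n    1. Effective fusion of context information: through the SE attention mechanism, the module captures and uses important context information during feature fusion. This strengthens the feature representation and guides the model toward information about the targets, improving detection accuracy.\n    2. Feature enhancement: a weighted feature recombination strengthens important features and suppresses unimportant ones, improving the discriminative power of the feature maps.\n    3. Simple and efficient: the module structure is relatively simple and adds little computation, so it suits real-time object detection.\n    A video explanation is available on Bilibili: https://www.bilibili.com/video/BV1Vx4y1n7hZ/\n\n13. 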
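ultralytics/cfg/models/11/yolo11-LSDECD.yaml\n\n    Built on the in-house lightweight detection head (LSCD) and further improved with detail-enhanced convolution, which strengthens the head's ability to capture detail and improves detection accuracy.\n    detect:ultralytics/cfg/models/11/yolo11-LSDECD.yaml\n    segment:ultralytics/cfg/models/11/yolo11-seg-LSDECD.yaml\n    pose:ultralytics/cfg/models/11/yolo11-pose-LSDECD.yaml\n    obb:ultralytics/cfg/models/11/yolo11-obb-LSDECD.yaml\n    1. DEA-Net designs a detail-enhanced convolution (DEConv). Specifically, DEConv integrates prior information into an ordinary convolution layer to strengthen representation and generalization. Through re-parameterization, DEConv is then converted into an equivalent ordinary convolution, with no extra parameters or computation cost.\n\n14. ultralytics/cfg/models/11/yolo11-C3k2-SMPCGLU.yaml\n\n    Improves C3k2 with the Self-moving Point Convolutional GLU module.\n    SMP comes from [CVPR2023-SMPConv](https://github.com/sangnekim/SMPConv), and Convolutional GLU comes from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n    1. Ordinary convolutions may fail to capture effective features when the data is diverse and complex, so we adopt SMPConv, whose adaptive point-moving mechanism captures local features better and makes feature extraction more flexible and accurate.\n    2. CGLU is added after SMPConv. Convolutional GLU combines convolution with a gating mechanism and passes information channels selectively, which makes feature extraction more effective and flexible.\n\n15. Re-CalibrationFPN\n\n    To strengthen the interaction between shallow and deep features, we introduce the re-calibration feature pyramid network (Re-CalibrationFPN).\n    P2345: ultralytics/cfg/models/11/yolo11-ReCalibrationFPN-P2345.yaml (ReCalibrationFPN with a small-target detection head)\n    P345: ultralytics/cfg/models/11/yolo11-ReCalibrationFPN-P345.yaml\n    P3456: ultralytics/cfg/models/11/yolo11-ReCalibrationFPN-P3456.yaml (ReCalibrationFPN with a large-target detection head)\n    1. Shallow layers carry less semantics but rich detail, with sharper boundaries and less distortion, while deep layers hold rich semantic information. Directly fusing low-level features with high-level ones can therefore cause redundancy and inconsistency. To address this, we propose the SBA module, which selectively aggregates boundary and semantic information to delineate finer-grained object contours and recalibrate object positions.\n    2. Compared with the traditional FPN structure, the SBA module introduces bidirectional fusion between high-resolution and low-resolution features, so information passes between features more fully and multi-scale fusion improves further.\n    3. The SBA module uses an adaptive attention mechanism that adjusts feature weights according to the resolution and content of each feature map, capturing multi-scale target features better.\n\n16. 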
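ultralytics/cfg/models/11/yolo11-CSP-PTB.yaml\n\n    Cross Stage Partial - Partially Transformer Block\n    In computer vision tasks, the Transformer structure has drawn wide attention for its strong global feature extraction. However, its computational complexity is high, so applying it to all channels adds significant overhead. To keep feature extraction efficient while lowering the cost, we designed a hybrid structure that splits the input feature map into two parts, processed by a CNN and a Transformer respectively, combining both mechanisms to strengthen feature extraction.\n    We propose a module named CSP_PTB (Cross Stage Partial - Partially Transformer Block) that combines the strengths of CNNs and Transformers, assigning only part of the input channels to each to balance computational efficiency and feature extraction.\n    1. Fusing local and global features: several studies show that the receptive field of a CNN is small, so it extracts only local features, whereas the MHSA in a Transformer extracts global features. The block uses the strengths of both.\n    2. Efficient feature extraction at lower cost: to bring in a Transformer for global features without greatly increasing computational complexity, the Partially Transformer Block applies the TransformerBlock to only part of the channels.\n    3. MHSA_CGLU consists of Multi-Head Self-Attention and [ConvolutionalGLU(TransNext CVPR2024)](https://github.com/DaiShiResearch/TransNeXt). Multi-Head Self-Attention extracts global features, and ConvolutionalGLU strengthens nonlinear feature expression, outperforming a conventional FFN.\n    4. The number of channels sent to the Transformer can be adjusted to the model size and the runtime conditions.\n\n17. ultralytics/cfg/models/11/yolo11-SOEP.yaml\n\n    Small targets are hard to detect at the regular P3, P4 and P5 detection layers. The traditional remedy is to add a P2 detection layer, but that brings its own problems, such as much heavier computation and slower post-processing, which creates a growing need for a new feature pyramid that is effective for small targets. We improve on the original PAFPN and propose SmallObjectEnhancePyramid. Instead of adding a P2 detection layer, we pass the P2 feature layer through SPDConv to obtain features rich in small-target information and fuse them into P3, then integrate the features with CSP-OmniKernel, built from the CSP idea and [OmniKernel from AAAI2024](https://ojs.aaai.org/index.php/AAAI/article/view/27907). The OmniKernel module has three branches, a global branch, a large branch and a local branch, to learn feature representations from global to local effectively, which ultimately improves small-target detection. (This module requires turning off amp in train.py and setting self.args.half to False near line 115 of ultralytics/engine/validator.py. Remember to change these back before running other improvements!)\n    If you hit the error RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR on a 40-series GPU, update torch to a version above 2.0 and cuda to a version above 12.0.\n\n18. ultralytics/cfg/models/11/yolo11-CGRFPN.yaml\n\n    Context-Guided Spatial Feature Reconstruction Feature Pyramid Network.\n    1. Borrows the carefully designed Rectangular Self-Calibration Module from [ECCV2024-CGRSeg](https://github.com/nizhenliang/CGRSeg) for spatial feature reconstruction and pyramid context extraction. It captures global context in the horizontal and vertical directions and obtains axial global context to model rectangular key regions explicitly.\n    2. 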
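PyramidContextExtraction Module: uses pyramid context extraction to integrate feature information from different levels effectively, improving the model's context awareness.\n    3. FuseBlockMulti and DynamicInterpolationFusion: these modules fuse multi-scale features. Through dynamic interpolation and multi-feature fusion they further improve the multi-scale feature representation and the model's ability to recognize targets against complex backgrounds.\n\n19. ultralytics/cfg/models/11/yolo11-FeaturePyramidSharedConv.yaml\n\n    1. Multi-scale feature extraction\n        Convolution layers with different dilation rates let the module extract features at different scales, which helps capture information of different sizes and contexts in the image.\n        Low dilation rates capture local detail, and high dilation rates capture global context.\n    2. Parameter sharing\n        The shared convolution layer self.share_conv greatly reduces the number of trainable parameters. Compared with a separate convolution layer per dilation rate, the shared layer reduces redundancy and improves model efficiency.\n        It lowers the storage and computation cost of the model and improves computational efficiency.\n    3. Efficient channel transformation\n        The 1x1 convolution layers self.cv1 and self.cv2 adjust the channel count and fuse features efficiently. A 1x1 convolution reduces parameters while still preserving important feature information.\n    4. Finer-grained feature extraction\n        FeaturePyramidSharedConv extracts features with convolutions and can capture finer-grained features. By contrast, the pooling in SPPF may lose some detail.\n        Convolutions are more flexible and expressive for feature extraction and capture detail and complex patterns in the image better.\n\n20. APT(Adaptive Power Transformation)-TAL.\n\n    To make the matching quality and loss weights of different gt-prediction pairs more discriminative, a custom PowerTransformer significantly raises the weight of high-quality predicted boxes and suppresses the influence of low-quality ones, so the model focuses more on high-quality predicted boxes during training.\n\n21. ultralytics/cfg/models/11/yolo11-EMBSFPN.yaml\n\n    Proposes a new Efficient Multi-Branch&Scale FPN based on BIFPN, [MAF-YOLO](https://arxiv.org/pdf/2407.04381) and [CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD).\n    Efficient Multi-Branch&Scale FPN offers a lightweight design, weighted multi-scale feature fusion, an efficient multi-scale convolution module, an efficient upsampling module, and a global heterogeneous kernel selection mechanism.\n    1. It has an efficient multi-scale convolution module and a global heterogeneous kernel selection mechanism. Research on Trident networks shows that networks with larger receptive fields suit larger objects, while smaller targets benefit from smaller receptive fields. In the FPN stage we therefore choose different multi-scale kernels for feature layers at different scales, adapting to and progressively gathering multi-scale receptive field information.\n    2. Borrowing weighted multi-scale feature fusion from BIFPN, Concat is replaced with Add to cut parameters and computation, while features are still fused with adaptively selected weights according to the importance of each scale.\n    3. The efficient upsampling module comes from EUCB in CVPR2024-EMCAD and stays efficient while retaining reasonable quality.\n\n22. ultralytics/cfg/models/11/yolo11-CSP-PMSFA.yaml\n\n    In-house module: CSP-Partial Multi-Scale Feature Aggregation.\n    1. Partial multi-scale feature extraction: following the ideas of CVPR2020-GhostNet and CVPR2023-FasterNet, it uses the efficient PartialConv. The module extracts feature information at multiple scales from the input, but only on part of the channels rather than all of them, which improves computational efficiency.\n    2. Enhanced feature fusion: the final 1x1 convolution fuses features from different scales, and a residual connection adds the input features to the processed ones. This preserves the original information while introducing new multi-scale information, improving the model's expressive power.\n\n23. ultralytics/cfg/models/11/yolo11-MutilBackbone-DAF.yaml\n\n    In-house MutilBackbone-DynamicAlignFusion.\n    1. 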
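To avoid spending too much computation on shallow feature maps, MutilBackbone shares a single stem, which keeps computation and inference time from growing too large.\n    2. To handle the spatial misalignment between features from different sources when fusing information from different backbones, we designed DynamicAlignFusion. It first fuses the features learned by the two modules, then generates a weight called DynamicAlignWeight to adjust each of them, and finally applies a learnable channel weight that dynamically adjusts the weights of the two paths according to the input features, which improves the model's adaptability to different features.\n\n24. ultralytics/cfg/models/11/yolo11-C3k2-MutilScaleEdgeInformationEnhance.yaml\n\n    In-house CSP-MutilScaleEdgeInformationEnhance.\n    The MutilScaleEdgeInformationEnhance module combines multi-scale feature extraction, edge information enhancement, and convolution. Its main purpose is to extract features at different scales, highlight edge information, integrate these multi-scale features, and output the enhanced features through a convolution layer. Building on feature extraction and edge enhancement, the module has strong representational power.\n    1. Multi-scale feature extraction: nn.AdaptiveAvgPool2d performs pooling at multiple scales to extract local information of different sizes, which helps capture the multi-level features of the image.\n    2. Edge enhancement: the EdgeEnhancer module is dedicated to extracting edge information and makes the network more sensitive to edges, which matters for many vision tasks (such as object detection and semantic segmentation).\n    3. Feature fusion: the features extracted at different scales are aligned to the same scale by interpolation, concatenated, and then fused into a unified representation by a convolution layer, which improves the model's perception of multi-scale features.\n\n25. ultralytics/cfg/models/11/yolo11-CSP-FreqSpatial.yaml\n\n    FreqSpatial is a convolutional neural network (CNN) module that fuses spatial-domain and frequency-domain features. By extracting features in both domains, it aims to capture spatial and frequency information at different levels and improve the model's robustness and representational power on image data. Its main characteristic is combining the Scharr operator (used for edge detection) with spatial-domain convolution and frequency-domain convolution to capture the structure of the image from several perspectives.\n    1. Spatial-domain feature extraction: extracts features based on spatial structure from the original image, mainly capturing details and edge information.\n    2. Frequency-domain feature extraction: extracts frequency-related patterns, capturing the low-frequency and high-frequency components of the image, which helps the model extract information at both global and local scales.\n    3. Feature fusion: the spatial-domain and frequency-domain features are combined by a weighted sum to produce the final output feature map. This weighted fusion lets the model consider spatial structure and frequency information together, which improves its performance across a variety of scenes.\n\n26. ultralytics/cfg/models/11/yolo11-C3k2-MutilScaleEdgeInformationSelect.yaml\n\n    A further innovation built on the in-house CSP-MutilScaleEdgeInformationEnhance.\n    We propose a multi-scale edge information selection module (MutilScaleEdgeInformationSelect) whose purpose is to efficiently select, from multi-scale edge information, the key features most relevant to the target task. To this end we introduce an attention mechanism that focuses on more important regions, [ICCV2023 DualDomainSelectionMechanism, DSM](https://github.com/c-yn/FocalNet). By focusing on the more important regions of the image (such as complex edges and high-frequency signal regions), it adaptively selects the more task-relevant features among the multi-scale features, which improves the precision of feature selection and overall model performance.\n\n27. 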
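GlobalEdgeInformationTransfer\n\n    Implementation 1: ultralytics/cfg/models/11/yolo11-GlobalEdgeInformationTransfer1.yaml\n    Implementation 2: ultralytics/cfg/models/11/yolo11-GlobalEdgeInformationTransfer2.yaml\n    Implementation 3: ultralytics/cfg/models/11/yolo11-GlobalEdgeInformationTransfer3.yaml\n    It is well known that localizing an object's bounding box depends heavily on the object's edge information, yet a conventional object detection network has no component that increases its attention to edges. We need a way to fuse edge information into the features extracted at every scale, so we propose a module named GlobalEdgeInformationTransfer (GEIT), which passes the edge information extracted from shallow features to the whole backbone and fuses it with features of different scales.\n    1. The original image contains a large amount of background information, so extracting edge information directly from it and passing it to the whole backbone would add noise to learning, whereas the shallow convolution layers help filter out unnecessary background. We therefore build a module named MutilScaleEdgeInfoGenetator in the shallow part of the network; it uses the shallow feature layers to generate edge feature maps at multiple scales and injects them into each scale of the backbone for fusion.\n    2. The choice of downsampling needs care. Our goal is to preserve and enhance edge information while downsampling, so MaxPool is the better fit: it keeps the strongest response in each local region and represents edges better. AvgPool suits scenarios that need smoothing or averaging of features, but it preserves details and edges less well than MaxPool.\n    3. For fusion, ConvEdgeFusion combines edge information with ordinary convolution features in a new cross-channel fusion scheme. First, conv_channel_fusion fuses the edge information and the ordinary convolution features across channels, helping the model integrate features from different sources. Then conv_3x3_feature_extract further extracts the fused features to strengthen the capture of local detail. Finally, conv_1x1 adjusts the output feature dimension.\n\n28. ultralytics/cfg/models/11/yolo11-C3k2-DIMB.yaml\n\n    In-house module DynamicInceptionDWConv2d. (For a more detailed description, see 配置文件.md in the project.)\n\n29. ultralytics/cfg/models/11/yolo11-HAFB-1.yaml\n    \n    In-house module Hierarchical Attention Fusion Block, HAFB. (For a more detailed description, see 配置文件.md in the project.)\n\n30. ultralytics/cfg/models/11/yolo11-HAFB-2.yaml\n    \n    In-house module Hierarchical Attention Fusion Block, HAFB. (For a more detailed description, see 配置文件.md in the project.)\n\n31. ultralytics/cfg/models/11/yolo11-MutilBackbone-HAFB.yaml\n    \n    Adds HAFB on top of the in-house design in yolo11-MutilBackbone-DAF.yaml.\n\n### BackBone Series\n1. ultralytics/cfg/models/11/yolo11-efficientViT.yaml\n    \n    Replaces the yolo11 backbone with efficientViT (CVPR2023).\n2. ultralytics/cfg/models/11/yolo11-fasternet.yaml\n\n    Replaces the yolo11 backbone with fasternet (CVPR2023).\n3. ultralytics/cfg/models/11/yolo11-timm.yaml\n\n    Replaces the yolo11 backbone with a backbone network supported by timm.\n\n4. ultralytics/cfg/models/11/yolo11-convnextv2.yaml\n\n    Replaces the yolo11 backbone with convnextv2.\n5. ultralytics/cfg/models/11/yolo11-EfficientFormerV2.yaml\n\n    Replaces the yolo11 backbone with EfficientFormerV2. (See [item 5 of Common Errors and Solutions](#a).)  \n6. 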
ultralytics/cfg/models/11/yolo11-vanillanet.yaml\n\n    Replaces the yolo11 backbone with vanillanet.\n7. ultralytics/cfg/models/11/yolo11-LSKNet.yaml\n\n    Replaces the yolo11 backbone with LSKNet (a 2023 SOTA backbone for rotated object detection).\n8. ultralytics/cfg/models/11/yolo11-swintransformer.yaml\n\n    Replaces the yolo11 backbone with SwinTransformer-Tiny.\n9. ultralytics/cfg/models/11/yolo11-repvit.yaml\n\n    Replaces the yolo11 backbone with [RepViT](https://github.com/THU-MIG/RepViT/tree/main).\n10. ultralytics/cfg/models/11/yolo11-CSwinTransformer.yaml\n\n    Replaces the yolo11 backbone with [CSWin-Transformer(CVPR2022)](https://github.com/microsoft/CSWin-Transformer/tree/main). (See [item 5 of Common Errors and Solutions](#a).)\n11. ultralytics/cfg/models/11/yolo11-HGNetV2.yaml\n\n    Uses HGNetV2 as the YOLO11 backbone.\n12. ultralytics/cfg/models/11/yolo11-unireplknet.yaml\n\n    Replaces the yolo11 backbone with [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main).\n13. ultralytics/cfg/models/11/yolo11-TransNeXt.yaml\n\n    Improves the yolo11 backbone with [TransNeXt](https://github.com/DaiShiResearch/TransNeXt). (See [item 5 of Common Errors and Solutions](#a).)   \n14. ultralytics/cfg/models/rt-detr/yolo11-rmt.yaml\n\n    Improves the rtdetr backbone with [CVPR2024 RMT](https://arxiv.org/abs/2309.11523).\n15. ultralytics/cfg/models/11/yolo11-pkinet.yaml\n\n    Improves the backbone with [CVPR2024 PKINet](https://github.com/PKINet/PKINet). (Requires installing mmcv and mmengine.)\n16. ultralytics/cfg/models/11/yolo11-mobilenetv4.yaml\n\n    Improves the yolo11 backbone with [MobileNetV4](https://github.com/jaiwei98/MobileNetV4-pytorch/tree/main).\n17. ultralytics/cfg/models/11/yolo11-starnet.yaml\n\n    Improves the yolo11 backbone with [StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main).\n18. ultralytics/cfg/models/11/yolo11-inceptionnext.yaml\n\n    Replaces the backbone with [InceptionNeXt CVPR2024](https://github.com/sail-sg/inceptionnext).\n19. ultralytics/cfg/models/11/yolo11-mambaout.yaml\n     \n    Replaces the backbone with MambaOut from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut).\n20. ultralytics/cfg/models/11/yolo11-MobileMamba.yaml\n     \n    Improves the backbone with MobileMamba from [CVPR2025 MobileMamba](https://github.com/lewandofskee/MobileMamba).\n21. 
ultralytics/cfg/models/11/yolo11-overlock.yaml\n\n    Replaces the backbone with the overlock-backbone from [CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087).\n22. ultralytics/cfg/models/11/yolo11-lsnet.yaml\n\n    Replaces the yolo11 backbone with LSNet from [CVPR2025 LSNet](https://github.com/THU-MIG/lsnet).\n23. ultralytics/cfg/models/11/yolo11-ESMoE.yaml\n\n    Improves yolo11 with the ES-MoE module from [YOLO-Master](https://github.com/isLinXu/YOLO-Master).\n24. ultralytics/cfg/models/11/yolo11-FAENet.yaml\n\n    Enhances the features of the input image with FAENet from [TGRS2025 MASFNet](https://ieeexplore.ieee.org/document/10955257).\n\n### SPPF Series\n1. ultralytics/cfg/models/11/yolo11-FocalModulation.yaml\n\n    Replaces SPPF with [Focal Modulation](https://github.com/microsoft/FocalNet).\n2. ultralytics/cfg/models/11/yolo11-SPPF-LSKA.yaml\n\n    Improves SPPF with the [LSKA](https://github.com/StevenLauHKHK/Large-Separable-Kernel-Attention) attention mechanism to strengthen multi-scale feature extraction.\n3. ultralytics/cfg/models/11/yolo11-AIFI.yaml\n\n    Improves yolo11 with Attention-based Intrascale Feature Interaction (AIFI) from [RT-DETR](https://arxiv.org/pdf/2304.08069.pdf).\n4. ultralytics/cfg/models/11/yolo11-AIFIRepBN.yaml\n\n    Improves AIFI with RepBN from [ICML-2024 SLAB](https://github.com/xinghaochen/SLAB).\n\n### Neck Series\n1. ultralytics/cfg/models/11/yolo11-bifpn.yaml\n\n    Adds BIFPN to yolo11.  \n    BIFPN has three optional parameters:\n    1. Fusion  \n        The Fusion module in BIFPN supports five modes: weight, adaptive, concat, bifpn(default), SDI  \n        weight, adaptive, and concat come from [Figure 3 of this paper](https://openreview.net/pdf?id=q2ZaVU6bEsT), and SDI comes from [U-NetV2](https://github.com/yaoppeng/U-Net_v2)\n    2. node_mode  \n        Supports most C3k2-XXX structures.\n    3. head_channel  \n        The number of channels in BIFPN; the default is 256.\n2. ultralytics/cfg/models/11/yolo11-slimneck.yaml\n\n    Replaces C3k2 and Conv in the yolo11 neck with VoVGSCSP\\VoVGSCSPC and GSConv.\n3. Asymptotic Feature Pyramid Network[reference](https://github.com/gyyang23/AFPN/tree/master)\n\n    a. ultralytics/cfg/models/11/yolo11-AFPN-P345.yaml  \n    b. ultralytics/cfg/models/11/yolo11-AFPN-P345-Custom.yaml  \n    c. 
ultralytics/cfg/models/11/yolo11-AFPN-P2345.yaml  \n    d. ultralytics/cfg/models/11/yolo11-AFPN-P2345-Custom.yaml  \n    The block in the Custom variants supports most C3k2-XXX structures.\n4. ultralytics/cfg/models/11/yolo11-RCSOSA.yaml\n\n    Replaces C3k2 with RCSOSA from [RCS-YOLO](https://github.com/mkang315/RCS-YOLO/tree/main).\n5. ultralytics/cfg/models/11/yolo11-goldyolo.yaml\n\n    Improves the feature fusion module with Gather-and-Distribute from Huawei's GOLD-YOLO (2023).\n6. ultralytics/cfg/models/11/yolo11-GFPN.yaml\n\n    Improves the neck with RepGFPN from [DAMO-YOLO](https://github.com/tinyvision/DAMO-YOLO).\n7. ultralytics/cfg/models/11/yolo11-EfficientRepBiPAN.yaml\n\n    Improves the neck with EfficientRepBiPAN from [YOLOV6](https://github.com/meituan/YOLOv6/tree/main).\n8. ultralytics/cfg/models/11/yolo11-ASF.yaml\n\n    Improves yolo11 with Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO).\n9. ultralytics/cfg/models/11/yolo11-SDI.yaml\n\n    Redesigns the feature fusion part of yolo11 with the Semantics and Detail Infusion Module from [U-NetV2](https://github.com/yaoppeng/U-Net_v2).\n10. ultralytics/cfg/models/11/yolo11-HSFPN.yaml\n\n    Improves the yolo11 neck with HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR).\n11. ultralytics/cfg/models/11/yolo11-CSFCN.yaml\n\n    Improves yolo11 with the Context and Spatial Feature Calibration module from [Context and Spatial Feature Calibration for Real-Time Semantic Segmentation](https://github.com/kaigelee/CSFCN/tree/main).\n12. ultralytics/cfg/models/11/yolo11-CGAFusion.yaml\n\n    Improves the yolo11 neck with content-guided attention fusion from [DEA-Net](https://github.com/cecret3350/DEA-Net).\n13. ultralytics/cfg/models/11/yolo11-SDFM.yaml\n\n    Improves the yolo11 neck with the superficial detail fusion module from [PSFusion](https://github.com/Linfeng-Tang/PSFusion).\n\n14. ultralytics/cfg/models/11/yolo11-PSFM.yaml\n\n    Improves the yolo11 neck with the profound semantic fusion module from [PSFusion](https://github.com/Linfeng-Tang/PSFusion).\n\n15. ultralytics/cfg/models/11/yolo11-GLSA.yaml\n\n    Improves the yolo11 neck with the [GLSA](https://github.com/Barrett-python/DuAT) module.\n\n16. 
ultralytics/cfg/models/11/yolo11-CTrans.yaml\n\n    Improves the yolo11 neck with ChannelTransformer from [[AAAI2022] UCTransNet](https://github.com/McGregorWwww/UCTransNet/tree/main). (See [item 5 of Common Errors and Solutions](#a).)  \n\n17. ultralytics/cfg/models/11/yolo11-p6-CTrans.yaml\n\n    Improves the yolo11 neck with ChannelTransformer from [[AAAI2022] UCTransNet](https://github.com/McGregorWwww/UCTransNet/tree/main). (p6 version) (See [item 5 of Common Errors and Solutions](#a).)  \n\n18. ultralytics/cfg/models/11/yolo11-MAFPN.yaml\n\n    Improves the neck with MAFPN from [MAF-YOLO](https://arxiv.org/pdf/2407.04381).\n\n19. ultralytics/cfg/models/11/yolo11-hyper.yaml\n\n    Improves yolo11 with Hypergraph Computation in Semantic Space from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804).\n\n20. ultralytics/cfg/models/11/yolo11-msga.yaml\n\n    Improves the yolo11 neck with the Multi-Scale Adaptive Spatial Attention Gate from [MSA^2 Net](https://github.com/xmindflow/MSA-2Net).\n\n21. ultralytics/cfg/models/11/yolo11-WFU.yaml\n\n    Improves the yolo11 neck with Wavelet Feature Upgrade from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN).\n\n22. ultralytics/cfg/models/11/yolo11-mpcafsa.yaml\n\n    Improves the yolo11 neck with Frequency-Spatial Attention and Multi-scale Progressive Channel Attention from [BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation](https://github.com/nkicsl/SF-UNet).\n\n23. ultralytics/cfg/models/11/yolo11-fsa.yaml\n\n    Improves yolo11 with Frequency-Spatial Attention from [BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation](https://github.com/nkicsl/SF-UNet).\n\n24. ultralytics/cfg/models/11/yolo11-GDSAFusion.yaml\n\n    Improves the neck with GDSAFusion from [CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087).\n\n25. ultralytics/cfg/models/11/yolo11-MFM.yaml\n\n    Improves the neck with MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet).\n\n26. ultralytics/cfg/models/11/yolo11-RFPN.yaml\n\n    Improves the YOLO11 neck with SNI and GSConvE from [ECCV2024 rethinking-fpn](https://github.com/AlanLi1997/rethinking-fpn).\n\n27. 
ultralytics/cfg/models/11/yolo11-PST.yaml\n\n    Improves the yolo11 neck with the Pyramid Sparse Transformer from [Pyramid Sparse Transformer](https://arxiv.org/abs/2505.12772).\n\n28. ultralytics/cfg/models/11/yolo11-HS-FPN.yaml\n\n    Improves the yolo11 neck with HFP and SDP from [AAAI2025 HS-FPN](https://github.com/ShiZican/HS-FPN/tree/main).\n\n29. ultralytics/cfg/models/11/yolo11-MSAM.yaml\n\n    Improves the yolo11 neck with MSAM from [TGRS2025 UMFormer](https://github.com/takeyoutime/UMFormer) and the diffusion mechanism of yolo13.\n\n30. ultralytics/cfg/models/11/yolo11-DPCF.yaml\n\n    Improves the neck with DPCF from [INFFUS2025 SAMamba](https://arxiv.org/pdf/2505.23214).\n\n31. ultralytics/cfg/models/11/yolo11-LCA.yaml\n\n    Improves the yolo11 neck with LCA from [CVPR2025 HVI](https://arxiv.org/pdf/2502.20272).\n\n32. ultralytics/cfg/models/11/yolo11-HFFE.yaml\n\n    Improves the yolo11 neck with HFFE from [TGRS2025 HAFNet](https://ieeexplore.ieee.org/document/11154006).\n\n33. ultralytics/cfg/models/11/yolo11-MFPM.yaml\n\n    Improves feature fusion with MFPM from [TGRS2025 ISGLNet](https://ieeexplore.ieee.org/document/11232501).\n\n34. ultralytics/cfg/models/11/yolo11-ERM.yaml\n\n    Improves feature fusion with ERM from [TGRS2025 ISGLNet](https://ieeexplore.ieee.org/document/11232501).\n\n35. ultralytics/cfg/models/11/yolo11-CAFM.yaml\n    \n    Improves the yolo11 neck with CAFM from [TIP2025 DSMT](https://ieeexplore.ieee.org/document/10955125).\n\n### Head Series\n1. ultralytics/cfg/models/11/yolo11-dyhead.yaml\n\n    Adds an attention-based object detection head to yolo11.\n2. ultralytics/cfg/models/11/yolo11-EfficientHead.yaml\n\n    Redesigns the detection head and supports 2 lightweight detection heads. For details, see the Detect_Efficient class in ultralytics/nn/extra_modules/head.py.\n3. ultralytics/cfg/models/11/yolo11-aux.yaml\n\n    Following YOLOV7-Aux, adds an auxiliary training head to YOLO11 that takes part in training and is removed at inference.  \n    The loss weight of the auxiliary head is set by self.aux_loss_ratio in the __init__ function of class v8DetectionLoss in ultralytics/utils/loss.py; the default, following yolov7, is 0.25.\n4. ultralytics/cfg/models/11/yolo11-seg-EfficientHead.yaml (instance segmentation)\n\n    Redesigns the detection head and supports 2 lightweight detection heads. For details, see the Detect_Efficient class in ultralytics/nn/extra_modules/head.py. \n5. 
ultralytics/cfg/models/11/yolo11-SEAMHead.yaml\n\n    Improves the head with the occlusion-aware attention from [YOLO-Face V2](https://arxiv.org/pdf/2208.02019v2.pdf) so that it handles occluded scenes effectively.\n6. ultralytics/cfg/models/11/yolo11-MultiSEAMHead.yaml\n\n    Improves the head with the occlusion-aware attention from [YOLO-Face V2](https://arxiv.org/pdf/2208.02019v2.pdf) so that it handles occluded scenes effectively.\n7. ultralytics/cfg/models/11/yolo11-PGI.yaml\n\n    Improves YOLO11 with the programmable gradient information of [YOLOV9](https://github.com/WongKinYiu/yolov9). (The PGI module can be removed after training.)\n8. Lightweight Asymmetric Detection Head\n\n    detect:ultralytics/cfg/models/11/yolo11-LADH.yaml\n    segment:ultralytics/cfg/models/11/yolo11-seg-LADH.yaml\n    pose:ultralytics/cfg/models/11/yolo11-pose-LADH.yaml\n    obb:ultralytics/cfg/models/11/yolo11-obb-LADH.yaml\n    Improves the yolo11 head with the Lightweight Asymmetric Detection Head from [Faster and Lightweight: An Improved YOLOv5 Object Detector for Remote Sensing Images](https://www.mdpi.com/2072-4292/15/20/4974).\n9. ultralytics/cfg/models/11/yolo11-atthead.yaml\n\n    Example from the Bilibili attention tutorial. Link: https://www.bilibili.com/video/BV1mXkVYAEGM/\n10. Localization Quality Estimation Head\n\n    This module comes from [GFocalV2](https://arxiv.org/abs/2011.12885).\n    detect:ultralytics/cfg/models/11/yolo11-LQEHead.yaml\n    segment:ultralytics/cfg/models/11/yolo11-seg-LQE.yaml\n    pose:ultralytics/cfg/models/11/yolo11-pose-LQE.yaml\n    obb:ultralytics/cfg/models/11/yolo11-obb-LQE.yaml\n\n### Label Assign Series\n1. Adaptive Training Sample Selection matching strategy.\n\n    Select the corresponding self.assigner in class v8DetectionLoss in ultralytics/utils/loss.py.\n\n### PostProcess Series\n1. soft-nms(IoU,GIoU,DIoU,CIoU,EIoU,SIoU,ShapeIoU)\n\n    Replaces nms with soft-nms. (Recommended only when running val.py; for how to make the replacement, see the 20240122 release notes.)\n\n2. ultralytics/cfg/models/11/yolo11-nmsfree.yaml\n\n    Following the idea of yolov10, trains with dual label assignment and a consistent matching metric, so post-processing does not need NMS!\n\n### Upsampling and Downsampling Operators\n1. ultralytics/cfg/models/11/yolo11-ContextGuidedDown.yaml\n\n    Downsamples with the Light-weight Context Guided DownSample from [CGNet](https://github.com/wutianyiRosun/CGNet/tree/master).\n2. 
ultralytics/cfg/models/11/yolo11-SPDConv.yaml\n\n    Downsamples with [SPDConv](https://github.com/LabSAINT/SPD-Conv/tree/main).\n3. ultralytics/cfg/models/11/yolo11-dysample.yaml\n\n    Improves upsampling in the yolo11 neck with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085).\n\n4. ultralytics/cfg/models/11/yolo11-CARAFE.yaml\n\n    Improves upsampling in the yolo11 neck with [ICCV2019 CARAFE](https://arxiv.org/abs/1905.02188).\n\n5. ultralytics/cfg/models/11/yolo11-HWD.yaml\n\n    Improves yolo11 downsampling with [Haar wavelet downsampling](https://www.sciencedirect.com/science/article/abs/pii/S0031320323005174). (Use with AMP disabled.)\n\n6. ultralytics/cfg/models/11/yolo11-v7DS.yaml\n\n    Improves downsampling in YOLO11 with the downsampling structure of [YOLOV7 CVPR2023](https://arxiv.org/abs/2207.02696).\n\n7. ultralytics/cfg/models/11/yolo11-ADown.yaml\n\n    Improves downsampling in YOLO11 with the downsampling structure of [YOLOV9](https://github.com/WongKinYiu/yolov9).\n\n8. ultralytics/cfg/models/11/yolo11-SRFD.yaml\n\n    Improves yolo11 downsampling with [A Robust Feature Downsampling Module for Remote Sensing Visual Tasks](https://ieeexplore.ieee.org/document/10142024).\n\n9. ultralytics/cfg/models/11/yolo11-WaveletPool.yaml\n\n    Improves YOLO11 upsampling and downsampling with [Wavelet Pooling](https://openreview.net/forum?id=rkhlb8lCZ).\n\n10. ultralytics/cfg/models/11/yolo11-LDConv.yaml\n\n    Improves downsampling with [LDConv](https://github.com/CV-ZhangXin/LDConv/tree/main).\n\n11. ultralytics/cfg/models/11/yolo11-PSConv.yaml\n\n    Improves yolo11 with Pinwheel-shaped Convolution from [AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection](https://github.com/JN-Yang/PConv-SDloss-Data).\n\n12. ultralytics/cfg/models/11/yolo11-EUCB.yaml\n\n    Improves yolo11 upsampling with EUCB from [CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD).\n\n13. ultralytics/cfg/models/11/yolo11-LoGStem.yaml\n\n    Improves the stem (the first and second convolution layers) with LoGStem from [LEGNet](https://github.com/lwCVer/LEGNet).\n\n14. ultralytics/cfg/models/11/yolo11-wConv.yaml\n\n    Improves yolo11 with wConv2d from [weightedConvolution2.0](https://github.com/cammarasana123/weightedConvolution2.0).\n\n15. 
ultralytics/cfg/models/11/yolo11-FourierConv.yaml\n\n    Improves Conv with FourierConv from [MIA2025 Fourier Convolution Block with global receptive field for MRI reconstruction](https://www.sciencedirect.com/science/article/abs/pii/S1361841524002743).\n\n16. ultralytics/cfg/models/11/yolo11-Converse2D.yaml\n\n    Improves upsampling in the neck with Converse2D from [ICCV2025 ConverseBNet](https://github.com/cszn/ConverseNet).\n\n17. ultralytics/cfg/models/11/yolo11-GCConv.yaml\n\n    Improves downsampling with GCConv from [CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet).\n\n18. ultralytics/cfg/models/11/yolo11-RepStem.yaml\n\n    Improves yolo11 downsampling with RepStem from [ICCV2023 FastVit](https://arxiv.org/pdf/2303.14189).\n\n19. ultralytics/cfg/models/11/yolo11-FSConv.yaml\n\n    Improves downsampling with FSConv from [TGRS2025 Think Locally and Act Globally](https://ieeexplore.ieee.org/document/11175146).\n\n### YOLO11-C3k2 Series\n1. ultralytics/cfg/models/11/yolo11-C3k2-Faster.yaml\n\n    Replaces C3k2 with C3k2-Faster. (Replaces the Bottleneck in C3k2 with the FasterBlock from FasterNet.)\n2. ultralytics/cfg/models/11/yolo11-C3k2-ODConv.yaml\n\n    Replaces C3k2 with C3k2-ODConv. (Replaces the Conv in the Bottleneck of C3k2 with ODConv.)\n3. ultralytics/cfg/models/11/yolo11-C3k2-ODConv.yaml\n\n    Replaces C3k2 with C3k2-ODConv. (Replaces the Conv in the Bottleneck of C3k2 with ODConv.)\n4. ultralytics/cfg/models/11/yolo11-C3k2-Faster-EMA.yaml\n\n    Replaces C3k2 with C3k2-Faster-EMA. (C3k2-Faster-EMA is recommended for the backbone; the neck and head can use C3k2-Faster.)\n5. ultralytics/cfg/models/11/yolo11-C3k2-DBB.yaml\n\n    Replaces C3k2 with C3k2-DBB. (Replaces the Conv in the Bottleneck of C3k2 with DiverseBranchBlock.)\n6. ultralytics/cfg/models/11/yolo11-C3k2-CloAtt.yaml\n\n    Replaces C3k2 with C3k2-CloAtt. (Adds the attention mechanism from CloFormer, which captures global and local features, to the Bottleneck in C3k2.) (See [item 5 of Common Errors and Solutions](#a).)\n7. ultralytics/cfg/models/11/yolo11-C3k2-SCConv.yaml\n\n    Fuses SCConv (CVPR2020 http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf) with C3k2.\n8. 
ultralytics/cfg/models/11/yolo11-C3k2-SCcConv.yaml\n\n    Fuses ScConv (CVPR2023 https://openaccess.thecvf.com/content/CVPR2023/papers/Li_SCConv_Spatial_and_Channel_Reconstruction_Convolution_for_Feature_Redundancy_CVPR_2023_paper.pdf) with C3k2.  \n    (It is named SCcConv because file names are case-insensitive on Windows.)\n9. ultralytics/cfg/models/11/yolo11-KernelWarehouse.yaml\n    \n    Adds [Towards Parameter-Efficient Dynamic Convolution](https://github.com/OSVAI/KernelWarehouse) to yolo11.  \n    Note that with this module accuracy is very low during epochs 0-20 and returns to normal after epoch 20.\n10. ultralytics/cfg/models/11/yolo11-C3k2-DySnakeConv.yaml\n\n    Fuses [DySnakeConv](https://github.com/YaoleiQi/DSCNet) with C3k2.\n11. ultralytics/cfg/models/11/yolo11-C3k2-DCNV2.yaml\n\n    Replaces C3k2 with C3k2-DCNV2. (DCNV2 is Deformable Convolution V2.)\n12. ultralytics/cfg/models/11/yolo11-C3k2-DCNV3.yaml\n\n    Replaces C3k2 with C3k2-DCNV3. ([DCNV3](https://github.com/OpenGVLab/InternImage) is Deformable Convolution V3 (CVPR2023, SOTA on many leaderboards).)  \n    The official repository includes DCNV3 whl packages for certain versions; after downloading, just run pip install xxx. For details on installing DCNV3, see the video in the Baidu Cloud link.\n13. ultralytics/cfg/models/11/yolo11-C3k2-OREPA.yaml\n\n    Replaces C3k2 with C3k2-OREPA. [Online Convolutional Re-parameterization (CVPR2022)](https://github.com/JUGGHM/OREPA_CVPR2022/tree/main)\n14. ultralytics/cfg/models/11/yolo11-C3k2-REPVGGOREPA.yaml\n\n    Replaces C3k2 with C3k2-REPVGGOREPA. [Online Convolutional Re-parameterization (CVPR2022)](https://github.com/JUGGHM/OREPA_CVPR2022/tree/main)\n15. ultralytics/cfg/models/11/yolo11-C3k2-DCNV4.yaml\n\n    Improves C3k2 with [DCNV4](https://github.com/OpenGVLab/DCNv4). (Train with AMP disabled; for a usage tutorial, see the 20240116 release notes.)\n16. ultralytics/cfg/models/11/yolo11-C3k2-ContextGuided.yaml\n\n    Improves C3k2 with Light-weight Context Guided from [CGNet](https://github.com/wutianyiRosun/CGNet/tree/master).\n17. ultralytics/cfg/models/11/yolo11-C3k2-MSBlock.yaml\n\n    Improves C3k2 with MSBlock from [YOLO-MS](https://github.com/FishAndWasabi/YOLO-MS/tree/main).\n18. ultralytics/cfg/models/11/yolo11-C3k2-DLKA.yaml\n\n    Improves C3k2 with [deformableLKA](https://github.com/xmindflow/deformableLKA).\n19. 
ultralytics/cfg/models/11/yolo11-C3k2-DAttention.yaml\n\n    Improves C3k2 with [Vision Transformer with Deformable Attention(CVPR2022)](https://github.com/LeapLabTHU/DAT). (See [item 5 of Common Errors and Solutions](#a).)  \n    For usage notes, see the Baidu Cloud video. (DAttention (Vision Transformer with Deformable Attention CVPR2022) usage notes.)\n20. Improves C3k2 with ParC_Operator from [ParC-Net](https://github.com/hkzhang-git/ParC-Net/tree/main). (See [item 5 of Common Errors and Solutions](#a).)  \n    For usage notes, see the Baidu Cloud video. (20231031 release notes)    \n21. ultralytics/cfg/models/11/yolo11-C3k2-DWR.yaml\n\n    Uses the Dilation-wise Residual (DWR) module from [DWRSeg](https://arxiv.org/abs/2212.01173) to strengthen feature extraction from the expandable receptive field in the high layers of the network.\n22. ultralytics/cfg/models/11/yolo11-C3k2-RFAConv.yaml\n\n    Improves yolo11 with RFAConv from [RFAConv](https://github.com/Liuchen1997/RFAConv/tree/main).\n\n23. ultralytics/cfg/models/11/yolo11-C3k2-RFCBAMConv.yaml\n\n    Improves yolo11 with RFCBAMConv from [RFAConv](https://github.com/Liuchen1997/RFAConv/tree/main).\n\n24. ultralytics/cfg/models/11/yolo11-C3k2-RFCAConv.yaml\n\n    Improves yolo11 with RFCAConv from [RFAConv](https://github.com/Liuchen1997/RFAConv/tree/main).\n25. ultralytics/cfg/models/11/yolo11-C3k2-FocusedLinearAttention.yaml\n\n    Improves C3k2 with FocusedLinearAttention from [FLatten Transformer(ICCV2023)](https://github.com/LeapLabTHU/FLatten-Transformer). (See [item 5 of Common Errors and Solutions](#a).)    \n    For usage notes, see the Baidu Cloud video. (20231114 release notes.)\n26. ultralytics/cfg/models/11/yolo11-C3k2-MLCA.yaml\n\n    Improves C3k2 with [Mixed Local Channel Attention 2023](https://github.com/wandahangFY/MLCA/tree/master). (For usage, see the Baidu Cloud video: 20231129 release notes.)\n\n27. ultralytics/cfg/models/11/yolo11-C3k2-AKConv.yaml\n\n    Improves C3k2 with [AKConv 2023](https://github.com/CV-ZhangXin/AKConv). (For usage, see the Baidu Cloud video: 20231129 release notes.)\n28. ultralytics/cfg/models/11/yolo11-C3k2-UniRepLKNetBlock.yaml\n\n    Improves C3k2 with UniRepLKNetBlock from [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main).\n29. ultralytics/cfg/models/11/yolo11-C3k2-DRB.yaml\n\n    Improves C3k2 with DilatedReparamBlock from [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main).\n30. 
ultralytics/cfg/models/11/yolo11-C3k2-AggregatedAtt.yaml\n\n    Improves C3k2 with the aggregated attention from [TransNeXt](https://github.com/DaiShiResearch/TransNeXt). (See [item 5 of Common Errors and Solutions](#a).)   \n\n31. ultralytics/cfg/models/11/yolo11-C3k2-SWC.yaml\n\n    Improves C3k2 in yolo11 with [shift-wise conv](https://arxiv.org/abs/2401.12736).\n\n32. ultralytics/cfg/models/11/yolo11-C3k2-iRMB.yaml\n\n    Improves C3k2 with iRMB from [EMO ICCV2023](https://github.com/zhangzjn/EMO).\n\n33. ultralytics/cfg/models/11/yolo11-C3k2-VSS.yaml\n\n    Improves the BottleNeck in C3k2 with [VSS from Mamba-UNet](https://github.com/ziyangwang007/Mamba-UNet), a Mamba-based architecture, so that it captures complex details and broader semantic context in the image more effectively.\n\n34. ultralytics/cfg/models/11/yolo11-C3k2-LVMB.yaml\n\n    Combines [VSS from Mamba-UNet](https://github.com/ziyangwang007/Mamba-UNet), a Mamba-based architecture, with Cross Stage Partial so that it captures complex details and broader semantic context in the image more effectively.\n\n35. ultralytics/cfg/models/11/yolo11-RepNCSPELAN.yaml\n\n    Improves yolo11 with RepNCSPELAN from [YOLOV9](https://github.com/WongKinYiu/yolov9).\n\n36. ultralytics/cfg/models/11/yolo11-C3k2-DynamicConv.yaml\n\n    Improves C3k2 with DynamicConv from [CVPR2024 parameternet](https://arxiv.org/pdf/2306.14525v2.pdf).\n\n37. ultralytics/cfg/models/11/yolo11-C3k2-GhostDynamicConv.yaml\n\n    Improves C3k2 with GhostModule from [CVPR2024 parameternet](https://arxiv.org/pdf/2306.14525v2.pdf).\n\n38. ultralytics/cfg/models/11/yolo11-C3k2-RVB.yaml\n\n    Improves C3k2 with RepViTBlock from [CVPR2024 RepViT](https://github.com/THU-MIG/RepViT/tree/main).\n\n39. ultralytics/cfg/models/11/yolo11-DGCST.yaml\n\n    Improves yolo11 with the Dynamic Group Convolution Shuffle Transformer from [Lightweight Object Detection](https://arxiv.org/abs/2403.01736).\n\n40. ultralytics/cfg/models/11/yolo11-C3k2-RetBlock.yaml\n\n    Improves C3k2 with RetBlock from [CVPR2024 RMT](https://arxiv.org/abs/2309.11523).\n\n41. ultralytics/cfg/models/11/yolo11-C3k2-PKI.yaml\n\n    Improves C3k2 with PKIModule and the CAA module from [CVPR2024 PKINet](https://github.com/PKINet/PKINet).\n\n42. ultralytics/cfg/models/11/yolo11-RepNCSPELAN_CAA.yaml\n\n    Improves RepNCSPELAN with the CAA module from [CVPR2024 PKINet](https://github.com/PKINet/PKINet).\n\n43. 
ultralytics/cfg/models/11/yolo11-C3k2-fadc.yaml\n\n    Improves C3k2 with [CVPR2024 Frequency-Adaptive Dilated Convolution](https://github.com/Linwei-Chen/FADC).\n\n44. ultralytics/cfg/models/11/yolo11-C3k2-PPA.yaml\n\n    Improves C3k2 with the Parallelized Patch-Aware Attention Module from [HCFNet](https://github.com/zhengshuchen/HCFNet).\n\n45. ultralytics/cfg/models/11/yolo11-C3k2-Star.yaml\n\n    Improves C3k2 with StarBlock from [StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main).\n\n46. ultralytics/cfg/models/11/yolo11-C3k2-KAN.yaml\n\n    KAN In! Mamba Out! Kolmogorov-Arnold Networks.\n    Currently supported:\n    1. FastKANConv2DLayer\n    2. KANConv2DLayer\n    3. KALNConv2DLayer\n    4. KACNConv2DLayer\n    5. KAGNConv2DLayer\n\n47. ultralytics/cfg/models/11/yolo11-C3k2-DEConv.yaml\n\n    Improves C3k2 with detail-enhanced convolution from [DEA-Net](https://github.com/cecret3350/DEA-Net).\n\n48. ultralytics/cfg/models/11/yolo11-C3k2-Heat.yaml\n\n    Improves C3k2 with HeatBlock from [vHeat](https://github.com/MzeroMiko/vHeat/tree/main).\n\n49. ultralytics/cfg/models/11/yolo11-C3k2-WTConv.yaml\n\n    Improves the C3k2 BottleNeck with WTConv from [ECCV2024 Wavelet Convolutions for Large Receptive Fields](https://github.com/BGU-CS-VIL/WTConv).\n\n50. ultralytics/cfg/models/11/yolo11-C3k2-FMB.yaml\n\n    Improves C3k2 with the Feature Modulation block from [ECCV2024 SMFANet](https://github.com/Zheng-MJ/SMFANet/tree/main).\n\n51. ultralytics/cfg/models/11/yolo11-C3k2-gConv.yaml\n\n    Improves C3k2 with gConvblock from [Rethinking Performance Gains in Image Dehazing Networks](https://arxiv.org/abs/2209.11448).\n\n52. ultralytics/cfg/models/11/yolo11-C3k2-WDBB.yaml\n\n    Improves C3k2 with WDBB from [YOLO-MIF](https://github.com/wandahangFY/YOLO-MIF).\n\n53. ultralytics/cfg/models/11/yolo11-C3k2-DeepDBB.yaml\n\n    Improves C3k2 with DeepDBB from [YOLO-MIF](https://github.com/wandahangFY/YOLO-MIF).\n\n54. ultralytics/cfg/models/11/yolo11-C3k2-AdditiveBlock.yaml\n\n    Improves C3k2 with AdditiveBlock from [CAS-ViT](https://github.com/Tianfang-Zhang/CAS-ViT).\n\n55. 
ultralytics/cfg/models/11/yolo11-C3k2-MogaBlock.yaml\n\n    Improves C3k2 with MogaBlock from [MogaNet ICLR2024](https://github.com/Westlake-AI/MogaNet).\n\n56. ultralytics/cfg/models/11/yolo11-C3k2-IdentityFormer.yaml\n\n    Improves C3k2 with IdentityFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer).\n\n57. ultralytics/cfg/models/11/yolo11-C3k2-RandomMixing.yaml\n\n    Improves C3k2 with RandomMixingFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer). (See [item 5 of Common Errors and Solutions](#a).)\n\n58. ultralytics/cfg/models/11/yolo11-C3k2-PoolingFormer.yaml\n\n    Improves C3k2 with PoolingFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer).\n\n59. ultralytics/cfg/models/11/yolo11-C3k2-ConvFormer.yaml\n\n    Improves C3k2 with ConvFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer).\n\n60. ultralytics/cfg/models/11/yolo11-C3k2-CaFormer.yaml\n\n    Improves C3k2 with CaFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer).\n\n61. ultralytics/cfg/models/11/yolo11-C3k2-FFCM.yaml\n\n    Improves C3k2 with Fused_Fourier_Conv_Mixer from [Efficient Frequency-Domain Image Deraining with Contrastive Regularization ECCV2024](https://github.com/deng-ai-lab/FADformer).\n\n62. ultralytics/cfg/models/11/yolo11-C3k2-SFHF.yaml\n\n    Improves C3k2 with the block from [SFHformer ECCV2024](https://github.com/deng-ai-lab/SFHformer).\n\n63. ultralytics/cfg/models/11/yolo11-C3k2-MSM.yaml\n\n    Improves C3k2 with MSM from [Revitalizing Convolutional Network for Image Restoration TPAMI2024](https://zhuanlan.zhihu.com/p/720777160).\n\n64. ultralytics/cfg/models/11/yolo11-C3k2-HDRAB.yaml\n\n    Improves C3k2 with HDRAB (hybrid dilated residual attention block) from [Pattern Recognition 2024|DRANet](https://github.com/WenCongWu/DRANet).\n\n65. ultralytics/cfg/models/11/yolo11-C3k2-RAB.yaml\n\n    Improves C3k2 with RAB (residual attention block) from [Pattern Recognition 2024|DRANet](https://github.com/WenCongWu/DRANet).\n\n66. 
ultralytics/cfg/models/11/yolo11-C3k2-LFE.yaml\n\n    Improves C3k2 with Local feature extraction from [Efficient Long-Range Attention Network for Image Super-resolution ECCV2022](https://github.com/xindongzhang/ELAN).\n\n67. ultralytics/cfg/models/11/yolo11-C3k2-SFA.yaml\n\n    Improves C3k2 with Frequency-aware Cascade Attention-SFA from [FreqFormer](https://github.com/JPWang-CS/FreqFormer).\n\n68. ultralytics/cfg/models/11/yolo11-C3k2-CTA.yaml\n\n    Improves C3k2 with Frequency-aware Cascade Attention-CTA from [FreqFormer](https://github.com/JPWang-CS/FreqFormer).\n\n69. ultralytics/cfg/models/11/yolo11-C3k2-IDWC.yaml\n\n    Improves C3k2 with InceptionDWConv2d from [InceptionNeXt CVPR2024](https://github.com/sail-sg/inceptionnext).\n\n70. ultralytics/cfg/models/11/yolo11-C3k2-IDWD.yaml\n\n    Improves C3k2 with InceptionDWBlock from [InceptionNeXt CVPR2024](https://github.com/sail-sg/inceptionnext).\n\n71. ultralytics/cfg/models/11/yolo11-C3k2-PConv.yaml\n\n    Improves C3k2 with PConv from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet).\n\n72. ultralytics/cfg/models/11/yolo11-C3k2-EMA.yaml\n\n    Example from the Bilibili attention tutorial. Link: https://www.bilibili.com/video/BV1mXkVYAEGM/\n\n73. ultralytics/cfg/models/11/yolo11-C3k2-CAMixer.yaml\n\n    Improves C3k2 with CAMixer from [CAMixerSR CVPR2024](https://github.com/icandle/CAMixerSR).\n\n74. ultralytics/cfg/models/11/yolo11-MAN.yaml\n\n    Improves yolo11 with the Mixed Aggregation Network from [Hyper-YOLO TPAMI2025](https://www.arxiv.org/pdf/2408.04804).\n\n75. ultralytics/cfg/models/11/yolo11-C3k2-HFERB.yaml\n\n    Improves C3k2 with the high-frequency enhancement residual block from [ICCV2023 CRAFT-SR](https://github.com/AVC2-UESTC/CRAFT-SR).\n\n76. ultralytics/cfg/models/11/yolo11-C3k2-DTAB.yaml\n\n    Improves C3k2 with DTAB from [AAAI2025 TBSN](https://github.com/nagejacob/TBSN).\n\n77. ultralytics/cfg/models/11/yolo11-C3k2-JDPM.yaml\n\n    Improves C3k2 with the joint domain perception module from [ECCV2024 FSEL](https://github.com/CSYSI/FSEL).\n\n78. 
ultralytics/cfg/models/11/yolo11-C3k2-ETB.yaml\n\n    Improves C3k2 with the entanglement transformer block from [ECCV2024 FSEL](https://github.com/CSYSI/FSEL).\n\n79. ultralytics/cfg/models/11/yolo11-C3k2-FDT.yaml\n\n    Improves C3k2 with the Full-domain Transformer from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN).\n\n80. ultralytics/cfg/models/11/yolo11-C3k2-AP.yaml\n\n    Improves yolo11 with the Asymmetric Padding bottleneck from [AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection](https://github.com/JN-Yang/PConv-SDloss-Data).\n\n81. ultralytics/cfg/models/11/yolo11-C3k2-Kat.yaml\n\n    Improves C3k2 with KAT from [ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat).\n\n82. ultralytics/cfg/models/11/yolo11-C3k2-ELGCA.yaml\n\n    Improves C3k2 with ELGCA from [ELGC-Net](https://github.com/techmn/elgcnet).\n\n83. ultralytics/cfg/models/11/yolo11-C3k2-Strip.yaml\n\n    Improves C3k2 with StripBlock from [Strip R-CNN](https://arxiv.org/pdf/2501.03775).\n\n84. ultralytics/cfg/models/11/yolo11-C3k2-GlobalFilter.yaml\n\n    Improves C3k2 with GlobalFilterBlock from [T-PAMI Global Filter Networks for Image Classification](https://github.com/raoyongming/GFNet) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n85. ultralytics/cfg/models/11/yolo11-C3k2-DynamicFilter.yaml\n\n    Improves C3k2 with DynamicFilter from [AAAI2024 FFT-Based Dynamic Token Mixer for Vision](https://github.com/okojoalg/dfformer).\n\n86. ultralytics/cfg/models/11/yolo11-C3k2-TSSA.yaml\n     \n    Improves C3k2 with Token Statistics Self-Attention from [Token Statistics Transformer](https://github.com/RobinWu218/ToST) and the metaformer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer).\n\n87. ultralytics/cfg/models/11/yolo11-RepHMS.yaml\n\n    Improves yolo11 with RepHMS from [MHAF-YOLO](https://github.com/yang-0201/MHAF-YOLO).\n\n88. ultralytics/cfg/models/11/yolo11-C3k2-SAVSS.yaml\n\n    Improves C3k2 with the Structure-Aware Scanning Strategy from [CVPR2025 SCSegamba](https://github.com/Karl1109/SCSegamba).\n\n89. 
ultralytics/cfg/models/11/yolo11-C3k2-MobileMamba.yaml\n\n    Uses MobileMambaBlock from [CVPR2025 MobileMamba](https://github.com/lewandofskee/MobileMamba) to improve C3k2.\n\n90. ultralytics/cfg/models/11/yolo11-C3k2-MambaOut.yaml\n\n    Uses MambaOutBlock from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut) to improve C3k2.\n\n91. ultralytics/cfg/models/11/yolo11-C3k2-EfficientVIM.yaml\n\n    Uses EfficientViMBlock from [CVPR2025 EfficientViM](https://github.com/mlvlab/EfficientViM) to improve C3k2.\n\n92. ultralytics/cfg/models/11/yolo11-C3k2-RCB.yaml\n\n    Uses RepConvBlock from [CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087) to improve C3k2.\n\n93. ultralytics/cfg/models/11/yolo11-C3k2-LEGM.yaml\n\n    Uses LEGM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet) to improve C3k2.\n\n94. ultralytics/cfg/models/11/yolo11-C3k2-FAT.yaml\n\n    Uses FATBlock from [ICLR2024-FTIC](https://github.com/qingshi9974/ICLR2024-FTIC) to improve C3k2.\n\n95. ultralytics/cfg/models/11/yolo11-C3k2-LFEM.yaml\n\n    Uses LFEModule from [LEGNet](https://github.com/lwCVer/LEGNet) to improve C3k2.\n\n96. ultralytics/cfg/models/11/yolo11-C3k2-SBSM.yaml\n\n    Uses Snake Bi-Directional Sequence Modelling (SBSM) from [WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net) to improve C3k2.\n\n97. ultralytics/cfg/models/11/yolo11-C3k2-LSBlock.yaml\n\n    Uses LSBlock from [CVPR2025 LSNet](https://github.com/THU-MIG/lsnet) to improve C3k2.\n\n98. ultralytics/cfg/models/11/yolo11-C3k2-TransMamba.yaml\n\n    Uses TransMamba from [TransMamba](https://github.com/sunshangquan/TransMamba) to improve C3k2.\n\n99. ultralytics/cfg/models/11/yolo11-C3k2-EVS.yaml\n\n    Uses EVS from [CVPR2025 EVSSM](https://github.com/kkkls/EVSSM) to improve C3k2.\n\n100. ultralytics/cfg/models/11/yolo11-C3k2-EBlock.yaml\n\n    Uses EBlock from [CVPR2025 DarkIR](https://github.com/cidautai/DarkIR) to improve C3k2.\n\n101. ultralytics/cfg/models/11/yolo11-C3k2-DBlock.yaml\n\n    Uses DBlock from [CVPR2025 DarkIR](https://github.com/cidautai/DarkIR) to improve C3k2.\n\n102. 
ultralytics/cfg/models/11/yolo11-C3k2-FDConv.yaml\n\n    Uses FDConv from [CVPR2025 Frequency Dynamic Convolution for Dense Image Prediction](https://github.com/Linwei-Chen/FDConv) to improve C3k2.\n\n103. ultralytics/cfg/models/11/yolo11-C3k2-DSAN.yaml\n\n    Uses the Deformable Spatial Attention Block from [DSA: Deformable Spatial Attention](https://www.techrxiv.org/users/628671/articles/775010-deformable-spatial-attention-networks-enhancing-lightweight-convolutional-models-for-vision-tasks) to improve C3k2.\n\n104. ultralytics/cfg/models/11/yolo11-C3k2-DSA.yaml\n\n    Uses Deformable Spatial Attention from [DSA: Deformable Spatial Attention](https://www.techrxiv.org/users/628671/articles/775010-deformable-spatial-attention-networks-enhancing-lightweight-convolutional-models-for-vision-tasks) to improve C3k2.\n\n105. ultralytics/cfg/models/11/yolo11-C3k2-RMB.yaml\n\n    Uses the Residual Mamba Block from [CVPR2025 MaIR](https://github.com/XLearning-SCU/2025-CVPR-MaIR) to improve C3k2.\n\n106. ultralytics/cfg/models/11/yolo11-C3k2-SFSConv.yaml\n\n    Uses SFSConv from [CVPR2024 SFSConv](https://github.com/like413/SFS-Conv) to improve C3k2.\n\n107. ultralytics/cfg/models/11/yolo11-C3k2-GroupMamba.yaml\n\n    Uses GroupMambaLayer from [CVPR2025 GroupMamba](https://github.com/Amshaker/GroupMamba) to improve C3k2.\n\n108. ultralytics/cfg/models/11/yolo11-C3k2-GroupMambaBlock.yaml\n\n    Uses GroupMambaBlock from [CVPR2025 GroupMamba](https://github.com/Amshaker/GroupMamba) to improve C3k2.\n\n109. ultralytics/cfg/models/11/yolo11-C3k2-MambaVision.yaml\n\n    Uses MambaVision from [CVPR2025 MambaVision](https://github.com/NVlabs/MambaVision) to improve C3k2.\n\n110. ultralytics/cfg/models/11/yolo11-FCM.yaml\n\n    Uses modules from [AAAI2025 FBRT-YOLO](https://github.com/galaxy-oss/FCM) to improve yolo11.\n\n111. ultralytics/cfg/models/12/yolo12-FCM.yaml\n\n    Uses modules from [AAAI2025 FBRT-YOLO](https://github.com/galaxy-oss/FCM) to improve yolo12.\n\n112. ultralytics/cfg/models/11/yolo11-C3k2-wConv.yaml\n\n    Uses wConv2d from [weightedConvolution2.0](https://github.com/cammarasana123/weightedConvolution2.0) to improve C3k2.\n\n113. 
ultralytics/cfg/models/11/yolo11-C3k2-FourierConv.yaml\n\n    Uses FourierConv from [MIA2025 Fourier Convolution Block with global receptive field for MRI reconstruction](https://www.sciencedirect.com/science/article/abs/pii/S1361841524002743) to improve C3k2.\n\n114. ultralytics/cfg/models/11/yolo11-C3k2-GLVSS.yaml\n\n    Uses GLVSS from [TGRS2025 UMFormer](https://github.com/takeyoutime/UMFormer) to improve C3k2.\n\n115. ultralytics/cfg/models/11/yolo11-C3k2-ESC.yaml\n\n    Uses ESC from [ICCV2025 ESC: Emulating Self-attention with Convolution for Efficient Image Super-Resolution](https://github.com/dslisleedh/ESC) to improve C3k2.\n\n116. ultralytics/cfg/models/11/yolo11-C3k2-MBRConv3.yaml\n\n    Uses MBRConv3 from [ICCV2025 MobileIE](https://github.com/AVC2-UESTC/MobileIE) to improve C3k2.\n\n117. ultralytics/cfg/models/11/yolo11-C3k2-MBRConv5.yaml\n\n    Uses MBRConv5 from [ICCV2025 MobileIE](https://github.com/AVC2-UESTC/MobileIE) to improve C3k2.\n\n118. ultralytics/cfg/models/11/yolo11-C3k2-VSSD.yaml\n\n    Uses VSSD from [ICCV2025 VSSD](https://github.com/YuHengsss/VSSD) to improve C3k2.\n\n119. ultralytics/cfg/models/11/yolo11-C3k2-TinyVIM.yaml\n\n    Uses TinyVIMBlock from [ICCV2025 TinyVIM](https://arxiv.org/abs/2411.17473) to improve C3k2.\n\n120. ultralytics/cfg/models/11/yolo11-C3k2-CSI.yaml\n\n    Uses CSI from [INFFUS2025 SAMamba](https://arxiv.org/pdf/2505.23214) to improve C3k2.\n\n121. ultralytics/cfg/models/11/yolo11-C3k2-ConvAttn.yaml\n\n    Uses ConvAttn from [ICCV2025 ESC: Emulating Self-attention with Convolution for Efficient Image Super-Resolution](https://github.com/dslisleedh/ESC) to improve C3k2.\n\n122. ultralytics/cfg/models/11/yolo11-C3k2-UniConv.yaml\n\n    Uses UniConvBlock from [ICCV2025 UniConvBlock](https://github.com/ai-paperwithcode/UniConvNet) to improve C3k2.\n\n123. ultralytics/cfg/models/11/yolo11-C3k2-LGLB.yaml\n\n    Uses LGLBBlock from [ACM MM 2025 Mobile U-ViT](https://github.com/FengheTan9/Mobile-U-ViT) to improve C3k2.\n\n124. ultralytics/cfg/models/11/yolo11-C3k2-ConverseB.yaml\n\n    Uses ConverseBlock from [ICCV2025 ConverseBNet](https://github.com/cszn/ConverseNet) to improve C3k2.\n\n125. 
ultralytics/cfg/models/11/yolo11-C3k2-Converse.yaml\n\n    Uses Converse2D from [ICCV2025 ConverseBNet](https://github.com/cszn/ConverseNet) to improve C3k2.\n\n126. ultralytics/cfg/models/11/yolo11-C3k2-GCConv.yaml\n\n    Uses GCConv from [CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet) to improve C3k2.\n\n127. ultralytics/cfg/models/11/yolo11-C3k2-CFBlock.yaml\n\n    Uses CFBlock from [AAAI2024 SCTNet](https://arxiv.org/pdf/2312.17071) to improve C3k2.\n\n128. ultralytics/cfg/models/11/yolo11-C3k2-FMABlock.yaml\n\n    Uses FMABlock from [IJCV2024 SRConvNet](https://github.com/lifengcs/SRConvNet) to improve C3k2.\n\n129. ultralytics/cfg/models/11/yolo11-C3k2-LWGA.yaml\n\n    Uses LWGABlock from [LWGANet](https://github.com/lwCVer/LWGANet) to improve C3k2.\n\n130. ultralytics/cfg/models/11/yolo11-C3k2-CSSC.yaml\n\n    Uses CSSC from [TGRS2025 ASCNet](https://ieeexplore.ieee.org/document/10855453) to improve C3k2.\n\n131. ultralytics/cfg/models/11/yolo11-C3k2-CNCM.yaml\n\n    Uses CNCM from [TGRS2025 ASCNet](https://ieeexplore.ieee.org/document/10855453) to improve C3k2.\n\n132. ultralytics/cfg/models/11/yolo11-C3k2-HFRB.yaml\n\n    Uses HFRB from [ICCV2025 HFRB](https://arxiv.org/pdf/2507.10689) to improve C3k2.\n\n133. ultralytics/cfg/models/11/yolo11-C3k2-EVA.yaml\n\n    Uses EVA from [ICIP2025 BEVANET](https://arxiv.org/pdf/2508.07300) to improve C3k2.\n\n134. ultralytics/cfg/models/11/yolo11-C3k2-RMBC.yaml\n\n    Uses RepMBConv from [PlainUSR](https://arxiv.org/pdf/2409.13435) to improve C3k2.\n\n135. ultralytics/cfg/models/11/yolo11-C3k2-RMBC-LA.yaml\n\n    Uses RepMBConv and Local Importance-based Attention from [PlainUSR](https://arxiv.org/pdf/2409.13435) to improve C3k2.\n\n136. ultralytics/cfg/models/11/yolo11-C3k2-IEL.yaml\n\n    Uses IEL from [CVPR2025 HVI](https://arxiv.org/pdf/2502.20272) to improve C3k2.\n\n137. ultralytics/cfg/models/11/yolo11-C3k2-SFMB.yaml\n\n    Uses SFMB from [TIP2025 SFMB](https://arxiv.org/pdf/2511.06593v1) to improve C3k2.\n\n138. ultralytics/cfg/models/11/yolo11-C3k2-MFEB.yaml\n\n    Uses MFEB from [MICCAI2023 SHISRCNet](https://arxiv.org/abs/2306.14119) to improve C3k2.\n\n139. 
ultralytics/cfg/models/11/yolo11-C3k2-PartialNetBlock.yaml\n\n    Uses PartialNetBlock from [AAAI2026 Partial Channel Network](https://arxiv.org/pdf/2502.01303) to improve C3k2.\n\n140. ultralytics/cfg/models/11/yolo11-C3k2-DRG.yaml\n\n    Uses DRG from [TGRS2025 DRPCA-Net](https://arxiv.org/pdf/2507.09541) to improve C3k2.\n\n141. ultralytics/cfg/models/11/yolo11-C3k2-GLGM.yaml\n\n    Uses GLGM from [TGRS2025 ISGLNet](https://ieeexplore.ieee.org/document/11232501) to improve C3k2.\n\n142. ultralytics/cfg/models/11/yolo11-C3k2-MAC.yaml\n\n    Uses MAC from [TGRS2025 HDNet](https://ieeexplore.ieee.org/document/11232501) to improve C3k2.\n\n143. ultralytics/cfg/models/11/yolo11-C3k2-SPJFB.yaml\n\n    Uses SPJFBlock from [AAAI2026 SPJFNet](https://arxiv.org/pdf/2508.04041) to improve C3k2.\n\n144. ultralytics/cfg/models/11/yolo11-C3k2-GLSS2D.yaml\n\n    Uses GLSS2D from [TGRS2025 GLVMamba](https://ieeexplore.ieee.org/document/11014226) to improve C3k2.\n\n145. ultralytics/cfg/models/11/yolo11-C3k2-DEGConv.yaml\n\n    Uses DEGConv from [CVPR2026 MixerCSeg](https://arxiv.org/pdf/2603.01361) to improve C3k2.\n\n146. ultralytics/cfg/models/11/yolo11-C3k2-TransMixer.yaml\n\n    Uses TransMixer from [CVPR2026 TransMixer](https://arxiv.org/pdf/2603.01361) to improve C3k2.\n\n### C2PSA series\n\n1. ultralytics/cfg/models/11/yolo11-C2BRA.yaml\n\n    Uses Bi-Level Routing Attention from [BIFormer CVPR2023](https://github.com/rayleizhu/BiFormer) to improve C2PSA.\n\n2. ultralytics/cfg/models/11/yolo11-C2CGA.yaml\n\n    Uses CascadedGroupAttention from [EfficientViT CVPR2023](https://github.com/microsoft/Cream/tree/main/EfficientViT) to improve C2PSA.\n\n3. ultralytics/cfg/models/11/yolo11-C2DA.yaml\n\n    Uses DAttention from [Vision Transformer with Deformable Attention(CVPR2022)](https://github.com/LeapLabTHU/DAT) to improve C2PSA.\n\n4. ultralytics/cfg/models/11/yolo11-C2DPB.yaml\n\n    Uses DynamicPosBias-Attention from [CrossFormer](https://arxiv.org/pdf/2108.00154) to improve C2PSA.\n\n5. ultralytics/cfg/models/11/yolo11-DTAB.yaml\n\n    Uses DTAB from [AAAI2025 TBSN](https://github.com/nagejacob/TBSN) to replace C2PSA.\n\n6. 
ultralytics/cfg/models/11/yolo11-ETB.yaml\n\n    Uses the entanglement transformer block from [ECCV2024 FSEL](https://github.com/CSYSI/FSEL) to replace C2PSA.\n\n7. ultralytics/cfg/models/11/yolo11-FDT.yaml\n\n    Uses the Full-domain Transformer from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN) to replace C2PSA.\n\n8. ultralytics/cfg/models/11/yolo11-C2Pola.yaml\n\n    Uses PolaAttention from [ICLR2025 PolaFormer](https://github.com/ZacharyMeng/PolaFormer) to improve C2PSA.\n\n9. ultralytics/cfg/models/11/yolo11-C2TSSA.yaml\n\n    Uses Token Statistics Self-Attention from [Token Statistics Transformer](https://github.com/RobinWu218/ToST) to improve C2PSA.\n\n10. ultralytics/cfg/models/11/yolo11-C2ASSA.yaml\n\n    Uses Adaptive Sparse Self-Attention from [CVPR2024 Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Adapt_or_Perish_Adaptive_Sparse_Transformer_with_Attentive_Feature_Refinement_CVPR_2024_paper.pdf) to improve C2PSA.\n\n11. ultralytics/cfg/models/11/yolo11-ASSR.yaml\n\n    Uses the Attentive State Space Group from [CVPR2025 MambaIR](https://github.com/csguoh/MambaIR) to improve yolo11.\n\n12. ultralytics/cfg/models/11/yolo11-C2PSA-DYT.yaml\n\n    Uses DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT) to improve C2PSA.\n\n13. ultralytics/cfg/models/11/yolo11-C2PSA-FMFFN.yaml\n\n    Uses FMFFN from [ICLR2024-FTIC](https://github.com/qingshi9974/ICLR2024-FTIC) to improve C2PSA.\n\n14. ultralytics/cfg/models/11/yolo11-C2PSA-CGLU.yaml\n\n    Uses Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve C2PSA.\n\n15. ultralytics/cfg/models/11/yolo11-C2PSA-SEFN.yaml\n\n    Uses the Spatially-Enhanced Feedforward Network (SEFN) from [WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net) to improve C2PSA.\n\n16. ultralytics/cfg/models/11/yolo11-C2PSA-Mona.yaml\n\n    Uses Mona from [CVPR2025 Mona](https://github.com/Leiyi-Hu/mona) to improve C2PSA.\n\n17. 
ultralytics/cfg/models/11/yolo11-C2PSA-SEFFN.yaml\n\n    Uses SpectralEnhancedFFN from [TransMamba](https://github.com/sunshangquan/TransMamba) to improve C2PSA.\n\n18. ultralytics/cfg/models/11/yolo11-C2PSA-EDFFN.yaml\n\n    Uses EDFFN from [CVPR2025 EVSSM](https://github.com/kkkls/EVSSM) to improve C2PSA.\n\n19. ultralytics/cfg/models/11/yolo11-C2MSLA.yaml\n\n    Uses [MSLA](https://arxiv.org/pdf/2505.18823) to improve C2PSA.\n\n20. ultralytics/cfg/models/11/yolo11-C2PSA-EPGO.yaml\n\n    Uses EPGO from [ACM MM 2025 CPRAformer](https://github.com/zs1314/CPRAformer) to improve the self-attention in C2PSA.\n\n21. ultralytics/cfg/models/11/yolo11-C2PSA-DML.yaml\n\n    Uses DMI from [IJCV2024 SRConvNet](https://github.com/lifengcs/SRConvNet) to improve C2PSA.\n\n22. ultralytics/cfg/models/11/yolo11-C2PSA-LRSA.yaml\n\n    Uses LRSA from [TPAMI2025 LRFormer](https://mmcheng.net/wp-content/uploads/2025/06/25PAMI_LRFormer.pdf) to improve C2PSA.\n\n23. ultralytics/cfg/models/11/yolo11-C2PSA-MALA.yaml\n\n    Uses MALA from [ICCV2025 Rectifying Magnitude Neglect in Linear Attention](https://arxiv.org/pdf/2507.00698) to improve C2PSA.\n\n24. ultralytics/cfg/models/11/yolo11-C2PSA-SWSA.yaml\n\n    Uses SWSA from [ACMMM2025 FlickCD](https://dl.acm.org/doi/epdf/10.1145/3746027.3755657) to improve C2PSA.\n\n25. ultralytics/cfg/models/11/yolo11-C2PSA-EGSA.yaml\n\n    Uses EGSA from [ACMMM2025 FlickCD](https://dl.acm.org/doi/epdf/10.1145/3746027.3755657) to improve C2PSA.\n\n26. ultralytics/cfg/models/11/yolo11-C2DWMMSA.yaml\n\n    Uses DWMMSA from [TGRS2025 USTNet](https://ieeexplore.ieee.org/document/11146454) to improve C2PSA.\n\n27. ultralytics/cfg/models/11/yolo11-C2BinaryAttn.yaml\n\n    Uses BinaryAttention from [CVPR2026 BinaryAttention](https://arxiv.org/pdf/2303.08810) to improve C2PSA.\n\n28. 
ultralytics/cfg/models/11/yolo11-C2WCA.yaml\n\n    Uses WCA from [CVPR2025 Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection](https://openaccess.thecvf.com/content/CVPR2025/papers/Yan_Wavelet_and_Prototype_Augmented_Query-based_Transformer_for_Pixel-level_Surface_Defect_CVPR_2025_paper.pdf) to improve C2PSA.\n\n### A2C2f series\n1. ultralytics/cfg/models/12/yolo12-A2C2f-CGLU.yaml\n\n    Uses Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve A2C2f.\n\n2. ultralytics/cfg/models/12/yolo12-A2C2f-KAN.yaml\n\n    Uses KAN from [ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat) to improve A2C2f.\n\n3. ultralytics/cfg/models/12/yolo12-A2C2f-DFFN.yaml\n\n    Uses DFFN from [FreqFormer](https://github.com/JPWang-CS/FreqFormer) to improve A2C2f.\n\n4. ultralytics/cfg/models/12/yolo12-A2C2f-FRFN.yaml\n\n    Uses the feature refinement feed-forward network from [CVPR2024 Adapt or Perish: Adaptive Sparse Transformer with Attentive Feature Refinement for Image Restoration](https://openaccess.thecvf.com/content/CVPR2024/papers/Zhou_Adapt_or_Perish_Adaptive_Sparse_Transformer_with_Attentive_Feature_Refinement_CVPR_2024_paper.pdf) to improve A2C2f.\n\n5. ultralytics/cfg/models/12/yolo12-A2C2f-DYT.yaml\n\n    Uses DynamicTanh from [CVPR2025 DyT](https://github.com/jiachenzhu/DyT) to improve A2C2f.\n\n6. ultralytics/cfg/models/12/yolo12-A2C2f-FMFFN.yaml\n\n    Uses FMFFN from [ICLR2024-FTIC](https://github.com/qingshi9974/ICLR2024-FTIC) to improve A2C2f.\n\n7. ultralytics/cfg/models/12/yolo12-A2C2f-SEFN.yaml\n\n    Uses the Spatially-Enhanced Feedforward Network (SEFN) from [WACV2025 SEM-Net](https://github.com/ChrisChen1023/SEM-Net) to improve A2C2f.\n\n8. ultralytics/cfg/models/12/yolo12-A2C2f-Mona.yaml\n\n    Uses Mona from [CVPR2025 Mona](https://github.com/Leiyi-Hu/mona) to improve A2C2f.\n\n9. ultralytics/cfg/models/12/yolo12-A2C2f-SEFFN.yaml\n\n    Uses SpectralEnhancedFFN from [TransMamba](https://github.com/sunshangquan/TransMamba) to improve A2C2f.\n\n10. 
ultralytics/cfg/models/12/yolo12-A2C2f-EDFFN.yaml\n\n    Uses EDFFN from [CVPR2025 EVSSM](https://github.com/kkkls/EVSSM) to improve A2C2f.\n\n### Combination series\n1. ultralytics/cfg/models/11/yolo11-fasternet-bifpn.yaml\n\n    Combination of fasternet and bifpn.  \n    BIFPN has three optional parameters:\n    1. Fusion  \n        The Fusion module in BIFPN supports five modes: weight, adaptive, concat, bifpn(default), SDI  \n        weight, adaptive, and concat come from [paper link - Figure 3](https://openreview.net/pdf?id=q2ZaVU6bEsT), and SDI comes from [U-NetV2](https://github.com/yaoppeng/U-Net_v2)\n    2. node_mode  \n        Currently supports these [structures](#b) (more will be added later)\n    3. head_channel  \n        Number of channels in BIFPN; defaults to 256.\n\n2. ultralytics/cfg/models/11/yolo11-ELA-HSFPN-TADDH.yaml\n\n    Uses [Efficient Local Attention](https://arxiv.org/abs/2403.01123) to improve HSFPN, and the self-developed Task Align Dynamic Detection Head to improve the head.\n\n3. ultralytics/cfg/models/11/yolo11-FDPN-TADDH.yaml\n\n    Fusion of self-developed structures.\n    1. Self-developed Focusing Diffusion Pyramid Network\n    2. Self-developed Task Align Dynamic Detection Head\n\n4. ultralytics/cfg/models/11/yolo11-starnet-C3k2-Star-LSCD.yaml\n\n    Lightweight model combination.\n    1. CVPR2024-StarNet Backbone.\n    2. C3k2-Star.\n    3. Lightweight Shared Convolutional Detection Head.\n\n# Mamba-YOLO\n1. [Mamba-YOLO](https://github.com/HZAI-ZJNU/Mamba-YOLO)\n\n    Integrates Mamba-YOLO. (Compilation is required; see the 20240619 version update notes in the Baidu Cloud videos.)\n    ultralytics/cfg/models/mamba-yolo/Mamba-YOLO-T.yaml\n    ultralytics/cfg/models/mamba-yolo/Mamba-YOLO-B.yaml\n    ultralytics/cfg/models/mamba-yolo/Mamba-YOLO-L.yaml\n    ultralytics/cfg/models/mamba-yolo/yolo-mamba-seg.yaml\n\n# Hyper-YOLO\n1. Hyper-YOLO(TPAMI2025)\n\n    1. ultralytics/cfg/models/hyper-yolo/hyper-yolo.yaml\n    2. ultralytics/cfg/models/hyper-yolo/hyper-yolot.yaml\n    3. ultralytics/cfg/models/hyper-yolo/hyper-yolo-seg.yaml\n\n# Attention series\n1. EMA\n2. SimAM\n3. SpatialGroupEnhance\n4. BiLevelRoutingAttention, BiLevelRoutingAttention_nchw\n5. TripletAttention\n6. CoordAtt\n7. CBAM\n8. BAMBlock\n9. EfficientAttention(the attention from CloFormer)\n10. LSKBlock\n11. SEAttention\n12. CPCA\n13. deformable_LKA\n14. 
EffectiveSEModule\n15. LSKA\n16. SegNext_Attention\n17. DAttention(Vision Transformer with Deformable Attention CVPR2022)\n18. FocusedLinearAttention(ICCV2023)\n19. MLCA\n20. TransNeXt_AggregatedAttention\n21. LocalWindowAttention(the CascadedGroupAttention attention from EfficientViT)\n22. Efficient Local Attention[Efficient Local Attention](https://arxiv.org/abs/2403.01123)\n23. CAA(the attention from CVPR2024 PKINet)\n24. CAFM\n25. AFGCAttention[Neural Networks ECCV2024](https://www.sciencedirect.com/science/article/abs/pii/S0893608024002387)\n\n# Loss series\n1. SlideLoss,EMASlideLoss.(Dynamically adjusts the coefficients of positive and negative samples so that the model focuses more on hard-to-classify and misclassified samples.)\n2. IoU,GIoU,DIoU,CIoU,EIoU,SIoU,MPDIoU,ShapeIoU.\n3. Inner-IoU,Inner-GIoU,Inner-DIoU,Inner-CIoU,Inner-EIoU,Inner-SIoU,Inner-ShapeIoU.\n4. Wise-IoU(v1,v2,v3) series(IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU).\n5. Inner-Wise-IoU(v1,v2,v3) series(IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU).\n6. FocalLoss,VarifocalLoss,QualityfocalLoss\n7. Focaler-IoU series(IoU,GIoU,DIoU,CIoU,EIoU,SIoU,WIoU,MPDIoU,ShapeIoU)\n8. Powerful-IoU,Powerful-IoUV2,Inner-Powerful-IoU,Inner-Powerful-IoUV2,Focaler-Powerful-IoU,Focaler-Powerful-IoUV2,Wise-Powerful-IoU(v1,v2,v3),Wise-Powerful-IoUV2(v1,v2,v3)[paper link](https://www.sciencedirect.com/science/article/abs/pii/S0893608023006640)\n9. Normalized Gaussian Wasserstein Distance.\n10. Gaussian Combined Distance.\n\n# Update announcements\n\n- **20241013-yolov11-v1.1**\n    1. Initial release.\n\n- **20241018-yolov11-v1.2**\n    1. Finished porting 200+ improvements.\n    2. Fixed known issues.\n\n- **20241027-yolov11-v1.3**\n    1. Fixed known issues.\n    2. Added the self-developed CSP-MutilScaleEdgeInformationEnhance.\n    3. Added Fused_Fourier_Conv_Mixer from Efficient Frequency-Domain Image Deraining with Contrastive Regularization.\n    4. Updated the usage tutorial.\n    5. Added the 20241027 update notes to the Baidu Cloud videos.\n\n- **20241103-yolov11-v1.4**\n    1. Added the self-developed Rep Shared Convolutional Detection Head.\n    2. Fixed known issues.\n    3. Added a video to the usage instructions on how to use the improvements for instance segmentation, pose estimation, and oriented object detection.\n    4. Added the 20241103 update notes to the Baidu Cloud videos.\n\n- **20241112-yolov11-v1.5**\n    1. Added the self-developed CSP-FreqSpatial.\n    2. Added the block from SFHformer ECCV2024 to improve C3k2.\n    3. 
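Added MSM from Revitalizing Convolutional Network for Image Restoration TPAMI2024 to improve C3k2.\n    4. Updated the usage tutorial.\n    5. Added the 20241112 update notes to the Baidu Cloud videos.\n    6. Fixed some known issues.\n\n- **20241124-yolov11-v1.6**\n    1. Built on the self-developed CSP-MutilScaleEdgeInformationEnhance to derive CSP-MutilScaleEdgeInformationSelect.\n    2. Added the HDRAB and RAB modules from Pattern Recognition 2024|DRANet to improve C3k2.\n    3. Added Local feature extraction from ECCV2022-ELAN to improve C3k2.\n    4. Used Bi-Level Routing Attention to improve C2PSA.\n    5. Used CascadedGroupAttention to improve C2PSA.\n    6. Used DAttention to improve C2PSA.\n    7. Updated the usage tutorial.\n    8. Added the 20241124 update notes to the Baidu Cloud videos.\n    9. Fixed some known issues.\n\n- **20241207-yolov11-v1.7**\n    1. Added the self-developed GlobalEdgeInformationTransfer.\n    2. Added Frequency-aware Cascade Attention from FreqFormer to improve C3k2.\n    3. Added improvements based on IDWC and IDWB from CVPR2024 InceptionNeXt.\n    4. Added DynamicPosBias-Attention from CrossFormer to improve C2PSA.\n    5. Updated the usage tutorial.\n    6. Added the 20241207 update notes to the Baidu Cloud videos.\n\n- **20241221-yolov11-v1.8**\n    1. Added CAMixer from CAMixerSR to improve C3k2.\n    2. Added support for Hyper-YOLO, which can also be improved with the project's built-in improvements.\n    3. Added improvements based on Hypergraph Computation in Semantic Space and the Mixed Aggregation Network from Hyper-YOLO.\n    4. Added PConv from Fasternet to improve C3k2.\n    5. Added some attention examples to accompany the Bilibili videos.\n    6. Updated the usage tutorial.\n    7. Added the 20241221 update notes to the Baidu Cloud videos.\n\n- **20241228-yolov11-v1.9**\n    1. Added three secondary-improvement series based on the Mixed Aggregation Network from Hyper-YOLO.\n    2. Added the Multi-Scale Adaptive Spatial Attention Gate from MSA^2 Net to improve yolo11-neck.\n    3. Added the Multi-Scale Adaptive Spatial Attention Gate from MSA^2 Net to improve MutilBackbone from the self-developed series.\n    4. Updated the usage tutorial.\n    5. Added the 20241228 update notes to the Baidu Cloud videos.\n\n- **20250112-yolo11-v1.10**\n    1. Added the high-frequency enhancement residual block from CRAFT-SR.\n    2. Added DTAB from AAAI2025-TBSN.\n    3. Added multiple modules from ECCV2024-FSEL.\n    4. Added multiple modules from ACMMM2024-WFEN.\n    5. Updated the usage tutorial.\n    6. Added the 20250112 update notes to the Baidu Cloud videos.\n\n- **20250119-yolo11-v1.11**\n    1. Added Pinwheel-shaped Convolution style improvements from AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection.\n    2. Added a new design that combines ContrastDrivenFeatureAggregation from AAAI2025 ConDSeg with the wavelet transform from ACMMM2024 WFEN.\n    3. Updated the usage tutorial.\n    4. 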
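Added the 20250119 update notes to the Baidu Cloud videos.\n\n- **20250205-yolo11-v1.12**\n    1. Added improvements from ELGC-Net and their secondary innovations.\n    2. Added PolaAttention from ICLR2025 PolaFormer to improve C2PSA.\n    3. Added StripBlock from Strip R-CNN (remote sensing object detection) and its secondary innovations.\n    4. Added Frequency-Spatial Attention and Multi-scale Progressive Channel Attention from BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation.\n    5. Added KAT from ICLR2025 Kolmogorov-Arnold Transformer and its secondary innovation combined with FasterBlock. (This module requires compilation.)\n    6. Updated the usage tutorial.\n    7. Added the 20250205 update notes to the Baidu Cloud videos.\n\n- **20250215-yolo11-v1.13**\n    1. Added the self-developed module DynamicInceptionDWConv2d.\n    2. Added GlobalFilter and DynamicFilter.\n    3. Updated the usage tutorial.\n    4. Added the 20250215 update notes to the Baidu Cloud videos.\n\n- **20250222-yolo11-v1.14**\n    1. Added yolo12 configuration files (covering object detection, instance segmentation, pose estimation, oriented object detection, and classification).\n\n- **20250301-yolo11-v1.15**\n    1. Added the self-developed Hierarchical Attention Fusion module, with several ways to use it.\n    2. Added TSSA from ICLR2025-Token Statistics Transformer to improve C3k2 and C2PSA.\n    3. Added RepHMS from MHAF-YOLO. (This is new work by a PhD member of the YOLO group chat.)\n    4. Added several improvement options for the MLP in YOLO12's A2C2f structure (CGLU, KAN, DFFN).\n    5. Adjusted the attention implementation in YOLO12: it now automatically detects whether Flash-Attention is installed and falls back to the Torch implementation if it is not.\n    6. Updated the usage tutorial.\n    7. Added the 20250301 update notes to the Baidu Cloud videos.\n\n- **20250312-yolo11-v1.16**\n    1. Fixed an index-numbering bug in yolo11-ReCalibrationFPN-P2345.yaml.\n    2. Added CVPR2024-Adaptive Sparse Transformer related improvements for yolo11 and yolo12.\n    3. Added modules from CVPR2025-MambaIR.\n    4. Added modules from CVPR2025-SCSegamba.\n    5. Added modules from CVPR2025-MobileMamba.\n    6. Added modules from CVPR2025-MambaOut.\n    7. Updated the usage tutorial.\n    8. Added the 20250312 update notes to the Baidu Cloud videos.\n\n- **20250319-yolo11-v1.17**\n    1. Added multiple improvements based on CVPR2025-Dynamic-Tanh, plus secondary innovations that combine it with other modules.\n    2. Fixed some issues in several C2PSA improvements; see this release's update notes for details.\n    3. Updated the usage tutorial.\n    4. Added the 20250319 update notes to the Baidu Cloud videos.\n\n- **20250322-yolo11-v1.18**\n    1. Synced YOLOv12-turbo, newly released in the official yolo12 code.\n\n- **20250329-yolo11-v1.19**\n    1. Added a module derived from a secondary innovation combining CVPR2025-MambaOut and CVPR2024-UniRepLKNet.\n    2. Added CVPR2025-EfficientViM and a module derived from its secondary innovation with CVPR2024-TransNeXt.\n    3. Added Localization Quality Estimation from CVPR2025-DEIM to improve YOLOHead, so that its classification head provides both a classification score and a predicted-box quality score.\n    4. Added Localization Quality Estimation - Lightweight Shared Convolutional Detection Head.\n    5. 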
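Added EUCB from CVPR2024-EMCAD.\n    6. Added a secondary-innovation module combining ShiftChannelMix from CVPR2025-BHViT and EUCB from CVPR2024-EMCAD.\n    7. Added a variant of the yolo11-EMBSFPN.yaml scheme that introduces ShiftChannelMix from [CVPR2025 BHViT](https://github.com/IMRL/BHViT).\n    8. Updated the usage tutorial.\n    9. Added the 20250329 update notes to the Baidu Cloud videos.\n\n- **20250415-yolo11-v1.20**\n    1. Added multiple modules from ICLR2024-FTIC.\n    2. Added CGLU from CVPR2024-TransNext to improve C2PSA.\n    3. Added multiple modules from CVPR2024-DCMPNet.\n    4. Added multiple modules from CVPR2025-OverLock.\n    5. Added a script that collects the computational cost and parameter count of each configuration file and sorts them.\n    6. Updated the usage tutorial.\n    7. Added the 20250415 update notes to the Baidu Cloud videos.\n\n- **20250502-yolo11-v1.21**\n    1. Added LoGStem and LFEModule from LEGNet.\n    2. Added Snake Bi-Directional Sequence Modelling and the Spatially-Enhanced Feedforward Network from WACV2025-SEMNet.\n    3. Added multiple improvements and secondary innovations based on CVPR2025-Mona.\n    4. Added multiple improvements and secondary innovations based on LSNet and LSConv from CVPR2025-LSNet, a new-generation lightweight SOTA.\n    5. Fixed the extremely slow training speed of MobileMamba.\n    6. Changed the weight-saving logic: when training finishes (only on normal completion, not when stopped manually), four models are saved: best.pt, last.pt, best_fp32.pt, and last_fp32.pt. The files without the fp32 suffix are saved in fp16. Because some modules are very sensitive to fp16, accuracy can come out as 0 when val.py is run afterwards; in that case, test with the fp32-suffixed files instead.\n    7. Updated the usage tutorial.\n    8. Added the 20250502 update notes to the Baidu Cloud videos.\n\n- **20250518-yolo11-v1.22**\n    1. Added multiple improvements from TransMamba.\n    2. Added multiple improvements from CVPR2025-EVSSM.\n    3. Added multiple improvements from CVPR2025-DarkIR.\n    4. Updated the usage tutorial.\n    5. Added the 20250518 update notes to the Baidu Cloud videos.\n\n- **20250601-yolo11-v1.23**\n    1. Added improvements from CVPR2025-FDConv and several secondary-innovation modules.\n    2. Added improvements from DSA: Deformable Spatial Attention and several secondary-innovation modules.\n    3. Added the Residual Mamba Block from CVPR2025-MaIR.\n    4. Updated the usage tutorial.\n    5. Added the 20250601 update notes to the Baidu Cloud videos.\n\n- **20250612-yolo11-v1.24**\n    1. Added modules from ECCV2024-rethinkingfpn and further innovated on the original SOEP improvement.\n    2. Added improvements from CVPR2024-SFSConv and several secondary-innovation modules.\n    3. Added modules from CVPR2025-GroupMamba.\n    4. Added modules from CVPR2025-MambaVision.\n    5. Added modules from AAAI2025-FBRTYOLO.\n    6. Updated the usage tutorial.\n    7. Added the 20250612 update notes to the Baidu Cloud videos.\n\n- **20250624-yolo11-v1.25**\n    1. Added YOLOV13 configuration files (covering detect, seg, pose, and obb).\n    2. Updated the usage tutorial.\n\n- **20250706-yolo11-v1.26**\n    1. Added Pyramid Sparse Transformer to improve yolo11-neck.\n    2. Added a further innovation on SOEP using Pyramid Sparse Transformer.\n    3. Added weightedConvolution2.0.\n    4. Added MIA2025-FourierConv.\n    5. 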
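Added HS-FPN from AAAI2025.\n    6. Added improvements based on multiple modules from TGRS2025-UMFormer.\n    7. Updated the usage tutorial.\n    8. Added the 20250706 update notes to the Baidu Cloud videos.\n\n- **20250721-yolo11-v1.27**\n    1. Added modules from ICCV2025-ESC.\n    2. Added modules from ICCV2025-MobileIE.\n    3. Added modules from ICCV2025-VSSD.\n    4. Added modules from ICCV2025-TinyVIM.\n    5. Added MSLA.\n    6. Added modules from INFFUS2025-SAMamba.\n    7. Updated the usage tutorial.\n    8. Added the 20250721 update notes to the Baidu Cloud videos.\n\n- **20250813-yolo11-v1.28**\n    1. Added multiple improvements based on EPGO from CPRAformer.\n    2. Added improvements based on ConvAttn from ICCV2025-ESC.\n    3. Updated the usage tutorial.\n    4. Added the 20250813 update notes to the Baidu Cloud videos.\n\n- **20250827-yolo11-v1.29**\n    1. Added modules from ICCV2025-UniConvBlock.\n    2. Added modules from ICCV2025-ConverseBNet.\n    3. Added modules from ACM MM 2025-Mobile U-ViT.\n    4. Updated the usage tutorial.\n    5. Added the 20250827 update notes to the Baidu Cloud videos.\n\n- **20250912-yolo11-v1.30**\n    1. Added the CVPR2025-GCConv module.\n    2. Added the AAAI2024-CFBlock module.\n    3. Added the RepStem module from ICCV2023-FastViT.\n    4. Updated the usage tutorial.\n    5. Added the 20250912 update notes to the Baidu Cloud videos.\n\n- **20251008-yolo11-v1.31**\n    1. Added modules from IJCV2024-SRConvNet.\n    2. Added modules from LWGANet.\n    3. Updated the usage tutorial.\n    4. Added the 20251008 update notes to the Baidu Cloud videos.\n\n- **20251028-yolo11-v1.32**\n    1. Added modules from TGRS2025-ASCNet.\n    2. Added the ICCV2025-HFRB module.\n    3. Added modules from ICIP2025-BEVANET.\n    4. Added modules from TPAMI2025-LRFormer.\n    5. Added modules from ICCV2025-Rectifying Magnitude Neglect in Linear Attention.\n    6. Updated the usage tutorial.\n    7. Added the 20251028 update notes to the Baidu Cloud videos.\n\n- **20251122-yolo11-v1.33**\n    1. Added GRSL2025-Gaussian Combined Distance, which can be applied to both the bounding-box loss and the label assignment strategy; see LOSS改进系列.md for details.\n    2. Added modules from ACCV2024-PlainUSR.\n    3. Updated the usage tutorial.\n    4. Added the 20251122 update notes to the Baidu Cloud videos.\n\n- **20251219-yolo11-v1.34**\n    1. Added the LCA module from CVPR2025-HVI.\n    2. Added the TIP2025-SFMB module.\n    3. Added the HFFE module from TGRS2025-HAFNet.\n    4. Updated the usage tutorial.\n    5. Added the 20251219 update notes to the Baidu Cloud videos.\n\n- **20260114-yolo11-v1.35**\n    1. Added the MoE module from YOLO-Master.\n    2. Added modules from ACMMM2025-FlickCD.\n    3. Updated the usage tutorial.\n    4. Added the 20260114 update notes to the Baidu Cloud videos.\n\n- **20260203-yolo11-v1.36**\n    1. Added modules from TGRS2025-Think Locally and Act Globally.\n    2. Added multiple modules from TGRS2025-ISGLNet.\n    3. Added modules from TGRS2025-MASFNet.\n    4. Updated the usage tutorial.\n    5. Added the 20260203 update notes to the Baidu Cloud videos.\n\n- **20260224-yolo11-v1.37**\n    1. 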
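Added modules from MICCAI2023-SHISRCNet.\n    2. Added modules from AAAI2026-Partial Channel Network.\n    3. Added modules from TGRS2025-DRPCANet.\n    4. Added modules from TGRS2025-ISGLNet.\n    5. Added modules from TGRS2025-HDNet.\n    6. Updated the usage tutorial.\n    7. Added the 20260223 update notes to the Baidu Cloud videos.\n\n- **20260307-yolo11-v1.38**\n    1. Improved the feature-map saving mechanism in detect.py so that it can save each channel's feature map separately as well as the feature map summed over all channels.\n    2. Improved the training output by adding mAP75 reporting during training.\n\n- **20260321-yolo11-v1.39**\n    1. Added the AAAI2026-SPJFBlock module.\n    2. Added the GLSS2D module from TGRS2025-GLVMamba.\n    3. Added the CAFM module from TIP2025-DSMT.\n    4. Added the DWMMSA module from TGRS2025-USTNet.\n    5. Added the DEGConv module from CVPR2026-MixerCSeg.\n    6. Added modules from CVPR2026-BinaryAttention.\n    7. Added the CVPR2026-TransMixer module.\n    8. Added the WCA module from CVPR2025-Wavelet and Prototype Augmented Query-based Transformer for Pixel-level Surface Defect Detection.\n    9. Updated the usage tutorial.\n    10. Added the 20260321 update notes to the Baidu Cloud videos."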
  },
  {
    "path": "yolo-improve/yolov5-AIFI.py",
    "content": "import torch\nimport torch.nn as nn\n\nclass TransformerEncoderLayer(nn.Module):\n    \"\"\"Defines a single layer of the transformer encoder.\"\"\"\n\n    def __init__(self, c1, cm=2048, num_heads=8, dropout=0.0, act=nn.GELU(), normalize_before=False):\n        \"\"\"Initialize the TransformerEncoderLayer with specified parameters.\"\"\"\n        super().__init__()\n        self.ma = nn.MultiheadAttention(c1, num_heads, dropout=dropout, batch_first=True)\n        # Implementation of Feedforward model\n        self.fc1 = nn.Linear(c1, cm)\n        self.fc2 = nn.Linear(cm, c1)\n\n        self.norm1 = nn.LayerNorm(c1)\n        self.norm2 = nn.LayerNorm(c1)\n        self.dropout = nn.Dropout(dropout)\n        self.dropout1 = nn.Dropout(dropout)\n        self.dropout2 = nn.Dropout(dropout)\n\n        self.act = act\n        self.normalize_before = normalize_before\n\n    @staticmethod\n    def with_pos_embed(tensor, pos=None):\n        \"\"\"Add position embeddings to the tensor if provided.\"\"\"\n        return tensor if pos is None else tensor + pos\n\n    def forward_post(self, src, src_mask=None, src_key_padding_mask=None, pos=None):\n        \"\"\"Performs forward pass with post-normalization.\"\"\"\n        q = k = self.with_pos_embed(src, pos)\n        src2 = self.ma(q, k, value=src, attn_mask=src_mask, key_padding_mask=src_key_padding_mask)[0]\n        src = src + self.dropout1(src2)\n        src = self.norm1(src)\n        src2 = self.fc2(self.dropout(self.act(self.fc1(src))))\n        src = src + self.dropout2(src2)\n        return self.norm2(src)\n\n    def forward_pre(self, src, src_mask=None, src_key_padding_mask=None, pos=None):\n        \"\"\"Performs forward pass with pre-normalization.\"\"\"\n        src2 = self.norm1(src)\n        q = k = self.with_pos_embed(src2, pos)\n        src2 = self.ma(q, k, value=src2, attn_mask=src_mask, key_padding_mask=src_key_padding_mask)[0]\n        src = src + self.dropout1(src2)\n        src2 = 
self.norm2(src)\n        src2 = self.fc2(self.dropout(self.act(self.fc1(src2))))\n        return src + self.dropout2(src2)\n\n    def forward(self, src, src_mask=None, src_key_padding_mask=None, pos=None):\n        \"\"\"Forward propagates the input through the encoder module.\"\"\"\n        if self.normalize_before:\n            return self.forward_pre(src, src_mask, src_key_padding_mask, pos)\n        return self.forward_post(src, src_mask, src_key_padding_mask, pos)\n\n\nclass AIFI(TransformerEncoderLayer):\n    \"\"\"Defines the AIFI transformer layer.\"\"\"\n\n    def __init__(self, c1, cm=2048, num_heads=8, dropout=0, act=nn.GELU(), normalize_before=False):\n        \"\"\"Initialize the AIFI instance with specified parameters.\"\"\"\n        super().__init__(c1, cm, num_heads, dropout, act, normalize_before)\n\n    def forward(self, x):\n        \"\"\"Forward pass for the AIFI transformer layer.\"\"\"\n        c, h, w = x.shape[1:]\n        pos_embed = self.build_2d_sincos_position_embedding(w, h, c)\n        # Flatten [B, C, H, W] to [B, HxW, C]\n        x = super().forward(x.flatten(2).permute(0, 2, 1), pos=pos_embed.to(device=x.device, dtype=x.dtype))\n        return x.permute(0, 2, 1).view([-1, c, h, w]).contiguous()\n\n    @staticmethod\n    def build_2d_sincos_position_embedding(w, h, embed_dim=256, temperature=10000.0):\n        \"\"\"Builds 2D sine-cosine position embedding.\"\"\"\n        grid_w = torch.arange(int(w), dtype=torch.float32)\n        grid_h = torch.arange(int(h), dtype=torch.float32)\n        grid_w, grid_h = torch.meshgrid(grid_w, grid_h, indexing='ij')\n        assert embed_dim % 4 == 0, \\\n            'Embed dimension must be divisible by 4 for 2D sin-cos position embedding'\n        pos_dim = embed_dim // 4\n        omega = torch.arange(pos_dim, dtype=torch.float32) / pos_dim\n        omega = 1. 
/ (temperature ** omega)\n\n        out_w = grid_w.flatten()[..., None] @ omega[None]\n        out_h = grid_h.flatten()[..., None] @ omega[None]\n\n        return torch.cat([torch.sin(out_w), torch.cos(out_w), torch.sin(out_h), torch.cos(out_h)], 1)[None]\n\n# yolov5: add this branch to parse_model() in models/yolo.py so that AIFI receives its input channel count\nelif m is AIFI:\n    args = [ch[f], *args]\n\n# Example model config (YAML) with AIFI appended to the backbone\n# YOLOv5 🚀 by Ultralytics, AGPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, Conv, [512, 1]],  # 9\n   [-1, 1, AIFI, [1024, 8]],  # 10\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/benchmarks.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nRun YOLOv5 benchmarks on all supported export formats\n\nFormat                      | `export.py --include`         | Model\n---                         | ---                           | ---\nPyTorch                     | -                             | yolov5s.pt\nTorchScript                 | `torchscript`                 | yolov5s.torchscript\nONNX                        | `onnx`                        | yolov5s.onnx\nOpenVINO                    | `openvino`                    | yolov5s_openvino_model/\nTensorRT                    | `engine`                      | yolov5s.engine\nCoreML                      | `coreml`                      | yolov5s.mlmodel\nTensorFlow SavedModel       | `saved_model`                 | yolov5s_saved_model/\nTensorFlow GraphDef         | `pb`                          | yolov5s.pb\nTensorFlow Lite             | `tflite`                      | yolov5s.tflite\nTensorFlow Edge TPU         | `edgetpu`                     | yolov5s_edgetpu.tflite\nTensorFlow.js               | `tfjs`                        | yolov5s_web_model/\n\nRequirements:\n    $ pip install -r requirements.txt coremltools onnx onnx-simplifier onnxruntime openvino-dev tensorflow-cpu  # CPU\n    $ pip install -r requirements.txt coremltools onnx onnx-simplifier onnxruntime-gpu openvino-dev tensorflow  # GPU\n    $ pip install -U nvidia-tensorrt --index-url https://pypi.ngc.nvidia.com  # TensorRT\n\nUsage:\n    $ python benchmarks.py --weights yolov5s.pt --img 640\n\"\"\"\n\nimport argparse\nimport platform\nimport sys\nimport time\nfrom pathlib import Path\n\nimport pandas as pd\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[0]  # YOLOv5 root directory\nif str(ROOT) not in sys.path:\n    sys.path.append(str(ROOT))  # add ROOT to PATH\n# ROOT = ROOT.relative_to(Path.cwd())  # relative\n\nimport export\nfrom models.experimental import attempt_load\nfrom models.yolo import 
SegmentationModel\nfrom segment.val import run as val_seg\nfrom utils import notebook_init\nfrom utils.general import LOGGER, check_yaml, file_size, print_args\nfrom utils.torch_utils import select_device\nfrom val import run as val_det\n\n\ndef run(\n        weights=ROOT / 'yolov5s.pt',  # weights path\n        imgsz=640,  # inference size (pixels)\n        batch_size=1,  # batch size\n        data=ROOT / 'data/coco128.yaml',  # dataset.yaml path\n        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu\n        half=False,  # use FP16 half-precision inference\n        test=False,  # test exports only\n        pt_only=False,  # test PyTorch only\n        hard_fail=False,  # throw error on benchmark failure\n):\n    y, t = [], time.time()\n    device = select_device(device)\n    model_type = type(attempt_load(weights, fuse=False))  # DetectionModel, SegmentationModel, etc.\n    for i, (name, f, suffix, cpu, gpu) in export.export_formats().iterrows():  # index, (name, file, suffix, CPU, GPU)\n        try:\n            assert i not in (9, 10), 'inference not supported'  # Edge TPU and TF.js are unsupported\n            assert i != 5 or platform.system() == 'Darwin', 'inference only supported on macOS>=10.13'  # CoreML\n            if 'cpu' in device.type:\n                assert cpu, 'inference not supported on CPU'\n            if 'cuda' in device.type:\n                assert gpu, 'inference not supported on GPU'\n\n            # Export\n            if f == '-':\n                w = weights  # PyTorch format\n            else:\n                w = export.run(weights=weights, imgsz=[imgsz], include=[f], device=device, half=half)[-1]  # all others\n            assert suffix in str(w), 'export failed'\n\n            # Validate\n            if model_type == SegmentationModel:\n                result = val_seg(data, w, batch_size, imgsz, plots=False, device=device, task='speed', half=half)\n                metric = result[0][7]  # (box(p, r, map50, map), mask(p, r, 
map50, map), *loss(box, obj, cls))\n            else:  # DetectionModel:\n                result = val_det(data, w, batch_size, imgsz, plots=False, device=device, task='speed', half=half)\n                metric = result[0][3]  # (p, r, map50, map, *loss(box, obj, cls))\n            speed = result[2][1]  # times (preprocess, inference, postprocess)\n            y.append([name, round(file_size(w), 1), round(metric, 4), round(speed, 2)])  # MB, mAP, t_inference\n        except Exception as e:\n            if hard_fail:\n                assert type(e) is AssertionError, f'Benchmark --hard-fail for {name}: {e}'\n            LOGGER.warning(f'WARNING ⚠️ Benchmark failure for {name}: {e}')\n            y.append([name, None, None, None])  # size, mAP, t_inference\n        if pt_only and i == 0:\n            break  # break after PyTorch\n\n    # Print results\n    LOGGER.info('\\n')\n    parse_opt()\n    notebook_init()  # print system info\n    c = ['Format', 'Size (MB)', 'mAP50-95', 'Inference time (ms)']  # previously gated on the always-truthy builtin `map`\n    py = pd.DataFrame(y, columns=c)\n    LOGGER.info(f'\\nBenchmarks complete ({time.time() - t:.2f}s)')\n    LOGGER.info(str(py))\n    if hard_fail and isinstance(hard_fail, str):\n        metrics = py['mAP50-95'].array  # values to compare to floor\n        floor = eval(hard_fail)  # minimum metric floor to pass, i.e. = 0.29 mAP for YOLOv5n\n        assert all(x > floor for x in metrics if pd.notna(x)), f'HARD FAIL: mAP50-95 < floor {floor}'\n    return py\n\n\ndef test(\n        weights=ROOT / 'yolov5s.pt',  # weights path\n        imgsz=640,  # inference size (pixels)\n        batch_size=1,  # batch size\n        data=ROOT / 'data/coco128.yaml',  # dataset.yaml path\n        device='',  # cuda device, i.e. 
0 or 0,1,2,3 or cpu\n        half=False,  # use FP16 half-precision inference\n        test=False,  # test exports only\n        pt_only=False,  # test PyTorch only\n        hard_fail=False,  # throw error on benchmark failure\n):\n    y, t = [], time.time()\n    device = select_device(device)\n    for i, (name, f, suffix, cpu, gpu) in export.export_formats().iterrows():  # index, (name, file, suffix, CPU, GPU)\n        try:\n            w = weights if f == '-' else \\\n                export.run(weights=weights, imgsz=[imgsz], include=[f], device=device, half=half)[-1]  # weights\n            assert suffix in str(w), 'export failed'\n            y.append([name, True])\n        except Exception:\n            y.append([name, False])  # export failed\n\n    # Print results\n    LOGGER.info('\\n')\n    parse_opt()\n    notebook_init()  # print system info\n    py = pd.DataFrame(y, columns=['Format', 'Export'])\n    LOGGER.info(f'\\nExports complete ({time.time() - t:.2f}s)')\n    LOGGER.info(str(py))\n    return py\n\n\ndef parse_opt():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--weights', type=str, default=ROOT / 'yolov5s.pt', help='weights path')\n    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='inference size (pixels)')\n    parser.add_argument('--batch-size', type=int, default=1, help='batch size')\n    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='dataset.yaml path')\n    parser.add_argument('--device', default='', help='cuda device, i.e. 
0 or 0,1,2,3 or cpu')\n    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')\n    parser.add_argument('--test', action='store_true', help='test exports only')\n    parser.add_argument('--pt-only', action='store_true', help='test PyTorch only')\n    parser.add_argument('--hard-fail', nargs='?', const=True, default=False, help='Exception on error or < min metric')\n    opt = parser.parse_args()\n    opt.data = check_yaml(opt.data)  # check YAML\n    print_args(vars(opt))\n    return opt\n\n\ndef main(opt):\n    test(**vars(opt)) if opt.test else run(**vars(opt))\n\n\nif __name__ == '__main__':\n    opt = parse_opt()\n    main(opt)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/Argoverse.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Argoverse-HD dataset (ring-front-center camera) http://www.cs.cmu.edu/~mengtial/proj/streaming/ by Argo AI\n# Example usage: python train.py --data Argoverse.yaml\n# parent\n# ├── yolov5\n# └── datasets\n#     └── Argoverse  ← downloads here (31.3 GB)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/Argoverse  # dataset root dir\ntrain: Argoverse-1.1/images/train/  # train images (relative to 'path') 39384 images\nval: Argoverse-1.1/images/val/  # val images (relative to 'path') 15062 images\ntest: Argoverse-1.1/images/test/  # test images (optional) https://eval.ai/web/challenges/challenge-page/800/overview\n\n# Classes\nnames:\n  0: person\n  1: bicycle\n  2: car\n  3: motorcycle\n  4: bus\n  5: truck\n  6: traffic_light\n  7: stop_sign\n\n\n# Download script/URL (optional) ---------------------------------------------------------------------------------------\ndownload: |\n  import json\n\n  from tqdm import tqdm\n  from utils.general import download, Path\n\n\n  def argoverse2yolo(set):\n      labels = {}\n      a = json.load(open(set, \"rb\"))\n      for annot in tqdm(a['annotations'], desc=f\"Converting {set} to YOLOv5 format...\"):\n          img_id = annot['image_id']\n          img_name = a['images'][img_id]['name']\n          img_label_name = f'{img_name[:-3]}txt'\n\n          cls = annot['category_id']  # instance class id\n          x_center, y_center, width, height = annot['bbox']\n          x_center = (x_center + width / 2) / 1920.0  # offset and scale\n          y_center = (y_center + height / 2) / 1200.0  # offset and scale\n          width /= 1920.0  # scale\n          height /= 1200.0  # scale\n\n          img_dir = set.parents[2] / 'Argoverse-1.1' / 'labels' / a['seq_dirs'][a['images'][annot['image_id']]['sid']]\n          if not img_dir.exists():\n              img_dir.mkdir(parents=True, 
exist_ok=True)\n\n          k = str(img_dir / img_label_name)\n          if k not in labels:\n              labels[k] = []\n          labels[k].append(f\"{cls} {x_center} {y_center} {width} {height}\\n\")\n\n      for k in labels:\n          with open(k, \"w\") as f:\n              f.writelines(labels[k])\n\n\n  # Download\n  dir = Path(yaml['path'])  # dataset root dir\n  urls = ['https://argoverse-hd.s3.us-east-2.amazonaws.com/Argoverse-HD-Full.zip']\n  download(urls, dir=dir, delete=False)\n\n  # Convert\n  annotations_dir = 'Argoverse-HD/annotations/'\n  (dir / 'Argoverse-1.1' / 'tracking').rename(dir / 'Argoverse-1.1' / 'images')  # rename 'tracking' to 'images'\n  for d in \"train.json\", \"val.json\":\n      argoverse2yolo(dir / annotations_dir / d)  # convert Argoverse-HD annotations to YOLO labels\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/GlobalWheat2020.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Global Wheat 2020 dataset http://www.global-wheat.com/ by University of Saskatchewan\n# Example usage: python train.py --data GlobalWheat2020.yaml\n# parent\n# ├── yolov5\n# └── datasets\n#     └── GlobalWheat2020  ← downloads here (7.0 GB)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/GlobalWheat2020  # dataset root dir\ntrain: # train images (relative to 'path') 3422 images\n  - images/arvalis_1\n  - images/arvalis_2\n  - images/arvalis_3\n  - images/ethz_1\n  - images/rres_1\n  - images/inrae_1\n  - images/usask_1\nval: # val images (relative to 'path') 748 images (WARNING: train set contains ethz_1)\n  - images/ethz_1\ntest: # test images (optional) 1276 images\n  - images/utokyo_1\n  - images/utokyo_2\n  - images/nau_1\n  - images/uq_1\n\n# Classes\nnames:\n  0: wheat_head\n\n\n# Download script/URL (optional) ---------------------------------------------------------------------------------------\ndownload: |\n  from utils.general import download, Path\n\n\n  # Download\n  dir = Path(yaml['path'])  # dataset root dir\n  urls = ['https://zenodo.org/record/4298502/files/global-wheat-codalab-official.zip',\n          'https://github.com/ultralytics/yolov5/releases/download/v1.0/GlobalWheat2020_labels.zip']\n  download(urls, dir=dir)\n\n  # Make Directories\n  for p in 'annotations', 'images', 'labels':\n      (dir / p).mkdir(parents=True, exist_ok=True)\n\n  # Move\n  for p in 'arvalis_1', 'arvalis_2', 'arvalis_3', 'ethz_1', 'rres_1', 'inrae_1', 'usask_1', \\\n           'utokyo_1', 'utokyo_2', 'nau_1', 'uq_1':\n      (dir / p).rename(dir / 'images' / p)  # move to /images\n      f = (dir / p).with_suffix('.json')  # json file\n      if f.exists():\n          f.rename((dir / 'annotations' / p).with_suffix('.json'))  # move to /annotations\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/ImageNet.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# ImageNet-1k dataset https://www.image-net.org/index.php by Stanford University\n# Simplified class names from https://github.com/anishathalye/imagenet-simple-labels\n# Example usage: python classify/train.py --data imagenet\n# parent\n# ├── yolov5\n# └── datasets\n#     └── imagenet  ← downloads here (144 GB)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/imagenet  # dataset root dir\ntrain: train  # train images (relative to 'path') 1281167 images\nval: val  # val images (relative to 'path') 50000 images\ntest:  # test images (optional)\n\n# Classes\nnames:\n  0: tench\n  1: goldfish\n  2: great white shark\n  3: tiger shark\n  4: hammerhead shark\n  5: electric ray\n  6: stingray\n  7: cock\n  8: hen\n  9: ostrich\n  10: brambling\n  11: goldfinch\n  12: house finch\n  13: junco\n  14: indigo bunting\n  15: American robin\n  16: bulbul\n  17: jay\n  18: magpie\n  19: chickadee\n  20: American dipper\n  21: kite\n  22: bald eagle\n  23: vulture\n  24: great grey owl\n  25: fire salamander\n  26: smooth newt\n  27: newt\n  28: spotted salamander\n  29: axolotl\n  30: American bullfrog\n  31: tree frog\n  32: tailed frog\n  33: loggerhead sea turtle\n  34: leatherback sea turtle\n  35: mud turtle\n  36: terrapin\n  37: box turtle\n  38: banded gecko\n  39: green iguana\n  40: Carolina anole\n  41: desert grassland whiptail lizard\n  42: agama\n  43: frilled-necked lizard\n  44: alligator lizard\n  45: Gila monster\n  46: European green lizard\n  47: chameleon\n  48: Komodo dragon\n  49: Nile crocodile\n  50: American alligator\n  51: triceratops\n  52: worm snake\n  53: ring-necked snake\n  54: eastern hog-nosed snake\n  55: smooth green snake\n  56: kingsnake\n  57: garter snake\n  58: water snake\n  59: vine snake\n  60: night snake\n  61: boa constrictor\n  62: African rock python\n  63: Indian cobra\n  
64: green mamba\n  65: sea snake\n  66: Saharan horned viper\n  67: eastern diamondback rattlesnake\n  68: sidewinder\n  69: trilobite\n  70: harvestman\n  71: scorpion\n  72: yellow garden spider\n  73: barn spider\n  74: European garden spider\n  75: southern black widow\n  76: tarantula\n  77: wolf spider\n  78: tick\n  79: centipede\n  80: black grouse\n  81: ptarmigan\n  82: ruffed grouse\n  83: prairie grouse\n  84: peacock\n  85: quail\n  86: partridge\n  87: grey parrot\n  88: macaw\n  89: sulphur-crested cockatoo\n  90: lorikeet\n  91: coucal\n  92: bee eater\n  93: hornbill\n  94: hummingbird\n  95: jacamar\n  96: toucan\n  97: duck\n  98: red-breasted merganser\n  99: goose\n  100: black swan\n  101: tusker\n  102: echidna\n  103: platypus\n  104: wallaby\n  105: koala\n  106: wombat\n  107: jellyfish\n  108: sea anemone\n  109: brain coral\n  110: flatworm\n  111: nematode\n  112: conch\n  113: snail\n  114: slug\n  115: sea slug\n  116: chiton\n  117: chambered nautilus\n  118: Dungeness crab\n  119: rock crab\n  120: fiddler crab\n  121: red king crab\n  122: American lobster\n  123: spiny lobster\n  124: crayfish\n  125: hermit crab\n  126: isopod\n  127: white stork\n  128: black stork\n  129: spoonbill\n  130: flamingo\n  131: little blue heron\n  132: great egret\n  133: bittern\n  134: crane (bird)\n  135: limpkin\n  136: common gallinule\n  137: American coot\n  138: bustard\n  139: ruddy turnstone\n  140: dunlin\n  141: common redshank\n  142: dowitcher\n  143: oystercatcher\n  144: pelican\n  145: king penguin\n  146: albatross\n  147: grey whale\n  148: killer whale\n  149: dugong\n  150: sea lion\n  151: Chihuahua\n  152: Japanese Chin\n  153: Maltese\n  154: Pekingese\n  155: Shih Tzu\n  156: King Charles Spaniel\n  157: Papillon\n  158: toy terrier\n  159: Rhodesian Ridgeback\n  160: Afghan Hound\n  161: Basset Hound\n  162: Beagle\n  163: Bloodhound\n  164: Bluetick Coonhound\n  165: Black and Tan Coonhound\n  166: Treeing Walker 
Coonhound\n  167: English foxhound\n  168: Redbone Coonhound\n  169: borzoi\n  170: Irish Wolfhound\n  171: Italian Greyhound\n  172: Whippet\n  173: Ibizan Hound\n  174: Norwegian Elkhound\n  175: Otterhound\n  176: Saluki\n  177: Scottish Deerhound\n  178: Weimaraner\n  179: Staffordshire Bull Terrier\n  180: American Staffordshire Terrier\n  181: Bedlington Terrier\n  182: Border Terrier\n  183: Kerry Blue Terrier\n  184: Irish Terrier\n  185: Norfolk Terrier\n  186: Norwich Terrier\n  187: Yorkshire Terrier\n  188: Wire Fox Terrier\n  189: Lakeland Terrier\n  190: Sealyham Terrier\n  191: Airedale Terrier\n  192: Cairn Terrier\n  193: Australian Terrier\n  194: Dandie Dinmont Terrier\n  195: Boston Terrier\n  196: Miniature Schnauzer\n  197: Giant Schnauzer\n  198: Standard Schnauzer\n  199: Scottish Terrier\n  200: Tibetan Terrier\n  201: Australian Silky Terrier\n  202: Soft-coated Wheaten Terrier\n  203: West Highland White Terrier\n  204: Lhasa Apso\n  205: Flat-Coated Retriever\n  206: Curly-coated Retriever\n  207: Golden Retriever\n  208: Labrador Retriever\n  209: Chesapeake Bay Retriever\n  210: German Shorthaired Pointer\n  211: Vizsla\n  212: English Setter\n  213: Irish Setter\n  214: Gordon Setter\n  215: Brittany\n  216: Clumber Spaniel\n  217: English Springer Spaniel\n  218: Welsh Springer Spaniel\n  219: Cocker Spaniels\n  220: Sussex Spaniel\n  221: Irish Water Spaniel\n  222: Kuvasz\n  223: Schipperke\n  224: Groenendael\n  225: Malinois\n  226: Briard\n  227: Australian Kelpie\n  228: Komondor\n  229: Old English Sheepdog\n  230: Shetland Sheepdog\n  231: collie\n  232: Border Collie\n  233: Bouvier des Flandres\n  234: Rottweiler\n  235: German Shepherd Dog\n  236: Dobermann\n  237: Miniature Pinscher\n  238: Greater Swiss Mountain Dog\n  239: Bernese Mountain Dog\n  240: Appenzeller Sennenhund\n  241: Entlebucher Sennenhund\n  242: Boxer\n  243: Bullmastiff\n  244: Tibetan Mastiff\n  245: French Bulldog\n  246: Great Dane\n  247: St. 
Bernard\n  248: husky\n  249: Alaskan Malamute\n  250: Siberian Husky\n  251: Dalmatian\n  252: Affenpinscher\n  253: Basenji\n  254: pug\n  255: Leonberger\n  256: Newfoundland\n  257: Pyrenean Mountain Dog\n  258: Samoyed\n  259: Pomeranian\n  260: Chow Chow\n  261: Keeshond\n  262: Griffon Bruxellois\n  263: Pembroke Welsh Corgi\n  264: Cardigan Welsh Corgi\n  265: Toy Poodle\n  266: Miniature Poodle\n  267: Standard Poodle\n  268: Mexican hairless dog\n  269: grey wolf\n  270: Alaskan tundra wolf\n  271: red wolf\n  272: coyote\n  273: dingo\n  274: dhole\n  275: African wild dog\n  276: hyena\n  277: red fox\n  278: kit fox\n  279: Arctic fox\n  280: grey fox\n  281: tabby cat\n  282: tiger cat\n  283: Persian cat\n  284: Siamese cat\n  285: Egyptian Mau\n  286: cougar\n  287: lynx\n  288: leopard\n  289: snow leopard\n  290: jaguar\n  291: lion\n  292: tiger\n  293: cheetah\n  294: brown bear\n  295: American black bear\n  296: polar bear\n  297: sloth bear\n  298: mongoose\n  299: meerkat\n  300: tiger beetle\n  301: ladybug\n  302: ground beetle\n  303: longhorn beetle\n  304: leaf beetle\n  305: dung beetle\n  306: rhinoceros beetle\n  307: weevil\n  308: fly\n  309: bee\n  310: ant\n  311: grasshopper\n  312: cricket\n  313: stick insect\n  314: cockroach\n  315: mantis\n  316: cicada\n  317: leafhopper\n  318: lacewing\n  319: dragonfly\n  320: damselfly\n  321: red admiral\n  322: ringlet\n  323: monarch butterfly\n  324: small white\n  325: sulphur butterfly\n  326: gossamer-winged butterfly\n  327: starfish\n  328: sea urchin\n  329: sea cucumber\n  330: cottontail rabbit\n  331: hare\n  332: Angora rabbit\n  333: hamster\n  334: porcupine\n  335: fox squirrel\n  336: marmot\n  337: beaver\n  338: guinea pig\n  339: common sorrel\n  340: zebra\n  341: pig\n  342: wild boar\n  343: warthog\n  344: hippopotamus\n  345: ox\n  346: water buffalo\n  347: bison\n  348: ram\n  349: bighorn sheep\n  350: Alpine ibex\n  351: hartebeest\n  352: impala\n  353: 
gazelle\n  354: dromedary\n  355: llama\n  356: weasel\n  357: mink\n  358: European polecat\n  359: black-footed ferret\n  360: otter\n  361: skunk\n  362: badger\n  363: armadillo\n  364: three-toed sloth\n  365: orangutan\n  366: gorilla\n  367: chimpanzee\n  368: gibbon\n  369: siamang\n  370: guenon\n  371: patas monkey\n  372: baboon\n  373: macaque\n  374: langur\n  375: black-and-white colobus\n  376: proboscis monkey\n  377: marmoset\n  378: white-headed capuchin\n  379: howler monkey\n  380: titi\n  381: Geoffroy's spider monkey\n  382: common squirrel monkey\n  383: ring-tailed lemur\n  384: indri\n  385: Asian elephant\n  386: African bush elephant\n  387: red panda\n  388: giant panda\n  389: snoek\n  390: eel\n  391: coho salmon\n  392: rock beauty\n  393: clownfish\n  394: sturgeon\n  395: garfish\n  396: lionfish\n  397: pufferfish\n  398: abacus\n  399: abaya\n  400: academic gown\n  401: accordion\n  402: acoustic guitar\n  403: aircraft carrier\n  404: airliner\n  405: airship\n  406: altar\n  407: ambulance\n  408: amphibious vehicle\n  409: analog clock\n  410: apiary\n  411: apron\n  412: waste container\n  413: assault rifle\n  414: backpack\n  415: bakery\n  416: balance beam\n  417: balloon\n  418: ballpoint pen\n  419: Band-Aid\n  420: banjo\n  421: baluster\n  422: barbell\n  423: barber chair\n  424: barbershop\n  425: barn\n  426: barometer\n  427: barrel\n  428: wheelbarrow\n  429: baseball\n  430: basketball\n  431: bassinet\n  432: bassoon\n  433: swimming cap\n  434: bath towel\n  435: bathtub\n  436: station wagon\n  437: lighthouse\n  438: beaker\n  439: military cap\n  440: beer bottle\n  441: beer glass\n  442: bell-cot\n  443: bib\n  444: tandem bicycle\n  445: bikini\n  446: ring binder\n  447: binoculars\n  448: birdhouse\n  449: boathouse\n  450: bobsleigh\n  451: bolo tie\n  452: poke bonnet\n  453: bookcase\n  454: bookstore\n  455: bottle cap\n  456: bow\n  457: bow tie\n  458: brass\n  459: bra\n  460: breakwater\n  461: 
breastplate\n  462: broom\n  463: bucket\n  464: buckle\n  465: bulletproof vest\n  466: high-speed train\n  467: butcher shop\n  468: taxicab\n  469: cauldron\n  470: candle\n  471: cannon\n  472: canoe\n  473: can opener\n  474: cardigan\n  475: car mirror\n  476: carousel\n  477: tool kit\n  478: carton\n  479: car wheel\n  480: automated teller machine\n  481: cassette\n  482: cassette player\n  483: castle\n  484: catamaran\n  485: CD player\n  486: cello\n  487: mobile phone\n  488: chain\n  489: chain-link fence\n  490: chain mail\n  491: chainsaw\n  492: chest\n  493: chiffonier\n  494: chime\n  495: china cabinet\n  496: Christmas stocking\n  497: church\n  498: movie theater\n  499: cleaver\n  500: cliff dwelling\n  501: cloak\n  502: clogs\n  503: cocktail shaker\n  504: coffee mug\n  505: coffeemaker\n  506: coil\n  507: combination lock\n  508: computer keyboard\n  509: confectionery store\n  510: container ship\n  511: convertible\n  512: corkscrew\n  513: cornet\n  514: cowboy boot\n  515: cowboy hat\n  516: cradle\n  517: crane (machine)\n  518: crash helmet\n  519: crate\n  520: infant bed\n  521: Crock Pot\n  522: croquet ball\n  523: crutch\n  524: cuirass\n  525: dam\n  526: desk\n  527: desktop computer\n  528: rotary dial telephone\n  529: diaper\n  530: digital clock\n  531: digital watch\n  532: dining table\n  533: dishcloth\n  534: dishwasher\n  535: disc brake\n  536: dock\n  537: dog sled\n  538: dome\n  539: doormat\n  540: drilling rig\n  541: drum\n  542: drumstick\n  543: dumbbell\n  544: Dutch oven\n  545: electric fan\n  546: electric guitar\n  547: electric locomotive\n  548: entertainment center\n  549: envelope\n  550: espresso machine\n  551: face powder\n  552: feather boa\n  553: filing cabinet\n  554: fireboat\n  555: fire engine\n  556: fire screen sheet\n  557: flagpole\n  558: flute\n  559: folding chair\n  560: football helmet\n  561: forklift\n  562: fountain\n  563: fountain pen\n  564: four-poster bed\n  565: freight 
car\n  566: French horn\n  567: frying pan\n  568: fur coat\n  569: garbage truck\n  570: gas mask\n  571: gas pump\n  572: goblet\n  573: go-kart\n  574: golf ball\n  575: golf cart\n  576: gondola\n  577: gong\n  578: gown\n  579: grand piano\n  580: greenhouse\n  581: grille\n  582: grocery store\n  583: guillotine\n  584: barrette\n  585: hair spray\n  586: half-track\n  587: hammer\n  588: hamper\n  589: hair dryer\n  590: hand-held computer\n  591: handkerchief\n  592: hard disk drive\n  593: harmonica\n  594: harp\n  595: harvester\n  596: hatchet\n  597: holster\n  598: home theater\n  599: honeycomb\n  600: hook\n  601: hoop skirt\n  602: horizontal bar\n  603: horse-drawn vehicle\n  604: hourglass\n  605: iPod\n  606: clothes iron\n  607: jack-o'-lantern\n  608: jeans\n  609: jeep\n  610: T-shirt\n  611: jigsaw puzzle\n  612: pulled rickshaw\n  613: joystick\n  614: kimono\n  615: knee pad\n  616: knot\n  617: lab coat\n  618: ladle\n  619: lampshade\n  620: laptop computer\n  621: lawn mower\n  622: lens cap\n  623: paper knife\n  624: library\n  625: lifeboat\n  626: lighter\n  627: limousine\n  628: ocean liner\n  629: lipstick\n  630: slip-on shoe\n  631: lotion\n  632: speaker\n  633: loupe\n  634: sawmill\n  635: magnetic compass\n  636: mail bag\n  637: mailbox\n  638: tights\n  639: tank suit\n  640: manhole cover\n  641: maraca\n  642: marimba\n  643: mask\n  644: match\n  645: maypole\n  646: maze\n  647: measuring cup\n  648: medicine chest\n  649: megalith\n  650: microphone\n  651: microwave oven\n  652: military uniform\n  653: milk can\n  654: minibus\n  655: miniskirt\n  656: minivan\n  657: missile\n  658: mitten\n  659: mixing bowl\n  660: mobile home\n  661: Model T\n  662: modem\n  663: monastery\n  664: monitor\n  665: moped\n  666: mortar\n  667: square academic cap\n  668: mosque\n  669: mosquito net\n  670: scooter\n  671: mountain bike\n  672: tent\n  673: computer mouse\n  674: mousetrap\n  675: moving van\n  676: muzzle\n  677: 
nail\n  678: neck brace\n  679: necklace\n  680: nipple\n  681: notebook computer\n  682: obelisk\n  683: oboe\n  684: ocarina\n  685: odometer\n  686: oil filter\n  687: organ\n  688: oscilloscope\n  689: overskirt\n  690: bullock cart\n  691: oxygen mask\n  692: packet\n  693: paddle\n  694: paddle wheel\n  695: padlock\n  696: paintbrush\n  697: pajamas\n  698: palace\n  699: pan flute\n  700: paper towel\n  701: parachute\n  702: parallel bars\n  703: park bench\n  704: parking meter\n  705: passenger car\n  706: patio\n  707: payphone\n  708: pedestal\n  709: pencil case\n  710: pencil sharpener\n  711: perfume\n  712: Petri dish\n  713: photocopier\n  714: plectrum\n  715: Pickelhaube\n  716: picket fence\n  717: pickup truck\n  718: pier\n  719: piggy bank\n  720: pill bottle\n  721: pillow\n  722: ping-pong ball\n  723: pinwheel\n  724: pirate ship\n  725: pitcher\n  726: hand plane\n  727: planetarium\n  728: plastic bag\n  729: plate rack\n  730: plow\n  731: plunger\n  732: Polaroid camera\n  733: pole\n  734: police van\n  735: poncho\n  736: billiard table\n  737: soda bottle\n  738: pot\n  739: potter's wheel\n  740: power drill\n  741: prayer rug\n  742: printer\n  743: prison\n  744: projectile\n  745: projector\n  746: hockey puck\n  747: punching bag\n  748: purse\n  749: quill\n  750: quilt\n  751: race car\n  752: racket\n  753: radiator\n  754: radio\n  755: radio telescope\n  756: rain barrel\n  757: recreational vehicle\n  758: reel\n  759: reflex camera\n  760: refrigerator\n  761: remote control\n  762: restaurant\n  763: revolver\n  764: rifle\n  765: rocking chair\n  766: rotisserie\n  767: eraser\n  768: rugby ball\n  769: ruler\n  770: running shoe\n  771: safe\n  772: safety pin\n  773: salt shaker\n  774: sandal\n  775: sarong\n  776: saxophone\n  777: scabbard\n  778: weighing scale\n  779: school bus\n  780: schooner\n  781: scoreboard\n  782: CRT screen\n  783: screw\n  784: screwdriver\n  785: seat belt\n  786: sewing machine\n  
787: shield\n  788: shoe store\n  789: shoji\n  790: shopping basket\n  791: shopping cart\n  792: shovel\n  793: shower cap\n  794: shower curtain\n  795: ski\n  796: ski mask\n  797: sleeping bag\n  798: slide rule\n  799: sliding door\n  800: slot machine\n  801: snorkel\n  802: snowmobile\n  803: snowplow\n  804: soap dispenser\n  805: soccer ball\n  806: sock\n  807: solar thermal collector\n  808: sombrero\n  809: soup bowl\n  810: space bar\n  811: space heater\n  812: space shuttle\n  813: spatula\n  814: motorboat\n  815: spider web\n  816: spindle\n  817: sports car\n  818: spotlight\n  819: stage\n  820: steam locomotive\n  821: through arch bridge\n  822: steel drum\n  823: stethoscope\n  824: scarf\n  825: stone wall\n  826: stopwatch\n  827: stove\n  828: strainer\n  829: tram\n  830: stretcher\n  831: couch\n  832: stupa\n  833: submarine\n  834: suit\n  835: sundial\n  836: sunglass\n  837: sunglasses\n  838: sunscreen\n  839: suspension bridge\n  840: mop\n  841: sweatshirt\n  842: swimsuit\n  843: swing\n  844: switch\n  845: syringe\n  846: table lamp\n  847: tank\n  848: tape player\n  849: teapot\n  850: teddy bear\n  851: television\n  852: tennis ball\n  853: thatched roof\n  854: front curtain\n  855: thimble\n  856: threshing machine\n  857: throne\n  858: tile roof\n  859: toaster\n  860: tobacco shop\n  861: toilet seat\n  862: torch\n  863: totem pole\n  864: tow truck\n  865: toy store\n  866: tractor\n  867: semi-trailer truck\n  868: tray\n  869: trench coat\n  870: tricycle\n  871: trimaran\n  872: tripod\n  873: triumphal arch\n  874: trolleybus\n  875: trombone\n  876: tub\n  877: turnstile\n  878: typewriter keyboard\n  879: umbrella\n  880: unicycle\n  881: upright piano\n  882: vacuum cleaner\n  883: vase\n  884: vault\n  885: velvet\n  886: vending machine\n  887: vestment\n  888: viaduct\n  889: violin\n  890: volleyball\n  891: waffle iron\n  892: wall clock\n  893: wallet\n  894: wardrobe\n  895: military aircraft\n  896: 
sink\n  897: washing machine\n  898: water bottle\n  899: water jug\n  900: water tower\n  901: whiskey jug\n  902: whistle\n  903: wig\n  904: window screen\n  905: window shade\n  906: Windsor tie\n  907: wine bottle\n  908: wing\n  909: wok\n  910: wooden spoon\n  911: wool\n  912: split-rail fence\n  913: shipwreck\n  914: yawl\n  915: yurt\n  916: website\n  917: comic book\n  918: crossword\n  919: traffic sign\n  920: traffic light\n  921: dust jacket\n  922: menu\n  923: plate\n  924: guacamole\n  925: consomme\n  926: hot pot\n  927: trifle\n  928: ice cream\n  929: ice pop\n  930: baguette\n  931: bagel\n  932: pretzel\n  933: cheeseburger\n  934: hot dog\n  935: mashed potato\n  936: cabbage\n  937: broccoli\n  938: cauliflower\n  939: zucchini\n  940: spaghetti squash\n  941: acorn squash\n  942: butternut squash\n  943: cucumber\n  944: artichoke\n  945: bell pepper\n  946: cardoon\n  947: mushroom\n  948: Granny Smith\n  949: strawberry\n  950: orange\n  951: lemon\n  952: fig\n  953: pineapple\n  954: banana\n  955: jackfruit\n  956: custard apple\n  957: pomegranate\n  958: hay\n  959: carbonara\n  960: chocolate syrup\n  961: dough\n  962: meatloaf\n  963: pizza\n  964: pot pie\n  965: burrito\n  966: red wine\n  967: espresso\n  968: cup\n  969: eggnog\n  970: alp\n  971: bubble\n  972: cliff\n  973: coral reef\n  974: geyser\n  975: lakeshore\n  976: promontory\n  977: shoal\n  978: seashore\n  979: valley\n  980: volcano\n  981: baseball player\n  982: bridegroom\n  983: scuba diver\n  984: rapeseed\n  985: daisy\n  986: yellow lady's slipper\n  987: corn\n  988: acorn\n  989: rose hip\n  990: horse chestnut seed\n  991: coral fungus\n  992: agaric\n  993: gyromitra\n  994: stinkhorn mushroom\n  995: earth star\n  996: hen-of-the-woods\n  997: bolete\n  998: ear\n  999: toilet paper\n\n\n# Download script/URL (optional)\ndownload: data/scripts/get_imagenet.sh\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/Objects365.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Objects365 dataset https://www.objects365.org/ by Megvii\n# Example usage: python train.py --data Objects365.yaml\n# parent\n# ├── yolov5\n# └── datasets\n#     └── Objects365  ← downloads here (712 GB = 367G data + 345G zips)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/Objects365  # dataset root dir\ntrain: images/train  # train images (relative to 'path') 1742289 images\nval: images/val # val images (relative to 'path') 80000 images\ntest:  # test images (optional)\n\n# Classes\nnames:\n  0: Person\n  1: Sneakers\n  2: Chair\n  3: Other Shoes\n  4: Hat\n  5: Car\n  6: Lamp\n  7: Glasses\n  8: Bottle\n  9: Desk\n  10: Cup\n  11: Street Lights\n  12: Cabinet/shelf\n  13: Handbag/Satchel\n  14: Bracelet\n  15: Plate\n  16: Picture/Frame\n  17: Helmet\n  18: Book\n  19: Gloves\n  20: Storage box\n  21: Boat\n  22: Leather Shoes\n  23: Flower\n  24: Bench\n  25: Potted Plant\n  26: Bowl/Basin\n  27: Flag\n  28: Pillow\n  29: Boots\n  30: Vase\n  31: Microphone\n  32: Necklace\n  33: Ring\n  34: SUV\n  35: Wine Glass\n  36: Belt\n  37: Monitor/TV\n  38: Backpack\n  39: Umbrella\n  40: Traffic Light\n  41: Speaker\n  42: Watch\n  43: Tie\n  44: Trash bin Can\n  45: Slippers\n  46: Bicycle\n  47: Stool\n  48: Barrel/bucket\n  49: Van\n  50: Couch\n  51: Sandals\n  52: Basket\n  53: Drum\n  54: Pen/Pencil\n  55: Bus\n  56: Wild Bird\n  57: High Heels\n  58: Motorcycle\n  59: Guitar\n  60: Carpet\n  61: Cell Phone\n  62: Bread\n  63: Camera\n  64: Canned\n  65: Truck\n  66: Traffic cone\n  67: Cymbal\n  68: Lifesaver\n  69: Towel\n  70: Stuffed Toy\n  71: Candle\n  72: Sailboat\n  73: Laptop\n  74: Awning\n  75: Bed\n  76: Faucet\n  77: Tent\n  78: Horse\n  79: Mirror\n  80: Power outlet\n  81: Sink\n  82: Apple\n  83: Air Conditioner\n  84: Knife\n  85: Hockey Stick\n  86: Paddle\n  87: Pickup Truck\n  88: 
Fork\n  89: Traffic Sign\n  90: Balloon\n  91: Tripod\n  92: Dog\n  93: Spoon\n  94: Clock\n  95: Pot\n  96: Cow\n  97: Cake\n  98: Dinning Table\n  99: Sheep\n  100: Hanger\n  101: Blackboard/Whiteboard\n  102: Napkin\n  103: Other Fish\n  104: Orange/Tangerine\n  105: Toiletry\n  106: Keyboard\n  107: Tomato\n  108: Lantern\n  109: Machinery Vehicle\n  110: Fan\n  111: Green Vegetables\n  112: Banana\n  113: Baseball Glove\n  114: Airplane\n  115: Mouse\n  116: Train\n  117: Pumpkin\n  118: Soccer\n  119: Skiboard\n  120: Luggage\n  121: Nightstand\n  122: Tea pot\n  123: Telephone\n  124: Trolley\n  125: Head Phone\n  126: Sports Car\n  127: Stop Sign\n  128: Dessert\n  129: Scooter\n  130: Stroller\n  131: Crane\n  132: Remote\n  133: Refrigerator\n  134: Oven\n  135: Lemon\n  136: Duck\n  137: Baseball Bat\n  138: Surveillance Camera\n  139: Cat\n  140: Jug\n  141: Broccoli\n  142: Piano\n  143: Pizza\n  144: Elephant\n  145: Skateboard\n  146: Surfboard\n  147: Gun\n  148: Skating and Skiing shoes\n  149: Gas stove\n  150: Donut\n  151: Bow Tie\n  152: Carrot\n  153: Toilet\n  154: Kite\n  155: Strawberry\n  156: Other Balls\n  157: Shovel\n  158: Pepper\n  159: Computer Box\n  160: Toilet Paper\n  161: Cleaning Products\n  162: Chopsticks\n  163: Microwave\n  164: Pigeon\n  165: Baseball\n  166: Cutting/chopping Board\n  167: Coffee Table\n  168: Side Table\n  169: Scissors\n  170: Marker\n  171: Pie\n  172: Ladder\n  173: Snowboard\n  174: Cookies\n  175: Radiator\n  176: Fire Hydrant\n  177: Basketball\n  178: Zebra\n  179: Grape\n  180: Giraffe\n  181: Potato\n  182: Sausage\n  183: Tricycle\n  184: Violin\n  185: Egg\n  186: Fire Extinguisher\n  187: Candy\n  188: Fire Truck\n  189: Billiards\n  190: Converter\n  191: Bathtub\n  192: Wheelchair\n  193: Golf Club\n  194: Briefcase\n  195: Cucumber\n  196: Cigar/Cigarette\n  197: Paint Brush\n  198: Pear\n  199: Heavy Truck\n  200: Hamburger\n  201: Extractor\n  202: Extension Cord\n  203: Tong\n  204: 
Tennis Racket\n  205: Folder\n  206: American Football\n  207: earphone\n  208: Mask\n  209: Kettle\n  210: Tennis\n  211: Ship\n  212: Swing\n  213: Coffee Machine\n  214: Slide\n  215: Carriage\n  216: Onion\n  217: Green beans\n  218: Projector\n  219: Frisbee\n  220: Washing Machine/Drying Machine\n  221: Chicken\n  222: Printer\n  223: Watermelon\n  224: Saxophone\n  225: Tissue\n  226: Toothbrush\n  227: Ice cream\n  228: Hot-air balloon\n  229: Cello\n  230: French Fries\n  231: Scale\n  232: Trophy\n  233: Cabbage\n  234: Hot dog\n  235: Blender\n  236: Peach\n  237: Rice\n  238: Wallet/Purse\n  239: Volleyball\n  240: Deer\n  241: Goose\n  242: Tape\n  243: Tablet\n  244: Cosmetics\n  245: Trumpet\n  246: Pineapple\n  247: Golf Ball\n  248: Ambulance\n  249: Parking meter\n  250: Mango\n  251: Key\n  252: Hurdle\n  253: Fishing Rod\n  254: Medal\n  255: Flute\n  256: Brush\n  257: Penguin\n  258: Megaphone\n  259: Corn\n  260: Lettuce\n  261: Garlic\n  262: Swan\n  263: Helicopter\n  264: Green Onion\n  265: Sandwich\n  266: Nuts\n  267: Speed Limit Sign\n  268: Induction Cooker\n  269: Broom\n  270: Trombone\n  271: Plum\n  272: Rickshaw\n  273: Goldfish\n  274: Kiwi fruit\n  275: Router/modem\n  276: Poker Card\n  277: Toaster\n  278: Shrimp\n  279: Sushi\n  280: Cheese\n  281: Notepaper\n  282: Cherry\n  283: Pliers\n  284: CD\n  285: Pasta\n  286: Hammer\n  287: Cue\n  288: Avocado\n  289: Hamimelon\n  290: Flask\n  291: Mushroom\n  292: Screwdriver\n  293: Soap\n  294: Recorder\n  295: Bear\n  296: Eggplant\n  297: Board Eraser\n  298: Coconut\n  299: Tape Measure/Ruler\n  300: Pig\n  301: Showerhead\n  302: Globe\n  303: Chips\n  304: Steak\n  305: Crosswalk Sign\n  306: Stapler\n  307: Camel\n  308: Formula 1\n  309: Pomegranate\n  310: Dishwasher\n  311: Crab\n  312: Hoverboard\n  313: Meat ball\n  314: Rice Cooker\n  315: Tuba\n  316: Calculator\n  317: Papaya\n  318: Antelope\n  319: Parrot\n  320: Seal\n  321: Butterfly\n  322: Dumbbell\n  323: 
Donkey\n  324: Lion\n  325: Urinal\n  326: Dolphin\n  327: Electric Drill\n  328: Hair Dryer\n  329: Egg tart\n  330: Jellyfish\n  331: Treadmill\n  332: Lighter\n  333: Grapefruit\n  334: Game board\n  335: Mop\n  336: Radish\n  337: Baozi\n  338: Target\n  339: French\n  340: Spring Rolls\n  341: Monkey\n  342: Rabbit\n  343: Pencil Case\n  344: Yak\n  345: Red Cabbage\n  346: Binoculars\n  347: Asparagus\n  348: Barbell\n  349: Scallop\n  350: Noddles\n  351: Comb\n  352: Dumpling\n  353: Oyster\n  354: Table Tennis paddle\n  355: Cosmetics Brush/Eyeliner Pencil\n  356: Chainsaw\n  357: Eraser\n  358: Lobster\n  359: Durian\n  360: Okra\n  361: Lipstick\n  362: Cosmetics Mirror\n  363: Curling\n  364: Table Tennis\n\n\n# Download script/URL (optional) ---------------------------------------------------------------------------------------\ndownload: |\n  from tqdm import tqdm\n\n  from utils.general import Path, check_requirements, download, np, xyxy2xywhn\n\n  check_requirements(('pycocotools>=2.0',))\n  from pycocotools.coco import COCO\n\n  # Make Directories\n  dir = Path(yaml['path'])  # dataset root dir\n  for p in 'images', 'labels':\n      (dir / p).mkdir(parents=True, exist_ok=True)\n      for q in 'train', 'val':\n          (dir / p / q).mkdir(parents=True, exist_ok=True)\n\n  # Train, Val Splits\n  for split, patches in [('train', 50 + 1), ('val', 43 + 1)]:\n      print(f\"Processing {split} in {patches} patches ...\")\n      images, labels = dir / 'images' / split, dir / 'labels' / split\n\n      # Download\n      url = f\"https://dorc.ks3-cn-beijing.ksyun.com/data-set/2020Objects365%E6%95%B0%E6%8D%AE%E9%9B%86/{split}/\"\n      if split == 'train':\n          download([f'{url}zhiyuan_objv2_{split}.tar.gz'], dir=dir, delete=False)  # annotations json\n          download([f'{url}patch{i}.tar.gz' for i in range(patches)], dir=images, curl=True, delete=False, threads=8)\n      elif split == 'val':\n          download([f'{url}zhiyuan_objv2_{split}.json'], 
dir=dir, delete=False)  # annotations json\n          download([f'{url}images/v1/patch{i}.tar.gz' for i in range(15 + 1)], dir=images, curl=True, delete=False, threads=8)\n          download([f'{url}images/v2/patch{i}.tar.gz' for i in range(16, patches)], dir=images, curl=True, delete=False, threads=8)\n\n      # Move\n      for f in tqdm(images.rglob('*.jpg'), desc=f'Moving {split} images'):\n          f.rename(images / f.name)  # move to /images/{split}\n\n      # Labels\n      coco = COCO(dir / f'zhiyuan_objv2_{split}.json')\n      names = [x[\"name\"] for x in coco.loadCats(coco.getCatIds())]\n      for cid, cat in enumerate(names):\n          catIds = coco.getCatIds(catNms=[cat])\n          imgIds = coco.getImgIds(catIds=catIds)\n          for im in tqdm(coco.loadImgs(imgIds), desc=f'Class {cid + 1}/{len(names)} {cat}'):\n              width, height = im[\"width\"], im[\"height\"]\n              path = Path(im[\"file_name\"])  # image filename\n              try:\n                  with open(labels / path.with_suffix('.txt').name, 'a') as file:\n                      annIds = coco.getAnnIds(imgIds=im[\"id\"], catIds=catIds, iscrowd=None)\n                      for a in coco.loadAnns(annIds):\n                          x, y, w, h = a['bbox']  # bounding box in xywh (xy top-left corner)\n                          xyxy = np.array([x, y, x + w, y + h])[None]  # pixels(1,4)\n                          x, y, w, h = xyxy2xywhn(xyxy, w=width, h=height, clip=True)[0]  # normalized and clipped\n                          file.write(f\"{cid} {x:.5f} {y:.5f} {w:.5f} {h:.5f}\\n\")\n              except Exception as e:\n                  print(e)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/SKU-110K.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# SKU-110K retail items dataset https://github.com/eg4000/SKU110K_CVPR19 by Trax Retail\n# Example usage: python train.py --data SKU-110K.yaml\n# parent\n# ├── yolov5\n# └── datasets\n#     └── SKU-110K  ← downloads here (13.6 GB)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/SKU-110K  # dataset root dir\ntrain: train.txt  # train images (relative to 'path')  8219 images\nval: val.txt  # val images (relative to 'path')  588 images\ntest: test.txt  # test images (optional)  2936 images\n\n# Classes\nnames:\n  0: object\n\n\n# Download script/URL (optional) ---------------------------------------------------------------------------------------\ndownload: |\n  import shutil\n  from tqdm import tqdm\n  from utils.general import np, pd, Path, download, xyxy2xywh\n\n\n  # Download\n  dir = Path(yaml['path'])  # dataset root dir\n  parent = Path(dir.parent)  # download dir\n  urls = ['http://trax-geometry.s3.amazonaws.com/cvpr_challenge/SKU110K_fixed.tar.gz']\n  download(urls, dir=parent, delete=False)\n\n  # Rename directories\n  if dir.exists():\n      shutil.rmtree(dir)\n  (parent / 'SKU110K_fixed').rename(dir)  # rename dir\n  (dir / 'labels').mkdir(parents=True, exist_ok=True)  # create labels dir\n\n  # Convert labels\n  names = 'image', 'x1', 'y1', 'x2', 'y2', 'class', 'image_width', 'image_height'  # column names\n  for d in 'annotations_train.csv', 'annotations_val.csv', 'annotations_test.csv':\n      x = pd.read_csv(dir / 'annotations' / d, names=names).values  # annotations\n      images, unique_images = x[:, 0], np.unique(x[:, 0])\n      with open((dir / d).with_suffix('.txt').__str__().replace('annotations_', ''), 'w') as f:\n          f.writelines(f'./images/{s}\\n' for s in unique_images)\n      for im in tqdm(unique_images, desc=f'Converting {dir / d}'):\n          cls = 0  # single-class dataset\n   
       with open((dir / 'labels' / im).with_suffix('.txt'), 'a') as f:\n              for r in x[images == im]:\n                  w, h = r[6], r[7]  # image width, height\n                  xywh = xyxy2xywh(np.array([[r[1] / w, r[2] / h, r[3] / w, r[4] / h]]))[0]  # instance\n                  f.write(f\"{cls} {xywh[0]:.5f} {xywh[1]:.5f} {xywh[2]:.5f} {xywh[3]:.5f}\\n\")  # write label\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/VOC.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# PASCAL VOC dataset http://host.robots.ox.ac.uk/pascal/VOC by University of Oxford\n# Example usage: python train.py --data VOC.yaml\n# parent\n# ├── yolov5\n# └── datasets\n#     └── VOC  ← downloads here (2.8 GB)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/VOC\ntrain: # train images (relative to 'path')  16551 images\n  - images/train2012\n  - images/train2007\n  - images/val2012\n  - images/val2007\nval: # val images (relative to 'path')  4952 images\n  - images/test2007\ntest: # test images (optional)\n  - images/test2007\n\n# Classes\nnames:\n  0: aeroplane\n  1: bicycle\n  2: bird\n  3: boat\n  4: bottle\n  5: bus\n  6: car\n  7: cat\n  8: chair\n  9: cow\n  10: diningtable\n  11: dog\n  12: horse\n  13: motorbike\n  14: person\n  15: pottedplant\n  16: sheep\n  17: sofa\n  18: train\n  19: tvmonitor\n\n\n# Download script/URL (optional) ---------------------------------------------------------------------------------------\ndownload: |\n  import xml.etree.ElementTree as ET\n\n  from tqdm import tqdm\n  from utils.general import download, Path\n\n\n  def convert_label(path, lb_path, year, image_id):\n      def convert_box(size, box):\n          dw, dh = 1. / size[0], 1. 
/ size[1]\n          x, y, w, h = (box[0] + box[1]) / 2.0 - 1, (box[2] + box[3]) / 2.0 - 1, box[1] - box[0], box[3] - box[2]\n          return x * dw, y * dh, w * dw, h * dh\n\n      in_file = open(path / f'VOC{year}/Annotations/{image_id}.xml')\n      out_file = open(lb_path, 'w')\n      tree = ET.parse(in_file)\n      root = tree.getroot()\n      size = root.find('size')\n      w = int(size.find('width').text)\n      h = int(size.find('height').text)\n\n      names = list(yaml['names'].values())  # names list\n      for obj in root.iter('object'):\n          cls = obj.find('name').text\n          if cls in names and int(obj.find('difficult').text) != 1:\n              xmlbox = obj.find('bndbox')\n              bb = convert_box((w, h), [float(xmlbox.find(x).text) for x in ('xmin', 'xmax', 'ymin', 'ymax')])\n              cls_id = names.index(cls)  # class id\n              out_file.write(\" \".join([str(a) for a in (cls_id, *bb)]) + '\\n')\n\n\n  # Download\n  dir = Path(yaml['path'])  # dataset root dir\n  url = 'https://github.com/ultralytics/yolov5/releases/download/v1.0/'\n  urls = [f'{url}VOCtrainval_06-Nov-2007.zip',  # 446MB, 5012 images\n          f'{url}VOCtest_06-Nov-2007.zip',  # 438MB, 4953 images\n          f'{url}VOCtrainval_11-May-2012.zip']  # 1.95GB, 17126 images\n  download(urls, dir=dir / 'images', delete=False, curl=True, threads=3)\n\n  # Convert\n  path = dir / 'images/VOCdevkit'\n  for year, image_set in ('2012', 'train'), ('2012', 'val'), ('2007', 'train'), ('2007', 'val'), ('2007', 'test'):\n      imgs_path = dir / 'images' / f'{image_set}{year}'\n      lbs_path = dir / 'labels' / f'{image_set}{year}'\n      imgs_path.mkdir(exist_ok=True, parents=True)\n      lbs_path.mkdir(exist_ok=True, parents=True)\n\n      with open(path / f'VOC{year}/ImageSets/Main/{image_set}.txt') as f:\n          image_ids = f.read().strip().split()\n      for id in tqdm(image_ids, desc=f'{image_set}{year}'):\n          f = path / f'VOC{year}/JPEGImages/{id}.jpg'  
# old img path\n          lb_path = (lbs_path / f.name).with_suffix('.txt')  # new label path\n          f.rename(imgs_path / f.name)  # move image\n          convert_label(path, lb_path, year, id)  # convert labels to YOLO format\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/VisDrone.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# VisDrone2019-DET dataset https://github.com/VisDrone/VisDrone-Dataset by Tianjin University\n# Example usage: python train.py --data VisDrone.yaml\n# parent\n# ├── yolov5\n# └── datasets\n#     └── VisDrone  ← downloads here (2.3 GB)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/VisDrone  # dataset root dir\ntrain: VisDrone2019-DET-train/images  # train images (relative to 'path')  6471 images\nval: VisDrone2019-DET-val/images  # val images (relative to 'path')  548 images\ntest: VisDrone2019-DET-test-dev/images  # test images (optional)  1610 images\n\n# Classes\nnames:\n  0: pedestrian\n  1: people\n  2: bicycle\n  3: car\n  4: van\n  5: truck\n  6: tricycle\n  7: awning-tricycle\n  8: bus\n  9: motor\n\n\n# Download script/URL (optional) ---------------------------------------------------------------------------------------\ndownload: |\n  from utils.general import download, os, Path\n\n  def visdrone2yolo(dir):\n      from PIL import Image\n      from tqdm import tqdm\n\n      def convert_box(size, box):\n          # Convert VisDrone box to YOLO xywh box\n          dw = 1. / size[0]\n          dh = 1. 
/ size[1]\n          return (box[0] + box[2] / 2) * dw, (box[1] + box[3] / 2) * dh, box[2] * dw, box[3] * dh\n\n      (dir / 'labels').mkdir(parents=True, exist_ok=True)  # make labels directory\n      pbar = tqdm((dir / 'annotations').glob('*.txt'), desc=f'Converting {dir}')\n      for f in pbar:\n          img_size = Image.open((dir / 'images' / f.name).with_suffix('.jpg')).size\n          lines = []\n          with open(f, 'r') as file:  # read annotation.txt\n              for row in [x.split(',') for x in file.read().strip().splitlines()]:\n                  if row[4] == '0':  # VisDrone 'ignored regions' class 0\n                      continue\n                  cls = int(row[5]) - 1\n                  box = convert_box(img_size, tuple(map(int, row[:4])))\n                  lines.append(f\"{cls} {' '.join(f'{x:.6f}' for x in box)}\\n\")\n          # write label.txt once per image, after all rows are converted\n          with open(str(f).replace(os.sep + 'annotations' + os.sep, os.sep + 'labels' + os.sep), 'w') as fl:\n              fl.writelines(lines)\n\n\n  # Download\n  dir = Path(yaml['path'])  # dataset root dir\n  urls = ['https://github.com/ultralytics/yolov5/releases/download/v1.0/VisDrone2019-DET-train.zip',\n          'https://github.com/ultralytics/yolov5/releases/download/v1.0/VisDrone2019-DET-val.zip',\n          'https://github.com/ultralytics/yolov5/releases/download/v1.0/VisDrone2019-DET-test-dev.zip',\n          'https://github.com/ultralytics/yolov5/releases/download/v1.0/VisDrone2019-DET-test-challenge.zip']\n  download(urls, dir=dir, curl=True, threads=4)\n\n  # Convert\n  for d in 'VisDrone2019-DET-train', 'VisDrone2019-DET-val', 'VisDrone2019-DET-test-dev':\n      visdrone2yolo(dir / d)  # convert VisDrone annotations to YOLO labels\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/coco.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# COCO 2017 dataset http://cocodataset.org by Microsoft\n# Example usage: python train.py --data coco.yaml\n# parent\n# ├── yolov5\n# └── datasets\n#     └── coco  ← downloads here (20.1 GB)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/coco  # dataset root dir\ntrain: train2017.txt  # train images (relative to 'path') 118287 images\nval: val2017.txt  # val images (relative to 'path') 5000 images\ntest: test-dev2017.txt  # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794\n\n# Classes\nnames:\n  0: person\n  1: bicycle\n  2: car\n  3: motorcycle\n  4: airplane\n  5: bus\n  6: train\n  7: truck\n  8: boat\n  9: traffic light\n  10: fire hydrant\n  11: stop sign\n  12: parking meter\n  13: bench\n  14: bird\n  15: cat\n  16: dog\n  17: horse\n  18: sheep\n  19: cow\n  20: elephant\n  21: bear\n  22: zebra\n  23: giraffe\n  24: backpack\n  25: umbrella\n  26: handbag\n  27: tie\n  28: suitcase\n  29: frisbee\n  30: skis\n  31: snowboard\n  32: sports ball\n  33: kite\n  34: baseball bat\n  35: baseball glove\n  36: skateboard\n  37: surfboard\n  38: tennis racket\n  39: bottle\n  40: wine glass\n  41: cup\n  42: fork\n  43: knife\n  44: spoon\n  45: bowl\n  46: banana\n  47: apple\n  48: sandwich\n  49: orange\n  50: broccoli\n  51: carrot\n  52: hot dog\n  53: pizza\n  54: donut\n  55: cake\n  56: chair\n  57: couch\n  58: potted plant\n  59: bed\n  60: dining table\n  61: toilet\n  62: tv\n  63: laptop\n  64: mouse\n  65: remote\n  66: keyboard\n  67: cell phone\n  68: microwave\n  69: oven\n  70: toaster\n  71: sink\n  72: refrigerator\n  73: book\n  74: clock\n  75: vase\n  76: scissors\n  77: teddy bear\n  78: hair drier\n  79: toothbrush\n\n\n# Download script/URL (optional)\ndownload: |\n  from utils.general import download, Path\n\n\n  # Download labels\n  segments = 
False  # segment or box labels\n  dir = Path(yaml['path'])  # dataset root dir\n  url = 'https://github.com/ultralytics/yolov5/releases/download/v1.0/'\n  urls = [url + ('coco2017labels-segments.zip' if segments else 'coco2017labels.zip')]  # labels\n  download(urls, dir=dir.parent)\n\n  # Download data\n  urls = ['http://images.cocodataset.org/zips/train2017.zip',  # 19G, 118k images\n          'http://images.cocodataset.org/zips/val2017.zip',  # 1G, 5k images\n          'http://images.cocodataset.org/zips/test2017.zip']  # 7G, 41k images (optional)\n  download(urls, dir=dir / 'images', threads=3)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/coco128-seg.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# COCO128-seg dataset https://www.kaggle.com/ultralytics/coco128 (first 128 images from COCO train2017) by Ultralytics\n# Example usage: python train.py --data coco128-seg.yaml\n# parent\n# ├── yolov5\n# └── datasets\n#     └── coco128-seg  ← downloads here (7 MB)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/coco128-seg  # dataset root dir\ntrain: images/train2017  # train images (relative to 'path') 128 images\nval: images/train2017  # val images (relative to 'path') 128 images\ntest:  # test images (optional)\n\n# Classes\nnames:\n  0: person\n  1: bicycle\n  2: car\n  3: motorcycle\n  4: airplane\n  5: bus\n  6: train\n  7: truck\n  8: boat\n  9: traffic light\n  10: fire hydrant\n  11: stop sign\n  12: parking meter\n  13: bench\n  14: bird\n  15: cat\n  16: dog\n  17: horse\n  18: sheep\n  19: cow\n  20: elephant\n  21: bear\n  22: zebra\n  23: giraffe\n  24: backpack\n  25: umbrella\n  26: handbag\n  27: tie\n  28: suitcase\n  29: frisbee\n  30: skis\n  31: snowboard\n  32: sports ball\n  33: kite\n  34: baseball bat\n  35: baseball glove\n  36: skateboard\n  37: surfboard\n  38: tennis racket\n  39: bottle\n  40: wine glass\n  41: cup\n  42: fork\n  43: knife\n  44: spoon\n  45: bowl\n  46: banana\n  47: apple\n  48: sandwich\n  49: orange\n  50: broccoli\n  51: carrot\n  52: hot dog\n  53: pizza\n  54: donut\n  55: cake\n  56: chair\n  57: couch\n  58: potted plant\n  59: bed\n  60: dining table\n  61: toilet\n  62: tv\n  63: laptop\n  64: mouse\n  65: remote\n  66: keyboard\n  67: cell phone\n  68: microwave\n  69: oven\n  70: toaster\n  71: sink\n  72: refrigerator\n  73: book\n  74: clock\n  75: vase\n  76: scissors\n  77: teddy bear\n  78: hair drier\n  79: toothbrush\n\n\n# Download script/URL (optional)\ndownload: https://ultralytics.com/assets/coco128-seg.zip\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/coco128.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# COCO128 dataset https://www.kaggle.com/ultralytics/coco128 (first 128 images from COCO train2017) by Ultralytics\n# Example usage: python train.py --data coco128.yaml\n# parent\n# ├── yolov5\n# └── datasets\n#     └── coco128  ← downloads here (7 MB)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/coco128  # dataset root dir\ntrain: images/train2017  # train images (relative to 'path') 128 images\nval: images/train2017  # val images (relative to 'path') 128 images\ntest:  # test images (optional)\n\n# Classes\nnames:\n  0: person\n  1: bicycle\n  2: car\n  3: motorcycle\n  4: airplane\n  5: bus\n  6: train\n  7: truck\n  8: boat\n  9: traffic light\n  10: fire hydrant\n  11: stop sign\n  12: parking meter\n  13: bench\n  14: bird\n  15: cat\n  16: dog\n  17: horse\n  18: sheep\n  19: cow\n  20: elephant\n  21: bear\n  22: zebra\n  23: giraffe\n  24: backpack\n  25: umbrella\n  26: handbag\n  27: tie\n  28: suitcase\n  29: frisbee\n  30: skis\n  31: snowboard\n  32: sports ball\n  33: kite\n  34: baseball bat\n  35: baseball glove\n  36: skateboard\n  37: surfboard\n  38: tennis racket\n  39: bottle\n  40: wine glass\n  41: cup\n  42: fork\n  43: knife\n  44: spoon\n  45: bowl\n  46: banana\n  47: apple\n  48: sandwich\n  49: orange\n  50: broccoli\n  51: carrot\n  52: hot dog\n  53: pizza\n  54: donut\n  55: cake\n  56: chair\n  57: couch\n  58: potted plant\n  59: bed\n  60: dining table\n  61: toilet\n  62: tv\n  63: laptop\n  64: mouse\n  65: remote\n  66: keyboard\n  67: cell phone\n  68: microwave\n  69: oven\n  70: toaster\n  71: sink\n  72: refrigerator\n  73: book\n  74: clock\n  75: vase\n  76: scissors\n  77: teddy bear\n  78: hair drier\n  79: toothbrush\n\n\n# Download script/URL (optional)\ndownload: https://ultralytics.com/assets/coco128.zip\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/hyps/hyp.Objects365.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Hyperparameters for Objects365 training\n# python train.py --weights yolov5m.pt --data Objects365.yaml --evolve\n# See Hyperparameter Evolution tutorial for details https://github.com/ultralytics/yolov5#tutorials\n\nlr0: 0.00258\nlrf: 0.17\nmomentum: 0.779\nweight_decay: 0.00058\nwarmup_epochs: 1.33\nwarmup_momentum: 0.86\nwarmup_bias_lr: 0.0711\nbox: 0.0539\ncls: 0.299\ncls_pw: 0.825\nobj: 0.632\nobj_pw: 1.0\niou_t: 0.2\nanchor_t: 3.44\nanchors: 3.2\nfl_gamma: 0.0\nhsv_h: 0.0188\nhsv_s: 0.704\nhsv_v: 0.36\ndegrees: 0.0\ntranslate: 0.0902\nscale: 0.491\nshear: 0.0\nperspective: 0.0\nflipud: 0.0\nfliplr: 0.5\nmosaic: 1.0\nmixup: 0.0\ncopy_paste: 0.0\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/hyps/hyp.VOC.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Hyperparameters for VOC training\n# python train.py --batch 128 --weights yolov5m6.pt --data VOC.yaml --epochs 50 --img 512 --hyp hyp.scratch-med.yaml --evolve\n# See Hyperparameter Evolution tutorial for details https://github.com/ultralytics/yolov5#tutorials\n\n# YOLOv5 Hyperparameter Evolution Results\n# Best generation: 467\n# Last generation: 996\n#    metrics/precision,       metrics/recall,      metrics/mAP_0.5, metrics/mAP_0.5:0.95,         val/box_loss,         val/obj_loss,         val/cls_loss\n#              0.87729,              0.85125,              0.91286,              0.72664,            0.0076739,            0.0042529,            0.0013865\n\nlr0: 0.00334\nlrf: 0.15135\nmomentum: 0.74832\nweight_decay: 0.00025\nwarmup_epochs: 3.3835\nwarmup_momentum: 0.59462\nwarmup_bias_lr: 0.18657\nbox: 0.02\ncls: 0.21638\ncls_pw: 0.5\nobj: 0.51728\nobj_pw: 0.67198\niou_t: 0.2\nanchor_t: 3.3744\nfl_gamma: 0.0\nhsv_h: 0.01041\nhsv_s: 0.54703\nhsv_v: 0.27739\ndegrees: 0.0\ntranslate: 0.04591\nscale: 0.75544\nshear: 0.0\nperspective: 0.0\nflipud: 0.0\nfliplr: 0.5\nmosaic: 0.85834\nmixup: 0.04266\ncopy_paste: 0.0\nanchors: 3.412\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/hyps/hyp.no-augmentation.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Hyperparameters for use with the Albumentations framework\n# python train.py --hyp hyp.no-augmentation.yaml\n# See https://github.com/ultralytics/yolov5/pull/3882 for YOLOv5 + Albumentations Usage examples\n\nlr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)\nlrf: 0.1  # final OneCycleLR learning rate (lr0 * lrf)\nmomentum: 0.937  # SGD momentum/Adam beta1\nweight_decay: 0.0005  # optimizer weight decay 5e-4\nwarmup_epochs: 3.0  # warmup epochs (fractions ok)\nwarmup_momentum: 0.8  # warmup initial momentum\nwarmup_bias_lr: 0.1  # warmup initial bias lr\nbox: 0.05  # box loss gain\ncls: 0.3  # cls loss gain\ncls_pw: 1.0  # cls BCELoss positive_weight\nobj: 0.7  # obj loss gain (scale with pixels)\nobj_pw: 1.0  # obj BCELoss positive_weight\niou_t: 0.20  # IoU training threshold\nanchor_t: 4.0  # anchor-multiple threshold\n# anchors: 3  # anchors per output layer (0 to ignore)\n# these parameters are all zero since we want to use the Albumentations framework\nfl_gamma: 0.0  # focal loss gamma (efficientDet default gamma=1.5)\nhsv_h: 0  # image HSV-Hue augmentation (fraction)\nhsv_s: 0  # image HSV-Saturation augmentation (fraction)\nhsv_v: 0  # image HSV-Value augmentation (fraction)\ndegrees: 0.0  # image rotation (+/- deg)\ntranslate: 0  # image translation (+/- fraction)\nscale: 0  # image scale (+/- gain)\nshear: 0  # image shear (+/- deg)\nperspective: 0.0  # image perspective (+/- fraction), range 0-0.001\nflipud: 0.0  # image flip up-down (probability)\nfliplr: 0.0  # image flip left-right (probability)\nmosaic: 0.0  # image mosaic (probability)\nmixup: 0.0  # image mixup (probability)\ncopy_paste: 0.0  # segment copy-paste (probability)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/hyps/hyp.scratch-high.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Hyperparameters for high-augmentation COCO training from scratch\n# python train.py --batch 32 --cfg yolov5m6.yaml --weights '' --data coco.yaml --img 1280 --epochs 300\n# See tutorials for hyperparameter evolution https://github.com/ultralytics/yolov5#tutorials\n\nlr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)\nlrf: 0.1  # final OneCycleLR learning rate (lr0 * lrf)\nmomentum: 0.937  # SGD momentum/Adam beta1\nweight_decay: 0.0005  # optimizer weight decay 5e-4\nwarmup_epochs: 3.0  # warmup epochs (fractions ok)\nwarmup_momentum: 0.8  # warmup initial momentum\nwarmup_bias_lr: 0.1  # warmup initial bias lr\nbox: 0.05  # box loss gain\ncls: 0.3  # cls loss gain\ncls_pw: 1.0  # cls BCELoss positive_weight\nobj: 0.7  # obj loss gain (scale with pixels)\nobj_pw: 1.0  # obj BCELoss positive_weight\niou_t: 0.20  # IoU training threshold\nanchor_t: 4.0  # anchor-multiple threshold\n# anchors: 3  # anchors per output layer (0 to ignore)\nfl_gamma: 0.0  # focal loss gamma (efficientDet default gamma=1.5)\nhsv_h: 0.015  # image HSV-Hue augmentation (fraction)\nhsv_s: 0.7  # image HSV-Saturation augmentation (fraction)\nhsv_v: 0.4  # image HSV-Value augmentation (fraction)\ndegrees: 0.0  # image rotation (+/- deg)\ntranslate: 0.1  # image translation (+/- fraction)\nscale: 0.9  # image scale (+/- gain)\nshear: 0.0  # image shear (+/- deg)\nperspective: 0.0  # image perspective (+/- fraction), range 0-0.001\nflipud: 0.0  # image flip up-down (probability)\nfliplr: 0.5  # image flip left-right (probability)\nmosaic: 1.0  # image mosaic (probability)\nmixup: 0.1  # image mixup (probability)\ncopy_paste: 0.1  # segment copy-paste (probability)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/hyps/hyp.scratch-low.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Hyperparameters for low-augmentation COCO training from scratch\n# python train.py --batch 64 --cfg yolov5n6.yaml --weights '' --data coco.yaml --img 640 --epochs 300 --linear\n# See tutorials for hyperparameter evolution https://github.com/ultralytics/yolov5#tutorials\n\nlr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)\nlrf: 0.01  # final OneCycleLR learning rate (lr0 * lrf)\nmomentum: 0.937  # SGD momentum/Adam beta1\nweight_decay: 0.0005  # optimizer weight decay 5e-4\nwarmup_epochs: 3.0  # warmup epochs (fractions ok)\nwarmup_momentum: 0.8  # warmup initial momentum\nwarmup_bias_lr: 0.1  # warmup initial bias lr\nbox: 0.05  # box loss gain\ncls: 0.5  # cls loss gain\ncls_pw: 1.0  # cls BCELoss positive_weight\nobj: 1.0  # obj loss gain (scale with pixels)\nobj_pw: 1.0  # obj BCELoss positive_weight\niou_t: 0.20  # IoU training threshold\nanchor_t: 4.0  # anchor-multiple threshold\n# anchors: 3  # anchors per output layer (0 to ignore)\nfl_gamma: 0.0  # focal loss gamma (efficientDet default gamma=1.5)\nhsv_h: 0.015  # image HSV-Hue augmentation (fraction)\nhsv_s: 0.7  # image HSV-Saturation augmentation (fraction)\nhsv_v: 0.4  # image HSV-Value augmentation (fraction)\ndegrees: 0.0  # image rotation (+/- deg)\ntranslate: 0.1  # image translation (+/- fraction)\nscale: 0.5  # image scale (+/- gain)\nshear: 0.0  # image shear (+/- deg)\nperspective: 0.0  # image perspective (+/- fraction), range 0-0.001\nflipud: 0.0  # image flip up-down (probability)\nfliplr: 0.5  # image flip left-right (probability)\nmosaic: 1.0  # image mosaic (probability)\nmixup: 0.0  # image mixup (probability)\ncopy_paste: 0.0  # segment copy-paste (probability)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/hyps/hyp.scratch-med.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Hyperparameters for medium-augmentation COCO training from scratch\n# python train.py --batch 32 --cfg yolov5m6.yaml --weights '' --data coco.yaml --img 1280 --epochs 300\n# See tutorials for hyperparameter evolution https://github.com/ultralytics/yolov5#tutorials\n\nlr0: 0.01  # initial learning rate (SGD=1E-2, Adam=1E-3)\nlrf: 0.1  # final OneCycleLR learning rate (lr0 * lrf)\nmomentum: 0.937  # SGD momentum/Adam beta1\nweight_decay: 0.0005  # optimizer weight decay 5e-4\nwarmup_epochs: 3.0  # warmup epochs (fractions ok)\nwarmup_momentum: 0.8  # warmup initial momentum\nwarmup_bias_lr: 0.1  # warmup initial bias lr\nbox: 0.05  # box loss gain\ncls: 0.3  # cls loss gain\ncls_pw: 1.0  # cls BCELoss positive_weight\nobj: 0.7  # obj loss gain (scale with pixels)\nobj_pw: 1.0  # obj BCELoss positive_weight\niou_t: 0.20  # IoU training threshold\nanchor_t: 4.0  # anchor-multiple threshold\n# anchors: 3  # anchors per output layer (0 to ignore)\nfl_gamma: 0.0  # focal loss gamma (efficientDet default gamma=1.5)\nhsv_h: 0.015  # image HSV-Hue augmentation (fraction)\nhsv_s: 0.7  # image HSV-Saturation augmentation (fraction)\nhsv_v: 0.4  # image HSV-Value augmentation (fraction)\ndegrees: 0.0  # image rotation (+/- deg)\ntranslate: 0.1  # image translation (+/- fraction)\nscale: 0.9  # image scale (+/- gain)\nshear: 0.0  # image shear (+/- deg)\nperspective: 0.0  # image perspective (+/- fraction), range 0-0.001\nflipud: 0.0  # image flip up-down (probability)\nfliplr: 0.5  # image flip left-right (probability)\nmosaic: 1.0  # image mosaic (probability)\nmixup: 0.1  # image mixup (probability)\ncopy_paste: 0.0  # segment copy-paste (probability)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/scripts/download_weights.sh",
    "content": "#!/bin/bash\n# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Download latest models from https://github.com/ultralytics/yolov5/releases\n# Example usage: bash data/scripts/download_weights.sh\n# parent\n# └── yolov5\n#     ├── yolov5s.pt  ← downloads here\n#     ├── yolov5m.pt\n#     └── ...\n\npython - <<EOF\nfrom utils.downloads import attempt_download\n\np5 = list('nsmlx')  # P5 models\np6 = [f'{x}6' for x in p5]  # P6 models\ncls = [f'{x}-cls' for x in p5]  # classification models\nseg = [f'{x}-seg' for x in p5]  # segmentation models\n\nfor x in p5 + p6 + cls + seg:\n    attempt_download(f'weights/yolov5{x}.pt')\n\nEOF\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/scripts/get_coco.sh",
    "content": "#!/bin/bash\n# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Download COCO 2017 dataset http://cocodataset.org\n# Example usage: bash data/scripts/get_coco.sh\n# parent\n# ├── yolov5\n# └── datasets\n#     └── coco  ← downloads here\n\n# Arguments (optional) Usage: bash data/scripts/get_coco.sh --train --val --test --segments\nif [ \"$#\" -gt 0 ]; then\n  for opt in \"$@\"; do\n    case \"${opt}\" in\n    --train) train=true ;;\n    --val) val=true ;;\n    --test) test=true ;;\n    --segments) segments=true ;;\n    esac\n  done\nelse\n  train=true\n  val=true\n  test=false\n  segments=false\nfi\n\n# Download/unzip labels\nd='../datasets' # unzip directory\nurl=https://github.com/ultralytics/yolov5/releases/download/v1.0/\nif [ \"$segments\" == \"true\" ]; then\n  f='coco2017labels-segments.zip' # 168 MB\nelse\n  f='coco2017labels.zip' # 46 MB\nfi\necho 'Downloading' $url$f ' ...'\ncurl -L $url$f -o $f -# && unzip -q $f -d $d && rm $f &\n\n# Download/unzip images\nd='../datasets/coco/images' # unzip directory\nurl=http://images.cocodataset.org/zips/\nif [ \"$train\" == \"true\" ]; then\n  f='train2017.zip' # 19G, 118k images\n  echo 'Downloading' $url$f '...'\n  curl -L $url$f -o $f -# && unzip -q $f -d $d && rm $f &\nfi\nif [ \"$val\" == \"true\" ]; then\n  f='val2017.zip' # 1G, 5k images\n  echo 'Downloading' $url$f '...'\n  curl -L $url$f -o $f -# && unzip -q $f -d $d && rm $f &\nfi\nif [ \"$test\" == \"true\" ]; then\n  f='test2017.zip' # 7G, 41k images (optional)\n  echo 'Downloading' $url$f '...'\n  curl -L $url$f -o $f -# && unzip -q $f -d $d && rm $f &\nfi\nwait # finish background tasks\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/scripts/get_coco128.sh",
    "content": "#!/bin/bash\n# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Download COCO128 dataset https://www.kaggle.com/ultralytics/coco128 (first 128 images from COCO train2017)\n# Example usage: bash data/scripts/get_coco128.sh\n# parent\n# ├── yolov5\n# └── datasets\n#     └── coco128  ← downloads here\n\n# Download/unzip images and labels\nd='../datasets' # unzip directory\nurl=https://github.com/ultralytics/yolov5/releases/download/v1.0/\nf='coco128.zip' # or 'coco128-segments.zip', 68 MB\necho 'Downloading' $url$f ' ...'\ncurl -L $url$f -o $f -# && unzip -q $f -d $d && rm $f &\n\nwait # finish background tasks\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/scripts/get_imagenet.sh",
    "content": "#!/bin/bash\n# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Download ILSVRC2012 ImageNet dataset https://image-net.org\n# Example usage: bash data/scripts/get_imagenet.sh\n# parent\n# ├── yolov5\n# └── datasets\n#     └── imagenet  ← downloads here\n\n# Arguments (optional) Usage: bash data/scripts/get_imagenet.sh --train --val\nif [ \"$#\" -gt 0 ]; then\n  for opt in \"$@\"; do\n    case \"${opt}\" in\n    --train) train=true ;;\n    --val) val=true ;;\n    esac\n  done\nelse\n  train=true\n  val=true\nfi\n\n# Make dir\nd='../datasets/imagenet' # unzip directory\nmkdir -p $d && cd $d\n\n# Download/unzip train\nif [ \"$train\" == \"true\" ]; then\n  wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_train.tar # download 138G, 1281167 images\n  mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train\n  tar -xf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar\n  find . -name \"*.tar\" | while read NAME; do\n    mkdir -p \"${NAME%.tar}\"\n    tar -xf \"${NAME}\" -C \"${NAME%.tar}\"\n    rm -f \"${NAME}\"\n  done\n  cd ..\nfi\n\n# Download/unzip val\nif [ \"$val\" == \"true\" ]; then\n  wget https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar # download 6.3G, 50000 images\n  mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xf ILSVRC2012_img_val.tar\n  wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash # move into subdirs\nfi\n\n# Delete corrupted image (optional: PNG under JPEG name that may cause dataloaders to fail)\n# rm train/n04266014/n04266014_10835.JPEG\n\n# TFRecords (optional)\n# wget https://raw.githubusercontent.com/tensorflow/models/master/research/slim/datasets/imagenet_lsvrc_2015_synsets.txt\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/data/xView.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# DIUx xView 2018 Challenge https://challenge.xviewdataset.org by U.S. National Geospatial-Intelligence Agency (NGA)\n# --------  DOWNLOAD DATA MANUALLY and jar xf val_images.zip to 'datasets/xView' before running train command!  --------\n# Example usage: python train.py --data xView.yaml\n# parent\n# ├── yolov5\n# └── datasets\n#     └── xView  ← downloads here (20.7 GB)\n\n\n# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]\npath: ../datasets/xView  # dataset root dir\ntrain: images/autosplit_train.txt  # train images (relative to 'path') 90% of 847 train images\nval: images/autosplit_val.txt  # val images (relative to 'path') 10% of 847 train images\n\n# Classes\nnames:\n  0: Fixed-wing Aircraft\n  1: Small Aircraft\n  2: Cargo Plane\n  3: Helicopter\n  4: Passenger Vehicle\n  5: Small Car\n  6: Bus\n  7: Pickup Truck\n  8: Utility Truck\n  9: Truck\n  10: Cargo Truck\n  11: Truck w/Box\n  12: Truck Tractor\n  13: Trailer\n  14: Truck w/Flatbed\n  15: Truck w/Liquid\n  16: Crane Truck\n  17: Railway Vehicle\n  18: Passenger Car\n  19: Cargo Car\n  20: Flat Car\n  21: Tank car\n  22: Locomotive\n  23: Maritime Vessel\n  24: Motorboat\n  25: Sailboat\n  26: Tugboat\n  27: Barge\n  28: Fishing Vessel\n  29: Ferry\n  30: Yacht\n  31: Container Ship\n  32: Oil Tanker\n  33: Engineering Vehicle\n  34: Tower crane\n  35: Container Crane\n  36: Reach Stacker\n  37: Straddle Carrier\n  38: Mobile Crane\n  39: Dump Truck\n  40: Haul Truck\n  41: Scraper/Tractor\n  42: Front loader/Bulldozer\n  43: Excavator\n  44: Cement Mixer\n  45: Ground Grader\n  46: Hut/Tent\n  47: Shed\n  48: Building\n  49: Aircraft Hangar\n  50: Damaged Building\n  51: Facility\n  52: Construction Site\n  53: Vehicle Lot\n  54: Helipad\n  55: Storage Tank\n  56: Shipping container lot\n  57: Shipping Container\n  58: Pylon\n  59: Tower\n\n\n# Download script/URL 
(optional) ---------------------------------------------------------------------------------------\ndownload: |\n  import json\n  import os\n  from pathlib import Path\n\n  import numpy as np\n  from PIL import Image\n  from tqdm import tqdm\n\n  from utils.dataloaders import autosplit\n  from utils.general import download, xyxy2xywhn\n\n\n  def convert_labels(fname=Path('xView/xView_train.geojson')):\n      # Convert xView geoJSON labels to YOLO format\n      path = fname.parent\n      with open(fname) as f:\n          print(f'Loading {fname}...')\n          data = json.load(f)\n\n      # Make dirs\n      labels = Path(path / 'labels' / 'train')\n      os.system(f'rm -rf {labels}')\n      labels.mkdir(parents=True, exist_ok=True)\n\n      # xView classes 11-94 to 0-59\n      xview_class2index = [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 1, 2, -1, 3, -1, 4, 5, 6, 7, 8, -1, 9, 10, 11,\n                           12, 13, 14, 15, -1, -1, 16, 17, 18, 19, 20, 21, 22, -1, 23, 24, 25, -1, 26, 27, -1, 28, -1,\n                           29, 30, 31, 32, 33, 34, 35, 36, 37, -1, 38, 39, 40, 41, 42, 43, 44, 45, -1, -1, -1, -1, 46,\n                           47, 48, 49, -1, 50, 51, -1, 52, -1, -1, -1, 53, 54, -1, 55, -1, -1, 56, -1, 57, -1, 58, 59]\n\n      shapes = {}\n      for feature in tqdm(data['features'], desc=f'Converting {fname}'):\n          p = feature['properties']\n          if p['bounds_imcoords']:\n              id = p['image_id']\n              file = path / 'train_images' / id\n              if file.exists():  # 1395.tif missing\n                  try:\n                      box = np.array([int(num) for num in p['bounds_imcoords'].split(\",\")])\n                      assert box.shape[0] == 4, f'incorrect box shape {box.shape[0]}'\n                      cls = p['type_id']\n                      cls = xview_class2index[int(cls)]  # xView class to 0-59\n                      assert 59 >= cls >= 0, f'incorrect class index {cls}'\n\n                      # 
Write YOLO label\n                      if id not in shapes:\n                          shapes[id] = Image.open(file).size\n                      box = xyxy2xywhn(box[None].astype(float), w=shapes[id][0], h=shapes[id][1], clip=True)\n                      with open((labels / id).with_suffix('.txt'), 'a') as f:\n                          f.write(f\"{cls} {' '.join(f'{x:.6f}' for x in box[0])}\\n\")  # write label.txt\n                  except Exception as e:\n                      print(f'WARNING: skipping one label for {file}: {e}')\n\n\n  # Download manually from https://challenge.xviewdataset.org\n  dir = Path(yaml['path'])  # dataset root dir\n  # urls = ['https://d307kc0mrhucc3.cloudfront.net/train_labels.zip',  # train labels\n  #         'https://d307kc0mrhucc3.cloudfront.net/train_images.zip',  # 15G, 847 train images\n  #         'https://d307kc0mrhucc3.cloudfront.net/val_images.zip']  # 5G, 282 val images (no labels)\n  # download(urls, dir=dir, delete=False)\n\n  # Convert labels\n  convert_labels(dir / 'xView_train.geojson')\n\n  # Move images\n  images = Path(dir / 'images')\n  images.mkdir(parents=True, exist_ok=True)\n  Path(dir / 'train_images').rename(dir / 'images' / 'train')\n  Path(dir / 'val_images').rename(dir / 'images' / 'val')\n\n  # Split\n  autosplit(dir / 'images' / 'train')\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/detect.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nRun YOLOv5 detection inference on images, videos, directories, globs, YouTube, webcam, streams, etc.\n\nUsage - sources:\n    $ python detect.py --weights yolov5s.pt --source 0                               # webcam\n                                                     img.jpg                         # image\n                                                     vid.mp4                         # video\n                                                     screen                          # screenshot\n                                                     path/                           # directory\n                                                     list.txt                        # list of images\n                                                     list.streams                    # list of streams\n                                                     'path/*.jpg'                    # glob\n                                                     'https://youtu.be/Zgi9g1ksQHc'  # YouTube\n                                                     'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream\n\nUsage - formats:\n    $ python detect.py --weights yolov5s.pt                 # PyTorch\n                                 yolov5s.torchscript        # TorchScript\n                                 yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn\n                                 yolov5s_openvino_model     # OpenVINO\n                                 yolov5s.engine             # TensorRT\n                                 yolov5s.mlmodel            # CoreML (macOS-only)\n                                 yolov5s_saved_model        # TensorFlow SavedModel\n                                 yolov5s.pb                 # TensorFlow GraphDef\n                                 yolov5s.tflite             # TensorFlow Lite\n                                 yolov5s_edgetpu.tflite     # TensorFlow Edge 
TPU\n                                 yolov5s_paddle_model       # PaddlePaddle\n\"\"\"\n\nimport argparse\nimport os\nimport platform\nimport sys\nfrom pathlib import Path\n\nimport torch\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[0]  # YOLOv5 root directory\nif str(ROOT) not in sys.path:\n    sys.path.append(str(ROOT))  # add ROOT to PATH\nROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative\n\nfrom models.common import DetectMultiBackend\nfrom utils.dataloaders import IMG_FORMATS, VID_FORMATS, LoadImages, LoadScreenshots, LoadStreams\nfrom utils.general import (LOGGER, Profile, check_file, check_img_size, check_imshow, check_requirements, colorstr, cv2,\n                           increment_path, non_max_suppression, print_args, scale_boxes, strip_optimizer, xyxy2xywh)\nfrom utils.plots import Annotator, colors, save_one_box\nfrom utils.torch_utils import select_device, smart_inference_mode\n\n\n@smart_inference_mode()\ndef run(\n        weights=ROOT / 'yolov5s.pt',  # model path or triton URL\n        source=ROOT / 'data/images',  # file/dir/URL/glob/screen/0(webcam)\n        data=ROOT / 'data/coco128.yaml',  # dataset.yaml path\n        imgsz=(640, 640),  # inference size (height, width)\n        conf_thres=0.25,  # confidence threshold\n        iou_thres=0.45,  # NMS IOU threshold\n        max_det=1000,  # maximum detections per image\n        device='',  # cuda device, i.e. 
0 or 0,1,2,3 or cpu\n        view_img=False,  # show results\n        save_txt=False,  # save results to *.txt\n        save_conf=False,  # save confidences in --save-txt labels\n        save_crop=False,  # save cropped prediction boxes\n        nosave=False,  # do not save images/videos\n        classes=None,  # filter by class: --classes 0, or --classes 0 2 3\n        agnostic_nms=False,  # class-agnostic NMS\n        augment=False,  # augmented inference\n        visualize=False,  # visualize features\n        update=False,  # update all models\n        project=ROOT / 'runs/detect',  # save results to project/name\n        name='exp',  # save results to project/name\n        exist_ok=False,  # existing project/name ok, do not increment\n        line_thickness=3,  # bounding box thickness (pixels)\n        hide_labels=False,  # hide labels\n        hide_conf=False,  # hide confidences\n        half=False,  # use FP16 half-precision inference\n        dnn=False,  # use OpenCV DNN for ONNX inference\n        vid_stride=1,  # video frame-rate stride\n):\n    source = str(source)\n    save_img = not nosave and not source.endswith('.txt')  # save inference images\n    is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)\n    is_url = source.lower().startswith(('rtsp://', 'rtmp://', 'http://', 'https://'))\n    webcam = source.isnumeric() or source.endswith('.streams') or (is_url and not is_file)\n    screenshot = source.lower().startswith('screen')\n    if is_url and is_file:\n        source = check_file(source)  # download\n\n    # Directories\n    save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run\n    (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir\n\n    # Load model\n    device = select_device(device)\n    model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)\n    stride, names, pt = model.stride, model.names, model.pt\n    imgsz = 
check_img_size(imgsz, s=stride)  # check image size\n\n    # Dataloader\n    bs = 1  # batch_size\n    if webcam:\n        view_img = check_imshow(warn=True)\n        dataset = LoadStreams(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)\n        bs = len(dataset)\n    elif screenshot:\n        dataset = LoadScreenshots(source, img_size=imgsz, stride=stride, auto=pt)\n    else:\n        dataset = LoadImages(source, img_size=imgsz, stride=stride, auto=pt, vid_stride=vid_stride)\n    vid_path, vid_writer = [None] * bs, [None] * bs\n\n    # Run inference\n    model.warmup(imgsz=(1 if pt or model.triton else bs, 3, *imgsz))  # warmup\n    seen, windows, dt = 0, [], (Profile(), Profile(), Profile())\n    for path, im, im0s, vid_cap, s in dataset:\n        with dt[0]:\n            im = torch.from_numpy(im).to(model.device)\n            im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32\n            im /= 255  # 0 - 255 to 0.0 - 1.0\n            if len(im.shape) == 3:\n                im = im[None]  # expand for batch dim\n\n        # Inference\n        with dt[1]:\n            visualize = increment_path(save_dir / Path(path).stem, mkdir=True) if visualize else False\n            pred = model(im, augment=augment, visualize=visualize)\n\n        # NMS\n        with dt[2]:\n            pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)\n\n        # Second-stage classifier (optional)\n        # pred = utils.general.apply_classifier(pred, classifier_model, im, im0s)\n\n        # Process predictions\n        for i, det in enumerate(pred):  # per image\n            seen += 1\n            if webcam:  # batch_size >= 1\n                p, im0, frame = path[i], im0s[i].copy(), dataset.count\n                s += f'{i}: '\n            else:\n                p, im0, frame = path, im0s.copy(), getattr(dataset, 'frame', 0)\n\n            p = Path(p)  # to Path\n            save_path = str(save_dir / 
p.name)  # im.jpg\n            txt_path = str(save_dir / 'labels' / p.stem) + ('' if dataset.mode == 'image' else f'_{frame}')  # im.txt\n            s += '%gx%g ' % im.shape[2:]  # print string\n            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh\n            imc = im0.copy() if save_crop else im0  # for save_crop\n            annotator = Annotator(im0, line_width=line_thickness, example=str(names))\n            if len(det):\n                # Rescale boxes from img_size to im0 size\n                det[:, :4] = scale_boxes(im.shape[2:], det[:, :4], im0.shape).round()\n\n                # Print results\n                for c in det[:, 5].unique():\n                    n = (det[:, 5] == c).sum()  # detections per class\n                    s += f\"{n} {names[int(c)]}{'s' * (n > 1)}, \"  # add to string\n\n                # Write results\n                for *xyxy, conf, cls in reversed(det):\n                    if save_txt:  # Write to file\n                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh\n                        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format\n                        with open(f'{txt_path}.txt', 'a') as f:\n                            f.write(('%g ' * len(line)).rstrip() % line + '\\n')\n\n                    if save_img or save_crop or view_img:  # Add bbox to image\n                        c = int(cls)  # integer class\n                        label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')\n                        annotator.box_label(xyxy, label, color=colors(c, True))\n                    if save_crop:\n                        save_one_box(xyxy, imc, file=save_dir / 'crops' / names[c] / f'{p.stem}.jpg', BGR=True)\n\n            # Stream results\n            im0 = annotator.result()\n            if view_img:\n                if platform.system() == 'Linux' and p not in windows:\n      
              windows.append(p)\n                    cv2.namedWindow(str(p), cv2.WINDOW_NORMAL | cv2.WINDOW_KEEPRATIO)  # allow window resize (Linux)\n                    cv2.resizeWindow(str(p), im0.shape[1], im0.shape[0])\n                cv2.imshow(str(p), im0)\n                cv2.waitKey(1)  # 1 millisecond\n\n            # Save results (image with detections)\n            if save_img:\n                if dataset.mode == 'image':\n                    cv2.imwrite(save_path, im0)\n                else:  # 'video' or 'stream'\n                    if vid_path[i] != save_path:  # new video\n                        vid_path[i] = save_path\n                        if isinstance(vid_writer[i], cv2.VideoWriter):\n                            vid_writer[i].release()  # release previous video writer\n                        if vid_cap:  # video\n                            fps = vid_cap.get(cv2.CAP_PROP_FPS)\n                            w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))\n                            h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))\n                        else:  # stream\n                            fps, w, h = 30, im0.shape[1], im0.shape[0]\n                        save_path = str(Path(save_path).with_suffix('.mp4'))  # force *.mp4 suffix on results videos\n                        vid_writer[i] = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))\n                    vid_writer[i].write(im0)\n\n        # Print time (inference-only)\n        LOGGER.info(f\"{s}{'' if len(det) else '(no detections), '}{dt[1].dt * 1E3:.1f}ms\")\n\n    # Print results\n    t = tuple(x.t / seen * 1E3 for x in dt)  # speeds per image\n    LOGGER.info(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {(1, 3, *imgsz)}' % t)\n    if save_txt or save_img:\n        s = f\"\\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}\" if save_txt else ''\n        LOGGER.info(f\"Results saved to 
{colorstr('bold', save_dir)}{s}\")\n    if update:\n        strip_optimizer(weights[0])  # update model (to fix SourceChangeWarning)\n\n\ndef parse_opt():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.pt', help='model path or triton URL')\n    parser.add_argument('--source', type=str, default=ROOT / 'data/images', help='file/dir/URL/glob/screen/0(webcam)')\n    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='(optional) dataset.yaml path')\n    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')\n    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')\n    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')\n    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')\n    parser.add_argument('--device', default='', help='cuda device, i.e. 
0 or 0,1,2,3 or cpu')\n    parser.add_argument('--view-img', action='store_true', help='show results')\n    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')\n    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')\n    parser.add_argument('--save-crop', action='store_true', help='save cropped prediction boxes')\n    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')\n    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --classes 0, or --classes 0 2 3')\n    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')\n    parser.add_argument('--augment', action='store_true', help='augmented inference')\n    parser.add_argument('--visualize', action='store_true', help='visualize features')\n    parser.add_argument('--update', action='store_true', help='update all models')\n    parser.add_argument('--project', default=ROOT / 'runs/detect', help='save results to project/name')\n    parser.add_argument('--name', default='exp', help='save results to project/name')\n    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')\n    parser.add_argument('--line-thickness', default=3, type=int, help='bounding box thickness (pixels)')\n    parser.add_argument('--hide-labels', default=False, action='store_true', help='hide labels')\n    parser.add_argument('--hide-conf', default=False, action='store_true', help='hide confidences')\n    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')\n    parser.add_argument('--dnn', action='store_true', help='use OpenCV DNN for ONNX inference')\n    parser.add_argument('--vid-stride', type=int, default=1, help='video frame-rate stride')\n    opt = parser.parse_args()\n    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand\n    print_args(vars(opt))\n    return opt\n\n\ndef 
main(opt):\n    check_requirements(exclude=('tensorboard', 'thop'))\n    run(**vars(opt))\n\n\nif __name__ == '__main__':\n    opt = parse_opt()\n    main(opt)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/export.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nExport a YOLOv5 PyTorch model to other formats. TensorFlow exports authored by https://github.com/zldrobit\n\nFormat                      | `export.py --include`         | Model\n---                         | ---                           | ---\nPyTorch                     | -                             | yolov5s.pt\nTorchScript                 | `torchscript`                 | yolov5s.torchscript\nONNX                        | `onnx`                        | yolov5s.onnx\nOpenVINO                    | `openvino`                    | yolov5s_openvino_model/\nTensorRT                    | `engine`                      | yolov5s.engine\nCoreML                      | `coreml`                      | yolov5s.mlmodel\nTensorFlow SavedModel       | `saved_model`                 | yolov5s_saved_model/\nTensorFlow GraphDef         | `pb`                          | yolov5s.pb\nTensorFlow Lite             | `tflite`                      | yolov5s.tflite\nTensorFlow Edge TPU         | `edgetpu`                     | yolov5s_edgetpu.tflite\nTensorFlow.js               | `tfjs`                        | yolov5s_web_model/\nPaddlePaddle                | `paddle`                      | yolov5s_paddle_model/\n\nRequirements:\n    $ pip install -r requirements.txt coremltools onnx onnx-simplifier onnxruntime openvino-dev tensorflow-cpu  # CPU\n    $ pip install -r requirements.txt coremltools onnx onnx-simplifier onnxruntime-gpu openvino-dev tensorflow  # GPU\n\nUsage:\n    $ python export.py --weights yolov5s.pt --include torchscript onnx openvino engine coreml tflite ...\n\nInference:\n    $ python detect.py --weights yolov5s.pt                 # PyTorch\n                                 yolov5s.torchscript        # TorchScript\n                                 yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn\n                                 yolov5s_openvino_model     # OpenVINO\n                
                 yolov5s.engine             # TensorRT\n                                 yolov5s.mlmodel            # CoreML (macOS-only)\n                                 yolov5s_saved_model        # TensorFlow SavedModel\n                                 yolov5s.pb                 # TensorFlow GraphDef\n                                 yolov5s.tflite             # TensorFlow Lite\n                                 yolov5s_edgetpu.tflite     # TensorFlow Edge TPU\n                                 yolov5s_paddle_model       # PaddlePaddle\n\nTensorFlow.js:\n    $ cd .. && git clone https://github.com/zldrobit/tfjs-yolov5-example.git && cd tfjs-yolov5-example\n    $ npm install\n    $ ln -s ../../yolov5/yolov5s_web_model public/yolov5s_web_model\n    $ npm start\n\"\"\"\n\nimport argparse\nimport contextlib\nimport json\nimport os\nimport platform\nimport re\nimport subprocess\nimport sys\nimport time\nimport warnings\nfrom pathlib import Path\n\nimport pandas as pd\nimport torch\nfrom torch.utils.mobile_optimizer import optimize_for_mobile\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[0]  # YOLOv5 root directory\nif str(ROOT) not in sys.path:\n    sys.path.append(str(ROOT))  # add ROOT to PATH\nif platform.system() != 'Windows':\n    ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative\n\nfrom models.experimental import attempt_load\nfrom models.yolo import ClassificationModel, Detect, DetectionModel, SegmentationModel\nfrom utils.dataloaders import LoadImages\nfrom utils.general import (LOGGER, Profile, check_dataset, check_img_size, check_requirements, check_version,\n                           check_yaml, colorstr, file_size, get_default_args, print_args, url2file, yaml_save)\nfrom utils.torch_utils import select_device, smart_inference_mode\n\nMACOS = platform.system() == 'Darwin'  # macOS environment\n\n\ndef export_formats():\n    # YOLOv5 export formats\n    x = [\n        ['PyTorch', '-', '.pt', True, True],\n        ['TorchScript', 
'torchscript', '.torchscript', True, True],\n        ['ONNX', 'onnx', '.onnx', True, True],\n        ['OpenVINO', 'openvino', '_openvino_model', True, False],\n        ['TensorRT', 'engine', '.engine', False, True],\n        ['CoreML', 'coreml', '.mlmodel', True, False],\n        ['TensorFlow SavedModel', 'saved_model', '_saved_model', True, True],\n        ['TensorFlow GraphDef', 'pb', '.pb', True, True],\n        ['TensorFlow Lite', 'tflite', '.tflite', True, False],\n        ['TensorFlow Edge TPU', 'edgetpu', '_edgetpu.tflite', False, False],\n        ['TensorFlow.js', 'tfjs', '_web_model', False, False],\n        ['PaddlePaddle', 'paddle', '_paddle_model', True, True],]\n    return pd.DataFrame(x, columns=['Format', 'Argument', 'Suffix', 'CPU', 'GPU'])\n\n\ndef try_export(inner_func):\n    # YOLOv5 export decorator, i.e. @try_export\n    inner_args = get_default_args(inner_func)\n\n    def outer_func(*args, **kwargs):\n        prefix = inner_args['prefix']\n        try:\n            with Profile() as dt:\n                f, model = inner_func(*args, **kwargs)\n            LOGGER.info(f'{prefix} export success ✅ {dt.t:.1f}s, saved as {f} ({file_size(f):.1f} MB)')\n            return f, model\n        except Exception as e:\n            LOGGER.info(f'{prefix} export failure ❌ {dt.t:.1f}s: {e}')\n            return None, None\n\n    return outer_func\n\n\n@try_export\ndef export_torchscript(model, im, file, optimize, prefix=colorstr('TorchScript:')):\n    # YOLOv5 TorchScript model export\n    LOGGER.info(f'\\n{prefix} starting export with torch {torch.__version__}...')\n    f = file.with_suffix('.torchscript')\n\n    ts = torch.jit.trace(model, im, strict=False)\n    d = {'shape': im.shape, 'stride': int(max(model.stride)), 'names': model.names}\n    extra_files = {'config.txt': json.dumps(d)}  # torch._C.ExtraFilesMap()\n    if optimize:  # https://pytorch.org/tutorials/recipes/mobile_interpreter.html\n        
optimize_for_mobile(ts)._save_for_lite_interpreter(str(f), _extra_files=extra_files)\n    else:\n        ts.save(str(f), _extra_files=extra_files)\n    return f, None\n\n\n@try_export\ndef export_onnx(model, im, file, opset, dynamic, simplify, prefix=colorstr('ONNX:')):\n    # YOLOv5 ONNX export\n    check_requirements('onnx>=1.12.0')\n    import onnx\n\n    LOGGER.info(f'\\n{prefix} starting export with onnx {onnx.__version__}...')\n    f = file.with_suffix('.onnx')\n\n    output_names = ['output0', 'output1'] if isinstance(model, SegmentationModel) else ['output0']\n    if dynamic:\n        dynamic = {'images': {0: 'batch', 2: 'height', 3: 'width'}}  # shape(1,3,640,640)\n        if isinstance(model, SegmentationModel):\n            dynamic['output0'] = {0: 'batch', 1: 'anchors'}  # shape(1,25200,85)\n            dynamic['output1'] = {0: 'batch', 2: 'mask_height', 3: 'mask_width'}  # shape(1,32,160,160)\n        elif isinstance(model, DetectionModel):\n            dynamic['output0'] = {0: 'batch', 1: 'anchors'}  # shape(1,25200,85)\n\n    torch.onnx.export(\n        model.cpu() if dynamic else model,  # --dynamic only compatible with cpu\n        im.cpu() if dynamic else im,\n        f,\n        verbose=False,\n        opset_version=opset,\n        do_constant_folding=True,  # WARNING: DNN inference with torch>=1.12 may require do_constant_folding=False\n        input_names=['images'],\n        output_names=output_names,\n        dynamic_axes=dynamic or None)\n\n    # Checks\n    model_onnx = onnx.load(f)  # load onnx model\n    onnx.checker.check_model(model_onnx)  # check onnx model\n\n    # Metadata\n    d = {'stride': int(max(model.stride)), 'names': model.names}\n    for k, v in d.items():\n        meta = model_onnx.metadata_props.add()\n        meta.key, meta.value = k, str(v)\n    onnx.save(model_onnx, f)\n\n    # Simplify\n    if simplify:\n        try:\n            cuda = torch.cuda.is_available()\n            check_requirements(('onnxruntime-gpu' if 
cuda else 'onnxruntime', 'onnx-simplifier>=0.4.1'))\n            import onnxsim\n\n            LOGGER.info(f'{prefix} simplifying with onnx-simplifier {onnxsim.__version__}...')\n            model_onnx, check = onnxsim.simplify(model_onnx)\n            assert check, 'assert check failed'\n            onnx.save(model_onnx, f)\n        except Exception as e:\n            LOGGER.info(f'{prefix} simplifier failure: {e}')\n    return f, model_onnx\n\n\n@try_export\ndef export_openvino(file, metadata, half, prefix=colorstr('OpenVINO:')):\n    # YOLOv5 OpenVINO export\n    check_requirements('openvino-dev')  # requires openvino-dev: https://pypi.org/project/openvino-dev/\n    import openvino.inference_engine as ie\n\n    LOGGER.info(f'\\n{prefix} starting export with openvino {ie.__version__}...')\n    f = str(file).replace('.pt', f'_openvino_model{os.sep}')\n\n    args = [\n        'mo',\n        '--input_model',\n        str(file.with_suffix('.onnx')),\n        '--output_dir',\n        f,\n        '--data_type',\n        ('FP16' if half else 'FP32'),]\n    subprocess.run(args, check=True, env=os.environ)  # export\n    yaml_save(Path(f) / file.with_suffix('.yaml').name, metadata)  # add metadata.yaml\n    return f, None\n\n\n@try_export\ndef export_paddle(model, im, file, metadata, prefix=colorstr('PaddlePaddle:')):\n    # YOLOv5 Paddle export\n    check_requirements(('paddlepaddle', 'x2paddle'))\n    import x2paddle\n    from x2paddle.convert import pytorch2paddle\n\n    LOGGER.info(f'\\n{prefix} starting export with X2Paddle {x2paddle.__version__}...')\n    f = str(file).replace('.pt', f'_paddle_model{os.sep}')\n\n    pytorch2paddle(module=model, save_dir=f, jit_type='trace', input_examples=[im])  # export\n    yaml_save(Path(f) / file.with_suffix('.yaml').name, metadata)  # add metadata.yaml\n    return f, None\n\n\n@try_export\ndef export_coreml(model, im, file, int8, half, prefix=colorstr('CoreML:')):\n    # YOLOv5 CoreML export\n    
check_requirements('coremltools')\n    import coremltools as ct\n\n    LOGGER.info(f'\\n{prefix} starting export with coremltools {ct.__version__}...')\n    f = file.with_suffix('.mlmodel')\n\n    ts = torch.jit.trace(model, im, strict=False)  # TorchScript model\n    ct_model = ct.convert(ts, inputs=[ct.ImageType('image', shape=im.shape, scale=1 / 255, bias=[0, 0, 0])])\n    bits, mode = (8, 'kmeans_lut') if int8 else (16, 'linear') if half else (32, None)\n    if bits < 32:\n        if MACOS:  # quantization only supported on macOS\n            with warnings.catch_warnings():\n                warnings.filterwarnings('ignore', category=DeprecationWarning)  # suppress numpy==1.20 float warning\n                ct_model = ct.models.neural_network.quantization_utils.quantize_weights(ct_model, bits, mode)\n        else:\n            print(f'{prefix} quantization only supported on macOS, skipping...')\n    ct_model.save(f)\n    return f, ct_model\n\n\n@try_export\ndef export_engine(model, im, file, half, dynamic, simplify, workspace=4, verbose=False, prefix=colorstr('TensorRT:')):\n    # YOLOv5 TensorRT export https://developer.nvidia.com/tensorrt\n    assert im.device.type != 'cpu', 'export running on CPU but must be on GPU, i.e. 
`python export.py --device 0`'\n    try:\n        import tensorrt as trt\n    except Exception:\n        if platform.system() == 'Linux':\n            check_requirements('nvidia-tensorrt', cmds='-U --index-url https://pypi.ngc.nvidia.com')\n        import tensorrt as trt\n\n    if trt.__version__[0] == '7':  # TensorRT 7 handling https://github.com/ultralytics/yolov5/issues/6012\n        grid = model.model[-1].anchor_grid\n        model.model[-1].anchor_grid = [a[..., :1, :1, :] for a in grid]\n        export_onnx(model, im, file, 12, dynamic, simplify)  # opset 12\n        model.model[-1].anchor_grid = grid\n    else:  # TensorRT >= 8\n        check_version(trt.__version__, '8.0.0', hard=True)  # require tensorrt>=8.0.0\n        export_onnx(model, im, file, 12, dynamic, simplify)  # opset 12\n    onnx = file.with_suffix('.onnx')\n\n    LOGGER.info(f'\\n{prefix} starting export with TensorRT {trt.__version__}...')\n    assert onnx.exists(), f'failed to export ONNX file: {onnx}'\n    f = file.with_suffix('.engine')  # TensorRT engine file\n    logger = trt.Logger(trt.Logger.INFO)\n    if verbose:\n        logger.min_severity = trt.Logger.Severity.VERBOSE\n\n    builder = trt.Builder(logger)\n    config = builder.create_builder_config()\n    config.max_workspace_size = workspace * 1 << 30\n    # config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, workspace << 30)  # fix TRT 8.4 deprecation notice\n\n    flag = (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))\n    network = builder.create_network(flag)\n    parser = trt.OnnxParser(network, logger)\n    if not parser.parse_from_file(str(onnx)):\n        raise RuntimeError(f'failed to load ONNX file: {onnx}')\n\n    inputs = [network.get_input(i) for i in range(network.num_inputs)]\n    outputs = [network.get_output(i) for i in range(network.num_outputs)]\n    for inp in inputs:\n        LOGGER.info(f'{prefix} input \"{inp.name}\" with shape{inp.shape} {inp.dtype}')\n    for out in outputs:\n        
LOGGER.info(f'{prefix} output \"{out.name}\" with shape{out.shape} {out.dtype}')\n\n    if dynamic:\n        if im.shape[0] <= 1:\n            LOGGER.warning(f'{prefix} WARNING ⚠️ --dynamic model requires maximum --batch-size argument')\n        profile = builder.create_optimization_profile()\n        for inp in inputs:\n            profile.set_shape(inp.name, (1, *im.shape[1:]), (max(1, im.shape[0] // 2), *im.shape[1:]), im.shape)\n        config.add_optimization_profile(profile)\n\n    LOGGER.info(f'{prefix} building FP{16 if builder.platform_has_fast_fp16 and half else 32} engine as {f}')\n    if builder.platform_has_fast_fp16 and half:\n        config.set_flag(trt.BuilderFlag.FP16)\n    with builder.build_engine(network, config) as engine, open(f, 'wb') as t:\n        t.write(engine.serialize())\n    return f, None\n\n\n@try_export\ndef export_saved_model(model,\n                       im,\n                       file,\n                       dynamic,\n                       tf_nms=False,\n                       agnostic_nms=False,\n                       topk_per_class=100,\n                       topk_all=100,\n                       iou_thres=0.45,\n                       conf_thres=0.25,\n                       keras=False,\n                       prefix=colorstr('TensorFlow SavedModel:')):\n    # YOLOv5 TensorFlow SavedModel export\n    try:\n        import tensorflow as tf\n    except Exception:\n        check_requirements(f\"tensorflow{'' if torch.cuda.is_available() else '-macos' if MACOS else '-cpu'}\")\n        import tensorflow as tf\n    from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2\n\n    from models.tf import TFModel\n\n    LOGGER.info(f'\\n{prefix} starting export with tensorflow {tf.__version__}...')\n    f = str(file).replace('.pt', '_saved_model')\n    batch_size, ch, *imgsz = list(im.shape)  # BCHW\n\n    tf_model = TFModel(cfg=model.yaml, model=model, nc=model.nc, imgsz=imgsz)\n    im = 
tf.zeros((batch_size, *imgsz, ch))  # BHWC order for TensorFlow\n    _ = tf_model.predict(im, tf_nms, agnostic_nms, topk_per_class, topk_all, iou_thres, conf_thres)\n    inputs = tf.keras.Input(shape=(*imgsz, ch), batch_size=None if dynamic else batch_size)\n    outputs = tf_model.predict(inputs, tf_nms, agnostic_nms, topk_per_class, topk_all, iou_thres, conf_thres)\n    keras_model = tf.keras.Model(inputs=inputs, outputs=outputs)\n    keras_model.trainable = False\n    keras_model.summary()\n    if keras:\n        keras_model.save(f, save_format='tf')\n    else:\n        spec = tf.TensorSpec(keras_model.inputs[0].shape, keras_model.inputs[0].dtype)\n        m = tf.function(lambda x: keras_model(x))  # full model\n        m = m.get_concrete_function(spec)\n        frozen_func = convert_variables_to_constants_v2(m)\n        tfm = tf.Module()\n        tfm.__call__ = tf.function(lambda x: frozen_func(x)[:4] if tf_nms else frozen_func(x), [spec])\n        tfm.__call__(im)\n        tf.saved_model.save(tfm,\n                            f,\n                            options=tf.saved_model.SaveOptions(experimental_custom_gradients=False) if check_version(\n                                tf.__version__, '2.6') else tf.saved_model.SaveOptions())\n    return f, keras_model\n\n\n@try_export\ndef export_pb(keras_model, file, prefix=colorstr('TensorFlow GraphDef:')):\n    # YOLOv5 TensorFlow GraphDef *.pb export https://github.com/leimao/Frozen_Graph_TensorFlow\n    import tensorflow as tf\n    from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2\n\n    LOGGER.info(f'\\n{prefix} starting export with tensorflow {tf.__version__}...')\n    f = file.with_suffix('.pb')\n\n    m = tf.function(lambda x: keras_model(x))  # full model\n    m = m.get_concrete_function(tf.TensorSpec(keras_model.inputs[0].shape, keras_model.inputs[0].dtype))\n    frozen_func = convert_variables_to_constants_v2(m)\n    frozen_func.graph.as_graph_def()\n    
tf.io.write_graph(graph_or_graph_def=frozen_func.graph, logdir=str(f.parent), name=f.name, as_text=False)\n    return f, None\n\n\n@try_export\ndef export_tflite(keras_model, im, file, int8, data, nms, agnostic_nms, prefix=colorstr('TensorFlow Lite:')):\n    # YOLOv5 TensorFlow Lite export\n    import tensorflow as tf\n\n    LOGGER.info(f'\\n{prefix} starting export with tensorflow {tf.__version__}...')\n    batch_size, ch, *imgsz = list(im.shape)  # BCHW\n    f = str(file).replace('.pt', '-fp16.tflite')\n\n    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)\n    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]\n    converter.target_spec.supported_types = [tf.float16]\n    converter.optimizations = [tf.lite.Optimize.DEFAULT]\n    if int8:\n        from models.tf import representative_dataset_gen\n        dataset = LoadImages(check_dataset(check_yaml(data))['train'], img_size=imgsz, auto=False)\n        converter.representative_dataset = lambda: representative_dataset_gen(dataset, ncalib=100)\n        converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]\n        converter.target_spec.supported_types = []\n        converter.inference_input_type = tf.uint8  # or tf.int8\n        converter.inference_output_type = tf.uint8  # or tf.int8\n        converter.experimental_new_quantizer = True\n        f = str(file).replace('.pt', '-int8.tflite')\n    if nms or agnostic_nms:\n        converter.target_spec.supported_ops.append(tf.lite.OpsSet.SELECT_TF_OPS)\n\n    tflite_model = converter.convert()\n    with open(f, 'wb') as fh:  # context manager closes the file handle once the model is written\n        fh.write(tflite_model)\n    return f, None\n\n\n@try_export\ndef export_edgetpu(file, prefix=colorstr('Edge TPU:')):\n    # YOLOv5 Edge TPU export https://coral.ai/docs/edgetpu/models-intro/\n    cmd = 'edgetpu_compiler --version'\n    help_url = 'https://coral.ai/docs/edgetpu/compiler/'\n    assert platform.system() == 'Linux', f'export only supported on Linux. 
See {help_url}'\n    if subprocess.run(f'{cmd} >/dev/null', shell=True).returncode != 0:\n        LOGGER.info(f'\\n{prefix} export requires Edge TPU compiler. Attempting install from {help_url}')\n        sudo = subprocess.run('sudo --version >/dev/null', shell=True).returncode == 0  # sudo installed on system\n        for c in (\n                'curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -',\n                'echo \"deb https://packages.cloud.google.com/apt coral-edgetpu-stable main\" | sudo tee /etc/apt/sources.list.d/coral-edgetpu.list',\n                'sudo apt-get update', 'sudo apt-get install edgetpu-compiler'):\n            subprocess.run(c if sudo else c.replace('sudo ', ''), shell=True, check=True)\n    ver = subprocess.run(cmd, shell=True, capture_output=True, check=True).stdout.decode().split()[-1]\n\n    LOGGER.info(f'\\n{prefix} starting export with Edge TPU compiler {ver}...')\n    f = str(file).replace('.pt', '-int8_edgetpu.tflite')  # Edge TPU model\n    f_tfl = str(file).replace('.pt', '-int8.tflite')  # TFLite model\n\n    subprocess.run([\n        'edgetpu_compiler',\n        '-s',\n        '-d',\n        '-k',\n        '10',\n        '--out_dir',\n        str(file.parent),\n        f_tfl,], check=True)\n    return f, None\n\n\n@try_export\ndef export_tfjs(file, int8, prefix=colorstr('TensorFlow.js:')):\n    # YOLOv5 TensorFlow.js export\n    check_requirements('tensorflowjs')\n    import tensorflowjs as tfjs\n\n    LOGGER.info(f'\\n{prefix} starting export with tensorflowjs {tfjs.__version__}...')\n    f = str(file).replace('.pt', '_web_model')  # js dir\n    f_pb = file.with_suffix('.pb')  # *.pb path\n    f_json = f'{f}/model.json'  # *.json path\n\n    args = [\n        'tensorflowjs_converter',\n        '--input_format=tf_frozen_model',\n        '--quantize_uint8' if int8 else '',\n        '--output_node_names=Identity,Identity_1,Identity_2,Identity_3',\n        str(f_pb),\n        str(f),]\n    
subprocess.run([arg for arg in args if arg], check=True)\n\n    json = Path(f_json).read_text()\n    with open(f_json, 'w') as j:  # sort JSON Identity_* in ascending order\n        subst = re.sub(\n            r'{\"outputs\": {\"Identity.?.?\": {\"name\": \"Identity.?.?\"}, '\n            r'\"Identity.?.?\": {\"name\": \"Identity.?.?\"}, '\n            r'\"Identity.?.?\": {\"name\": \"Identity.?.?\"}, '\n            r'\"Identity.?.?\": {\"name\": \"Identity.?.?\"}}}', r'{\"outputs\": {\"Identity\": {\"name\": \"Identity\"}, '\n            r'\"Identity_1\": {\"name\": \"Identity_1\"}, '\n            r'\"Identity_2\": {\"name\": \"Identity_2\"}, '\n            r'\"Identity_3\": {\"name\": \"Identity_3\"}}}', json)\n        j.write(subst)\n    return f, None\n\n\ndef add_tflite_metadata(file, metadata, num_outputs):\n    # Add metadata to *.tflite models per https://www.tensorflow.org/lite/models/convert/metadata\n    with contextlib.suppress(ImportError):\n        # check_requirements('tflite_support')\n        from tflite_support import flatbuffers\n        from tflite_support import metadata as _metadata\n        from tflite_support import metadata_schema_py_generated as _metadata_fb\n\n        tmp_file = Path('/tmp/meta.txt')\n        with open(tmp_file, 'w') as meta_f:\n            meta_f.write(str(metadata))\n\n        model_meta = _metadata_fb.ModelMetadataT()\n        label_file = _metadata_fb.AssociatedFileT()\n        label_file.name = tmp_file.name\n        model_meta.associatedFiles = [label_file]\n\n        subgraph = _metadata_fb.SubGraphMetadataT()\n        subgraph.inputTensorMetadata = [_metadata_fb.TensorMetadataT()]\n        subgraph.outputTensorMetadata = [_metadata_fb.TensorMetadataT()] * num_outputs\n        model_meta.subgraphMetadata = [subgraph]\n\n        b = flatbuffers.Builder(0)\n        b.Finish(model_meta.Pack(b), _metadata.MetadataPopulator.METADATA_FILE_IDENTIFIER)\n        metadata_buf = b.Output()\n\n        populator = 
_metadata.MetadataPopulator.with_model_file(file)\n        populator.load_metadata_buffer(metadata_buf)\n        populator.load_associated_files([str(tmp_file)])\n        populator.populate()\n        tmp_file.unlink()\n\n\n@smart_inference_mode()\ndef run(\n        data=ROOT / 'data/coco128.yaml',  # 'dataset.yaml path'\n        weights=ROOT / 'yolov5s.pt',  # weights path\n        imgsz=(640, 640),  # image (height, width)\n        batch_size=1,  # batch size\n        device='cpu',  # cuda device, i.e. 0 or 0,1,2,3 or cpu\n        include=('torchscript', 'onnx'),  # include formats\n        half=False,  # FP16 half-precision export\n        inplace=False,  # set YOLOv5 Detect() inplace=True\n        keras=False,  # use Keras\n        optimize=False,  # TorchScript: optimize for mobile\n        int8=False,  # CoreML/TF INT8 quantization\n        dynamic=False,  # ONNX/TF/TensorRT: dynamic axes\n        simplify=False,  # ONNX: simplify model\n        opset=12,  # ONNX: opset version\n        verbose=False,  # TensorRT: verbose log\n        workspace=4,  # TensorRT: workspace size (GB)\n        nms=False,  # TF: add NMS to model\n        agnostic_nms=False,  # TF: add agnostic NMS to model\n        topk_per_class=100,  # TF.js NMS: topk per class to keep\n        topk_all=100,  # TF.js NMS: topk for all classes to keep\n        iou_thres=0.45,  # TF.js NMS: IoU threshold\n        conf_thres=0.25,  # TF.js NMS: confidence threshold\n):\n    t = time.time()\n    include = [x.lower() for x in include]  # to lowercase\n    fmts = tuple(export_formats()['Argument'][1:])  # --include arguments\n    flags = [x in include for x in fmts]\n    assert sum(flags) == len(include), f'ERROR: Invalid --include {include}, valid --include arguments are {fmts}'\n    jit, onnx, xml, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs, paddle = flags  # export booleans\n    file = Path(url2file(weights) if str(weights).startswith(('http:/', 'https:/')) else weights)  # PyTorch 
weights\n\n    # Load PyTorch model\n    device = select_device(device)\n    if half:\n        assert device.type != 'cpu' or coreml, '--half only compatible with GPU export, i.e. use --device 0'\n        assert not dynamic, '--half not compatible with --dynamic, i.e. use either --half or --dynamic but not both'\n    model = attempt_load(weights, device=device, inplace=True, fuse=True)  # load FP32 model\n\n    # Checks\n    imgsz *= 2 if len(imgsz) == 1 else 1  # expand\n    if optimize:\n        assert device.type == 'cpu', '--optimize not compatible with cuda devices, i.e. use --device cpu'\n\n    # Input\n    gs = int(max(model.stride))  # grid size (max stride)\n    imgsz = [check_img_size(x, gs) for x in imgsz]  # verify img_size are gs-multiples\n    im = torch.zeros(batch_size, 3, *imgsz).to(device)  # image size(1,3,320,192) BCHW iDetection\n\n    # Update model\n    model.eval()\n    for k, m in model.named_modules():\n        if isinstance(m, Detect):\n            m.inplace = inplace\n            m.dynamic = dynamic\n            m.export = True\n\n    for _ in range(2):\n        y = model(im)  # dry runs\n    if half and not coreml:\n        im, model = im.half(), model.half()  # to FP16\n    shape = tuple((y[0] if isinstance(y, tuple) else y).shape)  # model output shape\n    metadata = {'stride': int(max(model.stride)), 'names': model.names}  # model metadata\n    LOGGER.info(f\"\\n{colorstr('PyTorch:')} starting from {file} with output shape {shape} ({file_size(file):.1f} MB)\")\n\n    # Exports\n    f = [''] * len(fmts)  # exported filenames\n    warnings.filterwarnings(action='ignore', category=torch.jit.TracerWarning)  # suppress TracerWarning\n    if jit:  # TorchScript\n        f[0], _ = export_torchscript(model, im, file, optimize)\n    if engine:  # TensorRT required before ONNX\n        f[1], _ = export_engine(model, im, file, half, dynamic, simplify, workspace, verbose)\n    if onnx or xml:  # OpenVINO requires ONNX\n        f[2], _ = 
export_onnx(model, im, file, opset, dynamic, simplify)\n    if xml:  # OpenVINO\n        f[3], _ = export_openvino(file, metadata, half)\n    if coreml:  # CoreML\n        f[4], _ = export_coreml(model, im, file, int8, half)\n    if any((saved_model, pb, tflite, edgetpu, tfjs)):  # TensorFlow formats\n        assert not tflite or not tfjs, 'TFLite and TF.js models must be exported separately, please pass only one type.'\n        assert not isinstance(model, ClassificationModel), 'ClassificationModel export to TF formats not yet supported.'\n        f[5], s_model = export_saved_model(model.cpu(),\n                                           im,\n                                           file,\n                                           dynamic,\n                                           tf_nms=nms or agnostic_nms or tfjs,\n                                           agnostic_nms=agnostic_nms or tfjs,\n                                           topk_per_class=topk_per_class,\n                                           topk_all=topk_all,\n                                           iou_thres=iou_thres,\n                                           conf_thres=conf_thres,\n                                           keras=keras)\n        if pb or tfjs:  # pb prerequisite to tfjs\n            f[6], _ = export_pb(s_model, file)\n        if tflite or edgetpu:\n            f[7], _ = export_tflite(s_model, im, file, int8 or edgetpu, data=data, nms=nms, agnostic_nms=agnostic_nms)\n            if edgetpu:\n                f[8], _ = export_edgetpu(file)\n            add_tflite_metadata(f[8] or f[7], metadata, num_outputs=len(s_model.outputs))\n        if tfjs:\n            f[9], _ = export_tfjs(file, int8)\n    if paddle:  # PaddlePaddle\n        f[10], _ = export_paddle(model, im, file, metadata)\n\n    # Finish\n    f = [str(x) for x in f if x]  # filter out '' and None\n    if any(f):\n        cls, det, seg = (isinstance(model, x) for x in (ClassificationModel, DetectionModel, 
SegmentationModel))  # type\n        det &= not seg  # segmentation models inherit from SegmentationModel(DetectionModel)\n        dir = Path('segment' if seg else 'classify' if cls else '')\n        h = '--half' if half else ''  # --half FP16 inference arg\n        s = '# WARNING ⚠️ ClassificationModel not yet supported for PyTorch Hub AutoShape inference' if cls else \\\n            '# WARNING ⚠️ SegmentationModel not yet supported for PyTorch Hub AutoShape inference' if seg else ''\n        LOGGER.info(f'\\nExport complete ({time.time() - t:.1f}s)'\n                    f\"\\nResults saved to {colorstr('bold', file.parent.resolve())}\"\n                    f\"\\nDetect:          python {dir / ('detect.py' if det else 'predict.py')} --weights {f[-1]} {h}\"\n                    f\"\\nValidate:        python {dir / 'val.py'} --weights {f[-1]} {h}\"\n                    f\"\\nPyTorch Hub:     model = torch.hub.load('ultralytics/yolov5', 'custom', '{f[-1]}')  {s}\"\n                    f'\\nVisualize:       https://netron.app')\n    return f  # return list of exported files/dirs\n\n\ndef parse_opt(known=False):\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='dataset.yaml path')\n    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.pt', help='model.pt path(s)')\n    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640, 640], help='image (h, w)')\n    parser.add_argument('--batch-size', type=int, default=1, help='batch size')\n    parser.add_argument('--device', default='cpu', help='cuda device, i.e. 
0 or 0,1,2,3 or cpu')\n    parser.add_argument('--half', action='store_true', help='FP16 half-precision export')\n    parser.add_argument('--inplace', action='store_true', help='set YOLOv5 Detect() inplace=True')\n    parser.add_argument('--keras', action='store_true', help='TF: use Keras')\n    parser.add_argument('--optimize', action='store_true', help='TorchScript: optimize for mobile')\n    parser.add_argument('--int8', action='store_true', help='CoreML/TF INT8 quantization')\n    parser.add_argument('--dynamic', action='store_true', help='ONNX/TF/TensorRT: dynamic axes')\n    parser.add_argument('--simplify', action='store_true', help='ONNX: simplify model')\n    parser.add_argument('--opset', type=int, default=17, help='ONNX: opset version')\n    parser.add_argument('--verbose', action='store_true', help='TensorRT: verbose log')\n    parser.add_argument('--workspace', type=int, default=4, help='TensorRT: workspace size (GB)')\n    parser.add_argument('--nms', action='store_true', help='TF: add NMS to model')\n    parser.add_argument('--agnostic-nms', action='store_true', help='TF: add agnostic NMS to model')\n    parser.add_argument('--topk-per-class', type=int, default=100, help='TF.js NMS: topk per class to keep')\n    parser.add_argument('--topk-all', type=int, default=100, help='TF.js NMS: topk for all classes to keep')\n    parser.add_argument('--iou-thres', type=float, default=0.45, help='TF.js NMS: IoU threshold')\n    parser.add_argument('--conf-thres', type=float, default=0.25, help='TF.js NMS: confidence threshold')\n    parser.add_argument(\n        '--include',\n        nargs='+',\n        default=['torchscript'],\n        help='torchscript, onnx, openvino, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs, paddle')\n    opt = parser.parse_known_args()[0] if known else parser.parse_args()\n    print_args(vars(opt))\n    return opt\n\n\ndef main(opt):\n    for opt.weights in (opt.weights if isinstance(opt.weights, list) else [opt.weights]):\n  
      run(**vars(opt))\n\n\nif __name__ == '__main__':\n    opt = parse_opt()\n    main(opt)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/hubconf.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nPyTorch Hub models https://pytorch.org/hub/ultralytics_yolov5\n\nUsage:\n    import torch\n    model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # official model\n    model = torch.hub.load('ultralytics/yolov5:master', 'yolov5s')  # from branch\n    model = torch.hub.load('ultralytics/yolov5', 'custom', 'yolov5s.pt')  # custom/local model\n    model = torch.hub.load('.', 'custom', 'yolov5s.pt', source='local')  # local repo\n\"\"\"\n\nimport torch\n\n\ndef _create(name, pretrained=True, channels=3, classes=80, autoshape=True, verbose=True, device=None):\n    \"\"\"Creates or loads a YOLOv5 model\n\n    Arguments:\n        name (str): model name 'yolov5s' or path 'path/to/best.pt'\n        pretrained (bool): load pretrained weights into the model\n        channels (int): number of input channels\n        classes (int): number of model classes\n        autoshape (bool): apply YOLOv5 .autoshape() wrapper to model\n        verbose (bool): print all information to screen\n        device (str, torch.device, None): device to use for model parameters\n\n    Returns:\n        YOLOv5 model\n    \"\"\"\n    from pathlib import Path\n\n    from models.common import AutoShape, DetectMultiBackend\n    from models.experimental import attempt_load\n    from models.yolo import ClassificationModel, DetectionModel, SegmentationModel\n    from utils.downloads import attempt_download\n    from utils.general import LOGGER, check_requirements, intersect_dicts, logging\n    from utils.torch_utils import select_device\n\n    if not verbose:\n        LOGGER.setLevel(logging.WARNING)\n    check_requirements(exclude=('opencv-python', 'tensorboard', 'thop'))\n    name = Path(name)\n    path = name.with_suffix('.pt') if name.suffix == '' and not name.is_dir() else name  # checkpoint path\n    try:\n        device = select_device(device)\n        if pretrained and channels == 3 and classes == 80:\n            try:\n       
         model = DetectMultiBackend(path, device=device, fuse=autoshape)  # detection model\n                if autoshape:\n                    if model.pt and isinstance(model.model, ClassificationModel):\n                        LOGGER.warning('WARNING ⚠️ YOLOv5 ClassificationModel is not yet AutoShape compatible. '\n                                       'You must pass torch tensors in BCHW to this model, i.e. shape(1,3,224,224).')\n                    elif model.pt and isinstance(model.model, SegmentationModel):\n                        LOGGER.warning('WARNING ⚠️ YOLOv5 SegmentationModel is not yet AutoShape compatible. '\n                                       'You will not be able to run inference with this model.')\n                    else:\n                        model = AutoShape(model)  # for file/URI/PIL/cv2/np inputs and NMS\n            except Exception:\n                model = attempt_load(path, device=device, fuse=False)  # arbitrary model\n        else:\n            cfg = list((Path(__file__).parent / 'models').rglob(f'{path.stem}.yaml'))[0]  # model.yaml path\n            model = DetectionModel(cfg, channels, classes)  # create model\n            if pretrained:\n                ckpt = torch.load(attempt_download(path), map_location=device)  # load\n                csd = ckpt['model'].float().state_dict()  # checkpoint state_dict as FP32\n                csd = intersect_dicts(csd, model.state_dict(), exclude=['anchors'])  # intersect\n                model.load_state_dict(csd, strict=False)  # load\n                if len(ckpt['model'].names) == classes:\n                    model.names = ckpt['model'].names  # set class names attribute\n        if not verbose:\n            LOGGER.setLevel(logging.INFO)  # reset to default\n        return model.to(device)\n\n    except Exception as e:\n        help_url = 'https://github.com/ultralytics/yolov5/issues/36'\n        s = f'{e}. 
Cache may be out of date, try `force_reload=True` or see {help_url} for help.'\n        raise Exception(s) from e\n\n\ndef custom(path='path/to/model.pt', autoshape=True, _verbose=True, device=None):\n    # YOLOv5 custom or local model\n    return _create(path, autoshape=autoshape, verbose=_verbose, device=device)\n\n\ndef yolov5n(pretrained=True, channels=3, classes=80, autoshape=True, _verbose=True, device=None):\n    # YOLOv5-nano model https://github.com/ultralytics/yolov5\n    return _create('yolov5n', pretrained, channels, classes, autoshape, _verbose, device)\n\n\ndef yolov5s(pretrained=True, channels=3, classes=80, autoshape=True, _verbose=True, device=None):\n    # YOLOv5-small model https://github.com/ultralytics/yolov5\n    return _create('yolov5s', pretrained, channels, classes, autoshape, _verbose, device)\n\n\ndef yolov5m(pretrained=True, channels=3, classes=80, autoshape=True, _verbose=True, device=None):\n    # YOLOv5-medium model https://github.com/ultralytics/yolov5\n    return _create('yolov5m', pretrained, channels, classes, autoshape, _verbose, device)\n\n\ndef yolov5l(pretrained=True, channels=3, classes=80, autoshape=True, _verbose=True, device=None):\n    # YOLOv5-large model https://github.com/ultralytics/yolov5\n    return _create('yolov5l', pretrained, channels, classes, autoshape, _verbose, device)\n\n\ndef yolov5x(pretrained=True, channels=3, classes=80, autoshape=True, _verbose=True, device=None):\n    # YOLOv5-xlarge model https://github.com/ultralytics/yolov5\n    return _create('yolov5x', pretrained, channels, classes, autoshape, _verbose, device)\n\n\ndef yolov5n6(pretrained=True, channels=3, classes=80, autoshape=True, _verbose=True, device=None):\n    # YOLOv5-nano-P6 model https://github.com/ultralytics/yolov5\n    return _create('yolov5n6', pretrained, channels, classes, autoshape, _verbose, device)\n\n\ndef yolov5s6(pretrained=True, channels=3, classes=80, autoshape=True, _verbose=True, device=None):\n    # YOLOv5-small-P6 
model https://github.com/ultralytics/yolov5\n    return _create('yolov5s6', pretrained, channels, classes, autoshape, _verbose, device)\n\n\ndef yolov5m6(pretrained=True, channels=3, classes=80, autoshape=True, _verbose=True, device=None):\n    # YOLOv5-medium-P6 model https://github.com/ultralytics/yolov5\n    return _create('yolov5m6', pretrained, channels, classes, autoshape, _verbose, device)\n\n\ndef yolov5l6(pretrained=True, channels=3, classes=80, autoshape=True, _verbose=True, device=None):\n    # YOLOv5-large-P6 model https://github.com/ultralytics/yolov5\n    return _create('yolov5l6', pretrained, channels, classes, autoshape, _verbose, device)\n\n\ndef yolov5x6(pretrained=True, channels=3, classes=80, autoshape=True, _verbose=True, device=None):\n    # YOLOv5-xlarge-P6 model https://github.com/ultralytics/yolov5\n    return _create('yolov5x6', pretrained, channels, classes, autoshape, _verbose, device)\n\n\nif __name__ == '__main__':\n    import argparse\n    from pathlib import Path\n\n    import numpy as np\n    from PIL import Image\n\n    from utils.general import cv2, print_args\n\n    # Argparser\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--model', type=str, default='yolov5s', help='model name')\n    opt = parser.parse_args()\n    print_args(vars(opt))\n\n    # Model\n    model = _create(name=opt.model, pretrained=True, channels=3, classes=80, autoshape=True, verbose=True)\n    # model = custom(path='path/to/model.pt')  # custom\n\n    # Images\n    imgs = [\n        'data/images/zidane.jpg',  # filename\n        Path('data/images/zidane.jpg'),  # Path\n        'https://ultralytics.com/images/zidane.jpg',  # URI\n        cv2.imread('data/images/bus.jpg')[:, :, ::-1],  # OpenCV\n        Image.open('data/images/bus.jpg'),  # PIL\n        np.zeros((320, 640, 3))]  # numpy\n\n    # Inference\n    results = model(imgs, size=320)  # batched inference\n\n    # Results\n    results.print()\n    results.save()\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/__init__.py",
    "content": ""
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/common.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nCommon modules\n\"\"\"\n\nimport ast\nimport contextlib\nimport json\nimport math\nimport platform\nimport warnings\nimport zipfile\nfrom collections import OrderedDict, namedtuple\nfrom copy import copy\nfrom pathlib import Path\nfrom urllib.parse import urlparse\n\nimport cv2\nimport numpy as np\nimport pandas as pd\nimport requests\nimport torch\nimport torch.nn as nn\nfrom IPython.display import display\nfrom PIL import Image\nfrom torch.cuda import amp\n\nfrom utils import TryExcept\nfrom utils.dataloaders import exif_transpose, letterbox\nfrom utils.general import (LOGGER, ROOT, Profile, check_requirements, check_suffix, check_version, colorstr,\n                           increment_path, is_notebook, make_divisible, non_max_suppression, scale_boxes, xywh2xyxy,\n                           xyxy2xywh, yaml_load)\nfrom utils.plots import Annotator, colors, save_one_box\nfrom utils.torch_utils import copy_attr, smart_inference_mode\n\n\ndef autopad(k, p=None, d=1):  # kernel, padding, dilation\n    # Pad to 'same' shape outputs\n    if d > 1:\n        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size\n    if p is None:\n        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad\n    return p\n\n\nclass Conv(nn.Module):\n    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)\n    default_act = nn.SiLU()  # default activation\n\n    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):\n        super().__init__()\n        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)\n        self.bn = nn.BatchNorm2d(c2)\n        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()\n\n    def forward(self, x):\n        return self.act(self.bn(self.conv(x)))\n\n    def forward_fuse(self, x):\n       
 return self.act(self.conv(x))\n\n\nclass DWConv(Conv):\n    # Depth-wise convolution\n    def __init__(self, c1, c2, k=1, s=1, d=1, act=True):  # ch_in, ch_out, kernel, stride, dilation, activation\n        super().__init__(c1, c2, k, s, g=math.gcd(c1, c2), d=d, act=act)\n\n\nclass DWConvTranspose2d(nn.ConvTranspose2d):\n    # Depth-wise transpose convolution\n    def __init__(self, c1, c2, k=1, s=1, p1=0, p2=0):  # ch_in, ch_out, kernel, stride, padding, padding_out\n        super().__init__(c1, c2, k, s, p1, p2, groups=math.gcd(c1, c2))\n\n\nclass TransformerLayer(nn.Module):\n    # Transformer layer https://arxiv.org/abs/2010.11929 (LayerNorm layers removed for better performance)\n    def __init__(self, c, num_heads):\n        super().__init__()\n        self.q = nn.Linear(c, c, bias=False)\n        self.k = nn.Linear(c, c, bias=False)\n        self.v = nn.Linear(c, c, bias=False)\n        self.ma = nn.MultiheadAttention(embed_dim=c, num_heads=num_heads)\n        self.fc1 = nn.Linear(c, c, bias=False)\n        self.fc2 = nn.Linear(c, c, bias=False)\n\n    def forward(self, x):\n        x = self.ma(self.q(x), self.k(x), self.v(x))[0] + x\n        x = self.fc2(self.fc1(x)) + x\n        return x\n\n\nclass TransformerBlock(nn.Module):\n    # Vision Transformer https://arxiv.org/abs/2010.11929\n    def __init__(self, c1, c2, num_heads, num_layers):\n        super().__init__()\n        self.conv = None\n        if c1 != c2:\n            self.conv = Conv(c1, c2)\n        self.linear = nn.Linear(c2, c2)  # learnable position embedding\n        self.tr = nn.Sequential(*(TransformerLayer(c2, num_heads) for _ in range(num_layers)))\n        self.c2 = c2\n\n    def forward(self, x):\n        if self.conv is not None:\n            x = self.conv(x)\n        b, _, w, h = x.shape\n        p = x.flatten(2).permute(2, 0, 1)\n        return self.tr(p + self.linear(p)).permute(1, 2, 0).reshape(b, self.c2, w, h)\n\n\nclass Bottleneck(nn.Module):\n    # Standard bottleneck\n    
def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = Conv(c_, c2, 3, 1, g=g)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\n\nclass BottleneckCSP(nn.Module):\n    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = nn.Conv2d(c1, c_, 1, 1, bias=False)\n        self.cv3 = nn.Conv2d(c_, c_, 1, 1, bias=False)\n        self.cv4 = Conv(2 * c_, c2, 1, 1)\n        self.bn = nn.BatchNorm2d(2 * c_)  # applied to cat(cv2, cv3)\n        self.act = nn.SiLU()\n        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))\n\n    def forward(self, x):\n        y1 = self.cv3(self.m(self.cv1(x)))\n        y2 = self.cv2(x)\n        return self.cv4(self.act(self.bn(torch.cat((y1, y2), 1))))\n\n\nclass CrossConv(nn.Module):\n    # Cross Convolution Downsample\n    def __init__(self, c1, c2, k=3, s=1, g=1, e=1.0, shortcut=False):\n        # ch_in, ch_out, kernel, stride, groups, expansion, shortcut\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, (1, k), (1, s))\n        self.cv2 = Conv(c_, c2, (k, 1), (s, 1), g=g)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\n\nclass C3(nn.Module):\n    # CSP Bottleneck with 3 convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion\n    
    super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = Conv(c1, c_, 1, 1)\n        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)\n        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))\n\n    def forward(self, x):\n        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))\n\n\nclass C3x(C3):\n    # C3 module with cross-convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)\n        self.m = nn.Sequential(*(CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)))\n\n\nclass C3TR(C3):\n    # C3 module with TransformerBlock()\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)\n        self.m = TransformerBlock(c_, c_, 4, n)\n\n\nclass C3SPP(C3):\n    # C3 module with SPP()\n    def __init__(self, c1, c2, k=(5, 9, 13), n=1, shortcut=True, g=1, e=0.5):\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)\n        self.m = SPP(c_, c_, k)\n\n\nclass C3Ghost(C3):\n    # C3 module with GhostBottleneck()\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)  # hidden channels\n        self.m = nn.Sequential(*(GhostBottleneck(c_, c_) for _ in range(n)))\n\n\nclass SPP(nn.Module):\n    # Spatial Pyramid Pooling (SPP) layer https://arxiv.org/abs/1406.4729\n    def __init__(self, c1, c2, k=(5, 9, 13)):\n        super().__init__()\n        c_ = c1 // 2  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = Conv(c_ * (len(k) + 1), c2, 1, 1)\n        self.m = nn.ModuleList([nn.MaxPool2d(kernel_size=x, stride=1, padding=x // 2) for x in k])\n\n    def forward(self, x):\n        x = self.cv1(x)\n        with 
warnings.catch_warnings():\n            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning\n            return self.cv2(torch.cat([x] + [m(x) for m in self.m], 1))\n\n\nclass SPPF(nn.Module):\n    # Spatial Pyramid Pooling - Fast (SPPF) layer for YOLOv5 by Glenn Jocher\n    def __init__(self, c1, c2, k=5):  # equivalent to SPP(k=(5, 9, 13))\n        super().__init__()\n        c_ = c1 // 2  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = Conv(c_ * 4, c2, 1, 1)\n        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)\n\n    def forward(self, x):\n        x = self.cv1(x)\n        with warnings.catch_warnings():\n            warnings.simplefilter('ignore')  # suppress torch 1.9.0 max_pool2d() warning\n            y1 = self.m(x)\n            y2 = self.m(y1)\n            return self.cv2(torch.cat((x, y1, y2, self.m(y2)), 1))\n\n\nclass Focus(nn.Module):\n    # Focus wh information into c-space\n    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups\n        super().__init__()\n        self.conv = Conv(c1 * 4, c2, k, s, p, g, act=act)\n        # self.contract = Contract(gain=2)\n\n    def forward(self, x):  # x(b,c,w,h) -> y(b,4c,w/2,h/2)\n        return self.conv(torch.cat((x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]), 1))\n        # return self.conv(self.contract(x))\n\n\nclass GhostConv(nn.Module):\n    # Ghost Convolution https://github.com/huawei-noah/ghostnet\n    def __init__(self, c1, c2, k=1, s=1, g=1, act=True):  # ch_in, ch_out, kernel, stride, groups\n        super().__init__()\n        c_ = c2 // 2  # hidden channels\n        self.cv1 = Conv(c1, c_, k, s, None, g, act=act)\n        self.cv2 = Conv(c_, c_, 5, 1, None, c_, act=act)\n\n    def forward(self, x):\n        y = self.cv1(x)\n        return torch.cat((y, self.cv2(y)), 1)\n\n\nclass GhostBottleneck(nn.Module):\n    # Ghost Bottleneck 
https://github.com/huawei-noah/ghostnet\n    def __init__(self, c1, c2, k=3, s=1):  # ch_in, ch_out, kernel, stride\n        super().__init__()\n        c_ = c2 // 2\n        self.conv = nn.Sequential(\n            GhostConv(c1, c_, 1, 1),  # pw\n            DWConv(c_, c_, k, s, act=False) if s == 2 else nn.Identity(),  # dw\n            GhostConv(c_, c2, 1, 1, act=False))  # pw-linear\n        self.shortcut = nn.Sequential(DWConv(c1, c1, k, s, act=False), Conv(c1, c2, 1, 1,\n                                                                            act=False)) if s == 2 else nn.Identity()\n\n    def forward(self, x):\n        return self.conv(x) + self.shortcut(x)\n\n\nclass Contract(nn.Module):\n    # Contract width-height into channels, i.e. x(1,64,80,80) to x(1,256,40,40)\n    def __init__(self, gain=2):\n        super().__init__()\n        self.gain = gain\n\n    def forward(self, x):\n        b, c, h, w = x.size()  # assert (h / s == 0) and (W / s == 0), 'Indivisible gain'\n        s = self.gain\n        x = x.view(b, c, h // s, s, w // s, s)  # x(1,64,40,2,40,2)\n        x = x.permute(0, 3, 5, 1, 2, 4).contiguous()  # x(1,2,2,64,40,40)\n        return x.view(b, c * s * s, h // s, w // s)  # x(1,256,40,40)\n\n\nclass Expand(nn.Module):\n    # Expand channels into width-height, i.e. 
x(1,64,80,80) to x(1,16,160,160)\n    def __init__(self, gain=2):\n        super().__init__()\n        self.gain = gain\n\n    def forward(self, x):\n        b, c, h, w = x.size()  # assert C / s ** 2 == 0, 'Indivisible gain'\n        s = self.gain\n        x = x.view(b, s, s, c // s ** 2, h, w)  # x(1,2,2,16,80,80)\n        x = x.permute(0, 3, 4, 1, 5, 2).contiguous()  # x(1,16,80,2,80,2)\n        return x.view(b, c // s ** 2, h * s, w * s)  # x(1,16,160,160)\n\n\nclass Concat(nn.Module):\n    # Concatenate a list of tensors along dimension\n    def __init__(self, dimension=1):\n        super().__init__()\n        self.d = dimension\n\n    def forward(self, x):\n        return torch.cat(x, self.d)\n\n\nclass DetectMultiBackend(nn.Module):\n    # YOLOv5 MultiBackend class for python inference on various backends\n    def __init__(self, weights='yolov5s.pt', device=torch.device('cpu'), dnn=False, data=None, fp16=False, fuse=True):\n        # Usage:\n        #   PyTorch:              weights = *.pt\n        #   TorchScript:                    *.torchscript\n        #   ONNX Runtime:                   *.onnx\n        #   ONNX OpenCV DNN:                *.onnx --dnn\n        #   OpenVINO:                       *_openvino_model\n        #   CoreML:                         *.mlmodel\n        #   TensorRT:                       *.engine\n        #   TensorFlow SavedModel:          *_saved_model\n        #   TensorFlow GraphDef:            *.pb\n        #   TensorFlow Lite:                *.tflite\n        #   TensorFlow Edge TPU:            *_edgetpu.tflite\n        #   PaddlePaddle:                   *_paddle_model\n        from models.experimental import attempt_download, attempt_load  # scoped to avoid circular import\n\n        super().__init__()\n        w = str(weights[0] if isinstance(weights, list) else weights)\n        pt, jit, onnx, xml, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs, paddle, triton = self._model_type(w)\n        fp16 &= pt or jit or 
onnx or engine  # FP16\n        nhwc = coreml or saved_model or pb or tflite or edgetpu  # BHWC formats (vs torch BCHW)\n        stride = 32  # default stride\n        cuda = torch.cuda.is_available() and device.type != 'cpu'  # use CUDA\n        if not (pt or triton):\n            w = attempt_download(w)  # download if not local\n\n        if pt:  # PyTorch\n            model = attempt_load(weights if isinstance(weights, list) else w, device=device, inplace=True, fuse=fuse)\n            stride = max(int(model.stride.max()), 32)  # model stride\n            names = model.module.names if hasattr(model, 'module') else model.names  # get class names\n            model.half() if fp16 else model.float()\n            self.model = model  # explicitly assign for to(), cpu(), cuda(), half()\n        elif jit:  # TorchScript\n            LOGGER.info(f'Loading {w} for TorchScript inference...')\n            extra_files = {'config.txt': ''}  # model metadata\n            model = torch.jit.load(w, _extra_files=extra_files, map_location=device)\n            model.half() if fp16 else model.float()\n            if extra_files['config.txt']:  # load metadata dict\n                d = json.loads(extra_files['config.txt'],\n                               object_hook=lambda d: {int(k) if k.isdigit() else k: v\n                                                      for k, v in d.items()})\n                stride, names = int(d['stride']), d['names']\n        elif dnn:  # ONNX OpenCV DNN\n            LOGGER.info(f'Loading {w} for ONNX OpenCV DNN inference...')\n            check_requirements('opencv-python>=4.5.4')\n            net = cv2.dnn.readNetFromONNX(w)\n        elif onnx:  # ONNX Runtime\n            LOGGER.info(f'Loading {w} for ONNX Runtime inference...')\n            check_requirements(('onnx', 'onnxruntime-gpu' if cuda else 'onnxruntime'))\n            import onnxruntime\n            providers = ['CUDAExecutionProvider', 'CPUExecutionProvider'] if cuda else 
['CPUExecutionProvider']\n            session = onnxruntime.InferenceSession(w, providers=providers)\n            output_names = [x.name for x in session.get_outputs()]\n            meta = session.get_modelmeta().custom_metadata_map  # metadata\n            if 'stride' in meta:\n                stride, names = int(meta['stride']), eval(meta['names'])\n        elif xml:  # OpenVINO\n            LOGGER.info(f'Loading {w} for OpenVINO inference...')\n            check_requirements('openvino')  # requires openvino-dev: https://pypi.org/project/openvino-dev/\n            from openvino.runtime import Core, Layout, get_batch\n            ie = Core()\n            if not Path(w).is_file():  # if not *.xml\n                w = next(Path(w).glob('*.xml'))  # get *.xml file from *_openvino_model dir\n            network = ie.read_model(model=w, weights=Path(w).with_suffix('.bin'))\n            if network.get_parameters()[0].get_layout().empty:\n                network.get_parameters()[0].set_layout(Layout('NCHW'))\n            batch_dim = get_batch(network)\n            if batch_dim.is_static:\n                batch_size = batch_dim.get_length()\n            executable_network = ie.compile_model(network, device_name='CPU')  # device_name=\"MYRIAD\" for Intel NCS2\n            stride, names = self._load_metadata(Path(w).with_suffix('.yaml'))  # load metadata\n        elif engine:  # TensorRT\n            LOGGER.info(f'Loading {w} for TensorRT inference...')\n            import tensorrt as trt  # https://developer.nvidia.com/nvidia-tensorrt-download\n            check_version(trt.__version__, '7.0.0', hard=True)  # require tensorrt>=7.0.0\n            if device.type == 'cpu':\n                device = torch.device('cuda:0')\n            Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr'))\n            logger = trt.Logger(trt.Logger.INFO)\n            with open(w, 'rb') as f, trt.Runtime(logger) as runtime:\n                model = 
runtime.deserialize_cuda_engine(f.read())\n            context = model.create_execution_context()\n            bindings = OrderedDict()\n            output_names = []\n            fp16 = False  # default updated below\n            dynamic = False\n            for i in range(model.num_bindings):\n                name = model.get_binding_name(i)\n                dtype = trt.nptype(model.get_binding_dtype(i))\n                if model.binding_is_input(i):\n                    if -1 in tuple(model.get_binding_shape(i)):  # dynamic\n                        dynamic = True\n                        context.set_binding_shape(i, tuple(model.get_profile_shape(0, i)[2]))\n                    if dtype == np.float16:\n                        fp16 = True\n                else:  # output\n                    output_names.append(name)\n                shape = tuple(context.get_binding_shape(i))\n                im = torch.from_numpy(np.empty(shape, dtype=dtype)).to(device)\n                bindings[name] = Binding(name, dtype, shape, im, int(im.data_ptr()))\n            binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())\n            batch_size = bindings['images'].shape[0]  # if dynamic, this is instead max batch size\n        elif coreml:  # CoreML\n            LOGGER.info(f'Loading {w} for CoreML inference...')\n            import coremltools as ct\n            model = ct.models.MLModel(w)\n        elif saved_model:  # TF SavedModel\n            LOGGER.info(f'Loading {w} for TensorFlow SavedModel inference...')\n            import tensorflow as tf\n            keras = False  # assume TF1 saved_model\n            model = tf.keras.models.load_model(w) if keras else tf.saved_model.load(w)\n        elif pb:  # GraphDef https://www.tensorflow.org/guide/migrate#a_graphpb_or_graphpbtxt\n            LOGGER.info(f'Loading {w} for TensorFlow GraphDef inference...')\n            import tensorflow as tf\n\n            def wrap_frozen_graph(gd, inputs, outputs):\n            
    x = tf.compat.v1.wrap_function(lambda: tf.compat.v1.import_graph_def(gd, name=''), [])  # wrapped\n                ge = x.graph.as_graph_element\n                return x.prune(tf.nest.map_structure(ge, inputs), tf.nest.map_structure(ge, outputs))\n\n            def gd_outputs(gd):\n                name_list, input_list = [], []\n                for node in gd.node:  # tensorflow.core.framework.node_def_pb2.NodeDef\n                    name_list.append(node.name)\n                    input_list.extend(node.input)\n                return sorted(f'{x}:0' for x in list(set(name_list) - set(input_list)) if not x.startswith('NoOp'))\n\n            gd = tf.Graph().as_graph_def()  # TF GraphDef\n            with open(w, 'rb') as f:\n                gd.ParseFromString(f.read())\n            frozen_func = wrap_frozen_graph(gd, inputs='x:0', outputs=gd_outputs(gd))\n        elif tflite or edgetpu:  # https://www.tensorflow.org/lite/guide/python#install_tensorflow_lite_for_python\n            try:  # https://coral.ai/docs/edgetpu/tflite-python/#update-existing-tf-lite-code-for-the-edge-tpu\n                from tflite_runtime.interpreter import Interpreter, load_delegate\n            except ImportError:\n                import tensorflow as tf\n                Interpreter, load_delegate = tf.lite.Interpreter, tf.lite.experimental.load_delegate,\n            if edgetpu:  # TF Edge TPU https://coral.ai/software/#edgetpu-runtime\n                LOGGER.info(f'Loading {w} for TensorFlow Lite Edge TPU inference...')\n                delegate = {\n                    'Linux': 'libedgetpu.so.1',\n                    'Darwin': 'libedgetpu.1.dylib',\n                    'Windows': 'edgetpu.dll'}[platform.system()]\n                interpreter = Interpreter(model_path=w, experimental_delegates=[load_delegate(delegate)])\n            else:  # TFLite\n                LOGGER.info(f'Loading {w} for TensorFlow Lite inference...')\n                interpreter = Interpreter(model_path=w)  
# load TFLite model\n            interpreter.allocate_tensors()  # allocate\n            input_details = interpreter.get_input_details()  # inputs\n            output_details = interpreter.get_output_details()  # outputs\n            # load metadata\n            with contextlib.suppress(zipfile.BadZipFile):\n                with zipfile.ZipFile(w, 'r') as model:\n                    meta_file = model.namelist()[0]\n                    meta = ast.literal_eval(model.read(meta_file).decode('utf-8'))\n                    stride, names = int(meta['stride']), meta['names']\n        elif tfjs:  # TF.js\n            raise NotImplementedError('ERROR: YOLOv5 TF.js inference is not supported')\n        elif paddle:  # PaddlePaddle\n            LOGGER.info(f'Loading {w} for PaddlePaddle inference...')\n            check_requirements('paddlepaddle-gpu' if cuda else 'paddlepaddle')\n            import paddle.inference as pdi\n            if not Path(w).is_file():  # if not *.pdmodel\n                w = next(Path(w).rglob('*.pdmodel'))  # get *.pdmodel file from *_paddle_model dir\n            weights = Path(w).with_suffix('.pdiparams')\n            config = pdi.Config(str(w), str(weights))\n            if cuda:\n                config.enable_use_gpu(memory_pool_init_size_mb=2048, device_id=0)\n            predictor = pdi.create_predictor(config)\n            input_handle = predictor.get_input_handle(predictor.get_input_names()[0])\n            output_names = predictor.get_output_names()\n        elif triton:  # NVIDIA Triton Inference Server\n            LOGGER.info(f'Using {w} as Triton Inference Server...')\n            check_requirements('tritonclient[all]')\n            from utils.triton import TritonRemoteModel\n            model = TritonRemoteModel(url=w)\n            nhwc = model.runtime.startswith('tensorflow')\n        else:\n            raise NotImplementedError(f'ERROR: {w} is not a supported format')\n\n        # class names\n        if 'names' not in locals():\n    
        names = yaml_load(data)['names'] if data else {i: f'class{i}' for i in range(999)}\n        if names[0] == 'n01440764' and len(names) == 1000:  # ImageNet\n            names = yaml_load(ROOT / 'data/ImageNet.yaml')['names']  # human-readable names\n\n        self.__dict__.update(locals())  # assign all variables to self\n\n    def forward(self, im, augment=False, visualize=False):\n        # YOLOv5 MultiBackend inference\n        b, ch, h, w = im.shape  # batch, channel, height, width\n        if self.fp16 and im.dtype != torch.float16:\n            im = im.half()  # to FP16\n        if self.nhwc:\n            im = im.permute(0, 2, 3, 1)  # torch BCHW to numpy BHWC shape(1,320,192,3)\n\n        if self.pt:  # PyTorch\n            y = self.model(im, augment=augment, visualize=visualize) if augment or visualize else self.model(im)\n        elif self.jit:  # TorchScript\n            y = self.model(im)\n        elif self.dnn:  # ONNX OpenCV DNN\n            im = im.cpu().numpy()  # torch to numpy\n            self.net.setInput(im)\n            y = self.net.forward()\n        elif self.onnx:  # ONNX Runtime\n            im = im.cpu().numpy()  # torch to numpy\n            y = self.session.run(self.output_names, {self.session.get_inputs()[0].name: im})\n        elif self.xml:  # OpenVINO\n            im = im.cpu().numpy()  # FP32\n            y = list(self.executable_network([im]).values())\n        elif self.engine:  # TensorRT\n            if self.dynamic and im.shape != self.bindings['images'].shape:\n                i = self.model.get_binding_index('images')\n                self.context.set_binding_shape(i, im.shape)  # reshape if dynamic\n                self.bindings['images'] = self.bindings['images']._replace(shape=im.shape)\n                for name in self.output_names:\n                    i = self.model.get_binding_index(name)\n                    self.bindings[name].data.resize_(tuple(self.context.get_binding_shape(i)))\n            s = 
self.bindings['images'].shape\n            assert im.shape == s, f\"input size {im.shape} {'>' if self.dynamic else 'not equal to'} max model size {s}\"\n            self.binding_addrs['images'] = int(im.data_ptr())\n            self.context.execute_v2(list(self.binding_addrs.values()))\n            y = [self.bindings[x].data for x in sorted(self.output_names)]\n        elif self.coreml:  # CoreML\n            im = im.cpu().numpy()\n            im = Image.fromarray((im[0] * 255).astype('uint8'))\n            # im = im.resize((192, 320), Image.ANTIALIAS)\n            y = self.model.predict({'image': im})  # coordinates are xywh normalized\n            if 'confidence' in y:\n                box = xywh2xyxy(y['coordinates'] * [[w, h, w, h]])  # xyxy pixels\n                conf, cls = y['confidence'].max(1), y['confidence'].argmax(1).astype(float)  # np.float alias was removed in NumPy 1.24\n                y = np.concatenate((box, conf.reshape(-1, 1), cls.reshape(-1, 1)), 1)\n            else:\n                y = list(reversed(y.values()))  # reversed for segmentation models (pred, proto)\n        elif self.paddle:  # PaddlePaddle\n            im = im.cpu().numpy().astype(np.float32)\n            self.input_handle.copy_from_cpu(im)\n            self.predictor.run()\n            y = [self.predictor.get_output_handle(x).copy_to_cpu() for x in self.output_names]\n        elif self.triton:  # NVIDIA Triton Inference Server\n            y = self.model(im)\n        else:  # TensorFlow (SavedModel, GraphDef, Lite, Edge TPU)\n            im = im.cpu().numpy()\n            if self.saved_model:  # SavedModel\n                y = self.model(im, training=False) if self.keras else self.model(im)\n            elif self.pb:  # GraphDef\n                y = self.frozen_func(x=self.tf.constant(im))\n            else:  # Lite or Edge TPU\n                input = self.input_details[0]\n                int8 = input['dtype'] == np.uint8  # is TFLite quantized uint8 model\n                if int8:\n                    scale, 
zero_point = input['quantization']\n                    im = (im / scale + zero_point).astype(np.uint8)  # de-scale\n                self.interpreter.set_tensor(input['index'], im)\n                self.interpreter.invoke()\n                y = []\n                for output in self.output_details:\n                    x = self.interpreter.get_tensor(output['index'])\n                    if int8:\n                        scale, zero_point = output['quantization']\n                        x = (x.astype(np.float32) - zero_point) * scale  # re-scale\n                    y.append(x)\n            y = [x if isinstance(x, np.ndarray) else x.numpy() for x in y]\n            y[0][..., :4] *= [w, h, w, h]  # xywh normalized to pixels\n\n        if isinstance(y, (list, tuple)):\n            return self.from_numpy(y[0]) if len(y) == 1 else [self.from_numpy(x) for x in y]\n        else:\n            return self.from_numpy(y)\n\n    def from_numpy(self, x):\n        return torch.from_numpy(x).to(self.device) if isinstance(x, np.ndarray) else x\n\n    def warmup(self, imgsz=(1, 3, 640, 640)):\n        # Warmup model by running inference once\n        warmup_types = self.pt, self.jit, self.onnx, self.engine, self.saved_model, self.pb, self.triton\n        if any(warmup_types) and (self.device.type != 'cpu' or self.triton):\n            im = torch.empty(*imgsz, dtype=torch.half if self.fp16 else torch.float, device=self.device)  # input\n            for _ in range(2 if self.jit else 1):  #\n                self.forward(im)  # warmup\n\n    @staticmethod\n    def _model_type(p='path/to/model.pt'):\n        # Return model type from model path, i.e. 
path='path/to/model.onnx' -> type=onnx\n        # types = [pt, jit, onnx, xml, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs, paddle]\n        from export import export_formats\n        from utils.downloads import is_url\n        sf = list(export_formats().Suffix)  # export suffixes\n        if not is_url(p, check=False):\n            check_suffix(p, sf)  # checks\n        url = urlparse(p)  # if url may be Triton inference server\n        types = [s in Path(p).name for s in sf]\n        types[8] &= not types[9]  # tflite &= not edgetpu\n        triton = not any(types) and all([any(s in url.scheme for s in ['http', 'grpc']), url.netloc])\n        return types + [triton]\n\n    @staticmethod\n    def _load_metadata(f=Path('path/to/meta.yaml')):\n        # Load metadata from meta.yaml if it exists\n        if f.exists():\n            d = yaml_load(f)\n            return d['stride'], d['names']  # assign stride, names\n        return None, None\n\n\nclass AutoShape(nn.Module):\n    # YOLOv5 input-robust model wrapper for passing cv2/np/PIL/torch inputs. Includes preprocessing, inference and NMS\n    conf = 0.25  # NMS confidence threshold\n    iou = 0.45  # NMS IoU threshold\n    agnostic = False  # NMS class-agnostic\n    multi_label = False  # NMS multiple labels per box\n    classes = None  # (optional list) filter by class, i.e. = [0, 15, 16] for COCO persons, cats and dogs\n    max_det = 1000  # maximum number of detections per image\n    amp = False  # Automatic Mixed Precision (AMP) inference\n\n    def __init__(self, model, verbose=True):\n        super().__init__()\n        if verbose:\n            LOGGER.info('Adding AutoShape... 
')\n        copy_attr(self, model, include=('yaml', 'nc', 'hyp', 'names', 'stride', 'abc'), exclude=())  # copy attributes\n        self.dmb = isinstance(model, DetectMultiBackend)  # DetectMultiBackend() instance\n        self.pt = not self.dmb or model.pt  # PyTorch model\n        self.model = model.eval()\n        if self.pt:\n            m = self.model.model.model[-1] if self.dmb else self.model.model[-1]  # Detect()\n            m.inplace = False  # Detect.inplace=False for safe multithread inference\n            m.export = True  # do not output loss values\n\n    def _apply(self, fn):\n        # Apply to(), cpu(), cuda(), half() to model tensors that are not parameters or registered buffers\n        self = super()._apply(fn)\n        if self.pt:\n            m = self.model.model.model[-1] if self.dmb else self.model.model[-1]  # Detect()\n            m.stride = fn(m.stride)\n            m.grid = list(map(fn, m.grid))\n            if isinstance(m.anchor_grid, list):\n                m.anchor_grid = list(map(fn, m.anchor_grid))\n        return self\n\n    @smart_inference_mode()\n    def forward(self, ims, size=640, augment=False, profile=False):\n        # Inference from various sources. For size(height=640, width=1280), RGB images example inputs are:\n        #   file:        ims = 'data/images/zidane.jpg'  # str or PosixPath\n        #   URI:             = 'https://ultralytics.com/images/zidane.jpg'\n        #   OpenCV:          = cv2.imread('image.jpg')[:,:,::-1]  # HWC BGR to RGB x(640,1280,3)\n        #   PIL:             = Image.open('image.jpg') or ImageGrab.grab()  # HWC x(640,1280,3)\n        #   numpy:           = np.zeros((640,1280,3))  # HWC\n        #   torch:           = torch.zeros(16,3,320,640)  # BCHW (scaled to size=640, 0-1 values)\n        #   multiple:        = [Image.open('image1.jpg'), Image.open('image2.jpg'), ...]  
# list of images\n\n        dt = (Profile(), Profile(), Profile())\n        with dt[0]:\n            if isinstance(size, int):  # expand\n                size = (size, size)\n            p = next(self.model.parameters()) if self.pt else torch.empty(1, device=self.model.device)  # param\n            autocast = self.amp and (p.device.type != 'cpu')  # Automatic Mixed Precision (AMP) inference\n            if isinstance(ims, torch.Tensor):  # torch\n                with amp.autocast(autocast):\n                    return self.model(ims.to(p.device).type_as(p), augment=augment)  # inference\n\n            # Pre-process\n            n, ims = (len(ims), list(ims)) if isinstance(ims, (list, tuple)) else (1, [ims])  # number, list of images\n            shape0, shape1, files = [], [], []  # image and inference shapes, filenames\n            for i, im in enumerate(ims):\n                f = f'image{i}'  # filename\n                if isinstance(im, (str, Path)):  # filename or uri\n                    im, f = Image.open(requests.get(im, stream=True).raw if str(im).startswith('http') else im), im\n                    im = np.asarray(exif_transpose(im))\n                elif isinstance(im, Image.Image):  # PIL Image\n                    im, f = np.asarray(exif_transpose(im)), getattr(im, 'filename', f) or f\n                files.append(Path(f).with_suffix('.jpg').name)\n                if im.shape[0] < 5:  # image in CHW\n                    im = im.transpose((1, 2, 0))  # reverse dataloader .transpose(2, 0, 1)\n                im = im[..., :3] if im.ndim == 3 else cv2.cvtColor(im, cv2.COLOR_GRAY2BGR)  # enforce 3ch input\n                s = im.shape[:2]  # HWC\n                shape0.append(s)  # image shape\n                g = max(size) / max(s)  # gain\n                shape1.append([int(y * g) for y in s])\n                ims[i] = im if im.data.contiguous else np.ascontiguousarray(im)  # update\n            shape1 = [make_divisible(x, self.stride) for x in 
np.array(shape1).max(0)]  # inf shape\n            x = [letterbox(im, shape1, auto=False)[0] for im in ims]  # pad\n            x = np.ascontiguousarray(np.array(x).transpose((0, 3, 1, 2)))  # stack and BHWC to BCHW\n            x = torch.from_numpy(x).to(p.device).type_as(p) / 255  # uint8 to fp16/32\n\n        with amp.autocast(autocast):\n            # Inference\n            with dt[1]:\n                y = self.model(x, augment=augment)  # forward\n\n            # Post-process\n            with dt[2]:\n                y = non_max_suppression(y if self.dmb else y[0],\n                                        self.conf,\n                                        self.iou,\n                                        self.classes,\n                                        self.agnostic,\n                                        self.multi_label,\n                                        max_det=self.max_det)  # NMS\n                for i in range(n):\n                    scale_boxes(shape1, y[i][:, :4], shape0[i])\n\n            return Detections(ims, y, files, dt, self.names, x.shape)\n\n\nclass Detections:\n    # YOLOv5 detections class for inference results\n    def __init__(self, ims, pred, files, times=(0, 0, 0), names=None, shape=None):\n        super().__init__()\n        d = pred[0].device  # device\n        gn = [torch.tensor([*(im.shape[i] for i in [1, 0, 1, 0]), 1, 1], device=d) for im in ims]  # normalizations\n        self.ims = ims  # list of images as numpy arrays\n        self.pred = pred  # list of tensors pred[0] = (xyxy, conf, cls)\n        self.names = names  # class names\n        self.files = files  # image filenames\n        self.times = times  # profiling times\n        self.xyxy = pred  # xyxy pixels\n        self.xywh = [xyxy2xywh(x) for x in pred]  # xywh pixels\n        self.xyxyn = [x / g for x, g in zip(self.xyxy, gn)]  # xyxy normalized\n        self.xywhn = [x / g for x, g in zip(self.xywh, gn)]  # xywh normalized\n        self.n = 
len(self.pred)  # number of images (batch size)\n        self.t = tuple(x.t / self.n * 1E3 for x in times)  # timestamps (ms)\n        self.s = tuple(shape)  # inference BCHW shape\n\n    def _run(self, pprint=False, show=False, save=False, crop=False, render=False, labels=True, save_dir=Path('')):\n        s, crops = '', []\n        for i, (im, pred) in enumerate(zip(self.ims, self.pred)):\n            s += f'\\nimage {i + 1}/{len(self.pred)}: {im.shape[0]}x{im.shape[1]} '  # string\n            if pred.shape[0]:\n                for c in pred[:, -1].unique():\n                    n = (pred[:, -1] == c).sum()  # detections per class\n                    s += f\"{n} {self.names[int(c)]}{'s' * (n > 1)}, \"  # add to string\n                s = s.rstrip(', ')\n                if show or save or render or crop:\n                    annotator = Annotator(im, example=str(self.names))\n                    for *box, conf, cls in reversed(pred):  # xyxy, confidence, class\n                        label = f'{self.names[int(cls)]} {conf:.2f}'\n                        if crop:\n                            file = save_dir / 'crops' / self.names[int(cls)] / self.files[i] if save else None\n                            crops.append({\n                                'box': box,\n                                'conf': conf,\n                                'cls': cls,\n                                'label': label,\n                                'im': save_one_box(box, im, file=file, save=save)})\n                        else:  # all others\n                            annotator.box_label(box, label if labels else '', color=colors(cls))\n                    im = annotator.im\n            else:\n                s += '(no detections)'\n\n            im = Image.fromarray(im.astype(np.uint8)) if isinstance(im, np.ndarray) else im  # from np\n            if show:\n                display(im) if is_notebook() else im.show(self.files[i])\n            if save:\n                f = 
self.files[i]\n                im.save(save_dir / f)  # save\n                if i == self.n - 1:\n                    LOGGER.info(f\"Saved {self.n} image{'s' * (self.n > 1)} to {colorstr('bold', save_dir)}\")\n            if render:\n                self.ims[i] = np.asarray(im)\n        if pprint:\n            s = s.lstrip('\\n')\n            return f'{s}\\nSpeed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {self.s}' % self.t\n        if crop:\n            if save:\n                LOGGER.info(f'Saved results to {save_dir}\\n')\n            return crops\n\n    @TryExcept('Showing images is not supported in this environment')\n    def show(self, labels=True):\n        self._run(show=True, labels=labels)  # show results\n\n    def save(self, labels=True, save_dir='runs/detect/exp', exist_ok=False):\n        save_dir = increment_path(save_dir, exist_ok, mkdir=True)  # increment save_dir\n        self._run(save=True, labels=labels, save_dir=save_dir)  # save results\n\n    def crop(self, save=True, save_dir='runs/detect/exp', exist_ok=False):\n        save_dir = increment_path(save_dir, exist_ok, mkdir=True) if save else None\n        return self._run(crop=True, save=save, save_dir=save_dir)  # crop results\n\n    def render(self, labels=True):\n        self._run(render=True, labels=labels)  # render results\n        return self.ims\n\n    def pandas(self):\n        # return detections as pandas DataFrames, i.e. 
print(results.pandas().xyxy[0])\n        new = copy(self)  # return copy\n        ca = 'xmin', 'ymin', 'xmax', 'ymax', 'confidence', 'class', 'name'  # xyxy columns\n        cb = 'xcenter', 'ycenter', 'width', 'height', 'confidence', 'class', 'name'  # xywh columns\n        for k, c in zip(['xyxy', 'xyxyn', 'xywh', 'xywhn'], [ca, ca, cb, cb]):\n            a = [[x[:5] + [int(x[5]), self.names[int(x[5])]] for x in x.tolist()] for x in getattr(self, k)]  # update\n            setattr(new, k, [pd.DataFrame(x, columns=c) for x in a])\n        return new\n\n    def tolist(self):\n        # return a list of Detections objects, i.e. 'for result in results.tolist():'\n        r = range(self.n)  # iterable\n        x = [Detections([self.ims[i]], [self.pred[i]], [self.files[i]], self.times, self.names, self.s) for i in r]\n        # for d in x:\n        #    for k in ['ims', 'pred', 'xyxy', 'xyxyn', 'xywh', 'xywhn']:\n        #        setattr(d, k, getattr(d, k)[0])  # pop out of list\n        return x\n\n    def print(self):\n        LOGGER.info(self.__str__())\n\n    def __len__(self):  # override len(results)\n        return self.n\n\n    def __str__(self):  # override print(results)\n        return self._run(pprint=True)  # print results\n\n    def __repr__(self):\n        return f'YOLOv5 {self.__class__} instance\\n' + self.__str__()\n\n\nclass Proto(nn.Module):\n    # YOLOv5 mask Proto module for segmentation models\n    def __init__(self, c1, c_=256, c2=32):  # ch_in, number of protos, number of masks\n        super().__init__()\n        self.cv1 = Conv(c1, c_, k=3)\n        self.upsample = nn.Upsample(scale_factor=2, mode='nearest')\n        self.cv2 = Conv(c_, c_, k=3)\n        self.cv3 = Conv(c_, c2)\n\n    def forward(self, x):\n        return self.cv3(self.cv2(self.upsample(self.cv1(x))))\n\n\nclass Classify(nn.Module):\n    # YOLOv5 classification head, i.e. 
x(b,c1,20,20) to x(b,c2)\n    def __init__(self,\n                 c1,\n                 c2,\n                 k=1,\n                 s=1,\n                 p=None,\n                 g=1,\n                 dropout_p=0.0):  # ch_in, ch_out, kernel, stride, padding, groups, dropout probability\n        super().__init__()\n        c_ = 1280  # efficientnet_b0 size\n        self.conv = Conv(c1, c_, k, s, autopad(k, p), g)\n        self.pool = nn.AdaptiveAvgPool2d(1)  # to x(b,c_,1,1)\n        self.drop = nn.Dropout(p=dropout_p, inplace=True)\n        self.linear = nn.Linear(c_, c2)  # to x(b,c2)\n\n    def forward(self, x):\n        if isinstance(x, list):\n            x = torch.cat(x, 1)\n        return self.linear(self.drop(self.pool(self.conv(x)).flatten(1)))\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/experimental.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nExperimental modules\n\"\"\"\nimport math\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\n\nfrom utils.downloads import attempt_download\n\n\nclass Sum(nn.Module):\n    # Weighted sum of 2 or more layers https://arxiv.org/abs/1911.09070\n    def __init__(self, n, weight=False):  # n: number of inputs\n        super().__init__()\n        self.weight = weight  # apply weights boolean\n        self.iter = range(n - 1)  # iter object\n        if weight:\n            self.w = nn.Parameter(-torch.arange(1.0, n) / 2, requires_grad=True)  # layer weights\n\n    def forward(self, x):\n        y = x[0]  # no weight\n        if self.weight:\n            w = torch.sigmoid(self.w) * 2\n            for i in self.iter:\n                y = y + x[i + 1] * w[i]\n        else:\n            for i in self.iter:\n                y = y + x[i + 1]\n        return y\n\n\nclass MixConv2d(nn.Module):\n    # Mixed Depth-wise Conv https://arxiv.org/abs/1907.09595\n    def __init__(self, c1, c2, k=(1, 3), s=1, equal_ch=True):  # ch_in, ch_out, kernel, stride, ch_strategy\n        super().__init__()\n        n = len(k)  # number of convolutions\n        if equal_ch:  # equal c_ per group\n            i = torch.linspace(0, n - 1E-6, c2).floor()  # c2 indices\n            c_ = [(i == g).sum() for g in range(n)]  # intermediate channels\n        else:  # equal weight.numel() per group\n            b = [c2] + [0] * n\n            a = np.eye(n + 1, n, k=-1)\n            a -= np.roll(a, 1, axis=1)\n            a *= np.array(k) ** 2\n            a[0] = 1\n            c_ = np.linalg.lstsq(a, b, rcond=None)[0].round()  # solve for equal weight indices, ax = b\n\n        self.m = nn.ModuleList([\n            nn.Conv2d(c1, int(c_), k, s, k // 2, groups=math.gcd(c1, int(c_)), bias=False) for k, c_ in zip(k, c_)])\n        self.bn = nn.BatchNorm2d(c2)\n        self.act = nn.SiLU()\n\n    def forward(self, x):\n        return 
self.act(self.bn(torch.cat([m(x) for m in self.m], 1)))\n\n\nclass Ensemble(nn.ModuleList):\n    # Ensemble of models\n    def __init__(self):\n        super().__init__()\n\n    def forward(self, x, augment=False, profile=False, visualize=False):\n        y = [module(x, augment, profile, visualize)[0] for module in self]\n        # y = torch.stack(y).max(0)[0]  # max ensemble\n        # y = torch.stack(y).mean(0)  # mean ensemble\n        y = torch.cat(y, 1)  # nms ensemble\n        return y, None  # inference, train output\n\n\ndef attempt_load(weights, device=None, inplace=True, fuse=True):\n    # Loads an ensemble of models weights=[a,b,c] or a single model weights=[a] or weights=a\n    from models.yolo import Detect, Model\n\n    model = Ensemble()\n    for w in weights if isinstance(weights, list) else [weights]:\n        ckpt = torch.load(attempt_download(w), map_location='cpu')  # load\n        ckpt = (ckpt.get('ema') or ckpt['model']).to(device).float()  # FP32 model\n\n        # Model compatibility updates\n        if not hasattr(ckpt, 'stride'):\n            ckpt.stride = torch.tensor([32.])\n        if hasattr(ckpt, 'names') and isinstance(ckpt.names, (list, tuple)):\n            ckpt.names = dict(enumerate(ckpt.names))  # convert to dict\n\n        model.append(ckpt.fuse().eval() if fuse and hasattr(ckpt, 'fuse') else ckpt.eval())  # model in eval mode\n\n    # Module compatibility updates\n    for m in model.modules():\n        t = type(m)\n        if t in (nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU, Detect, Model):\n            m.inplace = inplace  # torch 1.7.0 compatibility\n            if t is Detect and not isinstance(m.anchor_grid, list):\n                delattr(m, 'anchor_grid')\n                setattr(m, 'anchor_grid', [torch.zeros(1)] * m.nl)\n        elif t is nn.Upsample and not hasattr(m, 'recompute_scale_factor'):\n            m.recompute_scale_factor = None  # torch 1.11.0 compatibility\n\n    # Return model\n    if 
len(model) == 1:\n        return model[-1]\n\n    # Return detection ensemble\n    print(f'Ensemble created with {weights}\\n')\n    for k in 'names', 'nc', 'yaml':\n        setattr(model, k, getattr(model[0], k))\n    model.stride = model[torch.argmax(torch.tensor([m.stride.max() for m in model])).int()].stride  # max stride\n    assert all(model[0].nc == m.nc for m in model), f'Models have different class counts: {[m.nc for m in model]}'\n    return model\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/anchors.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Default anchors for COCO data\n\n\n# P5 -------------------------------------------------------------------------------------------------------------------\n# P5-640:\nanchors_p5_640:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n\n# P6 -------------------------------------------------------------------------------------------------------------------\n# P6-640:  thr=0.25: 0.9964 BPR, 5.54 anchors past thr, n=12, img_size=640, metric_all=0.281/0.716-mean/best, past_thr=0.469-mean: 9,11,  21,19,  17,41,  43,32,  39,70,  86,64,  65,131,  134,130,  120,265,  282,180,  247,354,  512,387\nanchors_p6_640:\n  - [9,11,  21,19,  17,41]  # P3/8\n  - [43,32,  39,70,  86,64]  # P4/16\n  - [65,131,  134,130,  120,265]  # P5/32\n  - [282,180,  247,354,  512,387]  # P6/64\n\n# P6-1280:  thr=0.25: 0.9950 BPR, 5.55 anchors past thr, n=12, img_size=1280, metric_all=0.281/0.714-mean/best, past_thr=0.468-mean: 19,27,  44,40,  38,94,  96,68,  86,152,  180,137,  140,301,  303,264,  238,542,  436,615,  739,380,  925,792\nanchors_p6_1280:\n  - [19,27,  44,40,  38,94]  # P3/8\n  - [96,68,  86,152,  180,137]  # P4/16\n  - [140,301,  303,264,  238,542]  # P5/32\n  - [436,615,  739,380,  925,792]  # P6/64\n\n# P6-1920:  thr=0.25: 0.9950 BPR, 5.55 anchors past thr, n=12, img_size=1920, metric_all=0.281/0.714-mean/best, past_thr=0.468-mean: 28,41,  67,59,  57,141,  144,103,  129,227,  270,205,  209,452,  455,396,  358,812,  653,922,  1109,570,  1387,1187\nanchors_p6_1920:\n  - [28,41,  67,59,  57,141]  # P3/8\n  - [144,103,  129,227,  270,205]  # P4/16\n  - [209,452,  455,396,  358,812]  # P5/32\n  - [653,922,  1109,570,  1387,1187]  # P6/64\n\n\n# P7 -------------------------------------------------------------------------------------------------------------------\n# P7-640:  thr=0.25: 0.9962 BPR, 6.76 anchors past thr, n=15, img_size=640, 
metric_all=0.275/0.733-mean/best, past_thr=0.466-mean: 11,11,  13,30,  29,20,  30,46,  61,38,  39,92,  78,80,  146,66,  79,163,  149,150,  321,143,  157,303,  257,402,  359,290,  524,372\nanchors_p7_640:\n  - [11,11,  13,30,  29,20]  # P3/8\n  - [30,46,  61,38,  39,92]  # P4/16\n  - [78,80,  146,66,  79,163]  # P5/32\n  - [149,150,  321,143,  157,303]  # P6/64\n  - [257,402,  359,290,  524,372]  # P7/128\n\n# P7-1280:  thr=0.25: 0.9968 BPR, 6.71 anchors past thr, n=15, img_size=1280, metric_all=0.273/0.732-mean/best, past_thr=0.463-mean: 19,22,  54,36,  32,77,  70,83,  138,71,  75,173,  165,159,  148,334,  375,151,  334,317,  251,626,  499,474,  750,326,  534,814,  1079,818\nanchors_p7_1280:\n  - [19,22,  54,36,  32,77]  # P3/8\n  - [70,83,  138,71,  75,173]  # P4/16\n  - [165,159,  148,334,  375,151]  # P5/32\n  - [334,317,  251,626,  499,474]  # P6/64\n  - [750,326,  534,814,  1079,818]  # P7/128\n\n# P7-1920:  thr=0.25: 0.9968 BPR, 6.71 anchors past thr, n=15, img_size=1920, metric_all=0.273/0.732-mean/best, past_thr=0.463-mean: 29,34,  81,55,  47,115,  105,124,  207,107,  113,259,  247,238,  222,500,  563,227,  501,476,  376,939,  749,711,  1126,489,  801,1222,  1618,1227\nanchors_p7_1920:\n  - [29,34,  81,55,  47,115]  # P3/8\n  - [105,124,  207,107,  113,259]  # P4/16\n  - [247,238,  222,500,  563,227]  # P5/32\n  - [501,476,  376,939,  749,711]  # P6/64\n  - [1126,489,  801,1222,  1618,1227]  # P7/128\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov3-spp.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# darknet53 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [32, 3, 1]],  # 0\n   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2\n   [-1, 1, Bottleneck, [64]],\n   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4\n   [-1, 2, Bottleneck, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 5-P3/8\n   [-1, 8, Bottleneck, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 7-P4/16\n   [-1, 8, Bottleneck, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 9-P5/32\n   [-1, 4, Bottleneck, [1024]],  # 10\n  ]\n\n# YOLOv3-SPP head\nhead:\n  [[-1, 1, Bottleneck, [1024, False]],\n   [-1, 1, SPP, [512, [5, 9, 13]]],\n   [-1, 1, Conv, [1024, 3, 1]],\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, Conv, [1024, 3, 1]],  # 15 (P5/32-large)\n\n   [-2, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 8], 1, Concat, [1]],  # cat backbone P4\n   [-1, 1, Bottleneck, [512, False]],\n   [-1, 1, Bottleneck, [512, False]],\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, Conv, [512, 3, 1]],  # 22 (P4/16-medium)\n\n   [-2, 1, Conv, [128, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P3\n   [-1, 1, Bottleneck, [256, False]],\n   [-1, 2, Bottleneck, [256, False]],  # 27 (P3/8-small)\n\n   [[27, 22, 15], 1, Detect, [nc, anchors]],   # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov3-tiny.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors:\n  - [10,14, 23,27, 37,58]  # P4/16\n  - [81,82, 135,169, 344,319]  # P5/32\n\n# YOLOv3-tiny backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [16, 3, 1]],  # 0\n   [-1, 1, nn.MaxPool2d, [2, 2, 0]],  # 1-P1/2\n   [-1, 1, Conv, [32, 3, 1]],\n   [-1, 1, nn.MaxPool2d, [2, 2, 0]],  # 3-P2/4\n   [-1, 1, Conv, [64, 3, 1]],\n   [-1, 1, nn.MaxPool2d, [2, 2, 0]],  # 5-P3/8\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, nn.MaxPool2d, [2, 2, 0]],  # 7-P4/16\n   [-1, 1, Conv, [256, 3, 1]],\n   [-1, 1, nn.MaxPool2d, [2, 2, 0]],  # 9-P5/32\n   [-1, 1, Conv, [512, 3, 1]],\n   [-1, 1, nn.ZeroPad2d, [[0, 1, 0, 1]]],  # 11\n   [-1, 1, nn.MaxPool2d, [2, 1, 0]],  # 12\n  ]\n\n# YOLOv3-tiny head\nhead:\n  [[-1, 1, Conv, [1024, 3, 1]],\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, Conv, [512, 3, 1]],  # 15 (P5/32-large)\n\n   [-2, 1, Conv, [128, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 8], 1, Concat, [1]],  # cat backbone P4\n   [-1, 1, Conv, [256, 3, 1]],  # 19 (P4/16-medium)\n\n   [[19, 15], 1, Detect, [nc, anchors]],  # Detect(P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov3.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# darknet53 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [32, 3, 1]],  # 0\n   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2\n   [-1, 1, Bottleneck, [64]],\n   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4\n   [-1, 2, Bottleneck, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 5-P3/8\n   [-1, 8, Bottleneck, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 7-P4/16\n   [-1, 8, Bottleneck, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 9-P5/32\n   [-1, 4, Bottleneck, [1024]],  # 10\n  ]\n\n# YOLOv3 head\nhead:\n  [[-1, 1, Bottleneck, [1024, False]],\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, Conv, [1024, 3, 1]],\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, Conv, [1024, 3, 1]],  # 15 (P5/32-large)\n\n   [-2, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 8], 1, Concat, [1]],  # cat backbone P4\n   [-1, 1, Bottleneck, [512, False]],\n   [-1, 1, Bottleneck, [512, False]],\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, Conv, [512, 3, 1]],  # 22 (P4/16-medium)\n\n   [-2, 1, Conv, [128, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P3\n   [-1, 1, Bottleneck, [256, False]],\n   [-1, 2, Bottleneck, [256, False]],  # 27 (P3/8-small)\n\n   [[27, 22, 15], 1, Detect, [nc, anchors]],   # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5-bifpn.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 BiFPN head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14, 6], 1, Concat, [1]],  # cat P4 <--- BiFPN change\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5-fpn.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 FPN head\nhead:\n  [[-1, 3, C3, [1024, False]],  # 10 (P5/32-large)\n\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 3, C3, [512, False]],  # 14 (P4/16-medium)\n\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 3, C3, [256, False]],  # 18 (P3/8-small)\n\n   [[18, 14, 10], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5-p2.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors: 3  # AutoAnchor evolves 3 anchors per P output layer\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head with (P2, P3, P4, P5) outputs\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [128, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 2], 1, Concat, [1]],  # cat backbone P2\n   [-1, 1, C3, [128, False]],  # 21 (P2/4-xsmall)\n\n   [-1, 1, Conv, [128, 3, 2]],\n   [[-1, 18], 1, Concat, [1]],  # cat head P3\n   [-1, 3, C3, [256, False]],  # 24 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 27 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 30 (P5/32-large)\n\n   [[21, 24, 27, 30], 1, Detect, [nc, anchors]],  # Detect(P2, P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5-p34.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.50  # layer channel multiple\nanchors: 3  # AutoAnchor evolves 3 anchors per P output layer\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head with (P3, P4) outputs\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [[17, 20], 1, Detect, [nc, anchors]],  # Detect(P3, P4)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5-p6.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors: 3  # AutoAnchor evolves 3 anchors per P output layer\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [768, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [768]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 9-P6/64\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 11\n  ]\n\n# YOLOv5 v6.0 head with (P3, P4, P5, P6) outputs\nhead:\n  [[-1, 1, Conv, [768, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 8], 1, Concat, [1]],  # cat backbone P5\n   [-1, 3, C3, [768, False]],  # 15\n\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 19\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 23 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 20], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 26 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 16], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [768, False]],  # 29 (P5/32-large)\n\n   [-1, 1, Conv, [768, 3, 2]],\n   [[-1, 12], 1, Concat, [1]],  # cat head P6\n   [-1, 3, C3, [1024, False]],  # 32 (P6/64-xlarge)\n\n   [[23, 26, 29, 32], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5-p7.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors: 3  # AutoAnchor evolves 3 anchors per P output layer\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [768, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [768]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 9-P6/64\n   [-1, 3, C3, [1024]],\n   [-1, 1, Conv, [1280, 3, 2]],  # 11-P7/128\n   [-1, 3, C3, [1280]],\n   [-1, 1, SPPF, [1280, 5]],  # 13\n  ]\n\n# YOLOv5 v6.0 head with (P3, P4, P5, P6, P7) outputs\nhead:\n  [[-1, 1, Conv, [1024, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 10], 1, Concat, [1]],  # cat backbone P6\n   [-1, 3, C3, [1024, False]],  # 17\n\n   [-1, 1, Conv, [768, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 8], 1, Concat, [1]],  # cat backbone P5\n   [-1, 3, C3, [768, False]],  # 21\n\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 25\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 29 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 26], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 32 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 22], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [768, False]],  # 35 (P5/32-large)\n\n   [-1, 1, Conv, [768, 3, 2]],\n   [[-1, 18], 1, Concat, [1]],  # cat head P6\n   [-1, 3, C3, [1024, False]],  # 38 (P6/64-xlarge)\n\n   [-1, 1, Conv, [1024, 
3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P7\n   [-1, 3, C3, [1280, False]],  # 41 (P7/128-xxlarge)\n\n   [[29, 32, 35, 38, 41], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6, P7)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5-panet.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 PANet head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5l6.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors:\n  - [19,27,  44,40,  38,94]  # P3/8\n  - [96,68,  86,152,  180,137]  # P4/16\n  - [140,301,  303,264,  238,542]  # P5/32\n  - [436,615,  739,380,  925,792]  # P6/64\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [768, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [768]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 9-P6/64\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 11\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [768, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 8], 1, Concat, [1]],  # cat backbone P5\n   [-1, 3, C3, [768, False]],  # 15\n\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 19\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 23 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 20], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 26 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 16], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [768, False]],  # 29 (P5/32-large)\n\n   [-1, 1, Conv, [768, 3, 2]],\n   [[-1, 12], 1, Concat, [1]],  # cat head P6\n   [-1, 3, C3, [1024, False]],  # 32 (P6/64-xlarge)\n\n   [[23, 26, 29, 32], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5m6.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.67  # model depth multiple\nwidth_multiple: 0.75  # layer channel multiple\nanchors:\n  - [19,27,  44,40,  38,94]  # P3/8\n  - [96,68,  86,152,  180,137]  # P4/16\n  - [140,301,  303,264,  238,542]  # P5/32\n  - [436,615,  739,380,  925,792]  # P6/64\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [768, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [768]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 9-P6/64\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 11\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [768, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 8], 1, Concat, [1]],  # cat backbone P5\n   [-1, 3, C3, [768, False]],  # 15\n\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 19\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 23 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 20], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 26 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 16], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [768, False]],  # 29 (P5/32-large)\n\n   [-1, 1, Conv, [768, 3, 2]],\n   [[-1, 12], 1, Concat, [1]],  # cat head P6\n   [-1, 3, C3, [1024, False]],  # 32 (P6/64-xlarge)\n\n   [[23, 26, 29, 32], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5n6.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [19,27,  44,40,  38,94]  # P3/8\n  - [96,68,  86,152,  180,137]  # P4/16\n  - [140,301,  303,264,  238,542]  # P5/32\n  - [436,615,  739,380,  925,792]  # P6/64\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [768, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [768]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 9-P6/64\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 11\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [768, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 8], 1, Concat, [1]],  # cat backbone P5\n   [-1, 3, C3, [768, False]],  # 15\n\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 19\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 23 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 20], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 26 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 16], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [768, False]],  # 29 (P5/32-large)\n\n   [-1, 1, Conv, [768, 3, 2]],\n   [[-1, 12], 1, Concat, [1]],  # cat head P6\n   [-1, 3, C3, [1024, False]],  # 32 (P6/64-xlarge)\n\n   [[23, 26, 29, 32], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5s-LeakyReLU.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\nactivation: nn.LeakyReLU(0.1)  # <----- Conv() activation used throughout entire YOLOv5 model\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.50  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5s-ghost.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.50  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, GhostConv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3Ghost, [128]],\n   [-1, 1, GhostConv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3Ghost, [256]],\n   [-1, 1, GhostConv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3Ghost, [512]],\n   [-1, 1, GhostConv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3Ghost, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, GhostConv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3Ghost, [512, False]],  # 13\n\n   [-1, 1, GhostConv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3Ghost, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, GhostConv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3Ghost, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, GhostConv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3Ghost, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5s-transformer.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.50  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3TR, [1024]],  # 8 <--- C3TR() Transformer module\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5s6.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.50  # layer channel multiple\nanchors:\n  - [19,27,  44,40,  38,94]  # P3/8\n  - [96,68,  86,152,  180,137]  # P4/16\n  - [140,301,  303,264,  238,542]  # P5/32\n  - [436,615,  739,380,  925,792]  # P6/64\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [768, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [768]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 9-P6/64\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 11\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [768, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 8], 1, Concat, [1]],  # cat backbone P5\n   [-1, 3, C3, [768, False]],  # 15\n\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 19\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 23 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 20], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 26 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 16], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [768, False]],  # 29 (P5/32-large)\n\n   [-1, 1, Conv, [768, 3, 2]],\n   [[-1, 12], 1, Concat, [1]],  # cat head P6\n   [-1, 3, C3, [1024, False]],  # 32 (P6/64-xlarge)\n\n   [[23, 26, 29, 32], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/hub/yolov5x6.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.33  # model depth multiple\nwidth_multiple: 1.25  # layer channel multiple\nanchors:\n  - [19,27,  44,40,  38,94]  # P3/8\n  - [96,68,  86,152,  180,137]  # P4/16\n  - [140,301,  303,264,  238,542]  # P5/32\n  - [436,615,  739,380,  925,792]  # P6/64\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [768, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [768]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 9-P6/64\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 11\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [768, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 8], 1, Concat, [1]],  # cat backbone P5\n   [-1, 3, C3, [768, False]],  # 15\n\n   [-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 19\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 23 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 20], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 26 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 16], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [768, False]],  # 29 (P5/32-large)\n\n   [-1, 1, Conv, [768, 3, 2]],\n   [[-1, 12], 1, Concat, [1]],  # cat head P6\n   [-1, 3, C3, [1024, False]],  # 32 (P6/64-xlarge)\n\n   [[23, 26, 29, 32], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5, P6)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/segment/yolov5l-seg.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]],  # Segment(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/segment/yolov5m-seg.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.67  # model depth multiple\nwidth_multiple: 0.75  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]],  # Segment(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/segment/yolov5n-seg.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]],  # Segment(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/segment/yolov5s-seg.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.5  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]],  # Segment(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/segment/yolov5x-seg.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.33  # model depth multiple\nwidth_multiple: 1.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]],  # Segment(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/tf.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nTensorFlow, Keras and TFLite versions of YOLOv5\nAuthored by https://github.com/zldrobit in PR https://github.com/ultralytics/yolov5/pull/1127\n\nUsage:\n    $ python models/tf.py --weights yolov5s.pt\n\nExport:\n    $ python export.py --weights yolov5s.pt --include saved_model pb tflite tfjs\n\"\"\"\n\nimport argparse\nimport sys\nfrom copy import deepcopy\nfrom pathlib import Path\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[1]  # YOLOv5 root directory\nif str(ROOT) not in sys.path:\n    sys.path.append(str(ROOT))  # add ROOT to PATH\n# ROOT = ROOT.relative_to(Path.cwd())  # relative\n\nimport numpy as np\nimport tensorflow as tf\nimport torch\nimport torch.nn as nn\nfrom tensorflow import keras\n\nfrom models.common import (C3, SPP, SPPF, Bottleneck, BottleneckCSP, C3x, Concat, Conv, CrossConv, DWConv,\n                           DWConvTranspose2d, Focus, autopad)\nfrom models.experimental import MixConv2d, attempt_load\nfrom models.yolo import Detect, Segment\nfrom utils.activations import SiLU\nfrom utils.general import LOGGER, make_divisible, print_args\n\n\nclass TFBN(keras.layers.Layer):\n    # TensorFlow BatchNormalization wrapper\n    def __init__(self, w=None):\n        super().__init__()\n        self.bn = keras.layers.BatchNormalization(\n            beta_initializer=keras.initializers.Constant(w.bias.numpy()),\n            gamma_initializer=keras.initializers.Constant(w.weight.numpy()),\n            moving_mean_initializer=keras.initializers.Constant(w.running_mean.numpy()),\n            moving_variance_initializer=keras.initializers.Constant(w.running_var.numpy()),\n            epsilon=w.eps)\n\n    def call(self, inputs):\n        return self.bn(inputs)\n\n\nclass TFPad(keras.layers.Layer):\n    # Pad inputs in spatial dimensions 1 and 2\n    def __init__(self, pad):\n        super().__init__()\n        if isinstance(pad, int):\n            self.pad = tf.constant([[0, 0], 
[pad, pad], [pad, pad], [0, 0]])\n        else:  # tuple/list\n            self.pad = tf.constant([[0, 0], [pad[0], pad[0]], [pad[1], pad[1]], [0, 0]])\n\n    def call(self, inputs):\n        return tf.pad(inputs, self.pad, mode='constant', constant_values=0)\n\n\nclass TFConv(keras.layers.Layer):\n    # Standard convolution\n    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True, w=None):\n        # ch_in, ch_out, kernel, stride, padding, groups, activation, weights\n        super().__init__()\n        assert g == 1, \"TF v2.2 Conv2D does not support 'groups' argument\"\n        # TensorFlow convolution padding is inconsistent with PyTorch (e.g. k=3 s=2 'SAME' padding)\n        # see https://stackoverflow.com/questions/52975843/comparing-conv2d-with-padding-between-tensorflow-and-pytorch\n        conv = keras.layers.Conv2D(\n            filters=c2,\n            kernel_size=k,\n            strides=s,\n            padding='SAME' if s == 1 else 'VALID',\n            use_bias=not hasattr(w, 'bn'),\n            kernel_initializer=keras.initializers.Constant(w.conv.weight.permute(2, 3, 1, 0).numpy()),\n            bias_initializer='zeros' if hasattr(w, 'bn') else keras.initializers.Constant(w.conv.bias.numpy()))\n        self.conv = conv if s == 1 else keras.Sequential([TFPad(autopad(k, p)), conv])\n        self.bn = TFBN(w.bn) if hasattr(w, 'bn') else tf.identity\n        self.act = activations(w.act) if act else tf.identity\n\n    def call(self, inputs):\n        return self.act(self.bn(self.conv(inputs)))\n\n\nclass TFDWConv(keras.layers.Layer):\n    # Depthwise convolution\n    def __init__(self, c1, c2, k=1, s=1, p=None, act=True, w=None):\n        # ch_in, ch_out, kernel, stride, padding, activation, weights\n        super().__init__()\n        assert c2 % c1 == 0, f'TFDWConv() output={c2} must be a multiple of input={c1} channels'\n        conv = keras.layers.DepthwiseConv2D(\n            kernel_size=k,\n            depth_multiplier=c2 // c1,\n            strides=s,\n    
        padding='SAME' if s == 1 else 'VALID',\n            use_bias=not hasattr(w, 'bn'),\n            depthwise_initializer=keras.initializers.Constant(w.conv.weight.permute(2, 3, 1, 0).numpy()),\n            bias_initializer='zeros' if hasattr(w, 'bn') else keras.initializers.Constant(w.conv.bias.numpy()))\n        self.conv = conv if s == 1 else keras.Sequential([TFPad(autopad(k, p)), conv])\n        self.bn = TFBN(w.bn) if hasattr(w, 'bn') else tf.identity\n        self.act = activations(w.act) if act else tf.identity\n\n    def call(self, inputs):\n        return self.act(self.bn(self.conv(inputs)))\n\n\nclass TFDWConvTranspose2d(keras.layers.Layer):\n    # Depthwise ConvTranspose2d\n    def __init__(self, c1, c2, k=1, s=1, p1=0, p2=0, w=None):\n        # ch_in, ch_out, kernel, stride, padding, output_padding, weights\n        super().__init__()\n        assert c1 == c2, f'TFDWConvTranspose2d() output={c2} must be equal to input={c1} channels'\n        assert k == 4 and p1 == 1, 'TFDWConvTranspose2d() only valid for k=4 and p1=1'\n        weight, bias = w.weight.permute(2, 3, 1, 0).numpy(), w.bias.numpy()\n        self.c1 = c1\n        self.conv = [\n            keras.layers.Conv2DTranspose(filters=1,\n                                         kernel_size=k,\n                                         strides=s,\n                                         padding='VALID',\n                                         output_padding=p2,\n                                         use_bias=True,\n                                         kernel_initializer=keras.initializers.Constant(weight[..., i:i + 1]),\n                                         bias_initializer=keras.initializers.Constant(bias[i])) for i in range(c1)]\n\n    def call(self, inputs):\n        return tf.concat([m(x) for m, x in zip(self.conv, tf.split(inputs, self.c1, 3))], 3)[:, 1:-1, 1:-1]\n\n\nclass TFFocus(keras.layers.Layer):\n    # Focus wh information into c-space\n    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True, 
w=None):\n        # ch_in, ch_out, kernel, stride, padding, groups\n        super().__init__()\n        self.conv = TFConv(c1 * 4, c2, k, s, p, g, act, w.conv)\n\n    def call(self, inputs):  # x(b,w,h,c) -> y(b,w/2,h/2,4c)\n        # inputs = inputs / 255  # normalize 0-255 to 0-1\n        inputs = [inputs[:, ::2, ::2, :], inputs[:, 1::2, ::2, :], inputs[:, ::2, 1::2, :], inputs[:, 1::2, 1::2, :]]\n        return self.conv(tf.concat(inputs, 3))\n\n\nclass TFBottleneck(keras.layers.Layer):\n    # Standard bottleneck\n    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5, w=None):  # ch_in, ch_out, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = TFConv(c1, c_, 1, 1, w=w.cv1)\n        self.cv2 = TFConv(c_, c2, 3, 1, g=g, w=w.cv2)\n        self.add = shortcut and c1 == c2\n\n    def call(self, inputs):\n        return inputs + self.cv2(self.cv1(inputs)) if self.add else self.cv2(self.cv1(inputs))\n\n\nclass TFCrossConv(keras.layers.Layer):\n    # Cross Convolution\n    def __init__(self, c1, c2, k=3, s=1, g=1, e=1.0, shortcut=False, w=None):\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = TFConv(c1, c_, (1, k), (1, s), w=w.cv1)\n        self.cv2 = TFConv(c_, c2, (k, 1), (s, 1), g=g, w=w.cv2)\n        self.add = shortcut and c1 == c2\n\n    def call(self, inputs):\n        return inputs + self.cv2(self.cv1(inputs)) if self.add else self.cv2(self.cv1(inputs))\n\n\nclass TFConv2d(keras.layers.Layer):\n    # Substitution for PyTorch nn.Conv2D\n    def __init__(self, c1, c2, k, s=1, g=1, bias=True, w=None):\n        super().__init__()\n        assert g == 1, \"TF v2.2 Conv2D does not support 'groups' argument\"\n        self.conv = keras.layers.Conv2D(filters=c2,\n                                        kernel_size=k,\n                                        strides=s,\n                                        padding='VALID',\n                            
            use_bias=bias,\n                                        kernel_initializer=keras.initializers.Constant(\n                                            w.weight.permute(2, 3, 1, 0).numpy()),\n                                        bias_initializer=keras.initializers.Constant(w.bias.numpy()) if bias else None)\n\n    def call(self, inputs):\n        return self.conv(inputs)\n\n\nclass TFBottleneckCSP(keras.layers.Layer):\n    # CSP Bottleneck https://github.com/WongKinYiu/CrossStagePartialNetworks\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, w=None):\n        # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = TFConv(c1, c_, 1, 1, w=w.cv1)\n        self.cv2 = TFConv2d(c1, c_, 1, 1, bias=False, w=w.cv2)\n        self.cv3 = TFConv2d(c_, c_, 1, 1, bias=False, w=w.cv3)\n        self.cv4 = TFConv(2 * c_, c2, 1, 1, w=w.cv4)\n        self.bn = TFBN(w.bn)\n        self.act = lambda x: keras.activations.swish(x)\n        self.m = keras.Sequential([TFBottleneck(c_, c_, shortcut, g, e=1.0, w=w.m[j]) for j in range(n)])\n\n    def call(self, inputs):\n        y1 = self.cv3(self.m(self.cv1(inputs)))\n        y2 = self.cv2(inputs)\n        return self.cv4(self.act(self.bn(tf.concat((y1, y2), axis=3))))\n\n\nclass TFC3(keras.layers.Layer):\n    # CSP Bottleneck with 3 convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, w=None):\n        # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = TFConv(c1, c_, 1, 1, w=w.cv1)\n        self.cv2 = TFConv(c1, c_, 1, 1, w=w.cv2)\n        self.cv3 = TFConv(2 * c_, c2, 1, 1, w=w.cv3)\n        self.m = keras.Sequential([TFBottleneck(c_, c_, shortcut, g, e=1.0, w=w.m[j]) for j in range(n)])\n\n    def call(self, inputs):\n        return self.cv3(tf.concat((self.m(self.cv1(inputs)), self.cv2(inputs)), 
axis=3))\n\n\nclass TFC3x(keras.layers.Layer):\n    # 3 module with cross-convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, w=None):\n        # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = TFConv(c1, c_, 1, 1, w=w.cv1)\n        self.cv2 = TFConv(c1, c_, 1, 1, w=w.cv2)\n        self.cv3 = TFConv(2 * c_, c2, 1, 1, w=w.cv3)\n        self.m = keras.Sequential([\n            TFCrossConv(c_, c_, k=3, s=1, g=g, e=1.0, shortcut=shortcut, w=w.m[j]) for j in range(n)])\n\n    def call(self, inputs):\n        return self.cv3(tf.concat((self.m(self.cv1(inputs)), self.cv2(inputs)), axis=3))\n\n\nclass TFSPP(keras.layers.Layer):\n    # Spatial pyramid pooling layer used in YOLOv3-SPP\n    def __init__(self, c1, c2, k=(5, 9, 13), w=None):\n        super().__init__()\n        c_ = c1 // 2  # hidden channels\n        self.cv1 = TFConv(c1, c_, 1, 1, w=w.cv1)\n        self.cv2 = TFConv(c_ * (len(k) + 1), c2, 1, 1, w=w.cv2)\n        self.m = [keras.layers.MaxPool2D(pool_size=x, strides=1, padding='SAME') for x in k]\n\n    def call(self, inputs):\n        x = self.cv1(inputs)\n        return self.cv2(tf.concat([x] + [m(x) for m in self.m], 3))\n\n\nclass TFSPPF(keras.layers.Layer):\n    # Spatial pyramid pooling-Fast layer\n    def __init__(self, c1, c2, k=5, w=None):\n        super().__init__()\n        c_ = c1 // 2  # hidden channels\n        self.cv1 = TFConv(c1, c_, 1, 1, w=w.cv1)\n        self.cv2 = TFConv(c_ * 4, c2, 1, 1, w=w.cv2)\n        self.m = keras.layers.MaxPool2D(pool_size=k, strides=1, padding='SAME')\n\n    def call(self, inputs):\n        x = self.cv1(inputs)\n        y1 = self.m(x)\n        y2 = self.m(y1)\n        return self.cv2(tf.concat([x, y1, y2, self.m(y2)], 3))\n\n\nclass TFDetect(keras.layers.Layer):\n    # TF YOLOv5 Detect layer\n    def __init__(self, nc=80, anchors=(), ch=(), imgsz=(640, 640), w=None):  # detection layer\n        
super().__init__()\n        self.stride = tf.convert_to_tensor(w.stride.numpy(), dtype=tf.float32)\n        self.nc = nc  # number of classes\n        self.no = nc + 5  # number of outputs per anchor\n        self.nl = len(anchors)  # number of detection layers\n        self.na = len(anchors[0]) // 2  # number of anchors\n        self.grid = [tf.zeros(1)] * self.nl  # init grid\n        self.anchors = tf.convert_to_tensor(w.anchors.numpy(), dtype=tf.float32)\n        self.anchor_grid = tf.reshape(self.anchors * tf.reshape(self.stride, [self.nl, 1, 1]), [self.nl, 1, -1, 1, 2])\n        self.m = [TFConv2d(x, self.no * self.na, 1, w=w.m[i]) for i, x in enumerate(ch)]\n        self.training = False  # set to False after building model\n        self.imgsz = imgsz\n        for i in range(self.nl):\n            ny, nx = self.imgsz[0] // self.stride[i], self.imgsz[1] // self.stride[i]\n            self.grid[i] = self._make_grid(nx, ny)\n\n    def call(self, inputs):\n        z = []  # inference output\n        x = []\n        for i in range(self.nl):\n            x.append(self.m[i](inputs[i]))\n            # x(bs,20,20,255) to x(bs,3,20,20,85)\n            ny, nx = self.imgsz[0] // self.stride[i], self.imgsz[1] // self.stride[i]\n            x[i] = tf.reshape(x[i], [-1, ny * nx, self.na, self.no])\n\n            if not self.training:  # inference\n                y = x[i]\n                grid = tf.transpose(self.grid[i], [0, 2, 1, 3]) - 0.5\n                anchor_grid = tf.transpose(self.anchor_grid[i], [0, 2, 1, 3]) * 4\n                xy = (tf.sigmoid(y[..., 0:2]) * 2 + grid) * self.stride[i]  # xy\n                wh = tf.sigmoid(y[..., 2:4]) ** 2 * anchor_grid\n                # Normalize xywh to 0-1 to reduce calibration error\n                xy /= tf.constant([[self.imgsz[1], self.imgsz[0]]], dtype=tf.float32)\n                wh /= tf.constant([[self.imgsz[1], self.imgsz[0]]], dtype=tf.float32)\n                y = tf.concat([xy, wh, tf.sigmoid(y[..., 4:5 + 
self.nc]), y[..., 5 + self.nc:]], -1)\n                z.append(tf.reshape(y, [-1, self.na * ny * nx, self.no]))\n\n        return tf.transpose(x, [0, 2, 1, 3]) if self.training else (tf.concat(z, 1),)\n\n    @staticmethod\n    def _make_grid(nx=20, ny=20):\n        # yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])\n        # return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()\n        xv, yv = tf.meshgrid(tf.range(nx), tf.range(ny))\n        return tf.cast(tf.reshape(tf.stack([xv, yv], 2), [1, 1, ny * nx, 2]), dtype=tf.float32)\n\n\nclass TFSegment(TFDetect):\n    # YOLOv5 Segment head for segmentation models\n    def __init__(self, nc=80, anchors=(), nm=32, npr=256, ch=(), imgsz=(640, 640), w=None):\n        super().__init__(nc, anchors, ch, imgsz, w)\n        self.nm = nm  # number of masks\n        self.npr = npr  # number of protos\n        self.no = 5 + nc + self.nm  # number of outputs per anchor\n        self.m = [TFConv2d(x, self.no * self.na, 1, w=w.m[i]) for i, x in enumerate(ch)]  # output conv\n        self.proto = TFProto(ch[0], self.npr, self.nm, w=w.proto)  # protos\n        self.detect = TFDetect.call\n\n    def call(self, x):\n        p = self.proto(x[0])\n        # p = TFUpsample(None, scale_factor=4, mode='nearest')(self.proto(x[0]))  # (optional) full-size protos\n        p = tf.transpose(p, [0, 3, 1, 2])  # from shape(1,160,160,32) to shape(1,32,160,160)\n        x = self.detect(self, x)\n        return (x, p) if self.training else (x[0], p)\n\n\nclass TFProto(keras.layers.Layer):\n\n    def __init__(self, c1, c_=256, c2=32, w=None):\n        super().__init__()\n        self.cv1 = TFConv(c1, c_, k=3, w=w.cv1)\n        self.upsample = TFUpsample(None, scale_factor=2, mode='nearest')\n        self.cv2 = TFConv(c_, c_, k=3, w=w.cv2)\n        self.cv3 = TFConv(c_, c2, w=w.cv3)\n\n    def call(self, inputs):\n        return self.cv3(self.cv2(self.upsample(self.cv1(inputs))))\n\n\nclass TFUpsample(keras.layers.Layer):\n    
# TF version of torch.nn.Upsample()\n    def __init__(self, size, scale_factor, mode, w=None):  # warning: all arguments needed including 'w'\n        super().__init__()\n        assert scale_factor % 2 == 0, 'scale_factor must be multiple of 2'\n        self.upsample = lambda x: tf.image.resize(x, (x.shape[1] * scale_factor, x.shape[2] * scale_factor), mode)\n        # self.upsample = keras.layers.UpSampling2D(size=scale_factor, interpolation=mode)\n        # with default arguments: align_corners=False, half_pixel_centers=False\n        # self.upsample = lambda x: tf.raw_ops.ResizeNearestNeighbor(images=x,\n        #                                                            size=(x.shape[1] * 2, x.shape[2] * 2))\n\n    def call(self, inputs):\n        return self.upsample(inputs)\n\n\nclass TFConcat(keras.layers.Layer):\n    # TF version of torch.concat()\n    def __init__(self, dimension=1, w=None):\n        super().__init__()\n        assert dimension == 1, 'convert only NCHW to NHWC concat'\n        self.d = 3\n\n    def call(self, inputs):\n        return tf.concat(inputs, self.d)\n\n\ndef parse_model(d, ch, model, imgsz):  # model_dict, input_channels(3)\n    LOGGER.info(f\"\\n{'':>3}{'from':>18}{'n':>3}{'params':>10}  {'module':<40}{'arguments':<30}\")\n    anchors, nc, gd, gw = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple']\n    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors\n    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)\n\n    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out\n    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args\n        m_str = m\n        m = eval(m) if isinstance(m, str) else m  # eval strings\n        for j, a in enumerate(args):\n            try:\n                args[j] = eval(a) if isinstance(a, str) else a  # eval strings\n            except NameError:\n                pass\n\n        n = 
max(round(n * gd), 1) if n > 1 else n  # depth gain\n        if m in [\n                nn.Conv2d, Conv, DWConv, DWConvTranspose2d, Bottleneck, SPP, SPPF, MixConv2d, Focus, CrossConv,\n                BottleneckCSP, C3, C3x]:\n            c1, c2 = ch[f], args[0]\n            c2 = make_divisible(c2 * gw, 8) if c2 != no else c2\n\n            args = [c1, c2, *args[1:]]\n            if m in [BottleneckCSP, C3, C3x]:\n                args.insert(2, n)\n                n = 1\n        elif m is nn.BatchNorm2d:\n            args = [ch[f]]\n        elif m is Concat:\n            c2 = sum(ch[-1 if x == -1 else x + 1] for x in f)\n        elif m in [Detect, Segment]:\n            args.append([ch[x + 1] for x in f])\n            if isinstance(args[1], int):  # number of anchors\n                args[1] = [list(range(args[1] * 2))] * len(f)\n            if m is Segment:\n                args[3] = make_divisible(args[3] * gw, 8)\n            args.append(imgsz)\n        else:\n            c2 = ch[f]\n\n        tf_m = eval('TF' + m_str.replace('nn.', ''))\n        m_ = keras.Sequential([tf_m(*args, w=model.model[i][j]) for j in range(n)]) if n > 1 \\\n            else tf_m(*args, w=model.model[i])  # module\n\n        torch_m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module\n        t = str(m)[8:-2].replace('__main__.', '')  # module type\n        np = sum(x.numel() for x in torch_m_.parameters())  # number params\n        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number params\n        LOGGER.info(f'{i:>3}{str(f):>18}{str(n):>3}{np:>10}  {t:<40}{str(args):<30}')  # print\n        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist\n        layers.append(m_)\n        ch.append(c2)\n    return keras.Sequential(layers), sorted(save)\n\n\nclass TFModel:\n    # TF YOLOv5 model\n    def __init__(self, cfg='yolov5s.yaml', ch=3, nc=None, model=None, imgsz=(640, 640)):  # 
model, channels, classes\n        super().__init__()\n        if isinstance(cfg, dict):\n            self.yaml = cfg  # model dict\n        else:  # is *.yaml\n            import yaml  # for torch hub\n            self.yaml_file = Path(cfg).name\n            with open(cfg) as f:\n                self.yaml = yaml.load(f, Loader=yaml.FullLoader)  # model dict\n\n        # Define model\n        if nc and nc != self.yaml['nc']:\n            LOGGER.info(f\"Overriding {cfg} nc={self.yaml['nc']} with nc={nc}\")\n            self.yaml['nc'] = nc  # override yaml value\n        self.model, self.savelist = parse_model(deepcopy(self.yaml), ch=[ch], model=model, imgsz=imgsz)\n\n    def predict(self,\n                inputs,\n                tf_nms=False,\n                agnostic_nms=False,\n                topk_per_class=100,\n                topk_all=100,\n                iou_thres=0.45,\n                conf_thres=0.25):\n        y = []  # outputs\n        x = inputs\n        for m in self.model.layers:\n            if m.f != -1:  # if not from previous layer\n                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers\n\n            x = m(x)  # run\n            y.append(x if m.i in self.savelist else None)  # save output\n\n        # Add TensorFlow NMS\n        if tf_nms:\n            boxes = self._xywh2xyxy(x[0][..., :4])\n            probs = x[0][:, :, 4:5]\n            classes = x[0][:, :, 5:]\n            scores = probs * classes\n            if agnostic_nms:\n                nms = AgnosticNMS()((boxes, classes, scores), topk_all, iou_thres, conf_thres)\n            else:\n                boxes = tf.expand_dims(boxes, 2)\n                nms = tf.image.combined_non_max_suppression(boxes,\n                                                            scores,\n                                                            topk_per_class,\n                                                            topk_all,\n            
                                                iou_thres,\n                                                            conf_thres,\n                                                            clip_boxes=False)\n            return (nms,)\n        return x  # output [1,6300,85] = [xywh, conf, class0, class1, ...]\n        # x = x[0]  # [x(1,6300,85), ...] to x(6300,85)\n        # xywh = x[..., :4]  # x(6300,4) boxes\n        # conf = x[..., 4:5]  # x(6300,1) confidences\n        # cls = tf.reshape(tf.cast(tf.argmax(x[..., 5:], axis=1), tf.float32), (-1, 1))  # x(6300,1)  classes\n        # return tf.concat([conf, cls, xywh], 1)\n\n    @staticmethod\n    def _xywh2xyxy(xywh):\n        # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right\n        x, y, w, h = tf.split(xywh, num_or_size_splits=4, axis=-1)\n        return tf.concat([x - w / 2, y - h / 2, x + w / 2, y + h / 2], axis=-1)\n\n\nclass AgnosticNMS(keras.layers.Layer):\n    # TF Agnostic NMS\n    def call(self, input, topk_all, iou_thres, conf_thres):\n        # wrap map_fn to avoid TypeSpec related error https://stackoverflow.com/a/65809989/3036450\n        return tf.map_fn(lambda x: self._nms(x, topk_all, iou_thres, conf_thres),\n                         input,\n                         fn_output_signature=(tf.float32, tf.float32, tf.float32, tf.int32),\n                         name='agnostic_nms')\n\n    @staticmethod\n    def _nms(x, topk_all=100, iou_thres=0.45, conf_thres=0.25):  # agnostic NMS\n        boxes, classes, scores = x\n        class_inds = tf.cast(tf.argmax(classes, axis=-1), tf.float32)\n        scores_inp = tf.reduce_max(scores, -1)\n        selected_inds = tf.image.non_max_suppression(boxes,\n                                                     scores_inp,\n                                                     max_output_size=topk_all,\n                                                     iou_threshold=iou_thres,\n                                   
                  score_threshold=conf_thres)\n        selected_boxes = tf.gather(boxes, selected_inds)\n        padded_boxes = tf.pad(selected_boxes,\n                              paddings=[[0, topk_all - tf.shape(selected_boxes)[0]], [0, 0]],\n                              mode='CONSTANT',\n                              constant_values=0.0)\n        selected_scores = tf.gather(scores_inp, selected_inds)\n        padded_scores = tf.pad(selected_scores,\n                               paddings=[[0, topk_all - tf.shape(selected_boxes)[0]]],\n                               mode='CONSTANT',\n                               constant_values=-1.0)\n        selected_classes = tf.gather(class_inds, selected_inds)\n        padded_classes = tf.pad(selected_classes,\n                                paddings=[[0, topk_all - tf.shape(selected_boxes)[0]]],\n                                mode='CONSTANT',\n                                constant_values=-1.0)\n        valid_detections = tf.shape(selected_inds)[0]\n        return padded_boxes, padded_scores, padded_classes, valid_detections\n\n\ndef activations(act=nn.SiLU):\n    # Returns TF activation from input PyTorch activation\n    if isinstance(act, nn.LeakyReLU):\n        return lambda x: keras.activations.relu(x, alpha=0.1)\n    elif isinstance(act, nn.Hardswish):\n        return lambda x: x * tf.nn.relu6(x + 3) * 0.166666667\n    elif isinstance(act, (nn.SiLU, SiLU)):\n        return lambda x: keras.activations.swish(x)\n    else:\n        raise Exception(f'no matching TensorFlow activation found for PyTorch activation {act}')\n\n\ndef representative_dataset_gen(dataset, ncalib=100):\n    # Representative dataset generator for use with converter.representative_dataset, returns a generator of np arrays\n    for n, (path, img, im0s, vid_cap, string) in enumerate(dataset):\n        im = np.transpose(img, [1, 2, 0])\n        im = np.expand_dims(im, axis=0).astype(np.float32)\n        im /= 255\n        yield [im]\n        
if n >= ncalib:\n            break\n\n\ndef run(\n        weights=ROOT / 'yolov5s.pt',  # weights path\n        imgsz=(640, 640),  # inference size h,w\n        batch_size=1,  # batch size\n        dynamic=False,  # dynamic batch size\n):\n    # PyTorch model\n    im = torch.zeros((batch_size, 3, *imgsz))  # BCHW image\n    model = attempt_load(weights, device=torch.device('cpu'), inplace=True, fuse=False)\n    _ = model(im)  # inference\n    model.info()\n\n    # TensorFlow model\n    im = tf.zeros((batch_size, *imgsz, 3))  # BHWC image\n    tf_model = TFModel(cfg=model.yaml, model=model, nc=model.nc, imgsz=imgsz)\n    _ = tf_model.predict(im)  # inference\n\n    # Keras model\n    im = keras.Input(shape=(*imgsz, 3), batch_size=None if dynamic else batch_size)\n    keras_model = keras.Model(inputs=im, outputs=tf_model.predict(im))\n    keras_model.summary()\n\n    LOGGER.info('PyTorch, TensorFlow and Keras models successfully verified.\\nUse export.py for TF model export.')\n\n\ndef parse_opt():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--weights', type=str, default=ROOT / 'yolov5s.pt', help='weights path')\n    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')\n    parser.add_argument('--batch-size', type=int, default=1, help='batch size')\n    parser.add_argument('--dynamic', action='store_true', help='dynamic batch size')\n    opt = parser.parse_args()\n    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand\n    print_args(vars(opt))\n    return opt\n\n\ndef main(opt):\n    run(**vars(opt))\n\n\nif __name__ == '__main__':\n    opt = parse_opt()\n    main(opt)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/yolo.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nYOLO-specific modules\n\nUsage:\n    $ python models/yolo.py --cfg yolov5s.yaml\n\"\"\"\n\nimport argparse\nimport contextlib\nimport os\nimport platform\nimport sys\nfrom copy import deepcopy\nfrom pathlib import Path\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[1]  # YOLOv5 root directory\nif str(ROOT) not in sys.path:\n    sys.path.append(str(ROOT))  # add ROOT to PATH\nif platform.system() != 'Windows':\n    ROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative\n\nfrom models.common import *\nfrom models.experimental import *\nfrom utils.autoanchor import check_anchor_order\nfrom utils.general import LOGGER, check_version, check_yaml, make_divisible, print_args\nfrom utils.plots import feature_visualization\nfrom utils.torch_utils import (fuse_conv_and_bn, initialize_weights, model_info, profile, scale_img, select_device,\n                               time_sync)\n\ntry:\n    import thop  # for FLOPs computation\nexcept ImportError:\n    thop = None\n\n\nclass Detect(nn.Module):\n    # YOLOv5 Detect head for detection models\n    stride = None  # strides computed during build\n    dynamic = False  # force grid reconstruction\n    export = False  # export mode\n\n    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer\n        super().__init__()\n        self.nc = nc  # number of classes\n        self.no = nc + 5  # number of outputs per anchor\n        self.nl = len(anchors)  # number of detection layers\n        self.na = len(anchors[0]) // 2  # number of anchors\n        self.grid = [torch.empty(0) for _ in range(self.nl)]  # init grid\n        self.anchor_grid = [torch.empty(0) for _ in range(self.nl)]  # init anchor grid\n        self.register_buffer('anchors', torch.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)\n        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch[:self.nl])  # output conv\n        
self.m2 = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch[self.nl:])  # output conv\n        self.inplace = inplace  # use inplace ops (e.g. slice assignment)\n\n    def forward(self, x):\n        z = []  # inference output\n        for i in range(self.nl):\n            x[i] = self.m[i](x[i])  # conv\n            bs, _, ny, nx = x[i].shape  # x(bs,255,20,20) to x(bs,3,20,20,85)\n            x[i] = x[i].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            \n            x[i + self.nl] = self.m2[i](x[i + self.nl])  # conv\n            bs, _, ny, nx = x[i + self.nl].shape\n            x[i + self.nl] = x[i + self.nl].view(bs, self.na, self.no, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n\n            if not self.training:  # inference\n                if self.dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:\n                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n\n                if isinstance(self, Segment):  # (boxes + masks)\n                    xy, wh, conf, mask = x[i].split((2, 2, self.nc + 1, self.no - self.nc - 5), 4)\n                    xy = (xy.sigmoid() * 2 + self.grid[i]) * self.stride[i]  # xy\n                    wh = (wh.sigmoid() * 2) ** 2 * self.anchor_grid[i]  # wh\n                    y = torch.cat((xy, wh, conf.sigmoid(), mask), 4)\n                else:  # Detect (boxes only)\n                    xy, wh, conf = x[i].sigmoid().split((2, 2, self.nc + 1), 4)\n                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy\n                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh\n                    y = torch.cat((xy, wh, conf), 4)\n                z.append(y.view(bs, self.na * nx * ny, self.no))\n\n        return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x[:self.nl])\n\n    def _make_grid(self, nx=20, ny=20, i=0, torch_1_10=check_version(torch.__version__, '1.10.0')):\n        d = self.anchors[i].device\n        t = 
self.anchors[i].dtype\n        shape = 1, self.na, ny, nx, 2  # grid shape\n        y, x = torch.arange(ny, device=d, dtype=t), torch.arange(nx, device=d, dtype=t)\n        yv, xv = torch.meshgrid(y, x, indexing='ij') if torch_1_10 else torch.meshgrid(y, x)  # torch>=0.7 compatibility\n        grid = torch.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5\n        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)\n        return grid, anchor_grid\n\n\nclass Segment(Detect):\n    # YOLOv5 Segment head for segmentation models\n    def __init__(self, nc=80, anchors=(), nm=32, npr=256, ch=(), inplace=True):\n        super().__init__(nc, anchors, ch, inplace)\n        self.nm = nm  # number of masks\n        self.npr = npr  # number of protos\n        self.no = 5 + nc + self.nm  # number of outputs per anchor\n        self.m = nn.ModuleList(nn.Conv2d(x, self.no * self.na, 1) for x in ch)  # output conv\n        self.proto = Proto(ch[0], self.npr, self.nm)  # protos\n        self.detect = Detect.forward\n\n    def forward(self, x):\n        p = self.proto(x[0])\n        x = self.detect(self, x)\n        return (x, p) if self.training else (x[0], p) if self.export else (x[0], p, x[1])\n\n\nclass BaseModel(nn.Module):\n    # YOLOv5 base model\n    def forward(self, x, profile=False, visualize=False):\n        return self._forward_once(x, profile, visualize)  # single-scale inference, train\n\n    def _forward_once(self, x, profile=False, visualize=False):\n        y, dt = [], []  # outputs\n        for m in self.model:\n            if m.f != -1:  # if not from previous layer\n                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers\n            if profile:\n                self._profile_one_layer(m, x, dt)\n            x = m(x)  # run\n            y.append(x if m.i in self.save else None)  # save output\n            if visualize:\n            
    feature_visualization(x, m.type, m.i, save_dir=visualize)\n        return x\n\n    def _profile_one_layer(self, m, x, dt):\n        c = m == self.model[-1]  # is final layer, copy input as inplace fix\n        o = thop.profile(m, inputs=(x.copy() if c else x,), verbose=False)[0] / 1E9 * 2 if thop else 0  # FLOPs\n        t = time_sync()\n        for _ in range(10):\n            m(x.copy() if c else x)\n        dt.append((time_sync() - t) * 100)\n        if m == self.model[0]:\n            LOGGER.info(f\"{'time (ms)':>10s} {'GFLOPs':>10s} {'params':>10s}  module\")\n        LOGGER.info(f'{dt[-1]:10.2f} {o:10.2f} {m.np:10.0f}  {m.type}')\n        if c:\n            LOGGER.info(f\"{sum(dt):10.2f} {'-':>10s} {'-':>10s}  Total\")\n\n    def fuse(self):  # fuse model Conv2d() + BatchNorm2d() layers\n        LOGGER.info('Fusing layers... ')\n        for m in self.model.modules():\n            if isinstance(m, (Conv, DWConv)) and hasattr(m, 'bn'):\n                m.conv = fuse_conv_and_bn(m.conv, m.bn)  # update conv\n                delattr(m, 'bn')  # remove batchnorm\n                m.forward = m.forward_fuse  # update forward\n        self.info()\n        return self\n\n    def info(self, verbose=False, img_size=640):  # print model information\n        model_info(self, verbose, img_size)\n\n    def _apply(self, fn):\n        # Apply to(), cpu(), cuda(), half() to model tensors that are not parameters or registered buffers\n        self = super()._apply(fn)\n        m = self.model[-1]  # Detect()\n        if isinstance(m, (Detect, Segment)):\n            m.stride = fn(m.stride)\n            m.grid = list(map(fn, m.grid))\n            if isinstance(m.anchor_grid, list):\n                m.anchor_grid = list(map(fn, m.anchor_grid))\n        return self\n\n\nclass DetectionModel(BaseModel):\n    # YOLOv5 detection model\n    def __init__(self, cfg='yolov5s.yaml', ch=3, nc=None, anchors=None):  # model, input channels, number of classes\n        super().__init__()\n  
      if isinstance(cfg, dict):\n            self.yaml = cfg  # model dict\n        else:  # is *.yaml\n            import yaml  # for torch hub\n            self.yaml_file = Path(cfg).name\n            with open(cfg, encoding='ascii', errors='ignore') as f:\n                self.yaml = yaml.safe_load(f)  # model dict\n\n        # Define model\n        ch = self.yaml['ch'] = self.yaml.get('ch', ch)  # input channels\n        if nc and nc != self.yaml['nc']:\n            LOGGER.info(f\"Overriding model.yaml nc={self.yaml['nc']} with nc={nc}\")\n            self.yaml['nc'] = nc  # override yaml value\n        if anchors:\n            LOGGER.info(f'Overriding model.yaml anchors with anchors={anchors}')\n            self.yaml['anchors'] = round(anchors)  # override yaml value\n        self.model, self.save = parse_model(deepcopy(self.yaml), ch=[ch])  # model, savelist\n        self.names = [str(i) for i in range(self.yaml['nc'])]  # default names\n        self.inplace = self.yaml.get('inplace', True)\n\n        # Build strides, anchors\n        m = self.model[-1]  # Detect()\n        if isinstance(m, (Detect, Segment)):\n            s = 256  # 2x min stride\n            m.inplace = self.inplace\n            forward = lambda x: self.forward(x)[0] if isinstance(m, Segment) else self.forward(x)\n            m.stride = torch.tensor([s / x.shape[-2] for x in forward(torch.zeros(1, ch, s, s))][:3])  # forward\n            check_anchor_order(m)\n            m.anchors /= m.stride.view(-1, 1, 1)\n            self.stride = m.stride\n            self._initialize_biases()  # only run once\n\n        # Init weights, biases\n        initialize_weights(self)\n        self.info()\n        LOGGER.info('')\n\n    def forward(self, x, augment=False, profile=False, visualize=False):\n        if augment:\n            return self._forward_augment(x)  # augmented inference, None\n        return self._forward_once(x, profile, visualize)  # single-scale inference, train\n\n    def 
_forward_augment(self, x):\n        img_size = x.shape[-2:]  # height, width\n        s = [1, 0.83, 0.67]  # scales\n        f = [None, 3, None]  # flips (2-ud, 3-lr)\n        y = []  # outputs\n        for si, fi in zip(s, f):\n            xi = scale_img(x.flip(fi) if fi else x, si, gs=int(self.stride.max()))\n            yi = self._forward_once(xi)[0]  # forward\n            # cv2.imwrite(f'img_{si}.jpg', 255 * xi[0].cpu().numpy().transpose((1, 2, 0))[:, :, ::-1])  # save\n            yi = self._descale_pred(yi, fi, si, img_size)\n            y.append(yi)\n        y = self._clip_augmented(y)  # clip augmented tails\n        return torch.cat(y, 1), None  # augmented inference, train\n\n    def _descale_pred(self, p, flips, scale, img_size):\n        # de-scale predictions following augmented inference (inverse operation)\n        if self.inplace:\n            p[..., :4] /= scale  # de-scale\n            if flips == 2:\n                p[..., 1] = img_size[0] - p[..., 1]  # de-flip ud\n            elif flips == 3:\n                p[..., 0] = img_size[1] - p[..., 0]  # de-flip lr\n        else:\n            x, y, wh = p[..., 0:1] / scale, p[..., 1:2] / scale, p[..., 2:4] / scale  # de-scale\n            if flips == 2:\n                y = img_size[0] - y  # de-flip ud\n            elif flips == 3:\n                x = img_size[1] - x  # de-flip lr\n            p = torch.cat((x, y, wh, p[..., 4:]), -1)\n        return p\n\n    def _clip_augmented(self, y):\n        # Clip YOLOv5 augmented inference tails\n        nl = self.model[-1].nl  # number of detection layers (P3-P5)\n        g = sum(4 ** x for x in range(nl))  # grid points\n        e = 1  # exclude layer count\n        i = (y[0].shape[1] // g) * sum(4 ** x for x in range(e))  # indices\n        y[0] = y[0][:, :-i]  # large\n        i = (y[-1].shape[1] // g) * sum(4 ** (nl - 1 - x) for x in range(e))  # indices\n        y[-1] = y[-1][:, i:]  # small\n        return y\n\n    def _initialize_biases(self, 
cf=None):  # initialize biases into Detect(), cf is class frequency\n        # https://arxiv.org/abs/1708.02002 section 3.3\n        # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.\n        m = self.model[-1]  # Detect() module\n        for mi, s in zip(m.m, m.stride):  # from\n            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)\n            b.data[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)\n            b.data[:, 5:5 + m.nc] += math.log(0.6 / (m.nc - 0.99999)) if cf is None else torch.log(cf / cf.sum())  # cls\n            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)\n\n\nModel = DetectionModel  # retain YOLOv5 'Model' class for backwards compatibility\n\n\nclass SegmentationModel(DetectionModel):\n    # YOLOv5 segmentation model\n    def __init__(self, cfg='yolov5s-seg.yaml', ch=3, nc=None, anchors=None):\n        super().__init__(cfg, ch, nc, anchors)\n\n\nclass ClassificationModel(BaseModel):\n    # YOLOv5 classification model\n    def __init__(self, cfg=None, model=None, nc=1000, cutoff=10):  # yaml, model, number of classes, cutoff index\n        super().__init__()\n        self._from_detection_model(model, nc, cutoff) if model is not None else self._from_yaml(cfg)\n\n    def _from_detection_model(self, model, nc=1000, cutoff=10):\n        # Create a YOLOv5 classification model from a YOLOv5 detection model\n        if isinstance(model, DetectMultiBackend):\n            model = model.model  # unwrap DetectMultiBackend\n        model.model = model.model[:cutoff]  # backbone\n        m = model.model[-1]  # last layer\n        ch = m.conv.in_channels if hasattr(m, 'conv') else m.cv1.conv.in_channels  # ch into module\n        c = Classify(ch, nc)  # Classify()\n        c.i, c.f, c.type = m.i, m.f, 'models.common.Classify'  # index, from, type\n        model.model[-1] = c  # replace\n        self.model = model.model\n        self.stride = 
model.stride\n        self.save = []\n        self.nc = nc\n\n    def _from_yaml(self, cfg):\n        # Create a YOLOv5 classification model from a *.yaml file\n        self.model = None\n\n\ndef parse_model(d, ch):  # model_dict, input_channels(3)\n    # Parse a YOLOv5 model.yaml dictionary\n    LOGGER.info(f\"\\n{'':>3}{'from':>18}{'n':>3}{'params':>10}  {'module':<40}{'arguments':<30}\")\n    anchors, nc, gd, gw, act = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple'], d.get('activation')\n    if act:\n        Conv.default_act = eval(act)  # redefine default activation, i.e. Conv.default_act = nn.SiLU()\n        LOGGER.info(f\"{colorstr('activation:')} {act}\")  # print\n    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors\n    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)\n\n    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out\n    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args\n        m = eval(m) if isinstance(m, str) else m  # eval strings\n        for j, a in enumerate(args):\n            with contextlib.suppress(NameError):\n                args[j] = eval(a) if isinstance(a, str) else a  # eval strings\n\n        n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain\n        if m in {\n                Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,\n                BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x}:\n            c1, c2 = ch[f], args[0]\n            if c2 != no:  # if not output\n                c2 = make_divisible(c2 * gw, 8)\n\n            args = [c1, c2, *args[1:]]\n            if m in {BottleneckCSP, C3, C3TR, C3Ghost, C3x}:\n                args.insert(2, n)  # number of repeats\n                n = 1\n        elif m is nn.BatchNorm2d:\n            args = [ch[f]]\n        elif m is Concat:\n            c2 = sum(ch[x] 
for x in f)\n        # TODO: channel, gw, gd\n        elif m in {Detect, Segment}:\n            args.append([ch[x] for x in f])\n            if isinstance(args[1], int):  # number of anchors\n                args[1] = [list(range(args[1] * 2))] * len(f)\n            if m is Segment:\n                args[3] = make_divisible(args[3] * gw, 8)\n        elif m is Contract:\n            c2 = ch[f] * args[0] ** 2\n        elif m is Expand:\n            c2 = ch[f] // args[0] ** 2\n        else:\n            c2 = ch[f]\n\n        m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module\n        t = str(m)[8:-2].replace('__main__.', '')  # module type\n        np = sum(x.numel() for x in m_.parameters())  # number params\n        m_.i, m_.f, m_.type, m_.np = i, f, t, np  # attach index, 'from' index, type, number params\n        LOGGER.info(f'{i:>3}{str(f):>18}{n_:>3}{np:10.0f}  {t:<40}{str(args):<30}')  # print\n        save.extend(x % i for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist\n        layers.append(m_)\n        if i == 0:\n            ch = []\n        ch.append(c2)\n    return nn.Sequential(*layers), sorted(save)\n\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--cfg', type=str, default='yolov5s.yaml', help='model.yaml')\n    parser.add_argument('--batch-size', type=int, default=1, help='total batch size for all GPUs')\n    parser.add_argument('--device', default='', help='cuda device, i.e. 
0 or 0,1,2,3 or cpu')\n    parser.add_argument('--profile', action='store_true', help='profile model speed')\n    parser.add_argument('--line-profile', action='store_true', help='profile model speed layer by layer')\n    parser.add_argument('--test', action='store_true', help='test all yolo*.yaml')\n    opt = parser.parse_args()\n    opt.cfg = check_yaml(opt.cfg)  # check YAML\n    print_args(vars(opt))\n    device = select_device(opt.device)\n\n    # Create model\n    im = torch.rand(opt.batch_size, 3, 640, 640).to(device)\n    model = Model(opt.cfg).to(device)\n\n    # Options\n    if opt.line_profile:  # profile layer by layer\n        model(im, profile=True)\n\n    elif opt.profile:  # profile forward-backward\n        results = profile(input=im, ops=[model], n=3)\n\n    elif opt.test:  # test all models\n        for cfg in Path(ROOT / 'models').rglob('yolo*.yaml'):\n            try:\n                _ = Model(cfg)\n            except Exception as e:\n                print(f'Error in {cfg}: {e}')\n\n    else:  # report fused model summary\n        model.fuse()\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/yolov5_aux.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [17, 1, Conv, [256, 3, 1]], # 24\n   [13, 1, Conv, [512, 3, 1]], # 25\n   [9, 1, Conv, [1024, 3, 1]], # 26\n\n   [[17, 20, 23, 24, 25, 26], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/yolov5l.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/yolov5m.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.67  # model depth multiple\nwidth_multiple: 0.75  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/yolov5n.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/yolov5s.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.50  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/models/yolov5x.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 1.33  # model depth multiple\nwidth_multiple: 1.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/train.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nTrain a YOLOv5 model on a custom dataset.\nModels and datasets download automatically from the latest YOLOv5 release.\n\nUsage - Single-GPU training:\n    $ python train.py --data coco128.yaml --weights yolov5s.pt --img 640  # from pretrained (recommended)\n    $ python train.py --data coco128.yaml --weights '' --cfg yolov5s.yaml --img 640  # from scratch\n\nUsage - Multi-GPU DDP training:\n    $ python -m torch.distributed.run --nproc_per_node 4 --master_port 1 train.py --data coco128.yaml --weights yolov5s.pt --img 640 --device 0,1,2,3\n\nModels:     https://github.com/ultralytics/yolov5/tree/master/models\nDatasets:   https://github.com/ultralytics/yolov5/tree/master/data\nTutorial:   https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data\n\"\"\"\n\nimport argparse\nimport math\nimport os\nimport random\nimport subprocess\nimport sys\nimport time\nfrom copy import deepcopy\nfrom datetime import datetime\nfrom pathlib import Path\n\nimport numpy as np\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nimport yaml\nfrom torch.optim import lr_scheduler\nfrom tqdm import tqdm\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[0]  # YOLOv5 root directory\nif str(ROOT) not in sys.path:\n    sys.path.append(str(ROOT))  # add ROOT to PATH\nROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative\n\nimport val as validate  # for end-of-epoch mAP\nfrom models.experimental import attempt_load\nfrom models.yolo import Model\nfrom utils.autoanchor import check_anchors\nfrom utils.autobatch import check_train_batch_size\nfrom utils.callbacks import Callbacks\nfrom utils.dataloaders import create_dataloader\nfrom utils.downloads import attempt_download, is_url\nfrom utils.general import (LOGGER, TQDM_BAR_FORMAT, check_amp, check_dataset, check_file, check_git_info,\n                           check_git_status, check_img_size, check_requirements, check_suffix, check_yaml, 
colorstr,\n                           get_latest_run, increment_path, init_seeds, intersect_dicts, labels_to_class_weights,\n                           labels_to_image_weights, methods, one_cycle, print_args, print_mutation, strip_optimizer,\n                           yaml_save)\nfrom utils.loggers import Loggers\nfrom utils.loggers.comet.comet_utils import check_comet_resume\nfrom utils.loss import ComputeLossAuxOTA, ComputeLoss\nfrom utils.metrics import fitness\nfrom utils.plots import plot_evolve\nfrom utils.torch_utils import (EarlyStopping, ModelEMA, de_parallel, select_device, smart_DDP, smart_optimizer,\n                               smart_resume, torch_distributed_zero_first)\n\nLOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html\nRANK = int(os.getenv('RANK', -1))\nWORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))\nGIT_INFO = check_git_info()\n\n\ndef train(hyp, opt, device, callbacks):  # hyp is path/to/hyp.yaml or hyp dictionary\n    save_dir, epochs, batch_size, weights, single_cls, evolve, data, cfg, resume, noval, nosave, workers, freeze = \\\n        Path(opt.save_dir), opt.epochs, opt.batch_size, opt.weights, opt.single_cls, opt.evolve, opt.data, opt.cfg, \\\n        opt.resume, opt.noval, opt.nosave, opt.workers, opt.freeze\n    callbacks.run('on_pretrain_routine_start')\n\n    # Directories\n    w = save_dir / 'weights'  # weights dir\n    (w.parent if evolve else w).mkdir(parents=True, exist_ok=True)  # make dir\n    last, best = w / 'last.pt', w / 'best.pt'\n\n    # Hyperparameters\n    if isinstance(hyp, str):\n        with open(hyp, errors='ignore') as f:\n            hyp = yaml.safe_load(f)  # load hyps dict\n    LOGGER.info(colorstr('hyperparameters: ') + ', '.join(f'{k}={v}' for k, v in hyp.items()))\n    opt.hyp = hyp.copy()  # for saving hyps to checkpoints\n\n    # Save run settings\n    if not evolve:\n        yaml_save(save_dir / 'hyp.yaml', hyp)\n        yaml_save(save_dir / 'opt.yaml', 
vars(opt))\n\n    # Loggers\n    data_dict = None\n    if RANK in {-1, 0}:\n        loggers = Loggers(save_dir, weights, opt, hyp, LOGGER)  # loggers instance\n\n        # Register actions\n        for k in methods(loggers):\n            callbacks.register_action(k, callback=getattr(loggers, k))\n\n        # Process custom dataset artifact link\n        data_dict = loggers.remote_dataset\n        if resume:  # If resuming runs from remote artifact\n            weights, epochs, hyp, batch_size = opt.weights, opt.epochs, opt.hyp, opt.batch_size\n\n    # Config\n    plots = not evolve and not opt.noplots  # create plots\n    cuda = device.type != 'cpu'\n    init_seeds(opt.seed + 1 + RANK, deterministic=True)\n    with torch_distributed_zero_first(LOCAL_RANK):\n        data_dict = data_dict or check_dataset(data)  # check if None\n    train_path, val_path = data_dict['train'], data_dict['val']\n    nc = 1 if single_cls else int(data_dict['nc'])  # number of classes\n    names = {0: 'item'} if single_cls and len(data_dict['names']) != 1 else data_dict['names']  # class names\n    is_coco = isinstance(val_path, str) and val_path.endswith('coco/val2017.txt')  # COCO dataset\n\n    # Model\n    check_suffix(weights, '.pt')  # check weights\n    pretrained = weights.endswith('.pt')\n    if pretrained:\n        with torch_distributed_zero_first(LOCAL_RANK):\n            weights = attempt_download(weights)  # download if not found locally\n        ckpt = torch.load(weights, map_location='cpu')  # load checkpoint to CPU to avoid CUDA memory leak\n        model = Model(cfg or ckpt['model'].yaml, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create\n        exclude = ['anchor'] if (cfg or hyp.get('anchors')) and not resume else []  # exclude keys\n        csd = ckpt['model'].float().state_dict()  # checkpoint state_dict as FP32\n        csd = intersect_dicts(csd, model.state_dict(), exclude=exclude)  # intersect\n        model.load_state_dict(csd, strict=False)  # 
load\n        LOGGER.info(f'Transferred {len(csd)}/{len(model.state_dict())} items from {weights}')  # report\n    else:\n        model = Model(cfg, ch=3, nc=nc, anchors=hyp.get('anchors')).to(device)  # create\n    amp = check_amp(model)  # check AMP\n\n    # Freeze\n    freeze = [f'model.{x}.' for x in (freeze if len(freeze) > 1 else range(freeze[0]))]  # layers to freeze\n    for k, v in model.named_parameters():\n        v.requires_grad = True  # train all layers\n        # v.register_hook(lambda x: torch.nan_to_num(x))  # NaN to 0 (commented for erratic training results)\n        if any(x in k for x in freeze):\n            LOGGER.info(f'freezing {k}')\n            v.requires_grad = False\n\n    # Image size\n    gs = max(int(model.stride.max()), 32)  # grid size (max stride)\n    imgsz = check_img_size(opt.imgsz, gs, floor=gs * 2)  # verify imgsz is gs-multiple\n\n    # Batch size\n    if RANK == -1 and batch_size == -1:  # single-GPU only, estimate best batch size\n        batch_size = check_train_batch_size(model, imgsz, amp)\n        loggers.on_params_update({'batch_size': batch_size})\n\n    # Optimizer\n    nbs = 64  # nominal batch size\n    accumulate = max(round(nbs / batch_size), 1)  # accumulate loss before optimizing\n    hyp['weight_decay'] *= batch_size * accumulate / nbs  # scale weight_decay\n    optimizer = smart_optimizer(model, opt.optimizer, hyp['lr0'], hyp['momentum'], hyp['weight_decay'])\n\n    # Scheduler\n    if opt.cos_lr:\n        lf = one_cycle(1, hyp['lrf'], epochs)  # cosine 1->hyp['lrf']\n    else:\n        lf = lambda x: (1 - x / epochs) * (1.0 - hyp['lrf']) + hyp['lrf']  # linear\n    scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lf)  # plot_lr_scheduler(optimizer, scheduler, epochs)\n\n    # EMA\n    ema = ModelEMA(model) if RANK in {-1, 0} else None\n\n    # Resume\n    best_fitness, start_epoch = 0.0, 0\n    if pretrained:\n        if resume:\n            best_fitness, start_epoch, epochs = smart_resume(ckpt, 
optimizer, ema, weights, epochs, resume)\n        del ckpt, csd\n\n    # DP mode\n    if cuda and RANK == -1 and torch.cuda.device_count() > 1:\n        LOGGER.warning('WARNING ⚠️ DP not recommended, use torch.distributed.run for best DDP Multi-GPU results.\\n'\n                       'See Multi-GPU Tutorial at https://github.com/ultralytics/yolov5/issues/475 to get started.')\n        model = torch.nn.DataParallel(model)\n\n    # SyncBatchNorm\n    if opt.sync_bn and cuda and RANK != -1:\n        model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model).to(device)\n        LOGGER.info('Using SyncBatchNorm()')\n\n    # Trainloader\n    train_loader, dataset = create_dataloader(train_path,\n                                              imgsz,\n                                              batch_size // WORLD_SIZE,\n                                              gs,\n                                              single_cls,\n                                              hyp=hyp,\n                                              augment=True,\n                                              cache=None if opt.cache == 'val' else opt.cache,\n                                              rect=opt.rect,\n                                              rank=LOCAL_RANK,\n                                              workers=workers,\n                                              image_weights=opt.image_weights,\n                                              quad=opt.quad,\n                                              prefix=colorstr('train: '),\n                                              shuffle=True,\n                                              seed=opt.seed)\n    labels = np.concatenate(dataset.labels, 0)\n    mlc = int(labels[:, 0].max())  # max label class\n    assert mlc < nc, f'Label class {mlc} exceeds nc={nc} in {data}. 
Possible class labels are 0-{nc - 1}'\n\n    # Process 0\n    if RANK in {-1, 0}:\n        val_loader = create_dataloader(val_path,\n                                       imgsz,\n                                       batch_size // WORLD_SIZE * 2,\n                                       gs,\n                                       single_cls,\n                                       hyp=hyp,\n                                       cache=None if noval else opt.cache,\n                                       rect=True,\n                                       rank=-1,\n                                       workers=workers * 2,\n                                       pad=0.5,\n                                       prefix=colorstr('val: '))[0]\n\n        if not resume:\n            if not opt.noautoanchor:\n                check_anchors(dataset, model=model, thr=hyp['anchor_t'], imgsz=imgsz)  # run AutoAnchor\n            model.half().float()  # pre-reduce anchor precision\n\n        callbacks.run('on_pretrain_routine_end', labels, names)\n\n    # DDP mode\n    if cuda and RANK != -1:\n        model = smart_DDP(model)\n\n    # Model attributes\n    nl = de_parallel(model).model[-1].nl  # number of detection layers (to scale hyps)\n    hyp['box'] *= 3 / nl  # scale to layers\n    hyp['cls'] *= nc / 80 * 3 / nl  # scale to classes and layers\n    hyp['obj'] *= (imgsz / 640) ** 2 * 3 / nl  # scale to image size and layers\n    hyp['label_smoothing'] = opt.label_smoothing\n    model.nc = nc  # attach number of classes to model\n    model.hyp = hyp  # attach hyperparameters to model\n    model.class_weights = labels_to_class_weights(dataset.labels, nc).to(device) * nc  # attach class weights\n    model.names = names\n\n    # Start training\n    t0 = time.time()\n    nb = len(train_loader)  # number of batches\n    nw = max(round(hyp['warmup_epochs'] * nb), 100)  # number of warmup iterations, max(3 epochs, 100 iterations)\n    # nw = min(nw, (epochs - start_epoch) / 2 * nb)  
# limit warmup to < 1/2 of training\n    last_opt_step = -1\n    maps = np.zeros(nc)  # mAP per class\n    results = (0, 0, 0, 0, 0, 0, 0)  # P, R, mAP@.5, mAP@.5-.95, val_loss(box, obj, cls)\n    scheduler.last_epoch = start_epoch - 1  # do not move\n    scaler = torch.cuda.amp.GradScaler(enabled=amp)\n    stopper, stop = EarlyStopping(patience=opt.patience), False\n    compute_loss_ota = ComputeLossAuxOTA(model)  # init loss class\n    compute_loss = ComputeLoss(model)\n    callbacks.run('on_train_start')\n    LOGGER.info(f'Image sizes {imgsz} train, {imgsz} val\\n'\n                f'Using {train_loader.num_workers * WORLD_SIZE} dataloader workers\\n'\n                f\"Logging results to {colorstr('bold', save_dir)}\\n\"\n                f'Starting training for {epochs} epochs...')\n    for epoch in range(start_epoch, epochs):  # epoch ------------------------------------------------------------------\n        callbacks.run('on_train_epoch_start')\n        model.train()\n\n        # Update image weights (optional, single-GPU only)\n        if opt.image_weights:\n            cw = model.class_weights.cpu().numpy() * (1 - maps) ** 2 / nc  # class weights\n            iw = labels_to_image_weights(dataset.labels, nc=nc, class_weights=cw)  # image weights\n            dataset.indices = random.choices(range(dataset.n), weights=iw, k=dataset.n)  # rand weighted idx\n\n        # Update mosaic border (optional)\n        # b = int(random.uniform(0.25 * imgsz, 0.75 * imgsz + gs) // gs * gs)\n        # dataset.mosaic_border = [b - imgsz, -b]  # height, width borders\n\n        mloss = torch.zeros(3, device=device)  # mean losses\n        if RANK != -1:\n            train_loader.sampler.set_epoch(epoch)\n        pbar = enumerate(train_loader)\n        LOGGER.info(('\\n' + '%11s' * 7) % ('Epoch', 'GPU_mem', 'box_loss', 'obj_loss', 'cls_loss', 'Instances', 'Size'))\n        if RANK in {-1, 0}:\n            pbar = tqdm(pbar, total=nb, bar_format=TQDM_BAR_FORMAT)  # progress 
bar\n        optimizer.zero_grad()\n        for i, (imgs, targets, paths, _) in pbar:  # batch -------------------------------------------------------------\n            callbacks.run('on_train_batch_start')\n            ni = i + nb * epoch  # number integrated batches (since train start)\n            imgs = imgs.to(device, non_blocking=True).float() / 255  # uint8 to float32, 0-255 to 0.0-1.0\n\n            # Warmup\n            if ni <= nw:\n                xi = [0, nw]  # x interp\n                # compute_loss.gr = np.interp(ni, xi, [0.0, 1.0])  # iou loss ratio (obj_loss = 1.0 or iou)\n                accumulate = max(1, np.interp(ni, xi, [1, nbs / batch_size]).round())\n                for j, x in enumerate(optimizer.param_groups):\n                    # bias lr falls from 0.1 to lr0, all other lrs rise from 0.0 to lr0\n                    x['lr'] = np.interp(ni, xi, [hyp['warmup_bias_lr'] if j == 0 else 0.0, x['initial_lr'] * lf(epoch)])\n                    if 'momentum' in x:\n                        x['momentum'] = np.interp(ni, xi, [hyp['warmup_momentum'], hyp['momentum']])\n\n            # Multi-scale\n            if opt.multi_scale:\n                sz = random.randrange(int(imgsz * 0.5), int(imgsz * 1.5) + gs) // gs * gs  # size (randrange requires int bounds)\n                sf = sz / max(imgs.shape[2:])  # scale factor\n                if sf != 1:\n                    ns = [math.ceil(x * sf / gs) * gs for x in imgs.shape[2:]]  # new shape (stretched to gs-multiple)\n                    imgs = nn.functional.interpolate(imgs, size=ns, mode='bilinear', align_corners=False)\n\n            # Forward\n            with torch.cuda.amp.autocast(amp):\n                pred = model(imgs)  # forward\n                loss, loss_items = compute_loss_ota(pred, targets.to(device), imgs)  # loss scaled by batch_size\n                if RANK != -1:\n                    loss *= WORLD_SIZE  # gradient averaged between devices in DDP mode\n                if opt.quad:\n                    loss *= 4.\n\n       
     # Backward\n            scaler.scale(loss).backward()\n\n            # Optimize - https://pytorch.org/docs/master/notes/amp_examples.html\n            if ni - last_opt_step >= accumulate:\n                scaler.unscale_(optimizer)  # unscale gradients\n                torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)  # clip gradients\n                scaler.step(optimizer)  # optimizer.step\n                scaler.update()\n                optimizer.zero_grad()\n                if ema:\n                    ema.update(model)\n                last_opt_step = ni\n\n            # Log\n            if RANK in {-1, 0}:\n                mloss = (mloss * i + loss_items) / (i + 1)  # update mean losses\n                mem = f'{torch.cuda.memory_reserved() / 1E9 if torch.cuda.is_available() else 0:.3g}G'  # (GB)\n                pbar.set_description(('%11s' * 2 + '%11.4g' * 5) %\n                                     (f'{epoch}/{epochs - 1}', mem, *mloss, targets.shape[0], imgs.shape[-1]))\n                callbacks.run('on_train_batch_end', model, ni, imgs, targets, paths, list(mloss))\n                if callbacks.stop_training:\n                    return\n            # end batch ------------------------------------------------------------------------------------------------\n\n        # Scheduler\n        lr = [x['lr'] for x in optimizer.param_groups]  # for loggers\n        scheduler.step()\n\n        if RANK in {-1, 0}:\n            # mAP\n            callbacks.run('on_train_epoch_end', epoch=epoch)\n            ema.update_attr(model, include=['yaml', 'nc', 'hyp', 'names', 'stride', 'class_weights'])\n            final_epoch = (epoch + 1 == epochs) or stopper.possible_stop\n            if not noval or final_epoch:  # Calculate mAP\n                results, maps, _ = validate.run(data_dict,\n                                                batch_size=batch_size // WORLD_SIZE * 2,\n                                                imgsz=imgsz,\n        
                                        half=amp,\n                                                model=ema.ema,\n                                                single_cls=single_cls,\n                                                dataloader=val_loader,\n                                                save_dir=save_dir,\n                                                plots=False,\n                                                callbacks=callbacks,\n                                                compute_loss=compute_loss)\n\n            # Update best mAP\n            fi = fitness(np.array(results).reshape(1, -1))  # weighted combination of [P, R, mAP@.5, mAP@.5-.95]\n            stop = stopper(epoch=epoch, fitness=fi)  # early stop check\n            if fi > best_fitness:\n                best_fitness = fi\n            log_vals = list(mloss) + list(results) + lr\n            callbacks.run('on_fit_epoch_end', log_vals, epoch, best_fitness, fi)\n\n            # Save model\n            if (not nosave) or (final_epoch and not evolve):  # if save\n                ckpt = {\n                    'epoch': epoch,\n                    'best_fitness': best_fitness,\n                    'model': deepcopy(de_parallel(model)).half(),\n                    'ema': deepcopy(ema.ema).half(),\n                    'updates': ema.updates,\n                    'optimizer': optimizer.state_dict(),\n                    'opt': vars(opt),\n                    'git': GIT_INFO,  # {remote, branch, commit} if a git repo\n                    'date': datetime.now().isoformat()}\n\n                # Save last, best and delete\n                torch.save(ckpt, last)\n                if best_fitness == fi:\n                    torch.save(ckpt, best)\n                if opt.save_period > 0 and epoch % opt.save_period == 0:\n                    torch.save(ckpt, w / f'epoch{epoch}.pt')\n                del ckpt\n                callbacks.run('on_model_save', last, epoch, final_epoch, best_fitness, 
fi)\n\n        # EarlyStopping\n        if RANK != -1:  # if DDP training\n            broadcast_list = [stop if RANK == 0 else None]\n            dist.broadcast_object_list(broadcast_list, 0)  # broadcast 'stop' to all ranks\n            if RANK != 0:\n                stop = broadcast_list[0]\n        if stop:\n            break  # must break all DDP ranks\n\n        # end epoch ----------------------------------------------------------------------------------------------------\n    # end training -----------------------------------------------------------------------------------------------------\n    if RANK in {-1, 0}:\n        LOGGER.info(f'\\n{epoch - start_epoch + 1} epochs completed in {(time.time() - t0) / 3600:.3f} hours.')\n        for f in last, best:\n            if f.exists():\n                strip_optimizer(f)  # strip optimizers\n                if f is best:\n                    LOGGER.info(f'\\nValidating {f}...')\n                    results, _, _ = validate.run(\n                        data_dict,\n                        batch_size=batch_size // WORLD_SIZE * 2,\n                        imgsz=imgsz,\n                        model=attempt_load(f, device).half(),\n                        iou_thres=0.65 if is_coco else 0.60,  # best pycocotools at iou 0.65\n                        single_cls=single_cls,\n                        dataloader=val_loader,\n                        save_dir=save_dir,\n                        save_json=is_coco,\n                        verbose=True,\n                        plots=plots,\n                        callbacks=callbacks,\n                        compute_loss=compute_loss)  # val best model with plots\n                    if is_coco:\n                        callbacks.run('on_fit_epoch_end', list(mloss) + list(results) + lr, epoch, best_fitness, fi)\n\n        callbacks.run('on_train_end', last, best, epoch, results)\n\n    torch.cuda.empty_cache()\n    return results\n\n\ndef parse_opt(known=False):\n    parser 
= argparse.ArgumentParser()\n    parser.add_argument('--weights', type=str, default=ROOT / 'yolov5n.pt', help='initial weights path')\n    parser.add_argument('--cfg', type=str, default='models/yolov5_aux.yaml', help='model.yaml path')\n    parser.add_argument('--data', type=str, default=ROOT / '/home/hjj/Desktop/dataset/data.yaml', help='dataset.yaml path')\n    parser.add_argument('--hyp', type=str, default=ROOT / 'data/hyps/hyp.scratch-low.yaml', help='hyperparameters path')\n    parser.add_argument('--epochs', type=int, default=100, help='total training epochs')\n    parser.add_argument('--batch-size', type=int, default=64, help='total batch size for all GPUs, -1 for autobatch')\n    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='train, val image size (pixels)')\n    parser.add_argument('--rect', action='store_true', help='rectangular training')\n    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')\n    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')\n    parser.add_argument('--noval', action='store_true', help='only validate final epoch')\n    parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')\n    parser.add_argument('--noplots', action='store_true', help='save no plot files')\n    parser.add_argument('--evolve', type=int, nargs='?', const=300, help='evolve hyperparameters for x generations')\n    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')\n    parser.add_argument('--cache', type=str, nargs='?', const='ram', default=True, help='image --cache ram/disk')\n    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')\n    parser.add_argument('--device', default='0', help='cuda device, i.e. 
0 or 0,1,2,3 or cpu')\n    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')\n    parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')\n    parser.add_argument('--optimizer', type=str, choices=['SGD', 'Adam', 'AdamW'], default='SGD', help='optimizer')\n    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')\n    parser.add_argument('--workers', type=int, default=4, help='max dataloader workers (per RANK in DDP mode)')\n    parser.add_argument('--project', default=ROOT / 'runs/train', help='save to project/name')\n    parser.add_argument('--name', default='exp', help='save to project/name')\n    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')\n    parser.add_argument('--quad', action='store_true', help='quad dataloader')\n    parser.add_argument('--cos-lr', action='store_true', help='cosine LR scheduler')\n    parser.add_argument('--label-smoothing', type=float, default=0.0, help='Label smoothing epsilon')\n    parser.add_argument('--patience', type=int, default=100, help='EarlyStopping patience (epochs without improvement)')\n    parser.add_argument('--freeze', nargs='+', type=int, default=[0], help='Freeze layers: backbone=10, first3=0 1 2')\n    parser.add_argument('--save-period', type=int, default=-1, help='Save checkpoint every x epochs (disabled if < 1)')\n    parser.add_argument('--seed', type=int, default=0, help='Global training seed')\n    parser.add_argument('--local_rank', type=int, default=-1, help='Automatic DDP Multi-GPU argument, do not modify')\n\n    # Logger arguments\n    parser.add_argument('--entity', default=None, help='Entity')\n    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='Upload data, \"val\" option')\n    parser.add_argument('--bbox_interval', type=int, default=-1, help='Set bounding-box image 
logging interval')\n    parser.add_argument('--artifact_alias', type=str, default='latest', help='Version of dataset artifact to use')\n\n    return parser.parse_known_args()[0] if known else parser.parse_args()\n\n\ndef main(opt, callbacks=Callbacks()):\n    # Checks\n    if RANK in {-1, 0}:\n        print_args(vars(opt))\n        check_git_status()\n        check_requirements()\n\n    # Resume (from specified or most recent last.pt)\n    if opt.resume and not check_comet_resume(opt) and not opt.evolve:\n        last = Path(check_file(opt.resume) if isinstance(opt.resume, str) else get_latest_run())\n        opt_yaml = last.parent.parent / 'opt.yaml'  # train options yaml\n        opt_data = opt.data  # original dataset\n        if opt_yaml.is_file():\n            with open(opt_yaml, errors='ignore') as f:\n                d = yaml.safe_load(f)\n        else:\n            d = torch.load(last, map_location='cpu')['opt']\n        opt = argparse.Namespace(**d)  # replace\n        opt.cfg, opt.weights, opt.resume = '', str(last), True  # reinstate\n        if is_url(opt_data):\n            opt.data = check_file(opt_data)  # avoid HUB resume auth timeout\n    else:\n        opt.data, opt.cfg, opt.hyp, opt.weights, opt.project = \\\n            check_file(opt.data), check_yaml(opt.cfg), check_yaml(opt.hyp), str(opt.weights), str(opt.project)  # checks\n        assert len(opt.cfg) or len(opt.weights), 'either --cfg or --weights must be specified'\n        if opt.evolve:\n            if opt.project == str(ROOT / 'runs/train'):  # if default project name, rename to runs/evolve\n                opt.project = str(ROOT / 'runs/evolve')\n            opt.exist_ok, opt.resume = opt.resume, False  # pass resume to exist_ok and disable resume\n        if opt.name == 'cfg':\n            opt.name = Path(opt.cfg).stem  # use model.yaml as name\n        opt.save_dir = str(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok))\n\n    # DDP mode\n    device = 
select_device(opt.device, batch_size=opt.batch_size)\n    if LOCAL_RANK != -1:\n        msg = 'is not compatible with YOLOv5 Multi-GPU DDP training'\n        assert not opt.image_weights, f'--image-weights {msg}'\n        assert not opt.evolve, f'--evolve {msg}'\n        assert opt.batch_size != -1, f'AutoBatch with --batch-size -1 {msg}, please pass a valid --batch-size'\n        assert opt.batch_size % WORLD_SIZE == 0, f'--batch-size {opt.batch_size} must be multiple of WORLD_SIZE'\n        assert torch.cuda.device_count() > LOCAL_RANK, 'insufficient CUDA devices for DDP command'\n        torch.cuda.set_device(LOCAL_RANK)\n        device = torch.device('cuda', LOCAL_RANK)\n        dist.init_process_group(backend='nccl' if dist.is_nccl_available() else 'gloo')\n\n    # Train\n    if not opt.evolve:\n        train(opt.hyp, opt, device, callbacks)\n\n    # Evolve hyperparameters (optional)\n    else:\n        # Hyperparameter evolution metadata (mutation scale 0-1, lower_limit, upper_limit)\n        meta = {\n            'lr0': (1, 1e-5, 1e-1),  # initial learning rate (SGD=1E-2, Adam=1E-3)\n            'lrf': (1, 0.01, 1.0),  # final OneCycleLR learning rate (lr0 * lrf)\n            'momentum': (0.3, 0.6, 0.98),  # SGD momentum/Adam beta1\n            'weight_decay': (1, 0.0, 0.001),  # optimizer weight decay\n            'warmup_epochs': (1, 0.0, 5.0),  # warmup epochs (fractions ok)\n            'warmup_momentum': (1, 0.0, 0.95),  # warmup initial momentum\n            'warmup_bias_lr': (1, 0.0, 0.2),  # warmup initial bias lr\n            'box': (1, 0.02, 0.2),  # box loss gain\n            'cls': (1, 0.2, 4.0),  # cls loss gain\n            'cls_pw': (1, 0.5, 2.0),  # cls BCELoss positive_weight\n            'obj': (1, 0.2, 4.0),  # obj loss gain (scale with pixels)\n            'obj_pw': (1, 0.5, 2.0),  # obj BCELoss positive_weight\n            'iou_t': (0, 0.1, 0.7),  # IoU training threshold\n            'anchor_t': (1, 2.0, 8.0),  # anchor-multiple 
threshold\n            'anchors': (2, 2.0, 10.0),  # anchors per output grid (0 to ignore)\n            'fl_gamma': (0, 0.0, 2.0),  # focal loss gamma (efficientDet default gamma=1.5)\n            'hsv_h': (1, 0.0, 0.1),  # image HSV-Hue augmentation (fraction)\n            'hsv_s': (1, 0.0, 0.9),  # image HSV-Saturation augmentation (fraction)\n            'hsv_v': (1, 0.0, 0.9),  # image HSV-Value augmentation (fraction)\n            'degrees': (1, 0.0, 45.0),  # image rotation (+/- deg)\n            'translate': (1, 0.0, 0.9),  # image translation (+/- fraction)\n            'scale': (1, 0.0, 0.9),  # image scale (+/- gain)\n            'shear': (1, 0.0, 10.0),  # image shear (+/- deg)\n            'perspective': (0, 0.0, 0.001),  # image perspective (+/- fraction), range 0-0.001\n            'flipud': (1, 0.0, 1.0),  # image flip up-down (probability)\n            'fliplr': (0, 0.0, 1.0),  # image flip left-right (probability)\n            'mosaic': (1, 0.0, 1.0),  # image mosaic (probability)\n            'mixup': (1, 0.0, 1.0),  # image mixup (probability)\n            'copy_paste': (1, 0.0, 1.0)}  # segment copy-paste (probability)\n\n        with open(opt.hyp, errors='ignore') as f:\n            hyp = yaml.safe_load(f)  # load hyps dict\n            if 'anchors' not in hyp:  # anchors commented in hyp.yaml\n                hyp['anchors'] = 3\n        if opt.noautoanchor:\n            del hyp['anchors'], meta['anchors']\n        opt.noval, opt.nosave, save_dir = True, True, Path(opt.save_dir)  # only val/save final epoch\n        # ei = [isinstance(x, (int, float)) for x in hyp.values()]  # evolvable indices\n        evolve_yaml, evolve_csv = save_dir / 'hyp_evolve.yaml', save_dir / 'evolve.csv'\n        if opt.bucket:\n            # download evolve.csv if exists\n            subprocess.run([\n                'gsutil',\n                'cp',\n                f'gs://{opt.bucket}/evolve.csv',\n                str(evolve_csv),])\n\n        for _ in 
range(opt.evolve):  # generations to evolve\n            if evolve_csv.exists():  # if evolve.csv exists: select best hyps and mutate\n                # Select parent(s)\n                parent = 'single'  # parent selection method: 'single' or 'weighted'\n                x = np.loadtxt(evolve_csv, ndmin=2, delimiter=',', skiprows=1)\n                n = min(5, len(x))  # number of previous results to consider\n                x = x[np.argsort(-fitness(x))][:n]  # top n mutations\n                w = fitness(x) - fitness(x).min() + 1E-6  # weights (sum > 0)\n                if parent == 'single' or len(x) == 1:\n                    # x = x[random.randint(0, n - 1)]  # random selection\n                    x = x[random.choices(range(n), weights=w)[0]]  # weighted selection\n                elif parent == 'weighted':\n                    x = (x * w.reshape(n, 1)).sum(0) / w.sum()  # weighted combination\n\n                # Mutate\n                mp, s = 0.8, 0.2  # mutation probability, sigma\n                npr = np.random\n                npr.seed(int(time.time()))\n                g = np.array([meta[k][0] for k in hyp.keys()])  # gains 0-1\n                ng = len(meta)\n                v = np.ones(ng)\n                while all(v == 1):  # mutate until a change occurs (prevent duplicates)\n                    v = (g * (npr.random(ng) < mp) * npr.randn(ng) * npr.random() * s + 1).clip(0.3, 3.0)\n                for i, k in enumerate(hyp.keys()):  # plt.hist(v.ravel(), 300)\n                    hyp[k] = float(x[i + 7] * v[i])  # mutate\n\n            # Constrain to limits\n            for k, v in meta.items():\n                hyp[k] = max(hyp[k], v[1])  # lower limit\n                hyp[k] = min(hyp[k], v[2])  # upper limit\n                hyp[k] = round(hyp[k], 5)  # significant digits\n\n            # Train mutation\n            results = train(hyp.copy(), opt, device, callbacks)\n            callbacks = Callbacks()\n            # Write mutation results\n  
          keys = ('metrics/precision', 'metrics/recall', 'metrics/mAP_0.5', 'metrics/mAP_0.5:0.95', 'val/box_loss',\n                    'val/obj_loss', 'val/cls_loss')\n            print_mutation(keys, results, hyp.copy(), save_dir, opt.bucket)\n\n        # Plot results\n        plot_evolve(evolve_csv)\n        LOGGER.info(f'Hyperparameter evolution finished {opt.evolve} generations\\n'\n                    f\"Results saved to {colorstr('bold', save_dir)}\\n\"\n                    f'Usage example: $ python train.py --hyp {evolve_yaml}')\n\n\ndef run(**kwargs):\n    # Usage: import train; train.run(data='coco128.yaml', imgsz=320, weights='yolov5m.pt')\n    opt = parse_opt(True)\n    for k, v in kwargs.items():\n        setattr(opt, k, v)\n    main(opt)\n    return opt\n\n\nif __name__ == '__main__':\n    opt = parse_opt()\n    main(opt)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/__init__.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nutils/initialization\n\"\"\"\n\nimport contextlib\nimport platform\nimport threading\n\n\ndef emojis(str=''):\n    # Return platform-dependent emoji-safe version of string\n    return str.encode().decode('ascii', 'ignore') if platform.system() == 'Windows' else str\n\n\nclass TryExcept(contextlib.ContextDecorator):\n    # YOLOv5 TryExcept class. Usage: @TryExcept() decorator or 'with TryExcept():' context manager\n    def __init__(self, msg=''):\n        self.msg = msg\n\n    def __enter__(self):\n        pass\n\n    def __exit__(self, exc_type, value, traceback):\n        if value:\n            print(emojis(f\"{self.msg}{': ' if self.msg else ''}{value}\"))\n        return True\n\n\ndef threaded(func):\n    # Multi-threads a target function and returns thread. Usage: @threaded decorator\n    def wrapper(*args, **kwargs):\n        thread = threading.Thread(target=func, args=args, kwargs=kwargs, daemon=True)\n        thread.start()\n        return thread\n\n    return wrapper\n\n\ndef join_threads(verbose=False):\n    # Join all daemon threads, i.e. 
atexit.register(lambda: join_threads())\n    main_thread = threading.current_thread()\n    for t in threading.enumerate():\n        if t is not main_thread:\n            if verbose:\n                print(f'Joining thread {t.name}')\n            t.join()\n\n\ndef notebook_init(verbose=True):\n    # Check system software and hardware\n    print('Checking setup...')\n\n    import os\n    import shutil\n\n    from utils.general import check_font, check_requirements, is_colab\n    from utils.torch_utils import select_device  # imports\n\n    check_font()\n\n    import psutil\n    from IPython import display  # to display images and clear console output\n\n    if is_colab():\n        shutil.rmtree('/content/sample_data', ignore_errors=True)  # remove colab /sample_data directory\n\n    # System info\n    if verbose:\n        gb = 1 << 30  # bytes to GiB (1024 ** 3)\n        ram = psutil.virtual_memory().total\n        total, used, free = shutil.disk_usage('/')\n        display.clear_output()\n        s = f'({os.cpu_count()} CPUs, {ram / gb:.1f} GB RAM, {(total - free) / gb:.1f}/{total / gb:.1f} GB disk)'\n    else:\n        s = ''\n\n    select_device(newline=False)\n    print(emojis(f'Setup complete ✅ {s}'))\n    return display\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/activations.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nActivation functions\n\"\"\"\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass SiLU(nn.Module):\n    # SiLU activation https://arxiv.org/pdf/1606.08415.pdf\n    @staticmethod\n    def forward(x):\n        return x * torch.sigmoid(x)\n\n\nclass Hardswish(nn.Module):\n    # Hard-SiLU activation\n    @staticmethod\n    def forward(x):\n        # return x * F.hardsigmoid(x)  # for TorchScript and CoreML\n        return x * F.hardtanh(x + 3, 0.0, 6.0) / 6.0  # for TorchScript, CoreML and ONNX\n\n\nclass Mish(nn.Module):\n    # Mish activation https://github.com/digantamisra98/Mish\n    @staticmethod\n    def forward(x):\n        return x * F.softplus(x).tanh()\n\n\nclass MemoryEfficientMish(nn.Module):\n    # Mish activation memory-efficient\n    class F(torch.autograd.Function):\n\n        @staticmethod\n        def forward(ctx, x):\n            ctx.save_for_backward(x)\n            return x.mul(torch.tanh(F.softplus(x)))  # x * tanh(ln(1 + exp(x)))\n\n        @staticmethod\n        def backward(ctx, grad_output):\n            x = ctx.saved_tensors[0]\n            sx = torch.sigmoid(x)\n            fx = F.softplus(x).tanh()\n            return grad_output * (fx + x * sx * (1 - fx * fx))\n\n    def forward(self, x):\n        return self.F.apply(x)\n\n\nclass FReLU(nn.Module):\n    # FReLU activation https://arxiv.org/abs/2007.11824\n    def __init__(self, c1, k=3):  # ch_in, kernel\n        super().__init__()\n        self.conv = nn.Conv2d(c1, c1, k, 1, 1, groups=c1, bias=False)\n        self.bn = nn.BatchNorm2d(c1)\n\n    def forward(self, x):\n        return torch.max(x, self.bn(self.conv(x)))\n\n\nclass AconC(nn.Module):\n    r\"\"\" ACON activation (activate or not)\n    AconC: (p1*x-p2*x) * sigmoid(beta*(p1*x-p2*x)) + p2*x, beta is a learnable parameter\n    according to \"Activate or Not: Learning Customized Activation\" <https://arxiv.org/pdf/2009.04759.pdf>.\n    
\"\"\"\n\n    def __init__(self, c1):\n        super().__init__()\n        self.p1 = nn.Parameter(torch.randn(1, c1, 1, 1))\n        self.p2 = nn.Parameter(torch.randn(1, c1, 1, 1))\n        self.beta = nn.Parameter(torch.ones(1, c1, 1, 1))\n\n    def forward(self, x):\n        dpx = (self.p1 - self.p2) * x\n        return dpx * torch.sigmoid(self.beta * dpx) + self.p2 * x\n\n\nclass MetaAconC(nn.Module):\n    r\"\"\" ACON activation (activate or not)\n    MetaAconC: (p1*x-p2*x) * sigmoid(beta*(p1*x-p2*x)) + p2*x, beta is generated by a small network\n    according to \"Activate or Not: Learning Customized Activation\" <https://arxiv.org/pdf/2009.04759.pdf>.\n    \"\"\"\n\n    def __init__(self, c1, k=1, s=1, r=16):  # ch_in, kernel, stride, r\n        super().__init__()\n        c2 = max(r, c1 // r)\n        self.p1 = nn.Parameter(torch.randn(1, c1, 1, 1))\n        self.p2 = nn.Parameter(torch.randn(1, c1, 1, 1))\n        self.fc1 = nn.Conv2d(c1, c2, k, s, bias=True)\n        self.fc2 = nn.Conv2d(c2, c1, k, s, bias=True)\n        # self.bn1 = nn.BatchNorm2d(c2)\n        # self.bn2 = nn.BatchNorm2d(c1)\n\n    def forward(self, x):\n        y = x.mean(dim=2, keepdims=True).mean(dim=3, keepdims=True)\n        # batch-size 1 bug/instabilities https://github.com/ultralytics/yolov5/issues/2891\n        # beta = torch.sigmoid(self.bn2(self.fc2(self.bn1(self.fc1(y)))))  # bug/unstable\n        beta = torch.sigmoid(self.fc2(self.fc1(y)))  # bug patch BN layers removed\n        dpx = (self.p1 - self.p2) * x\n        return dpx * torch.sigmoid(beta * dpx) + self.p2 * x\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/augmentations.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nImage augmentation functions\n\"\"\"\n\nimport math\nimport random\n\nimport cv2\nimport numpy as np\nimport torch\nimport torchvision.transforms as T\nimport torchvision.transforms.functional as TF\n\nfrom utils.general import LOGGER, check_version, colorstr, resample_segments, segment2box, xywhn2xyxy\nfrom utils.metrics import bbox_ioa\n\nIMAGENET_MEAN = 0.485, 0.456, 0.406  # RGB mean\nIMAGENET_STD = 0.229, 0.224, 0.225  # RGB standard deviation\n\n\nclass Albumentations:\n    # YOLOv5 Albumentations class (optional, only used if package is installed)\n    def __init__(self, size=640):\n        self.transform = None\n        prefix = colorstr('albumentations: ')\n        try:\n            import albumentations as A\n            check_version(A.__version__, '1.0.3', hard=True)  # version requirement\n\n            T = [\n                A.RandomResizedCrop(height=size, width=size, scale=(0.8, 1.0), ratio=(0.9, 1.11), p=0.0),\n                A.Blur(p=0.01),\n                A.MedianBlur(p=0.01),\n                A.ToGray(p=0.01),\n                A.CLAHE(p=0.01),\n                A.RandomBrightnessContrast(p=0.0),\n                A.RandomGamma(p=0.0),\n                A.ImageCompression(quality_lower=75, p=0.0)]  # transforms\n            self.transform = A.Compose(T, bbox_params=A.BboxParams(format='yolo', label_fields=['class_labels']))\n\n            LOGGER.info(prefix + ', '.join(f'{x}'.replace('always_apply=False, ', '') for x in T if x.p))\n        except ImportError:  # package not installed, skip\n            pass\n        except Exception as e:\n            LOGGER.info(f'{prefix}{e}')\n\n    def __call__(self, im, labels, p=1.0):\n        if self.transform and random.random() < p:\n            new = self.transform(image=im, bboxes=labels[:, 1:], class_labels=labels[:, 0])  # transformed\n            im, labels = new['image'], np.array([[c, *b] for c, b in zip(new['class_labels'], 
new['bboxes'])])\n        return im, labels\n\n\ndef normalize(x, mean=IMAGENET_MEAN, std=IMAGENET_STD, inplace=False):\n    # Normalize RGB images x per ImageNet stats in BCHW format, i.e. = (x - mean) / std\n    return TF.normalize(x, mean, std, inplace=inplace)\n\n\ndef denormalize(x, mean=IMAGENET_MEAN, std=IMAGENET_STD):\n    # Denormalize RGB images x per ImageNet stats in BCHW format, i.e. = x * std + mean\n    for i in range(3):\n        x[:, i] = x[:, i] * std[i] + mean[i]\n    return x\n\n\ndef augment_hsv(im, hgain=0.5, sgain=0.5, vgain=0.5):\n    # HSV color-space augmentation\n    if hgain or sgain or vgain:\n        r = np.random.uniform(-1, 1, 3) * [hgain, sgain, vgain] + 1  # random gains\n        hue, sat, val = cv2.split(cv2.cvtColor(im, cv2.COLOR_BGR2HSV))\n        dtype = im.dtype  # uint8\n\n        x = np.arange(0, 256, dtype=r.dtype)\n        lut_hue = ((x * r[0]) % 180).astype(dtype)\n        lut_sat = np.clip(x * r[1], 0, 255).astype(dtype)\n        lut_val = np.clip(x * r[2], 0, 255).astype(dtype)\n\n        im_hsv = cv2.merge((cv2.LUT(hue, lut_hue), cv2.LUT(sat, lut_sat), cv2.LUT(val, lut_val)))\n        cv2.cvtColor(im_hsv, cv2.COLOR_HSV2BGR, dst=im)  # no return needed\n\n\ndef hist_equalize(im, clahe=True, bgr=False):\n    # Equalize histogram on BGR (bgr=True) or RGB image 'im' with im.shape(n,m,3) and range 0-255\n    yuv = cv2.cvtColor(im, cv2.COLOR_BGR2YUV if bgr else cv2.COLOR_RGB2YUV)\n    if clahe:\n        c = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))\n        yuv[:, :, 0] = c.apply(yuv[:, :, 0])\n    else:\n        yuv[:, :, 0] = cv2.equalizeHist(yuv[:, :, 0])  # equalize Y channel histogram\n    return cv2.cvtColor(yuv, cv2.COLOR_YUV2BGR if bgr else cv2.COLOR_YUV2RGB)  # convert YUV image back to BGR/RGB\n\n\ndef replicate(im, labels):\n    # Replicate labels\n    h, w = im.shape[:2]\n    boxes = labels[:, 1:].astype(int)\n    x1, y1, x2, y2 = boxes.T\n    s = ((x2 - x1) + (y2 - y1)) / 2  # side length (pixels)\n    for i in 
s.argsort()[:round(s.size * 0.5)]:  # smallest indices\n        x1b, y1b, x2b, y2b = boxes[i]\n        bh, bw = y2b - y1b, x2b - x1b\n        yc, xc = int(random.uniform(0, h - bh)), int(random.uniform(0, w - bw))  # offset x, y\n        x1a, y1a, x2a, y2a = [xc, yc, xc + bw, yc + bh]\n        im[y1a:y2a, x1a:x2a] = im[y1b:y2b, x1b:x2b]  # im4[ymin:ymax, xmin:xmax]\n        labels = np.append(labels, [[labels[i, 0], x1a, y1a, x2a, y2a]], axis=0)\n\n    return im, labels\n\n\ndef letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):\n    # Resize and pad image while meeting stride-multiple constraints\n    shape = im.shape[:2]  # current shape [height, width]\n    if isinstance(new_shape, int):\n        new_shape = (new_shape, new_shape)\n\n    # Scale ratio (new / old)\n    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])\n    if not scaleup:  # only scale down, do not scale up (for better val mAP)\n        r = min(r, 1.0)\n\n    # Compute padding\n    ratio = r, r  # width, height ratios\n    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))\n    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding\n    if auto:  # minimum rectangle\n        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding\n    elif scaleFill:  # stretch\n        dw, dh = 0.0, 0.0\n        new_unpad = (new_shape[1], new_shape[0])\n        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios\n\n    dw /= 2  # divide padding into 2 sides\n    dh /= 2\n\n    if shape[::-1] != new_unpad:  # resize\n        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)\n    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))\n    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))\n    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border\n    return im, ratio, (dw, dh)\n\n\ndef 
random_perspective(im,\n                       targets=(),\n                       segments=(),\n                       degrees=10,\n                       translate=.1,\n                       scale=.1,\n                       shear=10,\n                       perspective=0.0,\n                       border=(0, 0)):\n    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 10))\n    # targets = [cls, xyxy]\n\n    height = im.shape[0] + border[0] * 2  # shape(h,w,c)\n    width = im.shape[1] + border[1] * 2\n\n    # Center\n    C = np.eye(3)\n    C[0, 2] = -im.shape[1] / 2  # x translation (pixels)\n    C[1, 2] = -im.shape[0] / 2  # y translation (pixels)\n\n    # Perspective\n    P = np.eye(3)\n    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)\n    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)\n\n    # Rotation and Scale\n    R = np.eye(3)\n    a = random.uniform(-degrees, degrees)\n    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations\n    s = random.uniform(1 - scale, 1 + scale)\n    # s = 2 ** random.uniform(-scale, scale)\n    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)\n\n    # Shear\n    S = np.eye(3)\n    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)\n    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)\n\n    # Translation\n    T = np.eye(3)\n    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)\n    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)\n\n    # Combined rotation matrix\n    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT\n    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed\n        if perspective:\n            im = cv2.warpPerspective(im, M, dsize=(width, 
height), borderValue=(114, 114, 114))\n        else:  # affine\n            im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))\n\n    # Visualize\n    # import matplotlib.pyplot as plt\n    # ax = plt.subplots(1, 2, figsize=(12, 6))[1].ravel()\n    # ax[0].imshow(im[:, :, ::-1])  # base\n    # ax[1].imshow(im2[:, :, ::-1])  # warped\n\n    # Transform label coordinates\n    n = len(targets)\n    if n:\n        use_segments = any(x.any() for x in segments) and len(segments) == n\n        new = np.zeros((n, 4))\n        if use_segments:  # warp segments\n            segments = resample_segments(segments)  # upsample\n            for i, segment in enumerate(segments):\n                xy = np.ones((len(segment), 3))\n                xy[:, :2] = segment\n                xy = xy @ M.T  # transform\n                xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine\n\n                # clip\n                new[i] = segment2box(xy, width, height)\n\n        else:  # warp boxes\n            xy = np.ones((n * 4, 3))\n            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1\n            xy = xy @ M.T  # transform\n            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine\n\n            # create new boxes\n            x = xy[:, [0, 2, 4, 6]]\n            y = xy[:, [1, 3, 5, 7]]\n            new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T\n\n            # clip\n            new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)\n            new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)\n\n        # filter candidates\n        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)\n        targets = targets[i]\n        targets[:, 1:5] = new[i]\n\n    return im, targets\n\n\ndef copy_paste(im, labels, segments, p=0.5):\n    # 
Implement Copy-Paste augmentation https://arxiv.org/abs/2012.07177, labels as nx5 np.array(cls, xyxy)\n    n = len(segments)\n    if p and n:\n        h, w, c = im.shape  # height, width, channels\n        im_new = np.zeros(im.shape, np.uint8)\n        for j in random.sample(range(n), k=round(p * n)):\n            l, s = labels[j], segments[j]\n            box = w - l[3], l[2], w - l[1], l[4]\n            ioa = bbox_ioa(box, labels[:, 1:5])  # intersection over area\n            if (ioa < 0.30).all():  # allow 30% obscuration of existing labels\n                labels = np.concatenate((labels, [[l[0], *box]]), 0)\n                segments.append(np.concatenate((w - s[:, 0:1], s[:, 1:2]), 1))\n                cv2.drawContours(im_new, [segments[j].astype(np.int32)], -1, (1, 1, 1), cv2.FILLED)\n\n        result = cv2.flip(im, 1)  # augment segments (flip left-right)\n        i = cv2.flip(im_new, 1).astype(bool)\n        im[i] = result[i]  # cv2.imwrite('debug.jpg', im)  # debug\n\n    return im, labels, segments\n\n\ndef cutout(im, labels, p=0.5):\n    # Applies image cutout augmentation https://arxiv.org/abs/1708.04552\n    if random.random() < p:\n        h, w = im.shape[:2]\n        scales = [0.5] * 1 + [0.25] * 2 + [0.125] * 4 + [0.0625] * 8 + [0.03125] * 16  # image size fraction\n        for s in scales:\n            mask_h = random.randint(1, int(h * s))  # create random masks\n            mask_w = random.randint(1, int(w * s))\n\n            # box\n            xmin = max(0, random.randint(0, w) - mask_w // 2)\n            ymin = max(0, random.randint(0, h) - mask_h // 2)\n            xmax = min(w, xmin + mask_w)\n            ymax = min(h, ymin + mask_h)\n\n            # apply random color mask\n            im[ymin:ymax, xmin:xmax] = [random.randint(64, 191) for _ in range(3)]\n\n            # return unobscured labels\n            if len(labels) and s > 0.03:\n                box = np.array([xmin, ymin, xmax, ymax], dtype=np.float32)\n                ioa = 
bbox_ioa(box, xywhn2xyxy(labels[:, 1:5], w, h))  # intersection over area\n                labels = labels[ioa < 0.60]  # remove >60% obscured labels\n\n    return labels\n\n\ndef mixup(im, labels, im2, labels2):\n    # Applies MixUp augmentation https://arxiv.org/pdf/1710.09412.pdf\n    r = np.random.beta(32.0, 32.0)  # mixup ratio, alpha=beta=32.0\n    im = (im * r + im2 * (1 - r)).astype(np.uint8)\n    labels = np.concatenate((labels, labels2), 0)\n    return im, labels\n\n\ndef box_candidates(box1, box2, wh_thr=2, ar_thr=100, area_thr=0.1, eps=1e-16):  # box1(4,n), box2(4,n)\n    # Compute candidate boxes: box1 before augment, box2 after augment, wh_thr (pixels), aspect_ratio_thr, area_ratio\n    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]\n    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]\n    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio\n    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)  # candidates\n\n\ndef classify_albumentations(\n        augment=True,\n        size=224,\n        scale=(0.08, 1.0),\n        ratio=(0.75, 1.0 / 0.75),  # 0.75, 1.33\n        hflip=0.5,\n        vflip=0.0,\n        jitter=0.4,\n        mean=IMAGENET_MEAN,\n        std=IMAGENET_STD,\n        auto_aug=False):\n    # YOLOv5 classification Albumentations (optional, only used if package is installed)\n    prefix = colorstr('albumentations: ')\n    try:\n        import albumentations as A\n        from albumentations.pytorch import ToTensorV2\n        check_version(A.__version__, '1.0.3', hard=True)  # version requirement\n        if augment:  # Resize and crop\n            T = [A.RandomResizedCrop(height=size, width=size, scale=scale, ratio=ratio)]\n            if auto_aug:\n                # TODO: implement AugMix, AutoAug & RandAug in albumentation\n                LOGGER.info(f'{prefix}auto augmentations are currently not supported')\n            else:\n                if hflip > 0:\n                    T += 
[A.HorizontalFlip(p=hflip)]\n                if vflip > 0:\n                    T += [A.VerticalFlip(p=vflip)]\n                if jitter > 0:\n                    color_jitter = (float(jitter),) * 3  # repeat value for brightness, contrast, saturation, 0 hue\n                    T += [A.ColorJitter(*color_jitter, 0)]\n        else:  # Use fixed crop for eval set (reproducibility)\n            T = [A.SmallestMaxSize(max_size=size), A.CenterCrop(height=size, width=size)]\n        T += [A.Normalize(mean=mean, std=std), ToTensorV2()]  # Normalize and convert to Tensor\n        LOGGER.info(prefix + ', '.join(f'{x}'.replace('always_apply=False, ', '') for x in T if x.p))\n        return A.Compose(T)\n\n    except ImportError:  # package not installed, skip\n        LOGGER.warning(f'{prefix}⚠️ not found, install with `pip install albumentations` (recommended)')\n    except Exception as e:\n        LOGGER.info(f'{prefix}{e}')\n\n\ndef classify_transforms(size=224):\n    # Transforms to apply if albumentations not installed\n    assert isinstance(size, int), f'ERROR: classify_transforms size {size} must be integer, not (list, tuple)'\n    # T.Compose([T.ToTensor(), T.Resize(size), T.CenterCrop(size), T.Normalize(IMAGENET_MEAN, IMAGENET_STD)])\n    return T.Compose([CenterCrop(size), ToTensor(), T.Normalize(IMAGENET_MEAN, IMAGENET_STD)])\n\n\nclass LetterBox:\n    # YOLOv5 LetterBox class for image preprocessing, i.e. 
T.Compose([LetterBox(size), ToTensor()])\n    def __init__(self, size=(640, 640), auto=False, stride=32):\n        super().__init__()\n        self.h, self.w = (size, size) if isinstance(size, int) else size\n        self.auto = auto  # pass max size integer, automatically solve for short side using stride\n        self.stride = stride  # used with auto\n\n    def __call__(self, im):  # im = np.array HWC\n        imh, imw = im.shape[:2]\n        r = min(self.h / imh, self.w / imw)  # ratio of new/old\n        h, w = round(imh * r), round(imw * r)  # resized image\n        # parenthesize the fallback tuple so the conditional applies to both hs and ws\n        hs, ws = (math.ceil(x / self.stride) * self.stride for x in (h, w)) if self.auto else (self.h, self.w)\n        top, left = round((hs - h) / 2 - 0.1), round((ws - w) / 2 - 0.1)\n        im_out = np.full((hs, ws, 3), 114, dtype=im.dtype)  # padded output, stride-multiple if auto\n        im_out[top:top + h, left:left + w] = cv2.resize(im, (w, h), interpolation=cv2.INTER_LINEAR)\n        return im_out\n\n\nclass CenterCrop:\n    # YOLOv5 CenterCrop class for image preprocessing, i.e. T.Compose([CenterCrop(size), ToTensor()])\n    def __init__(self, size=640):\n        super().__init__()\n        self.h, self.w = (size, size) if isinstance(size, int) else size\n\n    def __call__(self, im):  # im = np.array HWC\n        imh, imw = im.shape[:2]\n        m = min(imh, imw)  # min dimension\n        top, left = (imh - m) // 2, (imw - m) // 2\n        return cv2.resize(im[top:top + m, left:left + m], (self.w, self.h), interpolation=cv2.INTER_LINEAR)\n\n\nclass ToTensor:\n    # YOLOv5 ToTensor class for image preprocessing, i.e. 
T.Compose([LetterBox(size), ToTensor()])\n    def __init__(self, half=False):\n        super().__init__()\n        self.half = half\n\n    def __call__(self, im):  # im = np.array HWC in BGR order\n        im = np.ascontiguousarray(im.transpose((2, 0, 1))[::-1])  # HWC to CHW -> BGR to RGB -> contiguous\n        im = torch.from_numpy(im)  # to torch\n        im = im.half() if self.half else im.float()  # uint8 to fp16/32\n        im /= 255.0  # 0-255 to 0.0-1.0\n        return im\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/autoanchor.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nAutoAnchor utils\n\"\"\"\n\nimport random\n\nimport numpy as np\nimport torch\nimport yaml\nfrom tqdm import tqdm\n\nfrom utils import TryExcept\nfrom utils.general import LOGGER, TQDM_BAR_FORMAT, colorstr\n\nPREFIX = colorstr('AutoAnchor: ')\n\n\ndef check_anchor_order(m):\n    # Check anchor order against stride order for YOLOv5 Detect() module m, and correct if necessary\n    a = m.anchors.prod(-1).mean(-1).view(-1)  # mean anchor area per output layer\n    da = a[-1] - a[0]  # delta a\n    ds = m.stride[-1] - m.stride[0]  # delta s\n    if da and (da.sign() != ds.sign()):  # same order\n        LOGGER.info(f'{PREFIX}Reversing anchor order')\n        m.anchors[:] = m.anchors.flip(0)\n\n\n@TryExcept(f'{PREFIX}ERROR')\ndef check_anchors(dataset, model, thr=4.0, imgsz=640):\n    # Check anchor fit to data, recompute if necessary\n    m = model.module.model[-1] if hasattr(model, 'module') else model.model[-1]  # Detect()\n    shapes = imgsz * dataset.shapes / dataset.shapes.max(1, keepdims=True)\n    scale = np.random.uniform(0.9, 1.1, size=(shapes.shape[0], 1))  # augment scale\n    wh = torch.tensor(np.concatenate([l[:, 3:5] * s for s, l in zip(shapes * scale, dataset.labels)])).float()  # wh\n\n    def metric(k):  # compute metric\n        r = wh[:, None] / k[None]\n        x = torch.min(r, 1 / r).min(2)[0]  # ratio metric\n        best = x.max(1)[0]  # best_x\n        aat = (x > 1 / thr).float().sum(1).mean()  # anchors above threshold\n        bpr = (best > 1 / thr).float().mean()  # best possible recall\n        return bpr, aat\n\n    stride = m.stride.to(m.anchors.device).view(-1, 1, 1)  # model strides\n    anchors = m.anchors.clone() * stride  # current anchors\n    bpr, aat = metric(anchors.cpu().view(-1, 2))\n    s = f'\\n{PREFIX}{aat:.2f} anchors/target, {bpr:.3f} Best Possible Recall (BPR). 
'\n    if bpr > 0.98:  # threshold to recompute\n        LOGGER.info(f'{s}Current anchors are a good fit to dataset ✅')\n    else:\n        LOGGER.info(f'{s}Anchors are a poor fit to dataset ⚠️, attempting to improve...')\n        na = m.anchors.numel() // 2  # number of anchors\n        anchors = kmean_anchors(dataset, n=na, img_size=imgsz, thr=thr, gen=1000, verbose=False)\n        new_bpr = metric(anchors)[0]\n        if new_bpr > bpr:  # replace anchors\n            anchors = torch.tensor(anchors, device=m.anchors.device).type_as(m.anchors)\n            m.anchors[:] = anchors.clone().view_as(m.anchors)\n            check_anchor_order(m)  # must be in pixel-space (not grid-space)\n            m.anchors /= stride\n            s = f'{PREFIX}Done ✅ (optional: update model *.yaml to use these anchors in the future)'\n        else:\n            s = f'{PREFIX}Done ⚠️ (original anchors better than new anchors, proceeding with original anchors)'\n        LOGGER.info(s)\n\n\ndef kmean_anchors(dataset='./data/coco128.yaml', n=9, img_size=640, thr=4.0, gen=1000, verbose=True):\n    \"\"\" Creates kmeans-evolved anchors from training dataset\n\n        Arguments:\n            dataset: path to data.yaml, or a loaded dataset\n            n: number of anchors\n            img_size: image size used for training\n            thr: anchor-label wh ratio threshold hyperparameter hyp['anchor_t'] used for training, default=4.0\n            gen: generations to evolve anchors using genetic algorithm\n            verbose: print all results\n\n        Return:\n            k: kmeans evolved anchors\n\n        Usage:\n            from utils.autoanchor import *; _ = kmean_anchors()\n    \"\"\"\n    from scipy.cluster.vq import kmeans\n\n    npr = np.random\n    thr = 1 / thr\n\n    def metric(k, wh):  # compute metrics\n        r = wh[:, None] / k[None]\n        x = torch.min(r, 1 / r).min(2)[0]  # ratio metric\n        # x = wh_iou(wh, torch.tensor(k))  # iou metric\n        return x, 
x.max(1)[0]  # x, best_x\n\n    def anchor_fitness(k):  # mutation fitness\n        _, best = metric(torch.tensor(k, dtype=torch.float32), wh)\n        return (best * (best > thr).float()).mean()  # fitness\n\n    def print_results(k, verbose=True):\n        k = k[np.argsort(k.prod(1))]  # sort small to large\n        x, best = metric(k, wh0)\n        bpr, aat = (best > thr).float().mean(), (x > thr).float().mean() * n  # best possible recall, anch > thr\n        s = f'{PREFIX}thr={thr:.2f}: {bpr:.4f} best possible recall, {aat:.2f} anchors past thr\\n' \\\n            f'{PREFIX}n={n}, img_size={img_size}, metric_all={x.mean():.3f}/{best.mean():.3f}-mean/best, ' \\\n            f'past_thr={x[x > thr].mean():.3f}-mean: '\n        for x in k:\n            s += '%i,%i, ' % (round(x[0]), round(x[1]))\n        if verbose:\n            LOGGER.info(s[:-2])\n        return k\n\n    if isinstance(dataset, str):  # *.yaml file\n        with open(dataset, errors='ignore') as f:\n            data_dict = yaml.safe_load(f)  # model dict\n        from utils.dataloaders import LoadImagesAndLabels\n        dataset = LoadImagesAndLabels(data_dict['train'], augment=True, rect=True)\n\n    # Get label wh\n    shapes = img_size * dataset.shapes / dataset.shapes.max(1, keepdims=True)\n    wh0 = np.concatenate([l[:, 3:5] * s for s, l in zip(shapes, dataset.labels)])  # wh\n\n    # Filter\n    i = (wh0 < 3.0).any(1).sum()\n    if i:\n        LOGGER.info(f'{PREFIX}WARNING ⚠️ Extremely small objects found: {i} of {len(wh0)} labels are <3 pixels in size')\n    wh = wh0[(wh0 >= 2.0).any(1)].astype(np.float32)  # filter > 2 pixels\n    # wh = wh * (npr.rand(wh.shape[0], 1) * 0.9 + 0.1)  # multiply by random scale 0-1\n\n    # Kmeans init\n    try:\n        LOGGER.info(f'{PREFIX}Running kmeans for {n} anchors on {len(wh)} points...')\n        assert n <= len(wh)  # apply overdetermined constraint\n        s = wh.std(0)  # sigmas for whitening\n        k = kmeans(wh / s, n, iter=30)[0] * s  # 
points\n        assert n == len(k)  # kmeans may return fewer points than requested if wh is insufficient or too similar\n    except Exception:\n        LOGGER.warning(f'{PREFIX}WARNING ⚠️ switching strategies from kmeans to random init')\n        k = np.sort(npr.rand(n * 2)).reshape(n, 2) * img_size  # random init\n    wh, wh0 = (torch.tensor(x, dtype=torch.float32) for x in (wh, wh0))\n    k = print_results(k, verbose=False)\n\n    # Plot\n    # k, d = [None] * 20, [None] * 20\n    # for i in tqdm(range(1, 21)):\n    #     k[i-1], d[i-1] = kmeans(wh / s, i)  # points, mean distance\n    # fig, ax = plt.subplots(1, 2, figsize=(14, 7), tight_layout=True)\n    # ax = ax.ravel()\n    # ax[0].plot(np.arange(1, 21), np.array(d) ** 2, marker='.')\n    # fig, ax = plt.subplots(1, 2, figsize=(14, 7))  # plot wh\n    # ax[0].hist(wh[wh[:, 0]<100, 0],400)\n    # ax[1].hist(wh[wh[:, 1]<100, 1],400)\n    # fig.savefig('wh.png', dpi=200)\n\n    # Evolve\n    f, sh, mp, s = anchor_fitness(k), k.shape, 0.9, 0.1  # fitness, generations, mutation prob, sigma\n    pbar = tqdm(range(gen), bar_format=TQDM_BAR_FORMAT)  # progress bar\n    for _ in pbar:\n        v = np.ones(sh)\n        while (v == 1).all():  # mutate until a change occurs (prevent duplicates)\n            v = ((npr.random(sh) < mp) * random.random() * npr.randn(*sh) * s + 1).clip(0.3, 3.0)\n        kg = (k.copy() * v).clip(min=2.0)\n        fg = anchor_fitness(kg)\n        if fg > f:\n            f, k = fg, kg.copy()\n            pbar.desc = f'{PREFIX}Evolving anchors with Genetic Algorithm: fitness = {f:.4f}'\n            if verbose:\n                print_results(k, verbose)\n\n    return print_results(k).astype(np.float32)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/autobatch.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nAuto-batch utils\n\"\"\"\n\nfrom copy import deepcopy\n\nimport numpy as np\nimport torch\n\nfrom utils.general import LOGGER, colorstr\nfrom utils.torch_utils import profile\n\n\ndef check_train_batch_size(model, imgsz=640, amp=True):\n    # Check YOLOv5 training batch size\n    with torch.cuda.amp.autocast(amp):\n        return autobatch(deepcopy(model).train(), imgsz)  # compute optimal batch size\n\n\ndef autobatch(model, imgsz=640, fraction=0.8, batch_size=16):\n    # Automatically estimate best YOLOv5 batch size to use `fraction` of available CUDA memory\n    # Usage:\n    #     import torch\n    #     from utils.autobatch import autobatch\n    #     model = torch.hub.load('ultralytics/yolov5', 'yolov5s', autoshape=False)\n    #     print(autobatch(model))\n\n    # Check device\n    prefix = colorstr('AutoBatch: ')\n    LOGGER.info(f'{prefix}Computing optimal batch size for --imgsz {imgsz}')\n    device = next(model.parameters()).device  # get model device\n    if device.type == 'cpu':\n        LOGGER.info(f'{prefix}CUDA not detected, using default CPU batch-size {batch_size}')\n        return batch_size\n    if torch.backends.cudnn.benchmark:\n        LOGGER.info(f'{prefix} ⚠️ Requires torch.backends.cudnn.benchmark=False, using default batch-size {batch_size}')\n        return batch_size\n\n    # Inspect CUDA memory\n    gb = 1 << 30  # bytes to GiB (1024 ** 3)\n    d = str(device).upper()  # 'CUDA:0'\n    properties = torch.cuda.get_device_properties(device)  # device properties\n    t = properties.total_memory / gb  # GiB total\n    r = torch.cuda.memory_reserved(device) / gb  # GiB reserved\n    a = torch.cuda.memory_allocated(device) / gb  # GiB allocated\n    f = t - (r + a)  # GiB free\n    LOGGER.info(f'{prefix}{d} ({properties.name}) {t:.2f}G total, {r:.2f}G reserved, {a:.2f}G allocated, {f:.2f}G free')\n\n    # Profile batch sizes\n    batch_sizes = [1, 2, 4, 8, 16]\n    try:\n     
   img = [torch.empty(b, 3, imgsz, imgsz) for b in batch_sizes]\n        results = profile(img, model, n=3, device=device)\n    except Exception as e:\n        LOGGER.warning(f'{prefix}{e}')\n        return batch_size  # profiling failed, so there are no results to fit\n\n    # Fit a solution\n    y = [x[2] for x in results if x]  # memory [2]\n    p = np.polyfit(batch_sizes[:len(y)], y, deg=1)  # first degree polynomial fit\n    b = int((f * fraction - p[1]) / p[0])  # solve the fit for the target memory (optimal batch size)\n    if None in results:  # some sizes failed\n        i = results.index(None)  # first fail index\n        if b >= batch_sizes[i]:  # estimate at or above failure point\n            b = batch_sizes[max(i - 1, 0)]  # select prior safe point\n    if b < 1 or b > 1024:  # b outside of safe range\n        b = batch_size\n        LOGGER.warning(f'{prefix}WARNING ⚠️ CUDA anomaly detected, recommend restart environment and retry command.')\n\n    fraction = (np.polyval(p, b) + r + a) / t  # actual fraction predicted\n    LOGGER.info(f'{prefix}Using batch-size {b} for {d} {t * fraction:.2f}G/{t:.2f}G ({fraction * 100:.0f}%) ✅')\n    return b\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/aws/__init__.py",
    "content": ""
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/aws/mime.sh",
    "content": "# AWS EC2 instance startup 'MIME' script https://aws.amazon.com/premiumsupport/knowledge-center/execute-user-data-ec2/\n# This script will run on every instance restart, not only on first start\n# --- DO NOT COPY ABOVE COMMENTS WHEN PASTING INTO USERDATA ---\n\nContent-Type: multipart/mixed; boundary=\"//\"\nMIME-Version: 1.0\n\n--//\nContent-Type: text/cloud-config; charset=\"us-ascii\"\nMIME-Version: 1.0\nContent-Transfer-Encoding: 7bit\nContent-Disposition: attachment; filename=\"cloud-config.txt\"\n\n#cloud-config\ncloud_final_modules:\n- [scripts-user, always]\n\n--//\nContent-Type: text/x-shellscript; charset=\"us-ascii\"\nMIME-Version: 1.0\nContent-Transfer-Encoding: 7bit\nContent-Disposition: attachment; filename=\"userdata.txt\"\n\n#!/bin/bash\n# --- paste contents of userdata.sh here ---\n--//\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/aws/resume.py",
    "content": "# Resume all interrupted trainings in yolov5/ dir including DDP trainings\n# Usage: $ python utils/aws/resume.py\n\nimport os\nimport sys\nfrom pathlib import Path\n\nimport torch\nimport yaml\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[2]  # YOLOv5 root directory\nif str(ROOT) not in sys.path:\n    sys.path.append(str(ROOT))  # add ROOT to PATH\n\nport = 0  # --master_port\npath = Path('').resolve()\nfor last in path.rglob('*/**/last.pt'):\n    ckpt = torch.load(last)\n    if ckpt['optimizer'] is None:\n        continue\n\n    # Load opt.yaml\n    with open(last.parent.parent / 'opt.yaml', errors='ignore') as f:\n        opt = yaml.safe_load(f)\n\n    # Get device count\n    d = opt['device'].split(',')  # devices\n    nd = len(d)  # number of devices\n    ddp = nd > 1 or (nd == 0 and torch.cuda.device_count() > 1)  # distributed data parallel\n\n    if ddp:  # multi-GPU\n        port += 1\n        cmd = f'python -m torch.distributed.run --nproc_per_node {nd} --master_port {port} train.py --resume {last}'\n    else:  # single-GPU\n        cmd = f'python train.py --resume {last}'\n\n    cmd += ' > /dev/null 2>&1 &'  # redirect output to /dev/null and run as a background process\n    print(cmd)\n    os.system(cmd)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/aws/userdata.sh",
    "content": "#!/bin/bash\n# AWS EC2 instance startup script https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html\n# This script will run only once on first instance start (for a re-start script see mime.sh)\n# /home/ubuntu (ubuntu) or /home/ec2-user (amazon-linux) is working dir\n# Use >300 GB SSD\n\ncd /home/ubuntu\nif [ ! -d yolov5 ]; then\n  echo \"Running first-time script.\" # install dependencies, download COCO, pull Docker\n  git clone https://github.com/ultralytics/yolov5 -b master && sudo chmod -R 777 yolov5\n  cd yolov5\n  bash data/scripts/get_coco.sh && echo \"COCO done.\" &\n  sudo docker pull ultralytics/yolov5:latest && echo \"Docker done.\" &\n  python -m pip install --upgrade pip && pip install -r requirements.txt && python detect.py && echo \"Requirements done.\" &\n  wait && echo \"All tasks done.\" # finish background tasks\nelse\n  echo \"Running re-start script.\" # resume interrupted runs\n  i=0\n  list=$(sudo docker ps -qa) # container list i.e. $'one\\ntwo\\nthree\\nfour'\n  while IFS= read -r id; do\n    ((i++))\n    echo \"restarting container $i: $id\"\n    sudo docker start \"$id\"\n    # sudo docker exec -it $id python train.py --resume # single-GPU\n    sudo docker exec -d \"$id\" python utils/aws/resume.py # multi-scenario\n  done <<<\"$list\"\nfi\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/callbacks.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nCallback utils\n\"\"\"\n\nimport threading\n\n\nclass Callbacks:\n    \"\"\"\n    Handles all registered callbacks for YOLOv5 Hooks\n    \"\"\"\n\n    def __init__(self):\n        # Define the available callbacks\n        self._callbacks = {\n            'on_pretrain_routine_start': [],\n            'on_pretrain_routine_end': [],\n            'on_train_start': [],\n            'on_train_epoch_start': [],\n            'on_train_batch_start': [],\n            'optimizer_step': [],\n            'on_before_zero_grad': [],\n            'on_train_batch_end': [],\n            'on_train_epoch_end': [],\n            'on_val_start': [],\n            'on_val_batch_start': [],\n            'on_val_image_end': [],\n            'on_val_batch_end': [],\n            'on_val_end': [],\n            'on_fit_epoch_end': [],  # fit = train + val\n            'on_model_save': [],\n            'on_train_end': [],\n            'on_params_update': [],\n            'teardown': [],}\n        self.stop_training = False  # set True to interrupt training\n\n    def register_action(self, hook, name='', callback=None):\n        \"\"\"\n        Register a new action to a callback hook\n\n        Args:\n            hook: The callback hook name to register the action to\n            name: The name of the action for later reference\n            callback: The callback to fire\n        \"\"\"\n        assert hook in self._callbacks, f\"hook '{hook}' not found in callbacks {self._callbacks}\"\n        assert callable(callback), f\"callback '{callback}' is not callable\"\n        self._callbacks[hook].append({'name': name, 'callback': callback})\n\n    def get_registered_actions(self, hook=None):\n        \"\"\"\n        Returns all the registered actions by callback hook\n\n        Args:\n            hook: The name of the hook to check, defaults to all\n        \"\"\"\n        return self._callbacks[hook] if hook else 
self._callbacks\n\n    def run(self, hook, *args, thread=False, **kwargs):\n        \"\"\"\n        Loop through the registered actions and fire all callbacks, on the main thread by default\n\n        Args:\n            hook: The name of the hook whose registered actions are run (required)\n            args: Arguments to receive from YOLOv5\n            thread: (boolean) Run each callback in its own daemon thread instead of the main thread\n            kwargs: Keyword Arguments to receive from YOLOv5\n        \"\"\"\n\n        assert hook in self._callbacks, f\"hook '{hook}' not found in callbacks {self._callbacks}\"\n        for logger in self._callbacks[hook]:\n            if thread:\n                threading.Thread(target=logger['callback'], args=args, kwargs=kwargs, daemon=True).start()\n            else:\n                logger['callback'](*args, **kwargs)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/dataloaders.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nDataloaders and dataset utils\n\"\"\"\n\nimport contextlib\nimport glob\nimport hashlib\nimport json\nimport math\nimport os\nimport random\nimport shutil\nimport time\nfrom itertools import repeat\nfrom multiprocessing.pool import Pool, ThreadPool\nfrom pathlib import Path\nfrom threading import Thread\nfrom urllib.parse import urlparse\n\nimport numpy as np\nimport psutil\nimport torch\nimport torch.nn.functional as F\nimport torchvision\nimport yaml\nfrom PIL import ExifTags, Image, ImageOps\nfrom torch.utils.data import DataLoader, Dataset, dataloader, distributed\nfrom tqdm import tqdm\n\nfrom utils.augmentations import (Albumentations, augment_hsv, classify_albumentations, classify_transforms, copy_paste,\n                                 letterbox, mixup, random_perspective)\nfrom utils.general import (DATASETS_DIR, LOGGER, NUM_THREADS, TQDM_BAR_FORMAT, check_dataset, check_requirements,\n                           check_yaml, clean_str, cv2, is_colab, is_kaggle, segments2boxes, unzip_file, xyn2xy,\n                           xywh2xyxy, xywhn2xyxy, xyxy2xywhn)\nfrom utils.torch_utils import torch_distributed_zero_first\n\n# Parameters\nHELP_URL = 'See https://github.com/ultralytics/yolov5/wiki/Train-Custom-Data'\nIMG_FORMATS = 'bmp', 'dng', 'jpeg', 'jpg', 'mpo', 'png', 'tif', 'tiff', 'webp', 'pfm'  # include image suffixes\nVID_FORMATS = 'asf', 'avi', 'gif', 'm4v', 'mkv', 'mov', 'mp4', 'mpeg', 'mpg', 'ts', 'wmv'  # include video suffixes\nLOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html\nRANK = int(os.getenv('RANK', -1))\nPIN_MEMORY = str(os.getenv('PIN_MEMORY', True)).lower() == 'true'  # global pin_memory for dataloaders\n\n# Get orientation exif tag\nfor orientation in ExifTags.TAGS.keys():\n    if ExifTags.TAGS[orientation] == 'Orientation':\n        break\n\n\ndef get_hash(paths):\n    # Returns a single hash value of a list of paths 
(files or dirs)\n    size = sum(os.path.getsize(p) for p in paths if os.path.exists(p))  # sizes\n    h = hashlib.sha256(str(size).encode())  # hash sizes\n    h.update(''.join(paths).encode())  # hash paths\n    return h.hexdigest()  # return hash\n\n\ndef exif_size(img):\n    # Returns exif-corrected PIL size\n    s = img.size  # (width, height)\n    with contextlib.suppress(Exception):\n        rotation = dict(img._getexif().items())[orientation]\n        if rotation in [6, 8]:  # rotation 270 or 90\n            s = (s[1], s[0])\n    return s\n\n\ndef exif_transpose(image):\n    \"\"\"\n    Transpose a PIL image accordingly if it has an EXIF Orientation tag.\n    Inplace version of https://github.com/python-pillow/Pillow/blob/master/src/PIL/ImageOps.py exif_transpose()\n\n    :param image: The image to transpose.\n    :return: An image.\n    \"\"\"\n    exif = image.getexif()\n    orientation = exif.get(0x0112, 1)  # default 1\n    if orientation > 1:\n        method = {\n            2: Image.FLIP_LEFT_RIGHT,\n            3: Image.ROTATE_180,\n            4: Image.FLIP_TOP_BOTTOM,\n            5: Image.TRANSPOSE,\n            6: Image.ROTATE_270,\n            7: Image.TRANSVERSE,\n            8: Image.ROTATE_90}.get(orientation)\n        if method is not None:\n            image = image.transpose(method)\n            del exif[0x0112]\n            image.info['exif'] = exif.tobytes()\n    return image\n\n\ndef seed_worker(worker_id):\n    # Set dataloader worker seed https://pytorch.org/docs/stable/notes/randomness.html#dataloader\n    worker_seed = torch.initial_seed() % 2 ** 32\n    np.random.seed(worker_seed)\n    random.seed(worker_seed)\n\n\ndef create_dataloader(path,\n                      imgsz,\n                      batch_size,\n                      stride,\n                      single_cls=False,\n                      hyp=None,\n                      augment=False,\n                      cache=False,\n                      pad=0.0,\n                   
   rect=False,\n                      rank=-1,\n                      workers=8,\n                      image_weights=False,\n                      quad=False,\n                      prefix='',\n                      shuffle=False,\n                      seed=0):\n    if rect and shuffle:\n        LOGGER.warning('WARNING ⚠️ --rect is incompatible with DataLoader shuffle, setting shuffle=False')\n        shuffle = False\n    with torch_distributed_zero_first(rank):  # init dataset *.cache only once if DDP\n        dataset = LoadImagesAndLabels(\n            path,\n            imgsz,\n            batch_size,\n            augment=augment,  # augmentation\n            hyp=hyp,  # hyperparameters\n            rect=rect,  # rectangular batches\n            cache_images=cache,\n            single_cls=single_cls,\n            stride=int(stride),\n            pad=pad,\n            image_weights=image_weights,\n            prefix=prefix)\n\n    batch_size = min(batch_size, len(dataset))\n    nd = torch.cuda.device_count()  # number of CUDA devices\n    nw = min([os.cpu_count() // max(nd, 1), batch_size if batch_size > 1 else 0, workers])  # number of workers\n    sampler = None if rank == -1 else distributed.DistributedSampler(dataset, shuffle=shuffle)\n    loader = DataLoader if image_weights else InfiniteDataLoader  # only DataLoader allows for attribute updates\n    generator = torch.Generator()\n    generator.manual_seed(6148914691236517205 + seed + RANK)\n    return loader(dataset,\n                  batch_size=batch_size,\n                  shuffle=shuffle and sampler is None,\n                  num_workers=nw,\n                  sampler=sampler,\n                  pin_memory=PIN_MEMORY,\n                  collate_fn=LoadImagesAndLabels.collate_fn4 if quad else LoadImagesAndLabels.collate_fn,\n                  worker_init_fn=seed_worker,\n                  generator=generator), dataset\n\n\nclass InfiniteDataLoader(dataloader.DataLoader):\n    \"\"\" Dataloader that 
reuses workers\n\n    Uses same syntax as vanilla DataLoader\n    \"\"\"\n\n    def __init__(self, *args, **kwargs):\n        super().__init__(*args, **kwargs)\n        object.__setattr__(self, 'batch_sampler', _RepeatSampler(self.batch_sampler))\n        self.iterator = super().__iter__()\n\n    def __len__(self):\n        return len(self.batch_sampler.sampler)\n\n    def __iter__(self):\n        for _ in range(len(self)):\n            yield next(self.iterator)\n\n\nclass _RepeatSampler:\n    \"\"\" Sampler that repeats forever\n\n    Args:\n        sampler (Sampler)\n    \"\"\"\n\n    def __init__(self, sampler):\n        self.sampler = sampler\n\n    def __iter__(self):\n        while True:\n            yield from iter(self.sampler)\n\n\nclass LoadScreenshots:\n    # YOLOv5 screenshot dataloader, i.e. `python detect.py --source \"screen 0 100 100 512 256\"`\n    def __init__(self, source, img_size=640, stride=32, auto=True, transforms=None):\n        # source = [screen_number left top width height] (pixels)\n        check_requirements('mss')\n        import mss\n\n        source, *params = source.split()\n        self.screen, left, top, width, height = 0, None, None, None, None  # default to full screen 0\n        if len(params) == 1:\n            self.screen = int(params[0])\n        elif len(params) == 4:\n            left, top, width, height = (int(x) for x in params)\n        elif len(params) == 5:\n            self.screen, left, top, width, height = (int(x) for x in params)\n        self.img_size = img_size\n        self.stride = stride\n        self.transforms = transforms\n        self.auto = auto\n        self.mode = 'stream'\n        self.frame = 0\n        self.sct = mss.mss()\n\n        # Parse monitor shape\n        monitor = self.sct.monitors[self.screen]\n        self.top = monitor['top'] if top is None else (monitor['top'] + top)\n        self.left = monitor['left'] if left is None else (monitor['left'] + left)\n        self.width = width or 
monitor['width']\n        self.height = height or monitor['height']\n        self.monitor = {'left': self.left, 'top': self.top, 'width': self.width, 'height': self.height}\n\n    def __iter__(self):\n        return self\n\n    def __next__(self):\n        # mss screen capture: get raw pixels from the screen as np array\n        im0 = np.array(self.sct.grab(self.monitor))[:, :, :3]  # [:, :, :3] BGRA to BGR\n        s = f'screen {self.screen} (LTWH): {self.left},{self.top},{self.width},{self.height}: '\n\n        if self.transforms:\n            im = self.transforms(im0)  # transforms\n        else:\n            im = letterbox(im0, self.img_size, stride=self.stride, auto=self.auto)[0]  # padded resize\n            im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB\n            im = np.ascontiguousarray(im)  # contiguous\n        self.frame += 1\n        return str(self.screen), im, im0, None, s  # screen, img, original img, im0s, s\n\n\nclass LoadImages:\n    # YOLOv5 image/video dataloader, i.e. 
`python detect.py --source image.jpg/vid.mp4`\n    def __init__(self, path, img_size=640, stride=32, auto=True, transforms=None, vid_stride=1):\n        if isinstance(path, str) and Path(path).suffix == '.txt':  # *.txt file with img/vid/dir on each line\n            path = Path(path).read_text().rsplit()\n        files = []\n        for p in sorted(path) if isinstance(path, (list, tuple)) else [path]:\n            p = str(Path(p).resolve())\n            if '*' in p:\n                files.extend(sorted(glob.glob(p, recursive=True)))  # glob\n            elif os.path.isdir(p):\n                files.extend(sorted(glob.glob(os.path.join(p, '*.*'))))  # dir\n            elif os.path.isfile(p):\n                files.append(p)  # files\n            else:\n                raise FileNotFoundError(f'{p} does not exist')\n\n        images = [x for x in files if x.split('.')[-1].lower() in IMG_FORMATS]\n        videos = [x for x in files if x.split('.')[-1].lower() in VID_FORMATS]\n        ni, nv = len(images), len(videos)\n\n        self.img_size = img_size\n        self.stride = stride\n        self.files = images + videos\n        self.nf = ni + nv  # number of files\n        self.video_flag = [False] * ni + [True] * nv\n        self.mode = 'image'\n        self.auto = auto\n        self.transforms = transforms  # optional\n        self.vid_stride = vid_stride  # video frame-rate stride\n        if any(videos):\n            self._new_video(videos[0])  # new video\n        else:\n            self.cap = None\n        assert self.nf > 0, f'No images or videos found in {p}. 
' \\\n                            f'Supported formats are:\\nimages: {IMG_FORMATS}\\nvideos: {VID_FORMATS}'\n\n    def __iter__(self):\n        self.count = 0\n        return self\n\n    def __next__(self):\n        if self.count == self.nf:\n            raise StopIteration\n        path = self.files[self.count]\n\n        if self.video_flag[self.count]:\n            # Read video\n            self.mode = 'video'\n            for _ in range(self.vid_stride):\n                self.cap.grab()\n            ret_val, im0 = self.cap.retrieve()\n            while not ret_val:\n                self.count += 1\n                self.cap.release()\n                if self.count == self.nf:  # last video\n                    raise StopIteration\n                path = self.files[self.count]\n                self._new_video(path)\n                ret_val, im0 = self.cap.read()\n\n            self.frame += 1\n            # im0 = self._cv2_rotate(im0)  # for use if cv2 autorotation is False\n            s = f'video {self.count + 1}/{self.nf} ({self.frame}/{self.frames}) {path}: '\n\n        else:\n            # Read image\n            self.count += 1\n            im0 = cv2.imread(path)  # BGR\n            assert im0 is not None, f'Image Not Found {path}'\n            s = f'image {self.count}/{self.nf} {path}: '\n\n        if self.transforms:\n            im = self.transforms(im0)  # transforms\n        else:\n            im = letterbox(im0, self.img_size, stride=self.stride, auto=self.auto)[0]  # padded resize\n            im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB\n            im = np.ascontiguousarray(im)  # contiguous\n\n        return path, im, im0, self.cap, s\n\n    def _new_video(self, path):\n        # Create a new video capture object\n        self.frame = 0\n        self.cap = cv2.VideoCapture(path)\n        self.frames = int(self.cap.get(cv2.CAP_PROP_FRAME_COUNT) / self.vid_stride)\n        self.orientation = 
int(self.cap.get(cv2.CAP_PROP_ORIENTATION_META))  # rotation degrees\n        # self.cap.set(cv2.CAP_PROP_ORIENTATION_AUTO, 0)  # disable https://github.com/ultralytics/yolov5/issues/8493\n\n    def _cv2_rotate(self, im):\n        # Rotate a cv2 video manually\n        if self.orientation == 0:\n            return cv2.rotate(im, cv2.ROTATE_90_CLOCKWISE)\n        elif self.orientation == 180:\n            return cv2.rotate(im, cv2.ROTATE_90_COUNTERCLOCKWISE)\n        elif self.orientation == 90:\n            return cv2.rotate(im, cv2.ROTATE_180)\n        return im\n\n    def __len__(self):\n        return self.nf  # number of files\n\n\nclass LoadStreams:\n    # YOLOv5 streamloader, i.e. `python detect.py --source 'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP streams`\n    def __init__(self, sources='file.streams', img_size=640, stride=32, auto=True, transforms=None, vid_stride=1):\n        torch.backends.cudnn.benchmark = True  # faster for fixed-size inference\n        self.mode = 'stream'\n        self.img_size = img_size\n        self.stride = stride\n        self.vid_stride = vid_stride  # video frame-rate stride\n        sources = Path(sources).read_text().rsplit() if os.path.isfile(sources) else [sources]\n        n = len(sources)\n        self.sources = [clean_str(x) for x in sources]  # clean source names for later\n        self.imgs, self.fps, self.frames, self.threads = [None] * n, [0] * n, [0] * n, [None] * n\n        for i, s in enumerate(sources):  # index, source\n            # Start thread to read frames from video stream\n            st = f'{i + 1}/{n}: {s}... '\n            if urlparse(s).hostname in ('www.youtube.com', 'youtube.com', 'youtu.be'):  # if source is YouTube video\n                # YouTube format i.e. 
'https://www.youtube.com/watch?v=Zgi9g1ksQHc' or 'https://youtu.be/Zgi9g1ksQHc'\n                check_requirements(('pafy', 'youtube_dl==2020.12.2'))\n                import pafy\n                s = pafy.new(s).getbest(preftype='mp4').url  # YouTube URL\n            s = eval(s) if s.isnumeric() else s  # i.e. s = '0' local webcam\n            if s == 0:\n                assert not is_colab(), '--source 0 webcam unsupported on Colab. Rerun command in a local environment.'\n                assert not is_kaggle(), '--source 0 webcam unsupported on Kaggle. Rerun command in a local environment.'\n            cap = cv2.VideoCapture(s)\n            assert cap.isOpened(), f'{st}Failed to open {s}'\n            w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))\n            h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))\n            fps = cap.get(cv2.CAP_PROP_FPS)  # warning: may return 0 or nan\n            self.frames[i] = max(int(cap.get(cv2.CAP_PROP_FRAME_COUNT)), 0) or float('inf')  # infinite stream fallback\n            self.fps[i] = max((fps if math.isfinite(fps) else 0) % 100, 0) or 30  # 30 FPS fallback\n\n            _, self.imgs[i] = cap.read()  # guarantee first frame\n            self.threads[i] = Thread(target=self.update, args=([i, cap, s]), daemon=True)\n            LOGGER.info(f'{st} Success ({self.frames[i]} frames {w}x{h} at {self.fps[i]:.2f} FPS)')\n            self.threads[i].start()\n        LOGGER.info('')  # newline\n\n        # check for common shapes\n        s = np.stack([letterbox(x, img_size, stride=stride, auto=auto)[0].shape for x in self.imgs])\n        self.rect = np.unique(s, axis=0).shape[0] == 1  # rect inference if all shapes equal\n        self.auto = auto and self.rect\n        self.transforms = transforms  # optional\n        if not self.rect:\n            LOGGER.warning('WARNING ⚠️ Stream shapes differ. 
For optimal performance supply similarly-shaped streams.')\n\n    def update(self, i, cap, stream):\n        # Read stream `i` frames in daemon thread\n        n, f = 0, self.frames[i]  # frame number, frame array\n        while cap.isOpened() and n < f:\n            n += 1\n            cap.grab()  # .read() = .grab() followed by .retrieve()\n            if n % self.vid_stride == 0:\n                success, im = cap.retrieve()\n                if success:\n                    self.imgs[i] = im\n                else:\n                    LOGGER.warning('WARNING ⚠️ Video stream unresponsive, please check your IP camera connection.')\n                    self.imgs[i] = np.zeros_like(self.imgs[i])\n                    cap.open(stream)  # re-open stream if signal was lost\n            time.sleep(0.0)  # wait time\n\n    def __iter__(self):\n        self.count = -1\n        return self\n\n    def __next__(self):\n        self.count += 1\n        if not all(x.is_alive() for x in self.threads) or cv2.waitKey(1) == ord('q'):  # q to quit\n            cv2.destroyAllWindows()\n            raise StopIteration\n\n        im0 = self.imgs.copy()\n        if self.transforms:\n            im = np.stack([self.transforms(x) for x in im0])  # transforms\n        else:\n            im = np.stack([letterbox(x, self.img_size, stride=self.stride, auto=self.auto)[0] for x in im0])  # resize\n            im = im[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW\n            im = np.ascontiguousarray(im)  # contiguous\n\n        return self.sources, im, im0, None, ''\n\n    def __len__(self):\n        return len(self.sources)  # 1E12 frames = 32 streams at 30 FPS for 30 years\n\n\ndef img2label_paths(img_paths):\n    # Define label paths as a function of image paths\n    sa, sb = f'{os.sep}images{os.sep}', f'{os.sep}labels{os.sep}'  # /images/, /labels/ substrings\n    return [sb.join(x.rsplit(sa, 1)).rsplit('.', 1)[0] + '.txt' for x in img_paths]\n\n\nclass 
LoadImagesAndLabels(Dataset):\n    # YOLOv5 train_loader/val_loader, loads images and labels for training and validation\n    cache_version = 0.6  # dataset labels *.cache version\n    rand_interp_methods = [cv2.INTER_NEAREST, cv2.INTER_LINEAR, cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LANCZOS4]\n\n    def __init__(self,\n                 path,\n                 img_size=640,\n                 batch_size=16,\n                 augment=False,\n                 hyp=None,\n                 rect=False,\n                 image_weights=False,\n                 cache_images=False,\n                 single_cls=False,\n                 stride=32,\n                 pad=0.0,\n                 min_items=0,\n                 prefix=''):\n        self.img_size = img_size\n        self.augment = augment\n        self.hyp = hyp\n        self.image_weights = image_weights\n        self.rect = False if image_weights else rect\n        self.mosaic = self.augment and not self.rect  # load 4 images at a time into a mosaic (only during training)\n        self.mosaic_border = [-img_size // 2, -img_size // 2]\n        self.stride = stride\n        self.path = path\n        self.albumentations = Albumentations(size=img_size) if augment else None\n\n        try:\n            f = []  # image files\n            for p in path if isinstance(path, list) else [path]:\n                p = Path(p)  # os-agnostic\n                if p.is_dir():  # dir\n                    f += glob.glob(str(p / '**' / '*.*'), recursive=True)\n                    # f = list(p.rglob('*.*'))  # pathlib\n                elif p.is_file():  # file\n                    with open(p) as t:\n                        t = t.read().strip().splitlines()\n                        parent = str(p.parent) + os.sep\n                        f += [x.replace('./', parent, 1) if x.startswith('./') else x for x in t]  # to global path\n                        # f += [p.parent / x.lstrip(os.sep) for x in t]  # to global path (pathlib)\n       
         else:\n                    raise FileNotFoundError(f'{prefix}{p} does not exist')\n            self.im_files = sorted(x.replace('/', os.sep) for x in f if x.split('.')[-1].lower() in IMG_FORMATS)\n            # self.img_files = sorted([x for x in f if x.suffix[1:].lower() in IMG_FORMATS])  # pathlib\n            assert self.im_files, f'{prefix}No images found'\n        except Exception as e:\n            raise Exception(f'{prefix}Error loading data from {path}: {e}\\n{HELP_URL}') from e\n\n        # Check cache\n        self.label_files = img2label_paths(self.im_files)  # labels\n        cache_path = (p if p.is_file() else Path(self.label_files[0]).parent).with_suffix('.cache')\n        try:\n            cache, exists = np.load(cache_path, allow_pickle=True).item(), True  # load dict\n            assert cache['version'] == self.cache_version  # matches current version\n            assert cache['hash'] == get_hash(self.label_files + self.im_files)  # identical hash\n        except Exception:\n            cache, exists = self.cache_labels(cache_path, prefix), False  # run cache ops\n\n        # Display cache\n        nf, nm, ne, nc, n = cache.pop('results')  # found, missing, empty, corrupt, total\n        if exists and LOCAL_RANK in {-1, 0}:\n            d = f'Scanning {cache_path}... {nf} images, {nm + ne} backgrounds, {nc} corrupt'\n            tqdm(None, desc=prefix + d, total=n, initial=n, bar_format=TQDM_BAR_FORMAT)  # display cache results\n            if cache['msgs']:\n                LOGGER.info('\\n'.join(cache['msgs']))  # display warnings\n        assert nf > 0 or not augment, f'{prefix}No labels found in {cache_path}, can not start training. 
{HELP_URL}'\n\n        # Read cache\n        [cache.pop(k) for k in ('hash', 'version', 'msgs')]  # remove items\n        labels, shapes, self.segments = zip(*cache.values())\n        nl = len(np.concatenate(labels, 0))  # number of labels\n        assert nl > 0 or not augment, f'{prefix}All labels empty in {cache_path}, can not start training. {HELP_URL}'\n        self.labels = list(labels)\n        self.shapes = np.array(shapes)\n        self.im_files = list(cache.keys())  # update\n        self.label_files = img2label_paths(cache.keys())  # update\n\n        # Filter images\n        if min_items:\n            include = np.array([len(x) >= min_items for x in self.labels]).nonzero()[0].astype(int)\n            LOGGER.info(f'{prefix}{n - len(include)}/{n} images filtered from dataset')\n            self.im_files = [self.im_files[i] for i in include]\n            self.label_files = [self.label_files[i] for i in include]\n            self.labels = [self.labels[i] for i in include]\n            self.segments = [self.segments[i] for i in include]\n            self.shapes = self.shapes[include]  # wh\n\n        # Create indices\n        n = len(self.shapes)  # number of images\n        bi = np.floor(np.arange(n) / batch_size).astype(int)  # batch index\n        nb = bi[-1] + 1  # number of batches\n        self.batch = bi  # batch index of image\n        self.n = n\n        self.indices = range(n)\n\n        # Update labels\n        include_class = []  # filter labels to include only these classes (optional)\n        include_class_array = np.array(include_class).reshape(1, -1)\n        for i, (label, segment) in enumerate(zip(self.labels, self.segments)):\n            if include_class:\n                j = (label[:, 0:1] == include_class_array).any(1)\n                self.labels[i] = label[j]\n                if segment:\n                    self.segments[i] = segment[j]\n            if single_cls:  # single-class training, merge all classes into 0\n                
self.labels[i][:, 0] = 0\n\n        # Rectangular Training\n        if self.rect:\n            # Sort by aspect ratio\n            s = self.shapes  # wh\n            ar = s[:, 1] / s[:, 0]  # aspect ratio\n            irect = ar.argsort()\n            self.im_files = [self.im_files[i] for i in irect]\n            self.label_files = [self.label_files[i] for i in irect]\n            self.labels = [self.labels[i] for i in irect]\n            self.segments = [self.segments[i] for i in irect]\n            self.shapes = s[irect]  # wh\n            ar = ar[irect]\n\n            # Set training image shapes\n            shapes = [[1, 1]] * nb\n            for i in range(nb):\n                ari = ar[bi == i]\n                mini, maxi = ari.min(), ari.max()\n                if maxi < 1:\n                    shapes[i] = [maxi, 1]\n                elif mini > 1:\n                    shapes[i] = [1, 1 / mini]\n\n            self.batch_shapes = np.ceil(np.array(shapes) * img_size / stride + pad).astype(int) * stride\n\n        # Cache images into RAM/disk for faster training\n        if cache_images == 'ram' and not self.check_cache_ram(prefix=prefix):\n            cache_images = False\n        self.ims = [None] * n\n        self.npy_files = [Path(f).with_suffix('.npy') for f in self.im_files]\n        if cache_images:\n            b, gb = 0, 1 << 30  # bytes of cached images, bytes per gigabytes\n            self.im_hw0, self.im_hw = [None] * n, [None] * n\n            fcn = self.cache_images_to_disk if cache_images == 'disk' else self.load_image\n            results = ThreadPool(NUM_THREADS).imap(fcn, range(n))\n            pbar = tqdm(enumerate(results), total=n, bar_format=TQDM_BAR_FORMAT, disable=LOCAL_RANK > 0)\n            for i, x in pbar:\n                if cache_images == 'disk':\n                    b += self.npy_files[i].stat().st_size\n                else:  # 'ram'\n                    self.ims[i], self.im_hw0[i], self.im_hw[i] = x  # im, hw_orig, hw_resized = 
load_image(self, i)\n                    b += self.ims[i].nbytes\n                pbar.desc = f'{prefix}Caching images ({b / gb:.1f}GB {cache_images})'\n            pbar.close()\n\n    def check_cache_ram(self, safety_margin=0.1, prefix=''):\n        # Check image caching requirements vs available memory\n        b, gb = 0, 1 << 30  # bytes of cached images, bytes per gigabyte\n        n = min(self.n, 30)  # extrapolate from 30 random images\n        for _ in range(n):\n            im = cv2.imread(random.choice(self.im_files))  # sample image\n            ratio = self.img_size / max(im.shape[0], im.shape[1])  # max(h, w)  # ratio\n            b += im.nbytes * ratio ** 2\n        mem_required = b * self.n / n  # bytes required to cache dataset into RAM\n        mem = psutil.virtual_memory()\n        cache = mem_required * (1 + safety_margin) < mem.available  # to cache or not to cache, that is the question\n        if not cache:\n            LOGGER.info(f'{prefix}{mem_required / gb:.1f}GB RAM required, '\n                        f'{mem.available / gb:.1f}/{mem.total / gb:.1f}GB available, '\n                        f\"{'caching images ✅' if cache else 'not caching images ⚠️'}\")\n        return cache\n\n    def cache_labels(self, path=Path('./labels.cache'), prefix=''):\n        # Cache dataset labels, check images and read shapes\n        x = {}  # dict\n        nm, nf, ne, nc, msgs = 0, 0, 0, 0, []  # number missing, found, empty, corrupt, messages\n        desc = f'{prefix}Scanning {path.parent / path.stem}...'\n        with Pool(NUM_THREADS) as pool:\n            pbar = tqdm(pool.imap(verify_image_label, zip(self.im_files, self.label_files, repeat(prefix))),\n                        desc=desc,\n                        total=len(self.im_files),\n                        bar_format=TQDM_BAR_FORMAT)\n            for im_file, lb, shape, segments, nm_f, nf_f, ne_f, nc_f, msg in pbar:\n                nm += nm_f\n                nf += nf_f\n                ne += ne_f\n   
             nc += nc_f\n                if im_file:\n                    x[im_file] = [lb, shape, segments]\n                if msg:\n                    msgs.append(msg)\n                pbar.desc = f'{desc} {nf} images, {nm + ne} backgrounds, {nc} corrupt'\n\n        pbar.close()\n        if msgs:\n            LOGGER.info('\\n'.join(msgs))\n        if nf == 0:\n            LOGGER.warning(f'{prefix}WARNING ⚠️ No labels found in {path}. {HELP_URL}')\n        x['hash'] = get_hash(self.label_files + self.im_files)\n        x['results'] = nf, nm, ne, nc, len(self.im_files)\n        x['msgs'] = msgs  # warnings\n        x['version'] = self.cache_version  # cache version\n        try:\n            np.save(path, x)  # save cache for next time\n            path.with_suffix('.cache.npy').rename(path)  # remove .npy suffix\n            LOGGER.info(f'{prefix}New cache created: {path}')\n        except Exception as e:\n            LOGGER.warning(f'{prefix}WARNING ⚠️ Cache directory {path.parent} is not writeable: {e}')  # not writeable\n        return x\n\n    def __len__(self):\n        return len(self.im_files)\n\n    # def __iter__(self):\n    #     self.count = -1\n    #     print('ran dataset iter')\n    #     #self.shuffled_vector = np.random.permutation(self.nF) if self.augment else np.arange(self.nF)\n    #     return self\n\n    def __getitem__(self, index):\n        index = self.indices[index]  # linear, shuffled, or image_weights\n\n        hyp = self.hyp\n        mosaic = self.mosaic and random.random() < hyp['mosaic']\n        if mosaic:\n            # Load mosaic\n            img, labels = self.load_mosaic(index)\n            shapes = None\n\n            # MixUp augmentation\n            if random.random() < hyp['mixup']:\n                img, labels = mixup(img, labels, *self.load_mosaic(random.randint(0, self.n - 1)))\n\n        else:\n            # Load image\n            img, (h0, w0), (h, w) = self.load_image(index)\n\n            # Letterbox\n            
shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size  # final letterboxed shape\n            img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment)\n            shapes = (h0, w0), ((h / h0, w / w0), pad)  # for COCO mAP rescaling\n\n            labels = self.labels[index].copy()\n            if labels.size:  # normalized xywh to pixel xyxy format\n                labels[:, 1:] = xywhn2xyxy(labels[:, 1:], ratio[0] * w, ratio[1] * h, padw=pad[0], padh=pad[1])\n\n            if self.augment:\n                img, labels = random_perspective(img,\n                                                 labels,\n                                                 degrees=hyp['degrees'],\n                                                 translate=hyp['translate'],\n                                                 scale=hyp['scale'],\n                                                 shear=hyp['shear'],\n                                                 perspective=hyp['perspective'])\n\n        nl = len(labels)  # number of labels\n        if nl:\n            labels[:, 1:5] = xyxy2xywhn(labels[:, 1:5], w=img.shape[1], h=img.shape[0], clip=True, eps=1E-3)\n\n        if self.augment:\n            # Albumentations\n            img, labels = self.albumentations(img, labels)\n            nl = len(labels)  # update after albumentations\n\n            # HSV color-space\n            augment_hsv(img, hgain=hyp['hsv_h'], sgain=hyp['hsv_s'], vgain=hyp['hsv_v'])\n\n            # Flip up-down\n            if random.random() < hyp['flipud']:\n                img = np.flipud(img)\n                if nl:\n                    labels[:, 2] = 1 - labels[:, 2]\n\n            # Flip left-right\n            if random.random() < hyp['fliplr']:\n                img = np.fliplr(img)\n                if nl:\n                    labels[:, 1] = 1 - labels[:, 1]\n\n            # Cutouts\n            # labels = cutout(img, labels, p=0.5)\n            # nl = len(labels)  
# update after cutout\n\n        labels_out = torch.zeros((nl, 6))\n        if nl:\n            labels_out[:, 1:] = torch.from_numpy(labels)\n\n        # Convert\n        img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB\n        img = np.ascontiguousarray(img)\n\n        return torch.from_numpy(img), labels_out, self.im_files[index], shapes\n\n    def load_image(self, i):\n        # Loads 1 image from dataset index 'i', returns (im, original hw, resized hw)\n        im, f, fn = self.ims[i], self.im_files[i], self.npy_files[i],\n        if im is None:  # not cached in RAM\n            if fn.exists():  # load npy\n                im = np.load(fn)\n            else:  # read image\n                im = cv2.imread(f)  # BGR\n                assert im is not None, f'Image Not Found {f}'\n            h0, w0 = im.shape[:2]  # orig hw\n            r = self.img_size / max(h0, w0)  # ratio\n            if r != 1:  # if sizes are not equal\n                interp = cv2.INTER_LINEAR if (self.augment or r > 1) else cv2.INTER_AREA\n                im = cv2.resize(im, (math.ceil(w0 * r), math.ceil(h0 * r)), interpolation=interp)\n            return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized\n        return self.ims[i], self.im_hw0[i], self.im_hw[i]  # im, hw_original, hw_resized\n\n    def cache_images_to_disk(self, i):\n        # Saves an image as an *.npy file for faster loading\n        f = self.npy_files[i]\n        if not f.exists():\n            np.save(f.as_posix(), cv2.imread(self.im_files[i]))\n\n    def load_mosaic(self, index):\n        # YOLOv5 4-mosaic loader. 
Loads 1 image + 3 random images into a 4-image mosaic\n        labels4, segments4 = [], []\n        s = self.img_size\n        yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y\n        indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices\n        random.shuffle(indices)\n        for i, index in enumerate(indices):\n            # Load image\n            img, _, (h, w) = self.load_image(index)\n\n            # place img in img4\n            if i == 0:  # top left\n                img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles\n                x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)\n                x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)\n            elif i == 1:  # top right\n                x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc\n                x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h\n            elif i == 2:  # bottom left\n                x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)\n                x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)\n            elif i == 3:  # bottom right\n                x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)\n                x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)\n\n            img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]\n            padw = x1a - x1b\n            padh = y1a - y1b\n\n            # Labels\n            labels, segments = self.labels[index].copy(), self.segments[index].copy()\n            if labels.size:\n                labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format\n                segments = [xyn2xy(x, w, h, padw, padh) for x in segments]\n            
labels4.append(labels)\n            segments4.extend(segments)\n\n        # Concat/clip labels\n        labels4 = np.concatenate(labels4, 0)\n        for x in (labels4[:, 1:], *segments4):\n            np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()\n        # img4, labels4 = replicate(img4, labels4)  # replicate\n\n        # Augment\n        img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])\n        img4, labels4 = random_perspective(img4,\n                                           labels4,\n                                           segments4,\n                                           degrees=self.hyp['degrees'],\n                                           translate=self.hyp['translate'],\n                                           scale=self.hyp['scale'],\n                                           shear=self.hyp['shear'],\n                                           perspective=self.hyp['perspective'],\n                                           border=self.mosaic_border)  # border to remove\n\n        return img4, labels4\n\n    def load_mosaic9(self, index):\n        # YOLOv5 9-mosaic loader. 
Loads 1 image + 8 random images into a 9-image mosaic\n        labels9, segments9 = [], []\n        s = self.img_size\n        indices = [index] + random.choices(self.indices, k=8)  # 8 additional image indices\n        random.shuffle(indices)\n        hp, wp = -1, -1  # height, width previous\n        for i, index in enumerate(indices):\n            # Load image\n            img, _, (h, w) = self.load_image(index)\n\n            # place img in img9\n            if i == 0:  # center\n                img9 = np.full((s * 3, s * 3, img.shape[2]), 114, dtype=np.uint8)  # base image with 9 tiles\n                h0, w0 = h, w\n                c = s, s, s + w, s + h  # xmin, ymin, xmax, ymax (base) coordinates\n            elif i == 1:  # top\n                c = s, s - h, s + w, s\n            elif i == 2:  # top right\n                c = s + wp, s - h, s + wp + w, s\n            elif i == 3:  # right\n                c = s + w0, s, s + w0 + w, s + h\n            elif i == 4:  # bottom right\n                c = s + w0, s + hp, s + w0 + w, s + hp + h\n            elif i == 5:  # bottom\n                c = s + w0 - w, s + h0, s + w0, s + h0 + h\n            elif i == 6:  # bottom left\n                c = s + w0 - wp - w, s + h0, s + w0 - wp, s + h0 + h\n            elif i == 7:  # left\n                c = s - w, s + h0 - h, s, s + h0\n            elif i == 8:  # top left\n                c = s - w, s + h0 - hp - h, s, s + h0 - hp\n\n            padx, pady = c[:2]\n            x1, y1, x2, y2 = (max(x, 0) for x in c)  # allocate coords\n\n            # Labels\n            labels, segments = self.labels[index].copy(), self.segments[index].copy()\n            if labels.size:\n                labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padx, pady)  # normalized xywh to pixel xyxy format\n                segments = [xyn2xy(x, w, h, padx, pady) for x in segments]\n            labels9.append(labels)\n            segments9.extend(segments)\n\n            # Image\n         
   img9[y1:y2, x1:x2] = img[y1 - pady:, x1 - padx:]  # img9[ymin:ymax, xmin:xmax]\n            hp, wp = h, w  # height, width previous\n\n        # Offset\n        yc, xc = (int(random.uniform(0, s)) for _ in self.mosaic_border)  # mosaic center x, y\n        img9 = img9[yc:yc + 2 * s, xc:xc + 2 * s]\n\n        # Concat/clip labels\n        labels9 = np.concatenate(labels9, 0)\n        labels9[:, [1, 3]] -= xc\n        labels9[:, [2, 4]] -= yc\n        c = np.array([xc, yc])  # centers\n        segments9 = [x - c for x in segments9]\n\n        for x in (labels9[:, 1:], *segments9):\n            np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()\n        # img9, labels9 = replicate(img9, labels9)  # replicate\n\n        # Augment\n        img9, labels9, segments9 = copy_paste(img9, labels9, segments9, p=self.hyp['copy_paste'])\n        img9, labels9 = random_perspective(img9,\n                                           labels9,\n                                           segments9,\n                                           degrees=self.hyp['degrees'],\n                                           translate=self.hyp['translate'],\n                                           scale=self.hyp['scale'],\n                                           shear=self.hyp['shear'],\n                                           perspective=self.hyp['perspective'],\n                                           border=self.mosaic_border)  # border to remove\n\n        return img9, labels9\n\n    @staticmethod\n    def collate_fn(batch):\n        im, label, path, shapes = zip(*batch)  # transposed\n        for i, lb in enumerate(label):\n            lb[:, 0] = i  # add target image index for build_targets()\n        return torch.stack(im, 0), torch.cat(label, 0), path, shapes\n\n    @staticmethod\n    def collate_fn4(batch):\n        im, label, path, shapes = zip(*batch)  # transposed\n        n = len(shapes) // 4\n        im4, label4, path4, shapes4 = [], [], path[:n], 
shapes[:n]\n\n        ho = torch.tensor([[0.0, 0, 0, 1, 0, 0]])\n        wo = torch.tensor([[0.0, 0, 1, 0, 0, 0]])\n        s = torch.tensor([[1, 1, 0.5, 0.5, 0.5, 0.5]])  # scale\n        for i in range(n):  # zidane torch.zeros(16,3,720,1280)  # BCHW\n            i *= 4\n            if random.random() < 0.5:\n                im1 = F.interpolate(im[i].unsqueeze(0).float(), scale_factor=2.0, mode='bilinear',\n                                    align_corners=False)[0].type(im[i].type())\n                lb = label[i]\n            else:\n                im1 = torch.cat((torch.cat((im[i], im[i + 1]), 1), torch.cat((im[i + 2], im[i + 3]), 1)), 2)\n                lb = torch.cat((label[i], label[i + 1] + ho, label[i + 2] + wo, label[i + 3] + ho + wo), 0) * s\n            im4.append(im1)\n            label4.append(lb)\n\n        for i, lb in enumerate(label4):\n            lb[:, 0] = i  # add target image index for build_targets()\n\n        return torch.stack(im4, 0), torch.cat(label4, 0), path4, shapes4\n\n\n# Ancillary functions --------------------------------------------------------------------------------------------------\ndef flatten_recursive(path=DATASETS_DIR / 'coco128'):\n    # Flatten a recursive directory by bringing all files to top level\n    new_path = Path(f'{str(path)}_flat')\n    if os.path.exists(new_path):\n        shutil.rmtree(new_path)  # delete output folder\n    os.makedirs(new_path)  # make new output folder\n    for file in tqdm(glob.glob(f'{str(Path(path))}/**/*.*', recursive=True)):\n        shutil.copyfile(file, new_path / Path(file).name)\n\n\ndef extract_boxes(path=DATASETS_DIR / 'coco128'):  # from utils.dataloaders import *; extract_boxes()\n    # Convert detection dataset into classification dataset, with one directory per class\n    path = Path(path)  # images dir\n    shutil.rmtree(path / 'classification') if (path / 'classification').is_dir() else None  # remove existing\n    files = list(path.rglob('*.*'))\n    n = len(files)  # 
number of files\n    for im_file in tqdm(files, total=n):\n        if im_file.suffix[1:] in IMG_FORMATS:\n            # image\n            im = cv2.imread(str(im_file))[..., ::-1]  # BGR to RGB\n            h, w = im.shape[:2]\n\n            # labels\n            lb_file = Path(img2label_paths([str(im_file)])[0])\n            if Path(lb_file).exists():\n                with open(lb_file) as f:\n                    lb = np.array([x.split() for x in f.read().strip().splitlines()], dtype=np.float32)  # labels\n\n                for j, x in enumerate(lb):\n                    c = int(x[0])  # class\n                    f = (path / 'classification') / f'{c}' / f'{path.stem}_{im_file.stem}_{j}.jpg'  # new filename\n                    if not f.parent.is_dir():\n                        f.parent.mkdir(parents=True)\n\n                    b = x[1:] * [w, h, w, h]  # box\n                    # b[2:] = b[2:].max()  # rectangle to square\n                    b[2:] = b[2:] * 1.2 + 3  # pad\n                    b = xywh2xyxy(b.reshape(-1, 4)).ravel().astype(int)\n\n                    b[[0, 2]] = np.clip(b[[0, 2]], 0, w)  # clip boxes outside of image\n                    b[[1, 3]] = np.clip(b[[1, 3]], 0, h)\n                    assert cv2.imwrite(str(f), im[b[1]:b[3], b[0]:b[2]]), f'box failure in {f}'\n\n\ndef autosplit(path=DATASETS_DIR / 'coco128/images', weights=(0.9, 0.1, 0.0), annotated_only=False):\n    \"\"\" Autosplit a dataset into train/val/test splits and save path/autosplit_*.txt files\n    Usage: from utils.dataloaders import *; autosplit()\n    Arguments\n        path:            Path to images directory\n        weights:         Train, val, test weights (list, tuple)\n        annotated_only:  Only use images with an annotated txt file\n    \"\"\"\n    path = Path(path)  # images dir\n    files = sorted(x for x in path.rglob('*.*') if x.suffix[1:].lower() in IMG_FORMATS)  # image files only\n    n = len(files)  # number of files\n    random.seed(0)  # for 
reproducibility\n    indices = random.choices([0, 1, 2], weights=weights, k=n)  # assign each image to a split\n\n    txt = ['autosplit_train.txt', 'autosplit_val.txt', 'autosplit_test.txt']  # 3 txt files\n    for x in txt:\n        if (path.parent / x).exists():\n            (path.parent / x).unlink()  # remove existing\n\n    print(f'Autosplitting images from {path}' + ', using *.txt labeled images only' * annotated_only)\n    for i, img in tqdm(zip(indices, files), total=n):\n        if not annotated_only or Path(img2label_paths([str(img)])[0]).exists():  # check label\n            with open(path.parent / txt[i], 'a') as f:\n                f.write(f'./{img.relative_to(path.parent).as_posix()}' + '\\n')  # add image to txt file\n\n\ndef verify_image_label(args):\n    # Verify one image-label pair\n    im_file, lb_file, prefix = args\n    nm, nf, ne, nc, msg, segments = 0, 0, 0, 0, '', []  # number (missing, found, empty, corrupt), message, segments\n    try:\n        # verify images\n        im = Image.open(im_file)\n        im.verify()  # PIL verify\n        shape = exif_size(im)  # image size\n        assert (shape[0] > 9) & (shape[1] > 9), f'image size {shape} <10 pixels'\n        assert im.format.lower() in IMG_FORMATS, f'invalid image format {im.format}'\n        if im.format.lower() in ('jpg', 'jpeg'):\n            with open(im_file, 'rb') as f:\n                f.seek(-2, 2)\n                if f.read() != b'\\xff\\xd9':  # corrupt JPEG\n                    ImageOps.exif_transpose(Image.open(im_file)).save(im_file, 'JPEG', subsampling=0, quality=100)\n                    msg = f'{prefix}WARNING ⚠️ {im_file}: corrupt JPEG restored and saved'\n\n        # verify labels\n        if os.path.isfile(lb_file):\n            nf = 1  # label found\n            with open(lb_file) as f:\n                lb = [x.split() for x in f.read().strip().splitlines() if len(x)]\n                if any(len(x) > 6 for x in lb):  # is segment\n                    classes = 
np.array([x[0] for x in lb], dtype=np.float32)\n                    segments = [np.array(x[1:], dtype=np.float32).reshape(-1, 2) for x in lb]  # (cls, xy1...)\n                    lb = np.concatenate((classes.reshape(-1, 1), segments2boxes(segments)), 1)  # (cls, xywh)\n                lb = np.array(lb, dtype=np.float32)\n            nl = len(lb)\n            if nl:\n                assert lb.shape[1] == 5, f'labels require 5 columns, {lb.shape[1]} columns detected'\n                assert (lb >= 0).all(), f'negative label values {lb[lb < 0]}'\n                assert (lb[:, 1:] <= 1).all(), f'non-normalized or out of bounds coordinates {lb[:, 1:][lb[:, 1:] > 1]}'\n                _, i = np.unique(lb, axis=0, return_index=True)\n                if len(i) < nl:  # duplicate row check\n                    lb = lb[i]  # remove duplicates\n                    if segments:\n                        segments = [segments[x] for x in i]\n                    msg = f'{prefix}WARNING ⚠️ {im_file}: {nl - len(i)} duplicate labels removed'\n            else:\n                ne = 1  # label empty\n                lb = np.zeros((0, 5), dtype=np.float32)\n        else:\n            nm = 1  # label missing\n            lb = np.zeros((0, 5), dtype=np.float32)\n        return im_file, lb, shape, segments, nm, nf, ne, nc, msg\n    except Exception as e:\n        nc = 1\n        msg = f'{prefix}WARNING ⚠️ {im_file}: ignoring corrupt image/label: {e}'\n        return [None, None, None, None, nm, nf, ne, nc, msg]\n\n\nclass HUBDatasetStats():\n    \"\"\" Class for generating HUB dataset JSON and `-hub` dataset directory\n\n    Arguments\n        path:           Path to data.yaml or data.zip (with data.yaml inside data.zip)\n        autodownload:   Attempt to download dataset if not found locally\n\n    Usage\n        from utils.dataloaders import HUBDatasetStats\n        stats = HUBDatasetStats('coco128.yaml', autodownload=True)  # usage 1\n        stats = 
HUBDatasetStats('path/to/coco128.zip')  # usage 2\n        stats.get_json(save=False)\n        stats.process_images()\n    \"\"\"\n\n    def __init__(self, path='coco128.yaml', autodownload=False):\n        # Initialize class\n        zipped, data_dir, yaml_path = self._unzip(Path(path))\n        try:\n            with open(check_yaml(yaml_path), errors='ignore') as f:\n                data = yaml.safe_load(f)  # data dict\n                if zipped:\n                    data['path'] = data_dir\n        except Exception as e:\n            raise Exception('error/HUB/dataset_stats/yaml_load') from e\n\n        check_dataset(data, autodownload)  # download dataset if missing\n        self.hub_dir = Path(data['path'] + '-hub')\n        self.im_dir = self.hub_dir / 'images'\n        self.im_dir.mkdir(parents=True, exist_ok=True)  # makes /images\n        self.stats = {'nc': data['nc'], 'names': list(data['names'].values())}  # statistics dictionary\n        self.data = data\n\n    @staticmethod\n    def _find_yaml(dir):\n        # Return data.yaml file\n        files = list(dir.glob('*.yaml')) or list(dir.rglob('*.yaml'))  # try root level first and then recursive\n        assert files, f'No *.yaml file found in {dir}'\n        if len(files) > 1:\n            files = [f for f in files if f.stem == dir.stem]  # prefer *.yaml files that match dir name\n            assert files, f'Multiple *.yaml files found in {dir}, only 1 *.yaml file allowed'\n        assert len(files) == 1, f'Multiple *.yaml files found: {files}, only 1 *.yaml file allowed in {dir}'\n        return files[0]\n\n    def _unzip(self, path):\n        # Unzip data.zip\n        if not str(path).endswith('.zip'):  # path is data.yaml\n            return False, None, path\n        assert Path(path).is_file(), f'Error unzipping {path}, file not found'\n        unzip_file(path, path=path.parent)\n        dir = path.with_suffix('')  # dataset directory == zip name\n        assert dir.is_dir(), f'Error unzipping 
{path}, {dir} not found. path/to/abc.zip MUST unzip to path/to/abc/'\n        return True, str(dir), self._find_yaml(dir)  # zipped, data_dir, yaml_path\n\n    def _hub_ops(self, f, max_dim=1920):\n        # HUB ops for 1 image 'f': resize and save at reduced quality in /dataset-hub for web/app viewing\n        f_new = self.im_dir / Path(f).name  # dataset-hub image filename\n        try:  # use PIL\n            im = Image.open(f)\n            r = max_dim / max(im.height, im.width)  # ratio\n            if r < 1.0:  # image too large\n                im = im.resize((int(im.width * r), int(im.height * r)))\n            im.save(f_new, 'JPEG', quality=50, optimize=True)  # save\n        except Exception as e:  # use OpenCV\n            LOGGER.info(f'WARNING ⚠️ HUB ops PIL failure {f}: {e}')\n            im = cv2.imread(f)\n            im_height, im_width = im.shape[:2]\n            r = max_dim / max(im_height, im_width)  # ratio\n            if r < 1.0:  # image too large\n                im = cv2.resize(im, (int(im_width * r), int(im_height * r)), interpolation=cv2.INTER_AREA)\n            cv2.imwrite(str(f_new), im)\n\n    def get_json(self, save=False, verbose=False):\n        # Return dataset JSON for Ultralytics HUB\n        def _round(labels):\n            # Update labels to integer class and 4 decimal place floats\n            return [[int(c), *(round(x, 4) for x in points)] for c, *points in labels]\n\n        for split in 'train', 'val', 'test':\n            if self.data.get(split) is None:\n                self.stats[split] = None  # i.e. 
no test set\n                continue\n            dataset = LoadImagesAndLabels(self.data[split])  # load dataset\n            x = np.array([\n                np.bincount(label[:, 0].astype(int), minlength=self.data['nc'])\n                for label in tqdm(dataset.labels, total=dataset.n, desc='Statistics')])  # shape(128x80)\n            self.stats[split] = {\n                'instance_stats': {\n                    'total': int(x.sum()),\n                    'per_class': x.sum(0).tolist()},\n                'image_stats': {\n                    'total': dataset.n,\n                    'unlabelled': int(np.all(x == 0, 1).sum()),\n                    'per_class': (x > 0).sum(0).tolist()},\n                'labels': [{\n                    str(Path(k).name): _round(v.tolist())} for k, v in zip(dataset.im_files, dataset.labels)]}\n\n        # Save, print and return\n        if save:\n            stats_path = self.hub_dir / 'stats.json'\n            print(f'Saving {stats_path.resolve()}...')\n            with open(stats_path, 'w') as f:\n                json.dump(self.stats, f)  # save stats.json\n        if verbose:\n            print(json.dumps(self.stats, indent=2, sort_keys=False))\n        return self.stats\n\n    def process_images(self):\n        # Compress images for Ultralytics HUB\n        for split in 'train', 'val', 'test':\n            if self.data.get(split) is None:\n                continue\n            dataset = LoadImagesAndLabels(self.data[split])  # load dataset\n            desc = f'{split} images'\n            for _ in tqdm(ThreadPool(NUM_THREADS).imap(self._hub_ops, dataset.im_files), total=dataset.n, desc=desc):\n                pass\n        print(f'Done. 
All images saved to {self.im_dir}')\n        return self.im_dir\n\n\n# Classification dataloaders -------------------------------------------------------------------------------------------\nclass ClassificationDataset(torchvision.datasets.ImageFolder):\n    \"\"\"\n    YOLOv5 Classification Dataset.\n    Arguments\n        root:  Dataset path\n        transform:  torchvision transforms, used by default\n        album_transform: Albumentations transforms, used if installed\n    \"\"\"\n\n    def __init__(self, root, augment, imgsz, cache=False):\n        super().__init__(root=root)\n        self.torch_transforms = classify_transforms(imgsz)\n        self.album_transforms = classify_albumentations(augment, imgsz) if augment else None\n        self.cache_ram = cache is True or cache == 'ram'\n        self.cache_disk = cache == 'disk'\n        self.samples = [list(x) + [Path(x[0]).with_suffix('.npy'), None] for x in self.samples]  # file, index, npy, im\n\n    def __getitem__(self, i):\n        f, j, fn, im = self.samples[i]  # filename, index, filename.with_suffix('.npy'), image\n        if self.cache_ram and im is None:\n            im = self.samples[i][3] = cv2.imread(f)\n        elif self.cache_disk:\n            if not fn.exists():  # load npy\n                np.save(fn.as_posix(), cv2.imread(f))\n            im = np.load(fn)\n        else:  # read image\n            im = cv2.imread(f)  # BGR\n        if self.album_transforms:\n            sample = self.album_transforms(image=cv2.cvtColor(im, cv2.COLOR_BGR2RGB))['image']\n        else:\n            sample = self.torch_transforms(im)\n        return sample, j\n\n\ndef create_classification_dataloader(path,\n                                     imgsz=224,\n                                     batch_size=16,\n                                     augment=True,\n                                     cache=False,\n                                     rank=-1,\n                                     workers=8,\n           
                          shuffle=True):\n    # Returns Dataloader object to be used with YOLOv5 Classifier\n    with torch_distributed_zero_first(rank):  # init dataset *.cache only once if DDP\n        dataset = ClassificationDataset(root=path, imgsz=imgsz, augment=augment, cache=cache)\n    batch_size = min(batch_size, len(dataset))\n    nd = torch.cuda.device_count()\n    nw = min([os.cpu_count() // max(nd, 1), batch_size if batch_size > 1 else 0, workers])\n    sampler = None if rank == -1 else distributed.DistributedSampler(dataset, shuffle=shuffle)\n    generator = torch.Generator()\n    generator.manual_seed(6148914691236517205 + RANK)\n    return InfiniteDataLoader(dataset,\n                              batch_size=batch_size,\n                              shuffle=shuffle and sampler is None,\n                              num_workers=nw,\n                              sampler=sampler,\n                              pin_memory=PIN_MEMORY,\n                              worker_init_fn=seed_worker,\n                              generator=generator)  # or DataLoader(persistent_workers=True)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/docker/Dockerfile",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Builds ultralytics/yolov5:latest image on DockerHub https://hub.docker.com/r/ultralytics/yolov5\n# Image is CUDA-optimized for YOLOv5 single/multi-GPU training and inference\n\n# Start FROM NVIDIA PyTorch image https://ngc.nvidia.com/catalog/containers/nvidia:pytorch\n# FROM docker.io/pytorch/pytorch:latest\nFROM pytorch/pytorch:latest\n\n# Downloads to user config dir\nADD https://ultralytics.com/assets/Arial.ttf https://ultralytics.com/assets/Arial.Unicode.ttf /root/.config/Ultralytics/\n\n# Install linux packages\nENV DEBIAN_FRONTEND noninteractive\nRUN apt update\nRUN TZ=Etc/UTC apt install -y tzdata\nRUN apt install --no-install-recommends -y gcc git zip curl htop libgl1-mesa-glx libglib2.0-0 libpython3-dev gnupg\n# RUN alias python=python3\n\n# Security updates\n# https://security.snyk.io/vuln/SNYK-UBUNTU1804-OPENSSL-3314796\nRUN apt upgrade --no-install-recommends -y openssl\n\n# Create working directory\nRUN rm -rf /usr/src/app && mkdir -p /usr/src/app\nWORKDIR /usr/src/app\n\n# Copy contents\n# COPY . /usr/src/app  (issues as not a .git directory)\nRUN git clone https://github.com/ultralytics/yolov5 /usr/src/app\n\n# Install pip packages\nCOPY requirements.txt .\nRUN python3 -m pip install --upgrade pip wheel\nRUN pip install --no-cache -r requirements.txt albumentations comet gsutil notebook \\\n    coremltools onnx onnx-simplifier onnxruntime 'openvino-dev>=2022.3'\n    # tensorflow tensorflowjs \\\n\n# Set environment variables\nENV OMP_NUM_THREADS=1\n\n# Cleanup\nENV DEBIAN_FRONTEND teletype\n\n\n# Usage Examples -------------------------------------------------------------------------------------------------------\n\n# Build and Push\n# t=ultralytics/yolov5:latest && sudo docker build -f utils/docker/Dockerfile -t $t . 
&& sudo docker push $t\n\n# Pull and Run\n# t=ultralytics/yolov5:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all $t\n\n# Pull and Run with local directory access\n# t=ultralytics/yolov5:latest && sudo docker pull $t && sudo docker run -it --ipc=host --gpus all -v \"$(pwd)\"/datasets:/usr/src/datasets $t\n\n# Kill all\n# sudo docker kill $(sudo docker ps -q)\n\n# Kill all image-based\n# sudo docker kill $(sudo docker ps -qa --filter ancestor=ultralytics/yolov5:latest)\n\n# DockerHub tag update\n# t=ultralytics/yolov5:latest tnew=ultralytics/yolov5:v6.2 && sudo docker pull $t && sudo docker tag $t $tnew && sudo docker push $tnew\n\n# Clean up\n# sudo docker system prune -a --volumes\n\n# Update Ubuntu drivers\n# https://www.maketecheasier.com/install-nvidia-drivers-ubuntu/\n\n# DDP test\n# python -m torch.distributed.run --nproc_per_node 2 --master_port 1 train.py --epochs 3\n\n# GCP VM from Image\n# docker.io/ultralytics/yolov5:latest\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/docker/Dockerfile-arm64",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Builds ultralytics/yolov5:latest-arm64 image on DockerHub https://hub.docker.com/r/ultralytics/yolov5\n# Image is aarch64-compatible for Apple M1 and other ARM architectures i.e. Jetson Nano and Raspberry Pi\n\n# Start FROM Ubuntu image https://hub.docker.com/_/ubuntu\nFROM arm64v8/ubuntu:rolling\n\n# Downloads to user config dir\nADD https://ultralytics.com/assets/Arial.ttf https://ultralytics.com/assets/Arial.Unicode.ttf /root/.config/Ultralytics/\n\n# Install linux packages\nENV DEBIAN_FRONTEND noninteractive\nRUN apt update\nRUN TZ=Etc/UTC apt install -y tzdata\nRUN apt install --no-install-recommends -y python3-pip git zip curl htop gcc libgl1-mesa-glx libglib2.0-0 libpython3-dev\n# RUN alias python=python3\n\n# Install pip packages\nCOPY requirements.txt .\nRUN python3 -m pip install --upgrade pip wheel\nRUN pip install --no-cache -r requirements.txt albumentations gsutil notebook \\\n    coremltools onnx onnxruntime\n    # tensorflow-aarch64 tensorflowjs \\\n\n# Create working directory\nRUN mkdir -p /usr/src/app\nWORKDIR /usr/src/app\n\n# Copy contents\n# COPY . /usr/src/app  (issues as not a .git directory)\nRUN git clone https://github.com/ultralytics/yolov5 /usr/src/app\nENV DEBIAN_FRONTEND teletype\n\n\n# Usage Examples -------------------------------------------------------------------------------------------------------\n\n# Build and Push\n# t=ultralytics/yolov5:latest-arm64 && sudo docker build --platform linux/arm64 -f utils/docker/Dockerfile-arm64 -t $t . && sudo docker push $t\n\n# Pull and Run\n# t=ultralytics/yolov5:latest-arm64 && sudo docker pull $t && sudo docker run -it --ipc=host -v \"$(pwd)\"/datasets:/usr/src/datasets $t\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/docker/Dockerfile-cpu",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Builds ultralytics/yolov5:latest-cpu image on DockerHub https://hub.docker.com/r/ultralytics/yolov5\n# Image is CPU-optimized for ONNX, OpenVINO and PyTorch YOLOv5 deployments\n\n# Start FROM Ubuntu image https://hub.docker.com/_/ubuntu\nFROM ubuntu:rolling\n\n# Downloads to user config dir\nADD https://ultralytics.com/assets/Arial.ttf https://ultralytics.com/assets/Arial.Unicode.ttf /root/.config/Ultralytics/\n\n# Install linux packages\nENV DEBIAN_FRONTEND noninteractive\nRUN apt update\nRUN TZ=Etc/UTC apt install -y tzdata\nRUN apt install --no-install-recommends -y python3-pip git zip curl htop libgl1-mesa-glx libglib2.0-0 libpython3-dev gnupg\n# RUN alias python=python3\n\n# Install pip packages\nCOPY requirements.txt .\nRUN python3 -m pip install --upgrade pip wheel\nRUN pip install --no-cache -r requirements.txt albumentations gsutil notebook \\\n    coremltools onnx onnx-simplifier onnxruntime 'openvino-dev>=2022.3' \\\n    # tensorflow tensorflowjs \\\n    --extra-index-url https://download.pytorch.org/whl/cpu\n\n# Create working directory\nRUN mkdir -p /usr/src/app\nWORKDIR /usr/src/app\n\n# Copy contents\n# COPY . /usr/src/app  (issues as not a .git directory)\nRUN git clone https://github.com/ultralytics/yolov5 /usr/src/app\nENV DEBIAN_FRONTEND teletype\n\n\n# Usage Examples -------------------------------------------------------------------------------------------------------\n\n# Build and Push\n# t=ultralytics/yolov5:latest-cpu && sudo docker build -f utils/docker/Dockerfile-cpu -t $t . && sudo docker push $t\n\n# Pull and Run\n# t=ultralytics/yolov5:latest-cpu && sudo docker pull $t && sudo docker run -it --ipc=host -v \"$(pwd)\"/datasets:/usr/src/datasets $t\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/downloads.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nDownload utils\n\"\"\"\n\nimport logging\nimport os\nimport subprocess\nimport urllib\nfrom pathlib import Path\n\nimport requests\nimport torch\n\n\ndef is_url(url, check=True):\n    # Check if string is URL and check if URL exists\n    try:\n        url = str(url)\n        result = urllib.parse.urlparse(url)\n        assert all([result.scheme, result.netloc])  # check if is url\n        return (urllib.request.urlopen(url).getcode() == 200) if check else True  # check if exists online\n    except (AssertionError, urllib.request.HTTPError):\n        return False\n\n\ndef gsutil_getsize(url=''):\n    # gs://bucket/file size https://cloud.google.com/storage/docs/gsutil/commands/du\n    output = subprocess.check_output(['gsutil', 'du', url], shell=True, encoding='utf-8')\n    if output:\n        return int(output.split()[0])\n    return 0\n\n\ndef url_getsize(url='https://ultralytics.com/images/bus.jpg'):\n    # Return downloadable file size in bytes\n    response = requests.head(url, allow_redirects=True)\n    return int(response.headers.get('content-length', -1))\n\n\ndef curl_download(url, filename, *, silent: bool = False) -> bool:\n    \"\"\"\n    Download a file from a url to a filename using curl.\n    \"\"\"\n    silent_option = 'sS' if silent else ''  # silent\n    proc = subprocess.run([\n        'curl',\n        '-#',\n        f'-{silent_option}L',\n        url,\n        '--output',\n        filename,\n        '--retry',\n        '9',\n        '-C',\n        '-',])\n    return proc.returncode == 0\n\n\ndef safe_download(file, url, url2=None, min_bytes=1E0, error_msg=''):\n    # Attempts to download file from url or url2, checks and removes incomplete downloads < min_bytes\n    from utils.general import LOGGER\n\n    file = Path(file)\n    assert_msg = f\"Downloaded file '{file}' does not exist or size is < min_bytes={min_bytes}\"\n    try:  # url1\n        LOGGER.info(f'Downloading {url} to 
{file}...')\n        torch.hub.download_url_to_file(url, str(file), progress=LOGGER.level <= logging.INFO)\n        assert file.exists() and file.stat().st_size > min_bytes, assert_msg  # check\n    except Exception as e:  # url2\n        if file.exists():\n            file.unlink()  # remove partial downloads\n        LOGGER.info(f'ERROR: {e}\\nRe-attempting {url2 or url} to {file}...')\n        # curl download, retry and resume on fail\n        curl_download(url2 or url, file)\n    finally:\n        if not file.exists() or file.stat().st_size < min_bytes:  # check\n            if file.exists():\n                file.unlink()  # remove partial downloads\n            LOGGER.info(f'ERROR: {assert_msg}\\n{error_msg}')\n        LOGGER.info('')\n\n\ndef attempt_download(file, repo='ultralytics/yolov5', release='v7.0'):\n    # Attempt file download from GitHub release assets if not found locally. release = 'latest', 'v7.0', etc.\n    from utils.general import LOGGER\n\n    def github_assets(repository, version='latest'):\n        # Return GitHub repo tag (i.e. 'v7.0') and assets (i.e. ['yolov5s.pt', 'yolov5m.pt', ...])\n        if version != 'latest':\n            version = f'tags/{version}'  # i.e. 
tags/v7.0\n        response = requests.get(f'https://api.github.com/repos/{repository}/releases/{version}').json()  # github api\n        return response['tag_name'], [x['name'] for x in response['assets']]  # tag, assets\n\n    file = Path(str(file).strip().replace(\"'\", ''))\n    if not file.exists():\n        # URL specified\n        name = Path(urllib.parse.unquote(str(file))).name  # decode '%2F' to '/' etc.\n        if str(file).startswith(('http:/', 'https:/')):  # download\n            url = str(file).replace(':/', '://')  # Pathlib turns :// -> :/\n            file = name.split('?')[0]  # parse authentication https://url.com/file.txt?auth...\n            if Path(file).is_file():\n                LOGGER.info(f'Found {url} locally at {file}')  # file already exists\n            else:\n                safe_download(file=file, url=url, min_bytes=1E5)\n            return file\n\n        # GitHub assets\n        assets = [f'yolov5{size}{suffix}.pt' for size in 'nsmlx' for suffix in ('', '6', '-cls', '-seg')]  # default\n        try:\n            tag, assets = github_assets(repo, release)\n        except Exception:\n            try:\n                tag, assets = github_assets(repo)  # latest release\n            except Exception:\n                try:\n                    tag = subprocess.check_output('git tag', shell=True, stderr=subprocess.STDOUT).decode().split()[-1]\n                except Exception:\n                    tag = release\n\n        file.parent.mkdir(parents=True, exist_ok=True)  # make parent dir (if required)\n        if name in assets:\n            safe_download(file,\n                          url=f'https://github.com/{repo}/releases/download/{tag}/{name}',\n                          min_bytes=1E5,\n                          error_msg=f'{file} missing, try downloading from https://github.com/{repo}/releases/{tag}')\n\n    return str(file)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/flask_rest_api/README.md",
    "content": "# Flask REST API\n\n[REST](https://en.wikipedia.org/wiki/Representational_state_transfer) [API](https://en.wikipedia.org/wiki/API)s are\ncommonly used to expose Machine Learning (ML)  models to other services. This folder contains an example REST API\ncreated using Flask to expose the YOLOv5s model from [PyTorch Hub](https://pytorch.org/hub/ultralytics_yolov5/).\n\n## Requirements\n\n[Flask](https://palletsprojects.com/p/flask/) is required. Install with:\n\n```shell\n$ pip install Flask\n```\n\n## Run\n\nAfter Flask installation run:\n\n```shell\n$ python3 restapi.py --port 5000\n```\n\nThen use [curl](https://curl.se/) to perform a request:\n\n```shell\n$ curl -X POST -F image=@zidane.jpg 'http://localhost:5000/v1/object-detection/yolov5s'\n```\n\nThe model inference results are returned as a JSON response:\n\n```json\n[\n  {\n    \"class\": 0,\n    \"confidence\": 0.8900438547,\n    \"height\": 0.9318675399,\n    \"name\": \"person\",\n    \"width\": 0.3264600933,\n    \"xcenter\": 0.7438579798,\n    \"ycenter\": 0.5207948685\n  },\n  {\n    \"class\": 0,\n    \"confidence\": 0.8440024257,\n    \"height\": 0.7155083418,\n    \"name\": \"person\",\n    \"width\": 0.6546785235,\n    \"xcenter\": 0.427829951,\n    \"ycenter\": 0.6334488392\n  },\n  {\n    \"class\": 27,\n    \"confidence\": 0.3771208823,\n    \"height\": 0.3902671337,\n    \"name\": \"tie\",\n    \"width\": 0.0696444362,\n    \"xcenter\": 0.3675483763,\n    \"ycenter\": 0.7991207838\n  },\n  {\n    \"class\": 27,\n    \"confidence\": 0.3527112305,\n    \"height\": 0.1540903747,\n    \"name\": \"tie\",\n    \"width\": 0.0336618312,\n    \"xcenter\": 0.7814827561,\n    \"ycenter\": 0.5065554976\n  }\n]\n```\n\nAn example python script to perform inference using [requests](https://docs.python-requests.org/en/master/) is given\nin `example_request.py`\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/flask_rest_api/example_request.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nPerform test request\n\"\"\"\n\nimport pprint\n\nimport requests\n\nDETECTION_URL = 'http://localhost:5000/v1/object-detection/yolov5s'\nIMAGE = 'zidane.jpg'\n\n# Read image\nwith open(IMAGE, 'rb') as f:\n    image_data = f.read()\n\nresponse = requests.post(DETECTION_URL, files={'image': image_data}).json()\n\npprint.pprint(response)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/flask_rest_api/restapi.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nRun a Flask REST API exposing one or more YOLOv5s models\n\"\"\"\n\nimport argparse\nimport io\n\nimport torch\nfrom flask import Flask, request\nfrom PIL import Image\n\napp = Flask(__name__)\nmodels = {}\n\nDETECTION_URL = '/v1/object-detection/<model>'\n\n\n@app.route(DETECTION_URL, methods=['POST'])\ndef predict(model):\n    if request.method != 'POST':\n        return\n\n    if request.files.get('image'):\n        # Method 1\n        # with request.files[\"image\"] as f:\n        #     im = Image.open(io.BytesIO(f.read()))\n\n        # Method 2\n        im_file = request.files['image']\n        im_bytes = im_file.read()\n        im = Image.open(io.BytesIO(im_bytes))\n\n        if model in models:\n            results = models[model](im, size=640)  # reduce size=320 for faster inference\n            return results.pandas().xyxy[0].to_json(orient='records')\n\n\nif __name__ == '__main__':\n    parser = argparse.ArgumentParser(description='Flask API exposing YOLOv5 model')\n    parser.add_argument('--port', default=5000, type=int, help='port number')\n    parser.add_argument('--model', nargs='+', default=['yolov5s'], help='model(s) to run, i.e. --model yolov5n yolov5s')\n    opt = parser.parse_args()\n\n    for m in opt.model:\n        models[m] = torch.hub.load('ultralytics/yolov5', m, force_reload=True, skip_validation=True)\n\n    app.run(host='0.0.0.0', port=opt.port)  # debug=True causes Restarting with stat\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/general.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nGeneral utils\n\"\"\"\n\nimport contextlib\nimport glob\nimport inspect\nimport logging\nimport logging.config\nimport math\nimport os\nimport platform\nimport random\nimport re\nimport signal\nimport subprocess\nimport sys\nimport time\nimport urllib\nfrom copy import deepcopy\nfrom datetime import datetime\nfrom itertools import repeat\nfrom multiprocessing.pool import ThreadPool\nfrom pathlib import Path\nfrom subprocess import check_output\nfrom tarfile import is_tarfile\nfrom typing import Optional\nfrom zipfile import ZipFile, is_zipfile\n\nimport cv2\nimport IPython\nimport numpy as np\nimport pandas as pd\nimport pkg_resources as pkg\nimport torch\nimport torchvision\nimport yaml\n\nfrom utils import TryExcept, emojis\nfrom utils.downloads import curl_download, gsutil_getsize\nfrom utils.metrics import box_iou, fitness\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[1]  # YOLOv5 root directory\nRANK = int(os.getenv('RANK', -1))\n\n# Settings\nNUM_THREADS = min(8, max(1, os.cpu_count() - 1))  # number of YOLOv5 multiprocessing threads\nDATASETS_DIR = Path(os.getenv('YOLOv5_DATASETS_DIR', ROOT.parent / 'datasets'))  # global datasets directory\nAUTOINSTALL = str(os.getenv('YOLOv5_AUTOINSTALL', True)).lower() == 'true'  # global auto-install mode\nVERBOSE = str(os.getenv('YOLOv5_VERBOSE', True)).lower() == 'true'  # global verbose mode\nTQDM_BAR_FORMAT = '{l_bar}{bar:10}{r_bar}'  # tqdm bar format\nFONT = 'Arial.ttf'  # https://ultralytics.com/assets/Arial.ttf\n\ntorch.set_printoptions(linewidth=320, precision=5, profile='long')\nnp.set_printoptions(linewidth=320, formatter={'float_kind': '{:11.5g}'.format})  # format short g, %precision=5\npd.options.display.max_columns = 10\ncv2.setNumThreads(0)  # prevent OpenCV from multithreading (incompatible with PyTorch DataLoader)\nos.environ['NUMEXPR_MAX_THREADS'] = str(NUM_THREADS)  # NumExpr max threads\nos.environ['OMP_NUM_THREADS'] = '1' 
if platform.system() == 'Darwin' else str(NUM_THREADS)  # OpenMP (PyTorch and SciPy)\n\n\ndef is_ascii(s=''):\n    # Is string composed of all ASCII (no UTF) characters? (note str().isascii() introduced in python 3.7)\n    s = str(s)  # convert list, tuple, None, etc. to str\n    return len(s.encode().decode('ascii', 'ignore')) == len(s)\n\n\ndef is_chinese(s='人工智能'):\n    # Is string composed of any Chinese characters?\n    return bool(re.search('[\\u4e00-\\u9fff]', str(s)))\n\n\ndef is_colab():\n    # Is environment a Google Colab instance?\n    return 'google.colab' in sys.modules\n\n\ndef is_notebook():\n    # Is environment a Jupyter notebook? Verified on Colab, Jupyterlab, Kaggle, Paperspace\n    ipython_type = str(type(IPython.get_ipython()))\n    return 'colab' in ipython_type or 'zmqshell' in ipython_type\n\n\ndef is_kaggle():\n    # Is environment a Kaggle Notebook?\n    return os.environ.get('PWD') == '/kaggle/working' and os.environ.get('KAGGLE_URL_BASE') == 'https://www.kaggle.com'\n\n\ndef is_docker() -> bool:\n    \"\"\"Check if the process runs inside a docker container.\"\"\"\n    if Path('/.dockerenv').exists():\n        return True\n    try:  # check if docker is in control groups\n        with open('/proc/self/cgroup') as file:\n            return any('docker' in line for line in file)\n    except OSError:\n        return False\n\n\ndef is_writeable(dir, test=False):\n    # Return True if directory has write permissions, test opening a file with write permissions if test=True\n    if not test:\n        return os.access(dir, os.W_OK)  # possible issues on Windows\n    file = Path(dir) / 'tmp.txt'\n    try:\n        with open(file, 'w'):  # open file with write permissions\n            pass\n        file.unlink()  # remove file\n        return True\n    except OSError:\n        return False\n\n\nLOGGING_NAME = 'yolov5'\n\n\ndef set_logging(name=LOGGING_NAME, verbose=True):\n    # sets up logging for the given name\n    rank = int(os.getenv('RANK', 
-1))  # rank in world for Multi-GPU trainings\n    level = logging.INFO if verbose and rank in {-1, 0} else logging.ERROR\n    logging.config.dictConfig({\n        'version': 1,\n        'disable_existing_loggers': False,\n        'formatters': {\n            name: {\n                'format': '%(message)s'}},\n        'handlers': {\n            name: {\n                'class': 'logging.StreamHandler',\n                'formatter': name,\n                'level': level,}},\n        'loggers': {\n            name: {\n                'level': level,\n                'handlers': [name],\n                'propagate': False,}}})\n\n\nset_logging(LOGGING_NAME)  # run before defining LOGGER\nLOGGER = logging.getLogger(LOGGING_NAME)  # define globally (used in train.py, val.py, detect.py, etc.)\nif platform.system() == 'Windows':\n    for fn in LOGGER.info, LOGGER.warning:\n        # bind fn as a default argument; a plain closure would make both wrappers call LOGGER.warning\n        setattr(LOGGER, fn.__name__, lambda x, fn=fn: fn(emojis(x)))  # emoji safe logging\n\n\ndef user_config_dir(dir='Ultralytics', env_var='YOLOV5_CONFIG_DIR'):\n    # Return path of user configuration directory. Prefer environment variable if exists. Make dir if required.\n    env = os.getenv(env_var)\n    if env:\n        path = Path(env)  # use environment variable\n    else:\n        cfg = {'Windows': 'AppData/Roaming', 'Linux': '.config', 'Darwin': 'Library/Application Support'}  # 3 OS dirs\n        path = Path.home() / cfg.get(platform.system(), '')  # OS-specific config dir\n        path = (path if is_writeable(path) else Path('/tmp')) / dir  # GCP and AWS lambda fix, only /tmp is writeable\n    path.mkdir(exist_ok=True)  # make if required\n    return path\n\n\nCONFIG_DIR = user_config_dir()  # Ultralytics settings dir\n\n\nclass Profile(contextlib.ContextDecorator):\n    # YOLOv5 Profile class. 
Usage: @Profile() decorator or 'with Profile():' context manager\n    def __init__(self, t=0.0):\n        self.t = t\n        self.cuda = torch.cuda.is_available()\n\n    def __enter__(self):\n        self.start = self.time()\n        return self\n\n    def __exit__(self, type, value, traceback):\n        self.dt = self.time() - self.start  # delta-time\n        self.t += self.dt  # accumulate dt\n\n    def time(self):\n        if self.cuda:\n            torch.cuda.synchronize()\n        return time.time()\n\n\nclass Timeout(contextlib.ContextDecorator):\n    # YOLOv5 Timeout class. Usage: @Timeout(seconds) decorator or 'with Timeout(seconds):' context manager\n    def __init__(self, seconds, *, timeout_msg='', suppress_timeout_errors=True):\n        self.seconds = int(seconds)\n        self.timeout_message = timeout_msg\n        self.suppress = bool(suppress_timeout_errors)\n\n    def _timeout_handler(self, signum, frame):\n        raise TimeoutError(self.timeout_message)\n\n    def __enter__(self):\n        if platform.system() != 'Windows':  # not supported on Windows\n            signal.signal(signal.SIGALRM, self._timeout_handler)  # Set handler for SIGALRM\n            signal.alarm(self.seconds)  # start countdown for SIGALRM to be raised\n\n    def __exit__(self, exc_type, exc_val, exc_tb):\n        if platform.system() != 'Windows':\n            signal.alarm(0)  # Cancel SIGALRM if it's scheduled\n            if self.suppress and exc_type is TimeoutError:  # Suppress TimeoutError\n                return True\n\n\nclass WorkingDirectory(contextlib.ContextDecorator):\n    # Usage: @WorkingDirectory(dir) decorator or 'with WorkingDirectory(dir):' context manager\n    def __init__(self, new_dir):\n        self.dir = new_dir  # new dir\n        self.cwd = Path.cwd().resolve()  # current dir\n\n    def __enter__(self):\n        os.chdir(self.dir)\n\n    def __exit__(self, exc_type, exc_val, exc_tb):\n        os.chdir(self.cwd)\n\n\ndef methods(instance):\n    # 
Get class/instance methods\n    return [f for f in dir(instance) if callable(getattr(instance, f)) and not f.startswith('__')]\n\n\ndef print_args(args: Optional[dict] = None, show_file=True, show_func=False):\n    # Print function arguments (optional args dict)\n    x = inspect.currentframe().f_back  # previous frame\n    file, _, func, _, _ = inspect.getframeinfo(x)\n    if args is None:  # get args automatically\n        args, _, _, frm = inspect.getargvalues(x)\n        args = {k: v for k, v in frm.items() if k in args}\n    try:\n        file = Path(file).resolve().relative_to(ROOT).with_suffix('')\n    except ValueError:\n        file = Path(file).stem\n    s = (f'{file}: ' if show_file else '') + (f'{func}: ' if show_func else '')\n    LOGGER.info(colorstr(s) + ', '.join(f'{k}={v}' for k, v in args.items()))\n\n\ndef init_seeds(seed=0, deterministic=False):\n    # Initialize random number generator (RNG) seeds https://pytorch.org/docs/stable/notes/randomness.html\n    random.seed(seed)\n    np.random.seed(seed)\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed(seed)\n    torch.cuda.manual_seed_all(seed)  # for Multi-GPU, exception safe\n    # torch.backends.cudnn.benchmark = True  # AutoBatch problem https://github.com/ultralytics/yolov5/issues/9287\n    if deterministic and check_version(torch.__version__, '1.12.0'):  # https://github.com/ultralytics/yolov5/pull/8213\n        torch.use_deterministic_algorithms(True)\n        torch.backends.cudnn.deterministic = True\n        os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'\n        os.environ['PYTHONHASHSEED'] = str(seed)\n\n\ndef intersect_dicts(da, db, exclude=()):\n    # Dictionary intersection of matching keys and shapes, omitting 'exclude' keys, using da values\n    return {k: v for k, v in da.items() if k in db and all(x not in k for x in exclude) and v.shape == db[k].shape}\n\n\ndef get_default_args(func):\n    # Get func() default arguments\n    signature = inspect.signature(func)\n    
return {k: v.default for k, v in signature.parameters.items() if v.default is not inspect.Parameter.empty}\n\n\ndef get_latest_run(search_dir='.'):\n    # Return path to most recent 'last.pt' in /runs (i.e. to --resume from)\n    last_list = glob.glob(f'{search_dir}/**/last*.pt', recursive=True)\n    return max(last_list, key=os.path.getctime) if last_list else ''\n\n\ndef file_age(path=__file__):\n    # Return days since last file update\n    dt = (datetime.now() - datetime.fromtimestamp(Path(path).stat().st_mtime))  # delta\n    return dt.days  # + dt.seconds / 86400  # fractional days\n\n\ndef file_date(path=__file__):\n    # Return human-readable file modification date, i.e. '2021-3-26'\n    t = datetime.fromtimestamp(Path(path).stat().st_mtime)\n    return f'{t.year}-{t.month}-{t.day}'\n\n\ndef file_size(path):\n    # Return file/dir size (MB)\n    mb = 1 << 20  # bytes to MiB (1024 ** 2)\n    path = Path(path)\n    if path.is_file():\n        return path.stat().st_size / mb\n    elif path.is_dir():\n        return sum(f.stat().st_size for f in path.glob('**/*') if f.is_file()) / mb\n    else:\n        return 0.0\n\n\ndef check_online():\n    # Check internet connectivity\n    import socket\n\n    def run_once():\n        # Check once\n        try:\n            socket.create_connection(('1.1.1.1', 443), 5)  # check host accessibility\n            return True\n        except OSError:\n            return False\n\n    return run_once() or run_once()  # check twice to increase robustness to intermittent connectivity issues\n\n\ndef git_describe(path=ROOT):  # path must be a directory\n    # Return human-readable git description, i.e. 
v5.0-5-g3e25f1e https://git-scm.com/docs/git-describe\n    try:\n        assert (Path(path) / '.git').is_dir()\n        return check_output(f'git -C {path} describe --tags --long --always', shell=True).decode()[:-1]\n    except Exception:\n        return ''\n\n\n@TryExcept()\n@WorkingDirectory(ROOT)\ndef check_git_status(repo='ultralytics/yolov5', branch='master'):\n    # YOLOv5 status check, recommend 'git pull' if code is out of date\n    url = f'https://github.com/{repo}'\n    msg = f', for updates see {url}'\n    s = colorstr('github: ')  # string\n    assert Path('.git').exists(), s + 'skipping check (not a git repository)' + msg\n    assert check_online(), s + 'skipping check (offline)' + msg\n\n    splits = re.split(pattern=r'\\s', string=check_output('git remote -v', shell=True).decode())\n    matches = [repo in s for s in splits]\n    if any(matches):\n        remote = splits[matches.index(True) - 1]\n    else:\n        remote = 'ultralytics'\n        check_output(f'git remote add {remote} {url}', shell=True)\n    check_output(f'git fetch {remote}', shell=True, timeout=5)  # git fetch\n    local_branch = check_output('git rev-parse --abbrev-ref HEAD', shell=True).decode().strip()  # checked out\n    n = int(check_output(f'git rev-list {local_branch}..{remote}/{branch} --count', shell=True))  # commits behind\n    if n > 0:\n        pull = 'git pull' if remote == 'origin' else f'git pull {remote} {branch}'\n        s += f\"⚠️ YOLOv5 is out of date by {n} commit{'s' * (n > 1)}. Use `{pull}` or `git clone {url}` to update.\"\n    else:\n        s += f'up to date with {url} ✅'\n    LOGGER.info(s)\n\n\n@WorkingDirectory(ROOT)\ndef check_git_info(path='.'):\n    # YOLOv5 git info check, return {remote, branch, commit}\n    check_requirements('gitpython')\n    import git\n    try:\n        repo = git.Repo(path)\n        remote = repo.remotes.origin.url.replace('.git', '')  # i.e. 
'https://github.com/ultralytics/yolov5'\n        commit = repo.head.commit.hexsha  # i.e. '3134699c73af83aac2a481435550b968d5792c0d'\n        try:\n            branch = repo.active_branch.name  # i.e. 'main'\n        except TypeError:  # not on any branch\n            branch = None  # i.e. 'detached HEAD' state\n        return {'remote': remote, 'branch': branch, 'commit': commit}\n    except git.exc.InvalidGitRepositoryError:  # path is not a git dir\n        return {'remote': None, 'branch': None, 'commit': None}\n\n\ndef check_python(minimum='3.7.0'):\n    # Check current python version vs. required python version\n    check_version(platform.python_version(), minimum, name='Python ', hard=True)\n\n\ndef check_version(current='0.0.0', minimum='0.0.0', name='version ', pinned=False, hard=False, verbose=False):\n    # Check version vs. required version\n    current, minimum = (pkg.parse_version(x) for x in (current, minimum))\n    result = (current == minimum) if pinned else (current >= minimum)  # bool\n    s = f'WARNING ⚠️ {name}{minimum} is required by YOLOv5, but {name}{current} is currently installed'  # string\n    if hard:\n        assert result, emojis(s)  # assert min requirements met\n    if verbose and not result:\n        LOGGER.warning(s)\n    return result\n\n\n@TryExcept()\ndef check_requirements(requirements=ROOT / 'requirements.txt', exclude=(), install=True, cmds=''):\n    # Check installed dependencies meet YOLOv5 requirements (pass *.txt file or list of packages or single package str)\n    prefix = colorstr('red', 'bold', 'requirements:')\n    check_python()  # check python version\n    if isinstance(requirements, Path):  # requirements.txt file\n        file = requirements.resolve()\n        assert file.exists(), f'{prefix} {file} not found, check failed.'\n        with file.open() as f:\n            requirements = [f'{x.name}{x.specifier}' for x in pkg.parse_requirements(f) if x.name not in exclude]\n    elif isinstance(requirements, str):\n   
     requirements = [requirements]\n\n    s = ''\n    n = 0\n    for r in requirements:\n        try:\n            pkg.require(r)\n        except (pkg.VersionConflict, pkg.DistributionNotFound):  # exception if requirements not met\n            s += f'\"{r}\" '\n            n += 1\n\n    if s and install and AUTOINSTALL:  # check environment variable\n        LOGGER.info(f\"{prefix} YOLOv5 requirement{'s' * (n > 1)} {s}not found, attempting AutoUpdate...\")\n        try:\n            # assert check_online(), \"AutoUpdate skipped (offline)\"\n            LOGGER.info(check_output(f'pip install {s} {cmds}', shell=True).decode())\n            source = file if 'file' in locals() else requirements\n            s = f\"{prefix} {n} package{'s' * (n > 1)} updated per {source}\\n\" \\\n                f\"{prefix} ⚠️ {colorstr('bold', 'Restart runtime or rerun command for updates to take effect')}\\n\"\n            LOGGER.info(s)\n        except Exception as e:\n            LOGGER.warning(f'{prefix} ❌ {e}')\n\n\ndef check_img_size(imgsz, s=32, floor=0):\n    # Verify image size is a multiple of stride s in each dimension\n    if isinstance(imgsz, int):  # integer i.e. img_size=640\n        new_size = max(make_divisible(imgsz, int(s)), floor)\n    else:  # list i.e. 
img_size=[640, 480]\n        imgsz = list(imgsz)  # convert to list if tuple\n        new_size = [max(make_divisible(x, int(s)), floor) for x in imgsz]\n    if new_size != imgsz:\n        LOGGER.warning(f'WARNING ⚠️ --img-size {imgsz} must be multiple of max stride {s}, updating to {new_size}')\n    return new_size\n\n\ndef check_imshow(warn=False):\n    # Check if environment supports image displays\n    try:\n        assert not is_notebook()\n        assert not is_docker()\n        cv2.imshow('test', np.zeros((1, 1, 3)))\n        cv2.waitKey(1)\n        cv2.destroyAllWindows()\n        cv2.waitKey(1)\n        return True\n    except Exception as e:\n        if warn:\n            LOGGER.warning(f'WARNING ⚠️ Environment does not support cv2.imshow() or PIL Image.show()\\n{e}')\n        return False\n\n\ndef check_suffix(file='yolov5s.pt', suffix=('.pt',), msg=''):\n    # Check file(s) for acceptable suffix\n    if file and suffix:\n        if isinstance(suffix, str):\n            suffix = [suffix]\n        for f in file if isinstance(file, (list, tuple)) else [file]:\n            s = Path(f).suffix.lower()  # file suffix\n            if len(s):\n                assert s in suffix, f'{msg}{f} acceptable suffix is {suffix}'\n\n\ndef check_yaml(file, suffix=('.yaml', '.yml')):\n    # Search/download YAML file (if necessary) and return path, checking suffix\n    return check_file(file, suffix)\n\n\ndef check_file(file, suffix=''):\n    # Search/download file (if necessary) and return path\n    check_suffix(file, suffix)  # optional\n    file = str(file)  # convert to str()\n    if os.path.isfile(file) or not file:  # exists\n        return file\n    elif file.startswith(('http:/', 'https:/')):  # download\n        url = file  # warning: Pathlib turns :// -> :/\n        file = Path(urllib.parse.unquote(file).split('?')[0]).name  # '%2F' to '/', split https://url.com/file.txt?auth\n        if os.path.isfile(file):\n            LOGGER.info(f'Found {url} locally at 
{file}')  # file already exists\n        else:\n            LOGGER.info(f'Downloading {url} to {file}...')\n            torch.hub.download_url_to_file(url, file)\n            assert Path(file).exists() and Path(file).stat().st_size > 0, f'File download failed: {url}'  # check\n        return file\n    elif file.startswith('clearml://'):  # ClearML Dataset ID\n        assert 'clearml' in sys.modules, \"ClearML is not installed, so cannot use ClearML dataset. Try running 'pip install clearml'.\"\n        return file\n    else:  # search\n        files = []\n        for d in 'data', 'models', 'utils':  # search directories\n            files.extend(glob.glob(str(ROOT / d / '**' / file), recursive=True))  # find file\n        assert len(files), f'File not found: {file}'  # assert file was found\n        assert len(files) == 1, f\"Multiple files match '{file}', specify exact path: {files}\"  # assert unique\n        return files[0]  # return file\n\n\ndef check_font(font=FONT, progress=False):\n    # Download font to CONFIG_DIR if necessary\n    font = Path(font)\n    file = CONFIG_DIR / font.name\n    if not font.exists() and not file.exists():\n        url = f'https://ultralytics.com/assets/{font.name}'\n        LOGGER.info(f'Downloading {url} to {file}...')\n        torch.hub.download_url_to_file(url, str(file), progress=progress)\n\n\ndef check_dataset(data, autodownload=True):\n    # Download, check and/or unzip dataset if not found locally\n\n    # Download (optional)\n    extract_dir = ''\n    if isinstance(data, (str, Path)) and (is_zipfile(data) or is_tarfile(data)):\n        download(data, dir=f'{DATASETS_DIR}/{Path(data).stem}', unzip=True, delete=False, curl=False, threads=1)\n        data = next((DATASETS_DIR / Path(data).stem).rglob('*.yaml'))\n        extract_dir, autodownload = data.parent, False\n\n    # Read yaml (optional)\n    if isinstance(data, (str, Path)):\n        data = yaml_load(data)  # dictionary\n\n    # Checks\n    for k in 'train', 'val', 
'names':\n        assert k in data, emojis(f\"data.yaml '{k}:' field missing ❌\")\n    if isinstance(data['names'], (list, tuple)):  # old array format\n        data['names'] = dict(enumerate(data['names']))  # convert to dict\n    assert all(isinstance(k, int) for k in data['names'].keys()), 'data.yaml names keys must be integers, i.e. 2: car'\n    data['nc'] = len(data['names'])\n\n    # Resolve paths\n    path = Path(extract_dir or data.get('path') or '')  # optional 'path' default to '.'\n    if not path.is_absolute():\n        path = (ROOT / path).resolve()\n        data['path'] = path  # download scripts\n    for k in 'train', 'val', 'test':\n        if data.get(k):  # prepend path\n            if isinstance(data[k], str):\n                x = (path / data[k]).resolve()\n                if not x.exists() and data[k].startswith('../'):\n                    x = (path / data[k][3:]).resolve()\n                data[k] = str(x)\n            else:\n                data[k] = [str((path / x).resolve()) for x in data[k]]\n\n    # Parse yaml\n    train, val, test, s = (data.get(x) for x in ('train', 'val', 'test', 'download'))\n    if val:\n        val = [Path(x).resolve() for x in (val if isinstance(val, list) else [val])]  # val path\n        if not all(x.exists() for x in val):\n            LOGGER.info('\\nDataset not found ⚠️, missing paths %s' % [str(x) for x in val if not x.exists()])\n            if not s or not autodownload:\n                raise Exception('Dataset not found ❌')\n            t = time.time()\n            if s.startswith('http') and s.endswith('.zip'):  # URL\n                f = Path(s).name  # filename\n                LOGGER.info(f'Downloading {s} to {f}...')\n                torch.hub.download_url_to_file(s, f)\n                Path(DATASETS_DIR).mkdir(parents=True, exist_ok=True)  # create root\n                unzip_file(f, path=DATASETS_DIR)  # unzip\n                Path(f).unlink()  # remove zip\n                r = None  # success\n    
        elif s.startswith('bash '):  # bash script\n                LOGGER.info(f'Running {s} ...')\n                r = subprocess.run(s, shell=True)\n            else:  # python script\n                r = exec(s, {'yaml': data})  # return None\n            dt = f'({round(time.time() - t, 1)}s)'\n            s = f\"success ✅ {dt}, saved to {colorstr('bold', DATASETS_DIR)}\" if r in (0, None) else f'failure {dt} ❌'\n            LOGGER.info(f'Dataset download {s}')\n    check_font('Arial.ttf' if is_ascii(data['names']) else 'Arial.Unicode.ttf', progress=True)  # download fonts\n    return data  # dictionary\n\n\ndef check_amp(model):\n    # Check PyTorch Automatic Mixed Precision (AMP) functionality. Return True on correct operation\n    from models.common import AutoShape, DetectMultiBackend\n\n    def amp_allclose(model, im):\n        # All close FP32 vs AMP results\n        m = AutoShape(model, verbose=False)  # model\n        a = m(im).xywhn[0]  # FP32 inference\n        m.amp = True\n        b = m(im).xywhn[0]  # AMP inference\n        return a.shape == b.shape and torch.allclose(a, b, atol=0.1)  # close to 10% absolute tolerance\n\n    prefix = colorstr('AMP: ')\n    device = next(model.parameters()).device  # get model device\n    if device.type in ('cpu', 'mps'):\n        return False  # AMP only used on CUDA devices\n    f = ROOT / 'data' / 'images' / 'bus.jpg'  # image to check\n    im = f if f.exists() else 'https://ultralytics.com/images/bus.jpg' if check_online() else np.ones((640, 640, 3))\n    try:\n        assert amp_allclose(deepcopy(model), im) or amp_allclose(DetectMultiBackend('yolov5n.pt', device), im)\n        LOGGER.info(f'{prefix}checks passed ✅')\n        return True\n    except Exception:\n        help_url = 'https://github.com/ultralytics/yolov5/issues/7908'\n        LOGGER.warning(f'{prefix}checks failed ❌, disabling Automatic Mixed Precision. 
See {help_url}')\n        return False\n\n\ndef yaml_load(file='data.yaml'):\n    # Single-line safe yaml loading\n    with open(file, errors='ignore') as f:\n        return yaml.safe_load(f)\n\n\ndef yaml_save(file='data.yaml', data={}):\n    # Single-line safe yaml saving\n    with open(file, 'w') as f:\n        yaml.safe_dump({k: str(v) if isinstance(v, Path) else v for k, v in data.items()}, f, sort_keys=False)\n\n\ndef unzip_file(file, path=None, exclude=('.DS_Store', '__MACOSX')):\n    # Unzip a *.zip file to path/, excluding files containing strings in exclude list\n    if path is None:\n        path = Path(file).parent  # default path\n    with ZipFile(file) as zipObj:\n        for f in zipObj.namelist():  # list all archived filenames in the zip\n            if all(x not in f for x in exclude):\n                zipObj.extract(f, path=path)\n\n\ndef url2file(url):\n    # Convert URL to filename, i.e. https://url.com/file.txt?auth -> file.txt\n    url = str(Path(url)).replace(':/', '://')  # Pathlib turns :// -> :/\n    return Path(urllib.parse.unquote(url)).name.split('?')[0]  # '%2F' to '/', split https://url.com/file.txt?auth\n\n\ndef download(url, dir='.', unzip=True, delete=True, curl=False, threads=1, retry=3):\n    # Multithreaded file download and unzip function, used in data.yaml for autodownload\n    def download_one(url, dir):\n        # Download 1 file\n        success = True\n        if os.path.isfile(url):\n            f = Path(url)  # filename\n        else:  # does not exist\n            f = dir / Path(url).name\n            LOGGER.info(f'Downloading {url} to {f}...')\n            for i in range(retry + 1):\n                if curl:\n                    success = curl_download(url, f, silent=(threads > 1))\n                else:\n                    torch.hub.download_url_to_file(url, f, progress=threads == 1)  # torch download\n                    success = f.is_file()\n                if success:\n                    break\n                
elif i < retry:\n                    LOGGER.warning(f'⚠️ Download failure, retrying {i + 1}/{retry} {url}...')\n                else:\n                    LOGGER.warning(f'❌ Failed to download {url}...')\n\n        if unzip and success and (f.suffix == '.gz' or is_zipfile(f) or is_tarfile(f)):\n            LOGGER.info(f'Unzipping {f}...')\n            if is_zipfile(f):\n                unzip_file(f, dir)  # unzip\n            elif is_tarfile(f):\n                subprocess.run(['tar', 'xf', f, '--directory', f.parent], check=True)  # unzip\n            elif f.suffix == '.gz':\n                subprocess.run(['tar', 'xfz', f, '--directory', f.parent], check=True)  # unzip\n            if delete:\n                f.unlink()  # remove zip\n\n    dir = Path(dir)\n    dir.mkdir(parents=True, exist_ok=True)  # make directory\n    if threads > 1:\n        pool = ThreadPool(threads)\n        pool.imap(lambda x: download_one(*x), zip(url, repeat(dir)))  # multithreaded\n        pool.close()\n        pool.join()\n    else:\n        for u in [url] if isinstance(url, (str, Path)) else url:\n            download_one(u, dir)\n\n\ndef make_divisible(x, divisor):\n    # Returns x rounded up to the nearest multiple of divisor\n    if isinstance(divisor, torch.Tensor):\n        divisor = int(divisor.max())  # to int\n    return math.ceil(x / divisor) * divisor\n\n\ndef clean_str(s):\n    # Cleans a string by replacing special characters with underscore _\n    return re.sub(pattern='[|@#!¡·$€%&()=?¿^*;:,¨´><+]', repl='_', string=s)\n\n\ndef one_cycle(y1=0.0, y2=1.0, steps=100):\n    # lambda function for sinusoidal ramp from y1 to y2 https://arxiv.org/pdf/1812.01187.pdf\n    return lambda x: ((1 - math.cos(x * math.pi / steps)) / 2) * (y2 - y1) + y1\n\n\ndef colorstr(*input):\n    # Colors a string https://en.wikipedia.org/wiki/ANSI_escape_code, i.e.  
colorstr('blue', 'hello world')\n    *args, string = input if len(input) > 1 else ('blue', 'bold', input[0])  # color arguments, string\n    colors = {\n        'black': '\\033[30m',  # basic colors\n        'red': '\\033[31m',\n        'green': '\\033[32m',\n        'yellow': '\\033[33m',\n        'blue': '\\033[34m',\n        'magenta': '\\033[35m',\n        'cyan': '\\033[36m',\n        'white': '\\033[37m',\n        'bright_black': '\\033[90m',  # bright colors\n        'bright_red': '\\033[91m',\n        'bright_green': '\\033[92m',\n        'bright_yellow': '\\033[93m',\n        'bright_blue': '\\033[94m',\n        'bright_magenta': '\\033[95m',\n        'bright_cyan': '\\033[96m',\n        'bright_white': '\\033[97m',\n        'end': '\\033[0m',  # misc\n        'bold': '\\033[1m',\n        'underline': '\\033[4m'}\n    return ''.join(colors[x] for x in args) + f'{string}' + colors['end']\n\n\ndef labels_to_class_weights(labels, nc=80):\n    # Get class weights (inverse frequency) from training labels\n    if labels[0] is None:  # no labels loaded\n        return torch.Tensor()\n\n    labels = np.concatenate(labels, 0)  # labels.shape = (866643, 5) for COCO\n    classes = labels[:, 0].astype(int)  # labels = [class xywh]\n    weights = np.bincount(classes, minlength=nc)  # occurrences per class\n\n    # Prepend gridpoint count (for uCE training)\n    # gpi = ((320 / 32 * np.array([1, 2, 4])) ** 2 * 3).sum()  # gridpoints per image\n    # weights = np.hstack([gpi * len(labels)  - weights.sum() * 9, weights * 9]) ** 0.5  # prepend gridpoints to start\n\n    weights[weights == 0] = 1  # replace empty bins with 1\n    weights = 1 / weights  # number of targets per class\n    weights /= weights.sum()  # normalize\n    return torch.from_numpy(weights).float()\n\n\ndef labels_to_image_weights(labels, nc=80, class_weights=np.ones(80)):\n    # Produces image weights based on class_weights and image contents\n    # Usage: index = random.choices(range(n), 
weights=image_weights, k=1)  # weighted image sample\n    class_counts = np.array([np.bincount(x[:, 0].astype(int), minlength=nc) for x in labels])\n    return (class_weights.reshape(1, nc) * class_counts).sum(1)\n\n\ndef coco80_to_coco91_class():  # converts 80-index (val2014) to 91-index (paper)\n    # https://tech.amikelive.com/node-718/what-object-categories-labels-are-in-coco-dataset/\n    # a = np.loadtxt('data/coco.names', dtype='str', delimiter='\\n')\n    # b = np.loadtxt('data/coco_paper.names', dtype='str', delimiter='\\n')\n    # x1 = [list(a[i] == b).index(True) + 1 for i in range(80)]  # darknet to coco\n    # x2 = [list(b[i] == a).index(True) if any(b[i] == a) else None for i in range(91)]  # coco to darknet\n    return [\n        1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 27, 28, 31, 32, 33, 34,\n        35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63,\n        64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90]\n\n\ndef xyxy2xywh(x):\n    # Convert nx4 boxes from [x1, y1, x2, y2] to [x, y, w, h] where xy1=top-left, xy2=bottom-right\n    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)\n    y[..., 0] = (x[..., 0] + x[..., 2]) / 2  # x center\n    y[..., 1] = (x[..., 1] + x[..., 3]) / 2  # y center\n    y[..., 2] = x[..., 2] - x[..., 0]  # width\n    y[..., 3] = x[..., 3] - x[..., 1]  # height\n    return y\n\n\ndef xywh2xyxy(x):\n    # Convert nx4 boxes from [x, y, w, h] to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right\n    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)\n    y[..., 0] = x[..., 0] - x[..., 2] / 2  # top left x\n    y[..., 1] = x[..., 1] - x[..., 3] / 2  # top left y\n    y[..., 2] = x[..., 0] + x[..., 2] / 2  # bottom right x\n    y[..., 3] = x[..., 1] + x[..., 3] / 2  # bottom right y\n    return y\n\n\ndef xywhn2xyxy(x, w=640, h=640, padw=0, padh=0):\n    # 
Convert nx4 boxes from [x, y, w, h] normalized to [x1, y1, x2, y2] where xy1=top-left, xy2=bottom-right\n    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)\n    y[..., 0] = w * (x[..., 0] - x[..., 2] / 2) + padw  # top left x\n    y[..., 1] = h * (x[..., 1] - x[..., 3] / 2) + padh  # top left y\n    y[..., 2] = w * (x[..., 0] + x[..., 2] / 2) + padw  # bottom right x\n    y[..., 3] = h * (x[..., 1] + x[..., 3] / 2) + padh  # bottom right y\n    return y\n\n\ndef xyxy2xywhn(x, w=640, h=640, clip=False, eps=0.0):\n    # Convert nx4 boxes from [x1, y1, x2, y2] to [x, y, w, h] normalized where xy1=top-left, xy2=bottom-right\n    if clip:\n        clip_boxes(x, (h - eps, w - eps))  # warning: inplace clip\n    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)\n    y[..., 0] = ((x[..., 0] + x[..., 2]) / 2) / w  # x center\n    y[..., 1] = ((x[..., 1] + x[..., 3]) / 2) / h  # y center\n    y[..., 2] = (x[..., 2] - x[..., 0]) / w  # width\n    y[..., 3] = (x[..., 3] - x[..., 1]) / h  # height\n    return y\n\n\ndef xyn2xy(x, w=640, h=640, padw=0, padh=0):\n    # Convert normalized segments into pixel segments, shape (n,2)\n    y = x.clone() if isinstance(x, torch.Tensor) else np.copy(x)\n    y[..., 0] = w * x[..., 0] + padw  # x in pixels\n    y[..., 1] = h * x[..., 1] + padh  # y in pixels\n    return y\n\n\ndef segment2box(segment, width=640, height=640):\n    # Convert 1 segment label to 1 box label, applying inside-image constraint, i.e. (xy1, xy2, ...) to (xyxy)\n    x, y = segment.T  # segment xy\n    inside = (x >= 0) & (y >= 0) & (x <= width) & (y <= height)\n    x, y = x[inside], y[inside]\n    return np.array([x.min(), y.min(), x.max(), y.max()]) if any(x) else np.zeros((1, 4))  # xyxy\n\n\ndef segments2boxes(segments):\n    # Convert segment labels to box labels, i.e. (cls, xy1, xy2, ...) 
to (cls, xywh)\n    boxes = []\n    for s in segments:\n        x, y = s.T  # segment xy\n        boxes.append([x.min(), y.min(), x.max(), y.max()])  # cls, xyxy\n    return xyxy2xywh(np.array(boxes))  # cls, xywh\n\n\ndef resample_segments(segments, n=1000):\n    # Up-sample an (n,2) segment\n    for i, s in enumerate(segments):\n        s = np.concatenate((s, s[0:1, :]), axis=0)\n        x = np.linspace(0, len(s) - 1, n)\n        xp = np.arange(len(s))\n        segments[i] = np.concatenate([np.interp(x, xp, s[:, i]) for i in range(2)]).reshape(2, -1).T  # segment xy\n    return segments\n\n\ndef scale_boxes(img1_shape, boxes, img0_shape, ratio_pad=None):\n    # Rescale boxes (xyxy) from img1_shape to img0_shape\n    if ratio_pad is None:  # calculate from img0_shape\n        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain  = old / new\n        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding\n    else:\n        gain = ratio_pad[0][0]\n        pad = ratio_pad[1]\n\n    boxes[..., [0, 2]] -= pad[0]  # x padding\n    boxes[..., [1, 3]] -= pad[1]  # y padding\n    boxes[..., :4] /= gain\n    clip_boxes(boxes, img0_shape)\n    return boxes\n\n\ndef scale_segments(img1_shape, segments, img0_shape, ratio_pad=None, normalize=False):\n    # Rescale coords (xyxy) from img1_shape to img0_shape\n    if ratio_pad is None:  # calculate from img0_shape\n        gain = min(img1_shape[0] / img0_shape[0], img1_shape[1] / img0_shape[1])  # gain  = old / new\n        pad = (img1_shape[1] - img0_shape[1] * gain) / 2, (img1_shape[0] - img0_shape[0] * gain) / 2  # wh padding\n    else:\n        gain = ratio_pad[0][0]\n        pad = ratio_pad[1]\n\n    segments[:, 0] -= pad[0]  # x padding\n    segments[:, 1] -= pad[1]  # y padding\n    segments /= gain\n    clip_segments(segments, img0_shape)\n    if normalize:\n        segments[:, 0] /= img0_shape[1]  # width\n        segments[:, 1] /= 
img0_shape[0]  # height\n    return segments\n\n\ndef clip_boxes(boxes, shape):\n    # Clip boxes (xyxy) to image shape (height, width)\n    if isinstance(boxes, torch.Tensor):  # faster individually\n        boxes[..., 0].clamp_(0, shape[1])  # x1\n        boxes[..., 1].clamp_(0, shape[0])  # y1\n        boxes[..., 2].clamp_(0, shape[1])  # x2\n        boxes[..., 3].clamp_(0, shape[0])  # y2\n    else:  # np.array (faster grouped)\n        boxes[..., [0, 2]] = boxes[..., [0, 2]].clip(0, shape[1])  # x1, x2\n        boxes[..., [1, 3]] = boxes[..., [1, 3]].clip(0, shape[0])  # y1, y2\n\n\ndef clip_segments(segments, shape):\n    # Clip segments (xy1,xy2,...) to image shape (height, width)\n    if isinstance(segments, torch.Tensor):  # faster individually\n        segments[:, 0].clamp_(0, shape[1])  # x\n        segments[:, 1].clamp_(0, shape[0])  # y\n    else:  # np.array (faster grouped)\n        segments[:, 0] = segments[:, 0].clip(0, shape[1])  # x\n        segments[:, 1] = segments[:, 1].clip(0, shape[0])  # y\n\n\ndef non_max_suppression(\n        prediction,\n        conf_thres=0.25,\n        iou_thres=0.45,\n        classes=None,\n        agnostic=False,\n        multi_label=False,\n        labels=(),\n        max_det=300,\n        nm=0,  # number of masks\n):\n    \"\"\"Non-Maximum Suppression (NMS) on inference results to reject overlapping detections\n\n    Returns:\n         list of detections, one (n, 6 + nm) tensor per image [xyxy, conf, cls, masks]\n    \"\"\"\n\n    # Checks\n    assert 0 <= conf_thres <= 1, f'Invalid Confidence threshold {conf_thres}, valid values are between 0.0 and 1.0'\n    assert 0 <= iou_thres <= 1, f'Invalid IoU {iou_thres}, valid values are between 0.0 and 1.0'\n    if isinstance(prediction, (list, tuple)):  # YOLOv5 model in validation mode, output = (inference_out, loss_out)\n        prediction = prediction[0]  # select only inference output\n\n    device = prediction.device\n    mps = 'mps' in device.type  # Apple MPS\n    if mps:  # 
MPS not fully supported yet, convert tensors to CPU before NMS\n        prediction = prediction.cpu()\n    bs = prediction.shape[0]  # batch size\n    nc = prediction.shape[2] - nm - 5  # number of classes\n    xc = prediction[..., 4] > conf_thres  # candidates\n\n    # Settings\n    # min_wh = 2  # (pixels) minimum box width and height\n    max_wh = 7680  # (pixels) maximum box width and height\n    max_nms = 30000  # maximum number of boxes into torchvision.ops.nms()\n    time_limit = 0.5 + 0.05 * bs  # seconds to quit after\n    redundant = True  # require redundant detections\n    multi_label &= nc > 1  # multiple labels per box (adds 0.5ms/img)\n    merge = False  # use merge-NMS\n\n    t = time.time()\n    mi = 5 + nc  # mask start index\n    output = [torch.zeros((0, 6 + nm), device=prediction.device)] * bs\n    for xi, x in enumerate(prediction):  # image index, image inference\n        # Apply constraints\n        # x[((x[..., 2:4] < min_wh) | (x[..., 2:4] > max_wh)).any(1), 4] = 0  # width-height\n        x = x[xc[xi]]  # confidence\n\n        # Cat apriori labels if autolabelling\n        if labels and len(labels[xi]):\n            lb = labels[xi]\n            v = torch.zeros((len(lb), nc + nm + 5), device=x.device)\n            v[:, :4] = lb[:, 1:5]  # box\n            v[:, 4] = 1.0  # conf\n            v[range(len(lb)), lb[:, 0].long() + 5] = 1.0  # cls\n            x = torch.cat((x, v), 0)\n\n        # If none remain process next image\n        if not x.shape[0]:\n            continue\n\n        # Compute conf\n        x[:, 5:] *= x[:, 4:5]  # conf = obj_conf * cls_conf\n\n        # Box/Mask\n        box = xywh2xyxy(x[:, :4])  # (center_x, center_y, width, height) to (x1, y1, x2, y2)\n        mask = x[:, mi:]  # zero columns if no masks\n\n        # Detections matrix nx6 (xyxy, conf, cls)\n        if multi_label:\n            i, j = (x[:, 5:mi] > conf_thres).nonzero(as_tuple=False).T\n            x = torch.cat((box[i], x[i, 5 + j, None], j[:, 
None].float(), mask[i]), 1)\n        else:  # best class only\n            conf, j = x[:, 5:mi].max(1, keepdim=True)\n            x = torch.cat((box, conf, j.float(), mask), 1)[conf.view(-1) > conf_thres]\n\n        # Filter by class\n        if classes is not None:\n            x = x[(x[:, 5:6] == torch.tensor(classes, device=x.device)).any(1)]\n\n        # Apply finite constraint\n        # if not torch.isfinite(x).all():\n        #     x = x[torch.isfinite(x).all(1)]\n\n        # Check shape\n        n = x.shape[0]  # number of boxes\n        if not n:  # no boxes\n            continue\n        x = x[x[:, 4].argsort(descending=True)[:max_nms]]  # sort by confidence and remove excess boxes\n\n        # Batched NMS\n        c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes\n        boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores\n        i = torchvision.ops.nms(boxes, scores, iou_thres)  # NMS\n        i = i[:max_det]  # limit detections\n        if merge and (1 < n < 3E3):  # Merge NMS (boxes merged using weighted mean)\n            # update boxes as boxes(i,4) = weights(i,n) * boxes(n,4)\n            iou = box_iou(boxes[i], boxes) > iou_thres  # iou matrix\n            weights = iou * scores[None]  # box weights\n            x[i, :4] = torch.mm(weights, x[:, :4]).float() / weights.sum(1, keepdim=True)  # merged boxes\n            if redundant:\n                i = i[iou.sum(1) > 1]  # require redundancy\n\n        output[xi] = x[i]\n        if mps:\n            output[xi] = output[xi].to(device)\n        if (time.time() - t) > time_limit:\n            LOGGER.warning(f'WARNING ⚠️ NMS time limit {time_limit:.3f}s exceeded')\n            break  # time limit exceeded\n\n    return output\n\n\ndef strip_optimizer(f='best.pt', s=''):  # from utils.general import *; strip_optimizer()\n    # Strip optimizer from 'f' to finalize training, optionally save as 's'\n    x = torch.load(f, map_location=torch.device('cpu'))\n    if x.get('ema'):\n  
      x['model'] = x['ema']  # replace model with ema\n    for k in 'optimizer', 'best_fitness', 'ema', 'updates':  # keys\n        x[k] = None\n    x['epoch'] = -1\n    x['model'].half()  # to FP16\n    for p in x['model'].parameters():\n        p.requires_grad = False\n    torch.save(x, s or f)\n    mb = os.path.getsize(s or f) / 1E6  # filesize\n    LOGGER.info(f\"Optimizer stripped from {f},{f' saved as {s},' if s else ''} {mb:.1f}MB\")\n\n\ndef print_mutation(keys, results, hyp, save_dir, bucket, prefix=colorstr('evolve: ')):\n    evolve_csv = save_dir / 'evolve.csv'\n    evolve_yaml = save_dir / 'hyp_evolve.yaml'\n    keys = tuple(keys) + tuple(hyp.keys())  # [results + hyps]\n    keys = tuple(x.strip() for x in keys)\n    vals = results + tuple(hyp.values())\n    n = len(keys)\n\n    # Download (optional)\n    if bucket:\n        url = f'gs://{bucket}/evolve.csv'\n        if gsutil_getsize(url) > (evolve_csv.stat().st_size if evolve_csv.exists() else 0):\n            subprocess.run(['gsutil', 'cp', f'{url}', f'{save_dir}'])  # download evolve.csv if larger than local\n\n    # Log to evolve.csv\n    s = '' if evolve_csv.exists() else (('%20s,' * n % keys).rstrip(',') + '\\n')  # add header\n    with open(evolve_csv, 'a') as f:\n        f.write(s + ('%20.5g,' * n % vals).rstrip(',') + '\\n')\n\n    # Save yaml\n    with open(evolve_yaml, 'w') as f:\n        data = pd.read_csv(evolve_csv, skipinitialspace=True)\n        data = data.rename(columns=lambda x: x.strip())  # strip keys\n        i = np.argmax(fitness(data.values[:, :4]))  #\n        generations = len(data)\n        f.write('# YOLOv5 Hyperparameter Evolution Results\\n' + f'# Best generation: {i}\\n' +\n                f'# Last generation: {generations - 1}\\n' + '# ' + ', '.join(f'{x.strip():>20s}' for x in keys[:7]) +\n                '\\n' + '# ' + ', '.join(f'{x:>20.5g}' for x in data.values[i, :7]) + '\\n\\n')\n        yaml.safe_dump(data.loc[i][7:].to_dict(), f, sort_keys=False)\n\n    # Print 
to screen\n    LOGGER.info(prefix + f'{generations} generations finished, current result:\\n' + prefix +\n                ', '.join(f'{x.strip():>20s}' for x in keys) + '\\n' + prefix + ', '.join(f'{x:20.5g}'\n                                                                                         for x in vals) + '\\n\\n')\n\n    if bucket:\n        subprocess.run(['gsutil', 'cp', f'{evolve_csv}', f'{evolve_yaml}', f'gs://{bucket}'])  # upload\n\n\ndef apply_classifier(x, model, img, im0):\n    # Apply a second stage classifier to YOLO outputs\n    # Example model = torchvision.models.__dict__['efficientnet_b0'](pretrained=True).to(device).eval()\n    im0 = [im0] if isinstance(im0, np.ndarray) else im0\n    for i, d in enumerate(x):  # per image\n        if d is not None and len(d):\n            d = d.clone()\n\n            # Reshape and pad cutouts\n            b = xyxy2xywh(d[:, :4])  # boxes\n            b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1)  # rectangle to square\n            b[:, 2:] = b[:, 2:] * 1.3 + 30  # pad\n            d[:, :4] = xywh2xyxy(b).long()\n\n            # Rescale boxes from img_size to im0 size\n            scale_boxes(img.shape[2:], d[:, :4], im0[i].shape)\n\n            # Classes\n            pred_cls1 = d[:, 5].long()\n            ims = []\n            for a in d:\n                cutout = im0[i][int(a[1]):int(a[3]), int(a[0]):int(a[2])]\n                im = cv2.resize(cutout, (224, 224))  # BGR\n\n                im = im[:, :, ::-1].transpose(2, 0, 1)  # BGR to RGB, to 3x224x224\n                im = np.ascontiguousarray(im, dtype=np.float32)  # uint8 to float32\n                im /= 255  # 0 - 255 to 0.0 - 1.0\n                ims.append(im)\n\n            pred_cls2 = model(torch.Tensor(ims).to(d.device)).argmax(1)  # classifier prediction\n            x[i] = x[i][pred_cls1 == pred_cls2]  # retain matching class detections\n\n    return x\n\n\ndef increment_path(path, exist_ok=False, sep='', mkdir=False):\n    # Increment file or 
directory path, i.e. runs/exp --> runs/exp{sep}2, runs/exp{sep}3, ... etc.\n    path = Path(path)  # os-agnostic\n    if path.exists() and not exist_ok:\n        path, suffix = (path.with_suffix(''), path.suffix) if path.is_file() else (path, '')\n\n        # Method 1\n        for n in range(2, 9999):\n            p = f'{path}{sep}{n}{suffix}'  # increment path\n            if not os.path.exists(p):  #\n                break\n        path = Path(p)\n\n        # Method 2 (deprecated)\n        # dirs = glob.glob(f\"{path}{sep}*\")  # similar paths\n        # matches = [re.search(rf\"{path.stem}{sep}(\\d+)\", d) for d in dirs]\n        # i = [int(m.groups()[0]) for m in matches if m]  # indices\n        # n = max(i) + 1 if i else 2  # increment number\n        # path = Path(f\"{path}{sep}{n}{suffix}\")  # increment path\n\n    if mkdir:\n        path.mkdir(parents=True, exist_ok=True)  # make directory\n\n    return path\n\n\n# OpenCV Multilanguage-friendly functions ------------------------------------------------------------------------------------\nimshow_ = cv2.imshow  # copy to avoid recursion errors\n\n\ndef imread(path, flags=cv2.IMREAD_COLOR):\n    return cv2.imdecode(np.fromfile(path, np.uint8), flags)\n\n\ndef imwrite(path, im):\n    try:\n        cv2.imencode(Path(path).suffix, im)[1].tofile(path)\n        return True\n    except Exception:\n        return False\n\n\ndef imshow(path, im):\n    imshow_(path.encode('unicode_escape').decode(), im)\n\n\ncv2.imread, cv2.imwrite, cv2.imshow = imread, imwrite, imshow  # redefine\n\n# Variables ------------------------------------------------------------------------------------------------------------\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/google_app_engine/Dockerfile",
    "content": "FROM gcr.io/google-appengine/python\n\n# Create a virtualenv for dependencies. This isolates these packages from\n# system-level packages.\n# Use -p python3 or -p python3.7 to select python version. Default is version 2.\nRUN virtualenv /env -p python3\n\n# Setting these environment variables are the same as running\n# source /env/bin/activate.\nENV VIRTUAL_ENV /env\nENV PATH /env/bin:$PATH\n\nRUN apt-get update && apt-get install -y python-opencv\n\n# Copy the application's requirements.txt and run pip to install all\n# dependencies into the virtualenv.\nADD requirements.txt /app/requirements.txt\nRUN pip install -r /app/requirements.txt\n\n# Add the application source code.\nADD . /app\n\n# Run a WSGI server to serve the application. gunicorn must be declared as\n# a dependency in requirements.txt.\nCMD gunicorn -b :$PORT main:app\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/google_app_engine/additional_requirements.txt",
    "content": "# add these requirements in your app on top of the existing ones\npip==21.1\nFlask==1.0.2\ngunicorn==19.10.0\nwerkzeug>=2.2.3 # not directly required, pinned by Snyk to avoid a vulnerability\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/google_app_engine/app.yaml",
    "content": "runtime: custom\nenv: flex\n\nservice: yolov5app\n\nliveness_check:\n  initial_delay_sec: 600\n\nmanual_scaling:\n  instances: 1\nresources:\n  cpu: 1\n  memory_gb: 4\n  disk_size_gb: 20\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loggers/__init__.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nLogging utils\n\"\"\"\n\nimport os\nimport warnings\nfrom pathlib import Path\n\nimport pkg_resources as pkg\nimport torch\nfrom torch.utils.tensorboard import SummaryWriter\n\nfrom utils.general import LOGGER, colorstr, cv2\nfrom utils.loggers.clearml.clearml_utils import ClearmlLogger\nfrom utils.loggers.wandb.wandb_utils import WandbLogger\nfrom utils.plots import plot_images, plot_labels, plot_results\nfrom utils.torch_utils import de_parallel\n\nLOGGERS = ('csv', 'tb', 'wandb', 'clearml', 'comet')  # *.csv, TensorBoard, Weights & Biases, ClearML\nRANK = int(os.getenv('RANK', -1))\n\ntry:\n    import wandb\n\n    assert hasattr(wandb, '__version__')  # verify package import not local dir\n    if pkg.parse_version(wandb.__version__) >= pkg.parse_version('0.12.2') and RANK in {0, -1}:\n        try:\n            wandb_login_success = wandb.login(timeout=30)\n        except wandb.errors.UsageError:  # known non-TTY terminal issue\n            wandb_login_success = False\n        if not wandb_login_success:\n            wandb = None\nexcept (ImportError, AssertionError):\n    wandb = None\n\ntry:\n    import clearml\n\n    assert hasattr(clearml, '__version__')  # verify package import not local dir\nexcept (ImportError, AssertionError):\n    clearml = None\n\ntry:\n    if RANK not in [0, -1]:\n        comet_ml = None\n    else:\n        import comet_ml\n\n        assert hasattr(comet_ml, '__version__')  # verify package import not local dir\n        from utils.loggers.comet import CometLogger\n\nexcept (ModuleNotFoundError, ImportError, AssertionError):\n    comet_ml = None\n\n\nclass Loggers():\n    # YOLOv5 Loggers class\n    def __init__(self, save_dir=None, weights=None, opt=None, hyp=None, logger=None, include=LOGGERS):\n        self.save_dir = save_dir\n        self.weights = weights\n        self.opt = opt\n        self.hyp = hyp\n        self.plots = not opt.noplots  # plot results\n        
self.logger = logger  # for printing results to console\n        self.include = include\n        self.keys = [\n            'train/box_loss',\n            'train/obj_loss',\n            'train/cls_loss',  # train loss\n            'metrics/precision',\n            'metrics/recall',\n            'metrics/mAP_0.5',\n            'metrics/mAP_0.5:0.95',  # metrics\n            'val/box_loss',\n            'val/obj_loss',\n            'val/cls_loss',  # val loss\n            'x/lr0',\n            'x/lr1',\n            'x/lr2']  # params\n        self.best_keys = ['best/epoch', 'best/precision', 'best/recall', 'best/mAP_0.5', 'best/mAP_0.5:0.95']\n        for k in LOGGERS:\n            setattr(self, k, None)  # init empty logger dictionary\n        self.csv = True  # always log to csv\n\n        # Messages\n        if not clearml:\n            prefix = colorstr('ClearML: ')\n            s = f\"{prefix}run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5 🚀 in ClearML\"\n            self.logger.info(s)\n        if not comet_ml:\n            prefix = colorstr('Comet: ')\n            s = f\"{prefix}run 'pip install comet_ml' to automatically track and visualize YOLOv5 🚀 runs in Comet\"\n            self.logger.info(s)\n        # TensorBoard\n        s = self.save_dir\n        if 'tb' in self.include and not self.opt.evolve:\n            prefix = colorstr('TensorBoard: ')\n            self.logger.info(f\"{prefix}Start with 'tensorboard --logdir {s.parent}', view at http://localhost:6006/\")\n            self.tb = SummaryWriter(str(s))\n\n        # W&B\n        if wandb and 'wandb' in self.include:\n            self.opt.hyp = self.hyp  # add hyperparameters\n            self.wandb = WandbLogger(self.opt)\n        else:\n            self.wandb = None\n\n        # ClearML\n        if clearml and 'clearml' in self.include:\n            try:\n                self.clearml = ClearmlLogger(self.opt, self.hyp)\n            except Exception:\n           
     self.clearml = None\n                prefix = colorstr('ClearML: ')\n                LOGGER.warning(f'{prefix}WARNING ⚠️ ClearML is installed but not configured, skipping ClearML logging.'\n                               f' See https://github.com/ultralytics/yolov5/tree/master/utils/loggers/clearml#readme')\n\n        else:\n            self.clearml = None\n\n        # Comet\n        if comet_ml and 'comet' in self.include:\n            if isinstance(self.opt.resume, str) and self.opt.resume.startswith('comet://'):\n                run_id = self.opt.resume.split('/')[-1]\n                self.comet_logger = CometLogger(self.opt, self.hyp, run_id=run_id)\n\n            else:\n                self.comet_logger = CometLogger(self.opt, self.hyp)\n\n        else:\n            self.comet_logger = None\n\n    @property\n    def remote_dataset(self):\n        # Get data_dict if custom dataset artifact link is provided\n        data_dict = None\n        if self.clearml:\n            data_dict = self.clearml.data_dict\n        if self.wandb:\n            data_dict = self.wandb.data_dict\n        if self.comet_logger:\n            data_dict = self.comet_logger.data_dict\n\n        return data_dict\n\n    def on_train_start(self):\n        if self.comet_logger:\n            self.comet_logger.on_train_start()\n\n    def on_pretrain_routine_start(self):\n        if self.comet_logger:\n            self.comet_logger.on_pretrain_routine_start()\n\n    def on_pretrain_routine_end(self, labels, names):\n        # Callback runs on pre-train routine end\n        if self.plots:\n            plot_labels(labels, names, self.save_dir)\n            paths = self.save_dir.glob('*labels*.jpg')  # training labels\n            if self.wandb:\n                self.wandb.log({'Labels': [wandb.Image(str(x), caption=x.name) for x in paths]})\n            # if self.clearml:\n            #    pass  # ClearML saves these images automatically using hooks\n            if self.comet_logger:\n         
       self.comet_logger.on_pretrain_routine_end(paths)\n\n    def on_train_batch_end(self, model, ni, imgs, targets, paths, vals):\n        log_dict = dict(zip(self.keys[:3], vals))\n        # Callback runs on train batch end\n        # ni: number integrated batches (since train start)\n        if self.plots:\n            if ni < 3:\n                f = self.save_dir / f'train_batch{ni}.jpg'  # filename\n                plot_images(imgs, targets, paths, f)\n                if ni == 0 and self.tb and not self.opt.sync_bn:\n                    log_tensorboard_graph(self.tb, model, imgsz=(self.opt.imgsz, self.opt.imgsz))\n            if ni == 10 and (self.wandb or self.clearml):\n                files = sorted(self.save_dir.glob('train*.jpg'))\n                if self.wandb:\n                    self.wandb.log({'Mosaics': [wandb.Image(str(f), caption=f.name) for f in files if f.exists()]})\n                if self.clearml:\n                    self.clearml.log_debug_samples(files, title='Mosaics')\n\n        if self.comet_logger:\n            self.comet_logger.on_train_batch_end(log_dict, step=ni)\n\n    def on_train_epoch_end(self, epoch):\n        # Callback runs on train epoch end\n        if self.wandb:\n            self.wandb.current_epoch = epoch + 1\n\n        if self.comet_logger:\n            self.comet_logger.on_train_epoch_end(epoch)\n\n    def on_val_start(self):\n        if self.comet_logger:\n            self.comet_logger.on_val_start()\n\n    def on_val_image_end(self, pred, predn, path, names, im):\n        # Callback runs on val image end\n        if self.wandb:\n            self.wandb.val_one_image(pred, predn, path, names, im)\n        if self.clearml:\n            self.clearml.log_image_with_boxes(path, pred, names, im)\n\n    def on_val_batch_end(self, batch_i, im, targets, paths, shapes, out):\n        if self.comet_logger:\n            self.comet_logger.on_val_batch_end(batch_i, im, targets, paths, shapes, out)\n\n    def on_val_end(self, nt, 
tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix):\n        # Callback runs on val end\n        if self.wandb or self.clearml:\n            files = sorted(self.save_dir.glob('val*.jpg'))\n        if self.wandb:\n            self.wandb.log({'Validation': [wandb.Image(str(f), caption=f.name) for f in files]})\n        if self.clearml:\n            self.clearml.log_debug_samples(files, title='Validation')\n\n        if self.comet_logger:\n            self.comet_logger.on_val_end(nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix)\n\n    def on_fit_epoch_end(self, vals, epoch, best_fitness, fi):\n        # Callback runs at the end of each fit (train+val) epoch\n        x = dict(zip(self.keys, vals))\n        if self.csv:\n            file = self.save_dir / 'results.csv'\n            n = len(x) + 1  # number of cols\n            s = '' if file.exists() else (('%20s,' * n % tuple(['epoch'] + self.keys)).rstrip(',') + '\\n')  # add header\n            with open(file, 'a') as f:\n                f.write(s + ('%20.5g,' * n % tuple([epoch] + vals)).rstrip(',') + '\\n')\n\n        if self.tb:\n            for k, v in x.items():\n                self.tb.add_scalar(k, v, epoch)\n        elif self.clearml:  # log to ClearML if TensorBoard not used\n            for k, v in x.items():\n                title, series = k.split('/')\n                self.clearml.task.get_logger().report_scalar(title, series, v, epoch)\n\n        if self.wandb:\n            if best_fitness == fi:\n                best_results = [epoch] + vals[3:7]\n                for i, name in enumerate(self.best_keys):\n                    self.wandb.wandb_run.summary[name] = best_results[i]  # log best results in the summary\n            self.wandb.log(x)\n            self.wandb.end_epoch()\n\n        if self.clearml:\n            self.clearml.current_epoch_logged_images = set()  # reset epoch image limit\n            self.clearml.current_epoch += 1\n\n        if self.comet_logger:\n            
self.comet_logger.on_fit_epoch_end(x, epoch=epoch)\n\n    def on_model_save(self, last, epoch, final_epoch, best_fitness, fi):\n        # Callback runs on model save event\n        if (epoch + 1) % self.opt.save_period == 0 and not final_epoch and self.opt.save_period != -1:\n            if self.wandb:\n                self.wandb.log_model(last.parent, self.opt, epoch, fi, best_model=best_fitness == fi)\n            if self.clearml:\n                self.clearml.task.update_output_model(model_path=str(last),\n                                                      model_name='Latest Model',\n                                                      auto_delete_file=False)\n\n        if self.comet_logger:\n            self.comet_logger.on_model_save(last, epoch, final_epoch, best_fitness, fi)\n\n    def on_train_end(self, last, best, epoch, results):\n        # Callback runs on training end, i.e. saving best model\n        if self.plots:\n            plot_results(file=self.save_dir / 'results.csv')  # save results.png\n        files = ['results.png', 'confusion_matrix.png', *(f'{x}_curve.png' for x in ('F1', 'PR', 'P', 'R'))]\n        files = [(self.save_dir / f) for f in files if (self.save_dir / f).exists()]  # filter\n        self.logger.info(f\"Results saved to {colorstr('bold', self.save_dir)}\")\n\n        if self.tb and not self.clearml:  # These images are already captured by ClearML by now, we don't want doubles\n            for f in files:\n                self.tb.add_image(f.stem, cv2.imread(str(f))[..., ::-1], epoch, dataformats='HWC')\n\n        if self.wandb:\n            self.wandb.log(dict(zip(self.keys[3:10], results)))\n            self.wandb.log({'Results': [wandb.Image(str(f), caption=f.name) for f in files]})\n            # Calling wandb.log. 
TODO: Refactor this into WandbLogger.log_model\n            if not self.opt.evolve:\n                wandb.log_artifact(str(best if best.exists() else last),\n                                   type='model',\n                                   name=f'run_{self.wandb.wandb_run.id}_model',\n                                   aliases=['latest', 'best', 'stripped'])\n            self.wandb.finish_run()\n\n        if self.clearml and not self.opt.evolve:\n            self.clearml.task.update_output_model(model_path=str(best if best.exists() else last),\n                                                  name='Best Model',\n                                                  auto_delete_file=False)\n\n        if self.comet_logger:\n            final_results = dict(zip(self.keys[3:10], results))\n            self.comet_logger.on_train_end(files, self.save_dir, last, best, epoch, final_results)\n\n    def on_params_update(self, params: dict):\n        # Update hyperparams or configs of the experiment\n        if self.wandb:\n            self.wandb.wandb_run.config.update(params, allow_val_change=True)\n        if self.comet_logger:\n            self.comet_logger.on_params_update(params)\n\n\nclass GenericLogger:\n    \"\"\"\n    YOLOv5 General purpose logger for non-task specific logging\n    Usage: from utils.loggers import GenericLogger; logger = GenericLogger(...)\n    Arguments\n        opt:             Run arguments\n        console_logger:  Console logger\n        include:         loggers to include\n    \"\"\"\n\n    def __init__(self, opt, console_logger, include=('tb', 'wandb')):\n        # init default loggers\n        self.save_dir = Path(opt.save_dir)\n        self.include = include\n        self.console_logger = console_logger\n        self.csv = self.save_dir / 'results.csv'  # CSV logger\n        if 'tb' in self.include:\n            prefix = colorstr('TensorBoard: ')\n            self.console_logger.info(\n                f\"{prefix}Start with 'tensorboard 
--logdir {self.save_dir.parent}', view at http://localhost:6006/\")\n            self.tb = SummaryWriter(str(self.save_dir))\n        else:\n            self.tb = None  # keep attribute defined so later 'if self.tb' checks do not raise AttributeError\n\n        if wandb and 'wandb' in self.include:\n            self.wandb = wandb.init(project=web_project_name(str(opt.project)),\n                                    name=None if opt.name == 'exp' else opt.name,\n                                    config=opt)\n        else:\n            self.wandb = None\n\n    def log_metrics(self, metrics, epoch):\n        # Log metrics dictionary to all loggers\n        if self.csv:\n            keys, vals = list(metrics.keys()), list(metrics.values())\n            n = len(metrics) + 1  # number of cols\n            s = '' if self.csv.exists() else (('%23s,' * n % tuple(['epoch'] + keys)).rstrip(',') + '\\n')  # header\n            with open(self.csv, 'a') as f:\n                f.write(s + ('%23.5g,' * n % tuple([epoch] + vals)).rstrip(',') + '\\n')\n\n        if self.tb:\n            for k, v in metrics.items():\n                self.tb.add_scalar(k, v, epoch)\n\n        if self.wandb:\n            self.wandb.log(metrics, step=epoch)\n\n    def log_images(self, files, name='Images', epoch=0):\n        # Log images to all loggers\n        files = [Path(f) for f in (files if isinstance(files, (tuple, list)) else [files])]  # to Path\n        files = [f for f in files if f.exists()]  # filter by exists\n\n        if self.tb:\n            for f in files:\n                self.tb.add_image(f.stem, cv2.imread(str(f))[..., ::-1], epoch, dataformats='HWC')\n\n        if self.wandb:\n            self.wandb.log({name: [wandb.Image(str(f), caption=f.name) for f in files]}, step=epoch)\n\n    def log_graph(self, model, imgsz=(640, 640)):\n        # Log model graph to all loggers\n        if self.tb:\n            log_tensorboard_graph(self.tb, model, imgsz)\n\n    def log_model(self, model_path, epoch=0, metadata={}):\n        # Log model to all loggers\n        if self.wandb:\n            art = 
wandb.Artifact(name=f'run_{wandb.run.id}_model', type='model', metadata=metadata)\n            art.add_file(str(model_path))\n            wandb.log_artifact(art)\n\n    def update_params(self, params):\n        # Update the parameters logged\n        if self.wandb:\n            wandb.run.config.update(params, allow_val_change=True)\n\n\ndef log_tensorboard_graph(tb, model, imgsz=(640, 640)):\n    # Log model graph to TensorBoard\n    try:\n        p = next(model.parameters())  # for device, type\n        imgsz = (imgsz, imgsz) if isinstance(imgsz, int) else imgsz  # expand\n        im = torch.zeros((1, 3, *imgsz)).to(p.device).type_as(p)  # input image (WARNING: must be zeros, not empty)\n        with warnings.catch_warnings():\n            warnings.simplefilter('ignore')  # suppress jit trace warning\n            tb.add_graph(torch.jit.trace(de_parallel(model), im, strict=False), [])\n    except Exception as e:\n        LOGGER.warning(f'WARNING ⚠️ TensorBoard graph visualization failure {e}')\n\n\ndef web_project_name(project):\n    # Convert local project name to web project name\n    if not project.startswith('runs/train'):\n        return project\n    suffix = '-Classify' if project.endswith('-cls') else '-Segment' if project.endswith('-seg') else ''\n    return f'YOLOv5{suffix}'\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loggers/clearml/README.md",
    "content": "# ClearML Integration\n\n<img align=\"center\" src=\"https://github.com/thepycoder/clearml_screenshots/raw/main/logos_dark.png#gh-light-mode-only\" alt=\"Clear|ML\"><img align=\"center\" src=\"https://github.com/thepycoder/clearml_screenshots/raw/main/logos_light.png#gh-dark-mode-only\" alt=\"Clear|ML\">\n\n## About ClearML\n\n[ClearML](https://cutt.ly/yolov5-tutorial-clearml) is an [open-source](https://github.com/allegroai/clearml) toolbox designed to save you time ⏱️.\n\n🔨 Track every YOLOv5 training run in the <b>experiment manager</b>\n\n🔧 Version and easily access your custom training data with the integrated ClearML <b>Data Versioning Tool</b>\n\n🔦 <b>Remotely train and monitor</b> your YOLOv5 training runs using ClearML Agent\n\n🔬 Get the very best mAP using ClearML <b>Hyperparameter Optimization</b>\n\n🔭 Turn your newly trained <b>YOLOv5 model into an API</b> with just a few commands using ClearML Serving\n\n<br />\nAnd so much more. It's up to you how many of these tools you want to use, you can stick to the experiment manager, or chain them all together into an impressive pipeline!\n<br />\n<br />\n\n![ClearML scalars dashboard](https://github.com/thepycoder/clearml_screenshots/raw/main/experiment_manager_with_compare.gif)\n\n<br />\n<br />\n\n## 🦾 Setting Things Up\n\nTo keep track of your experiments and/or data, ClearML needs to communicate to a server. You have 2 options to get one:\n\nEither sign up for free to the [ClearML Hosted Service](https://cutt.ly/yolov5-tutorial-clearml) or you can set up your own server, see [here](https://clear.ml/docs/latest/docs/deploying_clearml/clearml_server). Even the server is open-source, so even if you're dealing with sensitive data, you should be good to go!\n\n1. Install the `clearml` python package:\n\n   ```bash\n   pip install clearml\n   ```\n\n1. 
Connect the ClearML SDK to the server by [creating credentials](https://app.clear.ml/settings/workspace-configuration) (in the top right, go to Settings -> Workspace -> Create new credentials), then execute the command below and follow the instructions:\n\n   ```bash\n   clearml-init\n   ```\n\nThat's it! You're done 😎\n\n<br />\n\n## 🚀 Training YOLOv5 With ClearML\n\nTo enable ClearML experiment tracking, simply install the ClearML pip package.\n\n```bash\npip install \"clearml>=1.2.0\"\n```\n\nThis will enable integration with the YOLOv5 training script. Every training run from now on will be captured and stored by the ClearML experiment manager.\n\nIf you want to change the `project_name` or `task_name`, use the `--project` and `--name` arguments of the `train.py` script. By default, the project will be called `YOLOv5` and the task `Training`.\nPLEASE NOTE: ClearML uses `/` as a delimiter for subprojects, so be careful when using `/` in your project name!\n\n```bash\npython train.py --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --cache\n```\n\nor with custom project and task name:\n\n```bash\npython train.py --project my_project --name my_training --img 640 --batch 16 --epochs 3 --data coco128.yaml --weights yolov5s.pt --cache\n```\n\nThis will capture:\n\n- Source code + uncommitted changes\n- Installed packages\n- (Hyper)parameters\n- Model files (use `--save-period n` to save a checkpoint every n epochs)\n- Console output\n- Scalars (mAP_0.5, mAP_0.5:0.95, precision, recall, losses, learning rates, ...)\n- General info such as machine details, runtime, creation date etc.\n- All produced plots such as label correlogram and confusion matrix\n- Images with bounding boxes per epoch\n- Mosaic per epoch\n- Validation images per epoch\n- ...\n\nThat's a lot, right? 🤯\nNow, we can visualize all of this information in the ClearML UI to get an overview of our training progress. Add custom columns to the table view (such as 
mAP_0.5) so you can easily sort on the best performing model. Or select multiple experiments and directly compare them!\n\nThere is even more we can do with all of this information, like hyperparameter optimization and remote execution, so keep reading if you want to see how that works!\n\n<br />\n\n## 🔗 Dataset Version Management\n\nVersioning your data separately from your code is generally a good idea and makes it easy to acquire the latest version too. This repository supports supplying a dataset version ID, and it will make sure to get the data if it's not there yet. In addition, this workflow saves the used dataset ID as part of the task parameters, so you will always know for sure which data was used in which experiment!\n\n![ClearML Dataset Interface](https://github.com/thepycoder/clearml_screenshots/raw/main/clearml_data.gif)\n\n### Prepare Your Dataset\n\nThe YOLOv5 repository supports a number of different datasets by using yaml files containing their information. By default, datasets are downloaded to the `../datasets` folder relative to the repository root folder. So if you downloaded the `coco128` dataset using the link in the yaml or with the scripts provided by yolov5, you get this folder structure:\n\n```\n..\n|_ yolov5\n|_ datasets\n    |_ coco128\n        |_ images\n        |_ labels\n        |_ LICENSE\n        |_ README.txt\n```\n\nBut this can be any dataset you wish. Feel free to use your own, as long as you keep to this folder structure.\n\nNext, ⚠️**copy the corresponding yaml file to the root of the dataset folder**⚠️. This yaml file contains the information ClearML will need to properly use the dataset. 
You can make this yourself too, of course, just follow the structure of the example yamls.\n\nBasically we need the following keys: `path`, `train`, `test`, `val`, `nc`, `names`.\n\n```\n..\n|_ yolov5\n|_ datasets\n    |_ coco128\n        |_ images\n        |_ labels\n        |_ coco128.yaml  # <---- HERE!\n        |_ LICENSE\n        |_ README.txt\n```\n\n### Upload Your Dataset\n\nTo get this dataset into ClearML as a versioned dataset, go to the dataset root folder and run the following command:\n\n```bash\ncd coco128\nclearml-data sync --project YOLOv5 --name coco128 --folder .\n```\n\nThe command `clearml-data sync` is actually a shorthand command. You could also run these commands one after the other:\n\n```bash\n# Optionally add --parent <parent_dataset_id> if you want to base\n# this version on another dataset version, so no duplicate files are uploaded!\nclearml-data create --name coco128 --project YOLOv5\nclearml-data add --files .\nclearml-data close\n```\n\n### Run Training Using A ClearML Dataset\n\nNow that you have a ClearML dataset, you can very simply use it to train custom YOLOv5 🚀 models!\n\n```bash\npython train.py --img 640 --batch 16 --epochs 3 --data clearml://<your_dataset_id> --weights yolov5s.pt --cache\n```\n\n<br />\n\n## 👀 Hyperparameter Optimization\n\nNow that we have our experiments and data versioned, it's time to take a look at what we can build on top!\n\nUsing the code information, installed packages and environment details, the experiment itself is now **completely reproducible**. In fact, ClearML allows you to clone an experiment and even change its parameters. We can then just rerun it with these new parameters automatically, this is basically what HPO does!\n\nTo **run hyperparameter optimization locally**, we've included a pre-made script for you. 
Just make sure a training task has been run at least once, so it is in the ClearML experiment manager, we will essentially clone it and change its hyperparameters.\n\nYou'll need to fill in the ID of this `template task` in the script found at `utils/loggers/clearml/hpo.py` and then just run it :) You can change `task.execute_locally()` to `task.execute()` to put it in a ClearML queue and have a remote agent work on it instead.\n\n```bash\n# To use optuna, install it first, otherwise you can change the optimizer to just be RandomSearch\npip install optuna\npython utils/loggers/clearml/hpo.py\n```\n\n![HPO](https://github.com/thepycoder/clearml_screenshots/raw/main/hpo.png)\n\n## 🤯 Remote Execution (advanced)\n\nRunning HPO locally is really handy, but what if we want to run our experiments on a remote machine instead? Maybe you have access to a very powerful GPU machine on-site, or you have some budget to use cloud GPUs.\nThis is where the ClearML Agent comes into play. Check out what the agent can do here:\n\n- [YouTube video](https://youtu.be/MX3BrXnaULs)\n- [Documentation](https://clear.ml/docs/latest/docs/clearml_agent)\n\nIn short: every experiment tracked by the experiment manager contains enough information to reproduce it on a different machine (installed packages, uncommitted changes etc.). So a ClearML agent does just that: it listens to a queue for incoming tasks and when it finds one, it recreates the environment and runs it while still reporting scalars, plots etc. to the experiment manager.\n\nYou can turn any machine (a cloud VM, a local GPU machine, your own laptop ... ) into a ClearML agent by simply running:\n\n```bash\nclearml-agent daemon --queue <queues_to_listen_to> [--docker]\n```\n\n### Cloning, Editing And Enqueuing\n\nWith our agent running, we can give it some work. Remember from the HPO section that we can clone a task and edit the hyperparameters? 
We can do that from the interface too!\n\n🪄 Clone the experiment by right-clicking it\n\n🎯 Edit the hyperparameters to what you wish them to be\n\n⏳ Enqueue the task to any of the queues by right-clicking it\n\n![Enqueue a task from the UI](https://github.com/thepycoder/clearml_screenshots/raw/main/enqueue.gif)\n\n### Executing A Task Remotely\n\nNow you can clone a task as explained above, or simply mark your current script by adding `task.execute_remotely()`; on execution, it will be put into a queue for the agent to start working on!\n\nTo run the YOLOv5 training script remotely, all you have to do is add this line to the `train.py` script after the ClearML logger has been instantiated:\n\n```python\n# ...\n# Loggers\ndata_dict = None\nif RANK in {-1, 0}:\n    loggers = Loggers(save_dir, weights, opt, hyp, LOGGER)  # loggers instance\n    if loggers.clearml:\n        loggers.clearml.task.execute_remotely(queue=\"my_queue\")  # <------ ADD THIS LINE\n        # data_dict is None if the user did not choose a ClearML dataset, otherwise it is filled in by ClearML\n        data_dict = loggers.clearml.data_dict\n# ...\n```\n\nWhen running the training script after this change, Python will run the script up until that line, after which it will package the code and send it to the queue instead!\n\n### Autoscaling workers\n\nClearML comes with autoscalers too! This tool will automatically spin up new remote machines in the cloud of your choice (AWS, GCP, Azure) and turn them into ClearML agents for you whenever there are experiments detected in the queue. Once the tasks are processed, the autoscaler will automatically shut down the remote machines, and you stop paying!\n\nCheck out the autoscalers getting started video below.\n\n[![Watch the video](https://img.youtube.com/vi/j4XVMAaUt3E/0.jpg)](https://youtu.be/j4XVMAaUt3E)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loggers/clearml/__init__.py",
    "content": ""
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loggers/clearml/clearml_utils.py",
    "content": "\"\"\"Main Logger class for ClearML experiment tracking.\"\"\"\nimport glob\nimport re\nfrom pathlib import Path\n\nimport numpy as np\nimport yaml\n\nfrom utils.plots import Annotator, colors\n\ntry:\n    import clearml\n    from clearml import Dataset, Task\n\n    assert hasattr(clearml, '__version__')  # verify package import not local dir\nexcept (ImportError, AssertionError):\n    clearml = None\n\n\ndef construct_dataset(clearml_info_string):\n    \"\"\"Load in a clearml dataset and fill the internal data_dict with its contents.\n    \"\"\"\n    dataset_id = clearml_info_string.replace('clearml://', '')\n    dataset = Dataset.get(dataset_id=dataset_id)\n    dataset_root_path = Path(dataset.get_local_copy())\n\n    # We'll search for the yaml file definition in the dataset\n    yaml_filenames = list(glob.glob(str(dataset_root_path / '*.yaml')) + glob.glob(str(dataset_root_path / '*.yml')))\n    if len(yaml_filenames) > 1:\n        raise ValueError('More than one yaml file was found in the dataset root, cannot determine which one contains '\n                         'the dataset definition this way.')\n    elif len(yaml_filenames) == 0:\n        raise ValueError('No yaml definition found in dataset root path, check that there is a correct yaml file '\n                         'inside the dataset root path.')\n    with open(yaml_filenames[0]) as f:\n        dataset_definition = yaml.safe_load(f)\n\n    assert set(dataset_definition.keys()).issuperset(\n        {'train', 'test', 'val', 'nc', 'names'}\n    ), \"The right keys were not found in the yaml file, make sure it at least has the following keys: ('train', 'test', 'val', 'nc', 'names')\"\n\n    data_dict = dict()\n    data_dict['train'] = str(\n        (dataset_root_path / dataset_definition['train']).resolve()) if dataset_definition['train'] else None\n    data_dict['test'] = str(\n        (dataset_root_path / dataset_definition['test']).resolve()) if dataset_definition['test'] else None\n  
  data_dict['val'] = str(\n        (dataset_root_path / dataset_definition['val']).resolve()) if dataset_definition['val'] else None\n    data_dict['nc'] = dataset_definition['nc']\n    data_dict['names'] = dataset_definition['names']\n\n    return data_dict\n\n\nclass ClearmlLogger:\n    \"\"\"Log training runs, datasets, models, and predictions to ClearML.\n\n    This logger sends information to ClearML at app.clear.ml or to your own hosted server. By default,\n    this information includes hyperparameters, system configuration and metrics, model metrics, code information and\n    basic data metrics and analyses.\n\n    By providing additional command line arguments to train.py, datasets,\n    models and predictions can also be logged.\n    \"\"\"\n\n    def __init__(self, opt, hyp):\n        \"\"\"\n        - Initialize ClearML Task, this object will capture the experiment\n        - Upload dataset version to ClearML Data if opt.upload_dataset is True\n\n        arguments:\n        opt (namespace) -- Command line arguments for this run\n        hyp (dict) -- Hyperparameters for this run\n\n        \"\"\"\n        self.current_epoch = 0\n        # Keep track of the images already logged this epoch to enforce a limit\n        self.current_epoch_logged_images = set()\n        # Maximum number of images to log to ClearML per epoch\n        self.max_imgs_to_log_per_epoch = 16\n        # Get the interval of epochs when bounding box images should be logged\n        self.bbox_interval = opt.bbox_interval\n        self.clearml = clearml\n        self.task = None\n        self.data_dict = None\n        if self.clearml:\n            self.task = Task.init(\n                project_name=opt.project if opt.project != 'runs/train' else 'YOLOv5',\n                task_name=opt.name if opt.name != 'exp' else 'Training',\n                tags=['YOLOv5'],\n                output_uri=True,\n                reuse_last_task_id=opt.exist_ok,\n                auto_connect_frameworks={'pytorch': 
False}\n                # We disconnect pytorch auto-detection, because we added manual model save points in the code\n            )\n            # ClearML's hooks will already grab all general parameters\n            # Only the hyperparameters coming from the yaml config file\n            # will have to be added manually!\n            self.task.connect(hyp, name='Hyperparameters')\n            self.task.connect(opt, name='Args')\n\n            # Make sure the code is easily remotely runnable by setting the docker image to use by the remote agent\n            self.task.set_base_docker('ultralytics/yolov5:latest',\n                                      docker_arguments='--ipc=host -e=\"CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1\"',\n                                      docker_setup_bash_script='pip install clearml')\n\n            # Get ClearML Dataset Version if requested\n            if opt.data.startswith('clearml://'):\n                # data_dict should have the following keys:\n                # names, nc (number of classes), test, train, val (all three relative paths to ../datasets)\n                self.data_dict = construct_dataset(opt.data)\n                # Set data to data_dict because wandb will crash without this information and opt is the best way\n                # to give it to them\n                opt.data = self.data_dict\n\n    def log_debug_samples(self, files, title='Debug Samples'):\n        \"\"\"\n        Log files (images) as debug samples in the ClearML task.\n\n        arguments:\n        files (List(PosixPath)) a list of file paths in PosixPath format\n        title (str) A title that groups together images with the same values\n        \"\"\"\n        for f in files:\n            if f.exists():\n                it = re.search(r'_batch(\\d+)', f.name)\n                iteration = int(it.groups()[0]) if it else 0\n                self.task.get_logger().report_image(title=title,\n                                                    
series=f.name.replace(it.group(), '') if it else f.name,\n                                                    local_path=str(f),\n                                                    iteration=iteration)\n\n    def log_image_with_boxes(self, image_path, boxes, class_names, image, conf_threshold=0.25):\n        \"\"\"\n        Draw the bounding boxes on a single image and report the result as a ClearML debug sample.\n\n        arguments:\n        image_path (PosixPath) the path to the original image file\n        boxes (list): list of scaled predictions in the format - [xmin, ymin, xmax, ymax, confidence, class]\n        class_names (dict): dict containing mapping of class int to class name\n        image (Tensor): A torch tensor containing the actual image data\n        conf_threshold (float): only boxes with a confidence above this value are drawn\n        \"\"\"\n        if len(self.current_epoch_logged_images) < self.max_imgs_to_log_per_epoch and self.current_epoch >= 0:\n            # Log every bbox_interval epochs and deduplicate for any intermittent extra eval runs\n            if self.current_epoch % self.bbox_interval == 0 and image_path not in self.current_epoch_logged_images:\n                im = np.ascontiguousarray(np.moveaxis(image.mul(255).clamp(0, 255).byte().cpu().numpy(), 0, 2))\n                annotator = Annotator(im=im, pil=True)\n                for i, (conf, class_nr, box) in enumerate(zip(boxes[:, 4], boxes[:, 5], boxes[:, :4])):\n                    color = colors(i)\n\n                    class_name = class_names[int(class_nr)]\n                    confidence_percentage = round(float(conf) * 100, 2)\n                    label = f'{class_name}: {confidence_percentage}%'\n\n                    if conf > conf_threshold:\n                        annotator.rectangle(box.cpu().numpy(), outline=color)\n                        annotator.box_label(box.cpu().numpy(), label=label, color=color)\n\n                annotated_image = annotator.result()\n                self.task.get_logger().report_image(title='Bounding Boxes',\n                                   
                 series=image_path.name,\n                                                    iteration=self.current_epoch,\n                                                    image=annotated_image)\n                self.current_epoch_logged_images.add(image_path)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loggers/clearml/hpo.py",
    "content": "from clearml import Task\n# Connecting ClearML with the current process,\n# from here on everything is logged automatically\nfrom clearml.automation import HyperParameterOptimizer, UniformParameterRange\nfrom clearml.automation.optuna import OptimizerOptuna\n\ntask = Task.init(project_name='Hyper-Parameter Optimization',\n                 task_name='YOLOv5',\n                 task_type=Task.TaskTypes.optimizer,\n                 reuse_last_task_id=False)\n\n# Example use case:\noptimizer = HyperParameterOptimizer(\n    # This is the experiment we want to optimize\n    base_task_id='<your_template_task_id>',\n    # here we define the hyper-parameters to optimize\n    # Notice: The parameter name should exactly match what you see in the UI: <section_name>/<parameter>\n    # For Example, here we see in the base experiment a section Named: \"General\"\n    # under it a parameter named \"batch_size\", this becomes \"General/batch_size\"\n    # If you have `argparse` for example, then arguments will appear under the \"Args\" section,\n    # and you should instead pass \"Args/batch_size\"\n    hyper_parameters=[\n        UniformParameterRange('Hyperparameters/lr0', min_value=1e-5, max_value=1e-1),\n        UniformParameterRange('Hyperparameters/lrf', min_value=0.01, max_value=1.0),\n        UniformParameterRange('Hyperparameters/momentum', min_value=0.6, max_value=0.98),\n        UniformParameterRange('Hyperparameters/weight_decay', min_value=0.0, max_value=0.001),\n        UniformParameterRange('Hyperparameters/warmup_epochs', min_value=0.0, max_value=5.0),\n        UniformParameterRange('Hyperparameters/warmup_momentum', min_value=0.0, max_value=0.95),\n        UniformParameterRange('Hyperparameters/warmup_bias_lr', min_value=0.0, max_value=0.2),\n        UniformParameterRange('Hyperparameters/box', min_value=0.02, max_value=0.2),\n        UniformParameterRange('Hyperparameters/cls', min_value=0.2, max_value=4.0),\n        
UniformParameterRange('Hyperparameters/cls_pw', min_value=0.5, max_value=2.0),\n        UniformParameterRange('Hyperparameters/obj', min_value=0.2, max_value=4.0),\n        UniformParameterRange('Hyperparameters/obj_pw', min_value=0.5, max_value=2.0),\n        UniformParameterRange('Hyperparameters/iou_t', min_value=0.1, max_value=0.7),\n        UniformParameterRange('Hyperparameters/anchor_t', min_value=2.0, max_value=8.0),\n        UniformParameterRange('Hyperparameters/fl_gamma', min_value=0.0, max_value=4.0),\n        UniformParameterRange('Hyperparameters/hsv_h', min_value=0.0, max_value=0.1),\n        UniformParameterRange('Hyperparameters/hsv_s', min_value=0.0, max_value=0.9),\n        UniformParameterRange('Hyperparameters/hsv_v', min_value=0.0, max_value=0.9),\n        UniformParameterRange('Hyperparameters/degrees', min_value=0.0, max_value=45.0),\n        UniformParameterRange('Hyperparameters/translate', min_value=0.0, max_value=0.9),\n        UniformParameterRange('Hyperparameters/scale', min_value=0.0, max_value=0.9),\n        UniformParameterRange('Hyperparameters/shear', min_value=0.0, max_value=10.0),\n        UniformParameterRange('Hyperparameters/perspective', min_value=0.0, max_value=0.001),\n        UniformParameterRange('Hyperparameters/flipud', min_value=0.0, max_value=1.0),\n        UniformParameterRange('Hyperparameters/fliplr', min_value=0.0, max_value=1.0),\n        UniformParameterRange('Hyperparameters/mosaic', min_value=0.0, max_value=1.0),\n        UniformParameterRange('Hyperparameters/mixup', min_value=0.0, max_value=1.0),\n        UniformParameterRange('Hyperparameters/copy_paste', min_value=0.0, max_value=1.0)],\n    # this is the objective metric we want to maximize/minimize\n    objective_metric_title='metrics',\n    objective_metric_series='mAP_0.5',\n    # now we decide if we want to maximize it or minimize it (accuracy we maximize)\n    objective_metric_sign='max',\n    # let us limit the number of concurrent experiments,\n   
 # this in turn will make sure we don't bombard the scheduler with experiments.\n    # if we have an auto-scaler connected, this, by proxy, will limit the number of machines\n    max_number_of_concurrent_tasks=1,\n    # this is the optimizer class (actually doing the optimization)\n    # Currently, we can choose from GridSearch, RandomSearch, OptimizerBOHB (Bayesian optimization Hyper-Band)\n    # or OptimizerOptuna (used here)\n    optimizer_class=OptimizerOptuna,\n    # If specified only the top K performing Tasks will be kept, the others will be automatically archived\n    save_top_k_tasks_only=5,\n    compute_time_limit=None,\n    total_max_jobs=20,\n    min_iteration_per_job=None,\n    max_iteration_per_job=None,\n)\n\n# report every 10 seconds, this is way too often, but we are testing here\noptimizer.set_report_period(10 / 60)\n# The optimizer tasks are run locally here, without using queues or agents.\n# To dispatch them to agents through a queue instead, replace start_locally() below with optimizer.start()\n# set the time limit for the optimization process (2 hours)\noptimizer.set_time_limit(in_minutes=120.0)\n# Start the optimization process in the local environment\noptimizer.start_locally()\n# wait until process is done (notice we are controlling the optimization process in the background)\noptimizer.wait()\n# make sure background optimization stopped\noptimizer.stop()\n\nprint('We are done, good bye')\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loggers/comet/README.md",
    "content": "<img src=\"https://cdn.comet.ml/img/notebook_logo.png\">\n\n# YOLOv5 with Comet\n\nThis guide will cover how to use YOLOv5 with [Comet](https://bit.ly/yolov5-readme-comet2)\n\n# About Comet\n\nComet builds tools that help data scientists, engineers, and team leaders accelerate and optimize machine learning and deep learning models.\n\nTrack and visualize model metrics in real time, save your hyperparameters, datasets, and model checkpoints, and visualize your model predictions with [Comet Custom Panels](https://www.comet.com/docs/v2/guides/comet-dashboard/code-panels/about-panels/?utm_source=yolov5&utm_medium=partner&utm_campaign=partner_yolov5_2022&utm_content=github)!\nComet makes sure you never lose track of your work and makes it easy to share results and collaborate across teams of all sizes!\n\n# Getting Started\n\n## Install Comet\n\n```shell\npip install comet_ml\n```\n\n## Configure Comet Credentials\n\nThere are two ways to configure Comet with YOLOv5.\n\nYou can either set your credentials through environment variables\n\n**Environment Variables**\n\n```shell\nexport COMET_API_KEY=<Your Comet API Key>\nexport COMET_PROJECT_NAME=<Your Comet Project Name> # This will default to 'yolov5'\n```\n\nOr create a `.comet.config` file in your working directory and set your credentials there.\n\n**Comet Configuration File**\n\n```\n[comet]\napi_key=<Your Comet API Key>\nproject_name=<Your Comet Project Name> # This will default to 'yolov5'\n```\n\n## Run the Training Script\n\n```shell\n# Train YOLOv5s on COCO128 for 5 epochs\npython train.py --img 640 --batch 16 --epochs 5 --data coco128.yaml --weights yolov5s.pt\n```\n\nThat's it! Comet will automatically log your hyperparameters, command line arguments, training and validation metrics. 
You can visualize and analyze your runs in the Comet UI\n\n<img width=\"1920\" alt=\"yolo-ui\" src=\"https://user-images.githubusercontent.com/26833433/202851203-164e94e1-2238-46dd-91f8-de020e9d6b41.png\">\n\n# Try out an Example!\n\nCheck out an example of a [completed run here](https://www.comet.com/examples/comet-example-yolov5/a0e29e0e9b984e4a822db2a62d0cb357?experiment-tab=chart&showOutliers=true&smoothing=0&transformY=smoothing&xAxis=step&utm_source=yolov5&utm_medium=partner&utm_campaign=partner_yolov5_2022&utm_content=github)\n\nOr better yet, try it out yourself in this Colab Notebook\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1RG0WOQyxlDlo5Km8GogJpIEJlg_5lyYO?usp=sharing)\n\n# Log automatically\n\nBy default, Comet will log the following items\n\n## Metrics\n\n- Box Loss, Object Loss, Classification Loss for the training and validation data\n- mAP_0.5, mAP_0.5:0.95 metrics for the validation data.\n- Precision and Recall for the validation data\n\n## Parameters\n\n- Model Hyperparameters\n- All parameters passed through the command line options\n\n## Visualizations\n\n- Confusion Matrix of the model predictions on the validation data\n- Plots for the PR and F1 curves across all classes\n- Correlogram of the Class Labels\n\n# Configure Comet Logging\n\nComet can be configured to log additional data either through command line flags passed to the training script\nor through environment variables.\n\n```shell\nexport COMET_MODE=online # Set whether to run Comet in 'online' or 'offline' mode. Defaults to online\nexport COMET_MODEL_NAME=<your model name> #Set the name for the saved model. Defaults to yolov5\nexport COMET_LOG_CONFUSION_MATRIX=false # Set to disable logging a Comet Confusion Matrix. Defaults to true\nexport COMET_MAX_IMAGE_UPLOADS=<number of allowed images to upload to Comet> # Controls how many total image predictions to log to Comet. 
Defaults to 100.\nexport COMET_LOG_PER_CLASS_METRICS=true # Set to log evaluation metrics for each detected class at the end of training. Defaults to false\nexport COMET_DEFAULT_CHECKPOINT_FILENAME=<your checkpoint filename> # Set this if you would like to resume training from a different checkpoint. Defaults to 'last.pt'\nexport COMET_LOG_BATCH_LEVEL_METRICS=true # Set this if you would like to log training metrics at the batch level. Defaults to false.\nexport COMET_LOG_PREDICTIONS=true # Set this to false to disable logging model predictions\n```\n\n## Logging Checkpoints with Comet\n\nLogging Models to Comet is disabled by default. To enable it, pass the `save-period` argument to the training script. This will save the\nlogged checkpoints to Comet based on the interval value provided by `save-period`\n\n```shell\npython train.py \\\n--img 640 \\\n--batch 16 \\\n--epochs 5 \\\n--data coco128.yaml \\\n--weights yolov5s.pt \\\n--save-period 1\n```\n\n## Logging Model Predictions\n\nBy default, model predictions (images, ground truth labels and bounding boxes) will be logged to Comet.\n\nYou can control the frequency of logged predictions and the associated images by passing the `bbox_interval` command line argument. Predictions can be visualized using Comet's Object Detection Custom Panel. This frequency corresponds to every Nth batch of data per epoch. 
In the example below, we are logging every 2nd batch of data for each epoch.\n\n**Note:** The YOLOv5 validation dataloader will default to a batch size of 32, so you will have to set the logging frequency accordingly.\n\nHere is an [example project using the Panel](https://www.comet.com/examples/comet-example-yolov5?shareable=YcwMiJaZSXfcEXpGOHDD12vA1&utm_source=yolov5&utm_medium=partner&utm_campaign=partner_yolov5_2022&utm_content=github)\n\n```shell\npython train.py \\\n--img 640 \\\n--batch 16 \\\n--epochs 5 \\\n--data coco128.yaml \\\n--weights yolov5s.pt \\\n--bbox_interval 2\n```\n\n### Controlling the number of Prediction Images logged to Comet\n\nWhen logging predictions from YOLOv5, Comet will log the images associated with each set of predictions. By default, a maximum of 100 validation images are logged. You can increase or decrease this number using the `COMET_MAX_IMAGE_UPLOADS` environment variable.\n\n```shell\nenv COMET_MAX_IMAGE_UPLOADS=200 python train.py \\\n--img 640 \\\n--batch 16 \\\n--epochs 5 \\\n--data coco128.yaml \\\n--weights yolov5s.pt \\\n--bbox_interval 1\n```\n\n### Logging Class Level Metrics\n\nUse the `COMET_LOG_PER_CLASS_METRICS` environment variable to log mAP, precision, recall and F1 for each class.\n\n```shell\nenv COMET_LOG_PER_CLASS_METRICS=true python train.py \\\n--img 640 \\\n--batch 16 \\\n--epochs 5 \\\n--data coco128.yaml \\\n--weights yolov5s.pt\n```\n\n## Uploading a Dataset to Comet Artifacts\n\nIf you would like to store your data using [Comet Artifacts](https://www.comet.com/docs/v2/guides/data-management/using-artifacts/#learn-more?utm_source=yolov5&utm_medium=partner&utm_campaign=partner_yolov5_2022&utm_content=github), you can do so using the `upload_dataset` flag.\n\nThe dataset must be organized in the way described in the [YOLOv5 documentation](https://docs.ultralytics.com/tutorials/train-custom-datasets/#3-organize-directories). 
The dataset config `yaml` file must follow the same format as that of the `coco128.yaml` file.\n\n```shell\npython train.py \\\n--img 640 \\\n--batch 16 \\\n--epochs 5 \\\n--data coco128.yaml \\\n--weights yolov5s.pt \\\n--upload_dataset\n```\n\nYou can find the uploaded dataset in the Artifacts tab in your Comet Workspace\n<img width=\"1073\" alt=\"artifact-1\" src=\"https://user-images.githubusercontent.com/7529846/186929193-162718bf-ec7b-4eb9-8c3b-86b3763ef8ea.png\">\n\nYou can preview the data directly in the Comet UI.\n<img width=\"1082\" alt=\"artifact-2\" src=\"https://user-images.githubusercontent.com/7529846/186929215-432c36a9-c109-4eb0-944b-84c2786590d6.png\">\n\nArtifacts are versioned and also support adding metadata about the dataset. Comet will automatically log the metadata from your dataset `yaml` file\n<img width=\"963\" alt=\"artifact-3\" src=\"https://user-images.githubusercontent.com/7529846/186929256-9d44d6eb-1a19-42de-889a-bcbca3018f2e.png\">\n\n### Using a saved Artifact\n\nIf you would like to use a dataset from Comet Artifacts, set the `path` variable in your dataset `yaml` file to point to the following Artifact resource URL.\n\n```\n# contents of artifact.yaml file\npath: \"comet://<workspace name>/<artifact name>:<artifact version or alias>\"\n```\n\nThen pass this file to your training script in the following way\n\n```shell\npython train.py \\\n--img 640 \\\n--batch 16 \\\n--epochs 5 \\\n--data artifact.yaml \\\n--weights yolov5s.pt\n```\n\nArtifacts also allow you to track the lineage of data as it flows through your Experimentation workflow. Here you can see a graph that shows you all the experiments that have used your uploaded dataset.\n<img width=\"1391\" alt=\"artifact-4\" src=\"https://user-images.githubusercontent.com/7529846/186929264-4c4014fa-fe51-4f3c-a5c5-f6d24649b1b4.png\">\n\n## Resuming a Training Run\n\nIf your training run is interrupted for any reason, e.g. 
disrupted internet connection, you can resume the run using the `resume` flag and the Comet Run Path.\n\nThe Run Path has the following format `comet://<your workspace name>/<your project name>/<experiment id>`.\n\nThis will restore the run to its state before the interruption, which includes restoring the model from a checkpoint, restoring all hyperparameters and training arguments and downloading Comet dataset Artifacts if they were used in the original run. The resumed run will continue logging to the existing Experiment in the Comet UI.\n\n```shell\npython train.py \\\n--resume \"comet://<your run path>\"\n```\n\n## Hyperparameter Search with the Comet Optimizer\n\nYOLOv5 is also integrated with Comet's Optimizer, making it simple to visualize hyperparameter sweeps in the Comet UI.\n\n### Configuring an Optimizer Sweep\n\nTo configure the Comet Optimizer, you will have to create a JSON file with the information about the sweep. An example file has been provided in `utils/loggers/comet/optimizer_config.json`.\n\n```shell\npython utils/loggers/comet/hpo.py \\\n  --comet_optimizer_config \"utils/loggers/comet/optimizer_config.json\"\n```\n\nThe `hpo.py` script accepts the same arguments as `train.py`. If you wish to pass additional arguments to your sweep simply add them after\nthe script.\n\n```shell\npython utils/loggers/comet/hpo.py \\\n  --comet_optimizer_config \"utils/loggers/comet/optimizer_config.json\" \\\n  --save-period 1 \\\n  --bbox_interval 1\n```\n\n### Running a Sweep in Parallel\n\n```shell\ncomet optimizer -j <set number of workers> utils/loggers/comet/hpo.py \\\n  utils/loggers/comet/optimizer_config.json\n```\n\n### Visualizing Results\n\nComet provides a number of ways to visualize the results of your sweep. 
Take a look at a [project with a completed sweep here](https://www.comet.com/examples/comet-example-yolov5/view/PrlArHGuuhDTKC1UuBmTtOSXD/panels?utm_source=yolov5&utm_medium=partner&utm_campaign=partner_yolov5_2022&utm_content=github)\n\n<img width=\"1626\" alt=\"hyperparameter-yolo\" src=\"https://user-images.githubusercontent.com/7529846/186914869-7dc1de14-583f-4323-967b-c9a66a29e495.png\">\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loggers/comet/__init__.py",
    "content": "import glob\nimport json\nimport logging\nimport os\nimport sys\nfrom pathlib import Path\n\nlogger = logging.getLogger(__name__)\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[3]  # YOLOv5 root directory\nif str(ROOT) not in sys.path:\n    sys.path.append(str(ROOT))  # add ROOT to PATH\n\ntry:\n    import comet_ml\n\n    # Project Configuration\n    config = comet_ml.config.get_config()\n    COMET_PROJECT_NAME = config.get_string(os.getenv('COMET_PROJECT_NAME'), 'comet.project_name', default='yolov5')\nexcept (ModuleNotFoundError, ImportError):\n    comet_ml = None\n    COMET_PROJECT_NAME = None\n\nimport PIL\nimport torch\nimport torchvision.transforms as T\nimport yaml\n\nfrom utils.dataloaders import img2label_paths\nfrom utils.general import check_dataset, scale_boxes, xywh2xyxy\nfrom utils.metrics import box_iou\n\nCOMET_PREFIX = 'comet://'\n\nCOMET_MODE = os.getenv('COMET_MODE', 'online')\n\n# Model Saving Settings\nCOMET_MODEL_NAME = os.getenv('COMET_MODEL_NAME', 'yolov5')\n\n# Dataset Artifact Settings\nCOMET_UPLOAD_DATASET = os.getenv('COMET_UPLOAD_DATASET', 'false').lower() == 'true'\n\n# Evaluation Settings\nCOMET_LOG_CONFUSION_MATRIX = os.getenv('COMET_LOG_CONFUSION_MATRIX', 'true').lower() == 'true'\nCOMET_LOG_PREDICTIONS = os.getenv('COMET_LOG_PREDICTIONS', 'true').lower() == 'true'\nCOMET_MAX_IMAGE_UPLOADS = int(os.getenv('COMET_MAX_IMAGE_UPLOADS', 100))\n\n# Confusion Matrix Settings\nCONF_THRES = float(os.getenv('CONF_THRES', 0.001))\nIOU_THRES = float(os.getenv('IOU_THRES', 0.6))\n\n# Batch Logging Settings\nCOMET_LOG_BATCH_METRICS = os.getenv('COMET_LOG_BATCH_METRICS', 'false').lower() == 'true'\nCOMET_BATCH_LOGGING_INTERVAL = os.getenv('COMET_BATCH_LOGGING_INTERVAL', 1)\nCOMET_PREDICTION_LOGGING_INTERVAL = os.getenv('COMET_PREDICTION_LOGGING_INTERVAL', 1)\nCOMET_LOG_PER_CLASS_METRICS = os.getenv('COMET_LOG_PER_CLASS_METRICS', 'false').lower() == 'true'\n\nRANK = int(os.getenv('RANK', -1))\n\nto_pil = 
T.ToPILImage()\n\n\nclass CometLogger:\n    \"\"\"Log metrics, parameters, source code, models and much more\n    with Comet\n    \"\"\"\n\n    def __init__(self, opt, hyp, run_id=None, job_type='Training', **experiment_kwargs) -> None:\n        self.job_type = job_type\n        self.opt = opt\n        self.hyp = hyp\n\n        # Comet Flags\n        self.comet_mode = COMET_MODE\n\n        self.save_model = opt.save_period > -1\n        self.model_name = COMET_MODEL_NAME\n\n        # Batch Logging Settings\n        self.log_batch_metrics = COMET_LOG_BATCH_METRICS\n        self.comet_log_batch_interval = COMET_BATCH_LOGGING_INTERVAL\n\n        # Dataset Artifact Settings\n        self.upload_dataset = self.opt.upload_dataset if self.opt.upload_dataset else COMET_UPLOAD_DATASET\n        self.resume = self.opt.resume\n\n        # Default parameters to pass to Experiment objects\n        self.default_experiment_kwargs = {\n            'log_code': False,\n            'log_env_gpu': True,\n            'log_env_cpu': True,\n            'project_name': COMET_PROJECT_NAME,}\n        self.default_experiment_kwargs.update(experiment_kwargs)\n        self.experiment = self._get_experiment(self.comet_mode, run_id)\n\n        self.data_dict = self.check_dataset(self.opt.data)\n        self.class_names = self.data_dict['names']\n        self.num_classes = self.data_dict['nc']\n\n        self.logged_images_count = 0\n        self.max_images = COMET_MAX_IMAGE_UPLOADS\n\n        if run_id is None:\n            self.experiment.log_other('Created from', 'YOLOv5')\n            if not isinstance(self.experiment, comet_ml.OfflineExperiment):\n                workspace, project_name, experiment_id = self.experiment.url.split('/')[-3:]\n                self.experiment.log_other(\n                    'Run Path',\n                    f'{workspace}/{project_name}/{experiment_id}',\n                )\n            self.log_parameters(vars(opt))\n            self.log_parameters(self.opt.hyp)\n   
         self.log_asset_data(\n                self.opt.hyp,\n                name='hyperparameters.json',\n                metadata={'type': 'hyp-config-file'},\n            )\n            self.log_asset(\n                f'{self.opt.save_dir}/opt.yaml',\n                metadata={'type': 'opt-config-file'},\n            )\n\n        self.comet_log_confusion_matrix = COMET_LOG_CONFUSION_MATRIX\n\n        if hasattr(self.opt, 'conf_thres'):\n            self.conf_thres = self.opt.conf_thres\n        else:\n            self.conf_thres = CONF_THRES\n        if hasattr(self.opt, 'iou_thres'):\n            self.iou_thres = self.opt.iou_thres\n        else:\n            self.iou_thres = IOU_THRES\n\n        self.log_parameters({'val_iou_threshold': self.iou_thres, 'val_conf_threshold': self.conf_thres})\n\n        self.comet_log_predictions = COMET_LOG_PREDICTIONS\n        if self.opt.bbox_interval == -1:\n            self.comet_log_prediction_interval = 1 if self.opt.epochs < 10 else self.opt.epochs // 10\n        else:\n            self.comet_log_prediction_interval = self.opt.bbox_interval\n\n        if self.comet_log_predictions:\n            self.metadata_dict = {}\n            self.logged_image_names = []\n\n        self.comet_log_per_class_metrics = COMET_LOG_PER_CLASS_METRICS\n\n        self.experiment.log_others({\n            'comet_mode': COMET_MODE,\n            'comet_max_image_uploads': COMET_MAX_IMAGE_UPLOADS,\n            'comet_log_per_class_metrics': COMET_LOG_PER_CLASS_METRICS,\n            'comet_log_batch_metrics': COMET_LOG_BATCH_METRICS,\n            'comet_log_confusion_matrix': COMET_LOG_CONFUSION_MATRIX,\n            'comet_model_name': COMET_MODEL_NAME,})\n\n        # Check if running the Experiment with the Comet Optimizer\n        if hasattr(self.opt, 'comet_optimizer_id'):\n            self.experiment.log_other('optimizer_id', self.opt.comet_optimizer_id)\n            self.experiment.log_other('optimizer_objective', 
self.opt.comet_optimizer_objective)\n            self.experiment.log_other('optimizer_metric', self.opt.comet_optimizer_metric)\n            self.experiment.log_other('optimizer_parameters', json.dumps(self.hyp))\n\n    def _get_experiment(self, mode, experiment_id=None):\n        if mode == 'offline':\n            if experiment_id is not None:\n                return comet_ml.ExistingOfflineExperiment(\n                    previous_experiment=experiment_id,\n                    **self.default_experiment_kwargs,\n                )\n\n            return comet_ml.OfflineExperiment(**self.default_experiment_kwargs,)\n\n        else:\n            try:\n                if experiment_id is not None:\n                    return comet_ml.ExistingExperiment(\n                        previous_experiment=experiment_id,\n                        **self.default_experiment_kwargs,\n                    )\n\n                return comet_ml.Experiment(**self.default_experiment_kwargs)\n\n            except ValueError:\n                logger.warning('COMET WARNING: '\n                               'Comet credentials have not been set. '\n                               'Comet will default to offline logging. 
'\n                               'Please set your credentials to enable online logging.')\n                return self._get_experiment('offline', experiment_id)\n\n        return\n\n    def log_metrics(self, log_dict, **kwargs):\n        self.experiment.log_metrics(log_dict, **kwargs)\n\n    def log_parameters(self, log_dict, **kwargs):\n        self.experiment.log_parameters(log_dict, **kwargs)\n\n    def log_asset(self, asset_path, **kwargs):\n        self.experiment.log_asset(asset_path, **kwargs)\n\n    def log_asset_data(self, asset, **kwargs):\n        self.experiment.log_asset_data(asset, **kwargs)\n\n    def log_image(self, img, **kwargs):\n        self.experiment.log_image(img, **kwargs)\n\n    def log_model(self, path, opt, epoch, fitness_score, best_model=False):\n        if not self.save_model:\n            return\n\n        model_metadata = {\n            'fitness_score': fitness_score[-1],\n            'epochs_trained': epoch + 1,\n            'save_period': opt.save_period,\n            'total_epochs': opt.epochs,}\n\n        model_files = glob.glob(f'{path}/*.pt')\n        for model_path in model_files:\n            name = Path(model_path).name\n\n            self.experiment.log_model(\n                self.model_name,\n                file_or_folder=model_path,\n                file_name=name,\n                metadata=model_metadata,\n                overwrite=True,\n            )\n\n    def check_dataset(self, data_file):\n        with open(data_file) as f:\n            data_config = yaml.safe_load(f)\n\n        if data_config['path'].startswith(COMET_PREFIX):\n            path = data_config['path'].replace(COMET_PREFIX, '')\n            data_dict = self.download_dataset_artifact(path)\n\n            return data_dict\n\n        self.log_asset(self.opt.data, metadata={'type': 'data-config-file'})\n\n        return check_dataset(data_file)\n\n    def log_predictions(self, image, labelsn, path, shape, predn):\n        if self.logged_images_count >= 
self.max_images:\n            return\n        detections = predn[predn[:, 4] > self.conf_thres]\n        iou = box_iou(labelsn[:, 1:], detections[:, :4])\n        mask, _ = torch.where(iou > self.iou_thres)\n        if len(mask) == 0:\n            return\n\n        filtered_detections = detections[mask]\n        filtered_labels = labelsn[mask]\n\n        image_id = path.split('/')[-1].split('.')[0]\n        image_name = f'{image_id}_curr_epoch_{self.experiment.curr_epoch}'\n        if image_name not in self.logged_image_names:\n            native_scale_image = PIL.Image.open(path)\n            self.log_image(native_scale_image, name=image_name)\n            self.logged_image_names.append(image_name)\n\n        metadata = []\n        for cls, *xyxy in filtered_labels.tolist():\n            metadata.append({\n                'label': f'{self.class_names[int(cls)]}-gt',\n                'score': 100,\n                'box': {\n                    'x': xyxy[0],\n                    'y': xyxy[1],\n                    'x2': xyxy[2],\n                    'y2': xyxy[3]},})\n        for *xyxy, conf, cls in filtered_detections.tolist():\n            metadata.append({\n                'label': f'{self.class_names[int(cls)]}',\n                'score': conf * 100,\n                'box': {\n                    'x': xyxy[0],\n                    'y': xyxy[1],\n                    'x2': xyxy[2],\n                    'y2': xyxy[3]},})\n\n        self.metadata_dict[image_name] = metadata\n        self.logged_images_count += 1\n\n        return\n\n    def preprocess_prediction(self, image, labels, shape, pred):\n        nl, _ = labels.shape[0], pred.shape[0]\n\n        # Predictions\n        if self.opt.single_cls:\n            pred[:, 5] = 0\n\n        predn = pred.clone()\n        scale_boxes(image.shape[1:], predn[:, :4], shape[0], shape[1])\n\n        labelsn = None\n        if nl:\n            tbox = xywh2xyxy(labels[:, 1:5])  # target boxes\n            
scale_boxes(image.shape[1:], tbox, shape[0], shape[1])  # native-space labels\n            labelsn = torch.cat((labels[:, 0:1], tbox), 1)  # native-space labels\n            scale_boxes(image.shape[1:], predn[:, :4], shape[0], shape[1])  # native-space pred\n\n        return predn, labelsn\n\n    def add_assets_to_artifact(self, artifact, path, asset_path, split):\n        img_paths = sorted(glob.glob(f'{asset_path}/*'))\n        label_paths = img2label_paths(img_paths)\n\n        for image_file, label_file in zip(img_paths, label_paths):\n            image_logical_path, label_logical_path = map(lambda x: os.path.relpath(x, path), [image_file, label_file])\n\n            try:\n                artifact.add(image_file, logical_path=image_logical_path, metadata={'split': split})\n                artifact.add(label_file, logical_path=label_logical_path, metadata={'split': split})\n            except ValueError as e:\n                logger.error('COMET ERROR: Error adding file to Artifact. Skipping file.')\n                logger.error(f'COMET ERROR: {e}')\n                continue\n\n        return artifact\n\n    def upload_dataset_artifact(self):\n        dataset_name = self.data_dict.get('dataset_name', 'yolov5-dataset')\n        path = str((ROOT / Path(self.data_dict['path'])).resolve())\n\n        metadata = self.data_dict.copy()\n        for key in ['train', 'val', 'test']:\n            split_path = metadata.get(key)\n            if split_path is not None:\n                metadata[key] = split_path.replace(path, '')\n\n        artifact = comet_ml.Artifact(name=dataset_name, artifact_type='dataset', metadata=metadata)\n        for key in metadata.keys():\n            if key in ['train', 'val', 'test']:\n                if isinstance(self.upload_dataset, str) and (key != self.upload_dataset):\n                    continue\n\n                asset_path = self.data_dict.get(key)\n                if asset_path is not None:\n                    artifact = 
self.add_assets_to_artifact(artifact, path, asset_path, key)\n\n        self.experiment.log_artifact(artifact)\n\n        return\n\n    def download_dataset_artifact(self, artifact_path):\n        logged_artifact = self.experiment.get_artifact(artifact_path)\n        artifact_save_dir = str(Path(self.opt.save_dir) / logged_artifact.name)\n        logged_artifact.download(artifact_save_dir)\n\n        metadata = logged_artifact.metadata\n        data_dict = metadata.copy()\n        data_dict['path'] = artifact_save_dir\n\n        metadata_names = metadata.get('names')\n        if isinstance(metadata_names, dict):\n            data_dict['names'] = {int(k): v for k, v in metadata.get('names').items()}\n        elif isinstance(metadata_names, list):\n            data_dict['names'] = {int(k): v for k, v in zip(range(len(metadata_names)), metadata_names)}\n        else:\n            # Raising a bare string is a TypeError in Python 3; raise a real exception instead\n            raise ValueError(\"Invalid 'names' field in dataset yaml file. Please use a list or dictionary\")\n\n        data_dict = self.update_data_paths(data_dict)\n        return data_dict\n\n    def update_data_paths(self, data_dict):\n        path = data_dict.get('path', '')\n\n        for split in ['train', 'val', 'test']:\n            if data_dict.get(split):\n                split_path = data_dict.get(split)\n                # Check the split's value (a str or a list of str), not the split key itself\n                data_dict[split] = (f'{path}/{split_path}' if isinstance(split_path, str) else [\n                    f'{path}/{x}' for x in split_path])\n\n        return data_dict\n\n    def on_pretrain_routine_end(self, paths):\n        if self.opt.resume:\n            return\n\n        for path in paths:\n            self.log_asset(str(path))\n\n        if self.upload_dataset:\n            if not self.resume:\n                self.upload_dataset_artifact()\n\n        return\n\n    def on_train_start(self):\n        self.log_parameters(self.hyp)\n\n    def on_train_epoch_start(self):\n        return\n\n    def on_train_epoch_end(self, epoch):\n        self.experiment.curr_epoch = epoch\n\n        return\n\n    
def on_train_batch_start(self):\n        return\n\n    def on_train_batch_end(self, log_dict, step):\n        self.experiment.curr_step = step\n        if self.log_batch_metrics and (step % self.comet_log_batch_interval == 0):\n            self.log_metrics(log_dict, step=step)\n\n        return\n\n    def on_train_end(self, files, save_dir, last, best, epoch, results):\n        if self.comet_log_predictions:\n            curr_epoch = self.experiment.curr_epoch\n            self.experiment.log_asset_data(self.metadata_dict, 'image-metadata.json', epoch=curr_epoch)\n\n        for f in files:\n            self.log_asset(f, metadata={'epoch': epoch})\n        self.log_asset(f'{save_dir}/results.csv', metadata={'epoch': epoch})\n\n        if not self.opt.evolve:\n            model_path = str(best if best.exists() else last)\n            name = Path(model_path).name\n            if self.save_model:\n                self.experiment.log_model(\n                    self.model_name,\n                    file_or_folder=model_path,\n                    file_name=name,\n                    overwrite=True,\n                )\n\n        # Check if running Experiment with Comet Optimizer\n        if hasattr(self.opt, 'comet_optimizer_id'):\n            metric = results.get(self.opt.comet_optimizer_metric)\n            self.experiment.log_other('optimizer_metric_value', metric)\n\n        self.finish_run()\n\n    def on_val_start(self):\n        return\n\n    def on_val_batch_start(self):\n        return\n\n    def on_val_batch_end(self, batch_i, images, targets, paths, shapes, outputs):\n        if not (self.comet_log_predictions and ((batch_i + 1) % self.comet_log_prediction_interval == 0)):\n            return\n\n        for si, pred in enumerate(outputs):\n            if len(pred) == 0:\n                continue\n\n            image = images[si]\n            labels = targets[targets[:, 0] == si, 1:]\n            shape = shapes[si]\n            path = paths[si]\n            
predn, labelsn = self.preprocess_prediction(image, labels, shape, pred)\n            if labelsn is not None:\n                self.log_predictions(image, labelsn, path, shape, predn)\n\n        return\n\n    def on_val_end(self, nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix):\n        if self.comet_log_per_class_metrics:\n            if self.num_classes > 1:\n                for i, c in enumerate(ap_class):\n                    class_name = self.class_names[c]\n                    self.experiment.log_metrics(\n                        {\n                            'mAP@.5': ap50[i],\n                            'mAP@.5:.95': ap[i],\n                            'precision': p[i],\n                            'recall': r[i],\n                            'f1': f1[i],\n                            'true_positives': tp[i],\n                            'false_positives': fp[i],\n                            'support': nt[c]},\n                        prefix=class_name)\n\n        if self.comet_log_confusion_matrix:\n            epoch = self.experiment.curr_epoch\n            class_names = list(self.class_names.values())\n            class_names.append('background')\n            num_classes = len(class_names)\n\n            self.experiment.log_confusion_matrix(\n                matrix=confusion_matrix.matrix,\n                max_categories=num_classes,\n                labels=class_names,\n                epoch=epoch,\n                column_label='Actual Category',\n                row_label='Predicted Category',\n                file_name=f'confusion-matrix-epoch-{epoch}.json',\n            )\n\n    def on_fit_epoch_end(self, result, epoch):\n        self.log_metrics(result, epoch=epoch)\n\n    def on_model_save(self, last, epoch, final_epoch, best_fitness, fi):\n        if ((epoch + 1) % self.opt.save_period == 0 and not final_epoch) and self.opt.save_period != -1:\n            self.log_model(last.parent, self.opt, epoch, fi, best_model=best_fitness == 
fi)\n\n    def on_params_update(self, params):\n        self.log_parameters(params)\n\n    def finish_run(self):\n        self.experiment.end()\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loggers/comet/comet_utils.py",
    "content": "import logging\nimport os\nfrom urllib.parse import urlparse\n\ntry:\n    import comet_ml\nexcept (ModuleNotFoundError, ImportError):\n    comet_ml = None\n\nimport yaml\n\nlogger = logging.getLogger(__name__)\n\nCOMET_PREFIX = 'comet://'\nCOMET_MODEL_NAME = os.getenv('COMET_MODEL_NAME', 'yolov5')\nCOMET_DEFAULT_CHECKPOINT_FILENAME = os.getenv('COMET_DEFAULT_CHECKPOINT_FILENAME', 'last.pt')\n\n\ndef download_model_checkpoint(opt, experiment):\n    model_dir = f'{opt.project}/{experiment.name}'\n    os.makedirs(model_dir, exist_ok=True)\n\n    model_name = COMET_MODEL_NAME\n    model_asset_list = experiment.get_model_asset_list(model_name)\n\n    if len(model_asset_list) == 0:\n        logger.error(f'COMET ERROR: No checkpoints found for model name : {model_name}')\n        return\n\n    model_asset_list = sorted(\n        model_asset_list,\n        key=lambda x: x['step'],\n        reverse=True,\n    )\n    logged_checkpoint_map = {asset['fileName']: asset['assetId'] for asset in model_asset_list}\n\n    resource_url = urlparse(opt.weights)\n    checkpoint_filename = resource_url.query\n\n    if checkpoint_filename:\n        asset_id = logged_checkpoint_map.get(checkpoint_filename)\n    else:\n        asset_id = logged_checkpoint_map.get(COMET_DEFAULT_CHECKPOINT_FILENAME)\n        checkpoint_filename = COMET_DEFAULT_CHECKPOINT_FILENAME\n\n    if asset_id is None:\n        logger.error(f'COMET ERROR: Checkpoint {checkpoint_filename} not found in the given Experiment')\n        return\n\n    try:\n        logger.info(f'COMET INFO: Downloading checkpoint {checkpoint_filename}')\n        asset_filename = checkpoint_filename\n\n        model_binary = experiment.get_asset(asset_id, return_type='binary', stream=False)\n        model_download_path = f'{model_dir}/{asset_filename}'\n        with open(model_download_path, 'wb') as f:\n            f.write(model_binary)\n\n        opt.weights = model_download_path\n\n    except Exception as e:\n        
logger.warning('COMET WARNING: Unable to download checkpoint from Comet')\n        logger.exception(e)\n\n\ndef set_opt_parameters(opt, experiment):\n    \"\"\"Update the opts Namespace with parameters\n    from Comet's ExistingExperiment when resuming a run\n\n    Args:\n        opt (argparse.Namespace): Namespace of command line options\n        experiment (comet_ml.APIExperiment): Comet API Experiment object\n    \"\"\"\n    asset_list = experiment.get_asset_list()\n    resume_string = opt.resume\n\n    for asset in asset_list:\n        if asset['fileName'] == 'opt.yaml':\n            asset_id = asset['assetId']\n            asset_binary = experiment.get_asset(asset_id, return_type='binary', stream=False)\n            opt_dict = yaml.safe_load(asset_binary)\n            for key, value in opt_dict.items():\n                setattr(opt, key, value)\n            opt.resume = resume_string\n\n    # Save hyperparameters to YAML file\n    # Necessary to pass checks in training script\n    save_dir = f'{opt.project}/{experiment.name}'\n    os.makedirs(save_dir, exist_ok=True)\n\n    hyp_yaml_path = f'{save_dir}/hyp.yaml'\n    with open(hyp_yaml_path, 'w') as f:\n        yaml.dump(opt.hyp, f)\n    opt.hyp = hyp_yaml_path\n\n\ndef check_comet_weights(opt):\n    \"\"\"Downloads model weights from Comet and updates the\n    weights path to point to saved weights location\n\n    Args:\n        opt (argparse.Namespace): Command Line arguments passed\n            to YOLOv5 training script\n\n    Returns:\n        None/bool: Return True if weights are successfully downloaded\n            else return None\n    \"\"\"\n    if comet_ml is None:\n        return\n\n    if isinstance(opt.weights, str):\n        if opt.weights.startswith(COMET_PREFIX):\n            api = comet_ml.API()\n            resource = urlparse(opt.weights)\n            experiment_path = f'{resource.netloc}{resource.path}'\n            experiment = api.get(experiment_path)\n            
download_model_checkpoint(opt, experiment)\n            return True\n\n    return None\n\n\ndef check_comet_resume(opt):\n    \"\"\"Restores run parameters to its original state based on the model checkpoint\n    and logged Experiment parameters.\n\n    Args:\n        opt (argparse.Namespace): Command Line arguments passed\n            to YOLOv5 training script\n\n    Returns:\n        None/bool: Return True if the run is restored successfully\n            else return None\n    \"\"\"\n    if comet_ml is None:\n        return\n\n    if isinstance(opt.resume, str):\n        if opt.resume.startswith(COMET_PREFIX):\n            api = comet_ml.API()\n            resource = urlparse(opt.resume)\n            experiment_path = f'{resource.netloc}{resource.path}'\n            experiment = api.get(experiment_path)\n            set_opt_parameters(opt, experiment)\n            download_model_checkpoint(opt, experiment)\n\n            return True\n\n    return None\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loggers/comet/hpo.py",
    "content": "import argparse\nimport json\nimport logging\nimport os\nimport sys\nfrom pathlib import Path\n\nimport comet_ml\n\nlogger = logging.getLogger(__name__)\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[3]  # YOLOv5 root directory\nif str(ROOT) not in sys.path:\n    sys.path.append(str(ROOT))  # add ROOT to PATH\n\nfrom train import train\nfrom utils.callbacks import Callbacks\nfrom utils.general import increment_path\nfrom utils.torch_utils import select_device\n\n# Project Configuration\nconfig = comet_ml.config.get_config()\nCOMET_PROJECT_NAME = config.get_string(os.getenv('COMET_PROJECT_NAME'), 'comet.project_name', default='yolov5')\n\n\ndef get_args(known=False):\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--weights', type=str, default=ROOT / 'yolov5s.pt', help='initial weights path')\n    parser.add_argument('--cfg', type=str, default='', help='model.yaml path')\n    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='dataset.yaml path')\n    parser.add_argument('--hyp', type=str, default=ROOT / 'data/hyps/hyp.scratch-low.yaml', help='hyperparameters path')\n    parser.add_argument('--epochs', type=int, default=300, help='total training epochs')\n    parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs, -1 for autobatch')\n    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='train, val image size (pixels)')\n    parser.add_argument('--rect', action='store_true', help='rectangular training')\n    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')\n    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')\n    parser.add_argument('--noval', action='store_true', help='only validate final epoch')\n    parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')\n    parser.add_argument('--noplots', 
action='store_true', help='save no plot files')\n    parser.add_argument('--evolve', type=int, nargs='?', const=300, help='evolve hyperparameters for x generations')\n    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')\n    parser.add_argument('--cache', type=str, nargs='?', const='ram', help='--cache images in \"ram\" (default) or \"disk\"')\n    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')\n    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')\n    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')\n    parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')\n    parser.add_argument('--optimizer', type=str, choices=['SGD', 'Adam', 'AdamW'], default='SGD', help='optimizer')\n    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')\n    parser.add_argument('--workers', type=int, default=8, help='max dataloader workers (per RANK in DDP mode)')\n    parser.add_argument('--project', default=ROOT / 'runs/train', help='save to project/name')\n    parser.add_argument('--name', default='exp', help='save to project/name')\n    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')\n    parser.add_argument('--quad', action='store_true', help='quad dataloader')\n    parser.add_argument('--cos-lr', action='store_true', help='cosine LR scheduler')\n    parser.add_argument('--label-smoothing', type=float, default=0.0, help='Label smoothing epsilon')\n    parser.add_argument('--patience', type=int, default=100, help='EarlyStopping patience (epochs without improvement)')\n    parser.add_argument('--freeze', nargs='+', type=int, default=[0], help='Freeze layers: backbone=10, first3=0 1 2')\n    parser.add_argument('--save-period', type=int, default=-1, 
help='Save checkpoint every x epochs (disabled if < 1)')\n    parser.add_argument('--seed', type=int, default=0, help='Global training seed')\n    parser.add_argument('--local_rank', type=int, default=-1, help='Automatic DDP Multi-GPU argument, do not modify')\n\n    # Weights & Biases arguments\n    parser.add_argument('--entity', default=None, help='W&B: Entity')\n    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='W&B: Upload data, \"val\" option')\n    parser.add_argument('--bbox_interval', type=int, default=-1, help='W&B: Set bounding-box image logging interval')\n    parser.add_argument('--artifact_alias', type=str, default='latest', help='W&B: Version of dataset artifact to use')\n\n    # Comet Arguments\n    parser.add_argument('--comet_optimizer_config', type=str, help='Comet: Path to a Comet Optimizer Config File.')\n    parser.add_argument('--comet_optimizer_id', type=str, help='Comet: ID of the Comet Optimizer sweep.')\n    parser.add_argument('--comet_optimizer_objective', type=str, help=\"Comet: Set to 'minimize' or 'maximize'.\")\n    parser.add_argument('--comet_optimizer_metric', type=str, help='Comet: Metric to Optimize.')\n    parser.add_argument('--comet_optimizer_workers',\n                        type=int,\n                        default=1,\n                        help='Comet: Number of Parallel Workers to use with the Comet Optimizer.')\n\n    return parser.parse_known_args()[0] if known else parser.parse_args()\n\n\ndef run(parameters, opt):\n    hyp_dict = {k: v for k, v in parameters.items() if k not in ['epochs', 'batch_size']}\n\n    opt.save_dir = str(increment_path(Path(opt.project) / opt.name, exist_ok=opt.exist_ok or opt.evolve))\n    opt.batch_size = parameters.get('batch_size')\n    opt.epochs = parameters.get('epochs')\n\n    device = select_device(opt.device, batch_size=opt.batch_size)\n    train(hyp_dict, opt, device, callbacks=Callbacks())\n\n\nif __name__ == '__main__':\n    opt = 
get_args(known=True)\n\n    opt.weights = str(opt.weights)\n    opt.cfg = str(opt.cfg)\n    opt.data = str(opt.data)\n    opt.project = str(opt.project)\n\n    optimizer_id = os.getenv('COMET_OPTIMIZER_ID')\n    if optimizer_id is None:\n        with open(opt.comet_optimizer_config) as f:\n            optimizer_config = json.load(f)\n        optimizer = comet_ml.Optimizer(optimizer_config)\n    else:\n        optimizer = comet_ml.Optimizer(optimizer_id)\n\n    opt.comet_optimizer_id = optimizer.id\n    status = optimizer.status()\n\n    opt.comet_optimizer_objective = status['spec']['objective']\n    opt.comet_optimizer_metric = status['spec']['metric']\n\n    logger.info('COMET INFO: Starting Hyperparameter Sweep')\n    for parameter in optimizer.get_parameters():\n        run(parameter['parameters'], opt)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loggers/comet/optimizer_config.json",
    "content": "{\n  \"algorithm\": \"random\",\n  \"parameters\": {\n    \"anchor_t\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        2,\n        8\n      ]\n    },\n    \"batch_size\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        16,\n        32,\n        64\n      ]\n    },\n    \"box\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0.02,\n        0.2\n      ]\n    },\n    \"cls\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0.2\n      ]\n    },\n    \"cls_pw\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0.5\n      ]\n    },\n    \"copy_paste\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        1\n      ]\n    },\n    \"degrees\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0,\n        45\n      ]\n    },\n    \"epochs\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        5\n      ]\n    },\n    \"fl_gamma\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"fliplr\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"flipud\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"hsv_h\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"hsv_s\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"hsv_v\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"iou_t\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0.7\n      ]\n    },\n    \"lr0\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        1e-05,\n        0.1\n      ]\n    },\n    \"lrf\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0.01,\n        1\n      ]\n    },\n    \"mixup\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        1\n      ]\n    },\n    \"momentum\": {\n      \"type\": 
\"discrete\",\n      \"values\": [\n        0.6\n      ]\n    },\n    \"mosaic\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"obj\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0.2\n      ]\n    },\n    \"obj_pw\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0.5\n      ]\n    },\n    \"optimizer\": {\n      \"type\": \"categorical\",\n      \"values\": [\n        \"SGD\",\n        \"Adam\",\n        \"AdamW\"\n      ]\n    },\n    \"perspective\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"scale\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"shear\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"translate\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0\n      ]\n    },\n    \"warmup_bias_lr\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0,\n        0.2\n      ]\n    },\n    \"warmup_epochs\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        5\n      ]\n    },\n    \"warmup_momentum\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0,\n        0.95\n      ]\n    },\n    \"weight_decay\": {\n      \"type\": \"discrete\",\n      \"values\": [\n        0,\n        0.001\n      ]\n    }\n  },\n  \"spec\": {\n    \"maxCombo\": 0,\n    \"metric\": \"metrics/mAP_0.5\",\n    \"objective\": \"maximize\"\n  },\n  \"trials\": 1\n}\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/loss.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nLoss functions\n\"\"\"\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom utils.metrics import bbox_iou, box_iou\nfrom utils.torch_utils import de_parallel\nfrom utils.general import xywh2xyxy\n\n\ndef smooth_BCE(eps=0.1):  # https://github.com/ultralytics/yolov3/issues/238#issuecomment-598028441\n    # return positive, negative label smoothing BCE targets\n    return 1.0 - 0.5 * eps, 0.5 * eps\n\n\nclass BCEBlurWithLogitsLoss(nn.Module):\n    # BCEwithLogitLoss() with reduced missing label effects.\n    def __init__(self, alpha=0.05):\n        super().__init__()\n        self.loss_fcn = nn.BCEWithLogitsLoss(reduction='none')  # must be nn.BCEWithLogitsLoss()\n        self.alpha = alpha\n\n    def forward(self, pred, true):\n        loss = self.loss_fcn(pred, true)\n        pred = torch.sigmoid(pred)  # prob from logits\n        dx = pred - true  # reduce only missing label effects\n        # dx = (pred - true).abs()  # reduce missing label and false label effects\n        alpha_factor = 1 - torch.exp((dx - 1) / (self.alpha + 1e-4))\n        loss *= alpha_factor\n        return loss.mean()\n\n\nclass FocalLoss(nn.Module):\n    # Wraps focal loss around existing loss_fcn(), i.e. 
criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5)\n    def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):\n        super().__init__()\n        self.loss_fcn = loss_fcn  # must be nn.BCEWithLogitsLoss()\n        self.gamma = gamma\n        self.alpha = alpha\n        self.reduction = loss_fcn.reduction\n        self.loss_fcn.reduction = 'none'  # required to apply FL to each element\n\n    def forward(self, pred, true):\n        loss = self.loss_fcn(pred, true)\n        # p_t = torch.exp(-loss)\n        # loss *= self.alpha * (1.000001 - p_t) ** self.gamma  # non-zero power for gradient stability\n\n        # TF implementation https://github.com/tensorflow/addons/blob/v0.7.1/tensorflow_addons/losses/focal_loss.py\n        pred_prob = torch.sigmoid(pred)  # prob from logits\n        p_t = true * pred_prob + (1 - true) * (1 - pred_prob)\n        alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha)\n        modulating_factor = (1.0 - p_t) ** self.gamma\n        loss *= alpha_factor * modulating_factor\n\n        if self.reduction == 'mean':\n            return loss.mean()\n        elif self.reduction == 'sum':\n            return loss.sum()\n        else:  # 'none'\n            return loss\n\n\nclass QFocalLoss(nn.Module):\n    # Wraps Quality focal loss around existing loss_fcn(), i.e. 
criteria = FocalLoss(nn.BCEWithLogitsLoss(), gamma=1.5)\n    def __init__(self, loss_fcn, gamma=1.5, alpha=0.25):\n        super().__init__()\n        self.loss_fcn = loss_fcn  # must be nn.BCEWithLogitsLoss()\n        self.gamma = gamma\n        self.alpha = alpha\n        self.reduction = loss_fcn.reduction\n        self.loss_fcn.reduction = 'none'  # required to apply FL to each element\n\n    def forward(self, pred, true):\n        loss = self.loss_fcn(pred, true)\n\n        pred_prob = torch.sigmoid(pred)  # prob from logits\n        alpha_factor = true * self.alpha + (1 - true) * (1 - self.alpha)\n        modulating_factor = torch.abs(true - pred_prob) ** self.gamma\n        loss *= alpha_factor * modulating_factor\n\n        if self.reduction == 'mean':\n            return loss.mean()\n        elif self.reduction == 'sum':\n            return loss.sum()\n        else:  # 'none'\n            return loss\n\n\nclass ComputeLoss:\n    sort_obj_iou = False\n\n    # Compute losses\n    def __init__(self, model, autobalance=False):\n        device = next(model.parameters()).device  # get model device\n        h = model.hyp  # hyperparameters\n\n        # Define criteria\n        BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))\n        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))\n\n        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3\n        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets\n\n        # Focal loss\n        g = h['fl_gamma']  # focal loss gamma\n        if g > 0:\n            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)\n\n        m = de_parallel(model).model[-1]  # Detect() module\n        self.balance = {3: [4.0, 1.0, 0.4]}.get(m.nl, [4.0, 1.0, 0.25, 0.06, 0.02])  # P3-P7\n        self.ssi = list(m.stride).index(16) if autobalance else 0  # stride 16 index\n        self.BCEcls, 
self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, 1.0, h, autobalance\n        self.na = m.na  # number of anchors\n        self.nc = m.nc  # number of classes\n        self.nl = m.nl  # number of layers\n        self.anchors = m.anchors\n        self.device = device\n\n    def __call__(self, p, targets):  # predictions, targets\n        lcls = torch.zeros(1, device=self.device)  # class loss\n        lbox = torch.zeros(1, device=self.device)  # box loss\n        lobj = torch.zeros(1, device=self.device)  # object loss\n        tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets\n\n        # Losses\n        for i, pi in enumerate(p):  # layer index, layer predictions\n            b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx\n            tobj = torch.zeros(pi.shape[:4], dtype=pi.dtype, device=self.device)  # target obj\n\n            n = b.shape[0]  # number of targets\n            if n:\n                # pxy, pwh, _, pcls = pi[b, a, gj, gi].tensor_split((2, 4, 5), dim=1)  # faster, requires torch 1.8.0\n                pxy, pwh, _, pcls = pi[b, a, gj, gi].split((2, 2, 1, self.nc), 1)  # target-subset of predictions\n\n                # Regression\n                pxy = pxy.sigmoid() * 2 - 0.5\n                pwh = (pwh.sigmoid() * 2) ** 2 * anchors[i]\n                pbox = torch.cat((pxy, pwh), 1)  # predicted box\n                iou = bbox_iou(pbox, tbox[i], CIoU=True).squeeze()  # iou(prediction, target)\n                lbox += (1.0 - iou).mean()  # iou loss\n\n                # Objectness\n                iou = iou.detach().clamp(0).type(tobj.dtype)\n                if self.sort_obj_iou:\n                    j = iou.argsort()\n                    b, a, gj, gi, iou = b[j], a[j], gj[j], gi[j], iou[j]\n                if self.gr < 1:\n                    iou = (1.0 - self.gr) + self.gr * iou\n                tobj[b, a, gj, gi] = iou  # iou ratio\n\n                # Classification\n                if self.nc > 
1:  # cls loss (only if multiple classes)\n                    t = torch.full_like(pcls, self.cn, device=self.device)  # targets\n                    t[range(n), tcls[i]] = self.cp\n                    lcls += self.BCEcls(pcls, t)  # BCE\n\n                # Append targets to text file\n                # with open('targets.txt', 'a') as file:\n                #     [file.write('%11.5g ' * 4 % tuple(x) + '\\n') for x in torch.cat((txy[i], twh[i]), 1)]\n\n            obji = self.BCEobj(pi[..., 4], tobj)\n            lobj += obji * self.balance[i]  # obj loss\n            if self.autobalance:\n                self.balance[i] = self.balance[i] * 0.9999 + 0.0001 / obji.detach().item()\n\n        if self.autobalance:\n            self.balance = [x / self.balance[self.ssi] for x in self.balance]\n        lbox *= self.hyp['box']\n        lobj *= self.hyp['obj']\n        lcls *= self.hyp['cls']\n        bs = tobj.shape[0]  # batch size\n\n        return (lbox + lobj + lcls) * bs, torch.cat((lbox, lobj, lcls)).detach()\n\n    def build_targets(self, p, targets):\n        # Build targets for compute_loss(), input targets(image,class,x,y,w,h)\n        na, nt = self.na, targets.shape[0]  # number of anchors, targets\n        tcls, tbox, indices, anch = [], [], [], []\n        gain = torch.ones(7, device=self.device)  # normalized to gridspace gain\n        ai = torch.arange(na, device=self.device).float().view(na, 1).repeat(1, nt)  # same as .repeat_interleave(nt)\n        targets = torch.cat((targets.repeat(na, 1, 1), ai[..., None]), 2)  # append anchor indices\n\n        g = 0.5  # bias\n        off = torch.tensor(\n            [\n                [0, 0],\n                [1, 0],\n                [0, 1],\n                [-1, 0],\n                [0, -1],  # j,k,l,m\n                # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm\n            ],\n            device=self.device).float() * g  # offsets\n\n        for i in range(self.nl):\n            anchors, shape = 
self.anchors[i], p[i].shape\n            gain[2:6] = torch.tensor(shape)[[3, 2, 3, 2]]  # xyxy gain\n\n            # Match targets to anchors\n            t = targets * gain  # shape(3,n,7)\n            if nt:\n                # Matches\n                r = t[..., 4:6] / anchors[:, None]  # wh ratio\n                j = torch.max(r, 1 / r).max(2)[0] < self.hyp['anchor_t']  # compare\n                # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))\n                t = t[j]  # filter\n\n                # Offsets\n                gxy = t[:, 2:4]  # grid xy\n                gxi = gain[[2, 3]] - gxy  # inverse\n                j, k = ((gxy % 1 < g) & (gxy > 1)).T\n                l, m = ((gxi % 1 < g) & (gxi > 1)).T\n                j = torch.stack((torch.ones_like(j), j, k, l, m))\n                t = t.repeat((5, 1, 1))[j]\n                offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]\n            else:\n                t = targets[0]\n                offsets = 0\n\n            # Define\n            bc, gxy, gwh, a = t.chunk(4, 1)  # (image, class), grid xy, grid wh, anchors\n            a, (b, c) = a.long().view(-1), bc.long().T  # anchors, image, class\n            gij = (gxy - offsets).long()\n            gi, gj = gij.T  # grid indices\n\n            # Append\n            indices.append((b, a, gj.clamp_(0, shape[2] - 1), gi.clamp_(0, shape[3] - 1)))  # image, anchor, grid\n            tbox.append(torch.cat((gxy - gij, gwh), 1))  # box\n            anch.append(anchors[a])  # anchors\n            tcls.append(c)  # class\n\n        return tcls, tbox, indices, anch\n\nclass ComputeLossAuxOTA:\n    # Compute losses\n    def __init__(self, model, autobalance=False):\n        super(ComputeLossAuxOTA, self).__init__()\n        device = next(model.parameters()).device  # get model device\n        h = model.hyp  # hyperparameters\n\n        # Define criteria\n        BCEcls = 
nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))\n        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))\n\n        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3\n        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets\n\n        # Focal loss\n        g = h['fl_gamma']  # focal loss gamma\n        if g > 0:\n            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)\n\n        det = de_parallel(model).model[-1]  # Detect() module\n        self.balance = {3: [4.0, 1.0, 0.4]}.get(det.nl, [4.0, 1.0, 0.25, 0.06, .02])  # P3-P7\n        self.ssi = list(det.stride).index(16) if autobalance else 0  # stride 16 index\n        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, 1.0, h, autobalance\n        for k in 'na', 'nc', 'nl', 'anchors', 'stride':\n            setattr(self, k, getattr(det, k))\n\n    def __call__(self, p, targets, imgs):  # predictions, targets, model   \n        device = targets.device\n        lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device)\n        bs_aux, as_aux_, gjs_aux, gis_aux, targets_aux, anchors_aux = self.build_targets2(p[:self.nl], targets, imgs)\n        bs, as_, gjs, gis, targets, anchors = self.build_targets(p[:self.nl], targets, imgs)\n        pre_gen_gains_aux = [torch.tensor(pp.shape, device=device)[[3, 2, 3, 2]] for pp in p[:self.nl]] \n        pre_gen_gains = [torch.tensor(pp.shape, device=device)[[3, 2, 3, 2]] for pp in p[:self.nl]] \n    \n\n        # Losses\n        for i in range(self.nl):  # layer index, layer predictions\n            pi = p[i]\n            pi_aux = p[i+self.nl]\n            b, a, gj, gi = bs[i], as_[i], gjs[i], gis[i]  # image, anchor, gridy, gridx\n            b_aux, a_aux, gj_aux, gi_aux = bs_aux[i], as_aux_[i], gjs_aux[i], gis_aux[i]  # image, anchor, gridy, gridx\n    
        tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj\n            tobj_aux = torch.zeros_like(pi_aux[..., 0], device=device)  # target obj\n\n            n = b.shape[0]  # number of targets\n            if n:\n                ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets\n\n                # Regression\n                grid = torch.stack([gi, gj], dim=1)\n                pxy = ps[:, :2].sigmoid() * 2. - 0.5\n                pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]\n                pbox = torch.cat((pxy, pwh), 1)  # predicted box\n                selected_tbox = targets[i][:, 2:6] * pre_gen_gains[i]\n                selected_tbox[:, :2] -= grid\n                iou = bbox_iou(pbox, selected_tbox, CIoU=True)  # iou(prediction, target)\n                lbox += (1.0 - iou).mean()  # iou loss\n\n                # Objectness\n                tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype)  # iou ratio\n\n                # Classification\n                selected_tcls = targets[i][:, 1].long()\n                if self.nc > 1:  # cls loss (only if multiple classes)\n                    t = torch.full_like(ps[:, 5:], self.cn, device=device)  # targets\n                    t[range(n), selected_tcls] = self.cp\n                    lcls += self.BCEcls(ps[:, 5:], t)  # BCE\n\n                # Append targets to text file\n                # with open('targets.txt', 'a') as file:\n                #     [file.write('%11.5g ' * 4 % tuple(x) + '\\n') for x in torch.cat((txy[i], twh[i]), 1)]\n            \n            n_aux = b_aux.shape[0]  # number of targets\n            if n_aux:\n                ps_aux = pi_aux[b_aux, a_aux, gj_aux, gi_aux]  # prediction subset corresponding to targets\n                grid_aux = torch.stack([gi_aux, gj_aux], dim=1)\n                pxy_aux = ps_aux[:, :2].sigmoid() * 2. - 0.5\n                #pxy_aux = ps_aux[:, :2].sigmoid() * 3. 
- 1.\n                pwh_aux = (ps_aux[:, 2:4].sigmoid() * 2) ** 2 * anchors_aux[i]\n                pbox_aux = torch.cat((pxy_aux, pwh_aux), 1)  # predicted box\n                selected_tbox_aux = targets_aux[i][:, 2:6] * pre_gen_gains_aux[i]\n                selected_tbox_aux[:, :2] -= grid_aux\n                iou_aux = bbox_iou(pbox_aux, selected_tbox_aux, CIoU=True)  # iou(prediction, target)\n                lbox += 0.25 * (1.0 - iou_aux).mean()  # iou loss\n\n                # Objectness\n                tobj_aux[b_aux, a_aux, gj_aux, gi_aux] = (1.0 - self.gr) + self.gr * iou_aux.detach().clamp(0).type(tobj_aux.dtype)  # iou ratio\n\n                # Classification\n                selected_tcls_aux = targets_aux[i][:, 1].long()\n                if self.nc > 1:  # cls loss (only if multiple classes)\n                    t_aux = torch.full_like(ps_aux[:, 5:], self.cn, device=device)  # targets\n                    t_aux[range(n_aux), selected_tcls_aux] = self.cp\n                    lcls += 0.25 * self.BCEcls(ps_aux[:, 5:], t_aux)  # BCE\n\n            obji = self.BCEobj(pi[..., 4], tobj)\n            obji_aux = self.BCEobj(pi_aux[..., 4], tobj_aux)\n            lobj += obji * self.balance[i] + 0.25 * obji_aux * self.balance[i] # obj loss\n            if self.autobalance:\n                self.balance[i] = self.balance[i] * 0.9999 + 0.0001 / obji.detach().item()\n\n        if self.autobalance:\n            self.balance = [x / self.balance[self.ssi] for x in self.balance]\n        lbox *= self.hyp['box']\n        lobj *= self.hyp['obj']\n        lcls *= self.hyp['cls']\n        bs = tobj.shape[0]  # batch size\n\n        loss = lbox + lobj + lcls\n        return loss * bs, torch.cat((lbox, lobj, lcls)).detach()\n\n    def build_targets(self, p, targets, imgs):\n        device = torch.device(targets.device)\n        indices, anch = self.find_3_positive(p, targets)\n\n        matching_bs = [[] for pp in p]\n        matching_as = [[] for pp in p]\n        
matching_gjs = [[] for pp in p]\n        matching_gis = [[] for pp in p]\n        matching_targets = [[] for pp in p]\n        matching_anchs = [[] for pp in p]\n        \n        nl = len(p)    \n    \n        for batch_idx in range(p[0].shape[0]):\n        \n            b_idx = targets[:, 0]==batch_idx\n            this_target = targets[b_idx]\n            if this_target.shape[0] == 0:\n                continue\n                \n            txywh = this_target[:, 2:6] * imgs[batch_idx].shape[1]\n            txyxy = xywh2xyxy(txywh)\n\n            pxyxys = []\n            p_cls = []\n            p_obj = []\n            from_which_layer = []\n            all_b = []\n            all_a = []\n            all_gj = []\n            all_gi = []\n            all_anch = []\n            \n            for i, pi in enumerate(p):\n                \n                b, a, gj, gi = indices[i]\n                idx = (b == batch_idx)\n                b, a, gj, gi = b[idx], a[idx], gj[idx], gi[idx]                \n                all_b.append(b)\n                all_a.append(a)\n                all_gj.append(gj)\n                all_gi.append(gi)\n                all_anch.append(anch[i][idx])\n                from_which_layer.append((torch.ones(size=(len(b),)) * i).to(device))\n                \n                fg_pred = pi[b, a, gj, gi]                \n                p_obj.append(fg_pred[:, 4:5])\n                p_cls.append(fg_pred[:, 5:])\n                \n                grid = torch.stack([gi, gj], dim=1)\n                pxy = (fg_pred[:, :2].sigmoid() * 2. - 0.5 + grid) * self.stride[i] #/ 8.\n                #pxy = (fg_pred[:, :2].sigmoid() * 3. - 1. 
+ grid) * self.stride[i]\n                pwh = (fg_pred[:, 2:4].sigmoid() * 2) ** 2 * anch[i][idx] * self.stride[i] #/ 8.\n                pxywh = torch.cat([pxy, pwh], dim=-1)\n                pxyxy = xywh2xyxy(pxywh)\n                pxyxys.append(pxyxy)\n            \n            pxyxys = torch.cat(pxyxys, dim=0)\n            if pxyxys.shape[0] == 0:\n                continue\n            p_obj = torch.cat(p_obj, dim=0)\n            p_cls = torch.cat(p_cls, dim=0)\n            from_which_layer = torch.cat(from_which_layer, dim=0)\n            all_b = torch.cat(all_b, dim=0)\n            all_a = torch.cat(all_a, dim=0)\n            all_gj = torch.cat(all_gj, dim=0)\n            all_gi = torch.cat(all_gi, dim=0)\n            all_anch = torch.cat(all_anch, dim=0)\n        \n            pair_wise_iou = box_iou(txyxy, pxyxys)\n\n            pair_wise_iou_loss = -torch.log(pair_wise_iou + 1e-8)\n\n            top_k, _ = torch.topk(pair_wise_iou, min(20, pair_wise_iou.shape[1]), dim=1)\n            dynamic_ks = torch.clamp(top_k.sum(1).int(), min=1)\n\n            gt_cls_per_image = (\n                F.one_hot(this_target[:, 1].to(torch.int64), self.nc)\n                .float()\n                .unsqueeze(1)\n                .repeat(1, pxyxys.shape[0], 1)\n            )\n\n            num_gt = this_target.shape[0]\n            cls_preds_ = (\n                p_cls.float().unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()\n                * p_obj.unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()\n            )\n\n            y = cls_preds_.sqrt_()\n            pair_wise_cls_loss = F.binary_cross_entropy_with_logits(\n               torch.log(y/(1-y)) , gt_cls_per_image, reduction=\"none\"\n            ).sum(-1)\n            del cls_preds_\n        \n            cost = (\n                pair_wise_cls_loss\n                + 3.0 * pair_wise_iou_loss\n            )\n\n            matching_matrix = torch.zeros_like(cost)\n\n            for gt_idx in range(num_gt):\n               
 _, pos_idx = torch.topk(\n                    cost[gt_idx], k=dynamic_ks[gt_idx].item(), largest=False\n                )\n                matching_matrix[gt_idx][pos_idx] = 1.0\n\n            del top_k, dynamic_ks\n            anchor_matching_gt = matching_matrix.sum(0)\n            if (anchor_matching_gt > 1).sum() > 0:\n                _, cost_argmin = torch.min(cost[:, anchor_matching_gt > 1], dim=0)\n                matching_matrix[:, anchor_matching_gt > 1] *= 0.0\n                matching_matrix[cost_argmin, anchor_matching_gt > 1] = 1.0\n            fg_mask_inboxes = matching_matrix.sum(0) > 0.0\n            matched_gt_inds = matching_matrix[:, fg_mask_inboxes].argmax(0)\n        \n            from_which_layer = from_which_layer[fg_mask_inboxes]\n            all_b = all_b[fg_mask_inboxes]\n            all_a = all_a[fg_mask_inboxes]\n            all_gj = all_gj[fg_mask_inboxes]\n            all_gi = all_gi[fg_mask_inboxes]\n            all_anch = all_anch[fg_mask_inboxes]\n        \n            this_target = this_target[matched_gt_inds]\n        \n            for i in range(nl):\n                layer_idx = from_which_layer == i\n                matching_bs[i].append(all_b[layer_idx])\n                matching_as[i].append(all_a[layer_idx])\n                matching_gjs[i].append(all_gj[layer_idx])\n                matching_gis[i].append(all_gi[layer_idx])\n                matching_targets[i].append(this_target[layer_idx])\n                matching_anchs[i].append(all_anch[layer_idx])\n\n        for i in range(nl):\n            if matching_targets[i] != []:\n                matching_bs[i] = torch.cat(matching_bs[i], dim=0)\n                matching_as[i] = torch.cat(matching_as[i], dim=0)\n                matching_gjs[i] = torch.cat(matching_gjs[i], dim=0)\n                matching_gis[i] = torch.cat(matching_gis[i], dim=0)\n                matching_targets[i] = torch.cat(matching_targets[i], dim=0)\n                matching_anchs[i] = 
torch.cat(matching_anchs[i], dim=0)\n            else:  # no matches for this layer: empty tensors on the same device as targets\n                matching_bs[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_as[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_gjs[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_gis[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_targets[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_anchs[i] = torch.tensor([], device=device, dtype=torch.int64)\n\n        return matching_bs, matching_as, matching_gjs, matching_gis, matching_targets, matching_anchs\n\n    def build_targets2(self, p, targets, imgs):\n        device = torch.device(targets.device)\n        indices, anch = self.find_5_positive(p, targets)\n\n        matching_bs = [[] for pp in p]\n        matching_as = [[] for pp in p]\n        matching_gjs = [[] for pp in p]\n        matching_gis = [[] for pp in p]\n        matching_targets = [[] for pp in p]\n        matching_anchs = [[] for pp in p]\n        \n        nl = len(p)    \n    \n        for batch_idx in range(p[0].shape[0]):\n        \n            b_idx = targets[:, 0]==batch_idx\n            this_target = targets[b_idx]\n            if this_target.shape[0] == 0:\n                continue\n                \n            txywh = this_target[:, 2:6] * imgs[batch_idx].shape[1]\n            txyxy = xywh2xyxy(txywh)\n\n            pxyxys = []\n            p_cls = []\n            p_obj = []\n            from_which_layer = []\n            all_b = []\n            all_a = []\n            all_gj = []\n            all_gi = []\n            all_anch = []\n            \n            for i, pi in enumerate(p):\n                \n                b, a, gj, gi = indices[i]\n                idx = (b == batch_idx)\n                b, a, gj, gi = b[idx], a[idx], gj[idx], gi[idx]                \n                all_b.append(b)\n               
 all_a.append(a)\n                all_gj.append(gj)\n                all_gi.append(gi)\n                all_anch.append(anch[i][idx])\n                from_which_layer.append((torch.ones(size=(len(b),)) * i).to(device))\n                \n                fg_pred = pi[b, a, gj, gi]                \n                p_obj.append(fg_pred[:, 4:5])\n                p_cls.append(fg_pred[:, 5:])\n                \n                grid = torch.stack([gi, gj], dim=1)\n                pxy = (fg_pred[:, :2].sigmoid() * 2. - 0.5 + grid) * self.stride[i] #/ 8.\n                #pxy = (fg_pred[:, :2].sigmoid() * 3. - 1. + grid) * self.stride[i]\n                pwh = (fg_pred[:, 2:4].sigmoid() * 2) ** 2 * anch[i][idx] * self.stride[i] #/ 8.\n                pxywh = torch.cat([pxy, pwh], dim=-1)\n                pxyxy = xywh2xyxy(pxywh)\n                pxyxys.append(pxyxy)\n            \n            pxyxys = torch.cat(pxyxys, dim=0)\n            if pxyxys.shape[0] == 0:\n                continue\n            p_obj = torch.cat(p_obj, dim=0)\n            p_cls = torch.cat(p_cls, dim=0)\n            from_which_layer = torch.cat(from_which_layer, dim=0)\n            all_b = torch.cat(all_b, dim=0)\n            all_a = torch.cat(all_a, dim=0)\n            all_gj = torch.cat(all_gj, dim=0)\n            all_gi = torch.cat(all_gi, dim=0)\n            all_anch = torch.cat(all_anch, dim=0)\n        \n            pair_wise_iou = box_iou(txyxy, pxyxys)\n\n            pair_wise_iou_loss = -torch.log(pair_wise_iou + 1e-8)\n\n            top_k, _ = torch.topk(pair_wise_iou, min(20, pair_wise_iou.shape[1]), dim=1)\n            dynamic_ks = torch.clamp(top_k.sum(1).int(), min=1)\n\n            gt_cls_per_image = (\n                F.one_hot(this_target[:, 1].to(torch.int64), self.nc)\n                .float()\n                .unsqueeze(1)\n                .repeat(1, pxyxys.shape[0], 1)\n            )\n\n            num_gt = this_target.shape[0]\n            cls_preds_ = (\n                
p_cls.float().unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()\n                * p_obj.unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()\n            )\n\n            y = cls_preds_.sqrt_()\n            pair_wise_cls_loss = F.binary_cross_entropy_with_logits(\n               torch.log(y/(1-y)) , gt_cls_per_image, reduction=\"none\"\n            ).sum(-1)\n            del cls_preds_\n        \n            cost = (\n                pair_wise_cls_loss\n                + 3.0 * pair_wise_iou_loss\n            )\n\n            matching_matrix = torch.zeros_like(cost)\n\n            for gt_idx in range(num_gt):\n                _, pos_idx = torch.topk(\n                    cost[gt_idx], k=dynamic_ks[gt_idx].item(), largest=False\n                )\n                matching_matrix[gt_idx][pos_idx] = 1.0\n\n            del top_k, dynamic_ks\n            anchor_matching_gt = matching_matrix.sum(0)\n            if (anchor_matching_gt > 1).sum() > 0:\n                _, cost_argmin = torch.min(cost[:, anchor_matching_gt > 1], dim=0)\n                matching_matrix[:, anchor_matching_gt > 1] *= 0.0\n                matching_matrix[cost_argmin, anchor_matching_gt > 1] = 1.0\n            fg_mask_inboxes = matching_matrix.sum(0) > 0.0\n            matched_gt_inds = matching_matrix[:, fg_mask_inboxes].argmax(0)\n        \n            from_which_layer = from_which_layer[fg_mask_inboxes]\n            all_b = all_b[fg_mask_inboxes]\n            all_a = all_a[fg_mask_inboxes]\n            all_gj = all_gj[fg_mask_inboxes]\n            all_gi = all_gi[fg_mask_inboxes]\n            all_anch = all_anch[fg_mask_inboxes]\n        \n            this_target = this_target[matched_gt_inds]\n        \n            for i in range(nl):\n                layer_idx = from_which_layer == i\n                matching_bs[i].append(all_b[layer_idx])\n                matching_as[i].append(all_a[layer_idx])\n                matching_gjs[i].append(all_gj[layer_idx])\n                
matching_gis[i].append(all_gi[layer_idx])\n                matching_targets[i].append(this_target[layer_idx])\n                matching_anchs[i].append(all_anch[layer_idx])\n\n        for i in range(nl):\n            if matching_targets[i] != []:\n                matching_bs[i] = torch.cat(matching_bs[i], dim=0)\n                matching_as[i] = torch.cat(matching_as[i], dim=0)\n                matching_gjs[i] = torch.cat(matching_gjs[i], dim=0)\n                matching_gis[i] = torch.cat(matching_gis[i], dim=0)\n                matching_targets[i] = torch.cat(matching_targets[i], dim=0)\n                matching_anchs[i] = torch.cat(matching_anchs[i], dim=0)\n            else:\n                matching_bs[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_as[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_gjs[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_gis[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_targets[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_anchs[i] = torch.tensor([], device=device, dtype=torch.int64)\n\n        return matching_bs, matching_as, matching_gjs, matching_gis, matching_targets, matching_anchs              \n\n    def find_5_positive(self, p, targets):\n        # Build targets for compute_loss(), input targets(image,class,x,y,w,h)\n        na, nt = self.na, targets.shape[0]  # number of anchors, targets\n        indices, anch = [], []\n        gain = torch.ones(7, device=targets.device).long()  # normalized to gridspace gain\n        ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt)  # same as .repeat_interleave(nt)\n        targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)  # append anchor indices\n\n        g = 1.0  # bias\n        off = torch.tensor([[0, 0],\n                            [1, 0], [0, 1], [-1, 
0], [0, -1],  # j,k,l,m\n                            # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm\n                            ], device=targets.device).float() * g  # offsets\n\n        for i in range(self.nl):\n            anchors = self.anchors[i]\n            gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain\n\n            # Match targets to anchors\n            t = targets * gain\n            if nt:\n                # Matches\n                r = t[:, :, 4:6] / anchors[:, None]  # wh ratio\n                j = torch.max(r, 1. / r).max(2)[0] < self.hyp['anchor_t']  # compare\n                # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))\n                t = t[j]  # filter\n\n                # Offsets\n                gxy = t[:, 2:4]  # grid xy\n                gxi = gain[[2, 3]] - gxy  # inverse\n                j, k = ((gxy % 1. < g) & (gxy > 1.)).T\n                l, m = ((gxi % 1. < g) & (gxi > 1.)).T\n                j = torch.stack((torch.ones_like(j), j, k, l, m))\n                t = t.repeat((5, 1, 1))[j]\n                offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]\n            else:\n                t = targets[0]\n                offsets = 0\n\n            # Define\n            b, c = t[:, :2].long().T  # image, class\n            gxy = t[:, 2:4]  # grid xy\n            gwh = t[:, 4:6]  # grid wh\n            gij = (gxy - offsets).long()\n            gi, gj = gij.T  # grid xy indices\n\n            # Append\n            a = t[:, 6].long()  # anchor indices\n            indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices\n            anch.append(anchors[a])  # anchors\n\n        return indices, anch                 \n\n    def find_3_positive(self, p, targets):\n        # Build targets for compute_loss(), input targets(image,class,x,y,w,h)\n        na, nt = self.na, targets.shape[0]  # number of anchors, 
targets\n        indices, anch = [], []\n        gain = torch.ones(7, device=targets.device).long()  # normalized to gridspace gain\n        ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt)  # same as .repeat_interleave(nt)\n        targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)  # append anchor indices\n\n        g = 0.5  # bias\n        off = torch.tensor([[0, 0],\n                            [1, 0], [0, 1], [-1, 0], [0, -1],  # j,k,l,m\n                            # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm\n                            ], device=targets.device).float() * g  # offsets\n\n        for i in range(self.nl):\n            anchors = self.anchors[i]\n            gain[2:6] = torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain\n\n            # Match targets to anchors\n            t = targets * gain\n            if nt:\n                # Matches\n                r = t[:, :, 4:6] / anchors[:, None]  # wh ratio\n                j = torch.max(r, 1. / r).max(2)[0] < self.hyp['anchor_t']  # compare\n                # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))\n                t = t[j]  # filter\n\n                # Offsets\n                gxy = t[:, 2:4]  # grid xy\n                gxi = gain[[2, 3]] - gxy  # inverse\n                j, k = ((gxy % 1. < g) & (gxy > 1.)).T\n                l, m = ((gxi % 1. 
< g) & (gxi > 1.)).T\n                j = torch.stack((torch.ones_like(j), j, k, l, m))\n                t = t.repeat((5, 1, 1))[j]\n                offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]\n            else:\n                t = targets[0]\n                offsets = 0\n\n            # Define\n            b, c = t[:, :2].long().T  # image, class\n            gxy = t[:, 2:4]  # grid xy\n            gwh = t[:, 4:6]  # grid wh\n            gij = (gxy - offsets).long()\n            gi, gj = gij.T  # grid xy indices\n\n            # Append\n            a = t[:, 6].long()  # anchor indices\n            indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices\n            anch.append(anchors[a])  # anchors\n\n        return indices, anch\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/metrics.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nModel validation metrics\n\"\"\"\n\nimport math\nimport warnings\nfrom pathlib import Path\n\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport torch\n\nfrom utils import TryExcept, threaded\n\n\ndef fitness(x):\n    # Model fitness as a weighted combination of metrics\n    w = [0.0, 0.0, 0.1, 0.9]  # weights for [P, R, mAP@0.5, mAP@0.5:0.95]\n    return (x[:, :4] * w).sum(1)\n\n\ndef smooth(y, f=0.05):\n    # Box filter of fraction f\n    nf = round(len(y) * f * 2) // 2 + 1  # number of filter elements (must be odd)\n    p = np.ones(nf // 2)  # ones padding\n    yp = np.concatenate((p * y[0], y, p * y[-1]), 0)  # y padded\n    return np.convolve(yp, np.ones(nf) / nf, mode='valid')  # y-smoothed\n\n\ndef ap_per_class(tp, conf, pred_cls, target_cls, plot=False, save_dir='.', names=(), eps=1e-16, prefix=''):\n    \"\"\" Compute the average precision, given the recall and precision curves.\n    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.\n    # Arguments\n        tp:  True positives (nparray, nx1 or nx10).\n        conf:  Objectness value from 0-1 (nparray).\n        pred_cls:  Predicted object classes (nparray).\n        target_cls:  True object classes (nparray).\n        plot:  Plot precision-recall curve at mAP@0.5\n        save_dir:  Plot save directory\n    # Returns\n        The average precision as computed in py-faster-rcnn.\n    \"\"\"\n\n    # Sort by objectness\n    i = np.argsort(-conf)\n    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]\n\n    # Find unique classes\n    unique_classes, nt = np.unique(target_cls, return_counts=True)\n    nc = unique_classes.shape[0]  # number of classes, number of detections\n\n    # Create Precision-Recall curve and compute AP for each class\n    px, py = np.linspace(0, 1, 1000), []  # for plotting\n    ap, p, r = np.zeros((nc, tp.shape[1])), np.zeros((nc, 1000)), np.zeros((nc, 1000))\n    for ci, c in 
enumerate(unique_classes):\n        i = pred_cls == c\n        n_l = nt[ci]  # number of labels\n        n_p = i.sum()  # number of predictions\n        if n_p == 0 or n_l == 0:\n            continue\n\n        # Accumulate FPs and TPs\n        fpc = (1 - tp[i]).cumsum(0)\n        tpc = tp[i].cumsum(0)\n\n        # Recall\n        recall = tpc / (n_l + eps)  # recall curve\n        r[ci] = np.interp(-px, -conf[i], recall[:, 0], left=0)  # negative x, xp because xp decreases\n\n        # Precision\n        precision = tpc / (tpc + fpc)  # precision curve\n        p[ci] = np.interp(-px, -conf[i], precision[:, 0], left=1)  # p at pr_score\n\n        # AP from recall-precision curve\n        for j in range(tp.shape[1]):\n            ap[ci, j], mpre, mrec = compute_ap(recall[:, j], precision[:, j])\n            if plot and j == 0:\n                py.append(np.interp(px, mrec, mpre))  # precision at mAP@0.5\n\n    # Compute F1 (harmonic mean of precision and recall)\n    f1 = 2 * p * r / (p + r + eps)\n    names = [v for k, v in names.items() if k in unique_classes]  # list: only classes that have data\n    names = dict(enumerate(names))  # to dict\n    if plot:\n        plot_pr_curve(px, py, ap, Path(save_dir) / f'{prefix}PR_curve.png', names)\n        plot_mc_curve(px, f1, Path(save_dir) / f'{prefix}F1_curve.png', names, ylabel='F1')\n        plot_mc_curve(px, p, Path(save_dir) / f'{prefix}P_curve.png', names, ylabel='Precision')\n        plot_mc_curve(px, r, Path(save_dir) / f'{prefix}R_curve.png', names, ylabel='Recall')\n\n    i = smooth(f1.mean(0), 0.1).argmax()  # max F1 index\n    p, r, f1 = p[:, i], r[:, i], f1[:, i]\n    tp = (r * nt).round()  # true positives\n    fp = (tp / (p + eps) - tp).round()  # false positives\n    return tp, fp, p, r, f1, ap, unique_classes.astype(int)\n\n\ndef compute_ap(recall, precision):\n    \"\"\" Compute the average precision, given the recall and precision curves\n    # Arguments\n        recall:    The recall curve (list)\n   
     precision: The precision curve (list)\n    # Returns\n        Average precision, precision curve, recall curve\n    \"\"\"\n\n    # Append sentinel values to beginning and end\n    mrec = np.concatenate(([0.0], recall, [1.0]))\n    mpre = np.concatenate(([1.0], precision, [0.0]))\n\n    # Compute the precision envelope\n    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))\n\n    # Integrate area under curve\n    method = 'interp'  # methods: 'continuous', 'interp'\n    if method == 'interp':\n        x = np.linspace(0, 1, 101)  # 101-point interp (COCO)\n        ap = np.trapz(np.interp(x, mrec, mpre), x)  # integrate\n    else:  # 'continuous'\n        i = np.where(mrec[1:] != mrec[:-1])[0]  # points where x axis (recall) changes\n        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])  # area under curve\n\n    return ap, mpre, mrec\n\n\nclass ConfusionMatrix:\n    # Updated version of https://github.com/kaanakan/object_detection_confusion_matrix\n    def __init__(self, nc, conf=0.25, iou_thres=0.45):\n        self.matrix = np.zeros((nc + 1, nc + 1))\n        self.nc = nc  # number of classes\n        self.conf = conf\n        self.iou_thres = iou_thres\n\n    def process_batch(self, detections, labels):\n        \"\"\"\n        Return intersection-over-union (Jaccard index) of boxes.\n        Both sets of boxes are expected to be in (x1, y1, x2, y2) format.\n        Arguments:\n            detections (Array[N, 6]), x1, y1, x2, y2, conf, class\n            labels (Array[M, 5]), class, x1, y1, x2, y2\n        Returns:\n            None, updates confusion matrix accordingly\n        \"\"\"\n        if detections is None:\n            gt_classes = labels.int()\n            for gc in gt_classes:\n                self.matrix[self.nc, gc] += 1  # background FN\n            return\n\n        detections = detections[detections[:, 4] > self.conf]\n        gt_classes = labels[:, 0].int()\n        detection_classes = detections[:, 5].int()\n        iou = 
box_iou(labels[:, 1:], detections[:, :4])\n\n        x = torch.where(iou > self.iou_thres)\n        if x[0].shape[0]:\n            matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()\n            if x[0].shape[0] > 1:\n                matches = matches[matches[:, 2].argsort()[::-1]]\n                matches = matches[np.unique(matches[:, 1], return_index=True)[1]]\n                matches = matches[matches[:, 2].argsort()[::-1]]\n                matches = matches[np.unique(matches[:, 0], return_index=True)[1]]\n        else:\n            matches = np.zeros((0, 3))\n\n        n = matches.shape[0] > 0\n        m0, m1, _ = matches.transpose().astype(int)\n        for i, gc in enumerate(gt_classes):\n            j = m0 == i\n            if n and sum(j) == 1:\n                self.matrix[detection_classes[m1[j]], gc] += 1  # correct\n            else:\n                self.matrix[self.nc, gc] += 1  # true background\n\n        if n:\n            for i, dc in enumerate(detection_classes):\n                if not any(m1 == i):\n                    self.matrix[dc, self.nc] += 1  # predicted background\n\n    def tp_fp(self):\n        tp = self.matrix.diagonal()  # true positives\n        fp = self.matrix.sum(1) - tp  # false positives\n        # fn = self.matrix.sum(0) - tp  # false negatives (missed detections)\n        return tp[:-1], fp[:-1]  # remove background class\n\n    @TryExcept('WARNING ⚠️ ConfusionMatrix plot failure')\n    def plot(self, normalize=True, save_dir='', names=()):\n        import seaborn as sn\n\n        array = self.matrix / ((self.matrix.sum(0).reshape(1, -1) + 1E-9) if normalize else 1)  # normalize columns\n        array[array < 0.005] = np.nan  # don't annotate (would appear as 0.00)\n\n        fig, ax = plt.subplots(1, 1, figsize=(12, 9), tight_layout=True)\n        nc, nn = self.nc, len(names)  # number of classes, names\n        sn.set(font_scale=1.0 if nc < 50 else 0.8)  # for label size\n        labels = 
(0 < nn < 99) and (nn == nc)  # apply names to ticklabels\n        ticklabels = (names + ['background']) if labels else 'auto'\n        with warnings.catch_warnings():\n            warnings.simplefilter('ignore')  # suppress empty matrix RuntimeWarning: All-NaN slice encountered\n            sn.heatmap(array,\n                       ax=ax,\n                       annot=nc < 30,\n                       annot_kws={\n                           'size': 8},\n                       cmap='Blues',\n                       fmt='.2f',\n                       square=True,\n                       vmin=0.0,\n                       xticklabels=ticklabels,\n                       yticklabels=ticklabels).set_facecolor((1, 1, 1))\n        ax.set_xlabel('True')\n        ax.set_ylabel('Predicted')\n        ax.set_title('Confusion Matrix')\n        fig.savefig(Path(save_dir) / 'confusion_matrix.png', dpi=250)\n        plt.close(fig)\n\n    def print(self):\n        for i in range(self.nc + 1):\n            print(' '.join(map(str, self.matrix[i])))\n\n\ndef bbox_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, eps=1e-7):\n    # Returns Intersection over Union (IoU) of box1(1,4) to box2(n,4)\n\n    # Get the coordinates of bounding boxes\n    if xywh:  # transform from xywh to xyxy\n        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)\n        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2\n        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_\n        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_\n    else:  # x1, y1, x2, y2 = box1\n        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)\n        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)\n        w1, h1 = b1_x2 - b1_x1, (b1_y2 - b1_y1).clamp(eps)\n        w2, h2 = b2_x2 - b2_x1, (b2_y2 - b2_y1).clamp(eps)\n\n    # Intersection area\n    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp(0) * \\\n            (b1_y2.minimum(b2_y2) - 
b1_y1.maximum(b2_y1)).clamp(0)\n\n    # Union Area\n    union = w1 * h1 + w2 * h2 - inter + eps\n\n    # IoU\n    iou = inter / union\n    if CIoU or DIoU or GIoU:\n        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width\n        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height\n        if CIoU or DIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1\n            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared\n            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center dist ** 2\n            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47\n                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)\n                with torch.no_grad():\n                    alpha = v / (v - iou + (1 + eps))\n                return iou - (rho2 / c2 + v * alpha)  # CIoU\n            return iou - rho2 / c2  # DIoU\n        c_area = cw * ch + eps  # convex area\n        return iou - (c_area - union) / c_area  # GIoU https://arxiv.org/pdf/1902.09630.pdf\n    return iou  # IoU\n\n\ndef box_iou(box1, box2, eps=1e-7):\n    # https://github.com/pytorch/vision/blob/master/torchvision/ops/boxes.py\n    \"\"\"\n    Return intersection-over-union (Jaccard index) of boxes.\n    Both sets of boxes are expected to be in (x1, y1, x2, y2) format.\n    Arguments:\n        box1 (Tensor[N, 4])\n        box2 (Tensor[M, 4])\n    Returns:\n        iou (Tensor[N, M]): the NxM matrix containing the pairwise\n            IoU values for every element in boxes1 and boxes2\n    \"\"\"\n\n    # inter(N,M) = (rb(N,M,2) - lt(N,M,2)).clamp(0).prod(2)\n    (a1, a2), (b1, b2) = box1.unsqueeze(1).chunk(2, 2), box2.unsqueeze(0).chunk(2, 2)\n    inter = (torch.min(a2, b2) - torch.max(a1, b1)).clamp(0).prod(2)\n\n    # IoU = inter / (area1 + area2 - inter)\n    return inter / ((a2 - a1).prod(2) + (b2 - 
b1).prod(2) - inter + eps)\n\n\ndef bbox_ioa(box1, box2, eps=1e-7):\n    \"\"\" Returns the intersection over box2 area given box1, box2. Boxes are x1y1x2y2\n    box1:       np.array of shape(4)\n    box2:       np.array of shape(nx4)\n    returns:    np.array of shape(n)\n    \"\"\"\n\n    # Get the coordinates of bounding boxes\n    b1_x1, b1_y1, b1_x2, b1_y2 = box1\n    b2_x1, b2_y1, b2_x2, b2_y2 = box2.T\n\n    # Intersection area\n    inter_area = (np.minimum(b1_x2, b2_x2) - np.maximum(b1_x1, b2_x1)).clip(0) * \\\n                 (np.minimum(b1_y2, b2_y2) - np.maximum(b1_y1, b2_y1)).clip(0)\n\n    # box2 area\n    box2_area = (b2_x2 - b2_x1) * (b2_y2 - b2_y1) + eps\n\n    # Intersection over box2 area\n    return inter_area / box2_area\n\n\ndef wh_iou(wh1, wh2, eps=1e-7):\n    # Returns the nxm IoU matrix. wh1 is nx2, wh2 is mx2\n    wh1 = wh1[:, None]  # [N,1,2]\n    wh2 = wh2[None]  # [1,M,2]\n    inter = torch.min(wh1, wh2).prod(2)  # [N,M]\n    return inter / (wh1.prod(2) + wh2.prod(2) - inter + eps)  # iou = inter / (area1 + area2 - inter)\n\n\n# Plots ----------------------------------------------------------------------------------------------------------------\n\n\n@threaded\ndef plot_pr_curve(px, py, ap, save_dir=Path('pr_curve.png'), names=()):\n    # Precision-recall curve\n    fig, ax = plt.subplots(1, 1, figsize=(9, 6), tight_layout=True)\n    py = np.stack(py, axis=1)\n\n    if 0 < len(names) < 21:  # display per-class legend if < 21 classes\n        for i, y in enumerate(py.T):\n            ax.plot(px, y, linewidth=1, label=f'{names[i]} {ap[i, 0]:.3f}')  # plot(recall, precision)\n    else:\n        ax.plot(px, py, linewidth=1, color='grey')  # plot(recall, precision)\n\n    ax.plot(px, py.mean(1), linewidth=3, color='blue', label='all classes %.3f mAP@0.5' % ap[:, 0].mean())\n    ax.set_xlabel('Recall')\n    ax.set_ylabel('Precision')\n    ax.set_xlim(0, 1)\n    ax.set_ylim(0, 1)\n    ax.legend(bbox_to_anchor=(1.04, 1), loc='upper left')\n    
ax.set_title('Precision-Recall Curve')\n    fig.savefig(save_dir, dpi=250)\n    plt.close(fig)\n\n\n@threaded\ndef plot_mc_curve(px, py, save_dir=Path('mc_curve.png'), names=(), xlabel='Confidence', ylabel='Metric'):\n    # Metric-confidence curve\n    fig, ax = plt.subplots(1, 1, figsize=(9, 6), tight_layout=True)\n\n    if 0 < len(names) < 21:  # display per-class legend if < 21 classes\n        for i, y in enumerate(py):\n            ax.plot(px, y, linewidth=1, label=f'{names[i]}')  # plot(confidence, metric)\n    else:\n        ax.plot(px, py.T, linewidth=1, color='grey')  # plot(confidence, metric)\n\n    y = smooth(py.mean(0), 0.05)\n    ax.plot(px, y, linewidth=3, color='blue', label=f'all classes {y.max():.2f} at {px[y.argmax()]:.3f}')\n    ax.set_xlabel(xlabel)\n    ax.set_ylabel(ylabel)\n    ax.set_xlim(0, 1)\n    ax.set_ylim(0, 1)\n    ax.legend(bbox_to_anchor=(1.04, 1), loc='upper left')\n    ax.set_title(f'{ylabel}-Confidence Curve')\n    fig.savefig(save_dir, dpi=250)\n    plt.close(fig)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/plots.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nPlotting utils\n\"\"\"\n\nimport contextlib\nimport math\nimport os\nfrom copy import copy\nfrom pathlib import Path\nfrom urllib.error import URLError\n\nimport cv2\nimport matplotlib\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\nimport seaborn as sn\nimport torch\nfrom PIL import Image, ImageDraw, ImageFont\n\nfrom utils import TryExcept, threaded\nfrom utils.general import (CONFIG_DIR, FONT, LOGGER, check_font, check_requirements, clip_boxes, increment_path,\n                           is_ascii, xywh2xyxy, xyxy2xywh)\nfrom utils.metrics import fitness\nfrom utils.segment.general import scale_image\n\n# Settings\nRANK = int(os.getenv('RANK', -1))\nmatplotlib.rc('font', **{'size': 11})\nmatplotlib.use('Agg')  # for writing to files only\n\n\nclass Colors:\n    # Ultralytics color palette https://ultralytics.com/\n    def __init__(self):\n        # hex = matplotlib.colors.TABLEAU_COLORS.values()\n        hexs = ('FF3838', 'FF9D97', 'FF701F', 'FFB21D', 'CFD231', '48F90A', '92CC17', '3DDB86', '1A9334', '00D4BB',\n                '2C99A8', '00C2FF', '344593', '6473FF', '0018EC', '8438FF', '520085', 'CB38FF', 'FF95C8', 'FF37C7')\n        self.palette = [self.hex2rgb(f'#{c}') for c in hexs]\n        self.n = len(self.palette)\n\n    def __call__(self, i, bgr=False):\n        c = self.palette[int(i) % self.n]\n        return (c[2], c[1], c[0]) if bgr else c\n\n    @staticmethod\n    def hex2rgb(h):  # rgb order (PIL)\n        return tuple(int(h[1 + i:1 + i + 2], 16) for i in (0, 2, 4))\n\n\ncolors = Colors()  # create instance for 'from utils.plots import colors'\n\n\ndef check_pil_font(font=FONT, size=10):\n    # Return a PIL TrueType Font, downloading to CONFIG_DIR if necessary\n    font = Path(font)\n    font = font if font.exists() else (CONFIG_DIR / font.name)\n    try:\n        return ImageFont.truetype(str(font) if font.exists() else font.name, size)\n    except 
Exception:  # download if missing\n        try:\n            check_font(font)\n            return ImageFont.truetype(str(font), size)\n        except TypeError:\n            check_requirements('Pillow>=8.4.0')  # known issue https://github.com/ultralytics/yolov5/issues/5374\n        except URLError:  # not online\n            return ImageFont.load_default()\n\n\nclass Annotator:\n    # YOLOv5 Annotator for train/val mosaics and jpgs and detect/hub inference annotations\n    def __init__(self, im, line_width=None, font_size=None, font='Arial.ttf', pil=False, example='abc'):\n        assert im.data.contiguous, 'Image not contiguous. Apply np.ascontiguousarray(im) to Annotator() input images.'\n        non_ascii = not is_ascii(example)  # non-latin labels, i.e. asian, arabic, cyrillic\n        self.pil = pil or non_ascii\n        if self.pil:  # use PIL\n            self.im = im if isinstance(im, Image.Image) else Image.fromarray(im)\n            self.draw = ImageDraw.Draw(self.im)\n            self.font = check_pil_font(font='Arial.Unicode.ttf' if non_ascii else font,\n                                       size=font_size or max(round(sum(self.im.size) / 2 * 0.035), 12))\n        else:  # use cv2\n            self.im = im\n        self.lw = line_width or max(round(sum(im.shape) / 2 * 0.003), 2)  # line width\n\n    def box_label(self, box, label='', color=(128, 128, 128), txt_color=(255, 255, 255)):\n        # Add one xyxy box to image with label\n        if self.pil or not is_ascii(label):\n            self.draw.rectangle(box, width=self.lw, outline=color)  # box\n            if label:\n                w, h = self.font.getsize(label)  # text width, height (WARNING: deprecated) in 9.2.0\n                # _, _, w, h = self.font.getbbox(label)  # text width, height (New)\n                outside = box[1] - h >= 0  # label fits outside box\n                self.draw.rectangle(\n                    (box[0], box[1] - h if outside else box[1], box[0] + w + 1,\n            
         box[1] + 1 if outside else box[1] + h + 1),\n                    fill=color,\n                )\n                # self.draw.text((box[0], box[1]), label, fill=txt_color, font=self.font, anchor='ls')  # for PIL>8.0\n                self.draw.text((box[0], box[1] - h if outside else box[1]), label, fill=txt_color, font=self.font)\n        else:  # cv2\n            p1, p2 = (int(box[0]), int(box[1])), (int(box[2]), int(box[3]))\n            cv2.rectangle(self.im, p1, p2, color, thickness=self.lw, lineType=cv2.LINE_AA)\n            if label:\n                tf = max(self.lw - 1, 1)  # font thickness\n                w, h = cv2.getTextSize(label, 0, fontScale=self.lw / 3, thickness=tf)[0]  # text width, height\n                outside = p1[1] - h >= 3\n                p2 = p1[0] + w, p1[1] - h - 3 if outside else p1[1] + h + 3\n                cv2.rectangle(self.im, p1, p2, color, -1, cv2.LINE_AA)  # filled\n                cv2.putText(self.im,\n                            label, (p1[0], p1[1] - 2 if outside else p1[1] + h + 2),\n                            0,\n                            self.lw / 3,\n                            txt_color,\n                            thickness=tf,\n                            lineType=cv2.LINE_AA)\n\n    def masks(self, masks, colors, im_gpu, alpha=0.5, retina_masks=False):\n        \"\"\"Plot masks at once.\n        Args:\n            masks (tensor): predicted masks on cuda, shape: [n, h, w]\n            colors (List[List[Int]]): colors for predicted masks, [[r, g, b] * n]\n            im_gpu (tensor): img is in cuda, shape: [3, h, w], range: [0, 1]\n            alpha (float): mask transparency: 0.0 fully transparent, 1.0 opaque\n        \"\"\"\n        if self.pil:\n            # convert to numpy first\n            self.im = np.asarray(self.im).copy()\n        if len(masks) == 0:\n            self.im[:] = im_gpu.permute(1, 2, 0).contiguous().cpu().numpy() * 255\n        colors = torch.tensor(colors, device=im_gpu.device, 
dtype=torch.float32) / 255.0\n        colors = colors[:, None, None]  # shape(n,1,1,3)\n        masks = masks.unsqueeze(3)  # shape(n,h,w,1)\n        masks_color = masks * (colors * alpha)  # shape(n,h,w,3)\n\n        inv_alph_masks = (1 - masks * alpha).cumprod(0)  # shape(n,h,w,1)\n        mcs = (masks_color * inv_alph_masks).sum(0) * 2  # mask color summand shape(n,h,w,3)\n\n        im_gpu = im_gpu.flip(dims=[0])  # flip channel\n        im_gpu = im_gpu.permute(1, 2, 0).contiguous()  # shape(h,w,3)\n        im_gpu = im_gpu * inv_alph_masks[-1] + mcs\n        im_mask = (im_gpu * 255).byte().cpu().numpy()\n        self.im[:] = im_mask if retina_masks else scale_image(im_gpu.shape, im_mask, self.im.shape)\n        if self.pil:\n            # convert im back to PIL and update draw\n            self.fromarray(self.im)\n\n    def rectangle(self, xy, fill=None, outline=None, width=1):\n        # Add rectangle to image (PIL-only)\n        self.draw.rectangle(xy, fill, outline, width)\n\n    def text(self, xy, text, txt_color=(255, 255, 255), anchor='top'):\n        # Add text to image (PIL-only)\n        if anchor == 'bottom':  # start y from font bottom\n            w, h = self.font.getsize(text)  # text width, height\n            xy[1] += 1 - h\n        self.draw.text(xy, text, fill=txt_color, font=self.font)\n\n    def fromarray(self, im):\n        # Update self.im from a numpy array\n        self.im = im if isinstance(im, Image.Image) else Image.fromarray(im)\n        self.draw = ImageDraw.Draw(self.im)\n\n    def result(self):\n        # Return annotated image as array\n        return np.asarray(self.im)\n\n\ndef feature_visualization(x, module_type, stage, n=32, save_dir=Path('runs/detect/exp')):\n    \"\"\"\n    x:              Features to be visualized\n    module_type:    Module type\n    stage:          Module stage within model\n    n:              Maximum number of feature maps to plot\n    save_dir:       Directory to save results\n    \"\"\"\n    if 
'Detect' not in module_type:\n        batch, channels, height, width = x.shape  # batch, channels, height, width\n        if height > 1 and width > 1:\n            f = save_dir / f\"stage{stage}_{module_type.split('.')[-1]}_features.png\"  # filename\n\n            blocks = torch.chunk(x[0].cpu(), channels, dim=0)  # select batch index 0, block by channels\n            n = min(n, channels)  # number of plots\n            fig, ax = plt.subplots(math.ceil(n / 8), 8, tight_layout=True)  # ceil(n/8) rows x 8 cols\n            ax = ax.ravel()\n            plt.subplots_adjust(wspace=0.05, hspace=0.05)\n            for i in range(n):\n                ax[i].imshow(blocks[i].squeeze())  # cmap='gray'\n                ax[i].axis('off')\n\n            LOGGER.info(f'Saving {f}... ({n}/{channels})')\n            plt.savefig(f, dpi=300, bbox_inches='tight')\n            plt.close()\n            np.save(str(f.with_suffix('.npy')), x[0].cpu().numpy())  # npy save\n\n\ndef hist2d(x, y, n=100):\n    # 2d histogram used in labels.png and evolve.png\n    xedges, yedges = np.linspace(x.min(), x.max(), n), np.linspace(y.min(), y.max(), n)\n    hist, xedges, yedges = np.histogram2d(x, y, (xedges, yedges))\n    xidx = np.clip(np.digitize(x, xedges) - 1, 0, hist.shape[0] - 1)\n    yidx = np.clip(np.digitize(y, yedges) - 1, 0, hist.shape[1] - 1)\n    return np.log(hist[xidx, yidx])\n\n\ndef butter_lowpass_filtfilt(data, cutoff=1500, fs=50000, order=5):\n    from scipy.signal import butter, filtfilt\n\n    # https://stackoverflow.com/questions/28536191/how-to-filter-smooth-with-scipy-numpy\n    def butter_lowpass(cutoff, fs, order):\n        nyq = 0.5 * fs\n        normal_cutoff = cutoff / nyq\n        return butter(order, normal_cutoff, btype='low', analog=False)\n\n    b, a = butter_lowpass(cutoff, fs, order=order)\n    return filtfilt(b, a, data)  # forward-backward filter\n\n\ndef output_to_target(output, max_det=300):\n    # Convert model output to target format [batch_id, class_id, x, y, w, 
h, conf] for plotting\n    targets = []\n    for i, o in enumerate(output):\n        box, conf, cls = o[:max_det, :6].cpu().split((4, 1, 1), 1)\n        j = torch.full((conf.shape[0], 1), i)\n        targets.append(torch.cat((j, cls, xyxy2xywh(box), conf), 1))\n    return torch.cat(targets, 0).numpy()\n\n\n@threaded\ndef plot_images(images, targets, paths=None, fname='images.jpg', names=None):\n    # Plot image grid with labels\n    if isinstance(images, torch.Tensor):\n        images = images.cpu().float().numpy()\n    if isinstance(targets, torch.Tensor):\n        targets = targets.cpu().numpy()\n\n    max_size = 1920  # max image size\n    max_subplots = 16  # max image subplots, i.e. 4x4\n    bs, _, h, w = images.shape  # batch size, _, height, width\n    bs = min(bs, max_subplots)  # limit plot images\n    ns = np.ceil(bs ** 0.5)  # number of subplots (square)\n    if np.max(images[0]) <= 1:\n        images *= 255  # de-normalise (optional)\n\n    # Build Image\n    mosaic = np.full((int(ns * h), int(ns * w), 3), 255, dtype=np.uint8)  # init\n    for i, im in enumerate(images):\n        if i == max_subplots:  # if last batch has fewer images than we expect\n            break\n        x, y = int(w * (i // ns)), int(h * (i % ns))  # block origin\n        im = im.transpose(1, 2, 0)\n        mosaic[y:y + h, x:x + w, :] = im\n\n    # Resize (optional)\n    scale = max_size / ns / max(h, w)\n    if scale < 1:\n        h = math.ceil(scale * h)\n        w = math.ceil(scale * w)\n        mosaic = cv2.resize(mosaic, tuple(int(x * ns) for x in (w, h)))\n\n    # Annotate\n    fs = int((h + w) * ns * 0.01)  # font size\n    annotator = Annotator(mosaic, line_width=round(fs / 10), font_size=fs, pil=True, example=names)\n    for i in range(i + 1):\n        x, y = int(w * (i // ns)), int(h * (i % ns))  # block origin\n        annotator.rectangle([x, y, x + w, y + h], None, (255, 255, 255), width=2)  # borders\n        if paths:\n            annotator.text((x + 5, y + 5), 
text=Path(paths[i]).name[:40], txt_color=(220, 220, 220))  # filenames\n        if len(targets) > 0:\n            ti = targets[targets[:, 0] == i]  # image targets\n            boxes = xywh2xyxy(ti[:, 2:6]).T\n            classes = ti[:, 1].astype('int')\n            labels = ti.shape[1] == 6  # labels if no conf column\n            conf = None if labels else ti[:, 6]  # check for confidence presence (label vs pred)\n\n            if boxes.shape[1]:\n                if boxes.max() <= 1.01:  # if normalized with tolerance 0.01\n                    boxes[[0, 2]] *= w  # scale to pixels\n                    boxes[[1, 3]] *= h\n                elif scale < 1:  # absolute coords need scale if image scales\n                    boxes *= scale\n            boxes[[0, 2]] += x\n            boxes[[1, 3]] += y\n            for j, box in enumerate(boxes.T.tolist()):\n                cls = classes[j]\n                color = colors(cls)\n                cls = names[cls] if names else cls\n                if labels or conf[j] > 0.25:  # 0.25 conf thresh\n                    label = f'{cls}' if labels else f'{cls} {conf[j]:.1f}'\n                    annotator.box_label(box, label, color=color)\n    annotator.im.save(fname)  # save\n\n\ndef plot_lr_scheduler(optimizer, scheduler, epochs=300, save_dir=''):\n    # Plot LR simulating training for full epochs\n    optimizer, scheduler = copy(optimizer), copy(scheduler)  # do not modify originals\n    y = []\n    for _ in range(epochs):\n        scheduler.step()\n        y.append(optimizer.param_groups[0]['lr'])\n    plt.plot(y, '.-', label='LR')\n    plt.xlabel('epoch')\n    plt.ylabel('LR')\n    plt.grid()\n    plt.xlim(0, epochs)\n    plt.ylim(0)\n    plt.savefig(Path(save_dir) / 'LR.png', dpi=200)\n    plt.close()\n\n\ndef plot_val_txt():  # from utils.plots import *; plot_val()\n    # Plot val.txt histograms\n    x = np.loadtxt('val.txt', dtype=np.float32)\n    box = xyxy2xywh(x[:, :4])\n    cx, cy = box[:, 0], box[:, 1]\n\n    
fig, ax = plt.subplots(1, 1, figsize=(6, 6), tight_layout=True)\n    ax.hist2d(cx, cy, bins=600, cmax=10, cmin=0)\n    ax.set_aspect('equal')\n    plt.savefig('hist2d.png', dpi=300)\n\n    fig, ax = plt.subplots(1, 2, figsize=(12, 6), tight_layout=True)\n    ax[0].hist(cx, bins=600)\n    ax[1].hist(cy, bins=600)\n    plt.savefig('hist1d.png', dpi=200)\n\n\ndef plot_targets_txt():  # from utils.plots import *; plot_targets_txt()\n    # Plot targets.txt histograms\n    x = np.loadtxt('targets.txt', dtype=np.float32).T\n    s = ['x targets', 'y targets', 'width targets', 'height targets']\n    fig, ax = plt.subplots(2, 2, figsize=(8, 8), tight_layout=True)\n    ax = ax.ravel()\n    for i in range(4):\n        ax[i].hist(x[i], bins=100, label=f'{x[i].mean():.3g} +/- {x[i].std():.3g}')\n        ax[i].legend()\n        ax[i].set_title(s[i])\n    plt.savefig('targets.jpg', dpi=200)\n\n\ndef plot_val_study(file='', dir='', x=None):  # from utils.plots import *; plot_val_study()\n    # Plot file=study.txt generated by val.py (or plot all study*.txt in dir)\n    save_dir = Path(file).parent if file else Path(dir)\n    plot2 = False  # plot additional results\n    if plot2:\n        ax = plt.subplots(2, 4, figsize=(10, 6), tight_layout=True)[1].ravel()\n\n    fig2, ax2 = plt.subplots(1, 1, figsize=(8, 4), tight_layout=True)\n    # for f in [save_dir / f'study_coco_{x}.txt' for x in ['yolov5n6', 'yolov5s6', 'yolov5m6', 'yolov5l6', 'yolov5x6']]:\n    for f in sorted(save_dir.glob('study*.txt')):\n        y = np.loadtxt(f, dtype=np.float32, usecols=[0, 1, 2, 3, 7, 8, 9], ndmin=2).T\n        x = np.arange(y.shape[1]) if x is None else np.array(x)\n        if plot2:\n            s = ['P', 'R', 'mAP@.5', 'mAP@.5:.95', 't_preprocess (ms/img)', 't_inference (ms/img)', 't_NMS (ms/img)']\n            for i in range(7):\n                ax[i].plot(x, y[i], '.-', linewidth=2, markersize=8)\n                ax[i].set_title(s[i])\n\n        j = y[3].argmax() + 1\n        ax2.plot(y[5, 
1:j],\n                 y[3, 1:j] * 1E2,\n                 '.-',\n                 linewidth=2,\n                 markersize=8,\n                 label=f.stem.replace('study_coco_', '').replace('yolo', 'YOLO'))\n\n    ax2.plot(1E3 / np.array([209, 140, 97, 58, 35, 18]), [34.6, 40.5, 43.0, 47.5, 49.7, 51.5],\n             'k.-',\n             linewidth=2,\n             markersize=8,\n             alpha=.25,\n             label='EfficientDet')\n\n    ax2.grid(alpha=0.2)\n    ax2.set_yticks(np.arange(20, 60, 5))\n    ax2.set_xlim(0, 57)\n    ax2.set_ylim(25, 55)\n    ax2.set_xlabel('GPU Speed (ms/img)')\n    ax2.set_ylabel('COCO AP val')\n    ax2.legend(loc='lower right')\n    f = save_dir / 'study.png'\n    print(f'Saving {f}...')\n    plt.savefig(f, dpi=300)\n\n\n@TryExcept()  # known issue https://github.com/ultralytics/yolov5/issues/5395\ndef plot_labels(labels, names=(), save_dir=Path('')):\n    # plot dataset labels\n    LOGGER.info(f\"Plotting labels to {save_dir / 'labels.jpg'}... \")\n    c, b = labels[:, 0], labels[:, 1:].transpose()  # classes, boxes\n    nc = int(c.max() + 1)  # number of classes\n    x = pd.DataFrame(b.transpose(), columns=['x', 'y', 'width', 'height'])\n\n    # seaborn correlogram\n    sn.pairplot(x, corner=True, diag_kind='auto', kind='hist', diag_kws=dict(bins=50), plot_kws=dict(pmax=0.9))\n    plt.savefig(save_dir / 'labels_correlogram.jpg', dpi=200)\n    plt.close()\n\n    # matplotlib labels\n    matplotlib.use('svg')  # faster\n    ax = plt.subplots(2, 2, figsize=(8, 8), tight_layout=True)[1].ravel()\n    y = ax[0].hist(c, bins=np.linspace(0, nc, nc + 1) - 0.5, rwidth=0.8)\n    with contextlib.suppress(Exception):  # color histogram bars by class\n        [y[2].patches[i].set_color([x / 255 for x in colors(i)]) for i in range(nc)]  # known issue #3195\n    ax[0].set_ylabel('instances')\n    if 0 < len(names) < 30:\n        ax[0].set_xticks(range(len(names)))\n        ax[0].set_xticklabels(list(names.values()), rotation=90, 
fontsize=10)\n    else:\n        ax[0].set_xlabel('classes')\n    sn.histplot(x, x='x', y='y', ax=ax[2], bins=50, pmax=0.9)\n    sn.histplot(x, x='width', y='height', ax=ax[3], bins=50, pmax=0.9)\n\n    # rectangles\n    labels[:, 1:3] = 0.5  # center\n    labels[:, 1:] = xywh2xyxy(labels[:, 1:]) * 2000\n    img = Image.fromarray(np.ones((2000, 2000, 3), dtype=np.uint8) * 255)\n    for cls, *box in labels[:1000]:\n        ImageDraw.Draw(img).rectangle(box, width=1, outline=colors(cls))  # plot\n    ax[1].imshow(img)\n    ax[1].axis('off')\n\n    for a in [0, 1, 2, 3]:\n        for s in ['top', 'right', 'left', 'bottom']:\n            ax[a].spines[s].set_visible(False)\n\n    plt.savefig(save_dir / 'labels.jpg', dpi=200)\n    matplotlib.use('Agg')\n    plt.close()\n\n\ndef imshow_cls(im, labels=None, pred=None, names=None, nmax=25, verbose=False, f=Path('images.jpg')):\n    # Show classification image grid with labels (optional) and predictions (optional)\n    from utils.augmentations import denormalize\n\n    names = names or [f'class{i}' for i in range(1000)]\n    blocks = torch.chunk(denormalize(im.clone()).cpu().float(), len(im),\n                         dim=0)  # select batch index 0, block by channels\n    n = min(len(blocks), nmax)  # number of plots\n    m = min(8, round(n ** 0.5))  # 8 x 8 default\n    fig, ax = plt.subplots(math.ceil(n / m), m)  # 8 rows x n/8 cols\n    ax = ax.ravel() if m > 1 else [ax]\n    # plt.subplots_adjust(wspace=0.05, hspace=0.05)\n    for i in range(n):\n        ax[i].imshow(blocks[i].squeeze().permute((1, 2, 0)).numpy().clip(0.0, 1.0))\n        ax[i].axis('off')\n        if labels is not None:\n            s = names[labels[i]] + (f'—{names[pred[i]]}' if pred is not None else '')\n            ax[i].set_title(s, fontsize=8, verticalalignment='top')\n    plt.savefig(f, dpi=300, bbox_inches='tight')\n    plt.close()\n    if verbose:\n        LOGGER.info(f'Saving {f}')\n        if labels is not None:\n            LOGGER.info('True:  
   ' + ' '.join(f'{names[i]:3s}' for i in labels[:nmax]))\n        if pred is not None:\n            LOGGER.info('Predicted:' + ' '.join(f'{names[i]:3s}' for i in pred[:nmax]))\n    return f\n\n\ndef plot_evolve(evolve_csv='path/to/evolve.csv'):  # from utils.plots import *; plot_evolve()\n    # Plot evolve.csv hyp evolution results\n    evolve_csv = Path(evolve_csv)\n    data = pd.read_csv(evolve_csv)\n    keys = [x.strip() for x in data.columns]\n    x = data.values\n    f = fitness(x)\n    j = np.argmax(f)  # max fitness index\n    plt.figure(figsize=(10, 12), tight_layout=True)\n    matplotlib.rc('font', **{'size': 8})\n    print(f'Best results from row {j} of {evolve_csv}:')\n    for i, k in enumerate(keys[7:]):\n        v = x[:, 7 + i]\n        mu = v[j]  # best single result\n        plt.subplot(6, 5, i + 1)\n        plt.scatter(v, f, c=hist2d(v, f, 20), cmap='viridis', alpha=.8, edgecolors='none')\n        plt.plot(mu, f.max(), 'k+', markersize=15)\n        plt.title(f'{k} = {mu:.3g}', fontdict={'size': 9})  # limit to 40 characters\n        if i % 5 != 0:\n            plt.yticks([])\n        print(f'{k:>15}: {mu:.3g}')\n    f = evolve_csv.with_suffix('.png')  # filename\n    plt.savefig(f, dpi=200)\n    plt.close()\n    print(f'Saved {f}')\n\n\ndef plot_results(file='path/to/results.csv', dir=''):\n    # Plot training results.csv. 
Usage: from utils.plots import *; plot_results('path/to/results.csv')\n    save_dir = Path(file).parent if file else Path(dir)\n    fig, ax = plt.subplots(2, 5, figsize=(12, 6), tight_layout=True)\n    ax = ax.ravel()\n    files = list(save_dir.glob('results*.csv'))\n    assert len(files), f'No results.csv files found in {save_dir.resolve()}, nothing to plot.'\n    for f in files:\n        try:\n            data = pd.read_csv(f)\n            s = [x.strip() for x in data.columns]\n            x = data.values[:, 0]\n            for i, j in enumerate([1, 2, 3, 4, 5, 8, 9, 10, 6, 7]):\n                y = data.values[:, j].astype('float')\n                # y[y == 0] = np.nan  # don't show zero values\n                ax[i].plot(x, y, marker='.', label=f.stem, linewidth=2, markersize=8)\n                ax[i].set_title(s[j], fontsize=12)\n                # if j in [8, 9, 10]:  # share train and val loss y axes\n                #     ax[i].get_shared_y_axes().join(ax[i], ax[i - 5])\n        except Exception as e:\n            LOGGER.info(f'Warning: Plotting error for {f}: {e}')\n    ax[1].legend()\n    fig.savefig(save_dir / 'results.png', dpi=200)\n    plt.close()\n\n\ndef profile_idetection(start=0, stop=0, labels=(), save_dir=''):\n    # Plot iDetection '*.txt' per-image logs. 
from utils.plots import *; profile_idetection()\n    ax = plt.subplots(2, 4, figsize=(12, 6), tight_layout=True)[1].ravel()\n    s = ['Images', 'Free Storage (GB)', 'RAM Usage (GB)', 'Battery', 'dt_raw (ms)', 'dt_smooth (ms)', 'real-world FPS']\n    files = list(Path(save_dir).glob('frames*.txt'))\n    for fi, f in enumerate(files):\n        try:\n            results = np.loadtxt(f, ndmin=2).T[:, 90:-30]  # clip first and last rows\n            n = results.shape[1]  # number of rows\n            x = np.arange(start, min(stop, n) if stop else n)\n            results = results[:, x]\n            t = (results[0] - results[0].min())  # set t0=0s\n            results[0] = x\n            for i, a in enumerate(ax):\n                if i < len(results):\n                    label = labels[fi] if len(labels) else f.stem.replace('frames_', '')\n                    a.plot(t, results[i], marker='.', label=label, linewidth=1, markersize=5)\n                    a.set_title(s[i])\n                    a.set_xlabel('time (s)')\n                    # if fi == len(files) - 1:\n                    #     a.set_ylim(bottom=0)\n                    for side in ['top', 'right']:\n                        a.spines[side].set_visible(False)\n                else:\n                    a.remove()\n        except Exception as e:\n            print(f'Warning: Plotting error for {f}; {e}')\n    ax[1].legend()\n    plt.savefig(Path(save_dir) / 'idetection_profile.png', dpi=200)\n\n\ndef save_one_box(xyxy, im, file=Path('im.jpg'), gain=1.02, pad=10, square=False, BGR=False, save=True):\n    # Save image crop as {file} with crop size multiple {gain} and {pad} pixels. 
Save and/or return crop\n    xyxy = torch.tensor(xyxy).view(-1, 4)\n    b = xyxy2xywh(xyxy)  # boxes\n    if square:\n        b[:, 2:] = b[:, 2:].max(1)[0].unsqueeze(1)  # attempt rectangle to square\n    b[:, 2:] = b[:, 2:] * gain + pad  # box wh * gain + pad\n    xyxy = xywh2xyxy(b).long()\n    clip_boxes(xyxy, im.shape)\n    crop = im[int(xyxy[0, 1]):int(xyxy[0, 3]), int(xyxy[0, 0]):int(xyxy[0, 2]), ::(1 if BGR else -1)]\n    if save:\n        file.parent.mkdir(parents=True, exist_ok=True)  # make directory\n        f = str(increment_path(file).with_suffix('.jpg'))\n        # cv2.imwrite(f, crop)  # save BGR, https://github.com/ultralytics/yolov5/issues/7007 chroma subsampling issue\n        Image.fromarray(crop[..., ::-1]).save(f, quality=95, subsampling=0)  # save RGB\n    return crop\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/segment/__init__.py",
    "content": ""
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/segment/augmentations.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nImage augmentation functions\n\"\"\"\n\nimport math\nimport random\n\nimport cv2\nimport numpy as np\n\nfrom ..augmentations import box_candidates\nfrom ..general import resample_segments, segment2box\n\n\ndef mixup(im, labels, segments, im2, labels2, segments2):\n    # Applies MixUp augmentation https://arxiv.org/pdf/1710.09412.pdf\n    r = np.random.beta(32.0, 32.0)  # mixup ratio, alpha=beta=32.0\n    im = (im * r + im2 * (1 - r)).astype(np.uint8)\n    labels = np.concatenate((labels, labels2), 0)\n    segments = np.concatenate((segments, segments2), 0)\n    return im, labels, segments\n\n\ndef random_perspective(im,\n                       targets=(),\n                       segments=(),\n                       degrees=10,\n                       translate=.1,\n                       scale=.1,\n                       shear=10,\n                       perspective=0.0,\n                       border=(0, 0)):\n    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(.1, .1), scale=(.9, 1.1), shear=(-10, 10))\n    # targets = [cls, xyxy]\n\n    height = im.shape[0] + border[0] * 2  # shape(h,w,c)\n    width = im.shape[1] + border[1] * 2\n\n    # Center\n    C = np.eye(3)\n    C[0, 2] = -im.shape[1] / 2  # x translation (pixels)\n    C[1, 2] = -im.shape[0] / 2  # y translation (pixels)\n\n    # Perspective\n    P = np.eye(3)\n    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)\n    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)\n\n    # Rotation and Scale\n    R = np.eye(3)\n    a = random.uniform(-degrees, degrees)\n    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations\n    s = random.uniform(1 - scale, 1 + scale)\n    # s = 2 ** random.uniform(-scale, scale)\n    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)\n\n    # Shear\n    S = np.eye(3)\n    S[0, 1] = 
math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)\n    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)\n\n    # Translation\n    T = np.eye(3)\n    T[0, 2] = (random.uniform(0.5 - translate, 0.5 + translate) * width)  # x translation (pixels)\n    T[1, 2] = (random.uniform(0.5 - translate, 0.5 + translate) * height)  # y translation (pixels)\n\n    # Combined rotation matrix\n    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT\n    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed\n        if perspective:\n            im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))\n        else:  # affine\n            im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))\n\n    # Visualize\n    # import matplotlib.pyplot as plt\n    # ax = plt.subplots(1, 2, figsize=(12, 6))[1].ravel()\n    # ax[0].imshow(im[:, :, ::-1])  # base\n    # ax[1].imshow(im2[:, :, ::-1])  # warped\n\n    # Transform label coordinates\n    n = len(targets)\n    new_segments = []\n    if n:\n        new = np.zeros((n, 4))\n        segments = resample_segments(segments)  # upsample\n        for i, segment in enumerate(segments):\n            xy = np.ones((len(segment), 3))\n            xy[:, :2] = segment\n            xy = xy @ M.T  # transform\n            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2])  # perspective rescale or affine\n\n            # clip\n            new[i] = segment2box(xy, width, height)\n            new_segments.append(xy)\n\n        # filter candidates\n        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01)\n        targets = targets[i]\n        targets[:, 1:5] = new[i]\n        new_segments = np.array(new_segments)[i]\n\n    return im, targets, new_segments\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/segment/dataloaders.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nDataloaders\n\"\"\"\n\nimport os\nimport random\n\nimport cv2\nimport numpy as np\nimport torch\nfrom torch.utils.data import DataLoader, distributed\n\nfrom ..augmentations import augment_hsv, copy_paste, letterbox\nfrom ..dataloaders import InfiniteDataLoader, LoadImagesAndLabels, seed_worker\nfrom ..general import LOGGER, xyn2xy, xywhn2xyxy, xyxy2xywhn\nfrom ..torch_utils import torch_distributed_zero_first\nfrom .augmentations import mixup, random_perspective\n\nRANK = int(os.getenv('RANK', -1))\n\n\ndef create_dataloader(path,\n                      imgsz,\n                      batch_size,\n                      stride,\n                      single_cls=False,\n                      hyp=None,\n                      augment=False,\n                      cache=False,\n                      pad=0.0,\n                      rect=False,\n                      rank=-1,\n                      workers=8,\n                      image_weights=False,\n                      quad=False,\n                      prefix='',\n                      shuffle=False,\n                      mask_downsample_ratio=1,\n                      overlap_mask=False,\n                      seed=0):\n    if rect and shuffle:\n        LOGGER.warning('WARNING ⚠️ --rect is incompatible with DataLoader shuffle, setting shuffle=False')\n        shuffle = False\n    with torch_distributed_zero_first(rank):  # init dataset *.cache only once if DDP\n        dataset = LoadImagesAndLabelsAndMasks(\n            path,\n            imgsz,\n            batch_size,\n            augment=augment,  # augmentation\n            hyp=hyp,  # hyperparameters\n            rect=rect,  # rectangular batches\n            cache_images=cache,\n            single_cls=single_cls,\n            stride=int(stride),\n            pad=pad,\n            image_weights=image_weights,\n            prefix=prefix,\n            downsample_ratio=mask_downsample_ratio,\n   
         overlap=overlap_mask)\n\n    batch_size = min(batch_size, len(dataset))\n    nd = torch.cuda.device_count()  # number of CUDA devices\n    nw = min([os.cpu_count() // max(nd, 1), batch_size if batch_size > 1 else 0, workers])  # number of workers\n    sampler = None if rank == -1 else distributed.DistributedSampler(dataset, shuffle=shuffle)\n    loader = DataLoader if image_weights else InfiniteDataLoader  # only DataLoader allows for attribute updates\n    generator = torch.Generator()\n    generator.manual_seed(6148914691236517205 + seed + RANK)\n    return loader(\n        dataset,\n        batch_size=batch_size,\n        shuffle=shuffle and sampler is None,\n        num_workers=nw,\n        sampler=sampler,\n        pin_memory=True,\n        collate_fn=LoadImagesAndLabelsAndMasks.collate_fn4 if quad else LoadImagesAndLabelsAndMasks.collate_fn,\n        worker_init_fn=seed_worker,\n        generator=generator,\n    ), dataset\n\n\nclass LoadImagesAndLabelsAndMasks(LoadImagesAndLabels):  # for training/testing\n\n    def __init__(\n        self,\n        path,\n        img_size=640,\n        batch_size=16,\n        augment=False,\n        hyp=None,\n        rect=False,\n        image_weights=False,\n        cache_images=False,\n        single_cls=False,\n        stride=32,\n        pad=0,\n        min_items=0,\n        prefix='',\n        downsample_ratio=1,\n        overlap=False,\n    ):\n        super().__init__(path, img_size, batch_size, augment, hyp, rect, image_weights, cache_images, single_cls,\n                         stride, pad, min_items, prefix)\n        self.downsample_ratio = downsample_ratio\n        self.overlap = overlap\n\n    def __getitem__(self, index):\n        index = self.indices[index]  # linear, shuffled, or image_weights\n\n        hyp = self.hyp\n        mosaic = self.mosaic and random.random() < hyp['mosaic']\n        masks = []\n        if mosaic:\n            # Load mosaic\n            img, labels, segments = 
self.load_mosaic(index)\n            shapes = None\n\n            # MixUp augmentation\n            if random.random() < hyp['mixup']:\n                img, labels, segments = mixup(img, labels, segments, *self.load_mosaic(random.randint(0, self.n - 1)))\n\n        else:\n            # Load image\n            img, (h0, w0), (h, w) = self.load_image(index)\n\n            # Letterbox\n            shape = self.batch_shapes[self.batch[index]] if self.rect else self.img_size  # final letterboxed shape\n            img, ratio, pad = letterbox(img, shape, auto=False, scaleup=self.augment)\n            shapes = (h0, w0), ((h / h0, w / w0), pad)  # for COCO mAP rescaling\n\n            labels = self.labels[index].copy()\n            # [array, array, ....], array.shape=(num_points, 2), xyxyxyxy\n            segments = self.segments[index].copy()\n            if len(segments):\n                for i_s in range(len(segments)):\n                    segments[i_s] = xyn2xy(\n                        segments[i_s],\n                        ratio[0] * w,\n                        ratio[1] * h,\n                        padw=pad[0],\n                        padh=pad[1],\n                    )\n            if labels.size:  # normalized xywh to pixel xyxy format\n                labels[:, 1:] = xywhn2xyxy(labels[:, 1:], ratio[0] * w, ratio[1] * h, padw=pad[0], padh=pad[1])\n\n            if self.augment:\n                img, labels, segments = random_perspective(img,\n                                                           labels,\n                                                           segments=segments,\n                                                           degrees=hyp['degrees'],\n                                                           translate=hyp['translate'],\n                                                           scale=hyp['scale'],\n                                                           shear=hyp['shear'],\n                                                  
         perspective=hyp['perspective'])\n\n        nl = len(labels)  # number of labels\n        if nl:\n            labels[:, 1:5] = xyxy2xywhn(labels[:, 1:5], w=img.shape[1], h=img.shape[0], clip=True, eps=1e-3)\n            if self.overlap:\n                masks, sorted_idx = polygons2masks_overlap(img.shape[:2],\n                                                           segments,\n                                                           downsample_ratio=self.downsample_ratio)\n                masks = masks[None]  # (640, 640) -> (1, 640, 640)\n                labels = labels[sorted_idx]\n            else:\n                masks = polygons2masks(img.shape[:2], segments, color=1, downsample_ratio=self.downsample_ratio)\n\n        masks = (torch.from_numpy(masks) if len(masks) else torch.zeros(1 if self.overlap else nl, img.shape[0] //\n                                                                        self.downsample_ratio, img.shape[1] //\n                                                                        self.downsample_ratio))\n        # TODO: albumentations support\n        if self.augment:\n            # Albumentations\n            # there are some augmentation that won't change boxes and masks,\n            # so just be it for now.\n            img, labels = self.albumentations(img, labels)\n            nl = len(labels)  # update after albumentations\n\n            # HSV color-space\n            augment_hsv(img, hgain=hyp['hsv_h'], sgain=hyp['hsv_s'], vgain=hyp['hsv_v'])\n\n            # Flip up-down\n            if random.random() < hyp['flipud']:\n                img = np.flipud(img)\n                if nl:\n                    labels[:, 2] = 1 - labels[:, 2]\n                    masks = torch.flip(masks, dims=[1])\n\n            # Flip left-right\n            if random.random() < hyp['fliplr']:\n                img = np.fliplr(img)\n                if nl:\n                    labels[:, 1] = 1 - labels[:, 1]\n                    masks = 
torch.flip(masks, dims=[2])\n\n            # Cutouts  # labels = cutout(img, labels, p=0.5)\n\n        labels_out = torch.zeros((nl, 6))\n        if nl:\n            labels_out[:, 1:] = torch.from_numpy(labels)\n\n        # Convert\n        img = img.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB\n        img = np.ascontiguousarray(img)\n\n        return (torch.from_numpy(img), labels_out, self.im_files[index], shapes, masks)\n\n    def load_mosaic(self, index):\n        # YOLOv5 4-mosaic loader. Loads 1 image + 3 random images into a 4-image mosaic\n        labels4, segments4 = [], []\n        s = self.img_size\n        yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y\n\n        # 3 additional image indices\n        indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices\n        for i, index in enumerate(indices):\n            # Load image\n            img, _, (h, w) = self.load_image(index)\n\n            # place img in img4\n            if i == 0:  # top left\n                img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles\n                x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)\n                x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)\n            elif i == 1:  # top right\n                x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc\n                x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h\n            elif i == 2:  # bottom left\n                x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)\n                x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)\n            elif i == 3:  # bottom right\n                x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)\n                x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), 
min(y2a - y1a, h)\n\n            img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]\n            padw = x1a - x1b\n            padh = y1a - y1b\n\n            labels, segments = self.labels[index].copy(), self.segments[index].copy()\n\n            if labels.size:\n                labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format\n                segments = [xyn2xy(x, w, h, padw, padh) for x in segments]\n            labels4.append(labels)\n            segments4.extend(segments)\n\n        # Concat/clip labels\n        labels4 = np.concatenate(labels4, 0)\n        for x in (labels4[:, 1:], *segments4):\n            np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()\n        # img4, labels4 = replicate(img4, labels4)  # replicate\n\n        # Augment\n        img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])\n        img4, labels4, segments4 = random_perspective(img4,\n                                                      labels4,\n                                                      segments4,\n                                                      degrees=self.hyp['degrees'],\n                                                      translate=self.hyp['translate'],\n                                                      scale=self.hyp['scale'],\n                                                      shear=self.hyp['shear'],\n                                                      perspective=self.hyp['perspective'],\n                                                      border=self.mosaic_border)  # border to remove\n        return img4, labels4, segments4\n\n    @staticmethod\n    def collate_fn(batch):\n        img, label, path, shapes, masks = zip(*batch)  # transposed\n        batched_masks = torch.cat(masks, 0)\n        for i, l in enumerate(label):\n            l[:, 0] = i  # add target image index for build_targets()\n        return 
torch.stack(img, 0), torch.cat(label, 0), path, shapes, batched_masks\n\n\ndef polygon2mask(img_size, polygons, color=1, downsample_ratio=1):\n    \"\"\"\n    Args:\n        img_size (tuple): The image size.\n        polygons (np.ndarray): [N, M], where N is the number of polygons\n            and M is the number of coordinate values (twice the number of points).\n    \"\"\"\n    mask = np.zeros(img_size, dtype=np.uint8)\n    polygons = np.asarray(polygons)\n    polygons = polygons.astype(np.int32)\n    shape = polygons.shape\n    polygons = polygons.reshape(shape[0], -1, 2)\n    cv2.fillPoly(mask, polygons, color=color)\n    nh, nw = (img_size[0] // downsample_ratio, img_size[1] // downsample_ratio)\n    # NOTE: fill the polygons first and resize afterwards, so the loss is calculated\n    # the same way as when mask-ratio=1.\n    mask = cv2.resize(mask, (nw, nh))\n    return mask\n\n\ndef polygons2masks(img_size, polygons, color, downsample_ratio=1):\n    \"\"\"\n    Args:\n        img_size (tuple): The image size.\n        polygons (list[np.ndarray]): one array per polygon; each array is flattened\n            to M coordinate values (twice the number of points).\n    \"\"\"\n    masks = []\n    for si in range(len(polygons)):\n        mask = polygon2mask(img_size, [polygons[si].reshape(-1)], color, downsample_ratio)\n        masks.append(mask)\n    return np.array(masks)\n\n\ndef polygons2masks_overlap(img_size, segments, downsample_ratio=1):\n    \"\"\"\n    Return an overlap mask of shape (img_size[0] // downsample_ratio, img_size[1] // downsample_ratio)\n    and the indices that sort the segments by descending mask area.\n    \"\"\"\n    masks = np.zeros((img_size[0] // downsample_ratio, img_size[1] // downsample_ratio),\n                     dtype=np.int32 if len(segments) > 255 else np.uint8)\n    areas = []\n    ms = []\n    for si in range(len(segments)):\n        mask = polygon2mask(\n            img_size,\n            [segments[si].reshape(-1)],\n            downsample_ratio=downsample_ratio,\n            color=1,\n        )\n        ms.append(mask)\n        areas.append(mask.sum())\n    areas = np.asarray(areas)\n    index = 
np.argsort(-areas)\n    ms = np.array(ms)[index]\n    for i in range(len(segments)):\n        mask = ms[i] * (i + 1)\n        masks = masks + mask\n        masks = np.clip(masks, a_min=0, a_max=i + 1)\n    return masks, index\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/segment/general.py",
    "content": "import cv2\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\n\n\ndef crop_mask(masks, boxes):\n    \"\"\"\n    \"Crop\" predicted masks by zeroing out everything not in the predicted bbox.\n    Vectorized by Chong (thanks Chong).\n\n    Args:\n        - masks should be a size [h, w, n] tensor of masks\n        - boxes should be a size [n, 4] tensor of bbox coords in relative point form\n    \"\"\"\n\n    n, h, w = masks.shape\n    x1, y1, x2, y2 = torch.chunk(boxes[:, :, None], 4, 1)  # x1 shape(1,1,n)\n    r = torch.arange(w, device=masks.device, dtype=x1.dtype)[None, None, :]  # rows shape(1,w,1)\n    c = torch.arange(h, device=masks.device, dtype=x1.dtype)[None, :, None]  # cols shape(h,1,1)\n\n    return masks * ((r >= x1) * (r < x2) * (c >= y1) * (c < y2))\n\n\ndef process_mask_upsample(protos, masks_in, bboxes, shape):\n    \"\"\"\n    Crop after upsample.\n    protos: [mask_dim, mask_h, mask_w]\n    masks_in: [n, mask_dim], n is number of masks after nms\n    bboxes: [n, 4], n is number of masks after nms\n    shape: input_image_size, (h, w)\n\n    return: h, w, n\n    \"\"\"\n\n    c, mh, mw = protos.shape  # CHW\n    masks = (masks_in @ protos.float().view(c, -1)).sigmoid().view(-1, mh, mw)\n    masks = F.interpolate(masks[None], shape, mode='bilinear', align_corners=False)[0]  # CHW\n    masks = crop_mask(masks, bboxes)  # CHW\n    return masks.gt_(0.5)\n\n\ndef process_mask(protos, masks_in, bboxes, shape, upsample=False):\n    \"\"\"\n    Crop before upsample.\n    proto_out: [mask_dim, mask_h, mask_w]\n    out_masks: [n, mask_dim], n is number of masks after nms\n    bboxes: [n, 4], n is number of masks after nms\n    shape:input_image_size, (h, w)\n\n    return: h, w, n\n    \"\"\"\n\n    c, mh, mw = protos.shape  # CHW\n    ih, iw = shape\n    masks = (masks_in @ protos.float().view(c, -1)).sigmoid().view(-1, mh, mw)  # CHW\n\n    downsampled_bboxes = bboxes.clone()\n    downsampled_bboxes[:, 0] *= mw / iw\n    
downsampled_bboxes[:, 2] *= mw / iw\n    downsampled_bboxes[:, 3] *= mh / ih\n    downsampled_bboxes[:, 1] *= mh / ih\n\n    masks = crop_mask(masks, downsampled_bboxes)  # CHW\n    if upsample:\n        masks = F.interpolate(masks[None], shape, mode='bilinear', align_corners=False)[0]  # CHW\n    return masks.gt_(0.5)\n\n\ndef process_mask_native(protos, masks_in, bboxes, shape):\n    \"\"\"\n    Crop after upsample.\n    protos: [mask_dim, mask_h, mask_w]\n    masks_in: [n, mask_dim], n is number of masks after nms\n    bboxes: [n, 4], n is number of masks after nms\n    shape: input_image_size, (h, w)\n\n    return: n, h, w\n    \"\"\"\n    c, mh, mw = protos.shape  # CHW\n    masks = (masks_in @ protos.float().view(c, -1)).sigmoid().view(-1, mh, mw)\n    gain = min(mh / shape[0], mw / shape[1])  # gain  = old / new\n    pad = (mw - shape[1] * gain) / 2, (mh - shape[0] * gain) / 2  # wh padding\n    top, left = int(pad[1]), int(pad[0])  # y, x\n    bottom, right = int(mh - pad[1]), int(mw - pad[0])\n    masks = masks[:, top:bottom, left:right]\n\n    masks = F.interpolate(masks[None], shape, mode='bilinear', align_corners=False)[0]  # CHW\n    masks = crop_mask(masks, bboxes)  # CHW\n    return masks.gt_(0.5)\n\n\ndef scale_image(im1_shape, masks, im0_shape, ratio_pad=None):\n    \"\"\"\n    im1_shape: model input shape, [h, w]\n    im0_shape: original image shape, [h, w, 3]\n    masks: [h, w, num]\n    \"\"\"\n    # Rescale masks from im1_shape to im0_shape\n    if ratio_pad is None:  # calculate from im0_shape\n        gain = min(im1_shape[0] / im0_shape[0], im1_shape[1] / im0_shape[1])  # gain  = old / new\n        pad = (im1_shape[1] - im0_shape[1] * gain) / 2, (im1_shape[0] - im0_shape[0] * gain) / 2  # wh padding\n    else:\n        pad = ratio_pad[1]\n    top, left = int(pad[1]), int(pad[0])  # y, x\n    bottom, right = int(im1_shape[0] - pad[1]), int(im1_shape[1] - pad[0])\n\n    if len(masks.shape) < 2:\n        raise ValueError(f'\"len of 
masks shape\" should be 2 or 3, but got {len(masks.shape)}')\n    masks = masks[top:bottom, left:right]\n    # masks = masks.permute(2, 0, 1).contiguous()\n    # masks = F.interpolate(masks[None], im0_shape[:2], mode='bilinear', align_corners=False)[0]\n    # masks = masks.permute(1, 2, 0).contiguous()\n    masks = cv2.resize(masks, (im0_shape[1], im0_shape[0]))\n\n    if len(masks.shape) == 2:\n        masks = masks[:, :, None]\n    return masks\n\n\ndef mask_iou(mask1, mask2, eps=1e-7):\n    \"\"\"\n    mask1: [N, n], N is the number of predicted objects\n    mask2: [M, n], M is the number of gt objects\n    Note: n means image_w x image_h\n\n    return: masks iou, [N, M]\n    \"\"\"\n    intersection = torch.matmul(mask1, mask2.t()).clamp(0)\n    union = (mask1.sum(1)[:, None] + mask2.sum(1)[None]) - intersection  # (area1 + area2) - intersection\n    return intersection / (union + eps)\n\n\ndef masks_iou(mask1, mask2, eps=1e-7):\n    \"\"\"\n    mask1: [N, n], N is the number of predicted objects\n    mask2: [N, n], N is the number of gt objects (paired one-to-one with mask1)\n    Note: n means image_w x image_h\n\n    return: masks iou, [1, N] (the union is expanded with [None])\n    \"\"\"\n    intersection = (mask1 * mask2).sum(1).clamp(0)  # (N, )\n    union = (mask1.sum(1) + mask2.sum(1))[None] - intersection  # (area1 + area2) - intersection\n    return intersection / (union + eps)\n\n\ndef masks2segments(masks, strategy='largest'):\n    # Convert masks(n,160,160) into a list of n segments, each of shape (num_points, 2)\n    segments = []\n    for x in masks.int().cpu().numpy().astype('uint8'):\n        c = cv2.findContours(x, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[0]\n        if c:\n            if strategy == 'concat':  # concatenate all segments\n                c = np.concatenate([x.reshape(-1, 2) for x in c])\n            elif strategy == 'largest':  # select largest segment\n                c = np.array(c[np.array([len(x) for x in c]).argmax()]).reshape(-1, 2)\n        else:\n            c = np.zeros((0, 2))  # no segments found\n        
segments.append(c.astype('float32'))\n    return segments\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/segment/loss.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom ..general import xywh2xyxy\nfrom ..loss import FocalLoss, smooth_BCE\nfrom ..metrics import bbox_iou\nfrom ..torch_utils import de_parallel\nfrom .general import crop_mask\n\n\nclass ComputeLoss:\n    # Compute losses\n    def __init__(self, model, autobalance=False, overlap=False):\n        self.sort_obj_iou = False\n        self.overlap = overlap\n        device = next(model.parameters()).device  # get model device\n        h = model.hyp  # hyperparameters\n        self.device = device\n\n        # Define criteria\n        BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))\n        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))\n\n        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3\n        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets\n\n        # Focal loss\n        g = h['fl_gamma']  # focal loss gamma\n        if g > 0:\n            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)\n\n        m = de_parallel(model).model[-1]  # Detect() module\n        self.balance = {3: [4.0, 1.0, 0.4]}.get(m.nl, [4.0, 1.0, 0.25, 0.06, 0.02])  # P3-P7\n        self.ssi = list(m.stride).index(16) if autobalance else 0  # stride 16 index\n        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, 1.0, h, autobalance\n        self.na = m.na  # number of anchors\n        self.nc = m.nc  # number of classes\n        self.nl = m.nl  # number of layers\n        self.nm = m.nm  # number of masks\n        self.anchors = m.anchors\n        self.device = device\n\n    def __call__(self, preds, targets, masks):  # predictions, targets, model\n        p, proto = preds\n        bs, nm, mask_h, mask_w = proto.shape  # batch size, number of masks, mask height, mask width\n        lcls = torch.zeros(1, 
device=self.device)\n        lbox = torch.zeros(1, device=self.device)\n        lobj = torch.zeros(1, device=self.device)\n        lseg = torch.zeros(1, device=self.device)\n        tcls, tbox, indices, anchors, tidxs, xywhn = self.build_targets(p, targets)  # targets\n\n        # Losses\n        for i, pi in enumerate(p):  # layer index, layer predictions\n            b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx\n            tobj = torch.zeros(pi.shape[:4], dtype=pi.dtype, device=self.device)  # target obj\n\n            n = b.shape[0]  # number of targets\n            if n:\n                pxy, pwh, _, pcls, pmask = pi[b, a, gj, gi].split((2, 2, 1, self.nc, nm), 1)  # subset of predictions\n\n                # Box regression\n                pxy = pxy.sigmoid() * 2 - 0.5\n                pwh = (pwh.sigmoid() * 2) ** 2 * anchors[i]\n                pbox = torch.cat((pxy, pwh), 1)  # predicted box\n                iou = bbox_iou(pbox, tbox[i], CIoU=True).squeeze()  # iou(prediction, target)\n                lbox += (1.0 - iou).mean()  # iou loss\n\n                # Objectness\n                iou = iou.detach().clamp(0).type(tobj.dtype)\n                if self.sort_obj_iou:\n                    j = iou.argsort()\n                    b, a, gj, gi, iou = b[j], a[j], gj[j], gi[j], iou[j]\n                if self.gr < 1:\n                    iou = (1.0 - self.gr) + self.gr * iou\n                tobj[b, a, gj, gi] = iou  # iou ratio\n\n                # Classification\n                if self.nc > 1:  # cls loss (only if multiple classes)\n                    t = torch.full_like(pcls, self.cn, device=self.device)  # targets\n                    t[range(n), tcls[i]] = self.cp\n                    lcls += self.BCEcls(pcls, t)  # BCE\n\n                # Mask regression\n                if tuple(masks.shape[-2:]) != (mask_h, mask_w):  # downsample\n                    masks = F.interpolate(masks[None], (mask_h, mask_w), mode='nearest')[0]\n                
marea = xywhn[i][:, 2:].prod(1)  # mask width, height normalized\n                mxyxy = xywh2xyxy(xywhn[i] * torch.tensor([mask_w, mask_h, mask_w, mask_h], device=self.device))\n                for bi in b.unique():\n                    j = b == bi  # matching index\n                    if self.overlap:\n                        mask_gti = torch.where(masks[bi][None] == tidxs[i][j].view(-1, 1, 1), 1.0, 0.0)\n                    else:\n                        mask_gti = masks[tidxs[i]][j]\n                    lseg += self.single_mask_loss(mask_gti, pmask[j], proto[bi], mxyxy[j], marea[j])\n\n            obji = self.BCEobj(pi[..., 4], tobj)\n            lobj += obji * self.balance[i]  # obj loss\n            if self.autobalance:\n                self.balance[i] = self.balance[i] * 0.9999 + 0.0001 / obji.detach().item()\n\n        if self.autobalance:\n            self.balance = [x / self.balance[self.ssi] for x in self.balance]\n        lbox *= self.hyp['box']\n        lobj *= self.hyp['obj']\n        lcls *= self.hyp['cls']\n        lseg *= self.hyp['box'] / bs\n\n        loss = lbox + lobj + lcls + lseg\n        return loss * bs, torch.cat((lbox, lseg, lobj, lcls)).detach()\n\n    def single_mask_loss(self, gt_mask, pred, proto, xyxy, area):\n        # Mask loss for one image\n        pred_mask = (pred @ proto.view(self.nm, -1)).view(-1, *proto.shape[1:])  # (n,32) @ (32,80,80) -> (n,80,80)\n        loss = F.binary_cross_entropy_with_logits(pred_mask, gt_mask, reduction='none')\n        return (crop_mask(loss, xyxy).mean(dim=(1, 2)) / area).mean()\n\n    def build_targets(self, p, targets):\n        # Build targets for compute_loss(), input targets(image,class,x,y,w,h)\n        na, nt = self.na, targets.shape[0]  # number of anchors, targets\n        tcls, tbox, indices, anch, tidxs, xywhn = [], [], [], [], [], []\n        gain = torch.ones(8, device=self.device)  # normalized to gridspace gain\n        ai = torch.arange(na, device=self.device).float().view(na, 
1).repeat(1, nt)  # same as .repeat_interleave(nt)\n        if self.overlap:\n            batch = p[0].shape[0]\n            ti = []\n            for i in range(batch):\n                num = (targets[:, 0] == i).sum()  # find number of targets of each image\n                ti.append(torch.arange(num, device=self.device).float().view(1, num).repeat(na, 1) + 1)  # (na, num)\n            ti = torch.cat(ti, 1)  # (na, nt)\n        else:\n            ti = torch.arange(nt, device=self.device).float().view(1, nt).repeat(na, 1)\n        targets = torch.cat((targets.repeat(na, 1, 1), ai[..., None], ti[..., None]), 2)  # append anchor indices\n\n        g = 0.5  # bias\n        off = torch.tensor(\n            [\n                [0, 0],\n                [1, 0],\n                [0, 1],\n                [-1, 0],\n                [0, -1],  # j,k,l,m\n                # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm\n            ],\n            device=self.device).float() * g  # offsets\n\n        for i in range(self.nl):\n            anchors, shape = self.anchors[i], p[i].shape\n            gain[2:6] = torch.tensor(shape)[[3, 2, 3, 2]]  # xyxy gain\n\n            # Match targets to anchors\n            t = targets * gain  # shape(3,n,7)\n            if nt:\n                # Matches\n                r = t[..., 4:6] / anchors[:, None]  # wh ratio\n                j = torch.max(r, 1 / r).max(2)[0] < self.hyp['anchor_t']  # compare\n                # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))\n                t = t[j]  # filter\n\n                # Offsets\n                gxy = t[:, 2:4]  # grid xy\n                gxi = gain[[2, 3]] - gxy  # inverse\n                j, k = ((gxy % 1 < g) & (gxy > 1)).T\n                l, m = ((gxi % 1 < g) & (gxi > 1)).T\n                j = torch.stack((torch.ones_like(j), j, k, l, m))\n                t = t.repeat((5, 1, 1))[j]\n                offsets = (torch.zeros_like(gxy)[None] + 
off[:, None])[j]\n            else:\n                t = targets[0]\n                offsets = 0\n\n            # Define\n            bc, gxy, gwh, at = t.chunk(4, 1)  # (image, class), grid xy, grid wh, anchors\n            (a, tidx), (b, c) = at.long().T, bc.long().T  # anchors, image, class\n            gij = (gxy - offsets).long()\n            gi, gj = gij.T  # grid indices\n\n            # Append\n            indices.append((b, a, gj.clamp_(0, shape[2] - 1), gi.clamp_(0, shape[3] - 1)))  # image, anchor, grid\n            tbox.append(torch.cat((gxy - gij, gwh), 1))  # box\n            anch.append(anchors[a])  # anchors\n            tcls.append(c)  # class\n            tidxs.append(tidx)\n            xywhn.append(torch.cat((gxy, gwh), 1) / gain[2:6])  # xywh normalized\n\n        return tcls, tbox, indices, anch, tidxs, xywhn\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/segment/metrics.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nModel validation metrics\n\"\"\"\n\nimport numpy as np\n\nfrom ..metrics import ap_per_class\n\n\ndef fitness(x):\n    # Model fitness as a weighted combination of metrics\n    w = [0.0, 0.0, 0.1, 0.9, 0.0, 0.0, 0.1, 0.9]\n    return (x[:, :8] * w).sum(1)\n\n\ndef ap_per_class_box_and_mask(\n        tp_m,\n        tp_b,\n        conf,\n        pred_cls,\n        target_cls,\n        plot=False,\n        save_dir='.',\n        names=(),\n):\n    \"\"\"\n    Args:\n        tp_b: tp of boxes.\n        tp_m: tp of masks.\n        other arguments see `func: ap_per_class`.\n    \"\"\"\n    results_boxes = ap_per_class(tp_b,\n                                 conf,\n                                 pred_cls,\n                                 target_cls,\n                                 plot=plot,\n                                 save_dir=save_dir,\n                                 names=names,\n                                 prefix='Box')[2:]\n    results_masks = ap_per_class(tp_m,\n                                 conf,\n                                 pred_cls,\n                                 target_cls,\n                                 plot=plot,\n                                 save_dir=save_dir,\n                                 names=names,\n                                 prefix='Mask')[2:]\n\n    results = {\n        'boxes': {\n            'p': results_boxes[0],\n            'r': results_boxes[1],\n            'ap': results_boxes[3],\n            'f1': results_boxes[2],\n            'ap_class': results_boxes[4]},\n        'masks': {\n            'p': results_masks[0],\n            'r': results_masks[1],\n            'ap': results_masks[3],\n            'f1': results_masks[2],\n            'ap_class': results_masks[4]}}\n    return results\n\n\nclass Metric:\n\n    def __init__(self) -> None:\n        self.p = []  # (nc, )\n        self.r = []  # (nc, )\n        self.f1 = []  # (nc, )\n   
     self.all_ap = []  # (nc, 10)\n        self.ap_class_index = []  # (nc, )\n\n    @property\n    def ap50(self):\n        \"\"\"AP@0.5 of all classes.\n        Return:\n            (nc, ) or [].\n        \"\"\"\n        return self.all_ap[:, 0] if len(self.all_ap) else []\n\n    @property\n    def ap(self):\n        \"\"\"AP@0.5:0.95\n        Return:\n            (nc, ) or [].\n        \"\"\"\n        return self.all_ap.mean(1) if len(self.all_ap) else []\n\n    @property\n    def mp(self):\n        \"\"\"mean precision of all classes.\n        Return:\n            float.\n        \"\"\"\n        return self.p.mean() if len(self.p) else 0.0\n\n    @property\n    def mr(self):\n        \"\"\"mean recall of all classes.\n        Return:\n            float.\n        \"\"\"\n        return self.r.mean() if len(self.r) else 0.0\n\n    @property\n    def map50(self):\n        \"\"\"Mean AP@0.5 of all classes.\n        Return:\n            float.\n        \"\"\"\n        return self.all_ap[:, 0].mean() if len(self.all_ap) else 0.0\n\n    @property\n    def map(self):\n        \"\"\"Mean AP@0.5:0.95 of all classes.\n        Return:\n            float.\n        \"\"\"\n        return self.all_ap.mean() if len(self.all_ap) else 0.0\n\n    def mean_results(self):\n        \"\"\"Mean of results, return mp, mr, map50, map\"\"\"\n        return (self.mp, self.mr, self.map50, self.map)\n\n    def class_result(self, i):\n        \"\"\"class-aware result, return p[i], r[i], ap50[i], ap[i]\"\"\"\n        return (self.p[i], self.r[i], self.ap50[i], self.ap[i])\n\n    def get_maps(self, nc):\n        maps = np.zeros(nc) + self.map\n        for i, c in enumerate(self.ap_class_index):\n            maps[c] = self.ap[i]\n        return maps\n\n    def update(self, results):\n        \"\"\"\n        Args:\n            results: tuple(p, r, ap, f1, ap_class)\n        \"\"\"\n        p, r, all_ap, f1, ap_class_index = results\n        self.p = p\n        self.r = r\n        self.all_ap = 
all_ap\n        self.f1 = f1\n        self.ap_class_index = ap_class_index\n\n\nclass Metrics:\n    \"\"\"Metric for boxes and masks.\"\"\"\n\n    def __init__(self) -> None:\n        self.metric_box = Metric()\n        self.metric_mask = Metric()\n\n    def update(self, results):\n        \"\"\"\n        Args:\n            results: Dict{'boxes': Dict{}, 'masks': Dict{}}\n        \"\"\"\n        self.metric_box.update(list(results['boxes'].values()))\n        self.metric_mask.update(list(results['masks'].values()))\n\n    def mean_results(self):\n        return self.metric_box.mean_results() + self.metric_mask.mean_results()\n\n    def class_result(self, i):\n        return self.metric_box.class_result(i) + self.metric_mask.class_result(i)\n\n    def get_maps(self, nc):\n        return self.metric_box.get_maps(nc) + self.metric_mask.get_maps(nc)\n\n    @property\n    def ap_class_index(self):\n        # boxes and masks have the same ap_class_index\n        return self.metric_box.ap_class_index\n\n\nKEYS = [\n    'train/box_loss',\n    'train/seg_loss',  # train loss\n    'train/obj_loss',\n    'train/cls_loss',\n    'metrics/precision(B)',\n    'metrics/recall(B)',\n    'metrics/mAP_0.5(B)',\n    'metrics/mAP_0.5:0.95(B)',  # metrics\n    'metrics/precision(M)',\n    'metrics/recall(M)',\n    'metrics/mAP_0.5(M)',\n    'metrics/mAP_0.5:0.95(M)',  # metrics\n    'val/box_loss',\n    'val/seg_loss',  # val loss\n    'val/obj_loss',\n    'val/cls_loss',\n    'x/lr0',\n    'x/lr1',\n    'x/lr2',]\n\nBEST_KEYS = [\n    'best/epoch',\n    'best/precision(B)',\n    'best/recall(B)',\n    'best/mAP_0.5(B)',\n    'best/mAP_0.5:0.95(B)',\n    'best/precision(M)',\n    'best/recall(M)',\n    'best/mAP_0.5(M)',\n    'best/mAP_0.5:0.95(M)',]\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/segment/plots.py",
    "content": "import contextlib\nimport math\nfrom pathlib import Path\n\nimport cv2\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport pandas as pd\nimport torch\n\nfrom .. import threaded\nfrom ..general import xywh2xyxy\nfrom ..plots import Annotator, colors\n\n\n@threaded\ndef plot_images_and_masks(images, targets, masks, paths=None, fname='images.jpg', names=None):\n    # Plot image grid with labels\n    if isinstance(images, torch.Tensor):\n        images = images.cpu().float().numpy()\n    if isinstance(targets, torch.Tensor):\n        targets = targets.cpu().numpy()\n    if isinstance(masks, torch.Tensor):\n        masks = masks.cpu().numpy().astype(int)\n\n    max_size = 1920  # max image size\n    max_subplots = 16  # max image subplots, i.e. 4x4\n    bs, _, h, w = images.shape  # batch size, _, height, width\n    bs = min(bs, max_subplots)  # limit plot images\n    ns = np.ceil(bs ** 0.5)  # number of subplots (square)\n    if np.max(images[0]) <= 1:\n        images *= 255  # de-normalise (optional)\n\n    # Build Image\n    mosaic = np.full((int(ns * h), int(ns * w), 3), 255, dtype=np.uint8)  # init\n    for i, im in enumerate(images):\n        if i == max_subplots:  # if last batch has fewer images than we expect\n            break\n        x, y = int(w * (i // ns)), int(h * (i % ns))  # block origin\n        im = im.transpose(1, 2, 0)\n        mosaic[y:y + h, x:x + w, :] = im\n\n    # Resize (optional)\n    scale = max_size / ns / max(h, w)\n    if scale < 1:\n        h = math.ceil(scale * h)\n        w = math.ceil(scale * w)\n        mosaic = cv2.resize(mosaic, tuple(int(x * ns) for x in (w, h)))\n\n    # Annotate\n    fs = int((h + w) * ns * 0.01)  # font size\n    annotator = Annotator(mosaic, line_width=round(fs / 10), font_size=fs, pil=True, example=names)\n    for i in range(i + 1):\n        x, y = int(w * (i // ns)), int(h * (i % ns))  # block origin\n        annotator.rectangle([x, y, x + w, y + h], None, (255, 255, 255), width=2)  
# borders\n        if paths:\n            annotator.text((x + 5, y + 5 + h), text=Path(paths[i]).name[:40], txt_color=(220, 220, 220))  # filenames\n        if len(targets) > 0:\n            idx = targets[:, 0] == i\n            ti = targets[idx]  # image targets\n\n            boxes = xywh2xyxy(ti[:, 2:6]).T\n            classes = ti[:, 1].astype('int')\n            labels = ti.shape[1] == 6  # labels if no conf column\n            conf = None if labels else ti[:, 6]  # check for confidence presence (label vs pred)\n\n            if boxes.shape[1]:\n                if boxes.max() <= 1.01:  # if normalized with tolerance 0.01\n                    boxes[[0, 2]] *= w  # scale to pixels\n                    boxes[[1, 3]] *= h\n                elif scale < 1:  # absolute coords need scale if image scales\n                    boxes *= scale\n            boxes[[0, 2]] += x\n            boxes[[1, 3]] += y\n            for j, box in enumerate(boxes.T.tolist()):\n                cls = classes[j]\n                color = colors(cls)\n                cls = names[cls] if names else cls\n                if labels or conf[j] > 0.25:  # 0.25 conf thresh\n                    label = f'{cls}' if labels else f'{cls} {conf[j]:.1f}'\n                    annotator.box_label(box, label, color=color)\n\n            # Plot masks\n            if len(masks):\n                if masks.max() > 1.0:  # mean that masks are overlap\n                    image_masks = masks[[i]]  # (1, 640, 640)\n                    nl = len(ti)\n                    index = np.arange(nl).reshape(nl, 1, 1) + 1\n                    image_masks = np.repeat(image_masks, nl, axis=0)\n                    image_masks = np.where(image_masks == index, 1.0, 0.0)\n                else:\n                    image_masks = masks[idx]\n\n                im = np.asarray(annotator.im).copy()\n                for j, box in enumerate(boxes.T.tolist()):\n                    if labels or conf[j] > 0.25:  # 0.25 conf thresh\n           
             color = colors(classes[j])\n                        mh, mw = image_masks[j].shape\n                        if mh != h or mw != w:\n                            mask = image_masks[j].astype(np.uint8)\n                            mask = cv2.resize(mask, (w, h))\n                            mask = mask.astype(bool)\n                        else:\n                            mask = image_masks[j].astype(bool)\n                        with contextlib.suppress(Exception):\n                            im[y:y + h, x:x + w, :][mask] = im[y:y + h, x:x + w, :][mask] * 0.4 + np.array(color) * 0.6\n                annotator.fromarray(im)\n    annotator.im.save(fname)  # save\n\n\ndef plot_results_with_masks(file='path/to/results.csv', dir='', best=True):\n    # Plot training results.csv. Usage: from utils.plots import *; plot_results('path/to/results.csv')\n    save_dir = Path(file).parent if file else Path(dir)\n    fig, ax = plt.subplots(2, 8, figsize=(18, 6), tight_layout=True)\n    ax = ax.ravel()\n    files = list(save_dir.glob('results*.csv'))\n    assert len(files), f'No results.csv files found in {save_dir.resolve()}, nothing to plot.'\n    for f in files:\n        try:\n            data = pd.read_csv(f)\n            index = np.argmax(0.9 * data.values[:, 8] + 0.1 * data.values[:, 7] + 0.9 * data.values[:, 12] +\n                              0.1 * data.values[:, 11])\n            s = [x.strip() for x in data.columns]\n            x = data.values[:, 0]\n            for i, j in enumerate([1, 2, 3, 4, 5, 6, 9, 10, 13, 14, 15, 16, 7, 8, 11, 12]):\n                y = data.values[:, j]\n                # y[y == 0] = np.nan  # don't show zero values\n                ax[i].plot(x, y, marker='.', label=f.stem, linewidth=2, markersize=2)\n                if best:\n                    # best\n                    ax[i].scatter(index, y[index], color='r', label=f'best:{index}', marker='*', linewidth=3)\n                    ax[i].set_title(s[j] + f'\\n{round(y[index], 
5)}')\n                else:\n                    # last\n                    ax[i].scatter(x[-1], y[-1], color='r', label='last', marker='*', linewidth=3)\n                    ax[i].set_title(s[j] + f'\\n{round(y[-1], 5)}')\n                # if j in [8, 9, 10]:  # share train and val loss y axes\n                #     ax[i].get_shared_y_axes().join(ax[i], ax[i - 5])\n        except Exception as e:\n            print(f'Warning: Plotting error for {f}: {e}')\n    ax[1].legend()\n    fig.savefig(save_dir / 'results.png', dpi=200)\n    plt.close()\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/torch_utils.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nPyTorch utils\n\"\"\"\n\nimport math\nimport os\nimport platform\nimport subprocess\nimport time\nimport warnings\nfrom contextlib import contextmanager\nfrom copy import deepcopy\nfrom pathlib import Path\n\nimport torch\nimport torch.distributed as dist\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.nn.parallel import DistributedDataParallel as DDP\n\nfrom utils.general import LOGGER, check_version, colorstr, file_date, git_describe\n\nLOCAL_RANK = int(os.getenv('LOCAL_RANK', -1))  # https://pytorch.org/docs/stable/elastic/run.html\nRANK = int(os.getenv('RANK', -1))\nWORLD_SIZE = int(os.getenv('WORLD_SIZE', 1))\n\ntry:\n    import thop  # for FLOPs computation\nexcept ImportError:\n    thop = None\n\n# Suppress PyTorch warnings\nwarnings.filterwarnings('ignore', message='User provided device_type of \\'cuda\\', but CUDA is not available. Disabling')\nwarnings.filterwarnings('ignore', category=UserWarning)\n\n\ndef smart_inference_mode(torch_1_9=check_version(torch.__version__, '1.9.0')):\n    # Applies torch.inference_mode() decorator if torch>=1.9.0 else torch.no_grad() decorator\n    def decorate(fn):\n        return (torch.inference_mode if torch_1_9 else torch.no_grad)()(fn)\n\n    return decorate\n\n\ndef smartCrossEntropyLoss(label_smoothing=0.0):\n    # Returns nn.CrossEntropyLoss with label smoothing enabled for torch>=1.10.0\n    if check_version(torch.__version__, '1.10.0'):\n        return nn.CrossEntropyLoss(label_smoothing=label_smoothing)\n    if label_smoothing > 0:\n        LOGGER.warning(f'WARNING ⚠️ label smoothing {label_smoothing} requires torch>=1.10.0')\n    return nn.CrossEntropyLoss()\n\n\ndef smart_DDP(model):\n    # Model DDP creation with checks\n    assert not check_version(torch.__version__, '1.12.0', pinned=True), \\\n        'torch==1.12.0 torchvision==0.13.0 DDP training is not supported due to a known issue. 
' \\\n        'Please upgrade or downgrade torch to use DDP. See https://github.com/ultralytics/yolov5/issues/8395'\n    if check_version(torch.__version__, '1.11.0'):\n        return DDP(model, device_ids=[LOCAL_RANK], output_device=LOCAL_RANK, static_graph=True)\n    else:\n        return DDP(model, device_ids=[LOCAL_RANK], output_device=LOCAL_RANK)\n\n\ndef reshape_classifier_output(model, n=1000):\n    # Update a TorchVision classification model to class count 'n' if required\n    from models.common import Classify\n    name, m = list((model.model if hasattr(model, 'model') else model).named_children())[-1]  # last module\n    if isinstance(m, Classify):  # YOLOv5 Classify() head\n        if m.linear.out_features != n:\n            m.linear = nn.Linear(m.linear.in_features, n)\n    elif isinstance(m, nn.Linear):  # ResNet, EfficientNet\n        if m.out_features != n:\n            setattr(model, name, nn.Linear(m.in_features, n))\n    elif isinstance(m, nn.Sequential):\n        types = [type(x) for x in m]\n        if nn.Linear in types:\n            i = types.index(nn.Linear)  # nn.Linear index\n            if m[i].out_features != n:\n                m[i] = nn.Linear(m[i].in_features, n)\n        elif nn.Conv2d in types:\n            i = types.index(nn.Conv2d)  # nn.Conv2d index\n            if m[i].out_channels != n:\n                m[i] = nn.Conv2d(m[i].in_channels, n, m[i].kernel_size, m[i].stride, bias=m[i].bias is not None)\n\n\n@contextmanager\ndef torch_distributed_zero_first(local_rank: int):\n    # Decorator to make all processes in distributed training wait for each local_master to do something\n    if local_rank not in [-1, 0]:\n        dist.barrier(device_ids=[local_rank])\n    yield\n    if local_rank == 0:\n        dist.barrier(device_ids=[0])\n\n\ndef device_count():\n    # Returns number of CUDA devices available. Safe version of torch.cuda.device_count(). 
Supports Linux and Windows\n    assert platform.system() in ('Linux', 'Windows'), 'device_count() only supported on Linux or Windows'\n    try:\n        cmd = 'nvidia-smi -L | wc -l' if platform.system() == 'Linux' else 'nvidia-smi -L | find /c /v \"\"'  # Windows\n        return int(subprocess.run(cmd, shell=True, capture_output=True, check=True).stdout.decode().split()[-1])\n    except Exception:\n        return 0\n\n\ndef select_device(device='', batch_size=0, newline=True):\n    # device = None or 'cpu' or 0 or '0' or '0,1,2,3'\n    s = f'YOLOv5 🚀 {git_describe() or file_date()} Python-{platform.python_version()} torch-{torch.__version__} '\n    device = str(device).strip().lower().replace('cuda:', '').replace('none', '')  # to string, 'cuda:0' to '0'\n    cpu = device == 'cpu'\n    mps = device == 'mps'  # Apple Metal Performance Shaders (MPS)\n    if cpu or mps:\n        os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # force torch.cuda.is_available() = False\n    elif device:  # non-cpu device requested\n        os.environ['CUDA_VISIBLE_DEVICES'] = device  # set environment variable - must be before assert is_available()\n        assert torch.cuda.is_available() and torch.cuda.device_count() >= len(device.replace(',', '')), \\\n            f\"Invalid CUDA '--device {device}' requested, use '--device cpu' or pass valid CUDA device(s)\"\n\n    if not cpu and not mps and torch.cuda.is_available():  # prefer GPU if available\n        devices = device.split(',') if device else '0'  # range(torch.cuda.device_count())  # i.e. 
0,1,6,7\n        n = len(devices)  # device count\n        if n > 1 and batch_size > 0:  # check batch_size is divisible by device_count\n            assert batch_size % n == 0, f'batch-size {batch_size} not multiple of GPU count {n}'\n        space = ' ' * (len(s) + 1)\n        for i, d in enumerate(devices):\n            p = torch.cuda.get_device_properties(i)\n            s += f\"{'' if i == 0 else space}CUDA:{d} ({p.name}, {p.total_memory / (1 << 20):.0f}MiB)\\n\"  # bytes to MB\n        arg = 'cuda:0'\n    elif mps and getattr(torch, 'has_mps', False) and torch.backends.mps.is_available():  # prefer MPS if available\n        s += 'MPS\\n'\n        arg = 'mps'\n    else:  # revert to CPU\n        s += 'CPU\\n'\n        arg = 'cpu'\n\n    if not newline:\n        s = s.rstrip()\n    LOGGER.info(s)\n    return torch.device(arg)\n\n\ndef time_sync():\n    # PyTorch-accurate time\n    if torch.cuda.is_available():\n        torch.cuda.synchronize()\n    return time.time()\n\n\ndef profile(input, ops, n=10, device=None):\n    \"\"\" YOLOv5 speed/memory/FLOPs profiler\n    Usage:\n        input = torch.randn(16, 3, 640, 640)\n        m1 = lambda x: x * torch.sigmoid(x)\n        m2 = nn.SiLU()\n        profile(input, [m1, m2], n=100)  # profile over 100 iterations\n    \"\"\"\n    results = []\n    if not isinstance(device, torch.device):\n        device = select_device(device)\n    print(f\"{'Params':>12s}{'GFLOPs':>12s}{'GPU_mem (GB)':>14s}{'forward (ms)':>14s}{'backward (ms)':>14s}\"\n          f\"{'input':>24s}{'output':>24s}\")\n\n    for x in input if isinstance(input, list) else [input]:\n        x = x.to(device)\n        x.requires_grad = True\n        for m in ops if isinstance(ops, list) else [ops]:\n            m = m.to(device) if hasattr(m, 'to') else m  # device\n            m = m.half() if hasattr(m, 'half') and isinstance(x, torch.Tensor) and x.dtype is torch.float16 else m\n            tf, tb, t = 0, 0, [0, 0, 0]  # dt forward, backward\n            
try:\n                flops = thop.profile(m, inputs=(x,), verbose=False)[0] / 1E9 * 2  # GFLOPs\n            except Exception:\n                flops = 0\n\n            try:\n                for _ in range(n):\n                    t[0] = time_sync()\n                    y = m(x)\n                    t[1] = time_sync()\n                    try:\n                        _ = (sum(yi.sum() for yi in y) if isinstance(y, list) else y).sum().backward()\n                        t[2] = time_sync()\n                    except Exception:  # no backward method\n                        # print(e)  # for debug\n                        t[2] = float('nan')\n                    tf += (t[1] - t[0]) * 1000 / n  # ms per op forward\n                    tb += (t[2] - t[1]) * 1000 / n  # ms per op backward\n                mem = torch.cuda.memory_reserved() / 1E9 if torch.cuda.is_available() else 0  # (GB)\n                s_in, s_out = (tuple(x.shape) if isinstance(x, torch.Tensor) else 'list' for x in (x, y))  # shapes\n                p = sum(x.numel() for x in m.parameters()) if isinstance(m, nn.Module) else 0  # parameters\n                print(f'{p:12}{flops:12.4g}{mem:>14.3f}{tf:14.4g}{tb:14.4g}{str(s_in):>24s}{str(s_out):>24s}')\n                results.append([p, flops, mem, tf, tb, s_in, s_out])\n            except Exception as e:\n                print(e)\n                results.append(None)\n            torch.cuda.empty_cache()\n    return results\n\n\ndef is_parallel(model):\n    # Returns True if model is of type DP or DDP\n    return type(model) in (nn.parallel.DataParallel, nn.parallel.DistributedDataParallel)\n\n\ndef de_parallel(model):\n    # De-parallelize a model: returns single-GPU model if model is of type DP or DDP\n    return model.module if is_parallel(model) else model\n\n\ndef initialize_weights(model):\n    for m in model.modules():\n        t = type(m)\n        if t is nn.Conv2d:\n            pass  # nn.init.kaiming_normal_(m.weight, mode='fan_out', 
nonlinearity='relu')\n        elif t is nn.BatchNorm2d:\n            m.eps = 1e-3\n            m.momentum = 0.03\n        elif t in [nn.Hardswish, nn.LeakyReLU, nn.ReLU, nn.ReLU6, nn.SiLU]:\n            m.inplace = True\n\n\ndef find_modules(model, mclass=nn.Conv2d):\n    # Finds layer indices matching module class 'mclass'\n    return [i for i, m in enumerate(model.module_list) if isinstance(m, mclass)]\n\n\ndef sparsity(model):\n    # Return global model sparsity\n    a, b = 0, 0\n    for p in model.parameters():\n        a += p.numel()\n        b += (p == 0).sum()\n    return b / a\n\n\ndef prune(model, amount=0.3):\n    # Prune model to requested global sparsity\n    import torch.nn.utils.prune as prune\n    for name, m in model.named_modules():\n        if isinstance(m, nn.Conv2d):\n            prune.l1_unstructured(m, name='weight', amount=amount)  # prune\n            prune.remove(m, 'weight')  # make permanent\n    LOGGER.info(f'Model pruned to {sparsity(model):.3g} global sparsity')\n\n\ndef fuse_conv_and_bn(conv, bn):\n    # Fuse Conv2d() and BatchNorm2d() layers https://tehnokv.com/posts/fusing-batchnorm-and-conv/\n    fusedconv = nn.Conv2d(conv.in_channels,\n                          conv.out_channels,\n                          kernel_size=conv.kernel_size,\n                          stride=conv.stride,\n                          padding=conv.padding,\n                          dilation=conv.dilation,\n                          groups=conv.groups,\n                          bias=True).requires_grad_(False).to(conv.weight.device)\n\n    # Prepare filters\n    w_conv = conv.weight.clone().view(conv.out_channels, -1)\n    w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))\n    fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape))\n\n    # Prepare spatial bias\n    b_conv = torch.zeros(conv.weight.size(0), device=conv.weight.device) if conv.bias is None else conv.bias\n    b_bn = bn.bias - 
bn.weight.mul(bn.running_mean).div(torch.sqrt(bn.running_var + bn.eps))\n    fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)\n\n    return fusedconv\n\n\ndef model_info(model, verbose=False, imgsz=640):\n    # Model information. img_size may be int or list, i.e. img_size=640 or img_size=[640, 320]\n    n_p = sum(x.numel() for x in model.parameters())  # number parameters\n    n_g = sum(x.numel() for x in model.parameters() if x.requires_grad)  # number gradients\n    if verbose:\n        print(f\"{'layer':>5} {'name':>40} {'gradient':>9} {'parameters':>12} {'shape':>20} {'mu':>10} {'sigma':>10}\")\n        for i, (name, p) in enumerate(model.named_parameters()):\n            name = name.replace('module_list.', '')\n            print('%5g %40s %9s %12g %20s %10.3g %10.3g' %\n                  (i, name, p.requires_grad, p.numel(), list(p.shape), p.mean(), p.std()))\n\n    try:  # FLOPs\n        p = next(model.parameters())\n        stride = max(int(model.stride.max()), 32) if hasattr(model, 'stride') else 32  # max stride\n        im = torch.empty((1, p.shape[1], stride, stride), device=p.device)  # input image in BCHW format\n        flops = thop.profile(deepcopy(model), inputs=(im,), verbose=False)[0] / 1E9 * 2  # stride GFLOPs\n        imgsz = imgsz if isinstance(imgsz, list) else [imgsz, imgsz]  # expand if int/float\n        fs = f', {flops * imgsz[0] / stride * imgsz[1] / stride:.1f} GFLOPs'  # 640x640 GFLOPs\n    except Exception:\n        fs = ''\n\n    name = Path(model.yaml_file).stem.replace('yolov5', 'YOLOv5') if hasattr(model, 'yaml_file') else 'Model'\n    LOGGER.info(f'{name} summary: {len(list(model.modules()))} layers, {n_p} parameters, {n_g} gradients{fs}')\n\n\ndef scale_img(img, ratio=1.0, same_shape=False, gs=32):  # img(16,3,256,416)\n    # Scales img(bs,3,y,x) by ratio constrained to gs-multiple\n    if ratio == 1.0:\n        return img\n    h, w = img.shape[2:]\n    s = (int(h * ratio), int(w * ratio))  # new 
size\n    img = F.interpolate(img, size=s, mode='bilinear', align_corners=False)  # resize\n    if not same_shape:  # pad/crop img\n        h, w = (math.ceil(x * ratio / gs) * gs for x in (h, w))\n    return F.pad(img, [0, w - s[1], 0, h - s[0]], value=0.447)  # value = imagenet mean\n\n\ndef copy_attr(a, b, include=(), exclude=()):\n    # Copy attributes from b to a, options to only include [...] and to exclude [...]\n    for k, v in b.__dict__.items():\n        if (len(include) and k not in include) or k.startswith('_') or k in exclude:\n            continue\n        else:\n            setattr(a, k, v)\n\n\ndef smart_optimizer(model, name='Adam', lr=0.001, momentum=0.9, decay=1e-5):\n    # YOLOv5 3-param group optimizer: 0) weights with decay, 1) weights no decay, 2) biases no decay\n    g = [], [], []  # optimizer parameter groups\n    bn = tuple(v for k, v in nn.__dict__.items() if 'Norm' in k)  # normalization layers, i.e. BatchNorm2d()\n    for v in model.modules():\n        for p_name, p in v.named_parameters(recurse=0):\n            if p_name == 'bias':  # bias (no decay)\n                g[2].append(p)\n            elif p_name == 'weight' and isinstance(v, bn):  # weight (no decay)\n                g[1].append(p)\n            else:\n                g[0].append(p)  # weight (with decay)\n\n    if name == 'Adam':\n        optimizer = torch.optim.Adam(g[2], lr=lr, betas=(momentum, 0.999))  # adjust beta1 to momentum\n    elif name == 'AdamW':\n        optimizer = torch.optim.AdamW(g[2], lr=lr, betas=(momentum, 0.999), weight_decay=0.0)\n    elif name == 'RMSProp':\n        optimizer = torch.optim.RMSprop(g[2], lr=lr, momentum=momentum)\n    elif name == 'SGD':\n        optimizer = torch.optim.SGD(g[2], lr=lr, momentum=momentum, nesterov=True)\n    else:\n        raise NotImplementedError(f'Optimizer {name} not implemented.')\n\n    optimizer.add_param_group({'params': g[0], 'weight_decay': decay})  # add g0 with weight_decay\n    
optimizer.add_param_group({'params': g[1], 'weight_decay': 0.0})  # add g1 (BatchNorm2d weights)\n    LOGGER.info(f\"{colorstr('optimizer:')} {type(optimizer).__name__}(lr={lr}) with parameter groups \"\n                f'{len(g[1])} weight(decay=0.0), {len(g[0])} weight(decay={decay}), {len(g[2])} bias')\n    return optimizer\n\n\ndef smart_hub_load(repo='ultralytics/yolov5', model='yolov5s', **kwargs):\n    # YOLOv5 torch.hub.load() wrapper with smart error/issue handling\n    if check_version(torch.__version__, '1.9.1'):\n        kwargs['skip_validation'] = True  # validation causes GitHub API rate limit errors\n    if check_version(torch.__version__, '1.12.0'):\n        kwargs['trust_repo'] = True  # argument required starting in torch 1.12\n    try:\n        return torch.hub.load(repo, model, **kwargs)\n    except Exception:\n        return torch.hub.load(repo, model, force_reload=True, **kwargs)\n\n\ndef smart_resume(ckpt, optimizer, ema=None, weights='yolov5s.pt', epochs=300, resume=True):\n    # Resume training from a partially trained checkpoint\n    best_fitness = 0.0\n    start_epoch = ckpt['epoch'] + 1\n    if ckpt['optimizer'] is not None:\n        optimizer.load_state_dict(ckpt['optimizer'])  # optimizer\n        best_fitness = ckpt['best_fitness']\n    if ema and ckpt.get('ema'):\n        ema.ema.load_state_dict(ckpt['ema'].float().state_dict())  # EMA\n        ema.updates = ckpt['updates']\n    if resume:\n        assert start_epoch > 0, f'{weights} training to {epochs} epochs is finished, nothing to resume.\\n' \\\n                                f\"Start a new training without --resume, i.e. 'python train.py --weights {weights}'\"\n        LOGGER.info(f'Resuming training from {weights} from epoch {start_epoch} to {epochs} total epochs')\n    if epochs < start_epoch:\n        LOGGER.info(f\"{weights} has been trained for {ckpt['epoch']} epochs. 
Fine-tuning for {epochs} more epochs.\")\n        epochs += ckpt['epoch']  # finetune additional epochs\n    return best_fitness, start_epoch, epochs\n\n\nclass EarlyStopping:\n    # YOLOv5 simple early stopper\n    def __init__(self, patience=30):\n        self.best_fitness = 0.0  # i.e. mAP\n        self.best_epoch = 0\n        self.patience = patience or float('inf')  # epochs to wait after fitness stops improving to stop\n        self.possible_stop = False  # possible stop may occur next epoch\n\n    def __call__(self, epoch, fitness):\n        if fitness >= self.best_fitness:  # >= 0 to allow for early zero-fitness stage of training\n            self.best_epoch = epoch\n            self.best_fitness = fitness\n        delta = epoch - self.best_epoch  # epochs without improvement\n        self.possible_stop = delta >= (self.patience - 1)  # possible stop may occur next epoch\n        stop = delta >= self.patience  # stop training if patience exceeded\n        if stop:\n            LOGGER.info(f'Stopping training early as no improvement observed in last {self.patience} epochs. '\n                        f'Best results observed at epoch {self.best_epoch}, best model saved as best.pt.\\n'\n                        f'To update EarlyStopping(patience={self.patience}) pass a new patience value, '\n                        f'i.e. 
`python train.py --patience 300` or use `--patience 0` to disable EarlyStopping.')\n        return stop\n\n\nclass ModelEMA:\n    \"\"\" Updated Exponential Moving Average (EMA) from https://github.com/rwightman/pytorch-image-models\n    Keeps a moving average of everything in the model state_dict (parameters and buffers)\n    For EMA details see https://www.tensorflow.org/api_docs/python/tf/train/ExponentialMovingAverage\n    \"\"\"\n\n    def __init__(self, model, decay=0.9999, tau=2000, updates=0):\n        # Create EMA\n        self.ema = deepcopy(de_parallel(model)).eval()  # FP32 EMA\n        self.updates = updates  # number of EMA updates\n        self.decay = lambda x: decay * (1 - math.exp(-x / tau))  # decay exponential ramp (to help early epochs)\n        for p in self.ema.parameters():\n            p.requires_grad_(False)\n\n    def update(self, model):\n        # Update EMA parameters\n        self.updates += 1\n        d = self.decay(self.updates)\n\n        msd = de_parallel(model).state_dict()  # model state_dict\n        for k, v in self.ema.state_dict().items():\n            if v.dtype.is_floating_point:  # true for FP16 and FP32\n                v *= d\n                v += (1 - d) * msd[k].detach()\n        # assert v.dtype == msd[k].dtype == torch.float32, f'{k}: EMA {v.dtype} and model {msd[k].dtype} must be FP32'\n\n    def update_attr(self, model, include=(), exclude=('process_group', 'reducer')):\n        # Update EMA attributes\n        copy_attr(self.ema, model, include, exclude)\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/utils/triton.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\" Utils to interact with the Triton Inference Server\n\"\"\"\n\nimport typing\nfrom urllib.parse import urlparse\n\nimport torch\n\n\nclass TritonRemoteModel:\n    \"\"\" A wrapper over a model served by the Triton Inference Server. It can\n    be configured to communicate over GRPC or HTTP. It accepts Torch Tensors\n    as input and returns them as outputs.\n    \"\"\"\n\n    def __init__(self, url: str):\n        \"\"\"\n        Keyword arguments:\n        url: Fully qualified address of the Triton server - for e.g. grpc://localhost:8000\n        \"\"\"\n\n        parsed_url = urlparse(url)\n        if parsed_url.scheme == 'grpc':\n            from tritonclient.grpc import InferenceServerClient, InferInput\n\n            self.client = InferenceServerClient(parsed_url.netloc)  # Triton GRPC client\n            model_repository = self.client.get_model_repository_index()\n            self.model_name = model_repository.models[0].name\n            self.metadata = self.client.get_model_metadata(self.model_name, as_json=True)\n\n            def create_input_placeholders() -> typing.List[InferInput]:\n                return [\n                    InferInput(i['name'], [int(s) for s in i['shape']], i['datatype']) for i in self.metadata['inputs']]\n\n        else:\n            from tritonclient.http import InferenceServerClient, InferInput\n\n            self.client = InferenceServerClient(parsed_url.netloc)  # Triton HTTP client\n            model_repository = self.client.get_model_repository_index()\n            self.model_name = model_repository[0]['name']\n            self.metadata = self.client.get_model_metadata(self.model_name)\n\n            def create_input_placeholders() -> typing.List[InferInput]:\n                return [\n                    InferInput(i['name'], [int(s) for s in i['shape']], i['datatype']) for i in self.metadata['inputs']]\n\n        self._create_input_placeholders_fn = 
create_input_placeholders\n\n    @property\n    def runtime(self):\n        \"\"\"Returns the model runtime\"\"\"\n        return self.metadata.get('backend', self.metadata.get('platform'))\n\n    def __call__(self, *args, **kwargs) -> typing.Union[torch.Tensor, typing.Tuple[torch.Tensor, ...]]:\n        \"\"\" Invokes the model. Parameters can be provided via args or kwargs.\n        args, if provided, are assumed to match the order of inputs of the model.\n        kwargs are matched with the model input names.\n        \"\"\"\n        inputs = self._create_inputs(*args, **kwargs)\n        response = self.client.infer(model_name=self.model_name, inputs=inputs)\n        result = []\n        for output in self.metadata['outputs']:\n            tensor = torch.as_tensor(response.as_numpy(output['name']))\n            result.append(tensor)\n        return result[0] if len(result) == 1 else result\n\n    def _create_inputs(self, *args, **kwargs):\n        args_len, kwargs_len = len(args), len(kwargs)\n        if not args_len and not kwargs_len:\n            raise RuntimeError('No inputs provided.')\n        if args_len and kwargs_len:\n            raise RuntimeError('Cannot specify args and kwargs at the same time')\n\n        placeholders = self._create_input_placeholders_fn()\n        if args_len:\n            if args_len != len(placeholders):\n                raise RuntimeError(f'Expected {len(placeholders)} inputs, got {args_len}.')\n            for input, value in zip(placeholders, args):\n                input.set_data_from_numpy(value.cpu().numpy())\n        else:\n            for input in placeholders:\n                value = kwargs[input.name]\n                input.set_data_from_numpy(value.cpu().numpy())\n        return placeholders\n"
  },
  {
    "path": "yolo-improve/yolov5-AUX/val.py",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\"\"\"\nValidate a trained YOLOv5 detection model on a detection dataset\n\nUsage:\n    $ python val.py --weights yolov5s.pt --data coco128.yaml --img 640\n\nUsage - formats:\n    $ python val.py --weights yolov5s.pt                 # PyTorch\n                              yolov5s.torchscript        # TorchScript\n                              yolov5s.onnx               # ONNX Runtime or OpenCV DNN with --dnn\n                              yolov5s_openvino_model     # OpenVINO\n                              yolov5s.engine             # TensorRT\n                              yolov5s.mlmodel            # CoreML (macOS-only)\n                              yolov5s_saved_model        # TensorFlow SavedModel\n                              yolov5s.pb                 # TensorFlow GraphDef\n                              yolov5s.tflite             # TensorFlow Lite\n                              yolov5s_edgetpu.tflite     # TensorFlow Edge TPU\n                              yolov5s_paddle_model       # PaddlePaddle\n\"\"\"\n\nimport argparse\nimport json\nimport os\nimport subprocess\nimport sys\nfrom pathlib import Path\n\nimport numpy as np\nimport torch\nfrom tqdm import tqdm\n\nFILE = Path(__file__).resolve()\nROOT = FILE.parents[0]  # YOLOv5 root directory\nif str(ROOT) not in sys.path:\n    sys.path.append(str(ROOT))  # add ROOT to PATH\nROOT = Path(os.path.relpath(ROOT, Path.cwd()))  # relative\n\nfrom models.common import DetectMultiBackend\nfrom utils.callbacks import Callbacks\nfrom utils.dataloaders import create_dataloader\nfrom utils.general import (LOGGER, TQDM_BAR_FORMAT, Profile, check_dataset, check_img_size, check_requirements,\n                           check_yaml, coco80_to_coco91_class, colorstr, increment_path, non_max_suppression,\n                           print_args, scale_boxes, xywh2xyxy, xyxy2xywh)\nfrom utils.metrics import ConfusionMatrix, ap_per_class, box_iou\nfrom utils.plots 
import output_to_target, plot_images, plot_val_study\nfrom utils.torch_utils import select_device, smart_inference_mode\n\n\ndef save_one_txt(predn, save_conf, shape, file):\n    # Save one txt result\n    gn = torch.tensor(shape)[[1, 0, 1, 0]]  # normalization gain whwh\n    for *xyxy, conf, cls in predn.tolist():\n        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh\n        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format\n        with open(file, 'a') as f:\n            f.write(('%g ' * len(line)).rstrip() % line + '\\n')\n\n\ndef save_one_json(predn, jdict, path, class_map):\n    # Save one JSON result {\"image_id\": 42, \"category_id\": 18, \"bbox\": [258.15, 41.29, 348.26, 243.78], \"score\": 0.236}\n    image_id = int(path.stem) if path.stem.isnumeric() else path.stem\n    box = xyxy2xywh(predn[:, :4])  # xywh\n    box[:, :2] -= box[:, 2:] / 2  # xy center to top-left corner\n    for p, b in zip(predn.tolist(), box.tolist()):\n        jdict.append({\n            'image_id': image_id,\n            'category_id': class_map[int(p[5])],\n            'bbox': [round(x, 3) for x in b],\n            'score': round(p[4], 5)})\n\n\ndef process_batch(detections, labels, iouv):\n    \"\"\"\n    Return correct prediction matrix\n    Arguments:\n        detections (array[N, 6]), x1, y1, x2, y2, conf, class\n        labels (array[M, 5]), class, x1, y1, x2, y2\n    Returns:\n        correct (array[N, 10]), for 10 IoU levels\n    \"\"\"\n    correct = np.zeros((detections.shape[0], iouv.shape[0])).astype(bool)\n    iou = box_iou(labels[:, 1:], detections[:, :4])\n    correct_class = labels[:, 0:1] == detections[:, 5]\n    for i in range(len(iouv)):\n        x = torch.where((iou >= iouv[i]) & correct_class)  # IoU > threshold and classes match\n        if x[0].shape[0]:\n            matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()  # [label, detect, iou]\n            if 
x[0].shape[0] > 1:\n                matches = matches[matches[:, 2].argsort()[::-1]]\n                matches = matches[np.unique(matches[:, 1], return_index=True)[1]]\n                # matches = matches[matches[:, 2].argsort()[::-1]]\n                matches = matches[np.unique(matches[:, 0], return_index=True)[1]]\n            correct[matches[:, 1].astype(int), i] = True\n    return torch.tensor(correct, dtype=torch.bool, device=iouv.device)\n\n\n@smart_inference_mode()\ndef run(\n        data,\n        weights=None,  # model.pt path(s)\n        batch_size=32,  # batch size\n        imgsz=640,  # inference size (pixels)\n        conf_thres=0.001,  # confidence threshold\n        iou_thres=0.6,  # NMS IoU threshold\n        max_det=300,  # maximum detections per image\n        task='val',  # train, val, test, speed or study\n        device='',  # cuda device, i.e. 0 or 0,1,2,3 or cpu\n        workers=8,  # max dataloader workers (per RANK in DDP mode)\n        single_cls=False,  # treat as single-class dataset\n        augment=False,  # augmented inference\n        verbose=False,  # verbose output\n        save_txt=False,  # save results to *.txt\n        save_hybrid=False,  # save label+prediction hybrid results to *.txt\n        save_conf=False,  # save confidences in --save-txt labels\n        save_json=False,  # save a COCO-JSON results file\n        project=ROOT / 'runs/val',  # save to project/name\n        name='exp',  # save to project/name\n        exist_ok=False,  # existing project/name ok, do not increment\n        half=True,  # use FP16 half-precision inference\n        dnn=False,  # use OpenCV DNN for ONNX inference\n        model=None,\n        dataloader=None,\n        save_dir=Path(''),\n        plots=True,\n        callbacks=Callbacks(),\n        compute_loss=None,\n):\n    # Initialize/load model and set device\n    training = model is not None\n    if training:  # called by train.py\n        device, pt, jit, engine = 
next(model.parameters()).device, True, False, False  # get model device, PyTorch model\n        half &= device.type != 'cpu'  # half precision only supported on CUDA\n        model.half() if half else model.float()\n    else:  # called directly\n        device = select_device(device, batch_size=batch_size)\n\n        # Directories\n        save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run\n        (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir\n\n        # Load model\n        model = DetectMultiBackend(weights, device=device, dnn=dnn, data=data, fp16=half)\n        stride, pt, jit, engine = model.stride, model.pt, model.jit, model.engine\n        imgsz = check_img_size(imgsz, s=stride)  # check image size\n        half = model.fp16  # FP16 supported on limited backends with CUDA\n        if engine:\n            batch_size = model.batch_size\n        else:\n            device = model.device\n            if not (pt or jit):\n                batch_size = 1  # export.py models default to batch-size 1\n                LOGGER.info(f'Forcing --batch-size 1 square inference (1,3,{imgsz},{imgsz}) for non-PyTorch models')\n\n        # Data\n        data = check_dataset(data)  # check\n\n    # Configure\n    model.eval()\n    cuda = device.type != 'cpu'\n    is_coco = isinstance(data.get('val'), str) and data['val'].endswith(f'coco{os.sep}val2017.txt')  # COCO dataset\n    nc = 1 if single_cls else int(data['nc'])  # number of classes\n    iouv = torch.linspace(0.5, 0.95, 10, device=device)  # iou vector for mAP@0.5:0.95\n    niou = iouv.numel()\n\n    # Dataloader\n    if not training:\n        if pt and not single_cls:  # check --weights are trained on --data\n            ncm = model.model.nc\n            assert ncm == nc, f'{weights} ({ncm} classes) trained on different --data than what you passed ({nc} ' \\\n                              f'classes). 
Pass correct combination of --weights and --data that are trained together.'\n        model.warmup(imgsz=(1 if pt else batch_size, 3, imgsz, imgsz))  # warmup\n        pad, rect = (0.0, False) if task == 'speed' else (0.5, pt)  # square inference for benchmarks\n        task = task if task in ('train', 'val', 'test') else 'val'  # path to train/val/test images\n        dataloader = create_dataloader(data[task],\n                                       imgsz,\n                                       batch_size,\n                                       stride,\n                                       single_cls,\n                                       pad=pad,\n                                       rect=rect,\n                                       workers=workers,\n                                       prefix=colorstr(f'{task}: '))[0]\n\n    seen = 0\n    confusion_matrix = ConfusionMatrix(nc=nc)\n    names = model.names if hasattr(model, 'names') else model.module.names  # get class names\n    if isinstance(names, (list, tuple)):  # old format\n        names = dict(enumerate(names))\n    class_map = coco80_to_coco91_class() if is_coco else list(range(1000))\n    s = ('%22s' + '%11s' * 6) % ('Class', 'Images', 'Instances', 'P', 'R', 'mAP50', 'mAP50-95')\n    tp, fp, p, r, f1, mp, mr, map50, ap50, map = 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0\n    dt = Profile(), Profile(), Profile()  # profiling times\n    loss = torch.zeros(3, device=device)\n    jdict, stats, ap, ap_class = [], [], [], []\n    callbacks.run('on_val_start')\n    pbar = tqdm(dataloader, desc=s, bar_format=TQDM_BAR_FORMAT)  # progress bar\n    for batch_i, (im, targets, paths, shapes) in enumerate(pbar):\n        callbacks.run('on_val_batch_start')\n        with dt[0]:\n            if cuda:\n                im = im.to(device, non_blocking=True)\n                targets = targets.to(device)\n            im = im.half() if half else im.float()  # uint8 to fp16/32\n            im /= 255  # 0 - 255 
to 0.0 - 1.0\n            nb, _, height, width = im.shape  # batch size, channels, height, width\n\n        # Inference\n        with dt[1]:\n            preds, train_out = model(im) if compute_loss else (model(im, augment=augment), None)\n\n        # Loss\n        if compute_loss:\n            loss += compute_loss(train_out, targets)[1]  # box, obj, cls\n\n        # NMS\n        targets[:, 2:] *= torch.tensor((width, height, width, height), device=device)  # to pixels\n        lb = [targets[targets[:, 0] == i, 1:] for i in range(nb)] if save_hybrid else []  # for autolabelling\n        with dt[2]:\n            preds = non_max_suppression(preds,\n                                        conf_thres,\n                                        iou_thres,\n                                        labels=lb,\n                                        multi_label=True,\n                                        agnostic=single_cls,\n                                        max_det=max_det)\n\n        # Metrics\n        for si, pred in enumerate(preds):\n            labels = targets[targets[:, 0] == si, 1:]\n            nl, npr = labels.shape[0], pred.shape[0]  # number of labels, predictions\n            path, shape = Path(paths[si]), shapes[si][0]\n            correct = torch.zeros(npr, niou, dtype=torch.bool, device=device)  # init\n            seen += 1\n\n            if npr == 0:\n                if nl:\n                    stats.append((correct, *torch.zeros((2, 0), device=device), labels[:, 0]))\n                    if plots:\n                        confusion_matrix.process_batch(detections=None, labels=labels[:, 0])\n                continue\n\n            # Predictions\n            if single_cls:\n                pred[:, 5] = 0\n            predn = pred.clone()\n            scale_boxes(im[si].shape[1:], predn[:, :4], shape, shapes[si][1])  # native-space pred\n\n            # Evaluate\n            if nl:\n                tbox = xywh2xyxy(labels[:, 1:5])  # target boxes\n 
               scale_boxes(im[si].shape[1:], tbox, shape, shapes[si][1])  # native-space labels\n                labelsn = torch.cat((labels[:, 0:1], tbox), 1)  # native-space labels\n                correct = process_batch(predn, labelsn, iouv)\n                if plots:\n                    confusion_matrix.process_batch(predn, labelsn)\n            stats.append((correct, pred[:, 4], pred[:, 5], labels[:, 0]))  # (correct, conf, pcls, tcls)\n\n            # Save/log\n            if save_txt:\n                save_one_txt(predn, save_conf, shape, file=save_dir / 'labels' / f'{path.stem}.txt')\n            if save_json:\n                save_one_json(predn, jdict, path, class_map)  # append to COCO-JSON dictionary\n            callbacks.run('on_val_image_end', pred, predn, path, names, im[si])\n\n        # Plot images\n        if plots and batch_i < 3:\n            plot_images(im, targets, paths, save_dir / f'val_batch{batch_i}_labels.jpg', names)  # labels\n            plot_images(im, output_to_target(preds), paths, save_dir / f'val_batch{batch_i}_pred.jpg', names)  # pred\n\n        callbacks.run('on_val_batch_end', batch_i, im, targets, paths, shapes, preds)\n\n    # Compute metrics\n    stats = [torch.cat(x, 0).cpu().numpy() for x in zip(*stats)]  # to numpy\n    if len(stats) and stats[0].any():\n        tp, fp, p, r, f1, ap, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names)\n        ap50, ap = ap[:, 0], ap.mean(1)  # AP@0.5, AP@0.5:0.95\n        mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()\n    nt = np.bincount(stats[3].astype(int), minlength=nc)  # number of targets per class\n\n    # Print results\n    pf = '%22s' + '%11i' * 2 + '%11.3g' * 4  # print format\n    LOGGER.info(pf % ('all', seen, nt.sum(), mp, mr, map50, map))\n    if nt.sum() == 0:\n        LOGGER.warning(f'WARNING ⚠️ no labels found in {task} set, can not compute metrics without labels')\n\n    # Print results per class\n    if (verbose or (nc < 50 
and not training)) and nc > 1 and len(stats):\n        for i, c in enumerate(ap_class):\n            LOGGER.info(pf % (names[c], seen, nt[c], p[i], r[i], ap50[i], ap[i]))\n\n    # Print speeds\n    t = tuple(x.t / seen * 1E3 for x in dt)  # speeds per image\n    if not training:\n        shape = (batch_size, 3, imgsz, imgsz)\n        LOGGER.info(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {shape}' % t)\n\n    # Plots\n    if plots:\n        confusion_matrix.plot(save_dir=save_dir, names=list(names.values()))\n        callbacks.run('on_val_end', nt, tp, fp, p, r, f1, ap, ap50, ap_class, confusion_matrix)\n\n    # Save JSON\n    if save_json and len(jdict):\n        w = Path(weights[0] if isinstance(weights, list) else weights).stem if weights is not None else ''  # weights\n        anno_json = str(Path('../datasets/coco/annotations/instances_val2017.json'))  # annotations\n        pred_json = str(save_dir / f'{w}_predictions.json')  # predictions\n        LOGGER.info(f'\\nEvaluating pycocotools mAP... 
saving {pred_json}...')\n        with open(pred_json, 'w') as f:\n            json.dump(jdict, f)\n\n        try:  # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb\n            check_requirements('pycocotools>=2.0.6')\n            from pycocotools.coco import COCO\n            from pycocotools.cocoeval import COCOeval\n\n            anno = COCO(anno_json)  # init annotations api\n            pred = anno.loadRes(pred_json)  # init predictions api\n            eval = COCOeval(anno, pred, 'bbox')\n            if is_coco:\n                eval.params.imgIds = [int(Path(x).stem) for x in dataloader.dataset.im_files]  # image IDs to evaluate\n            eval.evaluate()\n            eval.accumulate()\n            eval.summarize()\n            map, map50 = eval.stats[:2]  # update results (mAP@0.5:0.95, mAP@0.5)\n        except Exception as e:\n            LOGGER.info(f'pycocotools unable to run: {e}')\n\n    # Return results\n    model.float()  # for training\n    if not training:\n        s = f\"\\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}\" if save_txt else ''\n        LOGGER.info(f\"Results saved to {colorstr('bold', save_dir)}{s}\")\n    maps = np.zeros(nc) + map\n    for i, c in enumerate(ap_class):\n        maps[c] = ap[i]\n    return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t\n\n\ndef parse_opt():\n    parser = argparse.ArgumentParser()\n    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='dataset.yaml path')\n    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'yolov5s.pt', help='model path(s)')\n    parser.add_argument('--batch-size', type=int, default=32, help='batch size')\n    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='inference size (pixels)')\n    parser.add_argument('--conf-thres', type=float, default=0.001, help='confidence threshold')\n    
parser.add_argument('--iou-thres', type=float, default=0.6, help='NMS IoU threshold')\n    parser.add_argument('--max-det', type=int, default=300, help='maximum detections per image')\n    parser.add_argument('--task', default='val', help='train, val, test, speed or study')\n    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')\n    parser.add_argument('--workers', type=int, default=8, help='max dataloader workers (per RANK in DDP mode)')\n    parser.add_argument('--single-cls', action='store_true', help='treat as single-class dataset')\n    parser.add_argument('--augment', action='store_true', help='augmented inference')\n    parser.add_argument('--verbose', action='store_true', help='report mAP by class')\n    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')\n    parser.add_argument('--save-hybrid', action='store_true', help='save label+prediction hybrid results to *.txt')\n    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')\n    parser.add_argument('--save-json', action='store_true', help='save a COCO-JSON results file')\n    parser.add_argument('--project', default=ROOT / 'runs/val', help='save to project/name')\n    parser.add_argument('--name', default='exp', help='save to project/name')\n    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')\n    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')\n    parser.add_argument('--dnn', action='store_true', help='use OpenCV DNN for ONNX inference')\n    opt = parser.parse_args()\n    opt.data = check_yaml(opt.data)  # check YAML\n    opt.save_json |= opt.data.endswith('coco.yaml')\n    opt.save_txt |= opt.save_hybrid\n    print_args(vars(opt))\n    return opt\n\n\ndef main(opt):\n    check_requirements(exclude=('tensorboard', 'thop'))\n\n    if opt.task in ('train', 'val', 'test'):  # run 
normally\n        if opt.conf_thres > 0.001:  # https://github.com/ultralytics/yolov5/issues/1466\n            LOGGER.info(f'WARNING ⚠️ confidence threshold {opt.conf_thres} > 0.001 produces invalid results')\n        if opt.save_hybrid:\n            LOGGER.info('WARNING ⚠️ --save-hybrid will return high mAP from hybrid labels, not from predictions alone')\n        run(**vars(opt))\n\n    else:\n        weights = opt.weights if isinstance(opt.weights, list) else [opt.weights]\n        opt.half = torch.cuda.is_available() and opt.device != 'cpu'  # FP16 for fastest results\n        if opt.task == 'speed':  # speed benchmarks\n            # python val.py --task speed --data coco.yaml --batch 1 --weights yolov5n.pt yolov5s.pt...\n            opt.conf_thres, opt.iou_thres, opt.save_json = 0.25, 0.45, False\n            for opt.weights in weights:\n                run(**vars(opt), plots=False)\n\n        elif opt.task == 'study':  # speed vs mAP benchmarks\n            # python val.py --task study --data coco.yaml --iou 0.7 --weights yolov5n.pt yolov5s.pt...\n            for opt.weights in weights:\n                f = f'study_{Path(opt.data).stem}_{Path(opt.weights).stem}.txt'  # filename to save to\n                x, y = list(range(256, 1536 + 128, 128)), []  # x axis (image sizes), y axis\n                for opt.imgsz in x:  # img-size\n                    LOGGER.info(f'\\nRunning {f} --imgsz {opt.imgsz}...')\n                    r, _, t = run(**vars(opt), plots=False)\n                    y.append(r + t)  # results and times\n                np.savetxt(f, y, fmt='%10.4g')  # save\n            # No shell is involved here, so expand the study_*.txt glob in Python before calling zip\n            subprocess.run(['zip', '-r', 'study.zip', *sorted(str(p) for p in Path('.').glob('study_*.txt'))])\n            plot_val_study(x=x)  # plot\n        else:\n            raise NotImplementedError(f'--task {opt.task} not in (\"train\", \"val\", \"test\", \"speed\", \"study\")')\n\n\nif __name__ == '__main__':\n    opt = parse_opt()\n    main(opt)\n"
  },
  {
    "path": "yolo-improve/yolov5-C3RFEM.py",
    "content": "class TridentBlock(nn.Module):\n    def __init__(self, c1, c2, stride=1, c=False, e=0.5, padding=[1, 2, 3], dilate=[1, 2, 3], bias=False):\n        super(TridentBlock, self).__init__()\n        self.stride = stride\n        self.c = c\n        c_ = int(c2 * e)\n        self.padding = padding\n        self.dilate = dilate\n        self.share_weightconv1 = nn.Parameter(torch.Tensor(c_, c1, 1, 1))\n        self.share_weightconv2 = nn.Parameter(torch.Tensor(c2, c_, 3, 3))\n\n        self.bn1 = nn.BatchNorm2d(c_)\n        self.bn2 = nn.BatchNorm2d(c2)\n\n        self.act = nn.SiLU()\n\n        nn.init.kaiming_uniform_(self.share_weightconv1, nonlinearity=\"relu\")\n        nn.init.kaiming_uniform_(self.share_weightconv2, nonlinearity=\"relu\")\n\n        if bias:\n            self.bias = nn.Parameter(torch.Tensor(c2))\n        else:\n            self.bias = None\n\n        if self.bias is not None:\n            nn.init.constant_(self.bias, 0)\n\n    def forward_for_small(self, x):\n        residual = x\n        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)\n        out = self.bn1(out)\n        out = self.act(out)\n\n        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[0],\n                                   dilation=self.dilate[0])\n        out = self.bn2(out)\n        out += residual\n        out = self.act(out)\n\n        return out\n\n    def forward_for_middle(self, x):\n        residual = x\n        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)\n        out = self.bn1(out)\n        out = self.act(out)\n\n        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[1],\n                                   dilation=self.dilate[1])\n        out = self.bn2(out)\n        out += residual\n        out = self.act(out)\n\n        return out\n\n    def forward_for_big(self, x):\n        
residual = x\n        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)\n        out = self.bn1(out)\n        out = self.act(out)\n\n        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[2],\n                                   dilation=self.dilate[2])\n        out = self.bn2(out)\n        out += residual\n        out = self.act(out)\n\n        return out\n\n    def forward(self, x):\n        xm = x\n        base_feat = []\n        if self.c is not False:\n            x1 = self.forward_for_small(x)\n            x2 = self.forward_for_middle(x)\n            x3 = self.forward_for_big(x)\n        else:\n            x1 = self.forward_for_small(xm[0])\n            x2 = self.forward_for_middle(xm[1])\n            x3 = self.forward_for_big(xm[2])\n\n        base_feat.append(x1)\n        base_feat.append(x2)\n        base_feat.append(x3)\n\n        return base_feat\n\nclass RFEM(nn.Module):\n    def __init__(self, c1, c2, n=1, e=0.5, stride=1):\n        super(RFEM, self).__init__()\n        c = True\n        layers = []\n        layers.append(TridentBlock(c1, c2, stride=stride, c=c, e=e))\n        c1 = c2\n        for i in range(1, n):\n            layers.append(TridentBlock(c1, c2))\n        self.layer = nn.Sequential(*layers)\n        self.bn = nn.BatchNorm2d(c2)\n        self.act = nn.SiLU()\n\n    def forward(self, x):\n        out = self.layer(x)\n        out = out[0] + out[1] + out[2] + x\n        out = self.act(self.bn(out))\n        return out\n\nclass C3RFEM(C3):\n    # C3 module with RFEM\n    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)\n        self.m = nn.Sequential(*(RFEM(c_, c_, n=1, e=e) for _ in range(n)))\n\n# YOLOv5 🚀 by Ultralytics, AGPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel 
multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n   [-1, 1, C3RFEM, [1024]] # 10\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 14\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 18 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 15], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 21 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 11], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 24 (P5/32-large)\n\n   [[18, 21, 24], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-CARAFE.py",
    "content": "class CARAFE(nn.Module):\n    def __init__(self, c, k_enc=3, k_up=5, c_mid=64, scale=2):\n        \"\"\" The unofficial implementation of the CARAFE module.\n        The details are in \"https://arxiv.org/abs/1905.02188\".\n        Args:\n            c: The channel number of the input and the output.\n            c_mid: The channel number after compression.\n            scale: The expected upsample scale.\n            k_up: The size of the reassembly kernel.\n            k_enc: The kernel size of the encoder.\n        Returns:\n            X: The upsampled feature map.\n        \"\"\"\n        super(CARAFE, self).__init__()\n        self.scale = scale\n\n        self.comp = Conv(c, c_mid)\n        self.enc = Conv(c_mid, (scale*k_up)**2, k=k_enc, act=False)\n        self.pix_shf = nn.PixelShuffle(scale)\n\n        self.upsmp = nn.Upsample(scale_factor=scale, mode='nearest')\n        self.unfold = nn.Unfold(kernel_size=k_up, dilation=scale, \n                                padding=k_up//2*scale)\n\n    def forward(self, X):\n        b, c, h, w = X.size()\n        h_, w_ = h * self.scale, w * self.scale\n        \n        W = self.comp(X)                                # b * m * h * w\n        W = self.enc(W)                                 # b * 100 * h * w\n        W = self.pix_shf(W)                             # b * 25 * h_ * w_\n        W = torch.softmax(W, dim=1)                         # b * 25 * h_ * w_\n\n        X = self.upsmp(X)                               # b * c * h_ * w_\n        X = self.unfold(X)                              # b * 25c * h_ * w_\n        X = X.view(b, c, -1, h_, w_)                    # b * 25 * c * h_ * w_\n\n        X = torch.einsum('bkhw,bckhw->bchw', [W, X])    # b * c * h_ * w_\n        return X\n\nelif m is CARAFE:\n    c2 = ch[f]\n    args = [c2, *args]"
  },
  {
    "path": "yolo-improve/yolov5-CCFM.py",
    "content": "class RepConv(nn.Module):\n    \"\"\"\n    RepConv is a basic rep-style block, including training and deploy status.\n\n    This module is used in RT-DETR.\n    Based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py\n    \"\"\"\n    default_act = nn.SiLU()  # default activation\n\n    def __init__(self, c1, c2, k=3, s=1, p=1, g=1, d=1, act=True, bn=False, deploy=False):\n        \"\"\"Initializes Light Convolution layer with inputs, outputs & optional activation function.\"\"\"\n        super().__init__()\n        assert k == 3 and p == 1\n        self.g = g\n        self.c1 = c1\n        self.c2 = c2\n        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()\n\n        self.bn = nn.BatchNorm2d(num_features=c1) if bn and c2 == c1 and s == 1 else None\n        self.conv1 = Conv(c1, c2, k, s, p=p, g=g, act=False)\n        self.conv2 = Conv(c1, c2, 1, s, p=(p - k // 2), g=g, act=False)\n\n    def forward_fuse(self, x):\n        \"\"\"Forward process.\"\"\"\n        return self.act(self.conv(x))\n\n    def forward(self, x):\n        \"\"\"Forward process.\"\"\"\n        id_out = 0 if self.bn is None else self.bn(x)\n        return self.act(self.conv1(x) + self.conv2(x) + id_out)\n\n    def get_equivalent_kernel_bias(self):\n        \"\"\"Returns equivalent kernel and bias by adding 3x3 kernel, 1x1 kernel and identity kernel with their biases.\"\"\"\n        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1)\n        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2)\n        kernelid, biasid = self._fuse_bn_tensor(self.bn)\n        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid\n\n    def _pad_1x1_to_3x3_tensor(self, kernel1x1):\n        \"\"\"Pads a 1x1 tensor to a 3x3 tensor.\"\"\"\n        if kernel1x1 is None:\n            return 0\n        else:\n            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])\n\n    def 
_fuse_bn_tensor(self, branch):\n        \"\"\"Generates appropriate kernels and biases for convolution by fusing branches of the neural network.\"\"\"\n        if branch is None:\n            return 0, 0\n        if isinstance(branch, Conv):\n            kernel = branch.conv.weight\n            running_mean = branch.bn.running_mean\n            running_var = branch.bn.running_var\n            gamma = branch.bn.weight\n            beta = branch.bn.bias\n            eps = branch.bn.eps\n        elif isinstance(branch, nn.BatchNorm2d):\n            if not hasattr(self, 'id_tensor'):\n                input_dim = self.c1 // self.g\n                kernel_value = np.zeros((self.c1, input_dim, 3, 3), dtype=np.float32)\n                for i in range(self.c1):\n                    kernel_value[i, i % input_dim, 1, 1] = 1\n                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)\n            kernel = self.id_tensor\n            running_mean = branch.running_mean\n            running_var = branch.running_var\n            gamma = branch.weight\n            beta = branch.bias\n            eps = branch.eps\n        std = (running_var + eps).sqrt()\n        t = (gamma / std).reshape(-1, 1, 1, 1)\n        return kernel * t, beta - running_mean * gamma / std\n\n    def fuse_convs(self):\n        \"\"\"Combines two convolution layers into a single layer and removes unused attributes from the class.\"\"\"\n        if hasattr(self, 'conv'):\n            return\n        kernel, bias = self.get_equivalent_kernel_bias()\n        self.conv = nn.Conv2d(in_channels=self.conv1.conv.in_channels,\n                              out_channels=self.conv1.conv.out_channels,\n                              kernel_size=self.conv1.conv.kernel_size,\n                              stride=self.conv1.conv.stride,\n                              padding=self.conv1.conv.padding,\n                              dilation=self.conv1.conv.dilation,\n                              
groups=self.conv1.conv.groups,\n                              bias=True).requires_grad_(False)\n        self.conv.weight.data = kernel\n        self.conv.bias.data = bias\n        for para in self.parameters():\n            para.detach_()\n        self.__delattr__('conv1')\n        self.__delattr__('conv2')\n        if hasattr(self, 'nm'):\n            self.__delattr__('nm')\n        if hasattr(self, 'bn'):\n            self.__delattr__('bn')\n        if hasattr(self, 'id_tensor'):\n            self.__delattr__('id_tensor')\n\nclass RepC3(nn.Module):\n    \"\"\"Rep C3.\"\"\"\n\n    def __init__(self, c1, c2, n=3, e=1.0):\n        \"\"\"Initialize CSP Bottleneck with a single convolution using input channels, output channels, and number.\"\"\"\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = Conv(c1, c_, 1, 1)\n        self.m = nn.Sequential(*[RepConv(c_, c_) for _ in range(n)])\n        self.cv3 = Conv(c_, c2, 1, 1) if c_ != c2 else nn.Identity()\n\n    def forward(self, x):\n        \"\"\"Forward pass of RT-DETR neck layer.\"\"\"\n        return self.cv3(self.m(self.cv1(x)) + self.cv2(x))\n\n# YOLOv5 🚀 by Ultralytics, AGPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, nn.Upsample, [None, 2, 'nearest']], 
# 10\n   [6, 1, Conv, [256, 1, 1, None, 1, 1, False]],  \n   [[-2, -1], 1, Concat, [1]], \n   [-1, 3, RepC3, [256, 0.5]],  \n   [-1, 1, Conv, [256, 1, 1]], # 14\n\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']], #15\n   [4, 1, Conv, [256, 1, 1, None, 1, 1, False]],  \n   [[-2, -1], 1, Concat, [1]],  \n   [-1, 3, RepC3, [256, 0.5]], # 18\n\n   [-1, 1, Conv, [256, 3, 2]], # 19   \n   [[-1, 14], 1, Concat, [1]],  \n   [-1, 3, RepC3, [256, 0.5]], # 21    \n\n   [-1, 1, Conv, [256, 3, 2]], # 22   \n   [[-1, 9], 1, Concat, [1]],  \n   [-1, 3, RepC3, [256, 0.5]], # 24    \n\n   [[18, 21, 24], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-ContextAggregation.py",
    "content": "from mmcv.cnn import ConvModule\nfrom mmengine.model import caffe2_xavier_init, constant_init\nclass ContextAggregation(nn.Module):\n    \"\"\"\n    Context Aggregation Block.\n\n    Args:\n        in_channels (int): Number of input channels.\n        reduction (int, optional): Channel reduction ratio. Default: 1.\n        conv_cfg (dict or None, optional): Config dict for the convolution\n            layer. Default: None.\n    \"\"\"\n\n    def __init__(self, in_channels, reduction=1):\n        super(ContextAggregation, self).__init__()\n        self.in_channels = in_channels\n        self.reduction = reduction\n        self.inter_channels = max(in_channels // reduction, 1)\n\n        conv_params = dict(kernel_size=1, act_cfg=None)\n\n        self.a = ConvModule(in_channels, 1, **conv_params)\n        self.k = ConvModule(in_channels, 1, **conv_params)\n        self.v = ConvModule(in_channels, self.inter_channels, **conv_params)\n        self.m = ConvModule(self.inter_channels, in_channels, **conv_params)\n\n        self.init_weights()\n\n    def init_weights(self):\n        for m in (self.a, self.k, self.v):\n            caffe2_xavier_init(m.conv)\n        constant_init(self.m.conv, 0)\n\n    def forward(self, x):\n        n, c = x.size(0), self.inter_channels\n\n        # a: [N, 1, H, W]\n        a = self.a(x).sigmoid()\n\n        # k: [N, 1, HW, 1]\n        k = self.k(x).view(n, 1, -1, 1).softmax(2)\n\n        # v: [N, 1, C, HW]\n        v = self.v(x).view(n, 1, c, -1)\n\n        # y: [N, C, 1, 1]\n        y = torch.matmul(v, k).view(n, c, 1, 1)\n        y = self.m(y) * a\n\n        return x + y\n\n\n# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, 
number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [17, 1, ContextAggregation, []], # 24\n   [20, 1, ContextAggregation, []], # 25\n   [23, 1, ContextAggregation, []], # 26\n\n   [[24, 25, 26], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-CoordConv.py",
    "content": "class AddCoords(nn.Module):\n\n    def __init__(self, with_r=False):\n        super().__init__()\n        self.with_r = with_r\n\n    def forward(self, input_tensor):\n        \"\"\"\n        Args:\n            input_tensor: shape(batch, channel, x_dim, y_dim)\n        \"\"\"\n        batch_size, _, x_dim, y_dim = input_tensor.size()\n\n        xx_channel = torch.arange(x_dim).repeat(1, y_dim, 1)\n        yy_channel = torch.arange(y_dim).repeat(1, x_dim, 1).transpose(1, 2)\n\n        xx_channel = xx_channel.float() / (x_dim - 1)\n        yy_channel = yy_channel.float() / (y_dim - 1)\n\n        xx_channel = xx_channel * 2 - 1\n        yy_channel = yy_channel * 2 - 1\n\n        xx_channel = xx_channel.repeat(batch_size, 1, 1, 1).transpose(2, 3)\n        yy_channel = yy_channel.repeat(batch_size, 1, 1, 1).transpose(2, 3)\n\n        ret = torch.cat([\n            input_tensor,\n            xx_channel.type_as(input_tensor),\n            yy_channel.type_as(input_tensor)], dim=1)\n\n        if self.with_r:\n            rr = torch.sqrt(torch.pow(xx_channel.type_as(input_tensor) - 0.5, 2) + torch.pow(yy_channel.type_as(input_tensor) - 0.5, 2))\n            ret = torch.cat([ret, rr], dim=1)\n\n        return ret\n\n\nclass CoordConv(nn.Module):\n\n    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, with_r=False):\n        super().__init__()\n        self.addcoords = AddCoords(with_r=with_r)\n        in_channels += 2\n        if with_r:\n            in_channels += 1\n        self.conv = Conv(in_channels, out_channels, k=kernel_size, s=stride)\n\n    def forward(self, x):\n        x = self.addcoords(x)\n        x = self.conv(x)\n        return x\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 
9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, CoordConv, [512, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, CoordConv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 14], 1, Concat, [1]],  # cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],  # cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [17, 1, CoordConv, [256, 3, 1]], # 24\n   [20, 1, CoordConv, [512, 3, 1]], # 25\n   [23, 1, CoordConv, [1024, 3, 1]], # 26\n\n   [[24, 25, 26], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]"
  },
  {
    "path": "yolo-improve/yolov5-DBB.py",
    "content": "import torch.nn.functional as F\ndef transI_fusebn(kernel, bn):\n    gamma = bn.weight\n    std = (bn.running_var + bn.eps).sqrt()\n    return kernel * ((gamma / std).reshape(-1, 1, 1, 1)), bn.bias - bn.running_mean * gamma / std\n\ndef transII_addbranch(kernels, biases):\n    return sum(kernels), sum(biases)\n\ndef transIII_1x1_kxk(k1, b1, k2, b2, groups):\n    if groups == 1:\n        k = F.conv2d(k2, k1.permute(1, 0, 2, 3))      #\n        b_hat = (k2 * b1.reshape(1, -1, 1, 1)).sum((1, 2, 3))\n    else:\n        k_slices = []\n        b_slices = []\n        k1_T = k1.permute(1, 0, 2, 3)\n        k1_group_width = k1.size(0) // groups\n        k2_group_width = k2.size(0) // groups\n        for g in range(groups):\n            k1_T_slice = k1_T[:, g*k1_group_width:(g+1)*k1_group_width, :, :]\n            k2_slice = k2[g*k2_group_width:(g+1)*k2_group_width, :, :, :]\n            k_slices.append(F.conv2d(k2_slice, k1_T_slice))\n            b_slices.append((k2_slice * b1[g*k1_group_width:(g+1)*k1_group_width].reshape(1, -1, 1, 1)).sum((1, 2, 3)))\n        k, b_hat = transIV_depthconcat(k_slices, b_slices)\n    return k, b_hat + b2\n\ndef transIV_depthconcat(kernels, biases):\n    return torch.cat(kernels, dim=0), torch.cat(biases)\n\ndef transV_avg(channels, kernel_size, groups):\n    input_dim = channels // groups\n    k = torch.zeros((channels, input_dim, kernel_size, kernel_size))\n    k[np.arange(channels), np.tile(np.arange(input_dim), groups), :, :] = 1.0 / kernel_size ** 2\n    return k\n\n#   This has not been tested with non-square kernels (kernel.size(2) != kernel.size(3)) nor even-size kernels\ndef transVI_multiscale(kernel, target_kernel_size):\n    H_pixels_to_pad = (target_kernel_size - kernel.size(2)) // 2\n    W_pixels_to_pad = (target_kernel_size - kernel.size(3)) // 2\n    return F.pad(kernel, [H_pixels_to_pad, H_pixels_to_pad, W_pixels_to_pad, W_pixels_to_pad])\n\ndef conv_bn(in_channels, out_channels, kernel_size, stride=1, 
padding=0, dilation=1, groups=1,\n                   padding_mode='zeros'):\n    conv_layer = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,\n                           stride=stride, padding=padding, dilation=dilation, groups=groups,\n                           bias=False, padding_mode=padding_mode)\n    bn_layer = nn.BatchNorm2d(num_features=out_channels, affine=True)\n    se = nn.Sequential()\n    se.add_module('conv', conv_layer)\n    se.add_module('bn', bn_layer)\n    return se\n\n\nclass IdentityBasedConv1x1(nn.Conv2d):\n    def __init__(self, channels, groups=1):\n        super(IdentityBasedConv1x1, self).__init__(in_channels=channels, out_channels=channels, kernel_size=1, stride=1, padding=0, groups=groups, bias=False)\n\n        assert channels % groups == 0\n        input_dim = channels // groups\n        id_value = np.zeros((channels, input_dim, 1, 1))\n        for i in range(channels):\n            id_value[i, i % input_dim, 0, 0] = 1\n        self.id_tensor = torch.from_numpy(id_value).type_as(self.weight)\n        nn.init.zeros_(self.weight)\n\n    def forward(self, input):\n        kernel = self.weight + self.id_tensor.to(self.weight.device).type_as(self.weight)\n        result = F.conv2d(input, kernel, None, stride=1, padding=0, dilation=self.dilation, groups=self.groups)\n        return result\n\n    def get_actual_kernel(self):\n        return self.weight + self.id_tensor.to(self.weight.device)\n\n\nclass BNAndPadLayer(nn.Module):\n    def __init__(self,\n                 pad_pixels,\n                 num_features,\n                 eps=1e-5,\n                 momentum=0.1,\n                 affine=True,\n                 track_running_stats=True):\n        super(BNAndPadLayer, self).__init__()\n        self.bn = nn.BatchNorm2d(num_features, eps, momentum, affine, track_running_stats)\n        self.pad_pixels = pad_pixels\n\n    def forward(self, input):\n        output = self.bn(input)\n        if 
self.pad_pixels > 0:\n            if self.bn.affine:\n                pad_values = self.bn.bias.detach() - self.bn.running_mean * self.bn.weight.detach() / torch.sqrt(self.bn.running_var + self.bn.eps)\n            else:\n                pad_values = - self.bn.running_mean / torch.sqrt(self.bn.running_var + self.bn.eps)\n            output = F.pad(output, [self.pad_pixels] * 4)\n            pad_values = pad_values.view(1, -1, 1, 1)\n            output[:, :, 0:self.pad_pixels, :] = pad_values\n            output[:, :, -self.pad_pixels:, :] = pad_values\n            output[:, :, :, 0:self.pad_pixels] = pad_values\n            output[:, :, :, -self.pad_pixels:] = pad_values\n        return output\n\n    @property\n    def weight(self):\n        return self.bn.weight\n\n    @property\n    def bias(self):\n        return self.bn.bias\n\n    @property\n    def running_mean(self):\n        return self.bn.running_mean\n\n    @property\n    def running_var(self):\n        return self.bn.running_var\n\n    @property\n    def eps(self):\n        return self.bn.eps\n\n\nclass DiverseBranchBlock(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size,\n                 stride=1, padding=None, dilation=1, groups=1,\n                 internal_channels_1x1_3x3=None,\n                 deploy=False, single_init=False):\n        super(DiverseBranchBlock, self).__init__()\n        self.deploy = deploy\n\n        self.nonlinear = Conv.default_act\n\n        self.kernel_size = kernel_size\n        self.out_channels = out_channels\n        self.groups = groups\n        \n        if padding is None:\n            padding = autopad(kernel_size, padding, dilation)\n        assert padding == kernel_size // 2\n\n        if deploy:\n            self.dbb_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride,\n                                      padding=padding, dilation=dilation, groups=groups, bias=True)\n\n        
else:\n\n            self.dbb_origin = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride, padding=padding, dilation=dilation, groups=groups)\n\n            self.dbb_avg = nn.Sequential()\n            if groups < out_channels:\n                self.dbb_avg.add_module('conv',\n                                        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1,\n                                                  stride=1, padding=0, groups=groups, bias=False))\n                self.dbb_avg.add_module('bn', BNAndPadLayer(pad_pixels=padding, num_features=out_channels))\n                self.dbb_avg.add_module('avg', nn.AvgPool2d(kernel_size=kernel_size, stride=stride, padding=0))\n                self.dbb_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride,\n                                       padding=0, groups=groups)\n            else:\n                self.dbb_avg.add_module('avg', nn.AvgPool2d(kernel_size=kernel_size, stride=stride, padding=padding))\n\n            self.dbb_avg.add_module('avgbn', nn.BatchNorm2d(out_channels))\n\n\n            if internal_channels_1x1_3x3 is None:\n                internal_channels_1x1_3x3 = in_channels if groups < out_channels else 2 * in_channels   # For mobilenet, it is better to have 2X internal channels\n\n            self.dbb_1x1_kxk = nn.Sequential()\n            if internal_channels_1x1_3x3 == in_channels:\n                self.dbb_1x1_kxk.add_module('idconv1', IdentityBasedConv1x1(channels=in_channels, groups=groups))\n            else:\n                self.dbb_1x1_kxk.add_module('conv1', nn.Conv2d(in_channels=in_channels, out_channels=internal_channels_1x1_3x3,\n                                                            kernel_size=1, stride=1, padding=0, groups=groups, bias=False))\n            self.dbb_1x1_kxk.add_module('bn1', BNAndPadLayer(pad_pixels=padding, 
num_features=internal_channels_1x1_3x3, affine=True))\n            self.dbb_1x1_kxk.add_module('conv2', nn.Conv2d(in_channels=internal_channels_1x1_3x3, out_channels=out_channels,\n                                                            kernel_size=kernel_size, stride=stride, padding=0, groups=groups, bias=False))\n            self.dbb_1x1_kxk.add_module('bn2', nn.BatchNorm2d(out_channels))\n\n        #   The experiments reported in the paper used the default initialization of bn.weight (all as 1). But changing the initialization may be useful in some cases.\n        if single_init:\n            #   Initialize the bn.weight of dbb_origin as 1 and others as 0. This is not the default setting.\n            self.single_init()\n\n    def get_equivalent_kernel_bias(self):\n        k_origin, b_origin = transI_fusebn(self.dbb_origin.conv.weight, self.dbb_origin.bn)\n\n        if hasattr(self, 'dbb_1x1'):\n            k_1x1, b_1x1 = transI_fusebn(self.dbb_1x1.conv.weight, self.dbb_1x1.bn)\n            k_1x1 = transVI_multiscale(k_1x1, self.kernel_size)\n        else:\n            k_1x1, b_1x1 = 0, 0\n\n        if hasattr(self.dbb_1x1_kxk, 'idconv1'):\n            k_1x1_kxk_first = self.dbb_1x1_kxk.idconv1.get_actual_kernel()\n        else:\n            k_1x1_kxk_first = self.dbb_1x1_kxk.conv1.weight\n        k_1x1_kxk_first, b_1x1_kxk_first = transI_fusebn(k_1x1_kxk_first, self.dbb_1x1_kxk.bn1)\n        k_1x1_kxk_second, b_1x1_kxk_second = transI_fusebn(self.dbb_1x1_kxk.conv2.weight, self.dbb_1x1_kxk.bn2)\n        k_1x1_kxk_merged, b_1x1_kxk_merged = transIII_1x1_kxk(k_1x1_kxk_first, b_1x1_kxk_first, k_1x1_kxk_second, b_1x1_kxk_second, groups=self.groups)\n\n        k_avg = transV_avg(self.out_channels, self.kernel_size, self.groups)\n        k_1x1_avg_second, b_1x1_avg_second = transI_fusebn(k_avg.to(self.dbb_avg.avgbn.weight.device), self.dbb_avg.avgbn)\n        if hasattr(self.dbb_avg, 'conv'):\n            k_1x1_avg_first, b_1x1_avg_first = 
transI_fusebn(self.dbb_avg.conv.weight, self.dbb_avg.bn)\n            k_1x1_avg_merged, b_1x1_avg_merged = transIII_1x1_kxk(k_1x1_avg_first, b_1x1_avg_first, k_1x1_avg_second, b_1x1_avg_second, groups=self.groups)\n        else:\n            k_1x1_avg_merged, b_1x1_avg_merged = k_1x1_avg_second, b_1x1_avg_second\n\n        return transII_addbranch((k_origin, k_1x1, k_1x1_kxk_merged, k_1x1_avg_merged), (b_origin, b_1x1, b_1x1_kxk_merged, b_1x1_avg_merged))\n\n    def switch_to_deploy(self):\n        if hasattr(self, 'dbb_reparam'):\n            return\n        kernel, bias = self.get_equivalent_kernel_bias()\n        self.dbb_reparam = nn.Conv2d(in_channels=self.dbb_origin.conv.in_channels, out_channels=self.dbb_origin.conv.out_channels,\n                                     kernel_size=self.dbb_origin.conv.kernel_size, stride=self.dbb_origin.conv.stride,\n                                     padding=self.dbb_origin.conv.padding, dilation=self.dbb_origin.conv.dilation, groups=self.dbb_origin.conv.groups, bias=True)\n        self.dbb_reparam.weight.data = kernel\n        self.dbb_reparam.bias.data = bias\n        for para in self.parameters():\n            para.detach_()\n        self.__delattr__('dbb_origin')\n        self.__delattr__('dbb_avg')\n        if hasattr(self, 'dbb_1x1'):\n            self.__delattr__('dbb_1x1')\n        self.__delattr__('dbb_1x1_kxk')\n\n    def forward(self, inputs):\n        if hasattr(self, 'dbb_reparam'):\n            return self.nonlinear(self.dbb_reparam(inputs))\n\n        out = self.dbb_origin(inputs)\n        if hasattr(self, 'dbb_1x1'):\n            out += self.dbb_1x1(inputs)\n        out += self.dbb_avg(inputs)\n        out += self.dbb_1x1_kxk(inputs)\n        return self.nonlinear(out)\n\n    def init_gamma(self, gamma_value):\n        if hasattr(self, \"dbb_origin\"):\n            torch.nn.init.constant_(self.dbb_origin.bn.weight, gamma_value)\n        if hasattr(self, \"dbb_1x1\"):\n            
torch.nn.init.constant_(self.dbb_1x1.bn.weight, gamma_value)\n        if hasattr(self, \"dbb_avg\"):\n            torch.nn.init.constant_(self.dbb_avg.avgbn.weight, gamma_value)\n        if hasattr(self, \"dbb_1x1_kxk\"):\n            torch.nn.init.constant_(self.dbb_1x1_kxk.bn2.weight, gamma_value)\n\n    def single_init(self):\n        self.init_gamma(0.0)\n        if hasattr(self, \"dbb_origin\"):\n            torch.nn.init.constant_(self.dbb_origin.bn.weight, 1.0)\n\nclass Bottleneck_DBB(nn.Module):\n    # Standard bottleneck\n    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = DiverseBranchBlock(c_, c2, 3, 1, groups=g)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\nclass C3_DBB(C3):\n    # CSP Bottleneck with 3 convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)  # hidden channels\n        self.m = nn.Sequential(*(Bottleneck_DBB(c_, c_, shortcut, g, e=1.0) for _ in range(n)))"
  },
  {
    "path": "yolo-improve/yolov5-DCN.py",
    "content": "class DCNv2(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size, stride=1,\n                 padding=1, dilation=1, groups=1, deformable_groups=1):\n        super(DCNv2, self).__init__()\n\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.kernel_size = (kernel_size, kernel_size)\n        self.stride = (stride, stride)\n        self.padding = (padding, padding)\n        self.dilation = (dilation, dilation)\n        self.groups = groups\n        self.deformable_groups = deformable_groups\n\n        self.weight = nn.Parameter(\n            torch.empty(out_channels, in_channels, *self.kernel_size)\n        )\n        self.bias = nn.Parameter(torch.empty(out_channels))\n\n        out_channels_offset_mask = (self.deformable_groups * 3 *\n                                    self.kernel_size[0] * self.kernel_size[1])\n        self.conv_offset_mask = nn.Conv2d(\n            self.in_channels,\n            out_channels_offset_mask,\n            kernel_size=self.kernel_size,\n            stride=self.stride,\n            padding=self.padding,\n            bias=True,\n        )\n        self.bn = nn.BatchNorm2d(out_channels)\n        self.act = Conv.default_act\n        self.reset_parameters()\n\n    def forward(self, x):\n        offset_mask = self.conv_offset_mask(x)\n        o1, o2, mask = torch.chunk(offset_mask, 3, dim=1)\n        offset = torch.cat((o1, o2), dim=1)\n        mask = torch.sigmoid(mask)\n        x = torch.ops.torchvision.deform_conv2d(\n            x,\n            self.weight,\n            offset,\n            mask,\n            self.bias,\n            self.stride[0], self.stride[1],\n            self.padding[0], self.padding[1],\n            self.dilation[0], self.dilation[1],\n            self.groups,\n            self.deformable_groups,\n            True\n        )\n        x = self.bn(x)\n        x = self.act(x)\n        return x\n\n    def reset_parameters(self):\n       
 n = self.in_channels\n        for k in self.kernel_size:\n            n *= k\n        std = 1. / math.sqrt(n)\n        self.weight.data.uniform_(-std, std)\n        self.bias.data.zero_()\n        self.conv_offset_mask.weight.data.zero_()\n        self.conv_offset_mask.bias.data.zero_()\n\nclass Bottleneck_DCN(nn.Module):\n    # Standard bottleneck\n    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = DCNv2(c_, c2, 3, 1, groups=g)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\nclass C3_DCN(C3):\n    # C3 module with DCNv2\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)\n        self.m = nn.Sequential(*(Bottleneck_DCN(c_, c_, shortcut, g, e=1.0) for _ in range(n)))"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/commod.py",
    "content": "from models.ops_dcnv3.modules import DCNv3\nclass DCNV3_YoLo(nn.Module):\n    def __init__(self, inc, ouc, k=1, s=1, p=None, g=1, d=1, act=True):\n        super().__init__()\n        \n        self.conv = Conv(inc, ouc, k=1)\n        self.dcnv3 = DCNv3(ouc, kernel_size=k, stride=s, group=g, dilation=d)\n        self.bn = nn.BatchNorm2d(ouc)\n        self.act = Conv.default_act\n    \n    def forward(self, x):\n        x = self.conv(x)\n        x = x.permute(0, 2, 3, 1)\n        x = self.dcnv3(x)\n        x = x.permute(0, 3, 1, 2)\n        x = self.act(self.bn(x))\n        return x\n\nclass Bottleneck_DCNV3(nn.Module):\n    # Standard bottleneck\n    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = DCNV3_YoLo(c_, c2, 3, 1, g=g)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\nclass C3_DCNV3(nn.Module):\n    # CSP Bottleneck with 3 convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = Conv(c1, c_, 1, 1)\n        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)\n        self.m = nn.Sequential(*(Bottleneck_DCNV3(c_, c_, shortcut, g, e=1.0) for _ in range(n)))\n\n    def forward(self, x):\n        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))\n\n# models/yolo.py DetectionModel class\nself.model.to(torch.device('cuda'))\nm.stride = torch.tensor([s / x.shape[-2] for x in forward(torch.zeros(1, ch, s, s).to(torch.device('cuda')))]).cpu()  # forward\nself.model.cpu()"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/functions/__init__.py",
    "content": "# --------------------------------------------------------\n# InternImage\n# Copyright (c) 2022 OpenGVLab\n# Licensed under The MIT License [see LICENSE for details]\n# --------------------------------------------------------\n\nfrom .dcnv3_func import DCNv3Function, dcnv3_core_pytorch\n"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/functions/dcnv3_func.py",
    "content": "# --------------------------------------------------------\n# InternImage\n# Copyright (c) 2022 OpenGVLab\n# Licensed under The MIT License [see LICENSE for details]\n# --------------------------------------------------------\n\nfrom __future__ import absolute_import\nfrom __future__ import print_function\nfrom __future__ import division\n\nimport torch\nimport torch.nn.functional as F\nfrom torch.autograd import Function\nfrom torch.autograd.function import once_differentiable\nfrom torch.cuda.amp import custom_bwd, custom_fwd\nimport DCNv3\n\n\nclass DCNv3Function(Function):\n    @staticmethod\n    @custom_fwd\n    def forward(\n            ctx, input, offset, mask,\n            kernel_h, kernel_w, stride_h, stride_w,\n            pad_h, pad_w, dilation_h, dilation_w,\n            group, group_channels, offset_scale, im2col_step):\n        ctx.kernel_h = kernel_h\n        ctx.kernel_w = kernel_w\n        ctx.stride_h = stride_h\n        ctx.stride_w = stride_w\n        ctx.pad_h = pad_h\n        ctx.pad_w = pad_w\n        ctx.dilation_h = dilation_h\n        ctx.dilation_w = dilation_w\n        ctx.group = group\n        ctx.group_channels = group_channels\n        ctx.offset_scale = offset_scale\n        ctx.im2col_step = im2col_step\n        output = DCNv3.dcnv3_forward(\n            input, offset, mask, kernel_h,\n            kernel_w, stride_h, stride_w, pad_h,\n            pad_w, dilation_h, dilation_w, group,\n            group_channels, offset_scale, ctx.im2col_step)\n        ctx.save_for_backward(input, offset, mask)\n\n        return output\n\n    @staticmethod\n    @once_differentiable\n    @custom_bwd\n    def backward(ctx, grad_output):\n        input, offset, mask = ctx.saved_tensors\n        grad_input, grad_offset, grad_mask = \\\n            DCNv3.dcnv3_backward(\n                input, offset, mask, ctx.kernel_h,\n                ctx.kernel_w, ctx.stride_h, ctx.stride_w, ctx.pad_h,\n                ctx.pad_w, ctx.dilation_h, 
ctx.dilation_w, ctx.group,\n                ctx.group_channels, ctx.offset_scale, grad_output.contiguous(), ctx.im2col_step)\n\n        return grad_input, grad_offset, grad_mask, \\\n            None, None, None, None, None, None, None, None, None, None, None, None\n\n    @staticmethod\n    def symbolic(g, input, offset, mask, kernel_h, kernel_w, stride_h,\n                 stride_w, pad_h, pad_w, dilation_h, dilation_w, group,\n                 group_channels, offset_scale, im2col_step):\n        \"\"\"Symbolic function for mmdeploy::DCNv3.\n\n        Returns:\n            DCNv3 op for onnx.\n        \"\"\"\n        return g.op(\n            'mmdeploy::TRTDCNv3',\n            input,\n            offset,\n            mask,\n            kernel_h_i=int(kernel_h),\n            kernel_w_i=int(kernel_w),\n            stride_h_i=int(stride_h),\n            stride_w_i=int(stride_w),\n            pad_h_i=int(pad_h),\n            pad_w_i=int(pad_w),\n            dilation_h_i=int(dilation_h),\n            dilation_w_i=int(dilation_w),\n            group_i=int(group),\n            group_channels_i=int(group_channels),\n            offset_scale_f=float(offset_scale),\n            im2col_step_i=int(im2col_step),\n        )\n\n\ndef _get_reference_points(spatial_shapes, device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h=0, pad_w=0, stride_h=1, stride_w=1):\n    _, H_, W_, _ = spatial_shapes\n    H_out = (H_ - (dilation_h * (kernel_h - 1) + 1)) // stride_h + 1\n    W_out = (W_ - (dilation_w * (kernel_w - 1) + 1)) // stride_w + 1\n\n    ref_y, ref_x = torch.meshgrid(\n        torch.linspace(\n            # pad_h + 0.5,\n            # H_ - pad_h - 0.5,\n            (dilation_h * (kernel_h - 1)) // 2 + 0.5,\n            (dilation_h * (kernel_h - 1)) // 2 + 0.5 + (H_out - 1) * stride_h,\n            H_out,\n            dtype=torch.float32,\n            device=device),\n        torch.linspace(\n            # pad_w + 0.5,\n            # W_ - pad_w - 0.5,\n            (dilation_w 
* (kernel_w - 1)) // 2 + 0.5,\n            (dilation_w * (kernel_w - 1)) // 2 + 0.5 + (W_out - 1) * stride_w,\n            W_out,\n            dtype=torch.float32,\n            device=device))\n    ref_y = ref_y.reshape(-1)[None] / H_\n    ref_x = ref_x.reshape(-1)[None] / W_\n\n    ref = torch.stack((ref_x, ref_y), -1).reshape(\n        1, H_out, W_out, 1, 2)\n\n    return ref\n\n\ndef _generate_dilation_grids(spatial_shapes, kernel_h, kernel_w, dilation_h, dilation_w, group, device):\n    _, H_, W_, _ = spatial_shapes\n    points_list = []\n    x, y = torch.meshgrid(\n        torch.linspace(\n            -((dilation_w * (kernel_w - 1)) // 2),\n            -((dilation_w * (kernel_w - 1)) // 2) +\n            (kernel_w - 1) * dilation_w, kernel_w,\n            dtype=torch.float32,\n            device=device),\n        torch.linspace(\n            -((dilation_h * (kernel_h - 1)) // 2),\n            -((dilation_h * (kernel_h - 1)) // 2) +\n            (kernel_h - 1) * dilation_h, kernel_h,\n            dtype=torch.float32,\n            device=device))\n\n    points_list.extend([x / W_, y / H_])\n    grid = torch.stack(points_list, -1).reshape(-1, 1, 2).\\\n        repeat(1, group, 1).permute(1, 0, 2)\n    grid = grid.reshape(1, 1, 1, group * kernel_h * kernel_w, 2)\n\n    return grid\n\n\ndef dcnv3_core_pytorch(\n        input, offset, mask, kernel_h,\n        kernel_w, stride_h, stride_w, pad_h,\n        pad_w, dilation_h, dilation_w, group,\n        group_channels, offset_scale):\n    # for debug and test only,\n    # need to use cuda version instead\n    input = F.pad(\n        input,\n        [0, 0, pad_h, pad_h, pad_w, pad_w])\n    N_, H_in, W_in, _ = input.shape\n    _, H_out, W_out, _ = offset.shape\n\n    ref = _get_reference_points(\n        input.shape, input.device, kernel_h, kernel_w, dilation_h, dilation_w, pad_h, pad_w, stride_h, stride_w)\n    grid = _generate_dilation_grids(\n        input.shape, kernel_h, kernel_w, dilation_h, dilation_w, group, 
input.device)\n    spatial_norm = torch.tensor([W_in, H_in]).reshape(1, 1, 1, 2).\\\n        repeat(1, 1, 1, group*kernel_h*kernel_w).to(input.device)\n\n    sampling_locations = (ref + grid * offset_scale).repeat(N_, 1, 1, 1, 1).flatten(3, 4) + \\\n        offset * offset_scale / spatial_norm\n\n    P_ = kernel_h * kernel_w\n    sampling_grids = 2 * sampling_locations - 1\n    # N_, H_in, W_in, group*group_channels -> N_, H_in*W_in, group*group_channels -> N_, group*group_channels, H_in*W_in -> N_*group, group_channels, H_in, W_in\n    input_ = input.view(N_, H_in*W_in, group*group_channels).transpose(1, 2).\\\n        reshape(N_*group, group_channels, H_in, W_in)\n    # N_, H_out, W_out, group*P_*2 -> N_, H_out*W_out, group, P_, 2 -> N_, group, H_out*W_out, P_, 2 -> N_*group, H_out*W_out, P_, 2\n    sampling_grid_ = sampling_grids.view(N_, H_out*W_out, group, P_, 2).transpose(1, 2).\\\n        flatten(0, 1)\n    # N_*group, group_channels, H_out*W_out, P_\n    sampling_input_ = F.grid_sample(\n        input_, sampling_grid_, mode='bilinear', padding_mode='zeros', align_corners=False)\n\n    # (N_, H_out, W_out, group*P_) -> N_, H_out*W_out, group, P_ -> (N_, group, H_out*W_out, P_) -> (N_*group, 1, H_out*W_out, P_)\n    mask = mask.view(N_, H_out*W_out, group, P_).transpose(1, 2).\\\n        reshape(N_*group, 1, H_out*W_out, P_)\n    output = (sampling_input_ * mask).sum(-1).view(N_,\n                                                   group*group_channels, H_out*W_out)\n\n    return output.transpose(1, 2).reshape(N_, H_out, W_out, -1).contiguous()\n"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/make.sh",
    "content": "#!/usr/bin/env bash\n# --------------------------------------------------------\n# InternImage\n# Copyright (c) 2022 OpenGVLab\n# Licensed under The MIT License [see LICENSE for details]\n# --------------------------------------------------------\n\npython setup.py build install\n"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/modules/__init__.py",
    "content": "# --------------------------------------------------------\n# InternImage\n# Copyright (c) 2022 OpenGVLab\n# Licensed under The MIT License [see LICENSE for details]\n# --------------------------------------------------------\n\nfrom .dcnv3 import DCNv3"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/modules/dcnv3.py",
    "content": "# --------------------------------------------------------\n# InternImage\n# Copyright (c) 2022 OpenGVLab\n# Licensed under The MIT License [see LICENSE for details]\n# --------------------------------------------------------\n\nfrom __future__ import absolute_import\nfrom __future__ import print_function\nfrom __future__ import division\n\nimport warnings\nfrom torch import nn\nimport torch.nn.functional as F\nfrom torch.nn.init import xavier_uniform_, constant_\nfrom ..functions import DCNv3Function, dcnv3_core_pytorch\n\ndef autopad(k, p=None, d=1):  # kernel, padding, dilation\n    # Pad to 'same' shape outputs\n    if d > 1:\n        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size\n    if p is None:\n        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad\n    return p\n\n\nclass Conv(nn.Module):\n    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)\n    default_act = nn.SiLU()  # default activation\n\n    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):\n        super().__init__()\n        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)\n        self.bn = nn.BatchNorm2d(c2)\n        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()\n\n    def forward(self, x):\n        return self.act(self.bn(self.conv(x)))\n\n    def forward_fuse(self, x):\n        return self.act(self.conv(x))\n\ndef _is_power_of_2(n):\n    if (not isinstance(n, int)) or (n < 0):\n        raise ValueError(\n            \"invalid input for _is_power_of_2: {} (type: {})\".format(n, type(n)))\n\n    return (n & (n-1) == 0) and n != 0\n\n\nclass DCNv3(nn.Module):\n    def __init__(\n            self, channels=64, kernel_size=3, stride=1,\n            pad=1, dilation=1, group=4, offset_scale=1.0,\n            act_layer='GELU', norm_layer='LN'):\n     
   \"\"\"\n        DCNv3 Module\n        :param channels     \n        :param kernel_size  \n        :param stride      \n        :param pad     \n        :param dilation\n        :param group\n        :param offset_scale\n        :param act_layer\n        :param norm_layer\n        \"\"\"\n        super().__init__()\n        if channels % group != 0:\n            raise ValueError(\n                f'channels must be divisible by group, but got {channels} and {group}')\n        _d_per_group = channels // group\n        # you'd better set _d_per_group to a power of 2 which is more efficient in our CUDA implementation\n        if not _is_power_of_2(_d_per_group):\n            warnings.warn(\n                \"You'd better set channels in DCNv3 to make the dimension of each attention head a power of 2 \"\n                \"which is more efficient in our CUDA implementation.\")\n\n        self.offset_scale = offset_scale\n        self.channels = channels\n        self.kernel_size = kernel_size\n        self.stride = stride\n        self.dilation = 1\n        self.pad = pad\n        self.group = group\n        self.group_channels = channels // group\n        self.offset_scale = offset_scale\n\n        self.dw_conv = Conv(channels, channels, kernel_size, g=channels)\n        self.offset = nn.Linear(\n            channels,\n            group * kernel_size * kernel_size * 2)\n        self.mask = nn.Linear(\n            channels,\n            group * kernel_size * kernel_size)\n        self.input_proj = nn.Linear(channels, channels)\n        self.output_proj = nn.Linear(channels, channels)\n        self._reset_parameters()\n\n    def _reset_parameters(self):\n        constant_(self.offset.weight.data, 0.)\n        constant_(self.offset.bias.data, 0.)\n        constant_(self.mask.weight.data, 0.)\n        constant_(self.mask.bias.data, 0.)\n        xavier_uniform_(self.input_proj.weight.data)\n        constant_(self.input_proj.bias.data, 0.)\n        
xavier_uniform_(self.output_proj.weight.data)\n        constant_(self.output_proj.bias.data, 0.)\n\n    def forward(self, input):\n        \"\"\"\n        :param input                       (N, H, W, C)\n        :return output                     (N, H, W, C)\n        \"\"\"\n        N, H, W, _ = input.shape\n\n        x = self.input_proj(input)\n        dtype = x.dtype\n\n        x1 = input.permute(0, 3, 1, 2)\n        x1 = self.dw_conv(x1).permute(0, 2, 3, 1)\n        offset = self.offset(x1)\n        mask = self.mask(x1).reshape(N, H, W, self.group, -1)\n        mask = F.softmax(mask, -1).reshape(N, H, W, -1).type(dtype)\n\n        x = DCNv3Function.apply(\n            x, offset, mask,\n            self.kernel_size, self.kernel_size,\n            self.stride, self.stride,\n            self.pad, self.pad,\n            self.dilation, self.dilation,\n            self.group, self.group_channels,\n            self.offset_scale,\n            256)\n        x = self.output_proj(x)\n\n        return x"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/setup.py",
    "content": "# --------------------------------------------------------\n# InternImage\n# Copyright (c) 2022 OpenGVLab\n# Licensed under The MIT License [see LICENSE for details]\n# --------------------------------------------------------\n\nimport os\nimport glob\n\nimport torch\n\nfrom torch.utils.cpp_extension import CUDA_HOME\nfrom torch.utils.cpp_extension import CppExtension\nfrom torch.utils.cpp_extension import CUDAExtension\n\nfrom setuptools import find_packages\nfrom setuptools import setup\n\nrequirements = [\"torch\", \"torchvision\"]\n\n\ndef get_extensions():\n    this_dir = os.path.dirname(os.path.abspath(__file__))\n    extensions_dir = os.path.join(this_dir, \"src\")\n\n    main_file = glob.glob(os.path.join(extensions_dir, \"*.cpp\"))\n    source_cpu = glob.glob(os.path.join(extensions_dir, \"cpu\", \"*.cpp\"))\n    source_cuda = glob.glob(os.path.join(extensions_dir, \"cuda\", \"*.cu\"))\n\n    sources = main_file + source_cpu\n    extension = CppExtension\n    extra_compile_args = {\"cxx\": []}\n    define_macros = []\n\n    if torch.cuda.is_available() and CUDA_HOME is not None:\n        extension = CUDAExtension\n        sources += source_cuda\n        define_macros += [(\"WITH_CUDA\", None)]\n        extra_compile_args[\"nvcc\"] = [\n            # \"-DCUDA_HAS_FP16=1\",\n            # \"-D__CUDA_NO_HALF_OPERATORS__\",\n            # \"-D__CUDA_NO_HALF_CONVERSIONS__\",\n            # \"-D__CUDA_NO_HALF2_OPERATORS__\",\n        ]\n    else:\n        raise NotImplementedError('Cuda is not availabel')\n\n    sources = [os.path.join(extensions_dir, s) for s in sources]\n    include_dirs = [extensions_dir]\n    ext_modules = [\n        extension(\n            \"DCNv3\",\n            sources,\n            include_dirs=include_dirs,\n            define_macros=define_macros,\n            extra_compile_args=extra_compile_args,\n        )\n    ]\n    return ext_modules\n\n\nsetup(\n    name=\"DCNv3\",\n    version=\"1.0\",\n    
author=\"InternImage\",\n    url=\"https://github.com/OpenGVLab/InternImage\",\n    description=\n    \"PyTorch Wrapper for CUDA Functions of DCNv3\",\n    packages=find_packages(exclude=(\n        \"configs\",\n        \"tests\",\n    )),\n    ext_modules=get_extensions(),\n    cmdclass={\"build_ext\": torch.utils.cpp_extension.BuildExtension},\n)\n"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/src/cpu/dcnv3_cpu.cpp",
    "content": "/*!\n**************************************************************************************************\n* InternImage\n* Copyright (c) 2022 OpenGVLab\n* Licensed under The MIT License [see LICENSE for details]\n**************************************************************************************************\n* Modified from\n*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0\n**************************************************************************************************\n*/\n\n#include <vector>\n\n#include <ATen/ATen.h>\n#include <ATen/cuda/CUDAContext.h>\n\nat::Tensor dcnv3_cpu_forward(const at::Tensor &input, const at::Tensor &offset,\n                             const at::Tensor &mask, const int kernel_h,\n                             const int kernel_w, const int stride_h,\n                             const int stride_w, const int pad_h,\n                             const int pad_w, const int dilation_h,\n                             const int dilation_w, const int group,\n                             const int group_channels, const float offset_scale,\n                             const int im2col_step) {\n    AT_ERROR(\"Not implement on cpu\");\n}\n\nstd::vector<at::Tensor>\ndcnv3_cpu_backward(const at::Tensor &input, const at::Tensor &offset,\n                   const at::Tensor &mask, const int kernel_h,\n                   const int kernel_w, const int stride_h, const int stride_w,\n                   const int pad_h, const int pad_w, const int dilation_h,\n                   const int dilation_w, const int group,\n                   const int group_channels, const float offset_scale,\n                   const at::Tensor &grad_output, const int im2col_step) {\n    AT_ERROR(\"Not implement on cpu\");\n}\n"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/src/cpu/dcnv3_cpu.h",
    "content": "/*!\n**************************************************************************************************\n* InternImage\n* Copyright (c) 2022 OpenGVLab\n* Licensed under The MIT License [see LICENSE for details]\n**************************************************************************************************\n* Modified from\n*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0\n**************************************************************************************************\n*/\n\n#pragma once\n#include <torch/extension.h>\n\nat::Tensor dcnv3_cpu_forward(const at::Tensor &input, const at::Tensor &offset,\n                             const at::Tensor &mask, const int kernel_h,\n                             const int kernel_w, const int stride_h,\n                             const int stride_w, const int pad_h,\n                             const int pad_w, const int dilation_h,\n                             const int dilation_w, const int group,\n                             const int group_channels, const float offset_scale,\n                             const int im2col_step);\n\nstd::vector<at::Tensor>\ndcnv3_cpu_backward(const at::Tensor &input, const at::Tensor &offset,\n                   const at::Tensor &mask, const int kernel_h,\n                   const int kernel_w, const int stride_h, const int stride_w,\n                   const int pad_h, const int pad_w, const int dilation_h,\n                   const int dilation_w, const int group,\n                   const int group_channels, const float offset_scale,\n                   const at::Tensor &grad_output, const int im2col_step);\n"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/src/cuda/dcnv3_cuda.cu",
    "content": "/*!\n**************************************************************************************************\n* InternImage\n* Copyright (c) 2022 OpenGVLab\n* Licensed under The MIT License [see LICENSE for details]\n**************************************************************************************************\n* Modified from\n*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0\n**************************************************************************************************\n*/\n\n#include \"cuda/dcnv3_im2col_cuda.cuh\"\n#include <vector>\n\n#include <ATen/ATen.h>\n#include <ATen/cuda/CUDAContext.h>\n#include <cuda.h>\n#include <cuda_runtime.h>\n#include <torch/torch.h>\n\nat::Tensor dcnv3_cuda_forward(const at::Tensor &input, const at::Tensor &offset,\n                              const at::Tensor &mask, const int kernel_h,\n                              const int kernel_w, const int stride_h,\n                              const int stride_w, const int pad_h,\n                              const int pad_w, const int dilation_h,\n                              const int dilation_w, const int group,\n                              const int group_channels,\n                              const float offset_scale, const int im2col_step) {\n    AT_ASSERTM(input.is_contiguous(), \"input tensor has to be contiguous\");\n    AT_ASSERTM(offset.is_contiguous(), \"offset tensor has to be contiguous\");\n    AT_ASSERTM(mask.is_contiguous(), \"mask tensor has to be contiguous\");\n    AT_ASSERTM(input.type().is_cuda(), \"input must be a CUDA tensor\");\n    AT_ASSERTM(offset.type().is_cuda(), \"offset must be a CUDA tensor\");\n    AT_ASSERTM(mask.type().is_cuda(), \"mask must be a CUDA tensor\");\n\n    const int batch = input.size(0);\n    const int height_in = input.size(1);\n    const int width_in = input.size(2);\n    const int channels = input.size(3);\n    const int height_out =\n        (height_in + 2 * pad_h - 
(dilation_h * (kernel_h - 1) + 1)) / stride_h +\n        1;\n    const int width_out =\n        (width_in + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) / stride_w +\n        1;\n    const int im2col_step_ = std::min(batch, im2col_step);\n\n    AT_ASSERTM(batch % im2col_step_ == 0,\n               \"batch(%d) must be divisible by im2col_step(%d)\", batch,\n               im2col_step_);\n    AT_ASSERTM(\n        channels == (group * group_channels),\n        \"Input channels and group times group channels do not match: (%d vs %d).\",\n        channels, group * group_channels);\n\n    auto output =\n        at::zeros({batch, height_out, width_out, group * group_channels},\n                  input.options());\n\n    const int batch_n = im2col_step_;\n    auto output_n = output.view({batch / batch_n, batch_n, height_out,\n                                 width_out, group * group_channels});\n    auto per_input_size = height_in * width_in * group * group_channels;\n    auto per_offset_size =\n        height_out * width_out * group * kernel_h * kernel_w * 2;\n    auto per_mask_size = height_out * width_out * group * kernel_h * kernel_w;\n    for (int n = 0; n < batch / im2col_step_; ++n) {\n        auto columns = output_n.select(0, n);\n        // AT_DISPATCH_FLOATING_TYPES(\n        AT_DISPATCH_FLOATING_TYPES_AND_HALF(\n            input.type(), \"ms_deform_attn_forward_cuda\", ([&] {\n                dcnv3_im2col_cuda(\n                    at::cuda::getCurrentCUDAStream(),\n                    input.data<scalar_t>() + n * im2col_step_ * per_input_size,\n                    offset.data<scalar_t>() +\n                        n * im2col_step_ * per_offset_size,\n                    mask.data<scalar_t>() + n * im2col_step_ * per_mask_size,\n                    columns.data<scalar_t>(), kernel_h, kernel_w, stride_h,\n                    stride_w, pad_h, pad_w, dilation_h, dilation_w, group,\n                    group_channels, batch_n, height_in, width_in, height_out,\n                    
width_out, offset_scale);\n            }));\n    }\n\n    return output;\n}\n\nstd::vector<at::Tensor>\ndcnv3_cuda_backward(const at::Tensor &input, const at::Tensor &offset,\n                    const at::Tensor &mask, const int kernel_h,\n                    const int kernel_w, const int stride_h, const int stride_w,\n                    const int pad_h, const int pad_w, const int dilation_h,\n                    const int dilation_w, const int group,\n                    const int group_channels, const float offset_scale,\n                    const at::Tensor &grad_output, const int im2col_step) {\n\n    AT_ASSERTM(input.is_contiguous(), \"input tensor has to be contiguous\");\n    AT_ASSERTM(offset.is_contiguous(), \"offset tensor has to be contiguous\");\n    AT_ASSERTM(mask.is_contiguous(), \"mask tensor has to be contiguous\");\n    AT_ASSERTM(grad_output.is_contiguous(),\n               \"grad_output tensor has to be contiguous\");\n    AT_ASSERTM(input.type().is_cuda(), \"input must be a CUDA tensor\");\n    AT_ASSERTM(offset.type().is_cuda(), \"offset must be a CUDA tensor\");\n    AT_ASSERTM(mask.type().is_cuda(), \"mask must be a CUDA tensor\");\n    AT_ASSERTM(grad_output.type().is_cuda(),\n               \"grad_output must be a CUDA tensor\");\n\n    const int batch = input.size(0);\n    const int height_in = input.size(1);\n    const int width_in = input.size(2);\n    const int channels = input.size(3);\n    const int height_out =\n        (height_in + 2 * pad_h - (dilation_h * (kernel_h - 1) + 1)) / stride_h +\n        1;\n    const int width_out =\n        (width_in + 2 * pad_w - (dilation_w * (kernel_w - 1) + 1)) / stride_w +\n        1;\n    const int im2col_step_ = std::min(batch, im2col_step);\n\n    AT_ASSERTM(batch % im2col_step_ == 0,\n               \"batch(%d) must be divisible by im2col_step(%d)\", batch,\n               im2col_step_);\n    AT_ASSERTM(\n        channels == (group * group_channels),\n        \"Input channels and group times group channels do not 
match: (%d vs %d).\",\n        channels, group * group_channels);\n\n    auto dtype = input.dtype();\n    if (dtype == at::kHalf) {\n        dtype = at::kFloat;\n    }\n\n    auto grad_input = at::zeros_like(input, dtype);\n    auto grad_offset = at::zeros_like(offset, dtype);\n    auto grad_mask = at::zeros_like(mask, dtype);\n\n    const int batch_n = im2col_step_;\n    auto per_input_size = height_in * width_in * group * group_channels;\n    auto per_offset_size =\n        height_out * width_out * group * kernel_h * kernel_w * 2;\n    auto per_mask_size = height_out * width_out * group * kernel_h * kernel_w;\n    auto grad_output_n =\n        grad_output.view({batch / im2col_step_, batch_n, height_out * width_out,\n                          group, group_channels});\n\n    for (int n = 0; n < batch / im2col_step_; ++n) {\n        auto grad_output_g = grad_output_n.select(0, n);\n        // AT_DISPATCH_FLOATING_TYPES(\n        AT_DISPATCH_FLOATING_TYPES_AND_HALF(\n            input.type(), \"ms_deform_attn_backward_cuda\", ([&] {\n                dcnv3_col2im_cuda(\n                    at::cuda::getCurrentCUDAStream(),\n                    grad_output_g.data<scalar_t>(),\n                    input.data<scalar_t>() + n * im2col_step_ * per_input_size,\n                    offset.data<scalar_t>() +\n                        n * im2col_step_ * per_offset_size,\n                    mask.data<scalar_t>() + n * im2col_step_ * per_mask_size,\n                    kernel_h, kernel_w, stride_h, stride_w, pad_h, pad_w,\n                    dilation_h, dilation_w, group, group_channels, batch_n,\n                    height_in, width_in, height_out, width_out, offset_scale,\n                    grad_input.data<opmath_t>() +\n                        n * im2col_step_ * per_input_size,\n                    grad_offset.data<opmath_t>() +\n                        n * im2col_step_ * per_offset_size,\n                    grad_mask.data<opmath_t>() +\n                        n * 
im2col_step_ * per_mask_size);\n            }));\n    }\n\n    if (input.dtype() == torch::kHalf) {\n        return {grad_input.to(torch::kHalf), grad_offset.to(torch::kHalf),\n                grad_mask.to(torch::kHalf)};\n    } else {\n        return {grad_input, grad_offset, grad_mask};\n    }\n}"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/src/cuda/dcnv3_cuda.h",
    "content": "/*!\n**************************************************************************************************\n* InternImage\n* Copyright (c) 2022 OpenGVLab\n* Licensed under The MIT License [see LICENSE for details]\n**************************************************************************************************\n* Modified from\n*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0\n**************************************************************************************************\n*/\n\n#pragma once\n#include <torch/extension.h>\n\nat::Tensor dcnv3_cuda_forward(const at::Tensor &input, const at::Tensor &offset,\n                              const at::Tensor &mask, const int kernel_h,\n                              const int kernel_w, const int stride_h,\n                              const int stride_w, const int pad_h,\n                              const int pad_w, const int dilation_h,\n                              const int dilation_w, const int group,\n                              const int group_channels,\n                              const float offset_scale, const int im2col_step);\n\nstd::vector<at::Tensor>\ndcnv3_cuda_backward(const at::Tensor &input, const at::Tensor &offset,\n                    const at::Tensor &mask, const int kernel_h,\n                    const int kernel_w, const int stride_h, const int stride_w,\n                    const int pad_h, const int pad_w, const int dilation_h,\n                    const int dilation_w, const int group,\n                    const int group_channels, const float offset_scale,\n                    const at::Tensor &grad_output, const int im2col_step);\n"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/src/cuda/dcnv3_im2col_cuda.cuh",
    "content": "/*!\n**************************************************************************************************\n* InternImage\n* Copyright (c) 2022 OpenGVLab\n* Licensed under The MIT License [see LICENSE for details]\n**************************************************************************************************\n* Modified from\n*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0\n**************************************************************************************************\n*/\n\n#include <algorithm>\n#include <cstdio>\n#include <cstring>\n\n#include <ATen/ATen.h>\n#include <ATen/OpMathType.h>\n#include <ATen/cuda/CUDAContext.h>\n#include <THC/THCAtomics.cuh>\n\n#define CUDA_KERNEL_LOOP(i, n)                                                 \\\n    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < (n);               \\\n         i += blockDim.x * gridDim.x)\n\nconst int CUDA_NUM_THREADS = 256;\ninline int GET_BLOCKS(const int N, const int num_threads) {\n    return (N + num_threads - 1) / num_threads;\n}\n\n#define opmath_t at::opmath_type<scalar_t>\n\ntemplate <typename scalar_t>\n__device__ opmath_t dcnv3_im2col_bilinear(const scalar_t *&bottom_data,\n                                          const int &height, const int &width,\n                                          const int &group,\n                                          const int &group_channels,\n                                          const opmath_t &h, const opmath_t &w,\n                                          const int &g, const int &c) {\n    const int h_low = floor(h);\n    const int w_low = floor(w);\n    const int h_high = h_low + 1;\n    const int w_high = w_low + 1;\n\n    const opmath_t lh = h - h_low;\n    const opmath_t lw = w - w_low;\n    const opmath_t hh = 1 - lh, hw = 1 - lw;\n\n    const int w_stride = group * group_channels;\n    const int h_stride = width * w_stride;\n    const int h_low_ptr_offset = h_low * h_stride;\n   
 const int h_high_ptr_offset = h_low_ptr_offset + h_stride;\n    const int w_low_ptr_offset = w_low * w_stride;\n    const int w_high_ptr_offset = w_low_ptr_offset + w_stride;\n    const int base_ptr = g * group_channels + c;\n\n    opmath_t v1 = 0;\n    if (h_low >= 0 && w_low >= 0) {\n        const int ptr1 = h_low_ptr_offset + w_low_ptr_offset + base_ptr;\n        v1 = bottom_data[ptr1];\n    }\n    opmath_t v2 = 0;\n    if (h_low >= 0 && w_high <= width - 1) {\n        const int ptr2 = h_low_ptr_offset + w_high_ptr_offset + base_ptr;\n        v2 = bottom_data[ptr2];\n    }\n    opmath_t v3 = 0;\n    if (h_high <= height - 1 && w_low >= 0) {\n        const int ptr3 = h_high_ptr_offset + w_low_ptr_offset + base_ptr;\n        v3 = bottom_data[ptr3];\n    }\n    opmath_t v4 = 0;\n    if (h_high <= height - 1 && w_high <= width - 1) {\n        const int ptr4 = h_high_ptr_offset + w_high_ptr_offset + base_ptr;\n        v4 = bottom_data[ptr4];\n    }\n    const opmath_t w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw;\n\n    const opmath_t val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);\n    return val;\n}\n\ntemplate <typename scalar_t>\n__device__ void dcnv3_col2im_bilinear(\n    const scalar_t *&bottom_data, const int &height, const int &width,\n    const int &nheads, const int &group_channels, const opmath_t &h,\n    const opmath_t &w, const int &m, const int &c, const opmath_t offset_scale,\n    const opmath_t &top_grad, const opmath_t &mask, opmath_t *&grad_im,\n    opmath_t *grad_offset, opmath_t *grad_mask) {\n    const int h_low = floor(h);\n    const int w_low = floor(w);\n    const int h_high = h_low + 1;\n    const int w_high = w_low + 1;\n\n    const opmath_t lh = h - h_low;\n    const opmath_t lw = w - w_low;\n    const opmath_t hh = 1 - lh, hw = 1 - lw;\n\n    const int w_stride = nheads * group_channels;\n    const int h_stride = width * w_stride;\n    const int h_low_ptr_offset = h_low * h_stride;\n    const int h_high_ptr_offset = 
h_low_ptr_offset + h_stride;\n    const int w_low_ptr_offset = w_low * w_stride;\n    const int w_high_ptr_offset = w_low_ptr_offset + w_stride;\n    const int base_ptr = m * group_channels + c;\n\n    const opmath_t w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw;\n    const opmath_t top_grad_im = top_grad * mask;\n    opmath_t grad_h_weight = 0, grad_w_weight = 0;\n\n    opmath_t v1 = 0;\n    if (h_low >= 0 && w_low >= 0) {\n        const int ptr1 = h_low_ptr_offset + w_low_ptr_offset + base_ptr;\n        v1 = bottom_data[ptr1];\n        grad_h_weight -= hw * v1;\n        grad_w_weight -= hh * v1;\n        atomicAdd(grad_im + ptr1, w1 * top_grad_im);\n    }\n    opmath_t v2 = 0;\n    if (h_low >= 0 && w_high <= width - 1) {\n        const int ptr2 = h_low_ptr_offset + w_high_ptr_offset + base_ptr;\n        v2 = bottom_data[ptr2];\n        grad_h_weight -= lw * v2;\n        grad_w_weight += hh * v2;\n        atomicAdd(grad_im + ptr2, w2 * top_grad_im);\n    }\n    opmath_t v3 = 0;\n    if (h_high <= height - 1 && w_low >= 0) {\n        const int ptr3 = h_high_ptr_offset + w_low_ptr_offset + base_ptr;\n        v3 = bottom_data[ptr3];\n        grad_h_weight += hw * v3;\n        grad_w_weight -= lh * v3;\n        atomicAdd(grad_im + ptr3, w3 * top_grad_im);\n    }\n    opmath_t v4 = 0;\n    if (h_high <= height - 1 && w_high <= width - 1) {\n        const int ptr4 = h_high_ptr_offset + w_high_ptr_offset + base_ptr;\n        v4 = bottom_data[ptr4];\n        grad_h_weight += lw * v4;\n        grad_w_weight += lh * v4;\n        atomicAdd(grad_im + ptr4, w4 * top_grad_im);\n    }\n\n    const opmath_t val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);\n    *grad_mask = top_grad * val;\n    *grad_offset = offset_scale * grad_w_weight * top_grad_im;\n    *(grad_offset + 1) = offset_scale * grad_h_weight * top_grad_im;\n}\n\ntemplate <typename scalar_t>\n__device__ void dcnv3_col2im_bilinear_gm(\n    const scalar_t *&bottom_data, const int &height, const int &width,\n    
const int &nheads, const int &group_channels, const opmath_t &h,\n    const opmath_t &w, const int &m, const int &c, const opmath_t offset_scale,\n    const opmath_t &top_grad, const opmath_t &mask, opmath_t *&grad_im,\n    opmath_t *grad_offset, opmath_t *grad_mask) {\n    const int h_low = floor(h);\n    const int w_low = floor(w);\n    const int h_high = h_low + 1;\n    const int w_high = w_low + 1;\n\n    const opmath_t lh = h - h_low;\n    const opmath_t lw = w - w_low;\n    const opmath_t hh = 1 - lh, hw = 1 - lw;\n\n    const int w_stride = nheads * group_channels;\n    const int h_stride = width * w_stride;\n    const int h_low_ptr_offset = h_low * h_stride;\n    const int h_high_ptr_offset = h_low_ptr_offset + h_stride;\n    const int w_low_ptr_offset = w_low * w_stride;\n    const int w_high_ptr_offset = w_low_ptr_offset + w_stride;\n    const int base_ptr = m * group_channels + c;\n\n    const opmath_t w1 = hh * hw, w2 = hh * lw, w3 = lh * hw, w4 = lh * lw;\n    const opmath_t top_grad_im = top_grad * mask;\n    opmath_t grad_h_weight = 0, grad_w_weight = 0;\n\n    opmath_t v1 = 0;\n    if (h_low >= 0 && w_low >= 0) {\n        const int ptr1 = h_low_ptr_offset + w_low_ptr_offset + base_ptr;\n        v1 = bottom_data[ptr1];\n        grad_h_weight -= hw * v1;\n        grad_w_weight -= hh * v1;\n        atomicAdd(grad_im + ptr1, w1 * top_grad_im);\n    }\n    opmath_t v2 = 0;\n    if (h_low >= 0 && w_high <= width - 1) {\n        const int ptr2 = h_low_ptr_offset + w_high_ptr_offset + base_ptr;\n        v2 = bottom_data[ptr2];\n        grad_h_weight -= lw * v2;\n        grad_w_weight += hh * v2;\n        atomicAdd(grad_im + ptr2, w2 * top_grad_im);\n    }\n    opmath_t v3 = 0;\n    if (h_high <= height - 1 && w_low >= 0) {\n        const int ptr3 = h_high_ptr_offset + w_low_ptr_offset + base_ptr;\n        v3 = bottom_data[ptr3];\n        grad_h_weight += hw * v3;\n        grad_w_weight -= lh * v3;\n        atomicAdd(grad_im + ptr3, w3 * top_grad_im);\n    
}\n    opmath_t v4 = 0;\n    if (h_high <= height - 1 && w_high <= width - 1) {\n        const int ptr4 = h_high_ptr_offset + w_high_ptr_offset + base_ptr;\n        v4 = bottom_data[ptr4];\n        grad_h_weight += lw * v4;\n        grad_w_weight += lh * v4;\n        atomicAdd(grad_im + ptr4, w4 * top_grad_im);\n    }\n\n    const opmath_t val = (w1 * v1 + w2 * v2 + w3 * v3 + w4 * v4);\n    atomicAdd(grad_mask, top_grad * val);\n    atomicAdd(grad_offset, offset_scale * grad_w_weight * top_grad_im);\n    atomicAdd(grad_offset + 1, offset_scale * grad_h_weight * top_grad_im);\n}\n\ntemplate <typename scalar_t>\n__global__ void dcnv3_im2col_gpu_kernel(\n    const int num_kernels, const scalar_t *data_im, const scalar_t *data_offset,\n    const scalar_t *data_mask, scalar_t *data_col, const int kernel_h,\n    const int kernel_w, const int stride_h, const int stride_w, const int pad_h,\n    const int pad_w, const int dilation_h, const int dilation_w,\n    const int group, const int group_channels, const int height_in,\n    const int width_in, const int height_out, const int width_out,\n    const opmath_t offset_scale) {\n    CUDA_KERNEL_LOOP(index, num_kernels) {\n        int _temp = index;\n        const int c_col = _temp % group_channels;\n        _temp /= group_channels;\n        const int sampling_index = _temp;\n        const int g_col = _temp % group;\n        _temp /= group;\n        const int p0_w = ((dilation_w * (kernel_w - 1)) >> 1) - pad_w +\n                         (_temp % width_out) * stride_w;\n        _temp /= width_out;\n        const int p0_h = ((dilation_h * (kernel_h - 1)) >> 1) - pad_h +\n                         (_temp % height_out) * stride_h;\n        _temp /= height_out;\n        const int b_col = _temp;\n\n        const int input_size = height_in * width_in;\n        scalar_t *data_col_ptr = data_col + index;\n        const int kernel_size = kernel_h * kernel_w;\n        int data_weight_ptr = sampling_index * kernel_size;\n        int 
data_loc_w_ptr = data_weight_ptr << 1;\n        const int qid_stride = group * group_channels;\n        opmath_t col = 0;\n        const scalar_t *data_im_ptr = data_im + b_col * input_size * qid_stride;\n        // top-left\n        const opmath_t p0_w_ =\n            p0_w - ((dilation_w * (kernel_w - 1)) >> 1) * offset_scale;\n        const opmath_t p0_h_ =\n            p0_h - ((dilation_h * (kernel_h - 1)) >> 1) * offset_scale;\n        for (int i = 0; i < kernel_w; ++i) {\n            for (int j = 0; j < kernel_h; ++j) {\n                const opmath_t offset_w = data_offset[data_loc_w_ptr];\n                const opmath_t offset_h = data_offset[data_loc_w_ptr + 1];\n                const opmath_t loc_w =\n                    p0_w_ + (i * dilation_w + offset_w) * offset_scale;\n                const opmath_t loc_h =\n                    p0_h_ + (j * dilation_h + offset_h) * offset_scale;\n                const opmath_t weight = data_mask[data_weight_ptr];\n                if (loc_h > -1 && loc_w > -1 && loc_h < height_in &&\n                    loc_w < width_in) {\n                    col += dcnv3_im2col_bilinear(\n                               data_im_ptr, height_in, width_in, group,\n                               group_channels, loc_h, loc_w, g_col, c_col) *\n                           weight;\n                }\n                data_weight_ptr += 1;\n                data_loc_w_ptr += 2;\n            }\n        }\n        *data_col_ptr = col;\n    }\n}\n\n// debug\ntemplate <typename scalar_t, unsigned int blockSize>\n__global__ void dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1(\n    const int num_kernels, const scalar_t *grad_col, const scalar_t *data_im,\n    const scalar_t *data_offset, const scalar_t *data_mask, const int kernel_h,\n    const int kernel_w, const int stride_h, const int stride_w, const int pad_h,\n    const int pad_w, const int dilation_h, const int dilation_w,\n    const int group, const int group_channels, const int 
height_in,\n    const int width_in, const int height_out, const int width_out,\n    const opmath_t offset_scale, opmath_t *grad_im, opmath_t *grad_offset,\n    opmath_t *grad_mask) {\n    CUDA_KERNEL_LOOP(index, num_kernels) {\n        __shared__ opmath_t cache_grad_offset[blockSize * 2];\n        __shared__ opmath_t cache_grad_mask[blockSize];\n        unsigned int tid = threadIdx.x;\n        int _temp = index;\n        const int c_col = _temp % group_channels;\n        _temp /= group_channels;\n        const int sampling_index = _temp;\n        const int g_col = _temp % group;\n        _temp /= group;\n        const int p0_w = ((dilation_w * (kernel_w - 1)) >> 1) - pad_w +\n                         (_temp % width_out) * stride_w;\n        _temp /= width_out;\n        const int p0_h = ((dilation_h * (kernel_h - 1)) >> 1) - pad_h +\n                         (_temp % height_out) * stride_h;\n        _temp /= height_out;\n        const int b_col = _temp;\n\n        const opmath_t top_grad = grad_col[index];\n        const int input_size = height_in * width_in;\n        const int kernel_size = kernel_h * kernel_w;\n        int data_weight_ptr = sampling_index * kernel_size;\n        int data_loc_w_ptr = data_weight_ptr << 1;\n        const int grad_sampling_ptr = data_weight_ptr;\n        grad_offset += grad_sampling_ptr << 1;\n        grad_mask += grad_sampling_ptr;\n        const int qid_stride = group * group_channels;\n        const int im_ptr_offset = b_col * input_size * qid_stride;\n        const scalar_t *data_im_ptr = data_im + im_ptr_offset;\n        opmath_t *grad_im_ptr = grad_im + im_ptr_offset;\n        const opmath_t p0_w_ =\n            p0_w - ((dilation_w * (kernel_w - 1)) >> 1) * offset_scale;\n        const opmath_t p0_h_ =\n            p0_h - ((dilation_h * (kernel_h - 1)) >> 1) * offset_scale;\n        for (int i = 0; i < kernel_w; ++i) {\n            for (int j = 0; j < kernel_h; ++j) {\n                const opmath_t offset_w = 
data_offset[data_loc_w_ptr];\n                const opmath_t offset_h = data_offset[data_loc_w_ptr + 1];\n                const opmath_t loc_w =\n                    p0_w_ + (i * dilation_w + offset_w) * offset_scale;\n                const opmath_t loc_h =\n                    p0_h_ + (j * dilation_h + offset_h) * offset_scale;\n                const opmath_t weight = data_mask[data_weight_ptr];\n                *(cache_grad_offset + (threadIdx.x << 1)) = 0;\n                *(cache_grad_offset + ((threadIdx.x << 1) + 1)) = 0;\n                *(cache_grad_mask + threadIdx.x) = 0;\n                if (loc_h > -1 && loc_w > -1 && loc_h < height_in &&\n                    loc_w < width_in) {\n                    dcnv3_col2im_bilinear(\n                        data_im_ptr, height_in, width_in, group, group_channels,\n                        loc_h, loc_w, g_col, c_col, offset_scale, top_grad,\n                        weight, grad_im_ptr,\n                        cache_grad_offset + (threadIdx.x << 1),\n                        cache_grad_mask + threadIdx.x);\n                }\n\n                __syncthreads();\n                if (tid == 0) {\n                    opmath_t _grad_w = cache_grad_offset[0],\n                             _grad_h = cache_grad_offset[1],\n                             _grad_a = cache_grad_mask[0];\n                    int sid = 2;\n                    for (unsigned int tid = 1; tid < blockSize; ++tid) {\n                        _grad_w += cache_grad_offset[sid];\n                        _grad_h += cache_grad_offset[sid + 1];\n                        _grad_a += cache_grad_mask[tid];\n                        sid += 2;\n                    }\n\n                    *grad_offset = _grad_w;\n                    *(grad_offset + 1) = _grad_h;\n                    *grad_mask = _grad_a;\n                }\n                __syncthreads();\n\n                data_weight_ptr += 1;\n                data_loc_w_ptr += 2;\n                grad_mask += 1;\n  
              grad_offset += 2;\n            }\n        }\n    }\n}\n\ntemplate <typename scalar_t, unsigned int blockSize>\n__global__ void dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2(\n    const int num_kernels, const scalar_t *grad_col, const scalar_t *data_im,\n    const scalar_t *data_offset, const scalar_t *data_mask, const int kernel_h,\n    const int kernel_w, const int stride_h, const int stride_w, const int pad_h,\n    const int pad_w, const int dilation_h, const int dilation_w,\n    const int group, const int group_channels, const int height_in,\n    const int width_in, const int height_out, const int width_out,\n    const opmath_t offset_scale, opmath_t *grad_im, opmath_t *grad_offset,\n    opmath_t *grad_mask) {\n    CUDA_KERNEL_LOOP(index, num_kernels) {\n        __shared__ opmath_t cache_grad_offset[blockSize * 2];\n        __shared__ opmath_t cache_grad_mask[blockSize];\n        unsigned int tid = threadIdx.x;\n        int _temp = index;\n        const int c_col = _temp % group_channels;\n        _temp /= group_channels;\n        const int sampling_index = _temp;\n        const int g_col = _temp % group;\n        _temp /= group;\n        const int p0_w = ((dilation_w * (kernel_w - 1)) >> 1) - pad_w +\n                         (_temp % width_out) * stride_w;\n        _temp /= width_out;\n        const int p0_h = ((dilation_h * (kernel_h - 1)) >> 1) - pad_h +\n                         (_temp % height_out) * stride_h;\n        _temp /= height_out;\n        const int b_col = _temp;\n\n        const opmath_t top_grad = grad_col[index];\n        const int input_size = height_in * width_in;\n        const int kernel_size = kernel_h * kernel_w;\n        int data_weight_ptr = sampling_index * kernel_size;\n        int data_loc_w_ptr = data_weight_ptr << 1;\n        const int grad_sampling_ptr = data_weight_ptr;\n        grad_offset += grad_sampling_ptr << 1;\n        grad_mask += grad_sampling_ptr;\n        const int qid_stride = group * 
group_channels;\n        const int im_ptr_offset = b_col * input_size * qid_stride;\n        const scalar_t *data_im_ptr = data_im + im_ptr_offset;\n        opmath_t *grad_im_ptr = grad_im + im_ptr_offset;\n        const opmath_t p0_w_ =\n            p0_w - ((dilation_w * (kernel_w - 1)) >> 1) * offset_scale;\n        const opmath_t p0_h_ =\n            p0_h - ((dilation_h * (kernel_h - 1)) >> 1) * offset_scale;\n        for (int i = 0; i < kernel_w; ++i) {\n            for (int j = 0; j < kernel_h; ++j) {\n                const opmath_t offset_w = data_offset[data_loc_w_ptr];\n                const opmath_t offset_h = data_offset[data_loc_w_ptr + 1];\n                const opmath_t loc_w =\n                    p0_w_ + (i * dilation_w + offset_w) * offset_scale;\n                const opmath_t loc_h =\n                    p0_h_ + (j * dilation_h + offset_h) * offset_scale;\n                const opmath_t weight = data_mask[data_weight_ptr];\n                *(cache_grad_offset + (threadIdx.x << 1)) = 0;\n                *(cache_grad_offset + ((threadIdx.x << 1) + 1)) = 0;\n                *(cache_grad_mask + threadIdx.x) = 0;\n                if (loc_h > -1 && loc_w > -1 && loc_h < height_in &&\n                    loc_w < width_in) {\n                    dcnv3_col2im_bilinear(\n                        data_im_ptr, height_in, width_in, group, group_channels,\n                        loc_h, loc_w, g_col, c_col, offset_scale, top_grad,\n                        weight, grad_im_ptr,\n                        cache_grad_offset + (threadIdx.x << 1),\n                        cache_grad_mask + threadIdx.x);\n                }\n\n                __syncthreads();\n\n                for (unsigned int s = blockSize / 2; s > 0; s >>= 1) {\n                    if (tid < s) {\n                        const unsigned int xid1 = tid << 1;\n                        const unsigned int xid2 = (tid + s) << 1;\n                        cache_grad_mask[tid] += cache_grad_mask[tid + s];\n     
                   cache_grad_offset[xid1] += cache_grad_offset[xid2];\n                        cache_grad_offset[xid1 + 1] +=\n                            cache_grad_offset[xid2 + 1];\n                    }\n                    __syncthreads();\n                }\n\n                if (tid == 0) {\n                    *grad_offset = cache_grad_offset[0];\n                    *(grad_offset + 1) = cache_grad_offset[1];\n                    *grad_mask = cache_grad_mask[0];\n                }\n                __syncthreads();\n\n                data_weight_ptr += 1;\n                data_loc_w_ptr += 2;\n                grad_mask += 1;\n                grad_offset += 2;\n            }\n        }\n    }\n}\n\ntemplate <typename scalar_t>\n__global__ void dcnv3_col2im_gpu_kernel_shm_reduce_v1(\n    const int num_kernels, const scalar_t *grad_col, const scalar_t *data_im,\n    const scalar_t *data_offset, const scalar_t *data_mask, const int kernel_h,\n    const int kernel_w, const int stride_h, const int stride_w, const int pad_h,\n    const int pad_w, const int dilation_h, const int dilation_w,\n    const int group, const int group_channels, const int height_in,\n    const int width_in, const int height_out, const int width_out,\n    const opmath_t offset_scale, opmath_t *grad_im, opmath_t *grad_offset,\n    opmath_t *grad_mask) {\n    CUDA_KERNEL_LOOP(index, num_kernels) {\n        extern __shared__ int _s[];\n        opmath_t *cache_grad_offset = (opmath_t *)_s;\n        opmath_t *cache_grad_mask = cache_grad_offset + 2 * blockDim.x;\n        unsigned int tid = threadIdx.x;\n        int _temp = index;\n        const int c_col = _temp % group_channels;\n        _temp /= group_channels;\n        const int sampling_index = _temp;\n        const int g_col = _temp % group;\n        _temp /= group;\n        const int p0_w = ((dilation_w * (kernel_w - 1)) >> 1) - pad_w +\n                         (_temp % width_out) * stride_w;\n        _temp /= width_out;\n        const 
int p0_h = ((dilation_h * (kernel_h - 1)) >> 1) - pad_h +\n                         (_temp % height_out) * stride_h;\n        _temp /= height_out;\n        const int b_col = _temp;\n\n        const opmath_t top_grad = grad_col[index];\n        const int input_size = height_in * width_in;\n        const int kernel_size = kernel_h * kernel_w;\n        int data_weight_ptr = sampling_index * kernel_size;\n        int data_loc_w_ptr = data_weight_ptr << 1;\n        const int grad_sampling_ptr = data_weight_ptr;\n        grad_offset += grad_sampling_ptr << 1;\n        grad_mask += grad_sampling_ptr;\n        const int qid_stride = group * group_channels;\n        const int im_ptr_offset = b_col * input_size * qid_stride;\n        const scalar_t *data_im_ptr = data_im + im_ptr_offset;\n        opmath_t *grad_im_ptr = grad_im + im_ptr_offset;\n        const opmath_t p0_w_ =\n            p0_w - ((dilation_w * (kernel_w - 1)) >> 1) * offset_scale;\n        const opmath_t p0_h_ =\n            p0_h - ((dilation_h * (kernel_h - 1)) >> 1) * offset_scale;\n        for (int i = 0; i < kernel_w; ++i) {\n            for (int j = 0; j < kernel_h; ++j) {\n                const opmath_t offset_w = data_offset[data_loc_w_ptr];\n                const opmath_t offset_h = data_offset[data_loc_w_ptr + 1];\n                const opmath_t loc_w =\n                    p0_w_ + (i * dilation_w + offset_w) * offset_scale;\n                const opmath_t loc_h =\n                    p0_h_ + (j * dilation_h + offset_h) * offset_scale;\n                const opmath_t weight = data_mask[data_weight_ptr];\n                *(cache_grad_offset + (threadIdx.x << 1)) = 0;\n                *(cache_grad_offset + ((threadIdx.x << 1) + 1)) = 0;\n                *(cache_grad_mask + threadIdx.x) = 0;\n                if (loc_h > -1 && loc_w > -1 && loc_h < height_in &&\n                    loc_w < width_in) {\n                    dcnv3_col2im_bilinear(\n                        data_im_ptr, height_in, width_in, 
group, group_channels,\n                        loc_h, loc_w, g_col, c_col, offset_scale, top_grad,\n                        weight, grad_im_ptr,\n                        cache_grad_offset + (threadIdx.x << 1),\n                        cache_grad_mask + threadIdx.x);\n                }\n\n                __syncthreads();\n                if (tid == 0) {\n                    opmath_t _grad_w = cache_grad_offset[0],\n                             _grad_h = cache_grad_offset[1],\n                             _grad_a = cache_grad_mask[0];\n                    int sid = 2;\n                    for (unsigned int tid = 1; tid < blockDim.x; ++tid) {\n                        _grad_w += cache_grad_offset[sid];\n                        _grad_h += cache_grad_offset[sid + 1];\n                        _grad_a += cache_grad_mask[tid];\n                        sid += 2;\n                    }\n\n                    *grad_offset = _grad_w;\n                    *(grad_offset + 1) = _grad_h;\n                    *grad_mask = _grad_a;\n                }\n                __syncthreads();\n\n                data_weight_ptr += 1;\n                data_loc_w_ptr += 2;\n                grad_mask += 1;\n                grad_offset += 2;\n            }\n        }\n    }\n}\n\ntemplate <typename scalar_t>\n__global__ void dcnv3_col2im_gpu_kernel_shm_reduce_v2(\n    const int num_kernels, const scalar_t *grad_col, const scalar_t *data_im,\n    const scalar_t *data_offset, const scalar_t *data_mask, const int kernel_h,\n    const int kernel_w, const int stride_h, const int stride_w, const int pad_h,\n    const int pad_w, const int dilation_h, const int dilation_w,\n    const int group, const int group_channels, const int height_in,\n    const int width_in, const int height_out, const int width_out,\n    const opmath_t offset_scale, opmath_t *grad_im, opmath_t *grad_offset,\n    opmath_t *grad_mask) {\n    CUDA_KERNEL_LOOP(index, num_kernels) {\n        extern __shared__ int _s[];\n        
opmath_t *cache_grad_offset = (opmath_t *)_s;\n        opmath_t *cache_grad_mask = cache_grad_offset + 2 * blockDim.x;\n        unsigned int tid = threadIdx.x;\n        int _temp = index;\n        const int c_col = _temp % group_channels;\n        _temp /= group_channels;\n        const int sampling_index = _temp;\n        const int g_col = _temp % group;\n        _temp /= group;\n        const int p0_w = ((dilation_w * (kernel_w - 1)) >> 1) - pad_w +\n                         (_temp % width_out) * stride_w;\n        _temp /= width_out;\n        const int p0_h = ((dilation_h * (kernel_h - 1)) >> 1) - pad_h +\n                         (_temp % height_out) * stride_h;\n        _temp /= height_out;\n        const int b_col = _temp;\n\n        const opmath_t top_grad = grad_col[index];\n        const int input_size = height_in * width_in;\n        const int kernel_size = kernel_h * kernel_w;\n        int data_weight_ptr = sampling_index * kernel_size;\n        int data_loc_w_ptr = data_weight_ptr << 1;\n        const int grad_sampling_ptr = data_weight_ptr;\n        grad_offset += grad_sampling_ptr << 1;\n        grad_mask += grad_sampling_ptr;\n        const int qid_stride = group * group_channels;\n        const int im_ptr_offset = b_col * input_size * qid_stride;\n        const scalar_t *data_im_ptr = data_im + im_ptr_offset;\n        opmath_t *grad_im_ptr = grad_im + im_ptr_offset;\n        const opmath_t p0_w_ =\n            p0_w - ((dilation_w * (kernel_w - 1)) >> 1) * offset_scale;\n        const opmath_t p0_h_ =\n            p0_h - ((dilation_h * (kernel_h - 1)) >> 1) * offset_scale;\n        for (int i = 0; i < kernel_w; ++i) {\n            for (int j = 0; j < kernel_h; ++j) {\n                const opmath_t offset_w = data_offset[data_loc_w_ptr];\n                const opmath_t offset_h = data_offset[data_loc_w_ptr + 1];\n                const opmath_t loc_w =\n                    p0_w_ + (i * dilation_w + offset_w) * offset_scale;\n                const 
opmath_t loc_h =\n                    p0_h_ + (j * dilation_h + offset_h) * offset_scale;\n                const opmath_t weight = data_mask[data_weight_ptr];\n                *(cache_grad_offset + (threadIdx.x << 1)) = 0;\n                *(cache_grad_offset + ((threadIdx.x << 1) + 1)) = 0;\n                *(cache_grad_mask + threadIdx.x) = 0;\n                if (loc_h > -1 && loc_w > -1 && loc_h < height_in &&\n                    loc_w < width_in) {\n                    dcnv3_col2im_bilinear(\n                        data_im_ptr, height_in, width_in, group, group_channels,\n                        loc_h, loc_w, g_col, c_col, offset_scale, top_grad,\n                        weight, grad_im_ptr,\n                        cache_grad_offset + (threadIdx.x << 1),\n                        cache_grad_mask + threadIdx.x);\n                }\n\n                __syncthreads();\n\n                for (unsigned int s = blockDim.x / 2, spre = blockDim.x; s > 0;\n                     s >>= 1, spre >>= 1) {\n                    if (tid < s) {\n                        const unsigned int xid1 = tid << 1;\n                        const unsigned int xid2 = (tid + s) << 1;\n                        cache_grad_mask[tid] += cache_grad_mask[tid + s];\n                        cache_grad_offset[xid1] += cache_grad_offset[xid2];\n                        cache_grad_offset[xid1 + 1] +=\n                            cache_grad_offset[xid2 + 1];\n                        if (tid + (s << 1) < spre) {\n                            cache_grad_mask[tid] +=\n                                cache_grad_mask[tid + (s << 1)];\n                            cache_grad_offset[xid1] +=\n                                cache_grad_offset[xid2 + (s << 1)];\n                            cache_grad_offset[xid1 + 1] +=\n                                cache_grad_offset[xid2 + 1 + (s << 1)];\n                        }\n                    }\n                    __syncthreads();\n                }\n\n                
if (tid == 0) {\n                    *grad_offset = cache_grad_offset[0];\n                    *(grad_offset + 1) = cache_grad_offset[1];\n                    *grad_mask = cache_grad_mask[0];\n                }\n                __syncthreads();\n\n                data_weight_ptr += 1;\n                data_loc_w_ptr += 2;\n                grad_mask += 1;\n                grad_offset += 2;\n            }\n        }\n    }\n}\n\ntemplate <typename scalar_t>\n__global__ void dcnv3_col2im_gpu_kernel_shm_reduce_v2_multi_blocks(\n    const int num_kernels, const scalar_t *grad_col, const scalar_t *data_im,\n    const scalar_t *data_offset, const scalar_t *data_mask, const int kernel_h,\n    const int kernel_w, const int stride_h, const int stride_w, const int pad_h,\n    const int pad_w, const int dilation_h, const int dilation_w,\n    const int group, const int group_channels, const int height_in,\n    const int width_in, const int height_out, const int width_out,\n    const opmath_t offset_scale, opmath_t *grad_im, opmath_t *grad_offset,\n    opmath_t *grad_mask) {\n    CUDA_KERNEL_LOOP(index, num_kernels) {\n        extern __shared__ int _s[];\n        opmath_t *cache_grad_offset = (opmath_t *)_s;\n        opmath_t *cache_grad_mask = cache_grad_offset + 2 * blockDim.x;\n        unsigned int tid = threadIdx.x;\n        int _temp = index;\n        const int c_col = _temp % group_channels;\n        _temp /= group_channels;\n        const int sampling_index = _temp;\n        const int g_col = _temp % group;\n        _temp /= group;\n        const int p0_w = ((dilation_w * (kernel_w - 1)) >> 1) - pad_w +\n                         (_temp % width_out) * stride_w;\n        _temp /= width_out;\n        const int p0_h = ((dilation_h * (kernel_h - 1)) >> 1) - pad_h +\n                         (_temp % height_out) * stride_h;\n        _temp /= height_out;\n        const int b_col = _temp;\n\n        const opmath_t top_grad = grad_col[index];\n        const int input_size = 
height_in * width_in;\n        const int kernel_size = kernel_h * kernel_w;\n        int data_weight_ptr = sampling_index * kernel_size;\n        int data_loc_w_ptr = data_weight_ptr << 1;\n        const int grad_sampling_ptr = data_weight_ptr;\n        grad_offset += grad_sampling_ptr << 1;\n        grad_mask += grad_sampling_ptr;\n        const int qid_stride = group * group_channels;\n        const int im_ptr_offset = b_col * input_size * qid_stride;\n        const scalar_t *data_im_ptr = data_im + im_ptr_offset;\n        opmath_t *grad_im_ptr = grad_im + im_ptr_offset;\n        const opmath_t p0_w_ =\n            p0_w - ((dilation_w * (kernel_w - 1)) >> 1) * offset_scale;\n        const opmath_t p0_h_ =\n            p0_h - ((dilation_h * (kernel_h - 1)) >> 1) * offset_scale;\n        for (int i = 0; i < kernel_w; ++i) {\n            for (int j = 0; j < kernel_h; ++j) {\n                const opmath_t offset_w = data_offset[data_loc_w_ptr];\n                const opmath_t offset_h = data_offset[data_loc_w_ptr + 1];\n                const opmath_t loc_w =\n                    p0_w_ + (i * dilation_w + offset_w) * offset_scale;\n                const opmath_t loc_h =\n                    p0_h_ + (j * dilation_h + offset_h) * offset_scale;\n                const opmath_t weight = data_mask[data_weight_ptr];\n                *(cache_grad_offset + (threadIdx.x << 1)) = 0;\n                *(cache_grad_offset + ((threadIdx.x << 1) + 1)) = 0;\n                *(cache_grad_mask + threadIdx.x) = 0;\n                if (loc_h > -1 && loc_w > -1 && loc_h < height_in &&\n                    loc_w < width_in) {\n                    dcnv3_col2im_bilinear(\n                        data_im_ptr, height_in, width_in, group, group_channels,\n                        loc_h, loc_w, g_col, c_col, offset_scale, top_grad,\n                        weight, grad_im_ptr,\n                        cache_grad_offset + (threadIdx.x << 1),\n                        cache_grad_mask + 
threadIdx.x);\n                }\n\n                __syncthreads();\n\n                for (unsigned int s = blockDim.x / 2, spre = blockDim.x; s > 0;\n                     s >>= 1, spre >>= 1) {\n                    if (tid < s) {\n                        const unsigned int xid1 = tid << 1;\n                        const unsigned int xid2 = (tid + s) << 1;\n                        cache_grad_mask[tid] += cache_grad_mask[tid + s];\n                        cache_grad_offset[xid1] += cache_grad_offset[xid2];\n                        cache_grad_offset[xid1 + 1] +=\n                            cache_grad_offset[xid2 + 1];\n                        if (tid + (s << 1) < spre) {\n                            cache_grad_mask[tid] +=\n                                cache_grad_mask[tid + (s << 1)];\n                            cache_grad_offset[xid1] +=\n                                cache_grad_offset[xid2 + (s << 1)];\n                            cache_grad_offset[xid1 + 1] +=\n                                cache_grad_offset[xid2 + 1 + (s << 1)];\n                        }\n                    }\n                    __syncthreads();\n                }\n\n                if (tid == 0) {\n                    atomicAdd(grad_offset, cache_grad_offset[0]);\n                    atomicAdd(grad_offset + 1, cache_grad_offset[1]);\n                    atomicAdd(grad_mask, cache_grad_mask[0]);\n                }\n                __syncthreads();\n\n                data_weight_ptr += 1;\n                data_loc_w_ptr += 2;\n                grad_mask += 1;\n                grad_offset += 2;\n            }\n        }\n    }\n}\n\ntemplate <typename scalar_t>\n__global__ void dcnv3_col2im_gpu_kernel_gm(\n    const int num_kernels, const scalar_t *grad_col, const scalar_t *data_im,\n    const scalar_t *data_offset, const scalar_t *data_mask, const int kernel_h,\n    const int kernel_w, const int stride_h, const int stride_w, const int pad_h,\n    const int pad_w, const int dilation_h, 
const int dilation_w,\n    const int group, const int group_channels, const int height_in,\n    const int width_in, const int height_out, const int width_out,\n    const opmath_t offset_scale, opmath_t *grad_im, opmath_t *grad_offset,\n    opmath_t *grad_mask) {\n    CUDA_KERNEL_LOOP(index, num_kernels) {\n        int _temp = index;\n        const int c_col = _temp % group_channels;\n        _temp /= group_channels;\n        const int sampling_index = _temp;\n        const int g_col = _temp % group;\n        _temp /= group;\n        const int p0_w = ((dilation_w * (kernel_w - 1)) >> 1) - pad_w +\n                         (_temp % width_out) * stride_w;\n        _temp /= width_out;\n        const int p0_h = ((dilation_h * (kernel_h - 1)) >> 1) - pad_h +\n                         (_temp % height_out) * stride_h;\n        _temp /= height_out;\n        const int b_col = _temp;\n\n        const opmath_t top_grad = grad_col[index];\n        const int input_size = height_in * width_in;\n        const int kernel_size = kernel_h * kernel_w;\n        int data_weight_ptr = sampling_index * kernel_size;\n        int data_loc_w_ptr = data_weight_ptr << 1;\n        const int grad_sampling_ptr = data_weight_ptr;\n        grad_offset += grad_sampling_ptr << 1;\n        grad_mask += grad_sampling_ptr;\n        const int qid_stride = group * group_channels;\n        const int im_ptr_offset = b_col * input_size * qid_stride;\n        const scalar_t *data_im_ptr = data_im + im_ptr_offset;\n        opmath_t *grad_im_ptr = grad_im + im_ptr_offset;\n        const opmath_t p0_w_ =\n            p0_w - ((dilation_w * (kernel_w - 1)) >> 1) * offset_scale;\n        const opmath_t p0_h_ =\n            p0_h - ((dilation_h * (kernel_h - 1)) >> 1) * offset_scale;\n        for (int i = 0; i < kernel_w; ++i) {\n            for (int j = 0; j < kernel_h; ++j) {\n                const opmath_t offset_w = data_offset[data_loc_w_ptr];\n                const opmath_t offset_h = data_offset[data_loc_w_ptr 
+ 1];\n                const opmath_t loc_w =\n                    p0_w_ + (i * dilation_w + offset_w) * offset_scale;\n                const opmath_t loc_h =\n                    p0_h_ + (j * dilation_h + offset_h) * offset_scale;\n                const opmath_t weight = data_mask[data_weight_ptr];\n                if (loc_h > -1 && loc_w > -1 && loc_h < height_in &&\n                    loc_w < width_in) {\n                    dcnv3_col2im_bilinear_gm(\n                        data_im_ptr, height_in, width_in, group, group_channels,\n                        loc_h, loc_w, g_col, c_col, offset_scale, top_grad,\n                        weight, grad_im_ptr, grad_offset, grad_mask);\n                }\n                data_weight_ptr += 1;\n                data_loc_w_ptr += 2;\n                grad_mask += 1;\n                grad_offset += 2;\n            }\n        }\n    }\n}\n\ntemplate <typename scalar_t>\nvoid dcnv3_im2col_cuda(cudaStream_t stream, const scalar_t *data_im,\n                       const scalar_t *data_offset, const scalar_t *data_mask,\n                       scalar_t *data_col, const int kernel_h,\n                       const int kernel_w, const int stride_h,\n                       const int stride_w, const int pad_h, const int pad_w,\n                       const int dilation_h, const int dilation_w,\n                       const int group, const int group_channels,\n                       const int batch_n, const int height_in,\n                       const int width_in, const int height_out,\n                       const int width_out, const opmath_t offset_scale) {\n    const int num_kernels =\n        batch_n * height_out * width_out * group * group_channels;\n    const int num_actual_kernels =\n        batch_n * height_out * width_out * group * group_channels;\n    const int num_threads = CUDA_NUM_THREADS;\n    dcnv3_im2col_gpu_kernel<scalar_t>\n        <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n           
stream>>>(num_kernels, data_im, data_offset, data_mask, data_col,\n                     kernel_h, kernel_w, stride_h, stride_w, pad_h, pad_w,\n                     dilation_h, dilation_w, group, group_channels, height_in,\n                     width_in, height_out, width_out, offset_scale);\n\n    cudaError_t err = cudaGetLastError();\n    if (err != cudaSuccess) {\n        printf(\"error in dcnv3_im2col_cuda: %s\\n\", cudaGetErrorString(err));\n    }\n}\n\ntemplate <typename scalar_t>\nvoid dcnv3_col2im_cuda(\n    cudaStream_t stream, const scalar_t *grad_col, const scalar_t *data_im,\n    const scalar_t *data_offset, const scalar_t *data_mask, const int kernel_h,\n    const int kernel_w, const int stride_h, const int stride_w, const int pad_h,\n    const int pad_w, const int dilation_h, const int dilation_w,\n    const int group, const int group_channels, const int batch_n,\n    const int height_in, const int width_in, const int height_out,\n    const int width_out, const opmath_t offset_scale, opmath_t *grad_im,\n    opmath_t *grad_offset, opmath_t *grad_mask) {\n    const int num_threads =\n        (group_channels > CUDA_NUM_THREADS) ? 
CUDA_NUM_THREADS : group_channels;\n    const int num_kernels =\n        batch_n * height_out * width_out * group * group_channels;\n    const int num_actual_kernels =\n        batch_n * height_out * width_out * group * group_channels;\n    if (group_channels > 1024) {\n        if ((group_channels & 1023) == 0) {\n            dcnv3_col2im_gpu_kernel_shm_reduce_v2_multi_blocks<scalar_t>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads,\n                   num_threads * 3 * sizeof(opmath_t), stream>>>(\n                    num_kernels, grad_col, data_im, data_offset, data_mask,\n                    kernel_h, kernel_w, stride_h, stride_w, pad_h, pad_w,\n                    dilation_h, dilation_w, group, group_channels, height_in,\n                    width_in, height_out, width_out, offset_scale, grad_im,\n                    grad_offset, grad_mask);\n        } else {\n            dcnv3_col2im_gpu_kernel_gm<scalar_t>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n        }\n    } else {\n        switch (group_channels) {\n        case 1:\n            dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1<scalar_t, 1>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, 
width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n            break;\n        case 2:\n            dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1<scalar_t, 2>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n            break;\n        case 4:\n            dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1<scalar_t, 4>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n            break;\n        case 8:\n            dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1<scalar_t, 8>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                   
          grad_mask);\n            break;\n        case 16:\n            dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1<scalar_t, 16>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n            break;\n        case 32:\n            dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v1<scalar_t, 32>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n            break;\n        case 64:\n            dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2<scalar_t, 64>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n            break;\n        case 128:\n            
dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2<scalar_t, 128>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n            break;\n        case 256:\n            dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2<scalar_t, 256>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n            break;\n        case 512:\n            dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2<scalar_t, 512>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n            break;\n        case 1024:\n            dcnv3_col2im_gpu_kernel_shm_blocksize_aware_reduce_v2<scalar_t,\n                                         
                         1024>\n                <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads, 0,\n                   stream>>>(num_kernels, grad_col, data_im, data_offset,\n                             data_mask, kernel_h, kernel_w, stride_h, stride_w,\n                             pad_h, pad_w, dilation_h, dilation_w, group,\n                             group_channels, height_in, width_in, height_out,\n                             width_out, offset_scale, grad_im, grad_offset,\n                             grad_mask);\n            break;\n        default:\n            if (group_channels < 64) {\n                dcnv3_col2im_gpu_kernel_shm_reduce_v1<scalar_t>\n                    <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads,\n                       num_threads * 3 * sizeof(opmath_t), stream>>>(\n                        num_kernels, grad_col, data_im, data_offset, data_mask,\n                        kernel_h, kernel_w, stride_h, stride_w, pad_h, pad_w,\n                        dilation_h, dilation_w, group, group_channels,\n                        height_in, width_in, height_out, width_out,\n                        offset_scale, grad_im, grad_offset, grad_mask);\n            } else {\n                dcnv3_col2im_gpu_kernel_shm_reduce_v2<scalar_t>\n                    <<<GET_BLOCKS(num_actual_kernels, num_threads), num_threads,\n                       num_threads * 3 * sizeof(opmath_t), stream>>>(\n                        num_kernels, grad_col, data_im, data_offset, data_mask,\n                        kernel_h, kernel_w, stride_h, stride_w, pad_h, pad_w,\n                        dilation_h, dilation_w, group, group_channels,\n                        height_in, width_in, height_out, width_out,\n                        offset_scale, grad_im, grad_offset, grad_mask);\n            }\n        }\n    }\n    cudaError_t err = cudaGetLastError();\n    if (err != cudaSuccess) {\n        printf(\"error in dcnv3_col2im_cuda: %s\\n\", 
cudaGetErrorString(err));\n    }\n}"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/src/dcnv3.h",
    "content": "/*!\n**************************************************************************************************\n* InternImage\n* Copyright (c) 2022 OpenGVLab\n* Licensed under The MIT License [see LICENSE for details]\n**************************************************************************************************\n* Modified from\n*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0\n**************************************************************************************************\n*/\n\n#pragma once\n\n#include \"cpu/dcnv3_cpu.h\"\n\n#ifdef WITH_CUDA\n#include \"cuda/dcnv3_cuda.h\"\n#endif\n\nat::Tensor dcnv3_forward(const at::Tensor &input, const at::Tensor &offset,\n                         const at::Tensor &mask, const int kernel_h,\n                         const int kernel_w, const int stride_h,\n                         const int stride_w, const int pad_h, const int pad_w,\n                         const int dilation_h, const int dilation_w,\n                         const int group, const int group_channels,\n                         const float offset_scale, const int im2col_step) {\n    if (input.type().is_cuda()) {\n#ifdef WITH_CUDA\n        return dcnv3_cuda_forward(input, offset, mask, kernel_h, kernel_w,\n                                  stride_h, stride_w, pad_h, pad_w, dilation_h,\n                                  dilation_w, group, group_channels,\n                                  offset_scale, im2col_step);\n#else\n        AT_ERROR(\"Not compiled with GPU support\");\n#endif\n    }\n    AT_ERROR(\"Not implemented on the CPU\");\n}\n\nstd::vector<at::Tensor>\ndcnv3_backward(const at::Tensor &input, const at::Tensor &offset,\n               const at::Tensor &mask, const int kernel_h, const int kernel_w,\n               const int stride_h, const int stride_w, const int pad_h,\n               const int pad_w, const int dilation_h, const int dilation_w,\n               const int group, const int 
group_channels,\n               const float offset_scale, const at::Tensor &grad_output,\n               const int im2col_step) {\n    if (input.type().is_cuda()) {\n#ifdef WITH_CUDA\n        return dcnv3_cuda_backward(input, offset, mask, kernel_h, kernel_w,\n                                   stride_h, stride_w, pad_h, pad_w, dilation_h,\n                                   dilation_w, group, group_channels,\n                                   offset_scale, grad_output, im2col_step);\n#else\n        AT_ERROR(\"Not compiled with GPU support\");\n#endif\n    }\n    AT_ERROR(\"Not implemented on the CPU\");\n}\n"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/src/vision.cpp",
    "content": "/*!\n**************************************************************************************************\n* InternImage\n* Copyright (c) 2022 OpenGVLab\n* Licensed under The MIT License [see LICENSE for details]\n**************************************************************************************************\n* Modified from\n*https://github.com/chengdazhi/Deformable-Convolution-V2-PyTorch/tree/pytorch_1.0.0\n**************************************************************************************************\n*/\n\n#include \"dcnv3.h\"\n\nPYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {\n    m.def(\"dcnv3_forward\", &dcnv3_forward, \"dcnv3_forward\");\n    m.def(\"dcnv3_backward\", &dcnv3_backward, \"dcnv3_backward\");\n}\n"
  },
  {
    "path": "yolo-improve/yolov5-DCNV3/ops_dcnv3/test.py",
    "content": "# --------------------------------------------------------\n# InternImage\n# Copyright (c) 2022 OpenGVLab\n# Licensed under The MIT License [see LICENSE for details]\n# --------------------------------------------------------\n\nfrom __future__ import absolute_import\nfrom __future__ import print_function\nfrom __future__ import division\n\nimport time\nimport torch\nimport torch.nn as nn\nimport math\nfrom torch.autograd import gradcheck\n\nfrom functions.dcnv3_func import DCNv3Function, dcnv3_core_pytorch\n\nH_in, W_in = 8, 8\nN, M, D = 2, 4, 16\nKh, Kw = 3, 3\nP = Kh * Kw\noffset_scale = 2.0\npad = 1\ndilation = 1\nstride = 1\nH_out = (H_in + 2 * pad - (dilation * (Kh - 1) + 1)) // stride + 1\nW_out = (W_in + 2 * pad - (dilation * (Kw - 1) + 1)) // stride + 1\n\ntorch.manual_seed(3)\n\n\n@torch.no_grad()\ndef check_forward_equal_with_pytorch_double():\n    input = torch.rand(N, H_in, W_in, M*D).cuda() * 0.01\n    offset = torch.rand(N, H_out, W_out, M*P*2).cuda() * 10\n    mask = torch.rand(N, H_out, W_out, M, P).cuda() + 1e-5\n    mask /= mask.sum(-1, keepdim=True)\n    mask = mask.reshape(N, H_out, W_out, M*P)\n\n    output_pytorch = dcnv3_core_pytorch(\n        input.double(),\n        offset.double(),\n        mask.double(),\n        Kh, Kw, stride, stride, Kh // 2, Kw // 2, dilation, dilation, M, D, offset_scale).detach().cpu()\n\n    im2col_step = 2\n    output_cuda = DCNv3Function.apply(\n        input.double(),\n        offset.double(),\n        mask.double(),\n        Kh, Kw, stride, stride, Kh // 2, Kw // 2, dilation, dilation, M, D, offset_scale,\n        im2col_step).detach().cpu()\n\n    fwdok = torch.allclose(output_cuda, output_pytorch)\n    max_abs_err = (output_cuda - output_pytorch).abs().max()\n    max_rel_err = ((output_cuda - output_pytorch).abs() /\n                   output_pytorch.abs()).max()\n    print('>>> forward double')\n    print(f'* {fwdok} check_forward_equal_with_pytorch_double: max_abs_err {max_abs_err:.2e} 
max_rel_err {max_rel_err:.2e}')\n\n\n@torch.no_grad()\ndef check_forward_equal_with_pytorch_float():\n    input = torch.rand(N, H_in, W_in, M*D).cuda() * 0.01\n    offset = torch.rand(N, H_out, W_out, M*P*2).cuda() * 10\n    mask = torch.rand(N, H_out, W_out, M, P).cuda() + 1e-5\n    mask /= mask.sum(-1, keepdim=True)\n    mask = mask.reshape(N, H_out, W_out, M*P)\n\n    output_pytorch = dcnv3_core_pytorch(\n        input,\n        offset,\n        mask,\n        Kh, Kw, stride, stride, Kh // 2, Kw // 2, dilation, dilation, M, D, offset_scale).detach().cpu()\n\n    im2col_step = 2\n    output_cuda = DCNv3Function.apply(\n        input,\n        offset,\n        mask,\n        Kh, Kw, stride, stride, Kh // 2, Kw // 2, dilation, dilation, M, D, offset_scale,\n        im2col_step).detach().cpu()\n\n    fwdok = torch.allclose(output_cuda, output_pytorch, rtol=1e-2, atol=1e-3)\n    max_abs_err = (output_cuda - output_pytorch).abs().max()\n    max_rel_err = ((output_cuda - output_pytorch).abs() /\n                   output_pytorch.abs()).max()\n    print('>>> forward float')\n    print(f'* {fwdok} check_forward_equal_with_pytorch_float: max_abs_err {max_abs_err:.2e} max_rel_err {max_rel_err:.2e}')\n\n\ndef check_backward_equal_with_pytorch_double(channels=4, grad_input=True, grad_offset=True, grad_mask=True):\n    # H_in, W_in = 4, 4\n    N = 2\n    M = 2\n    H_out = (H_in + 2 * pad - (dilation * (Kh - 1) + 1)) // stride + 1\n    W_out = (W_in + 2 * pad - (dilation * (Kw - 1) + 1)) // stride + 1\n\n    D = channels\n    input0 = torch.rand(N, H_in, W_in, M*D).cuda() * 0.01\n    offset0 = torch.rand(N, H_out, W_out, M*P*2).cuda() * 10\n    mask0 = torch.rand(N, H_out, W_out, M, P).cuda() + 1e-5\n    mask0 /= mask0.sum(-1, keepdim=True)\n    mask0 = mask0.reshape(N, H_out, W_out, M*P)\n    input0.requires_grad = grad_input\n    offset0.requires_grad = grad_offset\n    mask0.requires_grad = grad_mask\n\n    output_pytorch = dcnv3_core_pytorch(\n        input0.double(),\n   
     offset0.double(),\n        mask0.double(),\n        Kh, Kw, stride, stride, Kh // 2, Kw // 2, dilation, dilation, M, D, offset_scale)\n    output_pytorch.sum().backward()\n\n    input1 = input0.detach()\n    offset1 = offset0.detach()\n    mask1 = mask0.detach()\n    input1.requires_grad = grad_input\n    offset1.requires_grad = grad_offset\n    mask1.requires_grad = grad_mask\n\n    im2col_step = 2\n    output_cuda = DCNv3Function.apply(\n        input1.double(),\n        offset1.double(),\n        mask1.double(),\n        Kh, Kw, stride, stride, Kh // 2, Kw // 2, dilation, dilation, M, D, offset_scale,\n        im2col_step)\n    output_cuda.sum().backward()\n\n    print(f'>>> backward double: channels {D}')\n    bwdok = torch.allclose(input0.grad, input1.grad, rtol=1e-2, atol=1e-3)\n    max_abs_err = (input0.grad - input1.grad).abs().max()\n    max_rel_err = ((input0.grad - input1.grad).abs() /\n                   input0.grad.abs()).max()\n    print(\n        f'* {bwdok} input_grad check_backward_equal_with_pytorch_double: max_abs_err {max_abs_err:.2e} max_rel_err {max_rel_err:.2e}')\n\n    bwdok = torch.allclose(offset0.grad, offset1.grad, rtol=1e-2, atol=1e-3)\n    max_abs_err = (offset0.grad - offset1.grad).abs().max()\n    max_rel_err = ((offset0.grad - offset1.grad).abs() /\n                   offset0.grad.abs()).max()\n    print(\n        f'* {bwdok} offset_grad check_backward_equal_with_pytorch_double: max_abs_err {max_abs_err:.2e} max_rel_err {max_rel_err:.2e}')\n\n    bwdok = torch.allclose(mask0.grad, mask1.grad, rtol=1e-2, atol=1e-3)\n    max_abs_err = (mask0.grad - mask1.grad).abs().max()\n    max_rel_err = ((mask0.grad - mask1.grad).abs() /\n                   mask0.grad.abs()).max()\n    print(\n        f'* {bwdok} mask_grad check_backward_equal_with_pytorch_double: max_abs_err {max_abs_err:.2e} max_rel_err {max_rel_err:.2e}')\n\n\ndef check_backward_equal_with_pytorch_float(channels=4, grad_input=True, grad_offset=True, grad_mask=True):\n    # 
H_in, W_in = 4, 4\n    N = 2\n    M = 2\n    H_out = (H_in + 2 * pad - (dilation * (Kh - 1) + 1)) // stride + 1\n    W_out = (W_in + 2 * pad - (dilation * (Kw - 1) + 1)) // stride + 1\n\n    D = channels\n    input0 = torch.rand(N, H_in, W_in, M*D).cuda() * 0.01\n    offset0 = torch.rand(N, H_out, W_out, M*P*2).cuda() * 10\n    mask0 = torch.rand(N, H_out, W_out, M, P).cuda() + 1e-5\n    mask0 /= mask0.sum(-1, keepdim=True)\n    mask0 = mask0.reshape(N, H_out, W_out, M*P)\n    input0.requires_grad = grad_input\n    offset0.requires_grad = grad_offset\n    mask0.requires_grad = grad_mask\n\n    output_pytorch = dcnv3_core_pytorch(\n        input0,\n        offset0,\n        mask0,\n        Kh, Kw, stride, stride, Kh // 2, Kw // 2, dilation, dilation, M, D, offset_scale)\n    output_pytorch.sum().backward()\n\n    input1 = input0.detach()\n    offset1 = offset0.detach()\n    mask1 = mask0.detach()\n    input1.requires_grad = grad_input\n    offset1.requires_grad = grad_offset\n    mask1.requires_grad = grad_mask\n\n    im2col_step = 2\n    output_cuda = DCNv3Function.apply(\n        input1,\n        offset1,\n        mask1,\n        Kh, Kw, stride, stride, Kh // 2, Kw // 2, dilation, dilation, M, D, offset_scale,\n        im2col_step)\n    output_cuda.sum().backward()\n\n    print(f'>>> backward float: channels {D}')\n    bwdok = torch.allclose(input0.grad, input1.grad, rtol=1e-2, atol=1e-3)\n    max_abs_err = (input0.grad - input1.grad).abs().max()\n    max_rel_err = ((input0.grad - input1.grad).abs() /\n                   input0.grad.abs()).max()\n    print(\n        f'* {bwdok} input_grad check_backward_equal_with_pytorch_float: max_abs_err {max_abs_err:.2e} max_rel_err {max_rel_err:.2e}')\n\n    bwdok = torch.allclose(offset0.grad, offset1.grad, rtol=1e-2, atol=1e-3)\n    max_abs_err = (offset0.grad - offset1.grad).abs().max()\n    max_rel_err = ((offset0.grad - offset1.grad).abs() /\n                   offset0.grad.abs()).max()\n    print(\n        f'* {bwdok} 
offset_grad check_backward_equal_with_pytorch_float: max_abs_err {max_abs_err:.2e} max_rel_err {max_rel_err:.2e}')\n\n    bwdok = torch.allclose(mask0.grad, mask1.grad, rtol=1e-2, atol=1e-3)\n    max_abs_err = (mask0.grad - mask1.grad).abs().max()\n    max_rel_err = ((mask0.grad - mask1.grad).abs() /\n                   mask0.grad.abs()).max()\n    print(\n        f'* {bwdok} mask_grad check_backward_equal_with_pytorch_float: max_abs_err {max_abs_err:.2e} max_rel_err {max_rel_err:.2e}')\n\n\n@torch.no_grad()\ndef check_time_cost(im2col_step=128):\n    N = 512\n    H_in, W_in = 64, 64\n    H_out = (H_in + 2 * pad - (dilation * (Kh - 1) + 1)) // stride + 1\n    W_out = (W_in + 2 * pad - (dilation * (Kw - 1) + 1)) // stride + 1\n\n    input = torch.rand(N, H_in, W_in, M*D).cuda() * 0.01\n    offset = torch.rand(N, H_out, W_out, M*P*2).cuda() * 10\n    mask = torch.rand(N, H_out, W_out, M, P).cuda() + 1e-5\n    mask /= mask.sum(-1, keepdim=True)\n    mask = mask.reshape(N, H_out, W_out, M*P)\n    print(\n        f'>>> time cost: im2col_step {im2col_step}; input {input.shape}; points {P} ')\n    repeat = 100\n    for i in range(repeat):\n        output_cuda = DCNv3Function.apply(\n            input,\n            offset,\n            mask,\n            Kh, Kw, stride, stride, Kh // 2, Kw // 2, dilation, dilation, M, D, 1.0,\n            im2col_step)\n    torch.cuda.synchronize()\n    start = time.time()\n    for i in range(repeat):\n        output_cuda = DCNv3Function.apply(\n            input,\n            offset,\n            mask,\n            Kh, Kw, stride, stride, Kh // 2, Kw // 2, dilation, dilation, M, D, 1.0,\n            im2col_step)\n    torch.cuda.synchronize()\n    print(f'forward time cost: {(time.time() - start) / repeat}')\n\n\nif __name__ == '__main__':\n    check_forward_equal_with_pytorch_double()\n    check_forward_equal_with_pytorch_float()\n    for channels in [1, 16, 30, 32, 64, 71, 1025]:\n        check_backward_equal_with_pytorch_double(channels, 
True, True, True)\n    for channels in [1, 16, 30, 32, 64, 71, 1025]:\n        check_backward_equal_with_pytorch_float(channels, True, True, True)\n    for i in range(3):\n        im2col_step = 128 * (2 ** i)\n        check_time_cost(im2col_step)"
  },
  {
    "path": "yolo-improve/yolov5-DSConv.py",
    "content": "import torch.nn.functional as F\nfrom torch.nn.modules.conv import _ConvNd\nfrom torch.nn.modules.utils import _pair\n\nclass DSConv(_ConvNd):\n    def __init__(self, in_channels, out_channels, kernel_size, stride=1,\n                 padding=None, dilation=1, groups=1, padding_mode='zeros', bias=False, block_size=32, KDSBias=False, CDS=False):\n        padding = _pair(autopad(kernel_size, padding, dilation))\n        kernel_size = _pair(kernel_size)\n        stride = _pair(stride)\n        dilation = _pair(dilation)\n\n        blck_numb = math.ceil(((in_channels)/(block_size*groups)))\n        super(DSConv, self).__init__(\n            in_channels, out_channels, kernel_size, stride, padding, dilation,\n            False, _pair(0), groups, bias, padding_mode)\n\n        # KDS weight From Paper\n        self.intweight = torch.Tensor(out_channels, in_channels, *kernel_size)\n        self.alpha = torch.Tensor(out_channels, blck_numb, *kernel_size)\n\n        # KDS bias From Paper\n        self.KDSBias = KDSBias\n        self.CDS = CDS\n\n        if KDSBias:\n            self.KDSb = torch.Tensor(out_channels, blck_numb, *kernel_size)\n        if CDS:\n            self.CDSw = torch.Tensor(out_channels)\n            self.CDSb = torch.Tensor(out_channels)\n\n        self.reset_parameters()\n\n    def get_weight_res(self):\n        # Include expansion of alpha and multiplication with weights to include in the convolution layer here\n        alpha_res = torch.zeros(self.weight.shape).to(self.alpha.device)\n\n        # Include KDSBias\n        if self.KDSBias:\n            KDSBias_res = torch.zeros(self.weight.shape).to(self.alpha.device)\n\n        # Handy definitions:\n        nmb_blocks = self.alpha.shape[1]\n        total_depth = self.weight.shape[1]\n        bs = total_depth//nmb_blocks\n\n        llb = total_depth-(nmb_blocks-1)*bs\n\n        # Casting the Alpha values as same tensor shape as weight\n        for i in range(nmb_blocks):\n            
length_blk = llb if i==nmb_blocks-1 else bs\n\n            shp = self.alpha.shape # Notice this is the same shape for the bias as well\n            to_repeat=self.alpha[:, i, ...].view(shp[0],1,shp[2],shp[3]).clone()\n            repeated = to_repeat.expand(shp[0], length_blk, shp[2], shp[3]).clone()\n            alpha_res[:, i*bs:(i*bs+length_blk), ...] = repeated.clone()\n\n            if self.KDSBias:\n                to_repeat = self.KDSb[:, i, ...].view(shp[0], 1, shp[2], shp[3]).clone()\n                repeated = to_repeat.expand(shp[0], length_blk, shp[2], shp[3]).clone()\n                KDSBias_res[:, i*bs:(i*bs+length_blk), ...] = repeated.clone()\n\n        if self.CDS:\n            to_repeat = self.CDSw.view(-1, 1, 1, 1)\n            repeated = to_repeat.expand_as(self.weight)\n            print(repeated.shape)\n\n        # Element-wise multiplication of alpha and weight\n        weight_res = torch.mul(alpha_res, self.weight)\n        if self.KDSBias:\n            weight_res = torch.add(weight_res, KDSBias_res)\n        return weight_res\n\n    def forward(self, input):\n        # Get resulting weight\n        #weight_res = self.get_weight_res()\n\n        # Returning convolution\n        return F.conv2d(input, self.weight, self.bias,\n                            self.stride, self.padding, self.dilation,\n                            self.groups)\n\nclass DSConv2D(Conv):\n    def __init__(self, inc, ouc, k=1, s=1, p=None, g=1, d=1, act=True):\n        super().__init__(inc, ouc, k, s, p, g, d, act)\n        self.conv = DSConv(inc, ouc, k, s, p, g, d)\n\nclass Bottleneck_DSConv(nn.Module):\n    # Standard bottleneck\n    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = DSConv2D(c1, c_, 1, 1)\n        self.cv2 = DSConv2D(c_, c2, 3, 1, g=g)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n    
    return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\nclass C3_DSConv(C3):\n    # C3 module with dsconv\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)\n        self.m = nn.Sequential(*(Bottleneck_DSConv(c_, c_, shortcut, g, e=1.0) for _ in range(n)))"
  },
  {
    "path": "yolo-improve/yolov5-DecoupledHead.py",
    "content": "class Decoupled_Detect(nn.Module):\n    # YOLOv5 Detect head for detection models\n    stride = None  # strides computed during build\n    dynamic = False  # force grid reconstruction\n    export = False  # export mode\n\n    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer\n        super().__init__()\n        self.nc = nc  # number of classes\n        self.no = nc + 5  # number of outputs per anchor\n        self.nl = len(anchors)  # number of detection layers\n        self.na = len(anchors[0]) // 2  # number of anchors\n        self.grid = [torch.empty(0) for _ in range(self.nl)]  # init grid\n        self.anchor_grid = [torch.empty(0) for _ in range(self.nl)]  # init anchor grid\n        self.register_buffer('anchors', torch.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)\n        \n        self.m_stem = nn.ModuleList(Conv(x, x, 1) for x in ch)  # stem conv\n        self.m_cls = nn.ModuleList(nn.Sequential(Conv(x, x, 3), nn.Conv2d(x, self.na * self.nc, 1)) for x in ch)  # cls conv\n        self.m_reg_conf = nn.ModuleList(Conv(x, x, 3) for x in ch)  # reg_conf stem conv\n        self.m_reg = nn.ModuleList(nn.Conv2d(x, self.na * 4, 1) for x in ch)  # reg conv\n        self.m_conf = nn.ModuleList(nn.Conv2d(x, self.na * 1, 1) for x in ch)  # conf conv\n        \n        self.inplace = inplace  # use inplace ops (e.g. 
slice assignment)\n\n    def forward(self, x):\n        z = []  # inference output\n        for i in range(self.nl):\n            x[i] = self.m_stem[i](x[i])  # conv\n            \n            bs, _, ny, nx = x[i].shape\n            x_cls = self.m_cls[i](x[i]).view(bs, self.na, self.nc, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x_reg_conf = self.m_reg_conf[i](x[i])\n            x_reg = self.m_reg[i](x_reg_conf).view(bs, self.na, 4, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x_conf = self.m_conf[i](x_reg_conf).view(bs, self.na, 1, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x[i] = torch.cat([x_reg, x_conf, x_cls], dim=4)\n\n            if not self.training:  # inference\n                if self.dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:\n                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n\n                if isinstance(self, Segment):  # (boxes + masks)\n                    xy, wh, conf, mask = x[i].split((2, 2, self.nc + 1, self.no - self.nc - 5), 4)\n                    xy = (xy.sigmoid() * 2 + self.grid[i]) * self.stride[i]  # xy\n                    wh = (wh.sigmoid() * 2) ** 2 * self.anchor_grid[i]  # wh\n                    y = torch.cat((xy, wh, conf.sigmoid(), mask), 4)\n                else:  # Detect (boxes only)\n                    xy, wh, conf = x[i].sigmoid().split((2, 2, self.nc + 1), 4)\n                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy\n                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh\n                    y = torch.cat((xy, wh, conf), 4)\n                z.append(y.view(bs, self.na * nx * ny, self.no))\n\n        return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x)\n\n    def _make_grid(self, nx=20, ny=20, i=0, torch_1_10=check_version(torch.__version__, '1.10.0')):\n        d = self.anchors[i].device\n        t = self.anchors[i].dtype\n        shape = 1, self.na, ny, nx, 2  # grid 
shape\n        y, x = torch.arange(ny, device=d, dtype=t), torch.arange(nx, device=d, dtype=t)\n        yv, xv = torch.meshgrid(y, x, indexing='ij') if torch_1_10 else torch.meshgrid(y, x)  # torch>=0.7 compatibility\n        grid = torch.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5\n        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)\n        return grid, anchor_grid\n\ndef _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency\n    # https://arxiv.org/abs/1708.02002 section 3.3\n    # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.\n    m = self.model[-1]  # Detect() module\n    \n    if isinstance(m, Detect):\n        for mi, s in zip(m.m, m.stride):  # from\n            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)\n            b.data[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)\n            b.data[:, 5:5 + m.nc] += math.log(0.6 / (m.nc - 0.99999)) if cf is None else torch.log(cf / cf.sum())  # cls\n            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)\n    elif isinstance(m, Decoupled_Detect):\n        for mi, s in zip(m.m_conf, m.stride):  # from\n            b = mi.bias.view(m.na, -1)  # conf conv.bias(na) to (na,1), e.g. (3) to (3,1)\n            b.data += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)\n            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)\n\n        for mi, s in zip(m.m_cls, m.stride):  # from\n            b = mi[-1].bias.view(m.na, -1)  # cls conv.bias(na*nc) to (na,nc), e.g. (240) to (3,80)\n            b.data += math.log(0.6 / (m.nc - 0.99999)) if cf is None else torch.log(cf / cf.sum())  # cls\n            mi[-1].bias = torch.nn.Parameter(b.view(-1), requires_grad=True)"
  },
  {
    "path": "yolo-improve/yolov5-DySnakeConv.py",
    "content": "import torch\nimport torch.nn as nn\n\nclass DySnakeConv(nn.Module):\n    def __init__(self, inc, ouc, k=3, act=True) -> None:\n        super().__init__()\n        \n        self.conv_0 = Conv(inc, ouc, k, act=act)\n        self.conv_x = DSConv(inc, ouc, 0, k)\n        self.conv_y = DSConv(inc, ouc, 1, k)\n        self.conv_1x1 = Conv(ouc * 3, ouc, 1, act=act)\n    \n    def forward(self, x):\n        return self.conv_1x1(torch.cat([self.conv_0(x), self.conv_x(x), self.conv_y(x)], dim=1))\n\nclass DSConv(nn.Module):\n    def __init__(self, in_ch, out_ch, morph, kernel_size=3, if_offset=True, extend_scope=1):\n        \"\"\"\n        The Dynamic Snake Convolution\n        :param in_ch: input channel\n        :param out_ch: output channel\n        :param kernel_size: the size of kernel\n        :param extend_scope: the range to expand (default 1 for this method)\n        :param morph: the morphology of the convolution kernel is mainly divided into two types\n                        along the x-axis (0) and the y-axis (1) (see the paper for details)\n        :param if_offset: whether deformation is required, if it is False, it is the standard convolution kernel\n        \"\"\"\n        super(DSConv, self).__init__()\n        # use the <offset_conv> to learn the deformable offset\n        self.offset_conv = nn.Conv2d(in_ch, 2 * kernel_size, 3, padding=1)\n        self.bn = nn.BatchNorm2d(2 * kernel_size)\n        self.kernel_size = kernel_size\n\n        # two types of the DSConv (along x-axis and y-axis)\n        self.dsc_conv_x = nn.Conv2d(\n            in_ch,\n            out_ch,\n            kernel_size=(kernel_size, 1),\n            stride=(kernel_size, 1),\n            padding=0,\n        )\n        self.dsc_conv_y = nn.Conv2d(\n            in_ch,\n            out_ch,\n            kernel_size=(1, kernel_size),\n            stride=(1, kernel_size),\n            padding=0,\n        )\n\n        self.gn = nn.GroupNorm(out_ch // 4, out_ch)\n        
self.act = Conv.default_act\n\n        self.extend_scope = extend_scope\n        self.morph = morph\n        self.if_offset = if_offset\n\n    def forward(self, f):\n        offset = self.offset_conv(f)\n        offset = self.bn(offset)\n        # We need a range of deformation between -1 and 1 to mimic the snake's swing\n        offset = torch.tanh(offset)\n        input_shape = f.shape\n        dsc = DSC(input_shape, self.kernel_size, self.extend_scope, self.morph)\n        deformed_feature = dsc.deform_conv(f, offset, self.if_offset)\n        if self.morph == 0:\n            x = self.dsc_conv_x(deformed_feature.type(f.dtype))\n            x = self.gn(x)\n            x = self.act(x)\n            return x\n        else:\n            x = self.dsc_conv_y(deformed_feature.type(f.dtype))\n            x = self.gn(x)\n            x = self.act(x)\n            return x\n\n\n# Core code, for ease of understanding, we mark the dimensions of input and output next to the code\nclass DSC(object):\n    def __init__(self, input_shape, kernel_size, extend_scope, morph):\n        self.num_points = kernel_size\n        self.width = input_shape[2]\n        self.height = input_shape[3]\n        self.morph = morph\n        self.extend_scope = extend_scope  # offset (-1 ~ 1) * extend_scope\n\n        # define feature map shape\n        \"\"\"\n        B: Batch size  C: Channel  W: Width  H: Height\n        \"\"\"\n        self.num_batch = input_shape[0]\n        self.num_channels = input_shape[1]\n\n    \"\"\"\n    input: offset [B,2*K,W,H]  K: Kernel size (2*K: 2D image, deformation contains <x_offset> and <y_offset>)\n    output_x: [B,1,W,K*H]   coordinate map\n    output_y: [B,1,K*W,H]   coordinate map\n    \"\"\"\n\n    def _coordinate_map_3D(self, offset, if_offset):\n        device = offset.device\n        # offset\n        y_offset, x_offset = torch.split(offset, self.num_points, dim=1)\n\n        y_center = torch.arange(0, self.width).repeat([self.height])\n        y_center = 
y_center.reshape(self.height, self.width)\n        y_center = y_center.permute(1, 0)\n        y_center = y_center.reshape([-1, self.width, self.height])\n        y_center = y_center.repeat([self.num_points, 1, 1]).float()\n        y_center = y_center.unsqueeze(0)\n\n        x_center = torch.arange(0, self.height).repeat([self.width])\n        x_center = x_center.reshape(self.width, self.height)\n        x_center = x_center.permute(0, 1)\n        x_center = x_center.reshape([-1, self.width, self.height])\n        x_center = x_center.repeat([self.num_points, 1, 1]).float()\n        x_center = x_center.unsqueeze(0)\n\n        if self.morph == 0:\n            \"\"\"\n            Initialize the kernel and flatten the kernel\n                y: only need 0\n                x: -num_points//2 ~ num_points//2 (Determined by the kernel size)\n                !!! The related PPT will be submitted later, and the PPT will contain the whole changes of each step\n            \"\"\"\n            y = torch.linspace(0, 0, 1)\n            x = torch.linspace(\n                -int(self.num_points // 2),\n                int(self.num_points // 2),\n                int(self.num_points),\n            )\n\n            y, x = torch.meshgrid(y, x)\n            y_spread = y.reshape(-1, 1)\n            x_spread = x.reshape(-1, 1)\n\n            y_grid = y_spread.repeat([1, self.width * self.height])\n            y_grid = y_grid.reshape([self.num_points, self.width, self.height])\n            y_grid = y_grid.unsqueeze(0)  # [B*K*K, W,H]\n\n            x_grid = x_spread.repeat([1, self.width * self.height])\n            x_grid = x_grid.reshape([self.num_points, self.width, self.height])\n            x_grid = x_grid.unsqueeze(0)  # [B*K*K, W,H]\n\n            y_new = y_center + y_grid\n            x_new = x_center + x_grid\n\n            y_new = y_new.repeat(self.num_batch, 1, 1, 1).to(device)\n            x_new = x_new.repeat(self.num_batch, 1, 1, 1).to(device)\n\n            y_offset_new = 
y_offset.detach().clone()\n\n            if if_offset:\n                y_offset = y_offset.permute(1, 0, 2, 3)\n                y_offset_new = y_offset_new.permute(1, 0, 2, 3)\n                center = int(self.num_points // 2)\n\n                # The center position remains unchanged and the rest of the positions begin to swing\n                # This part is quite simple. The main idea is that \"offset is an iterative process\"\n                y_offset_new[center] = 0\n                for index in range(1, center):\n                    y_offset_new[center + index] = (y_offset_new[center + index - 1] + y_offset[center + index])\n                    y_offset_new[center - index] = (y_offset_new[center - index + 1] + y_offset[center - index])\n                y_offset_new = y_offset_new.permute(1, 0, 2, 3).to(device)\n                y_new = y_new.add(y_offset_new.mul(self.extend_scope))\n\n            y_new = y_new.reshape(\n                [self.num_batch, self.num_points, 1, self.width, self.height])\n            y_new = y_new.permute(0, 3, 1, 4, 2)\n            y_new = y_new.reshape([\n                self.num_batch, self.num_points * self.width, 1 * self.height\n            ])\n            x_new = x_new.reshape(\n                [self.num_batch, self.num_points, 1, self.width, self.height])\n            x_new = x_new.permute(0, 3, 1, 4, 2)\n            x_new = x_new.reshape([\n                self.num_batch, self.num_points * self.width, 1 * self.height\n            ])\n            return y_new, x_new\n\n        else:\n            \"\"\"\n            Initialize the kernel and flatten the kernel\n                y: -num_points//2 ~ num_points//2 (Determined by the kernel size)\n                x: only need 0\n            \"\"\"\n            y = torch.linspace(\n                -int(self.num_points // 2),\n                int(self.num_points // 2),\n                int(self.num_points),\n            )\n            x = torch.linspace(0, 0, 1)\n\n            y, x 
= torch.meshgrid(y, x)\n            y_spread = y.reshape(-1, 1)\n            x_spread = x.reshape(-1, 1)\n\n            y_grid = y_spread.repeat([1, self.width * self.height])\n            y_grid = y_grid.reshape([self.num_points, self.width, self.height])\n            y_grid = y_grid.unsqueeze(0)\n\n            x_grid = x_spread.repeat([1, self.width * self.height])\n            x_grid = x_grid.reshape([self.num_points, self.width, self.height])\n            x_grid = x_grid.unsqueeze(0)\n\n            y_new = y_center + y_grid\n            x_new = x_center + x_grid\n\n            y_new = y_new.repeat(self.num_batch, 1, 1, 1)\n            x_new = x_new.repeat(self.num_batch, 1, 1, 1)\n\n            y_new = y_new.to(device)\n            x_new = x_new.to(device)\n            x_offset_new = x_offset.detach().clone()\n\n            if if_offset:\n                x_offset = x_offset.permute(1, 0, 2, 3)\n                x_offset_new = x_offset_new.permute(1, 0, 2, 3)\n                center = int(self.num_points // 2)\n                x_offset_new[center] = 0\n                for index in range(1, center):\n                    x_offset_new[center + index] = (x_offset_new[center + index - 1] + x_offset[center + index])\n                    x_offset_new[center - index] = (x_offset_new[center - index + 1] + x_offset[center - index])\n                x_offset_new = x_offset_new.permute(1, 0, 2, 3).to(device)\n                x_new = x_new.add(x_offset_new.mul(self.extend_scope))\n\n            y_new = y_new.reshape(\n                [self.num_batch, 1, self.num_points, self.width, self.height])\n            y_new = y_new.permute(0, 3, 1, 4, 2)\n            y_new = y_new.reshape([\n                self.num_batch, 1 * self.width, self.num_points * self.height\n            ])\n            x_new = x_new.reshape(\n                [self.num_batch, 1, self.num_points, self.width, self.height])\n            x_new = x_new.permute(0, 3, 1, 4, 2)\n            x_new = x_new.reshape([\n  
              self.num_batch, 1 * self.width, self.num_points * self.height\n            ])\n            return y_new, x_new\n\n    \"\"\"\n    input: input feature map [N,C,D,W,H]；coordinate map [N,K*D,K*W,K*H] \n    output: [N,1,K*D,K*W,K*H]  deformed feature map\n    \"\"\"\n    def _bilinear_interpolate_3D(self, input_feature, y, x):\n        device = input_feature.device\n        y = y.reshape([-1]).float()\n        x = x.reshape([-1]).float()\n\n        zero = torch.zeros([]).int()\n        max_y = self.width - 1\n        max_x = self.height - 1\n\n        # find 8 grid locations\n        y0 = torch.floor(y).int()\n        y1 = y0 + 1\n        x0 = torch.floor(x).int()\n        x1 = x0 + 1\n\n        # clip out coordinates exceeding feature map volume\n        y0 = torch.clamp(y0, zero, max_y)\n        y1 = torch.clamp(y1, zero, max_y)\n        x0 = torch.clamp(x0, zero, max_x)\n        x1 = torch.clamp(x1, zero, max_x)\n\n        input_feature_flat = input_feature.flatten()\n        input_feature_flat = input_feature_flat.reshape(\n            self.num_batch, self.num_channels, self.width, self.height)\n        input_feature_flat = input_feature_flat.permute(0, 2, 3, 1)\n        input_feature_flat = input_feature_flat.reshape(-1, self.num_channels)\n        dimension = self.height * self.width\n\n        base = torch.arange(self.num_batch) * dimension\n        base = base.reshape([-1, 1]).float()\n\n        repeat = torch.ones([self.num_points * self.width * self.height\n                             ]).unsqueeze(0)\n        repeat = repeat.float()\n\n        base = torch.matmul(base, repeat)\n        base = base.reshape([-1])\n\n        base = base.to(device)\n\n        base_y0 = base + y0 * self.height\n        base_y1 = base + y1 * self.height\n\n        # top rectangle of the neighbourhood volume\n        index_a0 = base_y0 - base + x0\n        index_c0 = base_y0 - base + x1\n\n        # bottom rectangle of the neighbourhood volume\n        index_a1 = 
base_y1 - base + x0\n        index_c1 = base_y1 - base + x1\n\n        # gather the 4 corner values\n        value_a0 = input_feature_flat[index_a0.type(torch.int64)].to(device)\n        value_c0 = input_feature_flat[index_c0.type(torch.int64)].to(device)\n        value_a1 = input_feature_flat[index_a1.type(torch.int64)].to(device)\n        value_c1 = input_feature_flat[index_c1.type(torch.int64)].to(device)\n\n        # recompute the 4 corner coordinates for the interpolation weights\n        y0 = torch.floor(y).int()\n        y1 = y0 + 1\n        x0 = torch.floor(x).int()\n        x1 = x0 + 1\n\n        # clamp with one extra cell of slack so the weights stay well-defined at the border\n        y0 = torch.clamp(y0, zero, max_y + 1)\n        y1 = torch.clamp(y1, zero, max_y + 1)\n        x0 = torch.clamp(x0, zero, max_x + 1)\n        x1 = torch.clamp(x1, zero, max_x + 1)\n\n        x0_float = x0.float()\n        x1_float = x1.float()\n        y0_float = y0.float()\n        y1_float = y1.float()\n\n        vol_a0 = ((y1_float - y) * (x1_float - x)).unsqueeze(-1).to(device)\n        vol_c0 = ((y1_float - y) * (x - x0_float)).unsqueeze(-1).to(device)\n        vol_a1 = ((y - y0_float) * (x1_float - x)).unsqueeze(-1).to(device)\n        vol_c1 = ((y - y0_float) * (x - x0_float)).unsqueeze(-1).to(device)\n\n        outputs = (value_a0 * vol_a0 + value_c0 * vol_c0 + value_a1 * vol_a1 +\n                   value_c1 * vol_c1)\n\n        if self.morph == 0:\n            outputs = outputs.reshape([\n                self.num_batch,\n                self.num_points * self.width,\n                1 * self.height,\n                self.num_channels,\n            ])\n            outputs = outputs.permute(0, 3, 1, 2)\n        else:\n            outputs = outputs.reshape([\n                self.num_batch,\n                1 * self.width,\n                self.num_points * self.height,\n                self.num_channels,\n            ])\n            outputs = outputs.permute(0, 3, 1, 2)\n        return outputs\n\n    def deform_conv(self, input, offset, 
if_offset):\n        y, x = self._coordinate_map_3D(offset, if_offset)\n        deformed_feature = self._bilinear_interpolate_3D(input, y, x)\n        return deformed_feature\n\n\n#### YOLOV5\nclass Bottleneck_DySnake(nn.Module):\n    # Standard bottleneck\n    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = DySnakeConv(c_, c2, 3)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\nclass C3_DySnake(C3):\n    # C3 module with DySnakeConv\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)\n        self.m = nn.Sequential(*(Bottleneck_DySnake(c_, c_, shortcut, g, e=1.0) for _ in range(n)))"
  },
  {
    "path": "yolo-improve/yolov5-EVC.py",
    "content": "import torch.nn.functional as F\nfrom functools import partial\nfrom timm.models.layers import DropPath, trunc_normal_\n# LVC\nclass Encoding(nn.Module):\n    def __init__(self, in_channels, num_codes):\n        super(Encoding, self).__init__()\n        # init codewords and smoothing factor\n        self.in_channels, self.num_codes = in_channels, num_codes\n        num_codes = 64\n        std = 1. / ((num_codes * in_channels)**0.5)\n        # [num_codes, channels]\n        self.codewords = nn.Parameter(\n            torch.empty(num_codes, in_channels, dtype=torch.float).uniform_(-std, std), requires_grad=True)\n        # [num_codes]\n        self.scale = nn.Parameter(torch.empty(num_codes, dtype=torch.float).uniform_(-1, 0), requires_grad=True)\n\n    @staticmethod\n    def scaled_l2(x, codewords, scale):\n        num_codes, in_channels = codewords.size()\n        b = x.size(0)\n        expanded_x = x.unsqueeze(2).expand((b, x.size(1), num_codes, in_channels))\n\n        # ---处理codebook (num_code, c1)\n        reshaped_codewords = codewords.view((1, 1, num_codes, in_channels))\n\n        # 把scale从1, num_code变成   batch, c2, N, num_codes\n        reshaped_scale = scale.view((1, 1, num_codes))  # N, num_codes\n\n        # ---计算rik = z1 - d  # b, N, num_codes\n        scaled_l2_norm = reshaped_scale * (expanded_x - reshaped_codewords).pow(2).sum(dim=3)\n        return scaled_l2_norm\n\n    @staticmethod\n    def aggregate(assignment_weights, x, codewords):\n        num_codes, in_channels = codewords.size()\n\n        # ---处理codebook\n        reshaped_codewords = codewords.view((1, 1, num_codes, in_channels))\n        b = x.size(0)\n\n        # ---处理特征向量x b, c1, N\n        expanded_x = x.unsqueeze(2).expand((b, x.size(1), num_codes, in_channels))\n\n        #变换rei  b, N, num_codes,-\n        assignment_weights = assignment_weights.unsqueeze(3)  # b, N, num_codes,\n\n        # ---开始计算eik,必须在Rei计算完之后\n        encoded_feat = (assignment_weights * 
(expanded_x - reshaped_codewords)).sum(1)\n        return encoded_feat\n\n    def forward(self, x):\n        assert x.dim() == 4 and x.size(1) == self.in_channels\n        b, in_channels, w, h = x.size()\n\n        # [batch_size, height x width, channels]\n        x = x.view(b, self.in_channels, -1).transpose(1, 2).contiguous()\n\n        # assignment_weights: [batch_size, channels, num_codes]\n        assignment_weights = torch.softmax(self.scaled_l2(x, self.codewords, self.scale), dim=2)\n\n        # aggregate\n        encoded_feat = self.aggregate(assignment_weights, x, self.codewords)\n        return encoded_feat\n\n\nclass Mlp(nn.Module):\n    \"\"\"\n    Implementation of MLP with 1*1 convolutions. Input: tensor with shape [B, C, H, W]\n    \"\"\"\n    def __init__(self, in_features, hidden_features=None,\n                 out_features=None, act_layer=nn.GELU, drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Conv2d(in_features, hidden_features, 1)\n        self.act = act_layer()\n        self.fc2 = nn.Conv2d(hidden_features, out_features, 1)\n        self.drop = nn.Dropout(drop)\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Conv2d):\n            trunc_normal_(m.weight, std=.02)\n            if m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\n#  1*1 3*3 1*1\nclass ConvBlock(nn.Module):\n    def __init__(self, in_channels, out_channels, stride=1, res_conv=False, act_layer=nn.ReLU, groups=1, norm_layer=partial(nn.BatchNorm2d, eps=1e-6)):\n        super(ConvBlock, self).__init__()\n        self.in_channels = in_channels\n        expansion = 4\n        c = out_channels // expansion\n\n        
self.conv1 = Conv(in_channels, c, act=nn.ReLU())\n        self.conv2 = Conv(c, c, k=3, s=stride, g=groups, act=nn.ReLU())\n\n        self.conv3 = Conv(c, out_channels, 1, act=False)\n        self.act3 = act_layer(inplace=True)\n\n        if res_conv:\n            self.residual_conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, padding=0, bias=False)\n            self.residual_bn = norm_layer(out_channels)\n\n        self.res_conv = res_conv\n\n    def zero_init_last_bn(self):\n        # ConvBlock has no bn3 attribute; the last BatchNorm is assumed to live inside the YOLOv5 Conv wrapper as `.bn`\n        nn.init.zeros_(self.conv3.bn.weight)\n\n    def forward(self, x, return_x_2=True):\n        residual = x\n\n        x = self.conv1(x)\n\n        x2 = self.conv2(x) #if x_t_r is None else self.conv2(x + x_t_r)\n\n        x = self.conv3(x2)\n\n        if self.res_conv:\n            residual = self.residual_conv(residual)\n            residual = self.residual_bn(residual)\n\n        x += residual\n        x = self.act3(x)\n\n        if return_x_2:\n            return x, x2\n        else:\n            return x\n\nclass Mean(nn.Module):\n    def __init__(self, dim, keep_dim=False):\n        super(Mean, self).__init__()\n        self.dim = dim\n        self.keep_dim = keep_dim\n\n    def forward(self, input):\n        return input.mean(self.dim, self.keep_dim)\n\nclass LVCBlock(nn.Module):\n    def __init__(self, in_channels, out_channels, num_codes, channel_ratio=0.25, base_channel=64):\n        super(LVCBlock, self).__init__()\n        self.out_channels = out_channels\n        self.num_codes = num_codes\n        num_codes = 64\n\n        self.conv_1 = ConvBlock(in_channels=in_channels, out_channels=in_channels, res_conv=True, stride=1)\n\n        self.LVC = nn.Sequential(\n            Conv(in_channels, in_channels, 1, act=nn.ReLU()),\n            Encoding(in_channels=in_channels, num_codes=num_codes),\n            nn.BatchNorm1d(num_codes),\n            nn.ReLU(inplace=True),\n            Mean(dim=1))\n        self.fc = nn.Sequential(nn.Linear(in_channels, in_channels), 
nn.Sigmoid())\n\n    def forward(self, x):\n        x = self.conv_1(x, return_x_2=False)\n        en = self.LVC(x)\n        gam = self.fc(en)\n        b, in_channels, _, _ = x.size()\n        y = gam.view(b, in_channels, 1, 1)\n        x = F.relu_(x + x * y)\n        return x\n\nclass GroupNorm(nn.GroupNorm):\n    \"\"\"\n    Group Normalization with 1 group.\n    Input: tensor in shape [B, C, H, W]\n    \"\"\"\n    def __init__(self, num_channels, **kwargs):\n        super().__init__(1, num_channels, **kwargs)\n\nclass DWConv_LMLP(nn.Module):\n    \"\"\"Depthwise Conv + Conv\"\"\"\n    def __init__(self, in_channels, out_channels, ksize, stride=1, act=\"silu\"):\n        super().__init__()\n        self.dconv = Conv(\n            in_channels,\n            in_channels,\n            k=ksize,\n            s=stride,\n            g=in_channels,\n        )\n        self.pconv = Conv(\n            in_channels, out_channels, k=1, s=1, g=1\n        )\n\n    def forward(self, x):\n        x = self.dconv(x)\n        return self.pconv(x)\n\n# LightMLPBlock\nclass LightMLPBlock(nn.Module):\n    def __init__(self, in_channels, out_channels, ksize=1, stride=1, act=\"silu\",\n    mlp_ratio=4., drop=0., act_layer=nn.GELU, \n    use_layer_scale=True, layer_scale_init_value=1e-5, drop_path=0., norm_layer=GroupNorm):  # act_layer=nn.GELU,\n        super().__init__()\n        self.dw = DWConv_LMLP(in_channels, out_channels, ksize=1, stride=1, act=\"silu\")\n        self.linear = nn.Linear(out_channels, out_channels)  # learnable position embedding\n        self.out_channels = out_channels\n\n        self.norm1 = norm_layer(in_channels)\n        self.norm2 = norm_layer(in_channels)\n\n        mlp_hidden_dim = int(in_channels * mlp_ratio)\n        self.mlp = Mlp(in_features=in_channels, hidden_features=mlp_hidden_dim, act_layer=nn.GELU,\n                       drop=drop)\n\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
\\\n            else nn.Identity()\n\n        self.use_layer_scale = use_layer_scale\n        if use_layer_scale:\n            self.layer_scale_1 = nn.Parameter(\n                layer_scale_init_value * torch.ones((out_channels)), requires_grad=True)\n            self.layer_scale_2 = nn.Parameter(\n                layer_scale_init_value * torch.ones((out_channels)), requires_grad=True)\n\n    def forward(self, x):\n        if self.use_layer_scale:\n            x = x + self.drop_path(self.layer_scale_1.unsqueeze(-1).unsqueeze(-1) * self.dw(self.norm1(x)))\n            x = x + self.drop_path(self.layer_scale_2.unsqueeze(-1).unsqueeze(-1) * self.mlp(self.norm2(x)))\n        else:\n            x = x + self.drop_path(self.dw(self.norm1(x)))\n            x = x + self.drop_path(self.mlp(self.norm2(x)))\n        return x\n\n\n# EVCBlock\nclass EVCBlock(nn.Module):\n    def __init__(self, in_channels, out_channels, channel_ratio=4, base_channel=16):\n        super().__init__()\n        expansion = 2\n        ch = out_channels * expansion\n        # Stem stage: get the feature maps by conv block (copied from resnet.py); preprocessing before entering the Conformer framework\n        self.conv1 = Conv(in_channels, in_channels, k=7, act=nn.ReLU())\n        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)  # 1 / 4 [56, 56]\n\n        # LVC\n        self.lvc = LVCBlock(in_channels=in_channels, out_channels=out_channels, num_codes=64)  # c1 value not yet determined\n        # LightMLPBlock\n        self.l_MLP = LightMLPBlock(in_channels, out_channels, ksize=1, stride=1, act=\"silu\", act_layer=nn.GELU, mlp_ratio=4., drop=0.,\n                                     use_layer_scale=True, layer_scale_init_value=1e-5, drop_path=0., norm_layer=GroupNorm)\n        self.cnv1 = nn.Conv2d(ch, out_channels, kernel_size=1, stride=1, padding=0)\n\n    def forward(self, x):\n        x1 = self.maxpool((self.conv1(x)))\n        # LVCBlock\n        x_lvc = self.lvc(x1)\n        # LightMLPBlock\n        x_lmlp = self.l_MLP(x1)\n        # 
concat\n        x = torch.cat((x_lvc, x_lmlp), dim=1)\n        x = self.cnv1(x)\n        return x\n\n\n\nelif m is EVCBlock:\n    c2 = ch[f]\n    args = [c2, c2]"
  },
  {
    "path": "yolo-improve/yolov5-FasterBlock.py",
    "content": "from timm.models.layers import DropPath\nclass Partial_conv3(nn.Module):\n    def __init__(self, dim, n_div, forward):\n        super().__init__()\n        self.dim_conv3 = dim // n_div\n        self.dim_untouched = dim - self.dim_conv3\n        self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, 3, 1, 1, bias=False)\n\n        if forward == 'slicing':\n            self.forward = self.forward_slicing\n        elif forward == 'split_cat':\n            self.forward = self.forward_split_cat\n        else:\n            raise NotImplementedError\n\n    def forward_slicing(self, x):\n        # only for inference\n        x = x.clone()   # !!! Keep the original input intact for the residual connection later\n        x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])\n        return x\n\n    def forward_split_cat(self, x):\n        # for training/inference\n        x1, x2 = torch.split(x, [self.dim_conv3, self.dim_untouched], dim=1)\n        x1 = self.partial_conv3(x1)\n        x = torch.cat((x1, x2), 1)\n        return x\n\nclass Faster_Block(nn.Module):\n    def __init__(self,\n                 inc,\n                 dim,\n                 n_div=4,\n                 mlp_ratio=2,\n                 drop_path=0.1,\n                 layer_scale_init_value=0.0,\n                 pconv_fw_type='split_cat'\n                 ):\n        super().__init__()\n        self.dim = dim\n        self.mlp_ratio = mlp_ratio\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity()\n        self.n_div = n_div\n\n        mlp_hidden_dim = int(dim * mlp_ratio)\n\n        mlp_layer = [\n            Conv(dim, mlp_hidden_dim, 1),\n            nn.Conv2d(mlp_hidden_dim, dim, 1, bias=False)\n        ]\n\n        self.mlp = nn.Sequential(*mlp_layer)\n\n        self.spatial_mixing = Partial_conv3(\n            dim,\n            n_div,\n            pconv_fw_type\n        )\n        \n        self.adjust_channel = None\n        if inc != dim:\n            self.adjust_channel = Conv(inc, dim, 1)\n\n        if layer_scale_init_value > 0:\n            self.layer_scale = nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True)\n            self.forward = self.forward_layer_scale\n        else:\n            self.forward = self.forward\n\n    def forward(self, x):\n        if self.adjust_channel is not None:\n            x = self.adjust_channel(x)\n        shortcut = x\n        x = self.spatial_mixing(x)\n        x = shortcut + self.drop_path(self.mlp(x))\n        return x\n\n    def forward_layer_scale(self, x):\n        # match forward(): project the input to `dim` channels first when inc != dim\n        if self.adjust_channel is not None:\n            x = self.adjust_channel(x)\n        shortcut = x\n        x = self.spatial_mixing(x)\n        x = shortcut + self.drop_path(\n            self.layer_scale.unsqueeze(-1).unsqueeze(-1) * self.mlp(x))\n        return x\n\nclass C3_Faster(C3):\n    # C3 module with Faster_Block bottlenecks\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)\n        self.m = nn.Sequential(*(Faster_Block(c_, c_) for _ in range(n)))"
  },
  {
    "path": "yolo-improve/yolov5-GFPN/extra_modules.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\ndef conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):\n    '''Basic cell for rep-style block, including conv and bn'''\n    result = nn.Sequential()\n    result.add_module(\n        'conv',\n        nn.Conv2d(in_channels=in_channels,\n                  out_channels=out_channels,\n                  kernel_size=kernel_size,\n                  stride=stride,\n                  padding=padding,\n                  groups=groups,\n                  bias=False))\n    result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))\n    return result\n\nclass RepConv(nn.Module):\n    '''RepConv is a basic rep-style block, including training and deploy status\n    Code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py\n    '''\n    def __init__(self,\n                 in_channels,\n                 out_channels,\n                 kernel_size=3,\n                 stride=1,\n                 padding=1,\n                 dilation=1,\n                 groups=1,\n                 padding_mode='zeros',\n                 deploy=False,\n                 act='relu',\n                 norm=None):\n        super(RepConv, self).__init__()\n        self.deploy = deploy\n        self.groups = groups\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n\n        assert kernel_size == 3\n        assert padding == 1\n\n        padding_11 = padding - kernel_size // 2\n\n        if isinstance(act, str):\n            self.nonlinearity = get_activation(act)\n        else:\n            self.nonlinearity = act\n\n        if deploy:\n            self.rbr_reparam = nn.Conv2d(in_channels=in_channels,\n                                         out_channels=out_channels,\n                                         kernel_size=kernel_size,\n                                         stride=stride,\n                                         
padding=padding,\n                                         dilation=dilation,\n                                         groups=groups,\n                                         bias=True,\n                                         padding_mode=padding_mode)\n\n        else:\n            self.rbr_identity = None\n            self.rbr_dense = conv_bn(in_channels=in_channels,\n                                     out_channels=out_channels,\n                                     kernel_size=kernel_size,\n                                     stride=stride,\n                                     padding=padding,\n                                     groups=groups)\n            self.rbr_1x1 = conv_bn(in_channels=in_channels,\n                                   out_channels=out_channels,\n                                   kernel_size=1,\n                                   stride=stride,\n                                   padding=padding_11,\n                                   groups=groups)\n\n    def forward(self, inputs):\n        '''Forward process'''\n        if hasattr(self, 'rbr_reparam'):\n            return self.nonlinearity(self.rbr_reparam(inputs))\n\n        if self.rbr_identity is None:\n            id_out = 0\n        else:\n            id_out = self.rbr_identity(inputs)\n\n        return self.nonlinearity(\n            self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out)\n\n    def get_equivalent_kernel_bias(self):\n        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)\n        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)\n        kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)\n        return kernel3x3 + self._pad_1x1_to_3x3_tensor(\n            kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid\n\n    def _pad_1x1_to_3x3_tensor(self, kernel1x1):\n        if kernel1x1 is None:\n            return 0\n        else:\n            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])\n\n    def _fuse_bn_tensor(self, branch):\n 
       if branch is None:\n            return 0, 0\n        if isinstance(branch, nn.Sequential):\n            kernel = branch.conv.weight\n            running_mean = branch.bn.running_mean\n            running_var = branch.bn.running_var\n            gamma = branch.bn.weight\n            beta = branch.bn.bias\n            eps = branch.bn.eps\n        else:\n            assert isinstance(branch, nn.BatchNorm2d)\n            if not hasattr(self, 'id_tensor'):\n                input_dim = self.in_channels // self.groups\n                kernel_value = np.zeros((self.in_channels, input_dim, 3, 3),\n                                        dtype=np.float32)\n                for i in range(self.in_channels):\n                    kernel_value[i, i % input_dim, 1, 1] = 1\n                self.id_tensor = torch.from_numpy(kernel_value).to(\n                    branch.weight.device)\n            kernel = self.id_tensor\n            running_mean = branch.running_mean\n            running_var = branch.running_var\n            gamma = branch.weight\n            beta = branch.bias\n            eps = branch.eps\n        std = (running_var + eps).sqrt()\n        t = (gamma / std).reshape(-1, 1, 1, 1)\n        return kernel * t, beta - running_mean * gamma / std\n\n    def switch_to_deploy(self):\n        if hasattr(self, 'rbr_reparam'):\n            return\n        kernel, bias = self.get_equivalent_kernel_bias()\n        self.rbr_reparam = nn.Conv2d(\n            in_channels=self.rbr_dense.conv.in_channels,\n            out_channels=self.rbr_dense.conv.out_channels,\n            kernel_size=self.rbr_dense.conv.kernel_size,\n            stride=self.rbr_dense.conv.stride,\n            padding=self.rbr_dense.conv.padding,\n            dilation=self.rbr_dense.conv.dilation,\n            groups=self.rbr_dense.conv.groups,\n            bias=True)\n        self.rbr_reparam.weight.data = kernel\n        self.rbr_reparam.bias.data = bias\n        for para in self.parameters():\n           
 para.detach_()\n        self.__delattr__('rbr_dense')\n        self.__delattr__('rbr_1x1')\n        if hasattr(self, 'rbr_identity'):\n            self.__delattr__('rbr_identity')\n        if hasattr(self, 'id_tensor'):\n            self.__delattr__('id_tensor')\n        self.deploy = True\n\nclass Swish(nn.Module):\n    def __init__(self, inplace=True):\n        super(Swish, self).__init__()\n        self.inplace = inplace\n\n    def forward(self, x):\n        if self.inplace:\n            x.mul_(F.sigmoid(x))\n            return x\n        else:\n            return x * F.sigmoid(x)\n\ndef get_activation(name='silu', inplace=True):\n    if name is None:\n        return nn.Identity()\n\n    if isinstance(name, str):\n        if name == 'silu':\n            module = nn.SiLU(inplace=inplace)\n        elif name == 'relu':\n            module = nn.ReLU(inplace=inplace)\n        elif name == 'lrelu':\n            module = nn.LeakyReLU(0.1, inplace=inplace)\n        elif name == 'swish':\n            module = Swish(inplace=inplace)\n        elif name == 'hardsigmoid':\n            module = nn.Hardsigmoid(inplace=inplace)\n        elif name == 'identity':\n            module = nn.Identity()\n        else:\n            raise AttributeError('Unsupported act type: {}'.format(name))\n        return module\n\n    elif isinstance(name, nn.Module):\n        return name\n\n    else:\n        raise AttributeError('Unsupported act type: {}'.format(name))\n\ndef get_norm(name, out_channels, inplace=True):\n    if name == 'bn':\n        module = nn.BatchNorm2d(out_channels)\n    else:\n        raise NotImplementedError\n    return module\n\nclass ConvBNAct(nn.Module):\n    \"\"\"A Conv2d -> Batchnorm -> silu/leaky relu block\"\"\"\n    def __init__(\n        self,\n        in_channels,\n        out_channels,\n        ksize,\n        stride=1,\n        groups=1,\n        bias=False,\n        act='silu',\n        norm='bn',\n        reparam=False,\n    ):\n        super().__init__()\n 
       # same padding\n        pad = (ksize - 1) // 2\n        self.conv = nn.Conv2d(\n            in_channels,\n            out_channels,\n            kernel_size=ksize,\n            stride=stride,\n            padding=pad,\n            groups=groups,\n            bias=bias,\n        )\n        if norm is not None:\n            self.bn = get_norm(norm, out_channels, inplace=True)\n        if act is not None:\n            self.act = get_activation(act, inplace=True)\n        self.with_norm = norm is not None\n        self.with_act = act is not None\n\n    def forward(self, x):\n        x = self.conv(x)\n        if self.with_norm:\n            x = self.bn(x)\n        if self.with_act:\n            x = self.act(x)\n        return x\n\n    def fuseforward(self, x):\n        return self.act(self.conv(x))\n\nclass BasicBlock_3x3_Reverse(nn.Module):\n    def __init__(self,\n                 ch_in,\n                 ch_hidden_ratio,\n                 ch_out,\n                 act='relu',\n                 shortcut=True):\n        super(BasicBlock_3x3_Reverse, self).__init__()\n        assert ch_in == ch_out\n        ch_hidden = int(ch_in * ch_hidden_ratio)\n        self.conv1 = ConvBNAct(ch_hidden, ch_out, 3, stride=1, act=act)\n        self.conv2 = RepConv(ch_in, ch_hidden, 3, stride=1, act=act)\n        self.shortcut = shortcut\n\n    def forward(self, x):\n        y = self.conv2(x)\n        y = self.conv1(y)\n        if self.shortcut:\n            return x + y\n        else:\n            return y\n\nclass SPP(nn.Module):\n    def __init__(\n        self,\n        ch_in,\n        ch_out,\n        k,\n        pool_size,\n        act='swish',\n    ):\n        super(SPP, self).__init__()\n        self.pool = []\n        for i, size in enumerate(pool_size):\n            pool = nn.MaxPool2d(kernel_size=size,\n                                stride=1,\n                                padding=size // 2,\n                                ceil_mode=False)\n            
self.add_module('pool{}'.format(i), pool)\n            self.pool.append(pool)\n        self.conv = ConvBNAct(ch_in, ch_out, k, act=act)\n\n    def forward(self, x):\n        outs = [x]\n\n        for pool in self.pool:\n            outs.append(pool(x))\n        y = torch.cat(outs, axis=1)\n\n        y = self.conv(y)\n        return y\n\nclass CSPStage(nn.Module):\n    def __init__(self,\n                 ch_in,\n                 ch_out,\n                 n,\n                 block_fn='BasicBlock_3x3_Reverse',\n                 ch_hidden_ratio=1.0,\n                 act='silu',\n                 spp=False):\n        super(CSPStage, self).__init__()\n\n        split_ratio = 2\n        ch_first = int(ch_out // split_ratio)\n        ch_mid = int(ch_out - ch_first)\n        self.conv1 = ConvBNAct(ch_in, ch_first, 1, act=act)\n        self.conv2 = ConvBNAct(ch_in, ch_mid, 1, act=act)\n        self.convs = nn.Sequential()\n\n        next_ch_in = ch_mid\n        for i in range(n):\n            if block_fn == 'BasicBlock_3x3_Reverse':\n                self.convs.add_module(\n                    str(i),\n                    BasicBlock_3x3_Reverse(next_ch_in,\n                                           ch_hidden_ratio,\n                                           ch_mid,\n                                           act=act,\n                                           shortcut=True))\n            else:\n                raise NotImplementedError\n            if i == (n - 1) // 2 and spp:\n                self.convs.add_module(\n                    'spp', SPP(ch_mid * 4, ch_mid, 1, [5, 9, 13], act=act))\n            next_ch_in = ch_mid\n        self.conv3 = ConvBNAct(ch_mid * n + ch_first, ch_out, 1, act=act)\n\n    def forward(self, x):\n        y1 = self.conv1(x)\n        y2 = self.conv2(x)\n\n        mid_out = [y1]\n        for conv in self.convs:\n            y2 = conv(y2)\n            mid_out.append(y2)\n        y = torch.cat(mid_out, axis=1)\n        y = self.conv3(y)\n      
  return y\n\n"
  },
  {
    "path": "yolo-improve/yolov5-GFPN/yolov5_GFPN.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# DAMO-YOLO GFPN Head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]], # 10\n   [6, 1, Conv, [512, 3, 2]],\n   [[-1, 10], 1, Concat, [1]],\n   [-1, 3, CSPStage, [512]], # 13\n\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']], #14\n   [4, 1, Conv, [256, 3, 2]], # 15\n   [[14, -1, 6], 1, Concat, [1]],\n   [-1, 3, CSPStage, [512]], # 17\n\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [[-1, 4], 1, Concat, [1]],\n   [-1, 3, CSPStage, [256]], # 20\n\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, 17], 1, Concat, [1]],\n   [-1, 3, CSPStage, [512]], # 23\n\n   [17, 1, Conv, [256, 3, 2]], # 24\n   [23, 1, Conv, [256, 3, 2]], # 25\n   [[13, 24, -1], 1, Concat, [1]],\n   [-1, 3, CSPStage, [1024]], # 27\n\n   [[20, 23, 27], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]"
  },
  {
    "path": "yolo-improve/yolov5-GOLDYOLO/common.py",
    "content": "import torch.nn.functional as F\n\ndef conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1, bias=False):\n    '''Basic cell for rep-style block, including conv and bn'''\n    result = nn.Sequential()\n    result.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels,\n                                        kernel_size=kernel_size, stride=stride, padding=padding, groups=groups,\n                                        bias=bias))\n    result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))\n    return result\n\nclass RepVGGBlock(nn.Module):\n    '''RepVGGBlock is a basic rep-style block, including training and deploy status\n    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py\n    '''\n    \n    def __init__(self, in_channels, out_channels, kernel_size=3,\n                 stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False, use_se=False):\n        super(RepVGGBlock, self).__init__()\n        \"\"\" Initialization of the class.\n        Args:\n            in_channels (int): Number of channels in the input image\n            out_channels (int): Number of channels produced by the convolution\n            kernel_size (int or tuple): Size of the convolving kernel\n            stride (int or tuple, optional): Stride of the convolution. Default: 1\n            padding (int or tuple, optional): Zero-padding added to both sides of\n                the input. Default: 1\n            dilation (int or tuple, optional): Spacing between kernel elements. Default: 1\n            groups (int, optional): Number of blocked connections from input\n                channels to output channels. Default: 1\n            padding_mode (string, optional): Default: 'zeros'\n            deploy: Whether to be deploy status or training status. Default: False\n            use_se: Whether to use se. 
Default: False\n        \"\"\"\n        self.deploy = deploy\n        self.groups = groups\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        \n        assert kernel_size == 3\n        assert padding == 1\n        \n        padding_11 = padding - kernel_size // 2\n        \n        self.nonlinearity = nn.ReLU()\n        \n        if use_se:\n            raise NotImplementedError(\"se block not supported yet\")\n        else:\n            self.se = nn.Identity()\n        \n        if deploy:\n            self.rbr_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,\n                                         stride=stride,\n                                         padding=padding, dilation=dilation, groups=groups, bias=True,\n                                         padding_mode=padding_mode)\n        \n        else:\n            self.rbr_identity = nn.BatchNorm2d(\n                    num_features=in_channels) if out_channels == in_channels and stride == 1 else None\n            self.rbr_dense = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,\n                                     stride=stride, padding=padding, groups=groups)\n            self.rbr_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride,\n                                   padding=padding_11, groups=groups)\n    \n    def forward(self, inputs):\n        '''Forward process'''\n        if hasattr(self, 'rbr_reparam'):\n            return self.nonlinearity(self.se(self.rbr_reparam(inputs)))\n        \n        if self.rbr_identity is None:\n            id_out = 0\n        else:\n            id_out = self.rbr_identity(inputs)\n        \n        return self.nonlinearity(self.se(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out))\n    \n    def get_equivalent_kernel_bias(self):\n        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)\n  
      kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)\n        kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)\n        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid\n    \n    def _pad_1x1_to_3x3_tensor(self, kernel1x1):\n        if kernel1x1 is None:\n            return 0\n        else:\n            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])\n    \n    def _fuse_bn_tensor(self, branch):\n        if branch is None:\n            return 0, 0\n        if isinstance(branch, nn.Sequential):\n            kernel = branch.conv.weight\n            running_mean = branch.bn.running_mean\n            running_var = branch.bn.running_var\n            gamma = branch.bn.weight\n            beta = branch.bn.bias\n            eps = branch.bn.eps\n        else:\n            assert isinstance(branch, nn.BatchNorm2d)\n            if not hasattr(self, 'id_tensor'):\n                input_dim = self.in_channels // self.groups\n                kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32)\n                for i in range(self.in_channels):\n                    kernel_value[i, i % input_dim, 1, 1] = 1\n                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)\n            kernel = self.id_tensor\n            running_mean = branch.running_mean\n            running_var = branch.running_var\n            gamma = branch.weight\n            beta = branch.bias\n            eps = branch.eps\n        std = (running_var + eps).sqrt()\n        t = (gamma / std).reshape(-1, 1, 1, 1)\n        return kernel * t, beta - running_mean * gamma / std\n    \n    def switch_to_deploy(self):\n        if hasattr(self, 'rbr_reparam'):\n            return\n        kernel, bias = self.get_equivalent_kernel_bias()\n        self.rbr_reparam = nn.Conv2d(in_channels=self.rbr_dense.conv.in_channels,\n                                     
out_channels=self.rbr_dense.conv.out_channels,\n                                     kernel_size=self.rbr_dense.conv.kernel_size, stride=self.rbr_dense.conv.stride,\n                                     padding=self.rbr_dense.conv.padding, dilation=self.rbr_dense.conv.dilation,\n                                     groups=self.rbr_dense.conv.groups, bias=True)\n        self.rbr_reparam.weight.data = kernel\n        self.rbr_reparam.bias.data = bias\n        for para in self.parameters():\n            para.detach_()\n        self.__delattr__('rbr_dense')\n        self.__delattr__('rbr_1x1')\n        if hasattr(self, 'rbr_identity'):\n            self.__delattr__('rbr_identity')\n        if hasattr(self, 'id_tensor'):\n            self.__delattr__('id_tensor')\n        self.deploy = True\n\ndef onnx_AdaptiveAvgPool2d(x, output_size):\n    stride_size = np.floor(np.array(x.shape[-2:]) / output_size).astype(np.int32)\n    kernel_size = np.array(x.shape[-2:]) - (output_size - 1) * stride_size\n    avg = nn.AvgPool2d(kernel_size=list(kernel_size), stride=list(stride_size))\n    x = avg(x)\n    return x\n\ndef get_avg_pool():\n    if torch.onnx.is_in_onnx_export():\n        avg_pool = onnx_AdaptiveAvgPool2d\n    else:\n        avg_pool = nn.functional.adaptive_avg_pool2d\n    return avg_pool\n\nclass SimFusion_3in(nn.Module):\n    def __init__(self, in_channel_list, out_channels):\n        super().__init__()\n        self.cv1 = Conv(in_channel_list[0], out_channels, act=nn.ReLU()) if in_channel_list[0] != out_channels else nn.Identity()\n        self.cv2 = Conv(in_channel_list[1], out_channels, act=nn.ReLU()) if in_channel_list[1] != out_channels else nn.Identity()\n        self.cv3 = Conv(in_channel_list[2], out_channels, act=nn.ReLU()) if in_channel_list[2] != out_channels else nn.Identity()\n        self.cv_fuse = Conv(out_channels * 3, out_channels, act=nn.ReLU())\n        self.downsample = nn.functional.adaptive_avg_pool2d\n    \n    def forward(self, x):\n        N, 
C, H, W = x[1].shape\n        output_size = (H, W)\n        \n        if torch.onnx.is_in_onnx_export():\n            self.downsample = onnx_AdaptiveAvgPool2d\n            output_size = np.array([H, W])\n        \n        x0 = self.cv1(self.downsample(x[0], output_size))\n        x1 = self.cv2(x[1])\n        x2 = self.cv3(F.interpolate(x[2], size=(H, W), mode='bilinear', align_corners=False))\n        return self.cv_fuse(torch.cat((x0, x1, x2), dim=1))\n\nclass SimFusion_4in(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.avg_pool = nn.functional.adaptive_avg_pool2d\n    \n    def forward(self, x):\n        x_l, x_m, x_s, x_n = x\n        B, C, H, W = x_s.shape\n        output_size = np.array([H, W])\n        \n        if torch.onnx.is_in_onnx_export():\n            self.avg_pool = onnx_AdaptiveAvgPool2d\n        \n        x_l = self.avg_pool(x_l, output_size)\n        x_m = self.avg_pool(x_m, output_size)\n        x_n = F.interpolate(x_n, size=(H, W), mode='bilinear', align_corners=False)\n        \n        out = torch.cat([x_l, x_m, x_s, x_n], 1)\n        return out\n\nclass IFM(nn.Module):\n    def __init__(self, inc, ouc, embed_dim_p=96, fuse_block_num=3) -> None:\n        super().__init__()\n        \n        self.conv = nn.Sequential(\n            Conv(inc, embed_dim_p),\n            *[RepVGGBlock(embed_dim_p, embed_dim_p) for _ in range(fuse_block_num)],\n            Conv(embed_dim_p, sum(ouc))\n        )\n    \n    def forward(self, x):\n        return self.conv(x)\n\nclass h_sigmoid(nn.Module):\n    def __init__(self, inplace=True):\n        super(h_sigmoid, self).__init__()\n        self.relu = nn.ReLU6(inplace=inplace)\n    \n    def forward(self, x):\n        return self.relu(x + 3) / 6\n\nclass InjectionMultiSum_Auto_pool(nn.Module):\n    def __init__(\n            self,\n            inp: int,\n            oup: int,\n            global_inp: list,\n            flag: int\n    ) -> None:\n        super().__init__()\n        
self.global_inp = global_inp\n        self.flag = flag\n        self.local_embedding = Conv(inp, oup, 1, act=False)\n        self.global_embedding = Conv(global_inp[self.flag], oup, 1, act=False)\n        self.global_act = Conv(global_inp[self.flag], oup, 1, act=False)\n        self.act = h_sigmoid()\n    \n    def forward(self, x):\n        '''\n        x_g: global features\n        x_l: local features\n        '''\n        x_l, x_g = x\n        B, C, H, W = x_l.shape\n        g_B, g_C, g_H, g_W = x_g.shape\n        use_pool = H < g_H\n        \n        global_info = x_g.split(self.global_inp, dim=1)[self.flag]\n        \n        local_feat = self.local_embedding(x_l)\n        \n        global_act = self.global_act(global_info)\n        global_feat = self.global_embedding(global_info)\n        \n        if use_pool:\n            avg_pool = get_avg_pool()\n            output_size = np.array([H, W])\n            \n            sig_act = avg_pool(global_act, output_size)\n            global_feat = avg_pool(global_feat, output_size)\n        \n        else:\n            sig_act = F.interpolate(self.act(global_act), size=(H, W), mode='bilinear', align_corners=False)\n            global_feat = F.interpolate(global_feat, size=(H, W), mode='bilinear', align_corners=False)\n        \n        out = local_feat * sig_act + global_feat\n        return out\n\ndef get_shape(tensor):\n    shape = tensor.shape\n    if torch.onnx.is_in_onnx_export():\n        shape = [i.cpu().numpy() for i in shape]\n    return shape\n\nclass PyramidPoolAgg(nn.Module):\n    def __init__(self, inc, ouc, stride, pool_mode='torch'):\n        super().__init__()\n        self.stride = stride\n        if pool_mode == 'torch':\n            self.pool = nn.functional.adaptive_avg_pool2d\n        elif pool_mode == 'onnx':\n            self.pool = onnx_AdaptiveAvgPool2d\n        self.conv = Conv(inc, ouc)\n    \n    def forward(self, inputs):\n        B, C, H, W = get_shape(inputs[-1])\n        H = (H - 1) // 
self.stride + 1\n        W = (W - 1) // self.stride + 1\n        \n        output_size = np.array([H, W])\n        \n        if not hasattr(self, 'pool'):\n            self.pool = nn.functional.adaptive_avg_pool2d\n        \n        if torch.onnx.is_in_onnx_export():\n            self.pool = onnx_AdaptiveAvgPool2d\n        \n        out = [self.pool(inp, output_size) for inp in inputs]\n        \n        return self.conv(torch.cat(out, dim=1))\n\ndef drop_path(x, drop_prob: float = 0., training: bool = False):\n    \"\"\"Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).\n    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,\n    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...\n    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for\n    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use\n    'survival rate' as the argument.\n    \"\"\"\n    if drop_prob == 0. 
or not training:\n        return x\n    keep_prob = 1 - drop_prob\n    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets\n    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)\n    random_tensor.floor_()  # binarize\n    output = x.div(keep_prob) * random_tensor\n    return output\n\nclass Mlp(nn.Module):\n    def __init__(self, in_features, hidden_features=None, out_features=None, drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = Conv(in_features, hidden_features, act=False)\n        self.dwconv = nn.Conv2d(hidden_features, hidden_features, 3, 1, 1, bias=True, groups=hidden_features)\n        self.act = nn.ReLU6()\n        self.fc2 = Conv(hidden_features, out_features, act=False)\n        self.drop = nn.Dropout(drop)\n    \n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.dwconv(x)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\nclass DropPath(nn.Module):\n    \"\"\"Drop paths (Stochastic Depth) per sample  (when applied in main path of residual blocks).\n    \"\"\"\n    \n    def __init__(self, drop_prob=None):\n        super(DropPath, self).__init__()\n        self.drop_prob = drop_prob\n    \n    def forward(self, x):\n        return drop_path(x, self.drop_prob, self.training)\n\nclass Attention(torch.nn.Module):\n    def __init__(self, dim, key_dim, num_heads, attn_ratio=4):\n        super().__init__()\n        self.num_heads = num_heads\n        self.scale = key_dim ** -0.5\n        self.key_dim = key_dim\n        self.nh_kd = nh_kd = key_dim * num_heads  # num_head key_dim\n        self.d = int(attn_ratio * key_dim)\n        self.dh = int(attn_ratio * key_dim) * num_heads\n        self.attn_ratio = attn_ratio\n        \n        self.to_q = Conv(dim, nh_kd, 1, act=False)\n      
  self.to_k = Conv(dim, nh_kd, 1, act=False)\n        self.to_v = Conv(dim, self.dh, 1, act=False)\n        \n        self.proj = torch.nn.Sequential(nn.ReLU6(), Conv(self.dh, dim, act=False))\n    \n    def forward(self, x):  # x: (B, C, H, W)\n        B, C, H, W = get_shape(x)\n        \n        qq = self.to_q(x).reshape(B, self.num_heads, self.key_dim, H * W).permute(0, 1, 3, 2)\n        kk = self.to_k(x).reshape(B, self.num_heads, self.key_dim, H * W)\n        vv = self.to_v(x).reshape(B, self.num_heads, self.d, H * W).permute(0, 1, 3, 2)\n        \n        attn = torch.matmul(qq, kk)\n        attn = attn.softmax(dim=-1)  # dim = k\n        \n        xx = torch.matmul(attn, vv)\n        \n        xx = xx.permute(0, 1, 3, 2).reshape(B, self.dh, H, W)\n        xx = self.proj(xx)\n        return xx\n\nclass top_Block(nn.Module):\n    \n    def __init__(self, dim, key_dim, num_heads, mlp_ratio=4., attn_ratio=2., drop=0.,\n                 drop_path=0.):\n        super().__init__()\n        self.dim = dim\n        self.num_heads = num_heads\n        self.mlp_ratio = mlp_ratio\n        \n        self.attn = Attention(dim, key_dim=key_dim, num_heads=num_heads, attn_ratio=attn_ratio)\n        \n        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity()\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, drop=drop)\n    \n    def forward(self, x1):\n        x1 = x1 + self.drop_path(self.attn(x1))\n        x1 = x1 + self.drop_path(self.mlp(x1))\n        return x1\n\nclass TopBasicLayer(nn.Module):\n    def __init__(self, embedding_dim, ouc_list, block_num=2, key_dim=8, num_heads=4,\n                 mlp_ratio=4., attn_ratio=2., drop=0., attn_drop=0., drop_path=0.):\n        super().__init__()\n        self.block_num = block_num\n        \n        self.transformer_blocks = nn.ModuleList()\n        for i in range(self.block_num):\n            self.transformer_blocks.append(top_Block(\n                    embedding_dim, key_dim=key_dim, num_heads=num_heads,\n                    mlp_ratio=mlp_ratio, attn_ratio=attn_ratio,\n                    drop=drop, drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path))\n        self.conv = nn.Conv2d(embedding_dim, sum(ouc_list), 1)\n        \n    def forward(self, x):\n        # token * N \n        for i in range(self.block_num):\n            x = self.transformer_blocks[i](x)\n        return self.conv(x)\n\nclass AdvPoolFusion(nn.Module):\n    def forward(self, x):\n        x1, x2 = x\n        if torch.onnx.is_in_onnx_export():\n            self.pool = onnx_AdaptiveAvgPool2d\n        else:\n            self.pool = nn.functional.adaptive_avg_pool2d\n        \n        N, C, H, W = x2.shape\n        output_size = np.array([H, W])\n        x1 = self.pool(x1, output_size)\n        \n        return torch.cat([x1, x2], 1)"
  },
  {
    "path": "yolo-improve/yolov5-GOLDYOLO/yolo.py",
    "content": "elif m is SimFusion_4in:\n    c2 = sum(ch[x] for x in f)\nelif m is SimFusion_3in:\n    c2 = args[0]\n    if c2 != no:  # if not output\n        c2 = make_divisible(c2 * gw, 8)\n    args = [[ch[f_] for f_ in f], c2]\nelif m is IFM:\n    c1 = ch[f]\n    c2 = sum(args[0])\n    args = [c1, *args]\nelif m is InjectionMultiSum_Auto_pool:\n    c1 = ch[f[0]]\n    c2 = args[0]\n    args = [c1, *args]\nelif m is PyramidPoolAgg:\n    c2 = args[0]\n    args = [sum([ch[f_] for f_ in f]), *args]\nelif m is AdvPoolFusion:\n    c2 = sum(ch[x] for x in f)\nelif m is TopBasicLayer:\n    c2 = sum(args[1])"
  },
  {
    "path": "yolo-improve/yolov5-GOLDYOLO/yolov5n-goldyolo.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, AGPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:  \n  [[[2, 4, 6, 9], 1, SimFusion_4in, []], # 10\n   [-1, 1, IFM, [[64, 32]]], # 11\n   \n   [9, 1, Conv, [512, 1, 1]], # 12\n   [[4, 6, -1], 1, SimFusion_3in, [512]], # 13\n   [[-1, 11], 1, InjectionMultiSum_Auto_pool, [512, [64, 32], 0]], # 14\n   [-1, 3, C3, [512, False]], # 15\n\n   [6, 1, Conv, [256, 1, 1]], # 16\n   [[2, 4, -1], 1, SimFusion_3in, [256]], # 17\n   [[-1, 11], 1, InjectionMultiSum_Auto_pool, [256, [64, 32], 1]], # 18\n   [-1, 3, C3, [256, False]], # 19\n\n   [[19, 15, 9], 1, PyramidPoolAgg, [352, 2]], # 20\n   [-1, 1, TopBasicLayer, [352, [64, 128]]], # 21\n\n   [[19, 16], 1, AdvPoolFusion, []], # 22\n   [[-1, 21], 1, InjectionMultiSum_Auto_pool, [256, [64, 128], 0]], # 23\n   [-1, 3, C3, [256, False]], # 24\n\n   [[-1, 12], 1, AdvPoolFusion, []], # 25\n   [[-1, 21], 1, InjectionMultiSum_Auto_pool, [512, [64, 128], 1]], # 26\n   [-1, 3, C3, [512, False]], # 27\n\n   [[19, 24, 27], 1, Detect, [nc, anchors]] # 28\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-GOLDYOLO/yolov7-goldyolo.yaml",
    "content": "# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors:\n  - [12,16, 19,36, 40,28]  # P3/8\n  - [36,75, 76,55, 72,146]  # P4/16\n  - [142,110, 192,243, 459,401]  # P5/32\n\n# yolov7 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [32, 3, 1]],  # 0\n  \n   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2      \n   [-1, 1, Conv, [64, 3, 1]],\n   \n   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  \n   [-1, 1, Yolov7_E_ELAN, [256, 64]], # 4\n         \n   [-1, 1, V7DownSampling, [128]],  # 5-P3/8  \n   [-1, 1, Yolov7_E_ELAN, [512, 128]], # 6\n         \n   [-1, 1, V7DownSampling, [256]],  # 7-P4/16  \n   [-1, 1, Yolov7_E_ELAN, [1024, 256]], # 8\n         \n   [-1, 1, V7DownSampling, [512]],  # 9-P5/32  \n   [-1, 1, Yolov7_E_ELAN, [1024, 256]],  # 10\n  ]\n\n# yolov7 head\nhead:\n  [[-1, 1, SPPCSPC, [512]], # 11-Yolov7-tiny-spp\n   [[4, 6, 8, 11], 1, SimFusion_4in, []], # 12\n   [-1, 1, IFM, [[64, 32]]], # 13\n   \n   [11, 1, Conv, [1024, 1, 1]], # 14\n   [[6, 8, -1], 1, SimFusion_3in, [256]], # 15\n   [[-1, 13], 1, InjectionMultiSum_Auto_pool, [256, [64, 32], 0]], # 16\n   [-1, 1, Yolov7_E_ELAN_NECK, [256, 128]], # 17\n\n   [8, 1, Conv, [128, 1, 1]], # 18\n   [[4, 6, -1], 1, SimFusion_3in, [128]], # 19\n   [[-1, 13], 1, InjectionMultiSum_Auto_pool, [128, [64, 32], 1]], # 20\n   [-1, 1, Yolov7_E_ELAN_NECK, [128, 64]], # 21\n\n   [[21, 17, 11], 1, PyramidPoolAgg, [352, 2]], # 22\n   [-1, 1, TopBasicLayer, [352, [64, 128]]], # 23\n\n   [[21, 18], 1, AdvPoolFusion, []], # 24\n   [[-1, 23], 1, InjectionMultiSum_Auto_pool, [256, [64, 128], 0]], # 25\n   [-1, 1, Yolov7_E_ELAN_NECK, [256, 128]], # 26\n\n   [[-1, 14], 1, AdvPoolFusion, []], # 27\n   [[-1, 23], 1, InjectionMultiSum_Auto_pool, [512, [64, 128], 1]], # 28\n   [-1, 1, Yolov7_E_ELAN_NECK, [512, 256]], # 29\n\n   [21, 1, RepConv, [256, 3, 1]], # 30-P3\n   [26, 1, RepConv, [512, 3, 1]], # 31-P4\n   
[29, 1, RepConv, [1024, 3, 1]], # 32-P5\n\n   [[30, 31, 32], 1, IDetect, [nc, anchors]] # 33\n  ]"
  },
  {
    "path": "yolo-improve/yolov5-GOLDYOLO/yolov7-tiny-goldyolo.yaml",
    "content": "# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# yolov7-tiny backbone\nbackbone:\n  # [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True\n  [[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 0-P1/2  \n  \n   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 1-P2/4    \n\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 2\n\n   [-1, 1, MP, []],  # 3-P3/8\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 4\n\n   [-1, 1, MP, []],  # 5-P4/16\n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 6\n\n   [-1, 1, MP, []],  # 7-P5/32\n   [-1, 1, Yolov7_Tiny_E_ELAN, [512, 256, nn.LeakyReLU(0.1)]], # 8\n  ]\n\n# yolov7-tiny head\nhead:\n  [[-1, 1, Yolov7_Tiny_SPP, [256, nn.LeakyReLU(0.1)]], # 9-Yolov7-tiny-spp\n   [[2, 4, 6, 9], 1, SimFusion_4in, []], # 10\n   [-1, 1, IFM, [[64, 32]]], # 11\n   \n   [9, 1, Conv, [256, 1, 1]], # 12\n   [[4, 6, -1], 1, SimFusion_3in, [256]], # 13\n   [[-1, 11], 1, InjectionMultiSum_Auto_pool, [256, [64, 32], 0]], # 14\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 15\n\n   [6, 1, Conv, [128, 1, 1]], # 16\n   [[2, 4, -1], 1, SimFusion_3in, [128]], # 17\n   [[-1, 11], 1, InjectionMultiSum_Auto_pool, [128, [64, 32], 1]], # 18\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 19\n\n   [[19, 15, 9], 1, PyramidPoolAgg, [352, 2]], # 20\n   [-1, 1, TopBasicLayer, [352, [64, 128]]], # 21\n\n   [[19, 16], 1, AdvPoolFusion, []], # 22\n   [[-1, 21], 1, InjectionMultiSum_Auto_pool, [128, [64, 128], 0]], # 23\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 24\n\n   [[-1, 12], 1, AdvPoolFusion, []], # 25\n   [[-1, 21], 1, InjectionMultiSum_Auto_pool, [256, [64, 128], 1]], # 26\n   [-1, 1, 
Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 27\n\n   [19, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 28-P3\n   [24, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 29-P4\n   [27, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 30-P5\n\n   [[28, 29, 30], 1, IDetect, [nc, anchors]] # 31\n  ]"
  },
  {
    "path": "yolo-improve/yolov5-NWD.py",
    "content": "def wasserstein_loss(pred, target, eps=1e-7, constant=12.8):\n    r\"\"\"`Implementation of paper `Enhancing Geometric Factors into\n    Model Learning and Inference for Object Detection and Instance\n    Segmentation <https://arxiv.org/abs/2005.03572>`_.\n    Code is modified from https://github.com/Zzh-tju/CIoU.\n    Args:\n        pred (Tensor): Predicted bboxes of format (x_center, y_center, w, h),\n            shape (n, 4).\n        target (Tensor): Corresponding gt bboxes, shape (n, 4).\n        eps (float): Eps to avoid log(0).\n    Return:\n        Tensor: Loss tensor.\n    \"\"\"\n\n    center1 = pred[:, :2]\n    center2 = target[:, :2]\n\n    whs = center1[:, :2] - center2[:, :2]\n\n    center_distance = whs[:, 0] * whs[:, 0] + whs[:, 1] * whs[:, 1] + eps #\n\n    w1 = pred[:, 2]  + eps\n    h1 = pred[:, 3]  + eps\n    w2 = target[:, 2] + eps\n    h2 = target[:, 3] + eps\n\n    wh_distance = ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4\n\n    wasserstein_2 = center_distance + wh_distance\n    return torch.exp(-torch.sqrt(wasserstein_2) / constant)\n\n\nnwd = wasserstein_loss(pbox, tbox[i]).squeeze()\niou_ratio = 0.5\nlbox += (1 - iou_ratio) * (1.0 - nwd).mean() + iou_ratio * (1.0 - iou).mean()  # iou loss\n\n# Objectness\niou = (iou.detach() * iou_ratio + nwd.detach() * (1 - iou_ratio)).clamp(0, 1).type(tobj.dtype)"
  },
  {
    "path": "yolo-improve/yolov5-OTA/loss.py",
    "content": "import torch.nn.functional as F\nfrom utils.metrics import box_iou\nfrom utils.torch_utils import de_parallel\nfrom utils.general import xywh2xyxy\n\nclass ComputeLossOTA:\n    # Compute losses\n    def __init__(self, model, autobalance=False):\n        super(ComputeLossOTA, self).__init__()\n        device = next(model.parameters()).device  # get model device\n        h = model.hyp  # hyperparameters\n\n        # Define criteria\n        BCEcls = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['cls_pw']], device=device))\n        BCEobj = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([h['obj_pw']], device=device))\n\n        # Class label smoothing https://arxiv.org/pdf/1902.04103.pdf eqn 3\n        self.cp, self.cn = smooth_BCE(eps=h.get('label_smoothing', 0.0))  # positive, negative BCE targets\n\n        # Focal loss\n        g = h['fl_gamma']  # focal loss gamma\n        if g > 0:\n            BCEcls, BCEobj = FocalLoss(BCEcls, g), FocalLoss(BCEobj, g)\n\n        det = de_parallel(model).model[-1]  # Detect() module\n        self.balance = {3: [4.0, 1.0, 0.4]}.get(det.nl, [4.0, 1.0, 0.25, 0.06, .02])  # P3-P7\n        self.ssi = list(det.stride).index(16) if autobalance else 0  # stride 16 index\n        self.BCEcls, self.BCEobj, self.gr, self.hyp, self.autobalance = BCEcls, BCEobj, 1.0, h, autobalance\n        for k in 'na', 'nc', 'nl', 'anchors', 'stride':\n            setattr(self, k, getattr(det, k))\n\n    def __call__(self, p, targets, imgs):  # predictions, targets, model   \n        device = targets.device\n        lcls, lbox, lobj = torch.zeros(1, device=device), torch.zeros(1, device=device), torch.zeros(1, device=device)\n        bs, as_, gjs, gis, targets, anchors = self.build_targets(p, targets, imgs)\n        pre_gen_gains = [torch.tensor(pp.shape, device=device)[[3, 2, 3, 2]] for pp in p] \n    \n\n        # Losses\n        for i, pi in enumerate(p):  # layer index, layer predictions\n            b, a, gj, gi = bs[i], as_[i], 
gjs[i], gis[i]  # image, anchor, gridy, gridx\n            tobj = torch.zeros_like(pi[..., 0], device=device)  # target obj\n\n            n = b.shape[0]  # number of targets\n            if n:\n                ps = pi[b, a, gj, gi]  # prediction subset corresponding to targets\n\n                # Regression\n                grid = torch.stack([gi, gj], dim=1)\n                pxy = ps[:, :2].sigmoid() * 2. - 0.5\n                #pxy = ps[:, :2].sigmoid() * 3. - 1.\n                pwh = (ps[:, 2:4].sigmoid() * 2) ** 2 * anchors[i]\n                pbox = torch.cat((pxy, pwh), 1)  # predicted box\n                selected_tbox = targets[i][:, 2:6] * pre_gen_gains[i]\n                selected_tbox[:, :2] -= grid\n                iou = bbox_iou(pbox, selected_tbox, CIoU=True)  # iou(prediction, target)\n                if type(iou) is tuple:\n                    lbox += (iou[1].detach() * (1 - iou[0])).mean()\n                    iou = iou[0]\n                else:\n                    lbox += (1.0 - iou).mean()  # iou loss\n\n                # Objectness\n                tobj[b, a, gj, gi] = (1.0 - self.gr) + self.gr * iou.detach().clamp(0).type(tobj.dtype)  # iou ratio\n\n                # Classification\n                selected_tcls = targets[i][:, 1].long()\n                if self.nc > 1:  # cls loss (only if multiple classes)\n                    t = torch.full_like(ps[:, 5:], self.cn, device=device)  # targets\n                    t[range(n), selected_tcls] = self.cp\n                    lcls += self.BCEcls(ps[:, 5:], t)  # BCE\n\n                # Append targets to text file\n                # with open('targets.txt', 'a') as file:\n                #     [file.write('%11.5g ' * 4 % tuple(x) + '\\n') for x in torch.cat((txy[i], twh[i]), 1)]\n\n            obji = self.BCEobj(pi[..., 4], tobj)\n            lobj += obji * self.balance[i]  # obj loss\n            if self.autobalance:\n                self.balance[i] = self.balance[i] * 0.9999 + 0.0001 / 
obji.detach().item()\n\n        if self.autobalance:\n            self.balance = [x / self.balance[self.ssi] for x in self.balance]\n        lbox *= self.hyp['box']\n        lobj *= self.hyp['obj']\n        lcls *= self.hyp['cls']\n        bs = tobj.shape[0]  # batch size\n\n        loss = lbox + lobj + lcls\n        return loss * bs, torch.cat((lbox, lobj, lcls)).detach()\n\n    def build_targets(self, p, targets, imgs):\n        indices, anch = self.find_3_positive(p, targets)\n        device = torch.device(targets.device)\n        matching_bs = [[] for pp in p]\n        matching_as = [[] for pp in p]\n        matching_gjs = [[] for pp in p]\n        matching_gis = [[] for pp in p]\n        matching_targets = [[] for pp in p]\n        matching_anchs = [[] for pp in p]\n        \n        nl = len(p)    \n    \n        for batch_idx in range(p[0].shape[0]):\n        \n            b_idx = targets[:, 0]==batch_idx\n            this_target = targets[b_idx]\n            if this_target.shape[0] == 0:\n                continue\n                \n            txywh = this_target[:, 2:6] * imgs[batch_idx].shape[1]\n            txyxy = xywh2xyxy(txywh)\n\n            pxyxys = []\n            p_cls = []\n            p_obj = []\n            from_which_layer = []\n            all_b = []\n            all_a = []\n            all_gj = []\n            all_gi = []\n            all_anch = []\n            \n            for i, pi in enumerate(p):\n                \n                b, a, gj, gi = indices[i]\n                idx = (b == batch_idx)\n                b, a, gj, gi = b[idx], a[idx], gj[idx], gi[idx]                \n                all_b.append(b)\n                all_a.append(a)\n                all_gj.append(gj)\n                all_gi.append(gi)\n                all_anch.append(anch[i][idx])\n                from_which_layer.append((torch.ones(size=(len(b),)) * i).to(device))\n                \n                fg_pred = pi[b, a, gj, gi]                \n                
p_obj.append(fg_pred[:, 4:5])\n                p_cls.append(fg_pred[:, 5:])\n                \n                grid = torch.stack([gi, gj], dim=1)\n                pxy = (fg_pred[:, :2].sigmoid() * 2. - 0.5 + grid) * self.stride[i] #/ 8.\n                #pxy = (fg_pred[:, :2].sigmoid() * 3. - 1. + grid) * self.stride[i]\n                pwh = (fg_pred[:, 2:4].sigmoid() * 2) ** 2 * anch[i][idx] * self.stride[i] #/ 8.\n                pxywh = torch.cat([pxy, pwh], dim=-1)\n                pxyxy = xywh2xyxy(pxywh)\n                pxyxys.append(pxyxy)\n            \n            pxyxys = torch.cat(pxyxys, dim=0)\n            if pxyxys.shape[0] == 0:\n                continue\n            p_obj = torch.cat(p_obj, dim=0)\n            p_cls = torch.cat(p_cls, dim=0)\n            from_which_layer = torch.cat(from_which_layer, dim=0)\n            all_b = torch.cat(all_b, dim=0)\n            all_a = torch.cat(all_a, dim=0)\n            all_gj = torch.cat(all_gj, dim=0)\n            all_gi = torch.cat(all_gi, dim=0)\n            all_anch = torch.cat(all_anch, dim=0)\n        \n            pair_wise_iou = box_iou(txyxy, pxyxys)\n\n            pair_wise_iou_loss = -torch.log(pair_wise_iou + 1e-8)\n\n            top_k, _ = torch.topk(pair_wise_iou, min(10, pair_wise_iou.shape[1]), dim=1)\n            dynamic_ks = torch.clamp(top_k.sum(1).int(), min=1)\n\n            gt_cls_per_image = (\n                F.one_hot(this_target[:, 1].to(torch.int64), self.nc)\n                .float()\n                .unsqueeze(1)\n                .repeat(1, pxyxys.shape[0], 1)\n            )\n\n            num_gt = this_target.shape[0]\n            cls_preds_ = (\n                p_cls.float().unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()\n                * p_obj.unsqueeze(0).repeat(num_gt, 1, 1).sigmoid_()\n            )\n\n            y = cls_preds_.sqrt_()\n            pair_wise_cls_loss = F.binary_cross_entropy_with_logits(\n               torch.log(y/(1-y)) , gt_cls_per_image, 
reduction=\"none\"\n            ).sum(-1)\n            del cls_preds_\n        \n            cost = (\n                pair_wise_cls_loss\n                + 3.0 * pair_wise_iou_loss\n            )\n\n            matching_matrix = torch.zeros_like(cost, device=device)\n\n            for gt_idx in range(num_gt):\n                _, pos_idx = torch.topk(\n                    cost[gt_idx], k=dynamic_ks[gt_idx].item(), largest=False\n                )\n                matching_matrix[gt_idx][pos_idx] = 1.0\n\n            del top_k, dynamic_ks\n            anchor_matching_gt = matching_matrix.sum(0)\n            if (anchor_matching_gt > 1).sum() > 0:\n                _, cost_argmin = torch.min(cost[:, anchor_matching_gt > 1], dim=0)\n                matching_matrix[:, anchor_matching_gt > 1] *= 0.0\n                matching_matrix[cost_argmin, anchor_matching_gt > 1] = 1.0\n            fg_mask_inboxes = (matching_matrix.sum(0) > 0.0).to(device)\n            matched_gt_inds = matching_matrix[:, fg_mask_inboxes].argmax(0)\n        \n            from_which_layer = from_which_layer[fg_mask_inboxes]\n            all_b = all_b[fg_mask_inboxes]\n            all_a = all_a[fg_mask_inboxes]\n            all_gj = all_gj[fg_mask_inboxes]\n            all_gi = all_gi[fg_mask_inboxes]\n            all_anch = all_anch[fg_mask_inboxes]\n        \n            this_target = this_target[matched_gt_inds]\n        \n            for i in range(nl):\n                layer_idx = from_which_layer == i\n                matching_bs[i].append(all_b[layer_idx])\n                matching_as[i].append(all_a[layer_idx])\n                matching_gjs[i].append(all_gj[layer_idx])\n                matching_gis[i].append(all_gi[layer_idx])\n                matching_targets[i].append(this_target[layer_idx])\n                matching_anchs[i].append(all_anch[layer_idx])\n\n        for i in range(nl):\n            if matching_targets[i] != []:\n                matching_bs[i] = torch.cat(matching_bs[i], 
dim=0)\n                matching_as[i] = torch.cat(matching_as[i], dim=0)\n                matching_gjs[i] = torch.cat(matching_gjs[i], dim=0)\n                matching_gis[i] = torch.cat(matching_gis[i], dim=0)\n                matching_targets[i] = torch.cat(matching_targets[i], dim=0)\n                matching_anchs[i] = torch.cat(matching_anchs[i], dim=0)\n            else:\n                # No matches on this layer: return empty tensors on the active device instead of a hard-coded 'cuda:0'\n                matching_bs[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_as[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_gjs[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_gis[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_targets[i] = torch.tensor([], device=device, dtype=torch.int64)\n                matching_anchs[i] = torch.tensor([], device=device, dtype=torch.int64)\n\n        return matching_bs, matching_as, matching_gjs, matching_gis, matching_targets, matching_anchs\n\n    def find_3_positive(self, p, targets):\n        # Build targets for compute_loss(), input targets(image,class,x,y,w,h)\n        na, nt = self.na, targets.shape[0]  # number of anchors, targets\n        indices, anch = [], []\n        gain = torch.ones(7, device=targets.device).long()  # normalized to gridspace gain\n        ai = torch.arange(na, device=targets.device).float().view(na, 1).repeat(1, nt)  # same as .repeat_interleave(nt)\n        targets = torch.cat((targets.repeat(na, 1, 1), ai[:, :, None]), 2)  # append anchor indices\n\n        g = 0.5  # bias\n        off = torch.tensor([[0, 0],\n                            [1, 0], [0, 1], [-1, 0], [0, -1],  # j,k,l,m\n                            # [1, 1], [1, -1], [-1, 1], [-1, -1],  # jk,jm,lk,lm\n                            ], device=targets.device).float() * g  # offsets\n\n        for i in range(self.nl):\n            anchors = self.anchors[i]\n            gain[2:6] = 
torch.tensor(p[i].shape)[[3, 2, 3, 2]]  # xyxy gain\n\n            # Match targets to anchors\n            t = targets * gain\n            if nt:\n                # Matches\n                r = t[:, :, 4:6] / anchors[:, None]  # wh ratio\n                j = torch.max(r, 1. / r).max(2)[0] < self.hyp['anchor_t']  # compare\n                # j = wh_iou(anchors, t[:, 4:6]) > model.hyp['iou_t']  # iou(3,n)=wh_iou(anchors(3,2), gwh(n,2))\n                t = t[j]  # filter\n\n                # Offsets\n                gxy = t[:, 2:4]  # grid xy\n                gxi = gain[[2, 3]] - gxy  # inverse\n                j, k = ((gxy % 1. < g) & (gxy > 1.)).T\n                l, m = ((gxi % 1. < g) & (gxi > 1.)).T\n                j = torch.stack((torch.ones_like(j), j, k, l, m))\n                t = t.repeat((5, 1, 1))[j]\n                offsets = (torch.zeros_like(gxy)[None] + off[:, None])[j]\n            else:\n                t = targets[0]\n                offsets = 0\n\n            # Define\n            b, c = t[:, :2].long().T  # image, class\n            gxy = t[:, 2:4]  # grid xy\n            gwh = t[:, 4:6]  # grid wh\n            gij = (gxy - offsets).long()\n            gi, gj = gij.T  # grid xy indices\n\n            # Append\n            a = t[:, 6].long()  # anchor indices\n            indices.append((b, a, gj.clamp_(0, gain[3] - 1), gi.clamp_(0, gain[2] - 1)))  # image, anchor, grid indices\n            anch.append(anchors[a])  # anchors\n\n        return indices, anch"
  },
  {
    "path": "yolo-improve/yolov5-RepNCSPELAN.py",
    "content": "class RepConvN(nn.Module):\n    \"\"\"RepConv is a basic rep-style block, including training and deploy status\n    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py\n    \"\"\"\n    default_act = nn.SiLU()  # default activation\n\n    def __init__(self, c1, c2, k=3, s=1, p=1, g=1, d=1, act=True, bn=False, deploy=False):\n        super().__init__()\n        assert k == 3 and p == 1\n        self.g = g\n        self.c1 = c1\n        self.c2 = c2\n        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()\n\n        self.bn = None\n        self.conv1 = Conv(c1, c2, k, s, p=p, g=g, act=False)\n        self.conv2 = Conv(c1, c2, 1, s, p=(p - k // 2), g=g, act=False)\n\n    def forward_fuse(self, x):\n        \"\"\"Forward process\"\"\"\n        return self.act(self.conv(x))\n\n    def forward(self, x):\n        \"\"\"Forward process\"\"\"\n        id_out = 0 if self.bn is None else self.bn(x)\n        return self.act(self.conv1(x) + self.conv2(x) + id_out)\n\n    def get_equivalent_kernel_bias(self):\n        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1)\n        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2)\n        kernelid, biasid = self._fuse_bn_tensor(self.bn)\n        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid\n\n    def _avg_to_3x3_tensor(self, avgp):\n        channels = self.c1\n        groups = self.g\n        kernel_size = avgp.kernel_size\n        input_dim = channels // groups\n        k = torch.zeros((channels, input_dim, kernel_size, kernel_size))\n        k[np.arange(channels), np.tile(np.arange(input_dim), groups), :, :] = 1.0 / kernel_size ** 2\n        return k\n\n    def _pad_1x1_to_3x3_tensor(self, kernel1x1):\n        if kernel1x1 is None:\n            return 0\n        else:\n            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])\n\n    def _fuse_bn_tensor(self, branch):\n   
     if branch is None:\n            return 0, 0\n        if isinstance(branch, Conv):\n            kernel = branch.conv.weight\n            running_mean = branch.bn.running_mean\n            running_var = branch.bn.running_var\n            gamma = branch.bn.weight\n            beta = branch.bn.bias\n            eps = branch.bn.eps\n        elif isinstance(branch, nn.BatchNorm2d):\n            if not hasattr(self, 'id_tensor'):\n                input_dim = self.c1 // self.g\n                kernel_value = np.zeros((self.c1, input_dim, 3, 3), dtype=np.float32)\n                for i in range(self.c1):\n                    kernel_value[i, i % input_dim, 1, 1] = 1\n                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)\n            kernel = self.id_tensor\n            running_mean = branch.running_mean\n            running_var = branch.running_var\n            gamma = branch.weight\n            beta = branch.bias\n            eps = branch.eps\n        std = (running_var + eps).sqrt()\n        t = (gamma / std).reshape(-1, 1, 1, 1)\n        return kernel * t, beta - running_mean * gamma / std\n\n    def fuse_convs(self):\n        if hasattr(self, 'conv'):\n            return\n        kernel, bias = self.get_equivalent_kernel_bias()\n        self.conv = nn.Conv2d(in_channels=self.conv1.conv.in_channels,\n                              out_channels=self.conv1.conv.out_channels,\n                              kernel_size=self.conv1.conv.kernel_size,\n                              stride=self.conv1.conv.stride,\n                              padding=self.conv1.conv.padding,\n                              dilation=self.conv1.conv.dilation,\n                              groups=self.conv1.conv.groups,\n                              bias=True).requires_grad_(False)\n        self.conv.weight.data = kernel\n        self.conv.bias.data = bias\n        for para in self.parameters():\n            para.detach_()\n        self.__delattr__('conv1')\n  
      self.__delattr__('conv2')\n        if hasattr(self, 'nm'):\n            self.__delattr__('nm')\n        if hasattr(self, 'bn'):\n            self.__delattr__('bn')\n        if hasattr(self, 'id_tensor'):\n            self.__delattr__('id_tensor')\n\nclass RepNBottleneck(nn.Module):\n    # Standard bottleneck\n    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = RepConvN(c1, c_, k[0], 1)\n        self.cv2 = Conv(c_, c2, k[1], 1, g=g)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\nclass RepNCSP(nn.Module):\n    # CSP Bottleneck with 3 convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = Conv(c1, c_, 1, 1)\n        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)\n        self.m = nn.Sequential(*(RepNBottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))\n\n    def forward(self, x):\n        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))\n\nclass RepNCSPELAN4(nn.Module):\n    # csp-elan\n    def __init__(self, c1, c2, c3, c4, c5=1):  # ch_in, ch_out, split width, RepNCSP width, RepNCSP repeats\n        super().__init__()\n        self.c = c3//2\n        self.cv1 = Conv(c1, c3, 1, 1)\n        self.cv2 = nn.Sequential(RepNCSP(c3//2, c4, c5), Conv(c4, c4, 3, 1))\n        self.cv3 = nn.Sequential(RepNCSP(c4, c4, c5), Conv(c4, c4, 3, 1))\n        self.cv4 = Conv(c3+(2*c4), c2, 1, 1)\n\n    def forward(self, x):\n        y = list(self.cv1(x).chunk(2, 1))\n        y.extend((m(y[-1])) for m in [self.cv2, self.cv3])\n        return self.cv4(torch.cat(y, 1))\n\n    def 
forward_split(self, x):\n        y = list(self.cv1(x).split((self.c, self.c), 1))\n        y.extend(m(y[-1]) for m in [self.cv2, self.cv3])\n        return self.cv4(torch.cat(y, 1))\n\n# ------------------------------------yolo.py------------------------------------\nif m in (RepNCSPELAN4,):\n    args[2] = make_divisible(args[2] * gw, ch_mul)\n    args[3] = make_divisible(args[3] * gw, ch_mul)\n\nif hasattr(m, 'fuse_convs'):\n    m.fuse_convs()\n    m.forward = m.forward_fuse\n\n# ------------------------------------yaml------------------------------------\n# YOLOv5 🚀 by Ultralytics, AGPL-3.0 license\n\n# Parameters\nnc: 80 # number of classes\ndepth_multiple: 0.33 # model depth multiple\nwidth_multiple: 0.25 # layer channel multiple\nanchors:\n  - [10, 13, 16, 30, 33, 23] # P3/8\n  - [30, 61, 62, 45, 59, 119] # P4/16\n  - [116, 90, 156, 198, 373, 326] # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [\n    [-1, 1, Conv, [64, 6, 2, 2]], # 0-P1/2\n    [-1, 1, Conv, [128, 3, 2]], # 1-P2/4\n    [-1, 1, RepNCSPELAN4, [128, 64, 32, 1]],\n    [-1, 1, Conv, [256, 3, 2]], # 3-P3/8\n    [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]],\n    [-1, 1, Conv, [512, 3, 2]], # 5-P4/16\n    [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]],\n    [-1, 1, Conv, [1024, 3, 2]], # 7-P5/32\n    [-1, 1, RepNCSPELAN4, [1024, 512, 256, 1]],\n    [-1, 1, SPPF, [1024, 5]], # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead: [\n    [-1, 1, Conv, [512, 1, 1]],\n    [-1, 1, nn.Upsample, [None, 2, \"nearest\"]],\n    [[-1, 6], 1, Concat, [1]], # cat backbone P4\n    [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 13\n\n    [-1, 1, Conv, [256, 1, 1]],\n    [-1, 1, nn.Upsample, [None, 2, \"nearest\"]],\n    [[-1, 4], 1, Concat, [1]], # cat backbone P3\n    [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 17 (P3/8-small)\n\n    [-1, 1, Conv, [256, 3, 2]],\n    [[-1, 14], 1, Concat, [1]], # cat head P4\n    [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 20 (P4/16-medium)\n\n    [-1, 1, Conv, [512, 3, 2]],\n 
   [[-1, 10], 1, Concat, [1]], # cat head P5\n    [-1, 1, RepNCSPELAN4, [1024, 512, 256, 1]], # 23 (P5/32-large)\n\n    [[17, 20, 23], 1, Detect, [nc, anchors]], # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-SAConv.py",
    "content": "class ConvAWS2d(nn.Conv2d):\n    def __init__(self,\n                 in_channels,\n                 out_channels,\n                 kernel_size,\n                 stride=1,\n                 padding=0,\n                 dilation=1,\n                 groups=1,\n                 bias=True):\n        super().__init__(\n            in_channels,\n            out_channels,\n            kernel_size,\n            stride=stride,\n            padding=padding,\n            dilation=dilation,\n            groups=groups,\n            bias=bias)\n        self.register_buffer('weight_gamma', torch.ones(self.out_channels, 1, 1, 1))\n        self.register_buffer('weight_beta', torch.zeros(self.out_channels, 1, 1, 1))\n\n    def _get_weight(self, weight):\n        weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2,\n                                  keepdim=True).mean(dim=3, keepdim=True)\n        weight = weight - weight_mean\n        std = torch.sqrt(weight.view(weight.size(0), -1).var(dim=1) + 1e-5).view(-1, 1, 1, 1)\n        weight = weight / std\n        weight = self.weight_gamma * weight + self.weight_beta\n        return weight\n\n    def forward(self, x):\n        weight = self._get_weight(self.weight)\n        return super()._conv_forward(x, weight, None)\n\n    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,\n                              missing_keys, unexpected_keys, error_msgs):\n        self.weight_gamma.data.fill_(-1)\n        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,\n                                      missing_keys, unexpected_keys, error_msgs)\n        if self.weight_gamma.data.mean() > 0:\n            return\n        weight = self.weight.data\n        weight_mean = weight.data.mean(dim=1, keepdim=True).mean(dim=2,\n                                       keepdim=True).mean(dim=3, keepdim=True)\n        self.weight_beta.data.copy_(weight_mean)\n        std = 
torch.sqrt(weight.view(weight.size(0), -1).var(dim=1) + 1e-5).view(-1, 1, 1, 1)\n        self.weight_gamma.data.copy_(std)\n    \nclass SAConv2d(ConvAWS2d):\n    def __init__(self,\n                 in_channels,\n                 out_channels,\n                 kernel_size,\n                 s=1,\n                 p=None,\n                 g=1,\n                 d=1,\n                 act=True,\n                 bias=True):\n        super().__init__(\n            in_channels,\n            out_channels,\n            kernel_size,\n            stride=s,\n            padding=autopad(kernel_size, p, d),\n            dilation=d,\n            groups=g,\n            bias=bias)\n        self.switch = torch.nn.Conv2d(\n            self.in_channels,\n            1,\n            kernel_size=1,\n            stride=s,\n            bias=True)\n        self.switch.weight.data.fill_(0)\n        self.switch.bias.data.fill_(1)\n        self.weight_diff = torch.nn.Parameter(torch.Tensor(self.weight.size()))\n        self.weight_diff.data.zero_()\n        self.pre_context = torch.nn.Conv2d(\n            self.in_channels,\n            self.in_channels,\n            kernel_size=1,\n            bias=True)\n        self.pre_context.weight.data.fill_(0)\n        self.pre_context.bias.data.fill_(0)\n        self.post_context = torch.nn.Conv2d(\n            self.out_channels,\n            self.out_channels,\n            kernel_size=1,\n            bias=True)\n        self.post_context.weight.data.fill_(0)\n        self.post_context.bias.data.fill_(0)\n        \n        self.bn = nn.BatchNorm2d(out_channels)\n        self.act = Conv.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()\n\n    def forward(self, x):\n        # pre-context\n        avg_x = torch.nn.functional.adaptive_avg_pool2d(x, output_size=1)\n        avg_x = self.pre_context(avg_x)\n        avg_x = avg_x.expand_as(x)\n        x = x + avg_x\n        # switch\n        avg_x = 
torch.nn.functional.pad(x, pad=(2, 2, 2, 2), mode=\"reflect\")\n        avg_x = torch.nn.functional.avg_pool2d(avg_x, kernel_size=5, stride=1, padding=0)\n        switch = self.switch(avg_x)\n        # sac\n        weight = self._get_weight(self.weight)\n        out_s = super()._conv_forward(x, weight, None)\n        ori_p = self.padding\n        ori_d = self.dilation\n        self.padding = tuple(3 * p for p in self.padding)\n        self.dilation = tuple(3 * d for d in self.dilation)\n        weight = weight + self.weight_diff\n        out_l = super()._conv_forward(x, weight, None)\n        out = switch * out_s + (1 - switch) * out_l\n        self.padding = ori_p\n        self.dilation = ori_d\n        # post-context\n        avg_x = torch.nn.functional.adaptive_avg_pool2d(out, output_size=1)\n        avg_x = self.post_context(avg_x)\n        avg_x = avg_x.expand_as(out)\n        out = out + avg_x\n        return self.act(self.bn(out))\n\nclass Bottleneck_SAC(nn.Module):\n    # Standard bottleneck\n    def __init__(self, c1, c2, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = SAConv2d(c_, c2, 3, 1, g=g)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\nclass C3_SAC(C3):\n    # CSP Bottleneck with 3 convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)  # hidden channels\n        self.m = nn.Sequential(*(Bottleneck_SAC(c_, c_, shortcut, g, e=1.0) for _ in range(n)))"
  },
  {
    "path": "yolo-improve/yolov5-TSCODE.py",
    "content": "\nfrom einops import rearrange\nclass TSCODE_Detect(nn.Module):\n    # YOLOv5 Detect head for detection models\n    stride = None  # strides computed during build\n    dynamic = False  # force grid reconstruction\n    export = False  # export mode\n\n    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer\n        super().__init__()\n        self.nc = nc  # number of classes\n        self.no = nc + 5  # number of outputs per anchor\n        self.nl = len(anchors)  # number of detection layers\n        self.na = len(anchors[0]) // 2  # number of anchors\n        self.grid = [torch.empty(0) for _ in range(self.nl)]  # init grid\n        self.anchor_grid = [torch.empty(0) for _ in range(self.nl)]  # init anchor grid\n        self.register_buffer('anchors', torch.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)\n        self.m_sce = nn.ModuleList(SCE(ch[id:id+2]) for id in range(1, len(ch) - 1))\n        self.m_dpe = nn.ModuleList(DPE(ch[id-1:id+2], ch[id]) for id in range(1, len(ch) - 1))\n        \n        self.m_cls = nn.ModuleList(nn.Sequential(Conv(sum(ch[id:id+2]), ch[id], 1), Conv(ch[id], ch[id], 3), nn.Conv2d(ch[id], self.na * self.nc * 4, 1)) for id in range(1, len(ch) - 1))  # cls conv\n        self.m_reg_conf = nn.ModuleList(nn.Sequential(*[Conv(ch[id], ch[id], 3) for i in range(2)]) for id in range(1, len(ch) - 1))  # reg_conf stem conv\n        self.m_reg = nn.ModuleList(nn.Conv2d(ch[id], self.na * 4, 1) for id in range(1, len(ch) - 1))  # reg conv\n        self.m_conf = nn.ModuleList(nn.Conv2d(ch[id], self.na * 1, 1) for id in range(1, len(ch) - 1))  # conf conv\n        self.ph, self.pw = 2, 2\n        \n        self.inplace = inplace  # use inplace ops (e.g. 
slice assignment)\n\n    def forward(self, x_):\n        x, z = [], []  # inference output\n        for i, idx in enumerate(range(1, self.nl + 1)):\n            bs, _, ny, nx = x_[idx].shape\n            \n            x_sce, x_dpe = self.m_sce[i](x_[idx:idx+2]), self.m_dpe[i](x_[idx-1:idx+2])\n            # m_cls emits na * nc * (ph * pw) channels, so the leading group is the anchor count (na), not the layer count (nl)\n            x_cls = rearrange(self.m_cls[i](x_sce), 'bs (na ph pw nc) h w -> bs na nc (h ph) (w pw)', na=self.na, ph=self.ph, pw=self.pw, nc=self.nc)\n            x_cls = x_cls.permute(0, 1, 3, 4, 2).contiguous()\n            \n            x_reg_conf = self.m_reg_conf[i](x_dpe)\n            x_reg = self.m_reg[i](x_reg_conf).view(bs, self.na, 4, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x_conf = self.m_conf[i](x_reg_conf).view(bs, self.na, 1, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x.append(torch.cat([x_reg, x_conf, x_cls], dim=4))\n        \n            if not self.training:  # inference\n                if self.dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:\n                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n\n                if isinstance(self, Segment):  # (boxes + masks)\n                    xy, wh, conf, mask = x[i].split((2, 2, self.nc + 1, self.no - self.nc - 5), 4)\n                    xy = (xy.sigmoid() * 2 + self.grid[i]) * self.stride[i]  # xy\n                    wh = (wh.sigmoid() * 2) ** 2 * self.anchor_grid[i]  # wh\n                    y = torch.cat((xy, wh, conf.sigmoid(), mask), 4)\n                else:  # Detect (boxes only)\n                    xy, wh, conf = x[i].sigmoid().split((2, 2, self.nc + 1), 4)\n                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy\n                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh\n                    y = torch.cat((xy, wh, conf), 4)\n                z.append(y.view(bs, self.na * nx * ny, self.no))\n\n        return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x)\n\n    def 
_make_grid(self, nx=20, ny=20, i=0, torch_1_10=check_version(torch.__version__, '1.10.0')):\n        d = self.anchors[i].device\n        t = self.anchors[i].dtype\n        shape = 1, self.na, ny, nx, 2  # grid shape\n        y, x = torch.arange(ny, device=d, dtype=t), torch.arange(nx, device=d, dtype=t)\n        yv, xv = torch.meshgrid(y, x, indexing='ij') if torch_1_10 else torch.meshgrid(y, x)  # torch>=0.7 compatibility\n        grid = torch.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5\n        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)\n        return grid, anchor_grid\n    \nclass Decoupled_Detect(nn.Module):\n    # YOLOv5 Detect head for detection models\n    stride = None  # strides computed during build\n    dynamic = False  # force grid reconstruction\n    export = False  # export mode\n\n    def __init__(self, nc=80, anchors=(), ch=(), inplace=True):  # detection layer\n        super().__init__()\n        self.nc = nc  # number of classes\n        self.no = nc + 5  # number of outputs per anchor\n        self.nl = len(anchors)  # number of detection layers\n        self.na = len(anchors[0]) // 2  # number of anchors\n        self.grid = [torch.empty(0) for _ in range(self.nl)]  # init grid\n        self.anchor_grid = [torch.empty(0) for _ in range(self.nl)]  # init anchor grid\n        self.register_buffer('anchors', torch.tensor(anchors).float().view(self.nl, -1, 2))  # shape(nl,na,2)\n        \n        self.m_stem = nn.ModuleList(Conv(x, x, 1) for x in ch)  # stem conv\n        self.m_cls = nn.ModuleList(nn.Sequential(Conv(x, x, 3), nn.Conv2d(x, self.na * self.nc, 1)) for x in ch)  # cls conv\n        self.m_reg_conf = nn.ModuleList(Conv(x, x, 3) for x in ch)  # reg_conf stem conv\n        self.m_reg = nn.ModuleList(nn.Conv2d(x, self.na * 4, 1) for x in ch)  # reg conv\n        self.m_conf = nn.ModuleList(nn.Conv2d(x, self.na * 1, 1) for x in ch)  # conf conv\n        
\n        self.inplace = inplace  # use inplace ops (e.g. slice assignment)\n\n    def forward(self, x):\n        z = []  # inference output\n        for i in range(self.nl):\n            x[i] = self.m_stem[i](x[i])  # conv\n            \n            bs, _, ny, nx = x[i].shape\n            x_cls = self.m_cls[i](x[i]).view(bs, self.na, self.nc, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x_reg_conf = self.m_reg_conf[i](x[i])\n            x_reg = self.m_reg[i](x_reg_conf).view(bs, self.na, 4, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x_conf = self.m_conf[i](x_reg_conf).view(bs, self.na, 1, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x[i] = torch.cat([x_reg, x_conf, x_cls], dim=4)\n\n            if not self.training:  # inference\n                if self.dynamic or self.grid[i].shape[2:4] != x[i].shape[2:4]:\n                    self.grid[i], self.anchor_grid[i] = self._make_grid(nx, ny, i)\n\n                if isinstance(self, Segment):  # (boxes + masks)\n                    xy, wh, conf, mask = x[i].split((2, 2, self.nc + 1, self.no - self.nc - 5), 4)\n                    xy = (xy.sigmoid() * 2 + self.grid[i]) * self.stride[i]  # xy\n                    wh = (wh.sigmoid() * 2) ** 2 * self.anchor_grid[i]  # wh\n                    y = torch.cat((xy, wh, conf.sigmoid(), mask), 4)\n                else:  # Detect (boxes only)\n                    xy, wh, conf = x[i].sigmoid().split((2, 2, self.nc + 1), 4)\n                    xy = (xy * 2 + self.grid[i]) * self.stride[i]  # xy\n                    wh = (wh * 2) ** 2 * self.anchor_grid[i]  # wh\n                    y = torch.cat((xy, wh, conf), 4)\n                z.append(y.view(bs, self.na * nx * ny, self.no))\n\n        return x if self.training else (torch.cat(z, 1),) if self.export else (torch.cat(z, 1), x)\n\n    def _make_grid(self, nx=20, ny=20, i=0, torch_1_10=check_version(torch.__version__, '1.10.0')):\n        d = self.anchors[i].device\n        t = 
self.anchors[i].dtype\n        shape = 1, self.na, ny, nx, 2  # grid shape\n        y, x = torch.arange(ny, device=d, dtype=t), torch.arange(nx, device=d, dtype=t)\n        yv, xv = torch.meshgrid(y, x, indexing='ij') if torch_1_10 else torch.meshgrid(y, x)  # torch>=0.7 compatibility\n        grid = torch.stack((xv, yv), 2).expand(shape) - 0.5  # add grid offset, i.e. y = 2.0 * x - 0.5\n        anchor_grid = (self.anchors[i] * self.stride[i]).view((1, self.na, 1, 1, 2)).expand(shape)\n        return grid, anchor_grid\n\ndef _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency\n    # https://arxiv.org/abs/1708.02002 section 3.3\n    # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.\n    m = self.model[-1]  # Detect() module\n    \n    if isinstance(m, Detect):\n        for mi, s in zip(m.m, m.stride):  # from\n            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)\n            b.data[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)\n            b.data[:, 5:5 + m.nc] += math.log(0.6 / (m.nc - 0.99999)) if cf is None else torch.log(cf / cf.sum())  # cls\n            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)\n    elif isinstance(m, Decoupled_Detect) or isinstance(m, TSCODE_Detect):\n        for mi, s in zip(m.m_conf, m.stride):  # from\n            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)\n            b.data += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)\n            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)\n\n        for mi, s in zip(m.m_cls, m.stride):  # from\n            b = mi[-1].bias.view(m.na, -1)  # conv.bias(255) to (3,85)\n            b.data += math.log(0.6 / (m.nc - 0.99999)) if cf is None else torch.log(cf / cf.sum())  # cls\n            mi[-1].bias = torch.nn.Parameter(b.view(-1), requires_grad=True)\n\n### Task-Specific Context Decoupling for Object 
Detection\n\nclass SCE(nn.Module):\n    def __init__(self, c1):\n        super().__init__()\n        self.down = Conv(c1[0], c1[0], k=3, s=2)\n        \n    def forward(self, x):\n        x_p1, x_p2 = x\n        x = torch.concat([self.down(x_p1), x_p2], dim=1)\n        return x\n\nclass DPE(nn.Module):\n    def __init__(self, c1, c2):\n        super().__init__()\n        self.adjust_channel_forp1 = Conv(c1[0], c2, k=1)\n        self.adjust_channel_forp2 = Conv(c1[1], c2, k=1)\n        \n        self.up_forp2 = nn.Sequential(\n            nn.Upsample(scale_factor=2),\n            Conv(c2, c2, k=1)\n        )\n        self.up_forp3 = nn.Sequential(\n            nn.Upsample(scale_factor=2),\n            Conv(c1[2], c2, k=1)\n        )\n        self.down = Conv(c2, c2, k=3, s=2)\n        self.middle = Conv(c2, c2, k=1)\n        \n    def forward(self, x):\n        x_p2 = self.adjust_channel_forp2(x[1])\n        x_p1 = self.adjust_channel_forp1(x[0]) + self.up_forp2(x_p2)\n        x_p1 = self.down(x_p1)\n        \n        x_p3 = self.up_forp3(x[2])\n        \n        return x_p1 + x_p2 + x_p3\n\n#### yolov5-FPN-TSCODE\n# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]], # 10\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 11\n   [[-1, 6], 1, 
Concat, [1]],  # cat backbone P4 12\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]], # 14\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 15\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3 16\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [9, 1, Conv, [1024, 3, 2]], # 18-P6/64\n   [-1, 3, C3, [1024]], # 19\n\n   [[2, 17, 13, 10, 19], 1, TSCODE_Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n\n\n#### yolov5-PFPN-TSCODE\n# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]], # 10\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 11\n   [[-1, 6], 1, Concat, [1]],  # cat backbone P4 12\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]], # 14\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 15\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P3 16\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]], # 18\n   [[-1, 14], 1, Concat, [1]],  # cat head P4 19\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]], # 21\n   [[-1, 10], 1, Concat, [1]],  # cat head P5 # 22\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [9, 1, Conv, [1024, 3, 2]], # 24-P6/64\n   [-1, 3, C3, [1024]], # 25\n\n   [[2, 17, 20, 23, 25], 1, TSCODE_Detect, 
[nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-aLRPLoss.py",
    "content": "class aLRPLoss(torch.autograd.Function):\n    @staticmethod\n    def forward(ctx, logits, targets, regression_losses, delta=1., eps=1e-5): \n        classification_grads=torch.zeros(logits.shape).cuda()\n        \n        #Filter fg logits\n        fg_labels = (targets == 1)\n        fg_logits = logits[fg_labels]\n        fg_num = len(fg_logits)\n\n        #Do not use bg with scores less than minimum fg logit\n        #since changing its score does not have an effect on precision\n        threshold_logit = torch.min(fg_logits)-delta\n\n        #Get valid bg logits\n        relevant_bg_labels=((targets==0)&(logits>=threshold_logit))\n        relevant_bg_logits=logits[relevant_bg_labels] \n        relevant_bg_grad=torch.zeros(len(relevant_bg_logits)).cuda()\n        rank=torch.zeros(fg_num).cuda()\n        prec=torch.zeros(fg_num).cuda()\n        fg_grad=torch.zeros(fg_num).cuda()\n        \n        max_prec=0                                           \n        #sort the fg logits\n        order=torch.argsort(fg_logits)\n        #Loops over each positive following the order\n        for ii in order:\n            #x_ij s as score differences with fgs\n            fg_relations=fg_logits-fg_logits[ii] \n            #Apply piecewise linear function and determine relations with fgs\n            fg_relations=torch.clamp(fg_relations/(2*delta)+0.5,min=0,max=1)\n            #Discard i=j in the summation in rank_pos\n            fg_relations[ii]=0\n\n            #x_ij s as score differences with bgs\n            bg_relations=relevant_bg_logits-fg_logits[ii]\n            #Apply piecewise linear function and determine relations with bgs\n            bg_relations=torch.clamp(bg_relations/(2*delta)+0.5,min=0,max=1)\n\n            #Compute the rank of the example within fgs and number of bgs with larger scores\n            rank_pos=1+torch.sum(fg_relations)\n            FP_num=torch.sum(bg_relations)\n            #Store the total since it is normalizer also for 
aLRP Regression error\n            rank[ii]=rank_pos+FP_num\n                            \n            #Compute precision for this example to compute classification loss \n            prec[ii]=rank_pos/rank[ii]                \n            #For stability, set eps to an infinitesimal value (e.g. 1e-6), then compute grads\n            if FP_num > eps:   \n                fg_grad[ii] = -(torch.sum(fg_relations*regression_losses)+FP_num)/rank[ii]\n                relevant_bg_grad += (bg_relations*(-fg_grad[ii]/FP_num))   \n                    \n        #aLRP with grad formulation fg gradient\n        classification_grads[fg_labels]= fg_grad\n        #aLRP with grad formulation bg gradient\n        classification_grads[relevant_bg_labels]= relevant_bg_grad \n \n        classification_grads /= (fg_num)\n    \n        cls_loss=1-prec.mean()\n        ctx.save_for_backward(classification_grads)\n\n        return cls_loss, rank, order\n\n    @staticmethod\n    def backward(ctx, out_grad1, out_grad2, out_grad3):\n        g1, =ctx.saved_tensors\n        return g1*out_grad1, None, None, None, None\n\n# init\nself.aLRP_Loss = aLRPLoss()\nself.SB_weight = 50\nself.period = 3665\nself.cls_LRP_hist = collections.deque(maxlen=self.period)\nself.reg_LRP_hist = collections.deque(maxlen=self.period)\nself.counter = 0\n\n# __call__\ndef __call__(self, p, targets):  # predictions, targets\n    lcls = torch.zeros(1, device=self.device)  # class loss\n    lbox = torch.zeros(1, device=self.device)  # box loss\n    lobj = torch.zeros(1, device=self.device)  # object loss\n    tcls, tbox, indices, anchors = self.build_targets(p, targets)  # targets\n\n    # Losses\n    for i, pi in enumerate(p):  # layer index, layer predictions\n        b, a, gj, gi = indices[i]  # image, anchor, gridy, gridx\n        tobj = torch.zeros(pi.shape[:4], dtype=pi.dtype, device=self.device)  # target obj\n\n        n = b.shape[0]  # number of targets\n        if n:\n            # pxy, pwh, _, pcls = pi[b, a, gj, 
gi].tensor_split((2, 4, 5), dim=1)  # faster, requires torch 1.8.0\n            pxy, pwh, _, pcls = pi[b, a, gj, gi].split((2, 2, 1, self.nc), 1)  # target-subset of predictions\n\n            # Regression\n            pxy = pxy.sigmoid() * 2 - 0.5\n            pwh = (pwh.sigmoid() * 2) ** 2 * anchors[i]\n            pbox = torch.cat((pxy, pwh), 1)  # predicted box\n            iou = bbox_iou(pbox, tbox[i], CIoU=True).squeeze()  # iou(prediction, target)\n\n            # Classification\n            if self.nc > 1:  # cls loss (only if multiple classes)\n                t = torch.full_like(pcls, self.cn, device=self.device)  # targets\n                t[range(n), tcls[i]] = self.cp\n                # lcls += self.BCEcls(pcls, t)  # BCE\n                \n                lbox_temp = 1.0 - iou\n                losses_cls, rank, order = self.aLRP_Loss.apply(pcls.reshape(-1), t.reshape(-1), lbox_temp.detach())\n                ordered_losses_bbox = lbox_temp[order.detach()].flip(dims=[0])\n                losses_bbox = (torch.cumsum(ordered_losses_bbox,dim=0)/rank[order.detach()].detach().flip(dims=[0])).mean()\n                \n                self.cls_LRP_hist.append(float(losses_cls.item()))\n                self.reg_LRP_hist.append(float(losses_bbox.item()))\n                self.counter += 1\n                \n                if self.counter == self.period:\n                    self.SB_weight = (np.mean(self.reg_LRP_hist)+np.mean(self.cls_LRP_hist))/np.mean(self.reg_LRP_hist)\n                    self.cls_LRP_hist.clear()\n                    self.reg_LRP_hist.clear()\n                    self.counter=0\n                \n                lbox += losses_bbox * self.SB_weight  # iou loss\n                lcls += losses_cls\n            \n            # Objectness\n            iou = iou.detach().clamp(0).type(tobj.dtype)\n            if self.sort_obj_iou:\n                j = iou.argsort()\n                b, a, gj, gi, iou = b[j], a[j], gj[j], gi[j], iou[j]\n         
   if self.gr < 1:\n                iou = (1.0 - self.gr) + self.gr * iou\n            tobj[b, a, gj, gi] = iou  # iou ratio\n\n            # Append targets to text file\n            # with open('targets.txt', 'a') as file:\n            #     [file.write('%11.5g ' * 4 % tuple(x) + '\\n') for x in torch.cat((txy[i], twh[i]), 1)]\n\n        obji = self.BCEobj(pi[..., 4], tobj)\n        lobj += obji * self.balance[i]  # obj loss\n        if self.autobalance:\n            self.balance[i] = self.balance[i] * 0.9999 + 0.0001 / obji.detach().item()\n\n    if self.autobalance:\n        self.balance = [x / self.balance[self.ssi] for x in self.balance]\n    lbox *= self.hyp['box']\n    lobj *= self.hyp['obj']\n    lcls *= self.hyp['cls']\n    bs = tobj.shape[0]  # batch size\n\n    return (lbox + lobj + lcls) * bs, torch.cat((lbox, lobj, lcls)).detach()"
  },
  {
    "path": "yolo-improve/yolov5-asf.py",
    "content": "# common.py\nimport torch.nn.functional as F\nclass Zoom_cat(nn.Module):\n    def __init__(self):\n        super().__init__()\n\n    def forward(self, x):\n        \"\"\"l,m,s表示大中小三个尺度，最终会被整合到m这个尺度上\"\"\"\n        l, m, s = x[0], x[1], x[2]\n        tgt_size = m.shape[2:]\n        l = F.adaptive_max_pool2d(l, tgt_size) + F.adaptive_avg_pool2d(l, tgt_size)\n        s = F.interpolate(s, m.shape[2:], mode='nearest')\n        lms = torch.cat([l, m, s], dim=1)\n        return lms\n\nclass ScalSeq(nn.Module):\n    def __init__(self, inc, channel):\n        super(ScalSeq, self).__init__()\n        self.conv1 =  Conv(inc[1], channel,1)\n        self.conv2 =  Conv(inc[2], channel,1)\n        self.conv3d = nn.Conv3d(channel,channel,kernel_size=(1,1,1))\n        self.bn = nn.BatchNorm3d(channel)\n        self.act = nn.LeakyReLU(0.1)\n        self.pool_3d = nn.MaxPool3d(kernel_size=(3,1,1))\n\n    def forward(self, x):\n        p3, p4, p5 = x[0],x[1],x[2]\n        p4_2 = self.conv1(p4)\n        p4_2 = F.interpolate(p4_2, p3.size()[2:], mode='nearest')\n        p5_2 = self.conv2(p5)\n        p5_2 = F.interpolate(p5_2, p3.size()[2:], mode='nearest')\n        p3_3d = torch.unsqueeze(p3, -3)\n        p4_3d = torch.unsqueeze(p4_2, -3)\n        p5_3d = torch.unsqueeze(p5_2, -3)\n        combine = torch.cat([p3_3d,p4_3d,p5_3d],dim = 2)\n        conv_3d = self.conv3d(combine)\n        bn = self.bn(conv_3d)\n        act = self.act(bn)\n        x = self.pool_3d(act)\n        x = torch.squeeze(x, 2)\n        return x\n    \nclass Add(nn.Module):\n    # Concatenate a list of tensors along dimension\n    def __init__(self):\n        super().__init__()\n\n    def forward(self, x):\n        input1,input2 = x[0],x[1]\n        x = input1 + input2\n        return x\n\nclass channel_att(nn.Module):\n    def __init__(self, channel, b=1, gamma=2):\n        super(channel_att, self).__init__()\n        kernel_size = int(abs((math.log(channel, 2) + b) / gamma))\n        kernel_size = 
kernel_size if kernel_size % 2 else kernel_size + 1\n        \n        self.avg_pool = nn.AdaptiveAvgPool2d(1)\n        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size, padding=(kernel_size - 1) // 2, bias=False) \n        self.sigmoid = nn.Sigmoid()\n\n    def forward(self, x):\n        y = self.avg_pool(x)\n        y = y.squeeze(-1)\n        y = y.transpose(-1, -2)\n        y = self.conv(y).transpose(-1, -2).unsqueeze(-1)\n        y = self.sigmoid(y)\n        return x * y.expand_as(x)\n    \nclass local_att(nn.Module):\n    def __init__(self, channel, reduction=16):\n        super(local_att, self).__init__()\n        \n        self.conv_1x1 = nn.Conv2d(in_channels=channel, out_channels=channel//reduction, kernel_size=1, stride=1, bias=False)\n \n        self.relu   = nn.ReLU()\n        self.bn     = nn.BatchNorm2d(channel//reduction)\n \n        self.F_h = nn.Conv2d(in_channels=channel//reduction, out_channels=channel, kernel_size=1, stride=1, bias=False)\n        self.F_w = nn.Conv2d(in_channels=channel//reduction, out_channels=channel, kernel_size=1, stride=1, bias=False)\n \n        self.sigmoid_h = nn.Sigmoid()\n        self.sigmoid_w = nn.Sigmoid()\n \n    def forward(self, x):\n        _, _, h, w = x.size()\n        \n        x_h = torch.mean(x, dim = 3, keepdim = True).permute(0, 1, 3, 2)\n        x_w = torch.mean(x, dim = 2, keepdim = True)\n \n        x_cat_conv_relu = self.relu(self.bn(self.conv_1x1(torch.cat((x_h, x_w), 3))))\n \n        x_cat_conv_split_h, x_cat_conv_split_w = x_cat_conv_relu.split([h, w], 3)\n \n        s_h = self.sigmoid_h(self.F_h(x_cat_conv_split_h.permute(0, 1, 3, 2)))\n        s_w = self.sigmoid_w(self.F_w(x_cat_conv_split_w))\n \n        out = x * s_h.expand_as(x) * s_w.expand_as(x)\n        return out\n    \nclass attention_model(nn.Module):\n    # Channel attention on the first input, add the second input, then local attention\n    def __init__(self, ch = 256):\n        super().__init__()\n        self.channel_att = channel_att(ch)\n        self.local_att 
= local_att(ch)\n    def forward(self, x):\n        input1,input2 = x[0],x[1]\n        input1 = self.channel_att(input1)\n        x = input1 + input2\n        x = self.local_att(x)\n        return x\n\n# yolo.py\nelif m is Zoom_cat:\n    c2 = sum(ch[x] for x in f)\nelif m is Add:\n    c2 = ch[f[-1]]\nelif m is attention_model:\n    c2 = ch[f[-1]]\n    args = [c2]\nelif m is ScalSeq:\n    c1 = [ch[x] for x in f]\n    c2 = make_divisible(args[0] * gw, 8)\n    args = [c1, c2]\n\n\n# YOLOv5 🚀 by Ultralytics, AGPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2\n   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4\n   [-1, 3, C3, [128]],\n   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8\n   [-1, 6, C3, [256]],\n   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16\n   [-1, 9, C3, [512]],\n   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32\n   [-1, 3, C3, [1024]],\n   [-1, 1, SPPF, [1024, 5]],  # 9\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]], #10\n   [4, 1, Conv, [512, 1, 1]], #11\n   [[-1, 6, -2], 1, Zoom_cat, []],  # 12 cat backbone P4\n   [-1, 3, C3, [512, False]],  # 13\n\n   [-1, 1, Conv, [256, 1, 1]], #14\n   [2, 1, Conv, [256, 1, 1]], #15\n   [[-1, 4, -2], 1, Zoom_cat, []],  #16  cat backbone P3\n   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]], #18\n   [[-1, 14], 1, Concat, [1]],  #19 cat head P4\n   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]], #21\n   [[-1, 10], 1, Concat, [1]],  #22 cat head P5\n   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)\n\n   [[4, 6, 8], 1, ScalSeq, [256]], #24 args[inchane]\n   [[17, -1], 1, attention_model, []], #25\n\n   [[25, 20, 23], 1, Detect, 
[nc, anchors]],  # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov5-backbone/CVPR2023-EfficientViT/EfficientViT.py",
    "content": "# --------------------------------------------------------\n# EfficientViT Model Architecture for Downstream Tasks\n# Copyright (c) 2022 Microsoft\n# Written by: Xinyu Liu\n# --------------------------------------------------------\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.checkpoint as checkpoint\nimport itertools\n\nfrom timm.models.layers import SqueezeExcite\n\nimport numpy as np\nimport itertools\n\n__all__ = ['EfficientViT_M0', 'EfficientViT_M1', 'EfficientViT_M2', 'EfficientViT_M3', 'EfficientViT_M4', 'EfficientViT_M5']\n\nclass Conv2d_BN(torch.nn.Sequential):\n    def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1,\n                 groups=1, bn_weight_init=1, resolution=-10000):\n        super().__init__()\n        self.add_module('c', torch.nn.Conv2d(\n            a, b, ks, stride, pad, dilation, groups, bias=False))\n        self.add_module('bn', torch.nn.BatchNorm2d(b))\n        torch.nn.init.constant_(self.bn.weight, bn_weight_init)\n        torch.nn.init.constant_(self.bn.bias, 0)\n\n    @torch.no_grad()\n    def fuse(self):\n        c, bn = self._modules.values()\n        w = bn.weight / (bn.running_var + bn.eps)**0.5\n        w = c.weight * w[:, None, None, None]\n        b = bn.bias - bn.running_mean * bn.weight / \\\n            (bn.running_var + bn.eps)**0.5\n        m = torch.nn.Conv2d(w.size(1) * self.c.groups, w.size(\n            0), w.shape[2:], stride=self.c.stride, padding=self.c.padding, dilation=self.c.dilation, groups=self.c.groups)\n        m.weight.data.copy_(w)\n        m.bias.data.copy_(b)\n        return m\n\ndef replace_batchnorm(net):\n    for child_name, child in net.named_children():\n        if hasattr(child, 'fuse'):\n            setattr(net, child_name, child.fuse())\n        elif isinstance(child, torch.nn.BatchNorm2d):\n            setattr(net, child_name, torch.nn.Identity())\n        else:\n            replace_batchnorm(child)\n            \n\nclass 
PatchMerging(torch.nn.Module):\n    def __init__(self, dim, out_dim, input_resolution):\n        super().__init__()\n        hid_dim = int(dim * 4)\n        self.conv1 = Conv2d_BN(dim, hid_dim, 1, 1, 0, resolution=input_resolution)\n        self.act = torch.nn.ReLU()\n        self.conv2 = Conv2d_BN(hid_dim, hid_dim, 3, 2, 1, groups=hid_dim, resolution=input_resolution)\n        self.se = SqueezeExcite(hid_dim, .25)\n        self.conv3 = Conv2d_BN(hid_dim, out_dim, 1, 1, 0, resolution=input_resolution // 2)\n\n    def forward(self, x):\n        x = self.conv3(self.se(self.act(self.conv2(self.act(self.conv1(x))))))\n        return x\n\n\nclass Residual(torch.nn.Module):\n    def __init__(self, m, drop=0.):\n        super().__init__()\n        self.m = m\n        self.drop = drop\n\n    def forward(self, x):\n        if self.training and self.drop > 0:\n            return x + self.m(x) * torch.rand(x.size(0), 1, 1, 1,\n                                              device=x.device).ge_(self.drop).div(1 - self.drop).detach()\n        else:\n            return x + self.m(x)\n\n\nclass FFN(torch.nn.Module):\n    def __init__(self, ed, h, resolution):\n        super().__init__()\n        self.pw1 = Conv2d_BN(ed, h, resolution=resolution)\n        self.act = torch.nn.ReLU()\n        self.pw2 = Conv2d_BN(h, ed, bn_weight_init=0, resolution=resolution)\n\n    def forward(self, x):\n        x = self.pw2(self.act(self.pw1(x)))\n        return x\n\n\nclass CascadedGroupAttention(torch.nn.Module):\n    r\"\"\" Cascaded Group Attention.\n\n    Args:\n        dim (int): Number of input channels.\n        key_dim (int): The dimension for query and key.\n        num_heads (int): Number of attention heads.\n        attn_ratio (int): Multiplier for the query dim for value dimension.\n        resolution (int): Input resolution, correspond to the window size.\n        kernels (List[int]): The kernel size of the dw conv on query.\n    \"\"\"\n    def __init__(self, dim, key_dim, 
num_heads=8,\n                 attn_ratio=4,\n                 resolution=14,\n                 kernels=[5, 5, 5, 5],):\n        super().__init__()\n        self.num_heads = num_heads\n        self.scale = key_dim ** -0.5\n        self.key_dim = key_dim\n        self.d = int(attn_ratio * key_dim)\n        self.attn_ratio = attn_ratio\n\n        qkvs = []\n        dws = []\n        for i in range(num_heads):\n            qkvs.append(Conv2d_BN(dim // (num_heads), self.key_dim * 2 + self.d, resolution=resolution))\n            dws.append(Conv2d_BN(self.key_dim, self.key_dim, kernels[i], 1, kernels[i]//2, groups=self.key_dim, resolution=resolution))\n        self.qkvs = torch.nn.ModuleList(qkvs)\n        self.dws = torch.nn.ModuleList(dws)\n        self.proj = torch.nn.Sequential(torch.nn.ReLU(), Conv2d_BN(\n            self.d * num_heads, dim, bn_weight_init=0, resolution=resolution))\n\n        points = list(itertools.product(range(resolution), range(resolution)))\n        N = len(points)\n        attention_offsets = {}\n        idxs = []\n        for p1 in points:\n            for p2 in points:\n                offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))\n                if offset not in attention_offsets:\n                    attention_offsets[offset] = len(attention_offsets)\n                idxs.append(attention_offsets[offset])\n        self.attention_biases = torch.nn.Parameter(\n            torch.zeros(num_heads, len(attention_offsets)))\n        self.register_buffer('attention_bias_idxs',\n                             torch.LongTensor(idxs).view(N, N))\n\n    @torch.no_grad()\n    def train(self, mode=True):\n        super().train(mode)\n        if mode and hasattr(self, 'ab'):\n            del self.ab\n        else:\n            self.ab = self.attention_biases[:, self.attention_bias_idxs]\n\n    def forward(self, x):  # x (B,C,H,W)\n        B, C, H, W = x.shape\n        trainingab = self.attention_biases[:, self.attention_bias_idxs]\n        feats_in = 
x.chunk(len(self.qkvs), dim=1)\n        feats_out = []\n        feat = feats_in[0]\n        for i, qkv in enumerate(self.qkvs):\n            if i > 0: # add the previous output to the input\n                feat = feat + feats_in[i]\n            feat = qkv(feat)\n            q, k, v = feat.view(B, -1, H, W).split([self.key_dim, self.key_dim, self.d], dim=1) # B, C/h, H, W\n            q = self.dws[i](q)\n            q, k, v = q.flatten(2), k.flatten(2), v.flatten(2) # B, C/h, N\n            attn = (\n                (q.transpose(-2, -1) @ k) * self.scale\n                +\n                (trainingab[i] if self.training else self.ab[i])\n            )\n            attn = attn.softmax(dim=-1) # BNN\n            feat = (v @ attn.transpose(-2, -1)).view(B, self.d, H, W) # BCHW\n            feats_out.append(feat)\n        x = self.proj(torch.cat(feats_out, 1))\n        return x\n\n\nclass LocalWindowAttention(torch.nn.Module):\n    r\"\"\" Local Window Attention.\n\n    Args:\n        dim (int): Number of input channels.\n        key_dim (int): The dimension for query and key.\n        num_heads (int): Number of attention heads.\n        attn_ratio (int): Multiplier for the query dim for value dimension.\n        resolution (int): Input resolution.\n        window_resolution (int): Local window resolution.\n        kernels (List[int]): The kernel size of the dw conv on query.\n    \"\"\"\n    def __init__(self, dim, key_dim, num_heads=8,\n                 attn_ratio=4,\n                 resolution=14,\n                 window_resolution=7,\n                 kernels=[5, 5, 5, 5],):\n        super().__init__()\n        self.dim = dim\n        self.num_heads = num_heads\n        self.resolution = resolution\n        assert window_resolution > 0, 'window_size must be greater than 0'\n        self.window_resolution = window_resolution\n        \n        self.attn = CascadedGroupAttention(dim, key_dim, num_heads,\n                                attn_ratio=attn_ratio, \n    
                            resolution=window_resolution,\n                                kernels=kernels,)\n\n    def forward(self, x):\n        B, C, H, W = x.shape\n               \n        if H <= self.window_resolution and W <= self.window_resolution:\n            x = self.attn(x)\n        else:\n            x = x.permute(0, 2, 3, 1)\n            pad_b = (self.window_resolution - H %\n                     self.window_resolution) % self.window_resolution\n            pad_r = (self.window_resolution - W %\n                     self.window_resolution) % self.window_resolution\n            padding = pad_b > 0 or pad_r > 0\n\n            if padding:\n                x = torch.nn.functional.pad(x, (0, 0, 0, pad_r, 0, pad_b))\n\n            pH, pW = H + pad_b, W + pad_r\n            nH = pH // self.window_resolution\n            nW = pW // self.window_resolution\n            # window partition, BHWC -> B(nHh)(nWw)C -> BnHnWhwC -> (BnHnW)hwC -> (BnHnW)Chw\n            x = x.view(B, nH, self.window_resolution, nW, self.window_resolution, C).transpose(2, 3).reshape(\n                B * nH * nW, self.window_resolution, self.window_resolution, C\n            ).permute(0, 3, 1, 2)\n            x = self.attn(x)\n            # window reverse, (BnHnW)Chw -> (BnHnW)hwC -> BnHnWhwC -> B(nHh)(nWw)C -> BHWC\n            x = x.permute(0, 2, 3, 1).view(B, nH, nW, self.window_resolution, self.window_resolution,\n                       C).transpose(2, 3).reshape(B, pH, pW, C)\n\n            if padding:\n                x = x[:, :H, :W].contiguous()\n\n            x = x.permute(0, 3, 1, 2)\n\n        return x\n\n\nclass EfficientViTBlock(torch.nn.Module):\n    \"\"\" A basic EfficientViT building block.\n\n    Args:\n        type (str): Type for token mixer. 
Default: 's' for self-attention.\n        ed (int): Number of input channels.\n        kd (int): Dimension for query and key in the token mixer.\n        nh (int): Number of attention heads.\n        ar (int): Multiplier for the query dim for value dimension.\n        resolution (int): Input resolution.\n        window_resolution (int): Local window resolution.\n        kernels (List[int]): The kernel size of the dw conv on query.\n    \"\"\"\n    def __init__(self, type,\n                 ed, kd, nh=8,\n                 ar=4,\n                 resolution=14,\n                 window_resolution=7,\n                 kernels=[5, 5, 5, 5],):\n        super().__init__()\n            \n        self.dw0 = Residual(Conv2d_BN(ed, ed, 3, 1, 1, groups=ed, bn_weight_init=0., resolution=resolution))\n        self.ffn0 = Residual(FFN(ed, int(ed * 2), resolution))\n\n        if type == 's':\n            self.mixer = Residual(LocalWindowAttention(ed, kd, nh, attn_ratio=ar, \\\n                    resolution=resolution, window_resolution=window_resolution, kernels=kernels))\n                \n        self.dw1 = Residual(Conv2d_BN(ed, ed, 3, 1, 1, groups=ed, bn_weight_init=0., resolution=resolution))\n        self.ffn1 = Residual(FFN(ed, int(ed * 2), resolution))\n\n    def forward(self, x):\n        return self.ffn1(self.dw1(self.mixer(self.ffn0(self.dw0(x)))))\n\n\nclass EfficientViT(torch.nn.Module):\n    def __init__(self, img_size=400,\n                 patch_size=16,\n                 frozen_stages=0,\n                 in_chans=3,\n                 stages=['s', 's', 's'],\n                 embed_dim=[64, 128, 192],\n                 key_dim=[16, 16, 16],\n                 depth=[1, 2, 3],\n                 num_heads=[4, 4, 4],\n                 window_size=[7, 7, 7],\n                 kernels=[5, 5, 5, 5],\n                 down_ops=[['subsample', 2], ['subsample', 2], ['']],\n                 pretrained=None,\n                 distillation=False,):\n        
super().__init__()\n\n        resolution = img_size\n        self.patch_embed = torch.nn.Sequential(Conv2d_BN(in_chans, embed_dim[0] // 8, 3, 2, 1, resolution=resolution), torch.nn.ReLU(),\n                           Conv2d_BN(embed_dim[0] // 8, embed_dim[0] // 4, 3, 2, 1, resolution=resolution // 2), torch.nn.ReLU(),\n                           Conv2d_BN(embed_dim[0] // 4, embed_dim[0] // 2, 3, 2, 1, resolution=resolution // 4), torch.nn.ReLU(),\n                           Conv2d_BN(embed_dim[0] // 2, embed_dim[0], 3, 1, 1, resolution=resolution // 8))\n\n        resolution = img_size // patch_size\n        attn_ratio = [embed_dim[i] / (key_dim[i] * num_heads[i]) for i in range(len(embed_dim))]\n        self.blocks1 = []\n        self.blocks2 = []\n        self.blocks3 = []\n        for i, (stg, ed, kd, dpth, nh, ar, wd, do) in enumerate(\n                zip(stages, embed_dim, key_dim, depth, num_heads, attn_ratio, window_size, down_ops)):\n            for d in range(dpth):\n                eval('self.blocks' + str(i+1)).append(EfficientViTBlock(stg, ed, kd, nh, ar, resolution, wd, kernels))\n            if do[0] == 'subsample':\n                #('Subsample' stride)\n                blk = eval('self.blocks' + str(i+2))\n                resolution_ = (resolution - 1) // do[1] + 1\n                blk.append(torch.nn.Sequential(Residual(Conv2d_BN(embed_dim[i], embed_dim[i], 3, 1, 1, groups=embed_dim[i], resolution=resolution)),\n                                    Residual(FFN(embed_dim[i], int(embed_dim[i] * 2), resolution)),))\n                blk.append(PatchMerging(*embed_dim[i:i + 2], resolution))\n                resolution = resolution_\n                blk.append(torch.nn.Sequential(Residual(Conv2d_BN(embed_dim[i + 1], embed_dim[i + 1], 3, 1, 1, groups=embed_dim[i + 1], resolution=resolution)),\n                                    Residual(FFN(embed_dim[i + 1], int(embed_dim[i + 1] * 2), resolution)),))\n        self.blocks1 = 
torch.nn.Sequential(*self.blocks1)\n        self.blocks2 = torch.nn.Sequential(*self.blocks2)\n        self.blocks3 = torch.nn.Sequential(*self.blocks3)\n        \n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n\n    def forward(self, x):\n        outs = []\n        x = self.patch_embed(x)\n        x = self.blocks1(x)\n        outs.append(x)\n        x = self.blocks2(x)\n        outs.append(x)\n        x = self.blocks3(x)\n        outs.append(x)\n        return outs\n\nEfficientViT_m0 = {\n        'img_size': 224,\n        'patch_size': 16,\n        'embed_dim': [64, 128, 192],\n        'depth': [1, 2, 3],\n        'num_heads': [4, 4, 4],\n        'window_size': [7, 7, 7],\n        'kernels': [7, 5, 3, 3],\n    }\n\nEfficientViT_m1 = {\n        'img_size': 224,\n        'patch_size': 16,\n        'embed_dim': [128, 144, 192],\n        'depth': [1, 2, 3],\n        'num_heads': [2, 3, 3],\n        'window_size': [7, 7, 7],\n        'kernels': [7, 5, 3, 3],\n    }\n\nEfficientViT_m2 = {\n        'img_size': 224,\n        'patch_size': 16,\n        'embed_dim': [128, 192, 224],\n        'depth': [1, 2, 3],\n        'num_heads': [4, 3, 2],\n        'window_size': [7, 7, 7],\n        'kernels': [7, 5, 3, 3],\n    }\n\nEfficientViT_m3 = {\n        'img_size': 224,\n        'patch_size': 16,\n        'embed_dim': [128, 240, 320],\n        'depth': [1, 2, 3],\n        'num_heads': [4, 3, 4],\n        'window_size': [7, 7, 7],\n        'kernels': [5, 5, 5, 5],\n    }\n\nEfficientViT_m4 = {\n        'img_size': 224,\n        'patch_size': 16,\n        'embed_dim': [128, 256, 384],\n        'depth': [1, 2, 3],\n        'num_heads': [4, 4, 4],\n        'window_size': [7, 7, 7],\n        'kernels': [7, 5, 3, 3],\n    }\n\nEfficientViT_m5 = {\n        'img_size': 224,\n        'patch_size': 16,\n        'embed_dim': [192, 288, 384],\n        'depth': [1, 3, 4],\n        'num_heads': [3, 3, 4],\n        'window_size': [7, 7, 7],\n        
'kernels': [7, 5, 3, 3],\n    }\n\ndef EfficientViT_M0(pretrained='', frozen_stages=0, distillation=False, fuse=False, pretrained_cfg=None, model_cfg=EfficientViT_m0):\n    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation, pretrained=pretrained, **model_cfg)\n    if pretrained:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))\n    if fuse:\n        replace_batchnorm(model)\n    return model\n\ndef EfficientViT_M1(pretrained='', frozen_stages=0, distillation=False, fuse=False, pretrained_cfg=None, model_cfg=EfficientViT_m1):\n    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation, pretrained=pretrained, **model_cfg)\n    if pretrained:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))\n    if fuse:\n        replace_batchnorm(model)\n    return model\n\ndef EfficientViT_M2(pretrained='', frozen_stages=0, distillation=False, fuse=False, pretrained_cfg=None, model_cfg=EfficientViT_m2):\n    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation, pretrained=pretrained, **model_cfg)\n    if pretrained:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))\n    if fuse:\n        replace_batchnorm(model)\n    return model\n\ndef EfficientViT_M3(pretrained='', frozen_stages=0, distillation=False, fuse=False, pretrained_cfg=None, model_cfg=EfficientViT_m3):\n    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation, pretrained=pretrained, **model_cfg)\n    if pretrained:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))\n    if fuse:\n        replace_batchnorm(model)\n    return model\n    \ndef EfficientViT_M4(pretrained='', frozen_stages=0, distillation=False, fuse=False, pretrained_cfg=None, model_cfg=EfficientViT_m4):\n    model = EfficientViT(frozen_stages=frozen_stages, 
distillation=distillation, pretrained=pretrained, **model_cfg)\n    if pretrained:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))\n    if fuse:\n        replace_batchnorm(model)\n    return model\n\ndef EfficientViT_M5(pretrained='', frozen_stages=0, distillation=False, fuse=False, pretrained_cfg=None, model_cfg=EfficientViT_m5):\n    model = EfficientViT(frozen_stages=frozen_stages, distillation=distillation, pretrained=pretrained, **model_cfg)\n    if pretrained:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(pretrained)['model']))\n    if fuse:\n        replace_batchnorm(model)\n    return model\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        # k = k[9:]\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\nif __name__ == '__main__':\n    model = EfficientViT_M0('efficientvit_m0.pth')\n    inputs = torch.randn((1, 3, 640, 640))\n    res = model(inputs)\n    for i in res:\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/CVPR2024-StarNet/starnet.py",
    "content": "\"\"\"\nImplementation of Prof-of-Concept Network: StarNet.\n\nWe make StarNet as simple as possible [to show the key contribution of element-wise multiplication]:\n    - like NO layer-scale in network design,\n    - and NO EMA during training,\n    - which would improve the performance further.\n\nCreated by: Xu Ma (Email: ma.xu1@northeastern.edu)\nModified Date: Mar/29/2024\n\"\"\"\nimport torch\nimport torch.nn as nn\nfrom timm.models.layers import DropPath, trunc_normal_\n\n__all__ = ['starnet_s050', 'starnet_s100', 'starnet_s150', 'starnet_s1', 'starnet_s2', 'starnet_s3', 'starnet_s4']\n\nmodel_urls = {\n    \"starnet_s1\": \"https://github.com/ma-xu/Rewrite-the-Stars/releases/download/checkpoints_v1/starnet_s1.pth.tar\",\n    \"starnet_s2\": \"https://github.com/ma-xu/Rewrite-the-Stars/releases/download/checkpoints_v1/starnet_s2.pth.tar\",\n    \"starnet_s3\": \"https://github.com/ma-xu/Rewrite-the-Stars/releases/download/checkpoints_v1/starnet_s3.pth.tar\",\n    \"starnet_s4\": \"https://github.com/ma-xu/Rewrite-the-Stars/releases/download/checkpoints_v1/starnet_s4.pth.tar\",\n}\n\n\nclass ConvBN(torch.nn.Sequential):\n    def __init__(self, in_planes, out_planes, kernel_size=1, stride=1, padding=0, dilation=1, groups=1, with_bn=True):\n        super().__init__()\n        self.add_module('conv', torch.nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, dilation, groups))\n        if with_bn:\n            self.add_module('bn', torch.nn.BatchNorm2d(out_planes))\n            torch.nn.init.constant_(self.bn.weight, 1)\n            torch.nn.init.constant_(self.bn.bias, 0)\n\n\nclass Block(nn.Module):\n    def __init__(self, dim, mlp_ratio=3, drop_path=0.):\n        super().__init__()\n        self.dwconv = ConvBN(dim, dim, 7, 1, (7 - 1) // 2, groups=dim, with_bn=True)\n        self.f1 = ConvBN(dim, mlp_ratio * dim, 1, with_bn=False)\n        self.f2 = ConvBN(dim, mlp_ratio * dim, 1, with_bn=False)\n        self.g = ConvBN(mlp_ratio * 
dim, dim, 1, with_bn=True)\n        self.dwconv2 = ConvBN(dim, dim, 7, 1, (7 - 1) // 2, groups=dim, with_bn=False)\n        self.act = nn.ReLU6()\n        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()\n\n    def forward(self, x):\n        input = x\n        x = self.dwconv(x)\n        x1, x2 = self.f1(x), self.f2(x)\n        x = self.act(x1) * x2\n        x = self.dwconv2(self.g(x))\n        x = input + self.drop_path(x)\n        return x\n\n\nclass StarNet(nn.Module):\n    def __init__(self, base_dim=32, depths=[3, 3, 12, 5], mlp_ratio=4, drop_path_rate=0.0, num_classes=1000, **kwargs):\n        super().__init__()\n        self.num_classes = num_classes\n        self.in_channel = 32\n        # stem layer\n        self.stem = nn.Sequential(ConvBN(3, self.in_channel, kernel_size=3, stride=2, padding=1), nn.ReLU6())\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth\n        # build stages\n        self.stages = nn.ModuleList()\n        cur = 0\n        for i_layer in range(len(depths)):\n            embed_dim = base_dim * 2 ** i_layer\n            down_sampler = ConvBN(self.in_channel, embed_dim, 3, 2, 1)\n            self.in_channel = embed_dim\n            blocks = [Block(self.in_channel, mlp_ratio, dpr[cur + i]) for i in range(depths[i_layer])]\n            cur += depths[i_layer]\n            self.stages.append(nn.Sequential(down_sampler, *blocks))\n        \n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        # isinstance needs a tuple of types; `nn.Linear or nn.Conv2d` evaluates to nn.Linear only\n        if isinstance(m, (nn.Linear, nn.Conv2d)):\n            trunc_normal_(m.weight, std=.02)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, (nn.LayerNorm, nn.BatchNorm2d)):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    def 
forward(self, x):\n        features = []\n        x = self.stem(x)\n        features.append(x)\n        for stage in self.stages:\n            x = stage(x)\n            features.append(x)\n        return features\n\n\n\ndef starnet_s1(pretrained=False, **kwargs):\n    model = StarNet(24, [2, 2, 8, 3], **kwargs)\n    if pretrained:\n        url = model_urls['starnet_s1']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(checkpoint[\"state_dict\"], strict=False)\n    return model\n\n\n\ndef starnet_s2(pretrained=False, **kwargs):\n    model = StarNet(32, [1, 2, 6, 2], **kwargs)\n    if pretrained:\n        url = model_urls['starnet_s2']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(checkpoint[\"state_dict\"], strict=False)\n    return model\n\n\n\ndef starnet_s3(pretrained=False, **kwargs):\n    model = StarNet(32, [2, 2, 8, 4], **kwargs)\n    if pretrained:\n        url = model_urls['starnet_s3']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(checkpoint[\"state_dict\"], strict=False)\n    return model\n\n\n\ndef starnet_s4(pretrained=False, **kwargs):\n    model = StarNet(32, [3, 3, 12, 5], **kwargs)\n    if pretrained:\n        url = model_urls['starnet_s4']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(checkpoint[\"state_dict\"], strict=False)\n    return model\n\n\n# very small networks #\n\ndef starnet_s050(pretrained=False, **kwargs):\n    return StarNet(16, [1, 1, 3, 1], 3, **kwargs)\n\n\n\ndef starnet_s100(pretrained=False, **kwargs):\n    return StarNet(20, [1, 2, 4, 1], 4, **kwargs)\n\n\n\ndef starnet_s150(pretrained=False, **kwargs):\n    return StarNet(24, [1, 2, 4, 2], 3, **kwargs)\n\n"
  },
  {
    "path": "yolo-improve/yolov5-backbone/ConvNextV2/convnextv2.py",
    "content": "# Copyright (c) Meta Platforms, Inc. and affiliates.\n\n# All rights reserved.\n\n# This source code is licensed under the license found in the\n# LICENSE file in the root directory of this source tree.\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\nfrom timm.models.layers import trunc_normal_, DropPath\n\n__all__ = ['convnextv2_atto', 'convnextv2_femto', 'convnextv2_pico', 'convnextv2_nano', 'convnextv2_tiny', 'convnextv2_base', 'convnextv2_large', 'convnextv2_huge']\n\nclass LayerNorm(nn.Module):\n    \"\"\" LayerNorm that supports two data formats: channels_last (default) or channels_first. \n    The ordering of the dimensions in the inputs. channels_last corresponds to inputs with \n    shape (batch_size, height, width, channels) while channels_first corresponds to inputs \n    with shape (batch_size, channels, height, width).\n    \"\"\"\n    def __init__(self, normalized_shape, eps=1e-6, data_format=\"channels_last\"):\n        super().__init__()\n        self.weight = nn.Parameter(torch.ones(normalized_shape))\n        self.bias = nn.Parameter(torch.zeros(normalized_shape))\n        self.eps = eps\n        self.data_format = data_format\n        if self.data_format not in [\"channels_last\", \"channels_first\"]:\n            raise NotImplementedError \n        self.normalized_shape = (normalized_shape, )\n    \n    def forward(self, x):\n        if self.data_format == \"channels_last\":\n            return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)\n        elif self.data_format == \"channels_first\":\n            u = x.mean(1, keepdim=True)\n            s = (x - u).pow(2).mean(1, keepdim=True)\n            x = (x - u) / torch.sqrt(s + self.eps)\n            x = self.weight[:, None, None] * x + self.bias[:, None, None]\n            return x\n\nclass GRN(nn.Module):\n    \"\"\" GRN (Global Response Normalization) layer\n    \"\"\"\n    def __init__(self, dim):\n        
super().__init__()\n        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))\n        self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))\n\n    def forward(self, x):\n        Gx = torch.norm(x, p=2, dim=(1,2), keepdim=True)\n        Nx = Gx / (Gx.mean(dim=-1, keepdim=True) + 1e-6)\n        return self.gamma * (x * Nx) + self.beta + x\n\nclass Block(nn.Module):\n    \"\"\" ConvNeXtV2 Block.\n    \n    Args:\n        dim (int): Number of input channels.\n        drop_path (float): Stochastic depth rate. Default: 0.0\n    \"\"\"\n    def __init__(self, dim, drop_path=0.):\n        super().__init__()\n        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim) # depthwise conv\n        self.norm = LayerNorm(dim, eps=1e-6)\n        self.pwconv1 = nn.Linear(dim, 4 * dim) # pointwise/1x1 convs, implemented with linear layers\n        self.act = nn.GELU()\n        self.grn = GRN(4 * dim)\n        self.pwconv2 = nn.Linear(4 * dim, dim)\n        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()\n\n    def forward(self, x):\n        input = x\n        x = self.dwconv(x)\n        x = x.permute(0, 2, 3, 1) # (N, C, H, W) -> (N, H, W, C)\n        x = self.norm(x)\n        x = self.pwconv1(x)\n        x = self.act(x)\n        x = self.grn(x)\n        x = self.pwconv2(x)\n        x = x.permute(0, 3, 1, 2) # (N, H, W, C) -> (N, C, H, W)\n\n        x = input + self.drop_path(x)\n        return x\n\nclass ConvNeXtV2(nn.Module):\n    \"\"\" ConvNeXt V2\n        \n    Args:\n        in_chans (int): Number of input image channels. Default: 3\n        num_classes (int): Number of classes for classification head. Default: 1000\n        depths (tuple(int)): Number of blocks at each stage. Default: [3, 3, 9, 3]\n        dims (int): Feature dimension at each stage. Default: [96, 192, 384, 768]\n        drop_path_rate (float): Stochastic depth rate. 
Default: 0.\n        head_init_scale (float): Init scaling value for classifier weights and biases. Default: 1.\n    \"\"\"\n    def __init__(self, in_chans=3, num_classes=1000, \n                 depths=[3, 3, 9, 3], dims=[96, 192, 384, 768], \n                 drop_path_rate=0., head_init_scale=1.\n                 ):\n        super().__init__()\n        self.depths = depths\n        self.downsample_layers = nn.ModuleList() # stem and 3 intermediate downsampling conv layers\n        stem = nn.Sequential(\n            nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4),\n            LayerNorm(dims[0], eps=1e-6, data_format=\"channels_first\")\n        )\n        self.downsample_layers.append(stem)\n        for i in range(3):\n            downsample_layer = nn.Sequential(\n                    LayerNorm(dims[i], eps=1e-6, data_format=\"channels_first\"),\n                    nn.Conv2d(dims[i], dims[i+1], kernel_size=2, stride=2),\n            )\n            self.downsample_layers.append(downsample_layer)\n\n        self.stages = nn.ModuleList() # 4 feature resolution stages, each consisting of multiple residual blocks\n        dp_rates=[x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] \n        cur = 0\n        for i in range(4):\n            stage = nn.Sequential(\n                *[Block(dim=dims[i], drop_path=dp_rates[cur + j]) for j in range(depths[i])]\n            )\n            self.stages.append(stage)\n            cur += depths[i]\n\n        self.norm = nn.LayerNorm(dims[-1], eps=1e-6) # final norm layer\n        self.head = nn.Linear(dims[-1], num_classes)\n\n        self.apply(self._init_weights)\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n\n    def _init_weights(self, m):\n        if isinstance(m, (nn.Conv2d, nn.Linear)):\n            trunc_normal_(m.weight, std=.02)\n            nn.init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        res = []\n        for i in range(4):\n          
  x = self.downsample_layers[i](x)\n            x = self.stages[i](x)\n            res.append(x)\n        return res\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef convnextv2_atto(weights='', **kwargs):\n    model = ConvNeXtV2(depths=[2, 2, 6, 2], dims=[40, 80, 160, 320], **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\ndef convnextv2_femto(weights='', **kwargs):\n    model = ConvNeXtV2(depths=[2, 2, 6, 2], dims=[48, 96, 192, 384], **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\ndef convnextv2_pico(weights='', **kwargs):\n    model = ConvNeXtV2(depths=[2, 2, 6, 2], dims=[64, 128, 256, 512], **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\ndef convnextv2_nano(weights='', **kwargs):\n    model = ConvNeXtV2(depths=[2, 2, 8, 2], dims=[80, 160, 320, 640], **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\ndef convnextv2_tiny(weights='', **kwargs):\n    model = ConvNeXtV2(depths=[3, 3, 9, 3], dims=[96, 192, 384, 768], **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\ndef convnextv2_base(weights='', **kwargs):\n    model = ConvNeXtV2(depths=[3, 3, 27, 3], dims=[128, 256, 512, 1024], **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), 
torch.load(weights)['model']))\n    return model\n\ndef convnextv2_large(weights='', **kwargs):\n    model = ConvNeXtV2(depths=[3, 3, 27, 3], dims=[192, 384, 768, 1536], **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\ndef convnextv2_huge(weights='', **kwargs):\n    model = ConvNeXtV2(depths=[3, 3, 27, 3], dims=[352, 704, 1408, 2816], **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model"
  },
  {
    "path": "yolo-improve/yolov5-backbone/EMO/emo.py",
    "content": "import math\nimport numpy as np\nimport torch.nn as nn\nfrom einops import rearrange, reduce\nfrom timm.models.layers.activations import *\nfrom timm.models.layers import DropPath, trunc_normal_, create_attn\nfrom timm.models.efficientnet_blocks import num_groups, SqueezeExcite as SE\nfrom functools import partial\n\n__all__ = ['EMO_1M', 'EMO_2M', 'EMO_5M', 'EMO_6M']\n\ninplace = True\n\ndef get_act(act_layer='relu'):\n\tact_dict = {\n\t\t'none': nn.Identity,\n\t\t'sigmoid': Sigmoid,\n\t\t'swish': Swish,\n\t\t'mish': Mish,\n\t\t'hsigmoid': HardSigmoid,\n\t\t'hswish': HardSwish,\n\t\t'hmish': HardMish,\n\t\t'tanh': Tanh,\n\t\t'relu': nn.ReLU,\n\t\t'relu6': nn.ReLU6,\n\t\t'prelu': PReLU,\n\t\t'gelu': GELU,\n\t\t'silu': nn.SiLU\n\t}\n\treturn act_dict[act_layer]\n\nclass LayerNorm2d(nn.Module):\n\t\n\tdef __init__(self, normalized_shape, eps=1e-6, elementwise_affine=True):\n\t\tsuper().__init__()\n\t\tself.norm = nn.LayerNorm(normalized_shape, eps, elementwise_affine)\n\t\n\tdef forward(self, x):\n\t\tx = rearrange(x, 'b c h w -> b h w c').contiguous()\n\t\tx = self.norm(x)\n\t\tx = rearrange(x, 'b h w c -> b c h w').contiguous()\n\t\treturn x\n\ndef get_norm(norm_layer='in_1d'):\n\teps = 1e-6\n\tnorm_dict = {\n\t\t'none': nn.Identity,\n\t\t'in_1d': partial(nn.InstanceNorm1d, eps=eps),\n\t\t'in_2d': partial(nn.InstanceNorm2d, eps=eps),\n\t\t'in_3d': partial(nn.InstanceNorm3d, eps=eps),\n\t\t'bn_1d': partial(nn.BatchNorm1d, eps=eps),\n\t\t'bn_2d': partial(nn.BatchNorm2d, eps=eps),\n\t\t'bn_3d': partial(nn.BatchNorm3d, eps=eps),\n\t\t'gn': partial(nn.GroupNorm, eps=eps),\n\t\t'ln_1d': partial(nn.LayerNorm, eps=eps),\n\t\t'ln_2d': partial(LayerNorm2d, eps=eps),\n\t}\n\treturn norm_dict[norm_layer]\n\nclass ConvNormAct(nn.Module):\n\t\n\tdef __init__(self, dim_in, dim_out, kernel_size, stride=1, dilation=1, groups=1, bias=False,\n\t\t\t\t skip=False, norm_layer='bn_2d', act_layer='relu', inplace=True, drop_path_rate=0.):\n\t\tsuper(ConvNormAct, 
self).__init__()\n\t\tself.has_skip = skip and dim_in == dim_out\n\t\tpadding = math.ceil((kernel_size - stride) / 2)\n\t\tself.conv = nn.Conv2d(dim_in, dim_out, kernel_size, stride, padding, dilation, groups, bias)\n\t\tself.norm = get_norm(norm_layer)(dim_out)\n\t\tself.act = get_act(act_layer)(inplace=inplace)\n\t\tself.drop_path = DropPath(drop_path_rate) if drop_path_rate else nn.Identity()\n\t\n\tdef forward(self, x):\n\t\tshortcut = x\n\t\tx = self.conv(x)\n\t\tx = self.norm(x)\n\t\tx = self.act(x)\n\t\tif self.has_skip:\n\t\t\tx = self.drop_path(x) + shortcut\n\t\treturn x\n\ninplace = True\n\n# ========== Multi-Scale Populations, for down-sampling and inductive bias ==========\nclass MSPatchEmb(nn.Module):\n\t\n\tdef __init__(self, dim_in, emb_dim, kernel_size=2, c_group=-1, stride=1, dilations=[1, 2, 3],\n\t\t\t\t norm_layer='bn_2d', act_layer='silu'):\n\t\tsuper().__init__()\n\t\tself.dilation_num = len(dilations)\n\t\tassert dim_in % c_group == 0\n\t\tc_group = math.gcd(dim_in, emb_dim) if c_group == -1 else c_group\n\t\tself.convs = nn.ModuleList()\n\t\tfor i in range(len(dilations)):\n\t\t\tpadding = math.ceil(((kernel_size - 1) * dilations[i] + 1 - stride) / 2)\n\t\t\tself.convs.append(nn.Sequential(nn.Conv2d(dim_in, emb_dim, kernel_size, stride, padding, dilations[i], groups=c_group),\n\t\t\t\tget_norm(norm_layer)(emb_dim),\n\t\t\t\tget_act(act_layer)(emb_dim)))\n\t\n\tdef forward(self, x):\n\t\tif self.dilation_num == 1:\n\t\t\tx = self.convs[0](x)\n\t\telse:\n\t\t\tx = torch.cat([self.convs[i](x).unsqueeze(dim=-1) for i in range(self.dilation_num)], dim=-1)\n\t\t\tx = reduce(x, 'b c h w n -> b c h w', 'mean').contiguous()\n\t\treturn x\n\n\nclass iRMB(nn.Module):\n\tdef __init__(self, dim_in, dim_out, norm_in=True, has_skip=True, exp_ratio=1.0, norm_layer='bn_2d',\n\t\t\t\t act_layer='relu', v_proj=True, dw_ks=3, stride=1, dilation=1, se_ratio=0.0, dim_head=64, window_size=7,\n\t\t\t\t attn_s=True, qkv_bias=False, attn_drop=0., drop=0., 
drop_path=0., v_group=False, attn_pre=False):\n\t\tsuper().__init__()\n\t\tself.norm = get_norm(norm_layer)(dim_in) if norm_in else nn.Identity()\n\t\tdim_mid = int(dim_in * exp_ratio)\n\t\tself.has_skip = (dim_in == dim_out and stride == 1) and has_skip\n\t\tself.attn_s = attn_s\n\t\tif self.attn_s:\n\t\t\tassert dim_in % dim_head == 0, 'dim should be divisible by num_heads'\n\t\t\tself.dim_head = dim_head\n\t\t\tself.window_size = window_size\n\t\t\tself.num_head = dim_in // dim_head\n\t\t\tself.scale = self.dim_head ** -0.5\n\t\t\tself.attn_pre = attn_pre\n\t\t\tself.qk = ConvNormAct(dim_in, int(dim_in * 2), kernel_size=1, bias=qkv_bias, norm_layer='none', act_layer='none')\n\t\t\tself.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, groups=self.num_head if v_group else 1, bias=qkv_bias, norm_layer='none', act_layer=act_layer, inplace=inplace)\n\t\t\tself.attn_drop = nn.Dropout(attn_drop)\n\t\telse:\n\t\t\tif v_proj:\n\t\t\t\tself.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, bias=qkv_bias, norm_layer='none', act_layer=act_layer, inplace=inplace)\n\t\t\telse:\n\t\t\t\tself.v = nn.Identity()\n\t\tself.conv_local = ConvNormAct(dim_mid, dim_mid, kernel_size=dw_ks, stride=stride, dilation=dilation, groups=dim_mid, norm_layer='bn_2d', act_layer='silu', inplace=inplace)\n\t\tself.se = SE(dim_mid, rd_ratio=se_ratio, act_layer=get_act(act_layer)) if se_ratio > 0.0 else nn.Identity()\n\t\t\n\t\tself.proj_drop = nn.Dropout(drop)\n\t\tself.proj = ConvNormAct(dim_mid, dim_out, kernel_size=1, norm_layer='none', act_layer='none', inplace=inplace)\n\t\tself.drop_path = DropPath(drop_path) if drop_path else nn.Identity()\n\t\n\tdef forward(self, x):\n\t\tshortcut = x\n\t\tx = self.norm(x)\n\t\tB, C, H, W = x.shape\n\t\tif self.attn_s:\n\t\t\t# padding\n\t\t\tif self.window_size <= 0:\n\t\t\t\twindow_size_W, window_size_H = W, H\n\t\t\telse:\n\t\t\t\twindow_size_W, window_size_H = self.window_size, self.window_size\n\t\t\tpad_l, pad_t = 0, 0\n\t\t\tpad_r = (window_size_W - W % 
window_size_W) % window_size_W\n\t\t\tpad_b = (window_size_H - H % window_size_H) % window_size_H\n\t\t\tx = F.pad(x, (pad_l, pad_r, pad_t, pad_b, 0, 0,))\n\t\t\tn1, n2 = (H + pad_b) // window_size_H, (W + pad_r) // window_size_W\n\t\t\tx = rearrange(x, 'b c (h1 n1) (w1 n2) -> (b n1 n2) c h1 w1', n1=n1, n2=n2).contiguous()\n\t\t\t# attention\n\t\t\tb, c, h, w = x.shape\n\t\t\tqk = self.qk(x)\n\t\t\tqk = rearrange(qk, 'b (qk heads dim_head) h w -> qk b heads (h w) dim_head', qk=2, heads=self.num_head, dim_head=self.dim_head).contiguous()\n\t\t\tq, k = qk[0], qk[1]\n\t\t\tattn_spa = (q @ k.transpose(-2, -1)) * self.scale\n\t\t\tattn_spa = attn_spa.softmax(dim=-1)\n\t\t\tattn_spa = self.attn_drop(attn_spa)\n\t\t\tif self.attn_pre:\n\t\t\t\tx = rearrange(x, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()\n\t\t\t\tx_spa = attn_spa @ x\n\t\t\t\tx_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h, w=w).contiguous()\n\t\t\t\tx_spa = self.v(x_spa)\n\t\t\telse:\n\t\t\t\tv = self.v(x)\n\t\t\t\tv = rearrange(v, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()\n\t\t\t\tx_spa = attn_spa @ v\n\t\t\t\tx_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h, w=w).contiguous()\n\t\t\t# unpadding\n\t\t\tx = rearrange(x_spa, '(b n1 n2) c h1 w1 -> b c (h1 n1) (w1 n2)', n1=n1, n2=n2).contiguous()\n\t\t\tif pad_r > 0 or pad_b > 0:\n\t\t\t\tx = x[:, :, :H, :W].contiguous()\n\t\telse:\n\t\t\tx = self.v(x)\n\n\t\tx = x + self.se(self.conv_local(x)) if self.has_skip else self.se(self.conv_local(x))\n\t\t\n\t\tx = self.proj_drop(x)\n\t\tx = self.proj(x)\n\t\t\n\t\tx = (shortcut + self.drop_path(x)) if self.has_skip else x\n\t\treturn x\n\n\nclass EMO(nn.Module):\n\tdef __init__(self, dim_in=3, num_classes=1000, img_size=224,\n\t\t\t\t depths=[1, 2, 4, 2], stem_dim=16, embed_dims=[64, 128, 256, 512], exp_ratios=[4., 4., 4., 
4.],\n\t\t\t\t norm_layers=['bn_2d', 'bn_2d', 'bn_2d', 'bn_2d'], act_layers=['relu', 'relu', 'relu', 'relu'],\n\t\t\t\t dw_kss=[3, 3, 5, 5], se_ratios=[0.0, 0.0, 0.0, 0.0], dim_heads=[32, 32, 32, 32],\n\t\t\t\t window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True], qkv_bias=True,\n\t\t\t\t attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False, pre_dim=0):\n\t\tsuper().__init__()\n\t\tself.num_classes = num_classes\n\t\tassert num_classes > 0\n\t\tdprs = [x.item() for x in torch.linspace(0, drop_path, sum(depths))]\n\t\tself.stage0 = nn.ModuleList([\n\t\t\tMSPatchEmb(  # down to 112\n\t\t\t\tdim_in, stem_dim, kernel_size=dw_kss[0], c_group=1, stride=2, dilations=[1],\n\t\t\t\tnorm_layer=norm_layers[0], act_layer='none'),\n\t\t\tiRMB(  # ds\n\t\t\t\tstem_dim, stem_dim, norm_in=False, has_skip=False, exp_ratio=1,\n\t\t\t\tnorm_layer=norm_layers[0], act_layer=act_layers[0], v_proj=False, dw_ks=dw_kss[0],\n\t\t\t\tstride=1, dilation=1, se_ratio=1,\n\t\t\t\tdim_head=dim_heads[0], window_size=window_sizes[0], attn_s=False,\n\t\t\t\tqkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=0.,\n\t\t\t\tattn_pre=attn_pre\n\t\t\t)\n\t\t])\n\t\temb_dim_pre = stem_dim\n\t\tfor i in range(len(depths)):\n\t\t\tlayers = []\n\t\t\tdpr = dprs[sum(depths[:i]):sum(depths[:i + 1])]\n\t\t\tfor j in range(depths[i]):\n\t\t\t\tif j == 0:\n\t\t\t\t\tstride, has_skip, attn_s, exp_ratio = 2, False, False, exp_ratios[i] * 2\n\t\t\t\telse:\n\t\t\t\t\tstride, has_skip, attn_s, exp_ratio = 1, True, attn_ss[i], exp_ratios[i]\n\t\t\t\tlayers.append(iRMB(\n\t\t\t\t\temb_dim_pre, embed_dims[i], norm_in=True, has_skip=has_skip, exp_ratio=exp_ratio,\n\t\t\t\t\tnorm_layer=norm_layers[i], act_layer=act_layers[i], v_proj=True, dw_ks=dw_kss[i],\n\t\t\t\t\tstride=stride, dilation=1, se_ratio=se_ratios[i],\n\t\t\t\t\tdim_head=dim_heads[i], window_size=window_sizes[i], attn_s=attn_s,\n\t\t\t\t\tqkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=dpr[j], 
v_group=v_group,\n\t\t\t\t\tattn_pre=attn_pre\n\t\t\t\t))\n\t\t\t\temb_dim_pre = embed_dims[i]\n\t\t\tself.__setattr__(f'stage{i + 1}', nn.ModuleList(layers))\n\t\t\n\t\tself.norm = get_norm(norm_layers[-1])(embed_dims[-1])\n\t\tself.apply(self._init_weights)\n\t\tself.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n\t\n\tdef _init_weights(self, m):\n\t\tif isinstance(m, nn.Linear):\n\t\t\ttrunc_normal_(m.weight, std=.02)\n\t\t\tif m.bias is not None:\n\t\t\t\tnn.init.zeros_(m.bias)\n\t\telif isinstance(m, (nn.LayerNorm, nn.GroupNorm,\n\t\t\t\t\t\t\tnn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,\n\t\t\t\t\t\t\tnn.InstanceNorm1d, nn.InstanceNorm2d, nn.InstanceNorm3d)):\n\t\t\tnn.init.zeros_(m.bias)\n\t\t\tnn.init.ones_(m.weight)\n\t\n\t@torch.jit.ignore\n\tdef no_weight_decay(self):\n\t\treturn {'token'}\n\t\n\t@torch.jit.ignore\n\tdef no_weight_decay_keywords(self):\n\t\treturn {'alpha', 'gamma', 'beta'}\n\t\n\t@torch.jit.ignore\n\tdef no_ft_keywords(self):\n\t\t# return {'head.weight', 'head.bias'}\n\t\treturn {}\n\t\n\t@torch.jit.ignore\n\tdef ft_head_keywords(self):\n\t\treturn {'head.weight', 'head.bias'}, self.num_classes\n\t\n\tdef get_classifier(self):\n\t\treturn self.head\n\t\n\tdef reset_classifier(self, num_classes):\n\t\tself.num_classes = num_classes\n\t\tself.head = nn.Linear(self.pre_dim, num_classes) if num_classes > 0 else nn.Identity()\n\t\n\tdef check_bn(self):\n\t\tfor name, m in self.named_modules():\n\t\t\tif isinstance(m, nn.modules.batchnorm._NormBase):\n\t\t\t\tm.running_mean = torch.nan_to_num(m.running_mean, nan=0, posinf=1, neginf=-1)\n\t\t\t\tm.running_var = torch.nan_to_num(m.running_var, nan=0, posinf=1, neginf=-1)\n\t\n\tdef forward_features(self, x):\n\t\tfor blk in self.stage0:\n\t\t\tx = blk(x)\n\t\tx1 = x\n\t\tfor blk in self.stage1:\n\t\t\tx = blk(x)\n\t\tx2 = x\n\t\tfor blk in self.stage2:\n\t\t\tx = blk(x)\n\t\tx3 = x\n\t\tfor blk in self.stage3:\n\t\t\tx = blk(x)\n\t\tx4 = x\n\t\tfor blk in 
self.stage4:\n\t\t\tx = blk(x)\n\t\tx5 = x\n\t\treturn [x1, x2, x3, x4, x5]\n\t\n\tdef forward(self, x):\n\t\tx = self.forward_features(x)\n\t\tx[-1] = self.norm(x[-1])\n\t\treturn x\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef EMO_1M(weights='', **kwargs):\n\tmodel = EMO(\n\t\t# dim_in=3, num_classes=1000, img_size=224,\n\t\tdepths=[2, 2, 8, 3], stem_dim=24, embed_dims=[32, 48, 80, 168], exp_ratios=[2., 2.5, 3.0, 3.5],\n\t\tnorm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],\n\t\tdw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 21], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],\n\t\tqkv_bias=True, attn_drop=0., drop=0., drop_path=0.04036, v_group=False, attn_pre=True, pre_dim=0,\n\t\t**kwargs)\n\tif weights:\n\t\tpretrained_weight = torch.load(weights)\n\t\tmodel.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n\treturn model\n\ndef EMO_2M(weights='', **kwargs):\n\tmodel = EMO(\n\t\t# dim_in=3, num_classes=1000, img_size=224,\n\t\tdepths=[3, 3, 9, 3], stem_dim=24, embed_dims=[32, 48, 120, 200], exp_ratios=[2., 2.5, 3.0, 3.5],\n\t\tnorm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],\n\t\tdw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 20], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],\n\t\tqkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0,\n\t\t**kwargs)\n\tif weights:\n\t\tpretrained_weight = torch.load(weights)\n\t\tmodel.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n\treturn model\n\ndef EMO_5M(weights='', **kwargs):\n\tmodel = EMO(\n\t\t# 
dim_in=3, num_classes=1000, img_size=224,\n\t\tdepths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 288], exp_ratios=[2., 3., 4., 4.],\n\t\tnorm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],\n\t\tdw_kss=[3, 3, 5, 5], dim_heads=[24, 24, 32, 32], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],\n\t\tqkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0,\n\t\t**kwargs)\n\tif weights:\n\t\tpretrained_weight = torch.load(weights)\n\t\tmodel.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n\treturn model\n\ndef EMO_6M(weights='', **kwargs):\n\tmodel = EMO(\n\t\t# dim_in=3, num_classes=1000, img_size=224,\n\t\tdepths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 320], exp_ratios=[2., 3., 4., 5.],\n\t\tnorm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],\n\t\tdw_kss=[3, 3, 5, 5], dim_heads=[16, 24, 20, 32], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],\n\t\tqkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0,\n\t\t**kwargs)\n\tif weights:\n\t\tpretrained_weight = torch.load(weights)\n\t\tmodel.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n\treturn model\n\nif __name__ == '__main__':\n    model = EMO_1M('EMO_1M/net.pth')\n    model = EMO_2M('EMO_2M/net.pth')\n    model = EMO_5M('EMO_5M/net.pth')\n    model = EMO_6M('EMO_6M/net.pth')"
  },
  {
    "path": "yolo-improve/yolov5-backbone/EfficientFormerV2/EfficientFormerV2.py",
    "content": "\"\"\"\nEfficientFormer_v2\n\"\"\"\nimport os\nimport copy\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport math\nfrom typing import Dict\nimport itertools\nimport numpy as np\nfrom timm.models.layers import DropPath, trunc_normal_, to_2tuple\n\n__all__ = ['efficientformerv2_s0', 'efficientformerv2_s1', 'efficientformerv2_s2', 'efficientformerv2_l']\n\nEfficientFormer_width = {\n    'L': [40, 80, 192, 384],  # 26m 83.3% 6attn\n    'S2': [32, 64, 144, 288],  # 12m 81.6% 4attn dp0.02\n    'S1': [32, 48, 120, 224],  # 6.1m 79.0\n    'S0': [32, 48, 96, 176],  # 75.0 75.7\n}\n\nEfficientFormer_depth = {\n    'L': [5, 5, 15, 10],  # 26m 83.3%\n    'S2': [4, 4, 12, 8],  # 12m\n    'S1': [3, 3, 9, 6],  # 79.0\n    'S0': [2, 2, 6, 4],  # 75.7\n}\n\n# 26m\nexpansion_ratios_L = {\n    '0': [4, 4, 4, 4, 4],\n    '1': [4, 4, 4, 4, 4],\n    '2': [4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4],\n    '3': [4, 4, 4, 3, 3, 3, 3, 4, 4, 4],\n}\n\n# 12m\nexpansion_ratios_S2 = {\n    '0': [4, 4, 4, 4],\n    '1': [4, 4, 4, 4],\n    '2': [4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4],\n    '3': [4, 4, 3, 3, 3, 3, 4, 4],\n}\n\n# 6.1m\nexpansion_ratios_S1 = {\n    '0': [4, 4, 4],\n    '1': [4, 4, 4],\n    '2': [4, 4, 3, 3, 3, 3, 4, 4, 4],\n    '3': [4, 4, 3, 3, 4, 4],\n}\n\n# 3.5m\nexpansion_ratios_S0 = {\n    '0': [4, 4],\n    '1': [4, 4],\n    '2': [4, 3, 3, 3, 4, 4],\n    '3': [4, 3, 3, 4],\n}\n\n\nclass Attention4D(torch.nn.Module):\n    def __init__(self, dim=384, key_dim=32, num_heads=8,\n                 attn_ratio=4,\n                 resolution=7,\n                 act_layer=nn.ReLU,\n                 stride=None):\n        super().__init__()\n        self.num_heads = num_heads\n        self.scale = key_dim ** -0.5\n        self.key_dim = key_dim\n        self.nh_kd = nh_kd = key_dim * num_heads\n\n        if stride is not None:\n            self.resolution = math.ceil(resolution / stride)\n            self.stride_conv = nn.Sequential(nn.Conv2d(dim, 
dim, kernel_size=3, stride=stride, padding=1, groups=dim),\n                                             nn.BatchNorm2d(dim), )\n            self.upsample = nn.Upsample(scale_factor=stride, mode='bilinear')\n        else:\n            self.resolution = resolution\n            self.stride_conv = None\n            self.upsample = None\n\n        self.N = self.resolution ** 2\n        self.N2 = self.N\n        self.d = int(attn_ratio * key_dim)\n        self.dh = int(attn_ratio * key_dim) * num_heads\n        self.attn_ratio = attn_ratio\n        h = self.dh + nh_kd * 2\n        self.q = nn.Sequential(nn.Conv2d(dim, self.num_heads * self.key_dim, 1),\n                               nn.BatchNorm2d(self.num_heads * self.key_dim), )\n        self.k = nn.Sequential(nn.Conv2d(dim, self.num_heads * self.key_dim, 1),\n                               nn.BatchNorm2d(self.num_heads * self.key_dim), )\n        self.v = nn.Sequential(nn.Conv2d(dim, self.num_heads * self.d, 1),\n                               nn.BatchNorm2d(self.num_heads * self.d),\n                               )\n        self.v_local = nn.Sequential(nn.Conv2d(self.num_heads * self.d, self.num_heads * self.d,\n                                               kernel_size=3, stride=1, padding=1, groups=self.num_heads * self.d),\n                                     nn.BatchNorm2d(self.num_heads * self.d), )\n        self.talking_head1 = nn.Conv2d(self.num_heads, self.num_heads, kernel_size=1, stride=1, padding=0)\n        self.talking_head2 = nn.Conv2d(self.num_heads, self.num_heads, kernel_size=1, stride=1, padding=0)\n\n        self.proj = nn.Sequential(act_layer(),\n                                  nn.Conv2d(self.dh, dim, 1),\n                                  nn.BatchNorm2d(dim), )\n\n        points = list(itertools.product(range(self.resolution), range(self.resolution)))\n        N = len(points)\n        attention_offsets = {}\n        idxs = []\n        for p1 in points:\n            for p2 in points:\n       
         offset = (abs(p1[0] - p2[0]), abs(p1[1] - p2[1]))\n                if offset not in attention_offsets:\n                    attention_offsets[offset] = len(attention_offsets)\n                idxs.append(attention_offsets[offset])\n        self.attention_biases = torch.nn.Parameter(\n            torch.zeros(num_heads, len(attention_offsets)))\n        self.register_buffer('attention_bias_idxs',\n                             torch.LongTensor(idxs).view(N, N))\n\n    @torch.no_grad()\n    def train(self, mode=True):\n        super().train(mode)\n        if mode and hasattr(self, 'ab'):\n            del self.ab\n        else:\n            self.ab = self.attention_biases[:, self.attention_bias_idxs]\n\n    def forward(self, x):  # x (B,N,C)\n        B, C, H, W = x.shape\n        if self.stride_conv is not None:\n            x = self.stride_conv(x)\n\n        q = self.q(x).flatten(2).reshape(B, self.num_heads, -1, self.N).permute(0, 1, 3, 2)\n        k = self.k(x).flatten(2).reshape(B, self.num_heads, -1, self.N).permute(0, 1, 2, 3)\n        v = self.v(x)\n        v_local = self.v_local(v)\n        v = v.flatten(2).reshape(B, self.num_heads, -1, self.N).permute(0, 1, 3, 2)\n\n        attn = (\n                (q @ k) * self.scale\n                +\n                (self.attention_biases[:, self.attention_bias_idxs]\n                 if self.training else self.ab)\n        )\n        # attn = (q @ k) * self.scale\n        attn = self.talking_head1(attn)\n        attn = attn.softmax(dim=-1)\n        attn = self.talking_head2(attn)\n\n        x = (attn @ v)\n\n        out = x.transpose(2, 3).reshape(B, self.dh, self.resolution, self.resolution) + v_local\n        if self.upsample is not None:\n            out = self.upsample(out)\n\n        out = self.proj(out)\n        return out\n\n\ndef stem(in_chs, out_chs, act_layer=nn.ReLU):\n    return nn.Sequential(\n        nn.Conv2d(in_chs, out_chs // 2, kernel_size=3, stride=2, padding=1),\n        
nn.BatchNorm2d(out_chs // 2),\n        act_layer(),\n        nn.Conv2d(out_chs // 2, out_chs, kernel_size=3, stride=2, padding=1),\n        nn.BatchNorm2d(out_chs),\n        act_layer(),\n    )\n\n\nclass LGQuery(torch.nn.Module):\n    def __init__(self, in_dim, out_dim, resolution1, resolution2):\n        super().__init__()\n        self.resolution1 = resolution1\n        self.resolution2 = resolution2\n        self.pool = nn.AvgPool2d(1, 2, 0)\n        self.local = nn.Sequential(nn.Conv2d(in_dim, in_dim, kernel_size=3, stride=2, padding=1, groups=in_dim),\n                                   )\n        self.proj = nn.Sequential(nn.Conv2d(in_dim, out_dim, 1),\n                                  nn.BatchNorm2d(out_dim), )\n\n    def forward(self, x):\n        local_q = self.local(x)\n        pool_q = self.pool(x)\n        q = local_q + pool_q\n        q = self.proj(q)\n        return q\n\n\nclass Attention4DDownsample(torch.nn.Module):\n    def __init__(self, dim=384, key_dim=16, num_heads=8,\n                 attn_ratio=4,\n                 resolution=7,\n                 out_dim=None,\n                 act_layer=None,\n                 ):\n        super().__init__()\n\n        self.num_heads = num_heads\n        self.scale = key_dim ** -0.5\n        self.key_dim = key_dim\n        self.nh_kd = nh_kd = key_dim * num_heads\n\n        self.resolution = resolution\n\n        self.d = int(attn_ratio * key_dim)\n        self.dh = int(attn_ratio * key_dim) * num_heads\n        self.attn_ratio = attn_ratio\n        h = self.dh + nh_kd * 2\n\n        if out_dim is not None:\n            self.out_dim = out_dim\n        else:\n            self.out_dim = dim\n        self.resolution2 = math.ceil(self.resolution / 2)\n        self.q = LGQuery(dim, self.num_heads * self.key_dim, self.resolution, self.resolution2)\n\n        self.N = self.resolution ** 2\n        self.N2 = self.resolution2 ** 2\n\n        self.k = nn.Sequential(nn.Conv2d(dim, self.num_heads * self.key_dim, 1),\n  
                             nn.BatchNorm2d(self.num_heads * self.key_dim), )\n        self.v = nn.Sequential(nn.Conv2d(dim, self.num_heads * self.d, 1),\n                               nn.BatchNorm2d(self.num_heads * self.d),\n                               )\n        self.v_local = nn.Sequential(nn.Conv2d(self.num_heads * self.d, self.num_heads * self.d,\n                                               kernel_size=3, stride=2, padding=1, groups=self.num_heads * self.d),\n                                     nn.BatchNorm2d(self.num_heads * self.d), )\n\n        self.proj = nn.Sequential(\n            act_layer(),\n            nn.Conv2d(self.dh, self.out_dim, 1),\n            nn.BatchNorm2d(self.out_dim), )\n\n        points = list(itertools.product(range(self.resolution), range(self.resolution)))\n        points_ = list(itertools.product(\n            range(self.resolution2), range(self.resolution2)))\n        N = len(points)\n        N_ = len(points_)\n        attention_offsets = {}\n        idxs = []\n        for p1 in points_:\n            for p2 in points:\n                size = 1\n                offset = (\n                    abs(p1[0] * math.ceil(self.resolution / self.resolution2) - p2[0] + (size - 1) / 2),\n                    abs(p1[1] * math.ceil(self.resolution / self.resolution2) - p2[1] + (size - 1) / 2))\n                if offset not in attention_offsets:\n                    attention_offsets[offset] = len(attention_offsets)\n                idxs.append(attention_offsets[offset])\n        self.attention_biases = torch.nn.Parameter(\n            torch.zeros(num_heads, len(attention_offsets)))\n        self.register_buffer('attention_bias_idxs',\n                             torch.LongTensor(idxs).view(N_, N))\n\n    @torch.no_grad()\n    def train(self, mode=True):\n        super().train(mode)\n        if mode and hasattr(self, 'ab'):\n            del self.ab\n        else:\n            self.ab = self.attention_biases[:, 
self.attention_bias_idxs]\n\n    def forward(self, x):  # x (B,N,C)\n        B, C, H, W = x.shape\n\n        q = self.q(x).flatten(2).reshape(B, self.num_heads, -1, self.N2).permute(0, 1, 3, 2)\n        k = self.k(x).flatten(2).reshape(B, self.num_heads, -1, self.N).permute(0, 1, 2, 3)\n        v = self.v(x)\n        v_local = self.v_local(v)\n        v = v.flatten(2).reshape(B, self.num_heads, -1, self.N).permute(0, 1, 3, 2)\n\n        attn = (\n                (q @ k) * self.scale\n                +\n                (self.attention_biases[:, self.attention_bias_idxs]\n                 if self.training else self.ab)\n        )\n\n        # attn = (q @ k) * self.scale\n        attn = attn.softmax(dim=-1)\n        x = (attn @ v).transpose(2, 3)\n        out = x.reshape(B, self.dh, self.resolution2, self.resolution2) + v_local\n\n        out = self.proj(out)\n        return out\n\n\nclass Embedding(nn.Module):\n    def __init__(self, patch_size=3, stride=2, padding=1,\n                 in_chans=3, embed_dim=768, norm_layer=nn.BatchNorm2d,\n                 light=False, asub=False, resolution=None, act_layer=nn.ReLU, attn_block=Attention4DDownsample):\n        super().__init__()\n        self.light = light\n        self.asub = asub\n\n        if self.light:\n            self.new_proj = nn.Sequential(\n                nn.Conv2d(in_chans, in_chans, kernel_size=3, stride=2, padding=1, groups=in_chans),\n                nn.BatchNorm2d(in_chans),\n                nn.Hardswish(),\n                nn.Conv2d(in_chans, embed_dim, kernel_size=1, stride=1, padding=0),\n                nn.BatchNorm2d(embed_dim),\n            )\n            self.skip = nn.Sequential(\n                nn.Conv2d(in_chans, embed_dim, kernel_size=1, stride=2, padding=0),\n                nn.BatchNorm2d(embed_dim)\n            )\n        elif self.asub:\n            self.attn = attn_block(dim=in_chans, out_dim=embed_dim,\n                                   resolution=resolution, act_layer=act_layer)\n  
          patch_size = to_2tuple(patch_size)\n            stride = to_2tuple(stride)\n            padding = to_2tuple(padding)\n            self.conv = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size,\n                                  stride=stride, padding=padding)\n            self.bn = norm_layer(embed_dim) if norm_layer else nn.Identity()\n        else:\n            patch_size = to_2tuple(patch_size)\n            stride = to_2tuple(stride)\n            padding = to_2tuple(padding)\n            self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size,\n                                  stride=stride, padding=padding)\n            self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()\n\n    def forward(self, x):\n        if self.light:\n            out = self.new_proj(x) + self.skip(x)\n        elif self.asub:\n            out_conv = self.conv(x)\n            out_conv = self.bn(out_conv)\n            out = self.attn(x) + out_conv\n        else:\n            x = self.proj(x)\n            out = self.norm(x)\n        return out\n\n\nclass Mlp(nn.Module):\n    \"\"\"\n    Implementation of MLP with 1*1 convolutions.\n    Input: tensor with shape [B, C, H, W]\n    \"\"\"\n\n    def __init__(self, in_features, hidden_features=None,\n                 out_features=None, act_layer=nn.GELU, drop=0., mid_conv=False):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.mid_conv = mid_conv\n        self.fc1 = nn.Conv2d(in_features, hidden_features, 1)\n        self.act = act_layer()\n        self.fc2 = nn.Conv2d(hidden_features, out_features, 1)\n        self.drop = nn.Dropout(drop)\n        self.apply(self._init_weights)\n\n        if self.mid_conv:\n            self.mid = nn.Conv2d(hidden_features, hidden_features, kernel_size=3, stride=1, padding=1,\n                                 groups=hidden_features)\n            self.mid_norm = 
nn.BatchNorm2d(hidden_features)\n\n        self.norm1 = nn.BatchNorm2d(hidden_features)\n        self.norm2 = nn.BatchNorm2d(out_features)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Conv2d):\n            trunc_normal_(m.weight, std=.02)\n            if m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.norm1(x)\n        x = self.act(x)\n\n        if self.mid_conv:\n            x_mid = self.mid(x)\n            x_mid = self.mid_norm(x_mid)\n            x = self.act(x_mid)\n        x = self.drop(x)\n\n        x = self.fc2(x)\n        x = self.norm2(x)\n\n        x = self.drop(x)\n        return x\n\n\nclass AttnFFN(nn.Module):\n    def __init__(self, dim, mlp_ratio=4.,\n                 act_layer=nn.ReLU, norm_layer=nn.LayerNorm,\n                 drop=0., drop_path=0.,\n                 use_layer_scale=True, layer_scale_init_value=1e-5,\n                 resolution=7, stride=None):\n\n        super().__init__()\n\n        self.token_mixer = Attention4D(dim, resolution=resolution, act_layer=act_layer, stride=stride)\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim,\n                       act_layer=act_layer, drop=drop, mid_conv=True)\n\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
\\\n            else nn.Identity()\n        self.use_layer_scale = use_layer_scale\n        if use_layer_scale:\n            self.layer_scale_1 = nn.Parameter(\n                layer_scale_init_value * torch.ones(dim).unsqueeze(-1).unsqueeze(-1), requires_grad=True)\n            self.layer_scale_2 = nn.Parameter(\n                layer_scale_init_value * torch.ones(dim).unsqueeze(-1).unsqueeze(-1), requires_grad=True)\n\n    def forward(self, x):\n        if self.use_layer_scale:\n            x = x + self.drop_path(self.layer_scale_1 * self.token_mixer(x))\n            x = x + self.drop_path(self.layer_scale_2 * self.mlp(x))\n\n        else:\n            x = x + self.drop_path(self.token_mixer(x))\n            x = x + self.drop_path(self.mlp(x))\n        return x\n\n\nclass FFN(nn.Module):\n    def __init__(self, dim, pool_size=3, mlp_ratio=4.,\n                 act_layer=nn.GELU,\n                 drop=0., drop_path=0.,\n                 use_layer_scale=True, layer_scale_init_value=1e-5):\n        super().__init__()\n\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim,\n                       act_layer=act_layer, drop=drop, mid_conv=True)\n\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
\\\n            else nn.Identity()\n        self.use_layer_scale = use_layer_scale\n        if use_layer_scale:\n            self.layer_scale_2 = nn.Parameter(\n                layer_scale_init_value * torch.ones(dim).unsqueeze(-1).unsqueeze(-1), requires_grad=True)\n\n    def forward(self, x):\n        if self.use_layer_scale:\n            x = x + self.drop_path(self.layer_scale_2 * self.mlp(x))\n        else:\n            x = x + self.drop_path(self.mlp(x))\n        return x\n\n\ndef eformer_block(dim, index, layers,\n                  pool_size=3, mlp_ratio=4.,\n                  act_layer=nn.GELU, norm_layer=nn.LayerNorm,\n                  drop_rate=.0, drop_path_rate=0.,\n                  use_layer_scale=True, layer_scale_init_value=1e-5, vit_num=1, resolution=7, e_ratios=None):\n    blocks = []\n    for block_idx in range(layers[index]):\n        block_dpr = drop_path_rate * (\n                block_idx + sum(layers[:index])) / (sum(layers) - 1)\n        mlp_ratio = e_ratios[str(index)][block_idx]\n        if index >= 2 and block_idx > layers[index] - 1 - vit_num:\n            if index == 2:\n                stride = 2\n            else:\n                stride = None\n            blocks.append(AttnFFN(\n                dim, mlp_ratio=mlp_ratio,\n                act_layer=act_layer, norm_layer=norm_layer,\n                drop=drop_rate, drop_path=block_dpr,\n                use_layer_scale=use_layer_scale,\n                layer_scale_init_value=layer_scale_init_value,\n                resolution=resolution,\n                stride=stride,\n            ))\n        else:\n            blocks.append(FFN(\n                dim, pool_size=pool_size, mlp_ratio=mlp_ratio,\n                act_layer=act_layer,\n                drop=drop_rate, drop_path=block_dpr,\n                use_layer_scale=use_layer_scale,\n                layer_scale_init_value=layer_scale_init_value,\n            ))\n    blocks = nn.Sequential(*blocks)\n    return blocks\n\n\nclass 
EfficientFormerV2(nn.Module):\n    def __init__(self, layers, embed_dims=None,\n                 mlp_ratios=4, downsamples=None,\n                 pool_size=3,\n                 norm_layer=nn.BatchNorm2d, act_layer=nn.GELU,\n                 num_classes=1000,\n                 down_patch_size=3, down_stride=2, down_pad=1,\n                 drop_rate=0., drop_path_rate=0.,\n                 use_layer_scale=True, layer_scale_init_value=1e-5,\n                 fork_feat=True,\n                 vit_num=0,\n                 resolution=640,\n                 e_ratios=expansion_ratios_L,\n                 **kwargs):\n        super().__init__()\n\n        if not fork_feat:\n            self.num_classes = num_classes\n        self.fork_feat = fork_feat\n\n        self.patch_embed = stem(3, embed_dims[0], act_layer=act_layer)\n\n        network = []\n        for i in range(len(layers)):\n            stage = eformer_block(embed_dims[i], i, layers,\n                                  pool_size=pool_size, mlp_ratio=mlp_ratios,\n                                  act_layer=act_layer, norm_layer=norm_layer,\n                                  drop_rate=drop_rate,\n                                  drop_path_rate=drop_path_rate,\n                                  use_layer_scale=use_layer_scale,\n                                  layer_scale_init_value=layer_scale_init_value,\n                                  resolution=math.ceil(resolution / (2 ** (i + 2))),\n                                  vit_num=vit_num,\n                                  e_ratios=e_ratios)\n            network.append(stage)\n            if i >= len(layers) - 1:\n                break\n            if downsamples[i] or embed_dims[i] != embed_dims[i + 1]:\n                # downsampling between two stages\n                if i >= 2:\n                    asub = True\n                else:\n                    asub = False\n                network.append(\n                    Embedding(\n                        
patch_size=down_patch_size, stride=down_stride,\n                        padding=down_pad,\n                        in_chans=embed_dims[i], embed_dim=embed_dims[i + 1],\n                        resolution=math.ceil(resolution / (2 ** (i + 2))),\n                        asub=asub,\n                        act_layer=act_layer, norm_layer=norm_layer,\n                    )\n                )\n\n        self.network = nn.ModuleList(network)\n\n        if self.fork_feat:\n            # add a norm layer for each output\n            self.out_indices = [0, 2, 4, 6]\n            for i_emb, i_layer in enumerate(self.out_indices):\n                if i_emb == 0 and os.environ.get('FORK_LAST3', None):\n                    layer = nn.Identity()\n                else:\n                    layer = norm_layer(embed_dims[i_emb])\n                layer_name = f'norm{i_layer}'\n                self.add_module(layer_name, layer)\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, resolution, resolution))]\n        \n    def forward_tokens(self, x):\n        outs = []\n        for idx, block in enumerate(self.network):\n            x = block(x)\n            if self.fork_feat and idx in self.out_indices:\n                norm_layer = getattr(self, f'norm{idx}')\n                x_out = norm_layer(x)\n                outs.append(x_out)\n        return outs\n\n    def forward(self, x):\n        x = self.patch_embed(x)\n        x = self.forward_tokens(x)\n        return x\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... 
{idx}/{len(model_dict)} items')\n    return model_dict\n\ndef efficientformerv2_s0(weights='', **kwargs):\n    model = EfficientFormerV2(\n        layers=EfficientFormer_depth['S0'],\n        embed_dims=EfficientFormer_width['S0'],\n        downsamples=[True, True, True, True, True],\n        vit_num=2,\n        drop_path_rate=0.0,\n        e_ratios=expansion_ratios_S0,\n        **kwargs)\n    if weights:\n        pretrained_weight = torch.load(weights)['model']\n        model.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n    return model\n\ndef efficientformerv2_s1(weights='', **kwargs):\n    model = EfficientFormerV2(\n        layers=EfficientFormer_depth['S1'],\n        embed_dims=EfficientFormer_width['S1'],\n        downsamples=[True, True, True, True],\n        vit_num=2,\n        drop_path_rate=0.0,\n        e_ratios=expansion_ratios_S1,\n        **kwargs)\n    if weights:\n        pretrained_weight = torch.load(weights)['model']\n        model.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n    return model\n\ndef efficientformerv2_s2(weights='', **kwargs):\n    model = EfficientFormerV2(\n        layers=EfficientFormer_depth['S2'],\n        embed_dims=EfficientFormer_width['S2'],\n        downsamples=[True, True, True, True],\n        vit_num=4,\n        drop_path_rate=0.02,\n        e_ratios=expansion_ratios_S2,\n        **kwargs)\n    if weights:\n        pretrained_weight = torch.load(weights)['model']\n        model.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n    return model\n\ndef efficientformerv2_l(weights='', **kwargs):\n    model = EfficientFormerV2(\n        layers=EfficientFormer_depth['L'],\n        embed_dims=EfficientFormer_width['L'],\n        downsamples=[True, True, True, True],\n        vit_num=6,\n        drop_path_rate=0.1,\n        e_ratios=expansion_ratios_L,\n        **kwargs)\n    if weights:\n        pretrained_weight = torch.load(weights)['model']\n        
model.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n    return model\n\nif __name__ == '__main__':\n    inputs = torch.randn((1, 3, 640, 640))\n    \n    model = efficientformerv2_s0('eformer_s0_450.pth')\n    res = model(inputs)\n    for i in res:\n        print(i.size())\n    \n    model = efficientformerv2_s1('eformer_s1_450.pth')\n    res = model(inputs)\n    for i in res:\n        print(i.size())\n    \n    model = efficientformerv2_s2('eformer_s2_450.pth')\n    res = model(inputs)\n    for i in res:\n        print(i.size())\n    \n    model = efficientformerv2_l('eformer_l_450.pth')\n    res = model(inputs)\n    for i in res:\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/EfficientViT/efficientViT.py",
    "content": "from typing import Dict, List, Tuple, Union, Optional, Type, Callable, Any\nfrom inspect import signature\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\n\n__all__ = [\n    \"efficientvit_b0\",\n    \"efficientvit_b1\",\n    \"efficientvit_b2\",\n    \"efficientvit_b3\",\n]\n\n#################################################################################\n#                             Basic Layers                                      #\n#################################################################################\n\ndef build_kwargs_from_config(config: Dict, target_func: Callable) -> Dict[str, Any]:\n    valid_keys = list(signature(target_func).parameters)\n    kwargs = {}\n    for key in config:\n        if key in valid_keys:\n            kwargs[key] = config[key]\n    return kwargs\n\nREGISTERED_NORM_DICT: Dict[str, Type] = {\n    \"bn2d\": nn.BatchNorm2d,\n    \"ln\": nn.LayerNorm,\n}\n\ndef build_norm(name=\"bn2d\", num_features=None, **kwargs) -> Optional[nn.Module]:\n    if name == \"ln\":\n        kwargs[\"normalized_shape\"] = num_features\n    else:\n        kwargs[\"num_features\"] = num_features\n    if name in REGISTERED_NORM_DICT:\n        norm_cls = REGISTERED_NORM_DICT[name]\n        args = build_kwargs_from_config(kwargs, norm_cls)\n        return norm_cls(**args)\n    else:\n        return None\n\nREGISTERED_ACT_DICT: Dict[str, Type] = {\n    \"relu\": nn.ReLU,\n    \"relu6\": nn.ReLU6,\n    \"hswish\": nn.Hardswish,\n}\n\ndef build_act(name: str, **kwargs) -> Optional[nn.Module]:\n    if name in REGISTERED_ACT_DICT:\n        act_cls = REGISTERED_ACT_DICT[name]\n        args = build_kwargs_from_config(kwargs, act_cls)\n        return act_cls(**args)\n    else:\n        return None\n\ndef get_same_padding(kernel_size: Union[int, Tuple[int, ...]]) -> Union[int, Tuple[int, ...]]:\n    if isinstance(kernel_size, tuple):\n        return tuple([get_same_padding(ks) for ks in 
kernel_size])\n    else:\n        assert kernel_size % 2 > 0, \"kernel size should be odd number\"\n        return kernel_size // 2\n\ndef list_sum(x: List) -> Any:\n    return x[0] if len(x) == 1 else x[0] + list_sum(x[1:])\n\ndef merge_tensor(x: List[torch.Tensor], mode=\"cat\", dim=1) -> torch.Tensor:\n    if mode == \"cat\":\n        return torch.cat(x, dim=dim)\n    elif mode == \"add\":\n        return list_sum(x)\n    else:\n        raise NotImplementedError\n\ndef resize(\n    x: torch.Tensor,\n    size: Optional[Any] = None,\n    scale_factor: Optional[List[float]] = None,\n    mode: str = \"bicubic\",\n    align_corners: Optional[bool] = False,\n) -> torch.Tensor:\n    if mode in {\"bilinear\", \"bicubic\"}:\n        return F.interpolate(\n            x,\n            size=size,\n            scale_factor=scale_factor,\n            mode=mode,\n            align_corners=align_corners,\n        )\n    elif mode in {\"nearest\", \"area\"}:\n        return F.interpolate(x, size=size, scale_factor=scale_factor, mode=mode)\n    else:\n        raise NotImplementedError(f\"resize(mode={mode}) not implemented.\")\n\ndef val2list(x: Union[List, Tuple, Any], repeat_time=1) -> List:\n    if isinstance(x, (list, tuple)):\n        return list(x)\n    return [x for _ in range(repeat_time)]\n\ndef val2tuple(x: Union[List, Tuple, Any], min_len: int = 1, idx_repeat: int = -1) -> Tuple:\n    # convert to list first\n    x = val2list(x)\n\n    # repeat elements if necessary\n    if len(x) > 0:\n        x[idx_repeat:idx_repeat] = [x[idx_repeat] for _ in range(min_len - len(x))]\n\n    return tuple(x)\n\nclass ConvLayer(nn.Module):\n    def __init__(\n        self,\n        in_channels: int,\n        out_channels: int,\n        kernel_size=3,\n        stride=1,\n        dilation=1,\n        groups=1,\n        use_bias=False,\n        dropout_rate=0,\n        norm=\"bn2d\",\n        act_func=\"relu\",\n    ):\n        super(ConvLayer, self).__init__()\n\n        padding = 
get_same_padding(kernel_size)\n        padding *= dilation\n\n        self.dropout = nn.Dropout2d(dropout_rate, inplace=False) if dropout_rate > 0 else None\n        self.conv = nn.Conv2d(\n            in_channels,\n            out_channels,\n            kernel_size=(kernel_size, kernel_size),\n            stride=(stride, stride),\n            padding=padding,\n            dilation=(dilation, dilation),\n            groups=groups,\n            bias=use_bias,\n        )\n        self.norm = build_norm(norm, num_features=out_channels)\n        self.act = build_act(act_func)\n    \n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        if self.dropout is not None:\n            x = self.dropout(x)\n        x = self.conv(x)\n        if self.norm:\n            x = self.norm(x)\n        if self.act:\n            x = self.act(x)\n        return x\n\n\nclass UpSampleLayer(nn.Module):\n    def __init__(\n        self,\n        mode=\"bicubic\",\n        size: Union[int, Tuple[int, int], List[int], None] = None,\n        factor=2,\n        align_corners=False,\n    ):\n        super(UpSampleLayer, self).__init__()\n        self.mode = mode\n        self.size = val2list(size, 2) if size is not None else None\n        self.factor = None if self.size is not None else factor\n        self.align_corners = align_corners\n\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        return resize(x, self.size, self.factor, self.mode, self.align_corners)\n\n\nclass LinearLayer(nn.Module):\n    def __init__(\n        self,\n        in_features: int,\n        out_features: int,\n        use_bias=True,\n        dropout_rate=0,\n        norm=None,\n        act_func=None,\n    ):\n        super(LinearLayer, self).__init__()\n\n        self.dropout = nn.Dropout(dropout_rate, inplace=False) if dropout_rate > 0 else None\n        self.linear = nn.Linear(in_features, out_features, use_bias)\n        self.norm = build_norm(norm, num_features=out_features)\n        self.act = 
build_act(act_func)\n    \n    def _try_squeeze(self, x: torch.Tensor) -> torch.Tensor:\n        if x.dim() > 2:\n            x = torch.flatten(x, start_dim=1)\n        return x\n\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        x = self._try_squeeze(x)\n        if self.dropout:\n            x = self.dropout(x)\n        x = self.linear(x)\n        if self.norm:\n            x = self.norm(x)\n        if self.act:\n            x = self.act(x)\n        return x\n\n\nclass IdentityLayer(nn.Module):\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        return x\n\n\n#################################################################################\n#                             Basic Blocks                                      #\n#################################################################################\n\n\nclass DSConv(nn.Module):\n    def __init__(\n        self,\n        in_channels: int,\n        out_channels: int,\n        kernel_size=3,\n        stride=1,\n        use_bias=False,\n        norm=(\"bn2d\", \"bn2d\"),\n        act_func=(\"relu6\", None),\n    ):\n        super(DSConv, self).__init__()\n\n        use_bias = val2tuple(use_bias, 2)\n        norm = val2tuple(norm, 2)\n        act_func = val2tuple(act_func, 2)\n\n        self.depth_conv = ConvLayer(\n            in_channels,\n            in_channels,\n            kernel_size,\n            stride,\n            groups=in_channels,\n            norm=norm[0],\n            act_func=act_func[0],\n            use_bias=use_bias[0],\n        )\n        self.point_conv = ConvLayer(\n            in_channels,\n            out_channels,\n            1,\n            norm=norm[1],\n            act_func=act_func[1],\n            use_bias=use_bias[1],\n        )\n    \n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        x = self.depth_conv(x)\n        x = self.point_conv(x)\n        return x\n\n\nclass MBConv(nn.Module):\n    def __init__(\n        self,\n        
in_channels: int,\n        out_channels: int,\n        kernel_size=3,\n        stride=1,\n        mid_channels=None,\n        expand_ratio=6,\n        use_bias=False,\n        norm=(\"bn2d\", \"bn2d\", \"bn2d\"),\n        act_func=(\"relu6\", \"relu6\", None),\n    ):\n        super(MBConv, self).__init__()\n\n        use_bias = val2tuple(use_bias, 3)\n        norm = val2tuple(norm, 3)\n        act_func = val2tuple(act_func, 3)\n        mid_channels = mid_channels or round(in_channels * expand_ratio)\n\n        self.inverted_conv = ConvLayer(\n            in_channels,\n            mid_channels,\n            1,\n            stride=1,\n            norm=norm[0],\n            act_func=act_func[0],\n            use_bias=use_bias[0],\n        )\n        self.depth_conv = ConvLayer(\n            mid_channels,\n            mid_channels,\n            kernel_size,\n            stride=stride,\n            groups=mid_channels,\n            norm=norm[1],\n            act_func=act_func[1],\n            use_bias=use_bias[1],\n        )\n        self.point_conv = ConvLayer(\n            mid_channels,\n            out_channels,\n            1,\n            norm=norm[2],\n            act_func=act_func[2],\n            use_bias=use_bias[2],\n        )\n\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        x = self.inverted_conv(x)\n        x = self.depth_conv(x)\n        x = self.point_conv(x)\n        return x\n\n\nclass LiteMSA(nn.Module):\n    r\"\"\" Lightweight multi-scale attention \"\"\"\n    def __init__(\n        self,\n        in_channels: int,\n        out_channels: int,\n        heads: Optional[int] = None,\n        heads_ratio: float = 1.0,\n        dim=8,\n        use_bias=False,\n        norm=(None, \"bn2d\"),\n        act_func=(None, None),\n        kernel_func=\"relu\",\n        scales: Tuple[int, ...] 
= (5,),\n    ):\n        super(LiteMSA, self).__init__()\n        heads = heads or int(in_channels // dim * heads_ratio)\n\n        total_dim = heads * dim\n\n        use_bias = val2tuple(use_bias, 2)\n        norm = val2tuple(norm, 2)\n        act_func = val2tuple(act_func, 2)\n\n        self.dim = dim\n        self.qkv = ConvLayer(\n            in_channels,\n            3 * total_dim,\n            1,\n            use_bias=use_bias[0],\n            norm=norm[0],\n            act_func=act_func[0],\n        )\n        self.aggreg = nn.ModuleList(\n            [\n                nn.Sequential(\n                    nn.Conv2d(\n                        3 * total_dim, 3 * total_dim, scale, padding=get_same_padding(scale), groups=3 * total_dim, bias=use_bias[0],\n                    ),\n                    nn.Conv2d(3 * total_dim, 3 * total_dim, 1, groups=3 * heads, bias=use_bias[0]),\n                )\n                for scale in scales\n            ]\n        )\n        self.kernel_func = build_act(kernel_func, inplace=False)\n\n        self.proj = ConvLayer(\n            total_dim * (1 + len(scales)),\n            out_channels,\n            1,\n            use_bias=use_bias[1],\n            norm=norm[1],\n            act_func=act_func[1],\n        )\n    \n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        B, _, H, W = list(x.size())\n\n        # generate multi-scale q, k, v\n        qkv = self.qkv(x)\n        multi_scale_qkv = [qkv]\n        for op in self.aggreg:\n            multi_scale_qkv.append(op(qkv))\n        multi_scale_qkv = torch.cat(multi_scale_qkv, dim=1)\n\n        multi_scale_qkv = torch.reshape(\n            multi_scale_qkv,\n            (\n                B,\n                -1,\n                3 * self.dim,\n                H * W,\n            ),\n        )\n        multi_scale_qkv = torch.transpose(multi_scale_qkv, -1, -2)\n        q, k, v = (\n            multi_scale_qkv[..., 0 : self.dim].clone(),\n            multi_scale_qkv[..., 
self.dim : 2 * self.dim].clone(),\n            multi_scale_qkv[..., 2 * self.dim :].clone(),\n        )\n\n        # lightweight global attention\n        q = self.kernel_func(q)\n        k = self.kernel_func(k)\n\n        trans_k = k.transpose(-1, -2)\n\n        v = F.pad(v, (0, 1), mode=\"constant\", value=1)\n        kv = torch.matmul(trans_k, v)\n        out = torch.matmul(q, kv)\n        out = out[..., :-1] / (out[..., -1:] + 1e-15)\n\n        # final projection\n        out = torch.transpose(out, -1, -2)\n        out = torch.reshape(out, (B, -1, H, W))\n        out = self.proj(out)\n\n        return out\n\n\nclass EfficientViTBlock(nn.Module):\n    def __init__(self, in_channels: int, heads_ratio: float = 1.0, dim=32, expand_ratio: float = 4, norm=\"bn2d\", act_func=\"hswish\"):\n        super(EfficientViTBlock, self).__init__()\n        self.context_module = ResidualBlock(\n            LiteMSA(\n                in_channels=in_channels,\n                out_channels=in_channels,\n                heads_ratio=heads_ratio,\n                dim=dim,\n                norm=(None, norm),\n            ),\n            IdentityLayer(),\n        )\n        local_module = MBConv(\n            in_channels=in_channels,\n            out_channels=in_channels,\n            expand_ratio=expand_ratio,\n            use_bias=(True, True, False),\n            norm=(None, None, norm),\n            act_func=(act_func, act_func, None),\n        )\n        self.local_module = ResidualBlock(local_module, IdentityLayer())\n    \n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        x = self.context_module(x)\n        x = self.local_module(x)\n        return x\n\n\n#################################################################################\n#                             Functional Blocks                                 #\n#################################################################################\n\n\nclass ResidualBlock(nn.Module):\n    def __init__(\n        
self,\n        main: Optional[nn.Module],\n        shortcut: Optional[nn.Module],\n        post_act=None,\n        pre_norm: Optional[nn.Module] = None,\n    ):\n        super(ResidualBlock, self).__init__()\n\n        self.pre_norm = pre_norm\n        self.main = main\n        self.shortcut = shortcut\n        self.post_act = build_act(post_act)\n\n    def forward_main(self, x: torch.Tensor) -> torch.Tensor:\n        if self.pre_norm is None:\n            return self.main(x)\n        else:\n            return self.main(self.pre_norm(x))\n\n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        if self.main is None:\n            res = x\n        elif self.shortcut is None:\n            res = self.forward_main(x)\n        else:\n            res = self.forward_main(x) + self.shortcut(x)\n            if self.post_act:\n                res = self.post_act(res)\n        return res\n\n\nclass DAGBlock(nn.Module):\n    def __init__(\n        self,\n        inputs: Dict[str, nn.Module],\n        merge_mode: str,\n        post_input: Optional[nn.Module],\n        middle: nn.Module,\n        outputs: Dict[str, nn.Module],\n    ):\n        super(DAGBlock, self).__init__()\n\n        self.input_keys = list(inputs.keys())\n        self.input_ops = nn.ModuleList(list(inputs.values()))\n        self.merge_mode = merge_mode\n        self.post_input = post_input\n\n        self.middle = middle\n\n        self.output_keys = list(outputs.keys())\n        self.output_ops = nn.ModuleList(list(outputs.values()))\n\n    def forward(self, feature_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:\n        feat = [op(feature_dict[key]) for key, op in zip(self.input_keys, self.input_ops)]\n        feat = merge_tensor(feat, self.merge_mode, dim=1)\n        if self.post_input is not None:\n            feat = self.post_input(feat)\n        feat = self.middle(feat)\n        for key, op in zip(self.output_keys, self.output_ops):\n            feature_dict[key] = op(feat)\n       
 return feature_dict\n\n\nclass OpSequential(nn.Module):\n    def __init__(self, op_list: List[Optional[nn.Module]]):\n        super(OpSequential, self).__init__()\n        valid_op_list = []\n        for op in op_list:\n            if op is not None:\n                valid_op_list.append(op)\n        self.op_list = nn.ModuleList(valid_op_list)\n    \n    def forward(self, x: torch.Tensor) -> torch.Tensor:\n        for op in self.op_list:\n            x = op(x)\n        return x\n\nclass EfficientViTBackbone(nn.Module):\n    def __init__(self, width_list: List[int], depth_list: List[int], in_channels=3, dim=32, expand_ratio=4, norm=\"bn2d\", act_func=\"hswish\") -> None:\n        super().__init__()\n\n        self.width_list = []\n        # input stem\n        self.input_stem = [\n            ConvLayer(\n                in_channels=3,\n                out_channels=width_list[0],\n                stride=2,\n                norm=norm,\n                act_func=act_func,\n            )\n        ]\n        for _ in range(depth_list[0]):\n            block = self.build_local_block(\n                in_channels=width_list[0],\n                out_channels=width_list[0],\n                stride=1,\n                expand_ratio=1,\n                norm=norm,\n                act_func=act_func,\n            )\n            self.input_stem.append(ResidualBlock(block, IdentityLayer()))\n        in_channels = width_list[0]\n        self.input_stem = OpSequential(self.input_stem)\n        self.width_list.append(in_channels)\n\n        # stages\n        self.stages = []\n        for w, d in zip(width_list[1:3], depth_list[1:3]):\n            stage = []\n            for i in range(d):\n                stride = 2 if i == 0 else 1\n                block = self.build_local_block(\n                    in_channels=in_channels,\n                    out_channels=w,\n                    stride=stride,\n                    expand_ratio=expand_ratio,\n                    norm=norm,\n        
            act_func=act_func,\n                )\n                block = ResidualBlock(block, IdentityLayer() if stride == 1 else None)\n                stage.append(block)\n                in_channels = w\n            self.stages.append(OpSequential(stage))\n            self.width_list.append(in_channels)\n\n        for w, d in zip(width_list[3:], depth_list[3:]):\n            stage = []\n            block = self.build_local_block(\n                in_channels=in_channels,\n                out_channels=w,\n                stride=2,\n                expand_ratio=expand_ratio,\n                norm=norm,\n                act_func=act_func,\n                fewer_norm=True,\n            )\n            stage.append(ResidualBlock(block, None))\n            in_channels = w\n\n            for _ in range(d):\n                stage.append(\n                    EfficientViTBlock(\n                        in_channels=in_channels,\n                        dim=dim,\n                        expand_ratio=expand_ratio,\n                        norm=norm,\n                        act_func=act_func,\n                    )\n                )\n            self.stages.append(OpSequential(stage))\n            self.width_list.append(in_channels)\n        self.stages = nn.ModuleList(self.stages)\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 224, 224))]\n    @staticmethod\n    def build_local_block(in_channels: int, out_channels: int, stride: int, expand_ratio: float, norm: str, act_func: str, fewer_norm: bool = False) -> nn.Module:\n        if expand_ratio == 1:\n            block = DSConv(\n                in_channels=in_channels,\n                out_channels=out_channels,\n                stride=stride,\n                use_bias=(True, False) if fewer_norm else False,\n                norm=(None, norm) if fewer_norm else norm,\n                act_func=(act_func, None),\n            )\n        else:      \n            block = MBConv(\n                
in_channels=in_channels,\n                out_channels=out_channels,\n                stride=stride,\n                expand_ratio=expand_ratio,\n                use_bias=(True, True, False) if fewer_norm else False,\n                norm=(None, None, norm) if fewer_norm else norm,\n                act_func=(act_func, act_func, None),\n            )\n        return block\n\n    def forward(self, x: torch.Tensor) -> List[torch.Tensor]:\n        # collect the stem output followed by the output of every stage\n        res = []\n        x = self.input_stem(x)\n        res.append(x)\n        for stage_id, stage in enumerate(self.stages, 1):\n            x = stage(x)\n            res.append(x)\n        return res\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        k = k[9:]\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef efficientvit_b0(weights='', **kwargs) -> EfficientViTBackbone:\n    backbone = EfficientViTBackbone(\n        width_list=[8, 16, 32, 64, 128],\n        depth_list=[1, 2, 2, 2, 2],\n        dim=16,\n        **build_kwargs_from_config(kwargs, EfficientViTBackbone),\n    )\n    if weights:\n        backbone.load_state_dict(update_weight(backbone.state_dict(), torch.load(weights)['state_dict']))\n    return backbone\n\n\ndef efficientvit_b1(weights='', **kwargs) -> EfficientViTBackbone:\n    backbone = EfficientViTBackbone(\n        width_list=[16, 32, 64, 128, 256],\n        depth_list=[1, 2, 3, 3, 4],\n        dim=16,\n        **build_kwargs_from_config(kwargs, EfficientViTBackbone),\n    )\n    if weights:\n        backbone.load_state_dict(update_weight(backbone.state_dict(), torch.load(weights)['state_dict']))\n    return backbone\n\n\ndef efficientvit_b2(weights='', **kwargs) -> EfficientViTBackbone:\n    backbone = EfficientViTBackbone(\n      
  width_list=[24, 48, 96, 192, 384],\n        depth_list=[1, 3, 4, 4, 6],\n        dim=32,\n        **build_kwargs_from_config(kwargs, EfficientViTBackbone),\n    )\n    if weights:\n        backbone.load_state_dict(update_weight(backbone.state_dict(), torch.load(weights)['state_dict']))\n    return backbone\n\n\ndef efficientvit_b3(weights='', **kwargs) -> EfficientViTBackbone:\n    backbone = EfficientViTBackbone(\n        width_list=[32, 64, 128, 256, 512],\n        depth_list=[1, 4, 6, 6, 9],\n        dim=32,\n        **build_kwargs_from_config(kwargs, EfficientViTBackbone),\n    )\n    if weights:\n        backbone.load_state_dict(update_weight(backbone.state_dict(), torch.load(weights)['state_dict']))\n    return backbone\n\nif __name__ == '__main__':\n    model = efficientvit_b1()\n    weights = torch.load('b1-r288.pt')['state_dict']\n    model.load_state_dict(update_weight(model.state_dict(), weights))\n    inputs = torch.randn((1, 3, 640, 640))\n    res = model(inputs)\n    for i in res:\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/FocalNet/FocalNet.py",
    "content": "# --------------------------------------------------------\n# FocalNets -- Focal Modulation Networks\n# Copyright (c) 2022 Microsoft\n# Licensed under The MIT License [see LICENSE for details]\n# Written by Jianwei Yang (jianwyan@microsoft.com)\n# --------------------------------------------------------\n\nimport numpy as np\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.checkpoint as checkpoint\nfrom timm.models.layers import DropPath, to_2tuple, trunc_normal_\n\n__all__ = ['focalnet_tiny_srf', 'focalnet_tiny_lrf', 'focalnet_small_srf', 'focalnet_small_lrf', 'focalnet_base_srf', 'focalnet_base_lrf', 'focalnet_large_fl3', 'focalnet_large_fl4', 'focalnet_xlarge_fl3', 'focalnet_xlarge_fl4', 'focalnet_huge_fl3', 'focalnet_huge_fl4']\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... 
{idx}/{len(model_dict)} items')\n    return model_dict\n\nclass Mlp(nn.Module):\n    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Linear(in_features, hidden_features)\n        self.act = act_layer()\n        self.fc2 = nn.Linear(hidden_features, out_features)\n        self.drop = nn.Dropout(drop)\n\n    def forward(self, x):\n        x = self.fc1(x)     \n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\nclass FocalModulation(nn.Module):\n    def __init__(self, dim, focal_window, focal_level, focal_factor=2, bias=True, proj_drop=0., use_postln_in_modulation=False, normalize_modulator=False):\n        super().__init__()\n\n        self.dim = dim\n        self.focal_window = focal_window\n        self.focal_level = focal_level\n        self.focal_factor = focal_factor\n        self.use_postln_in_modulation = use_postln_in_modulation\n        self.normalize_modulator = normalize_modulator\n\n        self.f = nn.Linear(dim, 2*dim + (self.focal_level+1), bias=bias)\n        self.h = nn.Conv2d(dim, dim, kernel_size=1, stride=1, bias=bias)\n\n        self.act = nn.GELU()\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n        self.focal_layers = nn.ModuleList()\n                \n        self.kernel_sizes = []\n        for k in range(self.focal_level):\n            kernel_size = self.focal_factor*k + self.focal_window\n            self.focal_layers.append(\n                nn.Sequential(\n                    nn.Conv2d(dim, dim, kernel_size=kernel_size, stride=1, \n                    groups=dim, padding=kernel_size//2, bias=False),\n                    nn.GELU(),\n                    )\n                )              \n            
self.kernel_sizes.append(kernel_size)          \n        if self.use_postln_in_modulation:\n            self.ln = nn.LayerNorm(dim)\n\n    def forward(self, x):\n        \"\"\"\n        Args:\n            x: input features with shape of (B, H, W, C)\n        \"\"\"\n        C = x.shape[-1]\n\n        # pre linear projection\n        x = self.f(x).permute(0, 3, 1, 2).contiguous()\n        q, ctx, gates = torch.split(x, (C, C, self.focal_level+1), 1)\n        \n        # context aggregation\n        ctx_all = 0 \n        for l in range(self.focal_level):         \n            ctx = self.focal_layers[l](ctx)\n            ctx_all = ctx_all + ctx * gates[:, l:l+1]\n        ctx_global = self.act(ctx.mean(2, keepdim=True).mean(3, keepdim=True))\n        ctx_all = ctx_all + ctx_global * gates[:,self.focal_level:]\n\n        # normalize context\n        if self.normalize_modulator:\n            ctx_all = ctx_all / (self.focal_level+1)\n\n        # focal modulation\n        modulator = self.h(ctx_all)\n        x_out = q * modulator\n        x_out = x_out.permute(0, 2, 3, 1).contiguous()\n        if self.use_postln_in_modulation:\n            x_out = self.ln(x_out)\n        \n        # post linear projection\n        x_out = self.proj(x_out)\n        x_out = self.proj_drop(x_out)\n        return x_out\n\n    def extra_repr(self) -> str:\n        return f'dim={self.dim}'\n\n    def flops(self, N):\n        # calculate flops for 1 window with token length of N\n        flops = 0\n\n        flops += N * self.dim * (self.dim * 2 + (self.focal_level+1))\n\n        # focal convolution\n        for k in range(self.focal_level):\n            flops += N * (self.kernel_sizes[k]**2+1) * self.dim\n\n        # global gating\n        flops += N * 1 * self.dim \n\n        #  self.linear\n        flops += N * self.dim * (self.dim + 1)\n\n        # x = self.proj(x)\n        flops += N * self.dim * self.dim\n        return flops\n\nclass FocalNetBlock(nn.Module):\n    r\"\"\" Focal Modulation 
Network Block.\n\n    Args:\n        dim (int): Number of input channels.\n        input_resolution (tuple[int]): Input resolution.\n        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.\n        drop (float, optional): Dropout rate. Default: 0.0\n        drop_path (float, optional): Stochastic depth rate. Default: 0.0\n        act_layer (nn.Module, optional): Activation layer. Default: nn.GELU\n        norm_layer (nn.Module, optional): Normalization layer.  Default: nn.LayerNorm\n        focal_level (int): Number of focal levels. \n        focal_window (int): Focal window size at first focal level\n        use_layerscale (bool): Whether to use layerscale\n        layerscale_value (float): Initial layerscale value\n        use_postln (bool): Whether to use layernorm after modulation\n    \"\"\"\n\n    def __init__(self, dim, input_resolution, mlp_ratio=4., drop=0., drop_path=0., \n                    act_layer=nn.GELU, norm_layer=nn.LayerNorm,\n                    focal_level=1, focal_window=3,\n                    use_layerscale=False, layerscale_value=1e-4, \n                    use_postln=False, use_postln_in_modulation=False, \n                    normalize_modulator=False):\n        super().__init__()\n        self.dim = dim\n        self.input_resolution = input_resolution\n        self.mlp_ratio = mlp_ratio\n\n        self.focal_window = focal_window\n        self.focal_level = focal_level\n        self.use_postln = use_postln\n\n        self.norm1 = norm_layer(dim)\n        self.modulation = FocalModulation(\n            dim, proj_drop=drop, focal_window=focal_window, focal_level=self.focal_level, \n            use_postln_in_modulation=use_postln_in_modulation, normalize_modulator=normalize_modulator\n        )\n\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity()\n        self.norm2 = norm_layer(dim)\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)\n\n        self.gamma_1 = 1.0\n        self.gamma_2 = 1.0    \n        if use_layerscale:\n            self.gamma_1 = nn.Parameter(layerscale_value * torch.ones((dim)), requires_grad=True)\n            self.gamma_2 = nn.Parameter(layerscale_value * torch.ones((dim)), requires_grad=True)\n\n        self.H = None\n        self.W = None\n\n    def forward(self, x):\n        H, W = self.H, self.W\n        B, L, C = x.shape\n        shortcut = x\n\n        # Focal Modulation\n        x = x if self.use_postln else self.norm1(x)\n        x = x.view(B, H, W, C)\n        x = self.modulation(x).view(B, H * W, C)\n        x = x if not self.use_postln else self.norm1(x)\n\n        # FFN\n        x = shortcut + self.drop_path(self.gamma_1 * x)\n        x = x + self.drop_path(self.gamma_2 * (self.norm2(self.mlp(x)) if self.use_postln else self.mlp(self.norm2(x))))\n\n        return x\n\n    def extra_repr(self) -> str:\n        return f\"dim={self.dim}, input_resolution={self.input_resolution}, \" \\\n               f\"mlp_ratio={self.mlp_ratio}\"\n\n    def flops(self):\n        flops = 0\n        H, W = self.input_resolution\n        # norm1\n        flops += self.dim * H * W\n        \n        # W-MSA/SW-MSA\n        flops += self.modulation.flops(H*W)\n\n        # mlp\n        flops += 2 * H * W * self.dim * self.dim * self.mlp_ratio\n        # norm2\n        flops += self.dim * H * W\n        return flops\n\nclass BasicLayer(nn.Module):\n    \"\"\" A basic Focal Transformer layer for one stage.\n\n    Args:\n        dim (int): Number of input channels.\n        input_resolution (tuple[int]): Input resolution.\n        depth (int): Number of blocks.\n        window_size (int): Local window size.\n        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.\n     
   qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True\n        qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.\n        drop (float, optional): Dropout rate. Default: 0.0\n        drop_path (float | tuple[float], optional): Stochastic depth rate. Default: 0.0\n        norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm\n        downsample (nn.Module | None, optional): Downsample layer at the end of the layer. Default: None\n        use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False.\n        focal_level (int): Number of focal levels\n        focal_window (int): Focal window size at first focal level\n        use_layerscale (bool): Whether use layerscale\n        layerscale_value (float): Initial layerscale value\n        use_postln (bool): Whether use layernorm after modulation\n    \"\"\"\n\n    def __init__(self, dim, out_dim, input_resolution, depth,\n                 mlp_ratio=4., drop=0., drop_path=0., norm_layer=nn.LayerNorm, \n                 downsample=None, use_checkpoint=False, \n                 focal_level=1, focal_window=1, \n                 use_conv_embed=False, \n                 use_layerscale=False, layerscale_value=1e-4, \n                 use_postln=False, \n                 use_postln_in_modulation=False, \n                 normalize_modulator=False):\n\n        super().__init__()\n        self.dim = dim\n        self.input_resolution = input_resolution\n        self.depth = depth\n        self.use_checkpoint = use_checkpoint\n        \n        # build blocks\n        self.blocks = nn.ModuleList([\n            FocalNetBlock(\n                dim=dim, \n                input_resolution=input_resolution,\n                mlp_ratio=mlp_ratio, \n                drop=drop, \n                drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path,\n                norm_layer=norm_layer,\n            
    focal_level=focal_level,\n                focal_window=focal_window, \n                use_layerscale=use_layerscale, \n                layerscale_value=layerscale_value,\n                use_postln=use_postln, \n                use_postln_in_modulation=use_postln_in_modulation, \n                normalize_modulator=normalize_modulator, \n            )\n            for i in range(depth)])\n\n        if downsample is not None:\n            self.downsample = downsample(\n                img_size=input_resolution, \n                patch_size=2, \n                in_chans=dim, \n                embed_dim=out_dim, \n                use_conv_embed=use_conv_embed, \n                norm_layer=norm_layer, \n                is_stem=False\n            )\n        else:\n            self.downsample = None\n\n    def forward(self, x, H, W):\n        for blk in self.blocks:\n            blk.H, blk.W = H, W\n            if self.use_checkpoint:\n                x = checkpoint.checkpoint(blk, x)\n            else:\n                x = blk(x)\n\n        if self.downsample is not None:\n            x = x.transpose(1, 2).reshape(x.shape[0], -1, H, W)\n            x, Ho, Wo = self.downsample(x)\n        else:\n            Ho, Wo = H, W        \n        return x, Ho, Wo\n\n    def extra_repr(self) -> str:\n        return f\"dim={self.dim}, input_resolution={self.input_resolution}, depth={self.depth}\"\n\n    def flops(self):\n        flops = 0\n        for blk in self.blocks:\n            flops += blk.flops()\n        if self.downsample is not None:\n            flops += self.downsample.flops()\n        return flops\n\nclass PatchEmbed(nn.Module):\n    r\"\"\" Image to Patch Embedding\n\n    Args:\n        img_size (int): Image size.  Default: 224.\n        patch_size (int): Patch token size. Default: 4.\n        in_chans (int): Number of input image channels. Default: 3.\n        embed_dim (int): Number of linear projection output channels. 
Default: 96.\n        norm_layer (nn.Module, optional): Normalization layer. Default: None\n    \"\"\"\n\n    def __init__(self, img_size=(224, 224), patch_size=4, in_chans=3, embed_dim=96, use_conv_embed=False, norm_layer=None, is_stem=False):\n        super().__init__()\n        patch_size = to_2tuple(patch_size)\n        patches_resolution = [img_size[0] // patch_size[0], img_size[1] // patch_size[1]]\n        self.img_size = img_size\n        self.patch_size = patch_size\n        self.patches_resolution = patches_resolution\n        self.num_patches = patches_resolution[0] * patches_resolution[1]\n\n        self.in_chans = in_chans\n        self.embed_dim = embed_dim\n\n        if use_conv_embed:\n            # if we choose to use conv embedding, then we treat the stem and non-stem differently\n            if is_stem:\n                kernel_size = 7; padding = 2; stride = 4\n            else:\n                kernel_size = 3; padding = 1; stride = 2\n            self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=kernel_size, stride=stride, padding=padding)\n        else:\n            self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)\n        \n        if norm_layer is not None:\n            self.norm = norm_layer(embed_dim)\n        else:\n            self.norm = None\n\n    def forward(self, x):\n        B, C, H, W = x.shape\n\n        x = self.proj(x)        \n        H, W = x.shape[2:]\n        x = x.flatten(2).transpose(1, 2)  # B Ph*Pw C\n        if self.norm is not None:\n            x = self.norm(x)\n        return x, H, W\n\n    def flops(self):\n        Ho, Wo = self.patches_resolution\n        flops = Ho * Wo * self.embed_dim * self.in_chans * (self.patch_size[0] * self.patch_size[1])\n        if self.norm is not None:\n            flops += Ho * Wo * self.embed_dim\n        return flops\n\nclass FocalNet(nn.Module):\n    r\"\"\" Focal Modulation Networks (FocalNets)\n\n    Args:\n        img_size (int | 
tuple(int)): Input image size. Default: 224\n        patch_size (int | tuple(int)): Patch size. Default: 4\n        in_chans (int): Number of input image channels. Default: 3\n        num_classes (int): Number of classes for classification head. Default: 1000\n        embed_dim (int): Patch embedding dimension. Default: 96\n        depths (tuple(int)): Depth of each Focal Transformer layer.\n        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4\n        drop_rate (float): Dropout rate. Default: 0\n        drop_path_rate (float): Stochastic depth rate. Default: 0.1\n        norm_layer (nn.Module): Normalization layer. Default: nn.LayerNorm.\n        patch_norm (bool): If True, add normalization after patch embedding. Default: True\n        use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False \n        focal_levels (list): How many focal levels at all stages. Note that this excludes the finest-grain level. Default: [2, 2, 2, 2] \n        focal_windows (list): The focal window size at all stages. Default: [3, 3, 3, 3] \n        use_conv_embed (bool): Whether to use convolutional embedding. We noted that using convolutional embedding usually improves the performance, but we do not use it by default. Default: False \n        use_layerscale (bool): Whether to use layerscale proposed in CaiT. Default: False \n        layerscale_value (float): Value for layer scale. 
Default: 1e-4 \n        use_postln (bool): Whether to use layernorm after modulation (it helps stabilize training of large models)\n    \"\"\"\n    def __init__(self, \n                img_size=224, \n                patch_size=4, \n                in_chans=3, \n                num_classes=1000,\n                embed_dim=96, \n                depths=[2, 2, 6, 2], \n                mlp_ratio=4., \n                drop_rate=0., \n                drop_path_rate=0.1,\n                norm_layer=nn.LayerNorm, \n                patch_norm=True,\n                use_checkpoint=False,                 \n                focal_levels=[2, 2, 2, 2], \n                focal_windows=[3, 3, 3, 3], \n                use_conv_embed=False, \n                use_layerscale=False, \n                layerscale_value=1e-4, \n                use_postln=False, \n                use_postln_in_modulation=False, \n                normalize_modulator=False, \n                **kwargs):\n        super().__init__()\n\n        self.num_layers = len(depths)\n        embed_dim = [embed_dim * (2 ** i) for i in range(self.num_layers)]\n\n        self.num_classes = num_classes\n        self.embed_dim = embed_dim\n        self.patch_norm = patch_norm\n        self.num_features = embed_dim[-1]\n        self.mlp_ratio = mlp_ratio\n        \n        # split image into patches using either non-overlapped embedding or overlapped embedding\n        self.patch_embed = PatchEmbed(\n            img_size=to_2tuple(img_size), \n            patch_size=patch_size, \n            in_chans=in_chans, \n            embed_dim=embed_dim[0], \n            use_conv_embed=use_conv_embed, \n            norm_layer=norm_layer if self.patch_norm else None, \n            is_stem=True)\n\n        num_patches = self.patch_embed.num_patches\n        patches_resolution = self.patch_embed.patches_resolution\n        self.patches_resolution = patches_resolution\n        self.pos_drop = nn.Dropout(p=drop_rate)\n\n        # stochastic 
depth\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]  # stochastic depth decay rule\n\n        # build layers\n        self.layers = nn.ModuleList()\n        for i_layer in range(self.num_layers):\n            layer = BasicLayer(dim=embed_dim[i_layer], \n                               out_dim=embed_dim[i_layer+1] if (i_layer < self.num_layers - 1) else None,  \n                               input_resolution=(patches_resolution[0] // (2 ** i_layer),\n                                                 patches_resolution[1] // (2 ** i_layer)),\n                               depth=depths[i_layer],\n                               mlp_ratio=self.mlp_ratio,\n                               drop=drop_rate, \n                               drop_path=dpr[sum(depths[:i_layer]):sum(depths[:i_layer + 1])],\n                               norm_layer=norm_layer, \n                               downsample=PatchEmbed if (i_layer < self.num_layers - 1) else None,\n                               focal_level=focal_levels[i_layer], \n                               focal_window=focal_windows[i_layer], \n                               use_conv_embed=use_conv_embed,\n                               use_checkpoint=use_checkpoint, \n                               use_layerscale=use_layerscale, \n                               layerscale_value=layerscale_value, \n                               use_postln=use_postln,\n                               use_postln_in_modulation=use_postln_in_modulation, \n                               normalize_modulator=normalize_modulator\n                    )\n            self.layers.append(layer)\n\n        self.norm = norm_layer(self.num_features)\n\n        self.apply(self._init_weights)\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            trunc_normal_(m.weight, std=.02)\n            if isinstance(m, 
nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return {''}\n\n    @torch.jit.ignore\n    def no_weight_decay_keywords(self):\n        return {''}\n\n    def forward(self, x):\n        input_size = x.size(2)\n        scale = [4, 8, 16, 32]\n        \n        x, H, W = self.patch_embed(x)\n        x = self.pos_drop(x)\n        features = [x, None, None, None]\n        for layer in self.layers:\n            x, H, W = layer(x, H, W)\n            if input_size // H in scale:\n                features[scale.index(input_size // H)] = x\n        # features[-1] = self.norm(features[-1])  # B L C\n        \n        for i in range(len(features)):\n            features[i] = torch.transpose(features[i], dim0=2, dim1=1).view(-1,features[i].size(2), int(features[i].size(1) ** 0.5), int(features[i].size(1) ** 0.5))\n        \n        return features\n\n    def flops(self):\n        flops = 0\n        flops += self.patch_embed.flops()\n        for i, layer in enumerate(self.layers):\n            flops += layer.flops()\n        flops += self.num_features * self.patches_resolution[0] * self.patches_resolution[1] // (2 ** self.num_layers)\n        flops += self.num_features * self.num_classes\n        return flops\n\nmodel_urls = {\n    \"focalnet_tiny_srf\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_tiny_srf.pth\",\n    \"focalnet_tiny_lrf\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_tiny_lrf.pth\",\n    \"focalnet_small_srf\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_small_srf.pth\",\n    \"focalnet_small_lrf\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_small_lrf.pth\",\n    
\"focalnet_base_srf\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_base_srf.pth\",\n    \"focalnet_base_lrf\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_base_lrf.pth\",    \n    \"focalnet_large_fl3\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_large_lrf_384.pth\", \n    \"focalnet_large_fl4\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_large_lrf_384_fl4.pth\", \n    \"focalnet_xlarge_fl3\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_xlarge_lrf_384.pth\", \n    \"focalnet_xlarge_fl4\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_xlarge_lrf_384_fl4.pth\", \n    \"focalnet_huge_fl3\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_huge_lrf_224.pth\", \n    \"focalnet_huge_fl4\": \"https://projects4jw.blob.core.windows.net/focalnet/release/classification/focalnet_huge_lrf_224_fl4.pth\", \n}\n\ndef focalnet_tiny_srf(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 6, 2], embed_dim=96, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_tiny_srf']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\", check_hash=True)\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_small_srf(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 18, 2], embed_dim=96, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_small_srf']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_base_srf(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 18, 2], embed_dim=128, **kwargs)\n  
  if pretrained:\n        url = model_urls['focalnet_base_srf']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_tiny_lrf(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 6, 2], embed_dim=96, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_tiny_lrf']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\", check_hash=True)\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_small_lrf(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 18, 2], embed_dim=96, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_small_lrf']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_base_lrf(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 18, 2], embed_dim=128, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_base_lrf']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_tiny_iso(pretrained=False, **kwargs):\n    model = FocalNet(depths=[12], patch_size=16, embed_dim=192, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_tiny_iso']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\", check_hash=True)\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_small_iso(pretrained=False, **kwargs):\n    model = FocalNet(depths=[12], patch_size=16, embed_dim=384, **kwargs)\n    if pretrained:\n        
url = model_urls['focalnet_small_iso']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_base_iso(pretrained=False, **kwargs):\n    model = FocalNet(depths=[12], patch_size=16, embed_dim=768, focal_levels=[3], focal_windows=[3], use_layerscale=True, use_postln=True, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_base_iso']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\n# FocalNet large+ models \ndef focalnet_large_fl3(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 18, 2], embed_dim=192, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_large_fl3']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_large_fl4(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 18, 2], embed_dim=192, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_large_fl4']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_xlarge_fl3(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 18, 2], embed_dim=256, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_xlarge_fl3']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_xlarge_fl4(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 18, 2], 
embed_dim=256, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_xlarge_fl4']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_huge_fl3(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 18, 2], embed_dim=352, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_huge_fl3']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\ndef focalnet_huge_fl4(pretrained=False, **kwargs):\n    model = FocalNet(depths=[2, 2, 18, 2], embed_dim=352, **kwargs)\n    if pretrained:\n        url = model_urls['focalnet_huge_fl4']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\")\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint[\"model\"]))\n    return model\n\nif __name__ == '__main__':\n    from copy import deepcopy\n    img_size = 640\n    x = torch.rand(16, 3, img_size, img_size).cuda()\n    model = focalnet_tiny_srf(pretrained=True).cuda()\n    # model_copy = deepcopy(model)\n    for i in model(x):\n        print(i.size())\n\n    flops = model.flops()\n    print(f\"number of GFLOPs: {flops / 1e9}\")\n\n    n_parameters = sum(p.numel() for p in model.parameters() if p.requires_grad)\n    print(f\"number of params: {n_parameters}\")\n    \n    print(list(model_urls.keys()))"
  },
  {
    "path": "yolo-improve/yolov5-backbone/LSKNet/lsknet.py",
    "content": "import torch\nimport torch.nn as nn\nfrom torch.nn.modules.utils import _pair as to_2tuple\nfrom timm.layers import DropPath, to_2tuple\nfrom functools import partial\nimport numpy as np\n\n__all__ = 'lsknet_t', 'lsknet_s'\n\nclass Mlp(nn.Module):\n    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Conv2d(in_features, hidden_features, 1)\n        self.dwconv = DWConv(hidden_features)\n        self.act = act_layer()\n        self.fc2 = nn.Conv2d(hidden_features, out_features, 1)\n        self.drop = nn.Dropout(drop)\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.dwconv(x)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\n\nclass LSKblock(nn.Module):\n    def __init__(self, dim):\n        super().__init__()\n        self.conv0 = nn.Conv2d(dim, dim, 5, padding=2, groups=dim)\n        self.conv_spatial = nn.Conv2d(dim, dim, 7, stride=1, padding=9, groups=dim, dilation=3)\n        self.conv1 = nn.Conv2d(dim, dim//2, 1)\n        self.conv2 = nn.Conv2d(dim, dim//2, 1)\n        self.conv_squeeze = nn.Conv2d(2, 2, 7, padding=3)\n        self.conv = nn.Conv2d(dim//2, dim, 1)\n\n    def forward(self, x):   \n        attn1 = self.conv0(x)\n        attn2 = self.conv_spatial(attn1)\n\n        attn1 = self.conv1(attn1)\n        attn2 = self.conv2(attn2)\n        \n        attn = torch.cat([attn1, attn2], dim=1)\n        avg_attn = torch.mean(attn, dim=1, keepdim=True)\n        max_attn, _ = torch.max(attn, dim=1, keepdim=True)\n        agg = torch.cat([avg_attn, max_attn], dim=1)\n        sig = self.conv_squeeze(agg).sigmoid()\n        attn = attn1 * sig[:,0,:,:].unsqueeze(1) + attn2 * sig[:,1,:,:].unsqueeze(1)\n        attn = self.conv(attn)\n        
return x * attn\n\n\n\nclass Attention(nn.Module):\n    def __init__(self, d_model):\n        super().__init__()\n\n        self.proj_1 = nn.Conv2d(d_model, d_model, 1)\n        self.activation = nn.GELU()\n        self.spatial_gating_unit = LSKblock(d_model)\n        self.proj_2 = nn.Conv2d(d_model, d_model, 1)\n\n    def forward(self, x):\n        shortcut = x.clone()\n        x = self.proj_1(x)\n        x = self.activation(x)\n        x = self.spatial_gating_unit(x)\n        x = self.proj_2(x)\n        x = x + shortcut\n        return x\n\n\nclass Block(nn.Module):\n    def __init__(self, dim, mlp_ratio=4., drop=0., drop_path=0., act_layer=nn.GELU, norm_cfg=None):\n        super().__init__()\n        self.norm1 = nn.BatchNorm2d(dim)\n        self.norm2 = nn.BatchNorm2d(dim)\n        self.attn = Attention(dim)\n        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)\n        layer_scale_init_value = 1e-2            \n        self.layer_scale_1 = nn.Parameter(\n            layer_scale_init_value * torch.ones((dim)), requires_grad=True)\n        self.layer_scale_2 = nn.Parameter(\n            layer_scale_init_value * torch.ones((dim)), requires_grad=True)\n\n    def forward(self, x):\n        x = x + self.drop_path(self.layer_scale_1.unsqueeze(-1).unsqueeze(-1) * self.attn(self.norm1(x)))\n        x = x + self.drop_path(self.layer_scale_2.unsqueeze(-1).unsqueeze(-1) * self.mlp(self.norm2(x)))\n        return x\n\n\nclass OverlapPatchEmbed(nn.Module):\n    \"\"\" Image to Patch Embedding\n    \"\"\"\n\n    def __init__(self, img_size=224, patch_size=7, stride=4, in_chans=3, embed_dim=768, norm_cfg=None):\n        super().__init__()\n        patch_size = to_2tuple(patch_size)\n        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=stride,\n                              
padding=(patch_size[0] // 2, patch_size[1] // 2))\n        self.norm = nn.BatchNorm2d(embed_dim)\n\n\n    def forward(self, x):\n        x = self.proj(x)\n        _, _, H, W = x.shape\n        x = self.norm(x)        \n        return x, H, W\n\nclass LSKNet(nn.Module):\n    def __init__(self, img_size=224, in_chans=3, embed_dims=[64, 128, 256, 512],\n                mlp_ratios=[8, 8, 4, 4], drop_rate=0., drop_path_rate=0., norm_layer=partial(nn.LayerNorm, eps=1e-6),\n                 depths=[3, 4, 6, 3], num_stages=4, \n                 norm_cfg=None):\n        super().__init__()\n        \n        self.depths = depths\n        self.num_stages = num_stages\n\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]  # stochastic depth decay rule\n        cur = 0\n\n        for i in range(num_stages):\n            patch_embed = OverlapPatchEmbed(img_size=img_size if i == 0 else img_size // (2 ** (i + 1)),\n                                            patch_size=7 if i == 0 else 3,\n                                            stride=4 if i == 0 else 2,\n                                            in_chans=in_chans if i == 0 else embed_dims[i - 1],\n                                            embed_dim=embed_dims[i], norm_cfg=norm_cfg)\n\n            block = nn.ModuleList([Block(\n                dim=embed_dims[i], mlp_ratio=mlp_ratios[i], drop=drop_rate, drop_path=dpr[cur + j],norm_cfg=norm_cfg)\n                for j in range(depths[i])])\n            norm = norm_layer(embed_dims[i])\n            cur += depths[i]\n\n            setattr(self, f\"patch_embed{i + 1}\", patch_embed)\n            setattr(self, f\"block{i + 1}\", block)\n            setattr(self, f\"norm{i + 1}\", norm)\n        \n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n\n    def forward(self, x):\n        B = x.shape[0]\n        outs = []\n        for i in range(self.num_stages):\n            patch_embed = getattr(self, 
f\"patch_embed{i + 1}\")\n            block = getattr(self, f\"block{i + 1}\")\n            norm = getattr(self, f\"norm{i + 1}\")\n            x, H, W = patch_embed(x)\n            for blk in block:\n                x = blk(x)\n            x = x.flatten(2).transpose(1, 2)\n            x = norm(x)\n            x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous()\n            outs.append(x)\n        return outs\n\n\nclass DWConv(nn.Module):\n    def __init__(self, dim=768):\n        super(DWConv, self).__init__()\n        self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, bias=True, groups=dim)\n\n    def forward(self, x):\n        x = self.dwconv(x)\n        return x\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef lsknet_t(weights=''):\n    model = LSKNet(embed_dims=[32, 64, 160, 256], depths=[3, 3, 5, 2], drop_rate=0.1, drop_path_rate=0.1)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['state_dict']))\n    return model\n\ndef lsknet_s(weights=''):\n    model = LSKNet(embed_dims=[64, 128, 256, 512], depths=[2, 2, 4, 2], drop_rate=0.1, drop_path_rate=0.1)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['state_dict']))\n    return model\n\nif __name__ == '__main__':\n    model = lsknet_t('lsk_t_backbone-2ef8a593.pth')\n    inputs = torch.randn((1, 3, 640, 640))\n    for i in model(inputs):\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/MobileNetV4/mobilenetv4.py",
    "content": "from typing import Any, Callable, Dict, List, Mapping, Optional, Tuple, Union\n\nimport torch\nimport torch.nn as nn\n\n__all__ = ['MobileNetV4ConvSmall', 'MobileNetV4ConvMedium', 'MobileNetV4ConvLarge', 'MobileNetV4HybridMedium', 'MobileNetV4HybridLarge']\n\nMNV4ConvSmall_BLOCK_SPECS = {\n    \"conv0\": {\n        \"block_name\": \"convbn\",\n        \"num_blocks\": 1,\n        \"block_specs\": [\n            [3, 32, 3, 2]\n        ]\n    },\n    \"layer1\": {\n        \"block_name\": \"convbn\",\n        \"num_blocks\": 2,\n        \"block_specs\": [\n            [32, 32, 3, 2],\n            [32, 32, 1, 1]\n        ]\n    },\n    \"layer2\": {\n        \"block_name\": \"convbn\",\n        \"num_blocks\": 2,\n        \"block_specs\": [\n            [32, 96, 3, 2],\n            [96, 64, 1, 1]\n        ]\n    },\n    \"layer3\": {\n        \"block_name\": \"uib\",\n        \"num_blocks\": 6,\n        \"block_specs\": [\n            [64, 96, 5, 5, True, 2, 3],\n            [96, 96, 0, 3, True, 1, 2],\n            [96, 96, 0, 3, True, 1, 2],\n            [96, 96, 0, 3, True, 1, 2],\n            [96, 96, 0, 3, True, 1, 2],\n            [96, 96, 3, 0, True, 1, 4],\n        ]\n    },\n    \"layer4\": {\n        \"block_name\": \"uib\",\n        \"num_blocks\": 6,\n        \"block_specs\": [\n            [96,  128, 3, 3, True, 2, 6],\n            [128, 128, 5, 5, True, 1, 4],\n            [128, 128, 0, 5, True, 1, 4],\n            [128, 128, 0, 5, True, 1, 3],\n            [128, 128, 0, 3, True, 1, 4],\n            [128, 128, 0, 3, True, 1, 4],\n        ]\n    },  \n    \"layer5\": {\n        \"block_name\": \"convbn\",\n        \"num_blocks\": 2,\n        \"block_specs\": [\n            [128, 960, 1, 1],\n            [960, 1280, 1, 1]\n        ]\n    }\n}\n\nMNV4ConvMedium_BLOCK_SPECS = {\n    \"conv0\": {\n        \"block_name\": \"convbn\",\n        \"num_blocks\": 1,\n        \"block_specs\": [\n            [3, 32, 3, 2]\n        ]\n    },\n    
\"layer1\": {\n        \"block_name\": \"fused_ib\",\n        \"num_blocks\": 1,\n        \"block_specs\": [\n            [32, 48, 2, 4.0, True]\n        ]\n    },\n    \"layer2\": {\n        \"block_name\": \"uib\",\n        \"num_blocks\": 2,\n        \"block_specs\": [\n            [48, 80, 3, 5, True, 2, 4],\n            [80, 80, 3, 3, True, 1, 2]\n        ]\n    },\n    \"layer3\": {\n        \"block_name\": \"uib\",\n        \"num_blocks\": 8,\n        \"block_specs\": [\n            [80,  160, 3, 5, True, 2, 6],\n            [160, 160, 3, 3, True, 1, 4],\n            [160, 160, 3, 3, True, 1, 4],\n            [160, 160, 3, 5, True, 1, 4],\n            [160, 160, 3, 3, True, 1, 4],\n            [160, 160, 3, 0, True, 1, 4],\n            [160, 160, 0, 0, True, 1, 2],\n            [160, 160, 3, 0, True, 1, 4]\n        ]\n    },\n    \"layer4\": {\n        \"block_name\": \"uib\",\n        \"num_blocks\": 11,\n        \"block_specs\": [\n            [160, 256, 5, 5, True, 2, 6],\n            [256, 256, 5, 5, True, 1, 4],\n            [256, 256, 3, 5, True, 1, 4],\n            [256, 256, 3, 5, True, 1, 4],\n            [256, 256, 0, 0, True, 1, 4],\n            [256, 256, 3, 0, True, 1, 4],\n            [256, 256, 3, 5, True, 1, 2],\n            [256, 256, 5, 5, True, 1, 4],\n            [256, 256, 0, 0, True, 1, 4],\n            [256, 256, 0, 0, True, 1, 4],\n            [256, 256, 5, 0, True, 1, 2]\n        ]\n    },  \n    \"layer5\": {\n        \"block_name\": \"convbn\",\n        \"num_blocks\": 2,\n        \"block_specs\": [\n            [256, 960, 1, 1],\n            [960, 1280, 1, 1]\n        ]\n    }\n}\n\nMNV4ConvLarge_BLOCK_SPECS = {\n    \"conv0\": {\n        \"block_name\": \"convbn\",\n        \"num_blocks\": 1,\n        \"block_specs\": [\n            [3, 24, 3, 2]\n        ]\n    },\n    \"layer1\": {\n        \"block_name\": \"fused_ib\",\n        \"num_blocks\": 1,\n        \"block_specs\": [\n            [24, 48, 2, 4.0, True]\n        ]\n    
},\n    \"layer2\": {\n        \"block_name\": \"uib\",\n        \"num_blocks\": 2,\n        \"block_specs\": [\n            [48, 96, 3, 5, True, 2, 4],\n            [96, 96, 3, 3, True, 1, 4]\n        ]\n    },\n    \"layer3\": {\n        \"block_name\": \"uib\",\n        \"num_blocks\": 11,\n        \"block_specs\": [\n            [96,  192, 3, 5, True, 2, 4],\n            [192, 192, 3, 3, True, 1, 4],\n            [192, 192, 3, 3, True, 1, 4],\n            [192, 192, 3, 3, True, 1, 4],\n            [192, 192, 3, 5, True, 1, 4],\n            [192, 192, 5, 3, True, 1, 4],\n            [192, 192, 5, 3, True, 1, 4],\n            [192, 192, 5, 3, True, 1, 4],\n            [192, 192, 5, 3, True, 1, 4],\n            [192, 192, 5, 3, True, 1, 4],\n            [192, 192, 3, 0, True, 1, 4]\n        ]\n    },\n    \"layer4\": {\n        \"block_name\": \"uib\",\n        \"num_blocks\": 13,\n        \"block_specs\": [\n            [192, 512, 5, 5, True, 2, 4],\n            [512, 512, 5, 5, True, 1, 4],\n            [512, 512, 5, 5, True, 1, 4],\n            [512, 512, 5, 5, True, 1, 4],\n            [512, 512, 5, 0, True, 1, 4],\n            [512, 512, 5, 3, True, 1, 4],\n            [512, 512, 5, 0, True, 1, 4],\n            [512, 512, 5, 0, True, 1, 4],\n            [512, 512, 5, 3, True, 1, 4],\n            [512, 512, 5, 5, True, 1, 4],\n            [512, 512, 5, 0, True, 1, 4],\n            [512, 512, 5, 0, True, 1, 4],\n            [512, 512, 5, 0, True, 1, 4]\n        ]\n    },  \n    \"layer5\": {\n        \"block_name\": \"convbn\",\n        \"num_blocks\": 2,\n        \"block_specs\": [\n            [512, 960, 1, 1],\n            [960, 1280, 1, 1]\n        ]\n    }\n}\n\nMNV4HybridConvMedium_BLOCK_SPECS = {\n\n}\n\nMNV4HybridConvLarge_BLOCK_SPECS = {\n\n}\n\nMODEL_SPECS = {\n    \"MobileNetV4ConvSmall\": MNV4ConvSmall_BLOCK_SPECS,\n    \"MobileNetV4ConvMedium\": MNV4ConvMedium_BLOCK_SPECS,\n    \"MobileNetV4ConvLarge\": MNV4ConvLarge_BLOCK_SPECS,\n    
\"MobileNetV4HybridMedium\": MNV4HybridConvMedium_BLOCK_SPECS,\n    \"MobileNetV4HybridLarge\": MNV4HybridConvLarge_BLOCK_SPECS,\n}\n\ndef make_divisible(\n        value: float,\n        divisor: int,\n        min_value: Optional[float] = None,\n        round_down_protect: bool = True,\n    ) -> int:\n    \"\"\"\n    This function is copied from here \n    \"https://github.com/tensorflow/models/blob/master/official/vision/modeling/layers/nn_layers.py\"\n    \n    This is to ensure that all layers have channels that are divisible by 8.\n\n    Args:\n        value: A `float` of original value.\n        divisor: An `int` of the divisor that need to be checked upon.\n        min_value: A `float` of  minimum value threshold.\n        round_down_protect: A `bool` indicating whether round down more than 10%\n        will be allowed.\n\n    Returns:\n        The adjusted value in `int` that is divisible against divisor.\n    \"\"\"\n    if min_value is None:\n        min_value = divisor\n    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)\n    # Make sure that round down does not go down by more than 10%.\n    if round_down_protect and new_value < 0.9 * value:\n        new_value += divisor\n    return int(new_value)\n\ndef conv_2d(inp, oup, kernel_size=3, stride=1, groups=1, bias=False, norm=True, act=True):\n    conv = nn.Sequential()\n    padding = (kernel_size - 1) // 2\n    conv.add_module('conv', nn.Conv2d(inp, oup, kernel_size, stride, padding, bias=bias, groups=groups))\n    if norm:\n        conv.add_module('BatchNorm2d', nn.BatchNorm2d(oup))\n    if act:\n        conv.add_module('Activation', nn.ReLU6())\n    return conv\n\nclass InvertedResidual(nn.Module):\n    def __init__(self, inp, oup, stride, expand_ratio, act=False):\n        super(InvertedResidual, self).__init__()\n        self.stride = stride\n        assert stride in [1, 2]\n        hidden_dim = int(round(inp * expand_ratio))\n        self.block = nn.Sequential()\n        if 
expand_ratio != 1:\n            self.block.add_module('exp_1x1', conv_2d(inp, hidden_dim, kernel_size=1, stride=1))\n        self.block.add_module('conv_3x3', conv_2d(hidden_dim, hidden_dim, kernel_size=3, stride=stride, groups=hidden_dim))\n        self.block.add_module('red_1x1', conv_2d(hidden_dim, oup, kernel_size=1, stride=1, act=act))\n        self.use_res_connect = self.stride == 1 and inp == oup\n\n    def forward(self, x):\n        if self.use_res_connect:\n            return x + self.block(x)\n        else:\n            return self.block(x)\n\nclass UniversalInvertedBottleneckBlock(nn.Module):\n    def __init__(self, \n            inp, \n            oup, \n            start_dw_kernel_size, \n            middle_dw_kernel_size, \n            middle_dw_downsample,\n            stride,\n            expand_ratio\n        ):\n        super().__init__()\n        # Starting depthwise conv.\n        self.start_dw_kernel_size = start_dw_kernel_size\n        if self.start_dw_kernel_size:            \n            stride_ = stride if not middle_dw_downsample else 1\n            self._start_dw_ = conv_2d(inp, inp, kernel_size=start_dw_kernel_size, stride=stride_, groups=inp, act=False)\n        # Expansion with 1x1 convs.\n        expand_filters = make_divisible(inp * expand_ratio, 8)\n        self._expand_conv = conv_2d(inp, expand_filters, kernel_size=1)\n        # Middle depthwise conv.\n        self.middle_dw_kernel_size = middle_dw_kernel_size\n        if self.middle_dw_kernel_size:\n            stride_ = stride if middle_dw_downsample else 1\n            self._middle_dw = conv_2d(expand_filters, expand_filters, kernel_size=middle_dw_kernel_size, stride=stride_, groups=expand_filters)\n        # Projection with 1x1 convs.\n        self._proj_conv = conv_2d(expand_filters, oup, kernel_size=1, stride=1, act=False)\n        \n        # Ending depthwise conv.\n        # This is not used.\n        # _end_dw_kernel_size = 0\n        # self._end_dw = conv_2d(oup, oup, 
kernel_size=_end_dw_kernel_size, stride=stride, groups=inp, act=False)\n        \n    def forward(self, x):\n        if self.start_dw_kernel_size:\n            x = self._start_dw_(x)\n            # print(\"_start_dw_\", x.shape)\n        x = self._expand_conv(x)\n        # print(\"_expand_conv\", x.shape)\n        if self.middle_dw_kernel_size:\n            x = self._middle_dw(x)\n            # print(\"_middle_dw\", x.shape)\n        x = self._proj_conv(x)\n        # print(\"_proj_conv\", x.shape)\n        return x\n\ndef build_blocks(layer_spec):\n    if not layer_spec.get('block_name'):\n        return nn.Sequential()\n    block_names = layer_spec['block_name']\n    layers = nn.Sequential()\n    if block_names == \"convbn\":\n        schema_ = ['inp', 'oup', 'kernel_size', 'stride']\n        args = {}\n        for i in range(layer_spec['num_blocks']):\n            args = dict(zip(schema_, layer_spec['block_specs'][i]))\n            layers.add_module(f\"convbn_{i}\", conv_2d(**args))\n    elif block_names == \"uib\":\n        schema_ =  ['inp', 'oup', 'start_dw_kernel_size', 'middle_dw_kernel_size', 'middle_dw_downsample', 'stride', 'expand_ratio']\n        args = {}\n        for i in range(layer_spec['num_blocks']):\n            args = dict(zip(schema_, layer_spec['block_specs'][i]))\n            layers.add_module(f\"uib_{i}\", UniversalInvertedBottleneckBlock(**args))\n    elif block_names == \"fused_ib\":\n        schema_ = ['inp', 'oup', 'stride', 'expand_ratio', 'act']\n        args = {}\n        for i in range(layer_spec['num_blocks']):\n            args = dict(zip(schema_, layer_spec['block_specs'][i]))\n            layers.add_module(f\"fused_ib_{i}\", InvertedResidual(**args))\n    else:\n        raise NotImplementedError\n    return layers\n\n\nclass MobileNetV4(nn.Module):\n    def __init__(self, model):\n        # MobileNetV4ConvSmall  MobileNetV4ConvMedium  MobileNetV4ConvLarge\n        # MobileNetV4HybridMedium  MobileNetV4HybridLarge\n        
\"\"\"Params to initiate MobilenNetV4\n        Args:\n            model : support 5 types of models as indicated in \n            \"https://github.com/tensorflow/models/blob/master/official/vision/modeling/backbones/mobilenet.py\"        \n        \"\"\"\n        super().__init__()\n        assert model in MODEL_SPECS.keys()\n        self.model = model\n        self.spec = MODEL_SPECS[self.model]\n       \n        # conv0\n        self.conv0 = build_blocks(self.spec['conv0'])\n        # layer1\n        self.layer1 = build_blocks(self.spec['layer1'])\n        # layer2\n        self.layer2 = build_blocks(self.spec['layer2'])\n        # layer3\n        self.layer3 = build_blocks(self.spec['layer3'])\n        # layer4\n        self.layer4 = build_blocks(self.spec['layer4'])\n        # layer5   \n        self.layer5 = build_blocks(self.spec['layer5'])\n        self.features = nn.ModuleList([self.conv0, self.layer1, self.layer2, self.layer3, self.layer4, self.layer5])     \n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n        \n    def forward(self, x):\n        input_size = x.size(2)\n        scale = [4, 8, 16, 32]\n        features = [None, None, None, None]\n        for f in self.features:\n            x = f(x)\n            if input_size // x.size(2) in scale:\n                features[scale.index(input_size // x.size(2))] = x\n        return features\n\ndef MobileNetV4ConvSmall():\n    model = MobileNetV4('MobileNetV4ConvSmall')\n    return model\n\ndef MobileNetV4ConvMedium():\n    model = MobileNetV4('MobileNetV4ConvMedium')\n    return model\n\ndef MobileNetV4ConvLarge():\n    model = MobileNetV4('MobileNetV4ConvLarge')\n    return model\n\ndef MobileNetV4HybridMedium():\n    model = MobileNetV4('MobileNetV4HybridMedium')\n    return model\n\ndef MobileNetV4HybridLarge():\n    model = MobileNetV4('MobileNetV4HybridLarge')\n    return model\n\nif __name__ == '__main__':\n    model = MobileNetV4ConvSmall()\n    inputs = 
torch.randn((1, 3, 640, 640))\n    res = model(inputs)\n    for i in res:\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/NextViT/NextViT.py",
    "content": "# Copyright (c) ByteDance Inc. All rights reserved.\nfrom functools import partial\nimport numpy as np\nimport torch\nimport torch.utils.checkpoint as checkpoint\nfrom einops import rearrange\nfrom timm.models.layers import DropPath, trunc_normal_\nfrom torch import nn\n\n__all__ = ['nextvit_small', 'nextvit_base', 'nextvit_large']\n\nNORM_EPS = 1e-5\n\nclass ConvBNReLU(nn.Module):\n    def __init__(\n            self,\n            in_channels,\n            out_channels,\n            kernel_size,\n            stride,\n            groups=1):\n        super(ConvBNReLU, self).__init__()\n        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size, stride=stride,\n                              padding=1, groups=groups, bias=False)\n        self.norm = nn.BatchNorm2d(out_channels, eps=NORM_EPS)\n        self.act = nn.ReLU(inplace=True)\n\n    def forward(self, x):\n        x = self.conv(x)\n        x = self.norm(x)\n        x = self.act(x)\n        return x\n\n\ndef _make_divisible(v, divisor, min_value=None):\n    if min_value is None:\n        min_value = divisor\n    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)\n    # Make sure that round down does not go down by more than 10%.\n    if new_v < 0.9 * v:\n        new_v += divisor\n    return new_v\n\n\nclass PatchEmbed(nn.Module):\n    def __init__(self,\n                 in_channels,\n                 out_channels,\n                 stride=1):\n        super(PatchEmbed, self).__init__()\n        norm_layer = partial(nn.BatchNorm2d, eps=NORM_EPS)\n        if stride == 2:\n            self.avgpool = nn.AvgPool2d((2, 2), stride=2, ceil_mode=True, count_include_pad=False)\n            self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False)\n            self.norm = norm_layer(out_channels)\n        elif in_channels != out_channels:\n            self.avgpool = nn.Identity()\n            self.conv = nn.Conv2d(in_channels, out_channels, 
kernel_size=1, stride=1, bias=False)\n            self.norm = norm_layer(out_channels)\n        else:\n            self.avgpool = nn.Identity()\n            self.conv = nn.Identity()\n            self.norm = nn.Identity()\n\n    def forward(self, x):\n        return self.norm(self.conv(self.avgpool(x)))\n\n\nclass MHCA(nn.Module):\n    \"\"\"\n    Multi-Head Convolutional Attention\n    \"\"\"\n    def __init__(self, out_channels, head_dim):\n        super(MHCA, self).__init__()\n        norm_layer = partial(nn.BatchNorm2d, eps=NORM_EPS)\n        self.group_conv3x3 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1,\n                                       padding=1, groups=out_channels // head_dim, bias=False)\n        self.norm = norm_layer(out_channels)\n        self.act = nn.ReLU(inplace=True)\n        self.projection = nn.Conv2d(out_channels, out_channels, kernel_size=1, bias=False)\n\n    def forward(self, x):\n        out = self.group_conv3x3(x)\n        out = self.norm(out)\n        out = self.act(out)\n        out = self.projection(out)\n        return out\n\n\nclass Mlp(nn.Module):\n    def __init__(self, in_features, out_features=None, mlp_ratio=None, drop=0., bias=True):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_dim = _make_divisible(in_features * mlp_ratio, 32)\n        self.conv1 = nn.Conv2d(in_features, hidden_dim, kernel_size=1, bias=bias)\n        self.act = nn.ReLU(inplace=True)\n        self.conv2 = nn.Conv2d(hidden_dim, out_features, kernel_size=1, bias=bias)\n        self.drop = nn.Dropout(drop)\n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.conv2(x)\n        x = self.drop(x)\n        return x\n\n\nclass NCB(nn.Module):\n    \"\"\"\n    Next Convolution Block\n    \"\"\"\n    def __init__(self, in_channels, out_channels, stride=1, path_dropout=0,\n                 drop=0, head_dim=32, mlp_ratio=3):\n    
    super(NCB, self).__init__()\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        norm_layer = partial(nn.BatchNorm2d, eps=NORM_EPS)\n        assert out_channels % head_dim == 0\n\n        self.patch_embed = PatchEmbed(in_channels, out_channels, stride)\n        self.mhca = MHCA(out_channels, head_dim)\n        self.attention_path_dropout = DropPath(path_dropout)\n\n        self.norm = norm_layer(out_channels)\n        self.mlp = Mlp(out_channels, mlp_ratio=mlp_ratio, drop=drop, bias=True)\n        self.mlp_path_dropout = DropPath(path_dropout)\n        self.is_bn_merged = False\n\n    def forward(self, x):\n        x = self.patch_embed(x)\n        x = x + self.attention_path_dropout(self.mhca(x))\n        if not torch.onnx.is_in_onnx_export() and not self.is_bn_merged:\n            out = self.norm(x)\n        else:\n            out = x\n        x = x + self.mlp_path_dropout(self.mlp(out))\n        return x\n\n\nclass E_MHSA(nn.Module):\n    \"\"\"\n    Efficient Multi-Head Self Attention\n    \"\"\"\n    def __init__(self, dim, out_dim=None, head_dim=32, qkv_bias=True, qk_scale=None,\n                 attn_drop=0, proj_drop=0., sr_ratio=1):\n        super().__init__()\n        self.dim = dim\n        self.out_dim = out_dim if out_dim is not None else dim\n        self.num_heads = self.dim // head_dim\n        self.scale = qk_scale or head_dim ** -0.5\n        self.q = nn.Linear(dim, self.dim, bias=qkv_bias)\n        self.k = nn.Linear(dim, self.dim, bias=qkv_bias)\n        self.v = nn.Linear(dim, self.dim, bias=qkv_bias)\n        self.proj = nn.Linear(self.dim, self.out_dim)\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n        self.sr_ratio = sr_ratio\n        self.N_ratio = sr_ratio ** 2\n        if sr_ratio > 1:\n            self.sr = nn.AvgPool1d(kernel_size=self.N_ratio, stride=self.N_ratio)\n            self.norm = nn.BatchNorm1d(dim, eps=NORM_EPS)\n        
self.is_bn_merged = False\n\n    def forward(self, x):\n        B, N, C = x.shape\n        q = self.q(x)\n        q = q.reshape(B, N, self.num_heads, int(C // self.num_heads)).permute(0, 2, 1, 3)\n\n        if self.sr_ratio > 1:\n            x_ = x.transpose(1, 2)\n            x_ = self.sr(x_)\n            if not torch.onnx.is_in_onnx_export() and not self.is_bn_merged:\n                x_ = self.norm(x_)\n            x_ = x_.transpose(1, 2)\n            k = self.k(x_)\n            k = k.reshape(B, -1, self.num_heads, int(C // self.num_heads)).permute(0, 2, 3, 1)\n            v = self.v(x_)\n            v = v.reshape(B, -1, self.num_heads, int(C // self.num_heads)).permute(0, 2, 1, 3)\n        else:\n            k = self.k(x)\n            k = k.reshape(B, -1, self.num_heads, int(C // self.num_heads)).permute(0, 2, 3, 1)\n            v = self.v(x)\n            v = v.reshape(B, -1, self.num_heads, int(C // self.num_heads)).permute(0, 2, 1, 3)\n        attn = (q @ k) * self.scale\n\n        attn = attn.softmax(dim=-1)\n        attn = self.attn_drop(attn)\n\n        x = (attn @ v).transpose(1, 2).reshape(B, N, C)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass NTB(nn.Module):\n    \"\"\"\n    Next Transformer Block\n    \"\"\"\n    def __init__(\n            self, in_channels, out_channels, path_dropout, stride=1, sr_ratio=1,\n            mlp_ratio=2, head_dim=32, mix_block_ratio=0.75, attn_drop=0, drop=0,\n    ):\n        super(NTB, self).__init__()\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.mix_block_ratio = mix_block_ratio\n        norm_func = partial(nn.BatchNorm2d, eps=NORM_EPS)\n\n        self.mhsa_out_channels = _make_divisible(int(out_channels * mix_block_ratio), 32)\n        self.mhca_out_channels = out_channels - self.mhsa_out_channels\n\n        self.patch_embed = PatchEmbed(in_channels, self.mhsa_out_channels, stride)\n        self.norm1 = 
norm_func(self.mhsa_out_channels)\n        self.e_mhsa = E_MHSA(self.mhsa_out_channels, head_dim=head_dim, sr_ratio=sr_ratio,\n                             attn_drop=attn_drop, proj_drop=drop)\n        self.mhsa_path_dropout = DropPath(path_dropout * mix_block_ratio)\n\n        self.projection = PatchEmbed(self.mhsa_out_channels, self.mhca_out_channels, stride=1)\n        self.mhca = MHCA(self.mhca_out_channels, head_dim=head_dim)\n        self.mhca_path_dropout = DropPath(path_dropout * (1 - mix_block_ratio))\n\n        self.norm2 = norm_func(out_channels)\n        self.mlp = Mlp(out_channels, mlp_ratio=mlp_ratio, drop=drop)\n        self.mlp_path_dropout = DropPath(path_dropout)\n\n        self.is_bn_merged = False\n\n    def forward(self, x):\n        x = self.patch_embed(x)\n        B, C, H, W = x.shape\n        if not torch.onnx.is_in_onnx_export() and not self.is_bn_merged:\n            out = self.norm1(x)\n        else:\n            out = x\n        out = rearrange(out, \"b c h w -> b (h w) c\")  # b n c\n        out = self.mhsa_path_dropout(self.e_mhsa(out))\n        x = x + rearrange(out, \"b (h w) c -> b c h w\", h=H)\n\n        out = self.projection(x)\n        out = out + self.mhca_path_dropout(self.mhca(out))\n        x = torch.cat([x, out], dim=1)\n\n        if not torch.onnx.is_in_onnx_export() and not self.is_bn_merged:\n            out = self.norm2(x)\n        else:\n            out = x\n        x = x + self.mlp_path_dropout(self.mlp(out))\n        return x\n\n\nclass NextViT(nn.Module):\n    def __init__(self, stem_chs, depths, path_dropout, attn_drop=0, drop=0, num_classes=1000,\n                 strides=[1, 2, 2, 2], sr_ratios=[8, 4, 2, 1], head_dim=32, mix_block_ratio=0.75,\n                 use_checkpoint=False):\n        super(NextViT, self).__init__()\n        self.use_checkpoint = use_checkpoint\n\n        self.stage_out_channels = [[96] * (depths[0]),\n                                   [192] * (depths[1] - 1) + [256],\n                    
               [384, 384, 384, 384, 512] * (depths[2] // 5),\n                                   [768] * (depths[3] - 1) + [1024]]\n\n        # Next Hybrid Strategy\n        self.stage_block_types = [[NCB] * depths[0],\n                                  [NCB] * (depths[1] - 1) + [NTB],\n                                  [NCB, NCB, NCB, NCB, NTB] * (depths[2] // 5),\n                                  [NCB] * (depths[3] - 1) + [NTB]]\n\n        self.stem = nn.Sequential(\n            ConvBNReLU(3, stem_chs[0], kernel_size=3, stride=2),\n            ConvBNReLU(stem_chs[0], stem_chs[1], kernel_size=3, stride=1),\n            ConvBNReLU(stem_chs[1], stem_chs[2], kernel_size=3, stride=1),\n            ConvBNReLU(stem_chs[2], stem_chs[2], kernel_size=3, stride=2),\n        )\n        input_channel = stem_chs[-1]\n        features = []\n        idx = 0\n        dpr = [x.item() for x in torch.linspace(0, path_dropout, sum(depths))]  # stochastic depth decay rule\n        for stage_id in range(len(depths)):\n            numrepeat = depths[stage_id]\n            output_channels = self.stage_out_channels[stage_id]\n            block_types = self.stage_block_types[stage_id]\n            for block_id in range(numrepeat):\n                if strides[stage_id] == 2 and block_id == 0:\n                    stride = 2\n                else:\n                    stride = 1\n                output_channel = output_channels[block_id]\n                block_type = block_types[block_id]\n                if block_type is NCB:\n                    layer = NCB(input_channel, output_channel, stride=stride, path_dropout=dpr[idx + block_id],\n                                drop=drop, head_dim=head_dim)\n                    features.append(layer)\n                elif block_type is NTB:\n                    layer = NTB(input_channel, output_channel, path_dropout=dpr[idx + block_id], stride=stride,\n                                sr_ratio=sr_ratios[stage_id], head_dim=head_dim, 
mix_block_ratio=mix_block_ratio,\n                                attn_drop=attn_drop, drop=drop)\n                    features.append(layer)\n                input_channel = output_channel\n            idx += numrepeat\n        self.features = nn.Sequential(*features)\n\n        self.norm = nn.BatchNorm2d(output_channel, eps=NORM_EPS)\n        self.stage_out_idx = [sum(depths[:idx + 1]) - 1 for idx in range(len(depths))]\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n        self._initialize_weights()\n\n    def _initialize_weights(self):\n        for n, m in self.named_modules():\n            if isinstance(m, (nn.BatchNorm2d, nn.GroupNorm, nn.LayerNorm, nn.BatchNorm1d)):\n                nn.init.constant_(m.weight, 1.0)\n                nn.init.constant_(m.bias, 0)\n            elif isinstance(m, nn.Linear):\n                trunc_normal_(m.weight, std=.02)\n                if hasattr(m, 'bias') and m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n            elif isinstance(m, nn.Conv2d):\n                trunc_normal_(m.weight, std=.02)\n                if hasattr(m, 'bias') and m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        res = []\n        x = self.stem(x)\n        for idx, layer in enumerate(self.features):\n            if self.use_checkpoint:\n                x = checkpoint.checkpoint(layer, x)\n            else:\n                x = layer(x)\n            if idx in self.stage_out_idx:\n                res.append(x)\n        res[-1] = self.norm(res[-1])\n        return res\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... 
{idx}/{len(model_dict)} items')\n    return model_dict\n\ndef nextvit_small(weights=''):\n    model = NextViT(stem_chs=[64, 32, 64], depths=[3, 4, 10, 3], path_dropout=0.1)\n    if weights:\n        pretrained_weight = torch.load(weights, map_location='cpu')['model']\n        model.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n    return model\n\n\ndef nextvit_base(weights=''):\n    model = NextViT(stem_chs=[64, 32, 64], depths=[3, 4, 20, 3], path_dropout=0.2)\n    if weights:\n        pretrained_weight = torch.load(weights, map_location='cpu')['model']\n        model.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n    return model\n\n\ndef nextvit_large(weights=''):\n    model = NextViT(stem_chs=[64, 32, 64], depths=[3, 4, 30, 3], path_dropout=0.2)\n    if weights:\n        pretrained_weight = torch.load(weights, map_location='cpu')['model']\n        model.load_state_dict(update_weight(model.state_dict(), pretrained_weight))\n    return model"
  },
  {
    "path": "yolo-improve/yolov5-backbone/ODConv/od_mobilenetv2.py",
    "content": "\nimport torch\nfrom torch import nn\nimport numpy as np\nfrom models.ODConv.odconv import ODConv2d\n\n__all__ = ['od_mobilenetv2_050', 'od_mobilenetv2_075', 'od_mobilenetv2_100']\n\n\ndef _make_divisible(v, divisor, min_value=None):\n    \"\"\"\n    This function is taken from the original tf repo.\n    It ensures that all layers have a channel number that is divisible by 8\n    It can be seen here:\n    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py\n    :param v:\n    :param divisor:\n    :param min_value:\n    :return:\n    \"\"\"\n    if min_value is None:\n        min_value = divisor\n    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)\n    # Make sure that round down does not go down by more than 10%.\n    if new_v < 0.9 * v:\n        new_v += divisor\n    return new_v\n\n\nclass ConvBNReLU(nn.Sequential):\n    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=nn.BatchNorm2d):\n        padding = (kernel_size - 1) // 2\n        super(ConvBNReLU, self).__init__(\n            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),\n            norm_layer(out_planes),\n            nn.ReLU6(inplace=True)\n        )\n\n\nclass ODConvBNReLU(nn.Sequential):\n    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=nn.BatchNorm2d,\n                 reduction=0.0625, kernel_num=1):\n        padding = (kernel_size - 1) // 2\n        super(ODConvBNReLU, self).__init__(\n            ODConv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups,\n                     reduction=reduction, kernel_num=kernel_num),\n            norm_layer(out_planes),\n            nn.ReLU6(inplace=True)\n        )\n\n\nclass InvertedResidual(nn.Module):\n    def __init__(self, inp, oup, stride, expand_ratio, norm_layer=nn.BatchNorm2d, reduction=0.0625, kernel_num=1):\n        
super(InvertedResidual, self).__init__()\n        self.stride = stride\n        hidden_dim = int(round(inp * expand_ratio))\n        self.use_res_connect = self.stride == 1 and inp == oup\n\n        layers = []\n        if expand_ratio != 1:\n            # pw\n            layers.append(ODConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer,\n                                       reduction=reduction, kernel_num=kernel_num))\n        layers.extend([\n            # dw\n            ODConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer,\n                         reduction=reduction, kernel_num=kernel_num),\n            # pw-linear\n            ODConv2d(hidden_dim, oup, 1, 1, 0,\n                     reduction=reduction, kernel_num=kernel_num),\n            norm_layer(oup),\n        ])\n        self.conv = nn.Sequential(*layers)\n\n    def forward(self, x):\n        if self.use_res_connect:\n            return x + self.conv(x)\n        else:\n            return self.conv(x)\n\n\nclass OD_MobileNetV2(nn.Module):\n    def __init__(self,\n                 num_classes=1000,\n                 width_mult=1.0,\n                 inverted_residual_setting=None,\n                 round_nearest=8,\n                 block=InvertedResidual,\n                 norm_layer=nn.BatchNorm2d,\n                 dropout=0.2,\n                 reduction=0.0625,\n                 kernel_num=1,\n                 **kwargs):\n        \"\"\"\n        MobileNet V2 main class\n        Args:\n            num_classes (int): Number of classes\n            width_mult (float): Width multiplier - adjusts number of channels in each layer by this amount\n            inverted_residual_setting: Network structure\n            round_nearest (int): Round the number of channels in each layer to be a multiple of this number\n            Set to 1 to turn off rounding\n            block: Module specifying inverted residual building block for mobilenet\n            
norm_layer: Module specifying the normalization layer to use\n        \"\"\"\n        super(OD_MobileNetV2, self).__init__()\n\n        input_channel = 32\n        last_channel = 1280\n\n        if inverted_residual_setting is None:\n            inverted_residual_setting = [\n                # t, c, n, s\n                [1, 16, 1, 1],\n                [6, 24, 2, 2],\n                [6, 32, 3, 2],\n                [6, 64, 4, 2],\n                [6, 96, 3, 1],\n                [6, 160, 3, 2],\n                [6, 320, 1, 1],\n            ]\n\n        # only check the first element, assuming user knows t,c,n,s are required\n        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:\n            raise ValueError(\"inverted_residual_setting should be non-empty \"\n                             \"or a 4-element list, got {}\".format(inverted_residual_setting))\n\n        # building first layer\n        input_channel = _make_divisible(input_channel * width_mult, round_nearest)\n        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)\n        features = [ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer)]\n        # building inverted residual blocks\n        for t, c, n, s in inverted_residual_setting:\n            output_channel = _make_divisible(c * width_mult, round_nearest)\n            for i in range(n):\n                stride = s if i == 0 else 1\n                features.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer,\n                                      reduction=reduction, kernel_num=kernel_num))\n                input_channel = output_channel\n        # building last several layers\n        features.append(ODConvBNReLU(input_channel, self.last_channel, kernel_size=1, norm_layer=norm_layer,\n                                     reduction=reduction, kernel_num=kernel_num))\n        # make it nn.Sequential\n        self.features = 
nn.Sequential(*features)\n\n        # weight initialization\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out')\n                if m.bias is not None:\n                    nn.init.zeros_(m.bias)\n            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):\n                nn.init.ones_(m.weight)\n                nn.init.zeros_(m.bias)\n            elif isinstance(m, nn.Linear):\n                nn.init.normal_(m.weight, 0, 0.01)\n                nn.init.zeros_(m.bias)\n\n        self.channel = [i.size(1) for i in self.forward(torch.randn(2, 3, 640, 640))]\n        \n    def net_update_temperature(self, temperature):\n        for m in self.modules():\n            if hasattr(m, \"update_temperature\"):\n                m.update_temperature(temperature)      \n\n    def forward(self, x):\n        input_size = x.size(2)\n        scale = [4, 8, 16, 32]\n        features = [None, None, None, None]\n        for idx, layer in enumerate(self.features):\n            x = layer(x)\n            if input_size // x.size(2) in scale:\n                features[scale.index(input_size // x.size(2))] = x\n        return features\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k.replace('module.', '') in model_dict.keys() and np.shape(model_dict[k.replace('module.', '')]) == np.shape(v):\n            temp_dict[k.replace('module.', '')] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... 
{idx}/{len(model_dict)} items')\n    return model_dict\n\ndef od_mobilenetv2_050(weights=None, kernel_num=1):\n    model = OD_MobileNetV2(width_mult=0.5, kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef od_mobilenetv2_075(weights=None, kernel_num=1):\n    model = OD_MobileNetV2(width_mult=0.75, kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef od_mobilenetv2_100(weights=None, kernel_num=1):\n    model = OD_MobileNetV2(width_mult=1.0, kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model"
  },
  {
    "path": "yolo-improve/yolov5-backbone/ODConv/od_resnet.py",
    "content": "import torch\nimport torch.nn as nn\nfrom models.ODConv.odconv import ODConv2d\nimport numpy as np\n\n__all__ = ['od_resnet18', 'od_resnet34', 'od_resnet50', 'od_resnet101']\n\n\ndef odconv3x3(in_planes, out_planes, stride=1, reduction=0.0625, kernel_num=1):\n    return ODConv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1,\n                    reduction=reduction, kernel_num=kernel_num)\n\n\ndef odconv1x1(in_planes, out_planes, stride=1, reduction=0.0625, kernel_num=1):\n    return ODConv2d(in_planes, out_planes, kernel_size=1, stride=stride, padding=0,\n                    reduction=reduction, kernel_num=kernel_num)\n\n\nclass BasicBlock(nn.Module):\n    expansion = 1\n\n    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=0.0625, kernel_num=1):\n        super(BasicBlock, self).__init__()\n        self.conv1 = odconv3x3(inplanes, planes, stride, reduction=reduction, kernel_num=kernel_num)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = odconv3x3(planes, planes, reduction=reduction, kernel_num=kernel_num)\n        self.bn2 = nn.BatchNorm2d(planes)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        identity = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n\n        if self.downsample is not None:\n            identity = self.downsample(x)\n\n        out += identity\n        out = self.relu(out)\n        return out\n\n\nclass Bottleneck(nn.Module):\n    expansion = 4\n\n    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=0.0625, kernel_num=1):\n        super(Bottleneck, self).__init__()\n        self.conv1 = odconv1x1(inplanes, planes, reduction=reduction, kernel_num=kernel_num)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.conv2 = odconv3x3(planes, 
planes, stride, reduction=reduction, kernel_num=kernel_num)\n        self.bn2 = nn.BatchNorm2d(planes)\n        self.conv3 = odconv1x1(planes, planes * self.expansion, reduction=reduction, kernel_num=kernel_num)\n        self.bn3 = nn.BatchNorm2d(planes * self.expansion)\n        self.relu = nn.ReLU(inplace=True)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        identity = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu(out)\n\n        out = self.conv3(out)\n        out = self.bn3(out)\n\n        if self.downsample is not None:\n            identity = self.downsample(x)\n\n        out += identity\n        out = self.relu(out)\n        return out\n\n\nclass OD_ResNet(nn.Module):\n    def __init__(self, block, layers, num_classes=1000, dropout=0.1, reduction=0.0625, kernel_num=1):\n        super(OD_ResNet, self).__init__()\n        self.inplanes = 64\n        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,\n                               bias=False)\n        self.bn1 = nn.BatchNorm2d(self.inplanes)\n        self.relu = nn.ReLU(inplace=True)\n        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\n        self.layer1 = self._make_layer(block, 64, layers[0], reduction=reduction, kernel_num=kernel_num)\n        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, reduction=reduction, kernel_num=kernel_num)\n        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, reduction=reduction, kernel_num=kernel_num)\n        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, reduction=reduction, kernel_num=kernel_num)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n            elif isinstance(m, 
(nn.BatchNorm2d, nn.GroupNorm)):\n                nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n            elif isinstance(m, nn.Linear):\n                nn.init.normal_(m.weight, 0, 0.01)\n                nn.init.zeros_(m.bias)\n\n        self.channel = [i.size(1) for i in self.forward(torch.randn(2, 3, 640, 640))]\n        \n    def net_update_temperature(self, temperature):\n        for m in self.modules():\n            if hasattr(m, \"update_temperature\"):\n                m.update_temperature(temperature)\n\n    def _make_layer(self, block, planes, blocks, stride=1, reduction=0.0625, kernel_num=1):\n        downsample = None\n        if stride != 1 or self.inplanes != planes * block.expansion:\n            downsample = nn.Sequential(\n                nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, padding=0, bias=False),\n                nn.BatchNorm2d(planes * block.expansion),\n            )\n\n        layers = []\n        layers.append(block(self.inplanes, planes, stride, downsample, reduction=reduction, kernel_num=kernel_num))\n        self.inplanes = planes * block.expansion\n        for _ in range(1, blocks):\n            layers.append(block(self.inplanes, planes, reduction=reduction, kernel_num=kernel_num))\n\n        return nn.Sequential(*layers) \n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = self.bn1(x)\n        x1 = self.relu(x)\n        x = self.maxpool(x1)\n\n        x2 = self.layer1(x)\n        x3 = self.layer2(x2)\n        x4 = self.layer3(x3)\n        x5 = self.layer4(x4)\n        \n        return [x1, x2, x3, x4, x5]\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k.replace('module.', '') in model_dict.keys() and np.shape(model_dict[k.replace('module.', '')]) == np.shape(v):\n            temp_dict[k.replace('module.', '')] = v\n            idx += 1\n    
model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef od_resnet18(weights=None, kernel_num=1):\n    model = OD_ResNet(BasicBlock, [2, 2, 2, 2], kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef od_resnet34(weights=None, kernel_num=1):\n    model = OD_ResNet(BasicBlock, [3, 4, 6, 3], kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef od_resnet50(weights=None, kernel_num=1):\n    model = OD_ResNet(Bottleneck, [3, 4, 6, 3], kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef od_resnet101(weights=None, kernel_num=1):\n    model = OD_ResNet(Bottleneck, [3, 4, 23, 3], kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model"
  },
  {
    "path": "yolo-improve/yolov5-backbone/ODConv/odconv.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.autograd\n\n\nclass Attention(nn.Module):\n    def __init__(self, in_planes, out_planes, kernel_size, groups=1, reduction=0.0625, kernel_num=4, min_channel=16):\n        super(Attention, self).__init__()\n        attention_channel = max(int(in_planes * reduction), min_channel)\n        self.kernel_size = kernel_size\n        self.kernel_num = kernel_num\n        self.temperature = 1.0\n\n        self.avgpool = nn.AdaptiveAvgPool2d(1)\n        self.fc = nn.Conv2d(in_planes, attention_channel, 1, bias=False)\n        self.bn = nn.BatchNorm2d(attention_channel)\n        self.relu = nn.ReLU(inplace=True)\n\n        self.channel_fc = nn.Conv2d(attention_channel, in_planes, 1, bias=True)\n        self.func_channel = self.get_channel_attention\n\n        if in_planes == groups and in_planes == out_planes:  # depth-wise convolution\n            self.func_filter = self.skip\n        else:\n            self.filter_fc = nn.Conv2d(attention_channel, out_planes, 1, bias=True)\n            self.func_filter = self.get_filter_attention\n\n        if kernel_size == 1:  # point-wise convolution\n            self.func_spatial = self.skip\n        else:\n            self.spatial_fc = nn.Conv2d(attention_channel, kernel_size * kernel_size, 1, bias=True)\n            self.func_spatial = self.get_spatial_attention\n\n        if kernel_num == 1:\n            self.func_kernel = self.skip\n        else:\n            self.kernel_fc = nn.Conv2d(attention_channel, kernel_num, 1, bias=True)\n            self.func_kernel = self.get_kernel_attention\n\n        self._initialize_weights()\n\n    def _initialize_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n            if 
isinstance(m, nn.BatchNorm2d):\n                nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n\n    def update_temperature(self, temperature):\n        self.temperature = temperature\n\n    @staticmethod\n    def skip(_):\n        return 1.0\n\n    def get_channel_attention(self, x):\n        channel_attention = torch.sigmoid(self.channel_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)\n        return channel_attention\n\n    def get_filter_attention(self, x):\n        filter_attention = torch.sigmoid(self.filter_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)\n        return filter_attention\n\n    def get_spatial_attention(self, x):\n        spatial_attention = self.spatial_fc(x).view(x.size(0), 1, 1, 1, self.kernel_size, self.kernel_size)\n        spatial_attention = torch.sigmoid(spatial_attention / self.temperature)\n        return spatial_attention\n\n    def get_kernel_attention(self, x):\n        kernel_attention = self.kernel_fc(x).view(x.size(0), -1, 1, 1, 1, 1)\n        kernel_attention = F.softmax(kernel_attention / self.temperature, dim=1)\n        return kernel_attention\n\n    def forward(self, x):\n        x = self.avgpool(x)\n        x = self.fc(x)\n        x = self.bn(x)\n        x = self.relu(x)\n        return self.func_channel(x), self.func_filter(x), self.func_spatial(x), self.func_kernel(x)\n\n\nclass ODConv2d(nn.Module):\n    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1,\n                 reduction=0.0625, kernel_num=4):\n        super(ODConv2d, self).__init__()\n        self.in_planes = in_planes\n        self.out_planes = out_planes\n        self.kernel_size = kernel_size\n        self.stride = stride\n        self.padding = padding\n        self.dilation = dilation\n        self.groups = groups\n        self.kernel_num = kernel_num\n        self.attention = Attention(in_planes, out_planes, kernel_size, groups=groups,\n                               
    reduction=reduction, kernel_num=kernel_num)\n        self.weight = nn.Parameter(torch.randn(kernel_num, out_planes, in_planes//groups, kernel_size, kernel_size),\n                                   requires_grad=True)\n        self._initialize_weights()\n\n        if self.kernel_size == 1 and self.kernel_num == 1:\n            self._forward_impl = self._forward_impl_pw1x\n        else:\n            self._forward_impl = self._forward_impl_common\n\n    def _initialize_weights(self):\n        for i in range(self.kernel_num):\n            nn.init.kaiming_normal_(self.weight[i], mode='fan_out', nonlinearity='relu')\n\n    def update_temperature(self, temperature):\n        self.attention.update_temperature(temperature)\n\n    def _forward_impl_common(self, x):\n        # Multiplying channel attention (or filter attention) to weights and feature maps are equivalent,\n        # while we observe that when using the latter method the models will run faster with less gpu memory cost.\n        channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)\n        batch_size, in_planes, height, width = x.size()\n        x = x * channel_attention\n        x = x.reshape(1, -1, height, width)\n        aggregate_weight = spatial_attention * kernel_attention * self.weight.unsqueeze(dim=0)\n        aggregate_weight = torch.sum(aggregate_weight, dim=1).view(\n            [-1, self.in_planes // self.groups, self.kernel_size, self.kernel_size])\n        output = F.conv2d(x, weight=aggregate_weight, bias=None, stride=self.stride, padding=self.padding,\n                          dilation=self.dilation, groups=self.groups * batch_size)\n        output = output.view(batch_size, self.out_planes, output.size(-2), output.size(-1))\n        output = output * filter_attention\n        return output\n\n    def _forward_impl_pw1x(self, x):\n        channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)\n        x = x * 
channel_attention\n        output = F.conv2d(x, weight=self.weight.squeeze(dim=0), bias=None, stride=self.stride, padding=self.padding,\n                          dilation=self.dilation, groups=self.groups)\n        output = output * filter_attention\n        return output\n\n    def forward(self, x):\n        return self._forward_impl(x)"
  },
  {
    "path": "yolo-improve/yolov5-backbone/ODConvFuse/od_mobilenetv2.py",
    "content": "\nimport torch\nfrom torch import nn\nimport numpy as np\nfrom models.ODConv.odconv import ODConv2d\n\n__all__ = ['od_mobilenetv2_050', 'od_mobilenetv2_075', 'od_mobilenetv2_100']\n\ndef fuse_conv_bn(conv, bn):\n    # Fuse convolution and batchnorm layers https://tehnokv.com/posts/fusing-batchnorm-and-conv/\n    fusedconv = (\n        nn.Conv2d(\n            conv.in_channels,\n            conv.out_channels,\n            kernel_size=conv.kernel_size,\n            stride=conv.stride,\n            padding=conv.padding,\n            groups=conv.groups,\n            bias=True,\n        )\n        .requires_grad_(False)\n        .to(conv.weight.device)\n    )\n\n    # prepare filters\n    w_conv = conv.weight.clone().view(conv.out_channels, -1)\n    w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))\n    fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape))\n\n    # prepare spatial bias\n    b_conv = (\n        torch.zeros(conv.weight.size(0), device=conv.weight.device)\n        if conv.bias is None\n        else conv.bias\n    )\n    b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(\n        torch.sqrt(bn.running_var + bn.eps)\n    )\n    fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)\n    return fusedconv\n\ndef _make_divisible(v, divisor, min_value=None):\n    \"\"\"\n    This function is taken from the original tf repo.\n    It ensures that all layers have a channel number that is divisible by 8\n    It can be seen here:\n    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py\n    :param v:\n    :param divisor:\n    :param min_value:\n    :return:\n    \"\"\"\n    if min_value is None:\n        min_value = divisor\n    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)\n    # Make sure that round down does not go down by more than 10%.\n    if new_v < 0.9 * v:\n        new_v += divisor\n    return new_v\n\n\nclass 
ConvBNReLU(nn.Sequential):\n    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=nn.BatchNorm2d):\n        padding = (kernel_size - 1) // 2\n        super(ConvBNReLU, self).__init__(\n            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),\n            norm_layer(out_planes),\n            nn.ReLU6(inplace=True)\n        )\n\n    def fuse(self):\n        # Rebinding `self` would not modify the module, so swap the children in place:\n        # the fused conv replaces the conv, and the now-redundant norm becomes a no-op.\n        self[0] = fuse_conv_bn(self[0], self[1])\n        self[1] = nn.Identity()\n\nclass ODConvBNReLU(nn.Sequential):\n    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=nn.BatchNorm2d,\n                 reduction=0.0625, kernel_num=1):\n        padding = (kernel_size - 1) // 2\n        super(ODConvBNReLU, self).__init__(\n            ODConv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups,\n                     reduction=reduction, kernel_num=kernel_num),\n            norm_layer(out_planes),\n            nn.ReLU6(inplace=True)\n        )\n\n\nclass InvertedResidual(nn.Module):\n    def __init__(self, inp, oup, stride, expand_ratio, norm_layer=nn.BatchNorm2d, reduction=0.0625, kernel_num=1):\n        super(InvertedResidual, self).__init__()\n        self.stride = stride\n        hidden_dim = int(round(inp * expand_ratio))\n        self.use_res_connect = self.stride == 1 and inp == oup\n\n        layers = []\n        if expand_ratio != 1:\n            # pw\n            layers.append(ODConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer,\n                                       reduction=reduction, kernel_num=kernel_num))\n        layers.extend([\n            # dw\n            ODConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer,\n                         reduction=reduction, kernel_num=kernel_num),\n            # pw-linear\n            ODConv2d(hidden_dim, oup, 1, 1, 0,\n                     
reduction=reduction, kernel_num=kernel_num),\n            norm_layer(oup),\n        ])\n        self.conv = nn.Sequential(*layers)\n\n    def forward(self, x):\n        if self.use_res_connect:\n            return x + self.conv(x)\n        else:\n            return self.conv(x)\n\n\nclass OD_MobileNetV2(nn.Module):\n    def __init__(self,\n                 num_classes=1000,\n                 width_mult=1.0,\n                 inverted_residual_setting=None,\n                 round_nearest=8,\n                 block=InvertedResidual,\n                 norm_layer=nn.BatchNorm2d,\n                 dropout=0.2,\n                 reduction=0.0625,\n                 kernel_num=1,\n                 **kwargs):\n        \"\"\"\n        MobileNet V2 main class\n        Args:\n            num_classes (int): Number of classes\n            width_mult (float): Width multiplier - adjusts number of channels in each layer by this amount\n            inverted_residual_setting: Network structure\n            round_nearest (int): Round the number of channels in each layer to be a multiple of this number\n            Set to 1 to turn off rounding\n            block: Module specifying inverted residual building block for mobilenet\n            norm_layer: Module specifying the normalization layer to use\n        \"\"\"\n        super(OD_MobileNetV2, self).__init__()\n\n        input_channel = 32\n        last_channel = 1280\n\n        if inverted_residual_setting is None:\n            inverted_residual_setting = [\n                # t, c, n, s\n                [1, 16, 1, 1],\n                [6, 24, 2, 2],\n                [6, 32, 3, 2],\n                [6, 64, 4, 2],\n                [6, 96, 3, 1],\n                [6, 160, 3, 2],\n                [6, 320, 1, 1],\n            ]\n\n        # only check the first element, assuming user knows t,c,n,s are required\n        if len(inverted_residual_setting) == 0 or len(inverted_residual_setting[0]) != 4:\n            raise 
ValueError(\"inverted_residual_setting should be non-empty \"\n                             \"or a 4-element list, got {}\".format(inverted_residual_setting))\n\n        # building first layer\n        input_channel = _make_divisible(input_channel * width_mult, round_nearest)\n        self.last_channel = _make_divisible(last_channel * max(1.0, width_mult), round_nearest)\n        features = [ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer)]\n        # building inverted residual blocks\n        for t, c, n, s in inverted_residual_setting:\n            output_channel = _make_divisible(c * width_mult, round_nearest)\n            for i in range(n):\n                stride = s if i == 0 else 1\n                features.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer,\n                                      reduction=reduction, kernel_num=kernel_num))\n                input_channel = output_channel\n        # building last several layers\n        features.append(ODConvBNReLU(input_channel, self.last_channel, kernel_size=1, norm_layer=norm_layer,\n                                     reduction=reduction, kernel_num=kernel_num))\n        # make it nn.Sequential\n        self.features = nn.Sequential(*features)\n\n        # weight initialization\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out')\n                if m.bias is not None:\n                    nn.init.zeros_(m.bias)\n            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):\n                nn.init.ones_(m.weight)\n                nn.init.zeros_(m.bias)\n            elif isinstance(m, nn.Linear):\n                nn.init.normal_(m.weight, 0, 0.01)\n                nn.init.zeros_(m.bias)\n\n        self.channel = [i.size(1) for i in self.forward(torch.randn(2, 3, 640, 640))]\n        \n    def net_update_temperature(self, temperature):\n        for m in 
self.modules():\n            if hasattr(m, \"update_temperature\"):\n                m.update_temperature(temperature)      \n\n    def forward(self, x):\n        input_size = x.size(2)\n        scale = [4, 8, 16, 32]\n        features = [None, None, None, None]\n        for idx, layer in enumerate(self.features):\n            x = layer(x)\n            if input_size // x.size(2) in scale:\n                features[scale.index(input_size // x.size(2))] = x\n        return features\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k.replace('module.', '') in model_dict.keys() and np.shape(model_dict[k.replace('module.', '')]) == np.shape(v):\n            temp_dict[k.replace('module.', '')] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef od_mobilenetv2_050(weights=None, kernel_num=1):\n    model = OD_MobileNetV2(width_mult=0.5, kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef od_mobilenetv2_075(weights=None, kernel_num=1):\n    model = OD_MobileNetV2(width_mult=0.75, kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef od_mobilenetv2_100(weights=None, kernel_num=1):\n    model = OD_MobileNetV2(width_mult=1.0, kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model"
  },
  {
    "path": "yolo-improve/yolov5-backbone/ODConvFuse/od_resnet.py",
    "content": "import torch\nimport torch.nn as nn\nfrom models.ODConv.odconv import ODConv2d\nimport numpy as np\n\n__all__ = ['od_resnet18', 'od_resnet34', 'od_resnet50', 'od_resnet101']\n\n\ndef odconv3x3(in_planes, out_planes, stride=1, reduction=0.0625, kernel_num=1):\n    return ODConv2d(in_planes, out_planes, kernel_size=3, stride=stride, padding=1,\n                    reduction=reduction, kernel_num=kernel_num)\n\n\ndef odconv1x1(in_planes, out_planes, stride=1, reduction=0.0625, kernel_num=1):\n    return ODConv2d(in_planes, out_planes, kernel_size=1, stride=stride, padding=0,\n                    reduction=reduction, kernel_num=kernel_num)\n\n\nclass BasicBlock(nn.Module):\n    expansion = 1\n\n    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=0.0625, kernel_num=1):\n        super(BasicBlock, self).__init__()\n        self.conv1 = odconv3x3(inplanes, planes, stride, reduction=reduction, kernel_num=kernel_num)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = odconv3x3(planes, planes, reduction=reduction, kernel_num=kernel_num)\n        self.bn2 = nn.BatchNorm2d(planes)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        identity = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n\n        if self.downsample is not None:\n            identity = self.downsample(x)\n\n        out += identity\n        out = self.relu(out)\n        return out\n\n\nclass Bottleneck(nn.Module):\n    expansion = 4\n\n    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=0.0625, kernel_num=1):\n        super(Bottleneck, self).__init__()\n        self.conv1 = odconv1x1(inplanes, planes, reduction=reduction, kernel_num=kernel_num)\n        self.bn1 = nn.BatchNorm2d(planes)\n        self.conv2 = odconv3x3(planes, 
planes, stride, reduction=reduction, kernel_num=kernel_num)\n        self.bn2 = nn.BatchNorm2d(planes)\n        self.conv3 = odconv1x1(planes, planes * self.expansion, reduction=reduction, kernel_num=kernel_num)\n        self.bn3 = nn.BatchNorm2d(planes * self.expansion)\n        self.relu = nn.ReLU(inplace=True)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        identity = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu(out)\n\n        out = self.conv3(out)\n        out = self.bn3(out)\n\n        if self.downsample is not None:\n            identity = self.downsample(x)\n\n        out += identity\n        out = self.relu(out)\n        return out\n\n\nclass OD_ResNet(nn.Module):\n    def __init__(self, block, layers, num_classes=1000, dropout=0.1, reduction=0.0625, kernel_num=1):\n        super(OD_ResNet, self).__init__()\n        self.inplanes = 64\n        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3,\n                               bias=False)\n        self.bn1 = nn.BatchNorm2d(self.inplanes)\n        self.relu = nn.ReLU(inplace=True)\n        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\n        self.layer1 = self._make_layer(block, 64, layers[0], reduction=reduction, kernel_num=kernel_num)\n        self.layer2 = self._make_layer(block, 128, layers[1], stride=2, reduction=reduction, kernel_num=kernel_num)\n        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, reduction=reduction, kernel_num=kernel_num)\n        self.layer4 = self._make_layer(block, 512, layers[3], stride=2, reduction=reduction, kernel_num=kernel_num)\n\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n            elif isinstance(m, 
(nn.BatchNorm2d, nn.GroupNorm)):\n                nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n            elif isinstance(m, nn.Linear):\n                nn.init.normal_(m.weight, 0, 0.01)\n                nn.init.zeros_(m.bias)\n\n        self.channel = [i.size(1) for i in self.forward(torch.randn(2, 3, 640, 640))]\n        \n    def net_update_temperature(self, temperature):\n        for m in self.modules():\n            if hasattr(m, \"update_temperature\"):\n                m.update_temperature(temperature)\n\n    def _make_layer(self, block, planes, blocks, stride=1, reduction=0.0625, kernel_num=1):\n        downsample = None\n        if stride != 1 or self.inplanes != planes * block.expansion:\n            downsample = nn.Sequential(\n                nn.Conv2d(self.inplanes, planes * block.expansion, kernel_size=1, stride=stride, padding=0, bias=False),\n                nn.BatchNorm2d(planes * block.expansion),\n            )\n\n        layers = []\n        layers.append(block(self.inplanes, planes, stride, downsample, reduction=reduction, kernel_num=kernel_num))\n        self.inplanes = planes * block.expansion\n        for _ in range(1, blocks):\n            layers.append(block(self.inplanes, planes, reduction=reduction, kernel_num=kernel_num))\n\n        return nn.Sequential(*layers) \n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = self.bn1(x)\n        x1 = self.relu(x)\n        x = self.maxpool(x1)\n\n        x2 = self.layer1(x)\n        x3 = self.layer2(x2)\n        x4 = self.layer3(x3)\n        x5 = self.layer4(x4)\n        \n        return [x1, x2, x3, x4, x5]\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k.replace('module.', '') in model_dict.keys() and np.shape(model_dict[k.replace('module.', '')]) == np.shape(v):\n            temp_dict[k.replace('module.', '')] = v\n            idx += 1\n    
model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef od_resnet18(weights=None, kernel_num=1):\n    model = OD_ResNet(BasicBlock, [2, 2, 2, 2], kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef od_resnet34(weights=None, kernel_num=1):\n    model = OD_ResNet(BasicBlock, [3, 4, 6, 3], kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef od_resnet50(weights=None, kernel_num=1):\n    model = OD_ResNet(Bottleneck, [3, 4, 6, 3], kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef od_resnet101(weights=None, kernel_num=1):\n    model = OD_ResNet(Bottleneck, [3, 4, 23, 3], kernel_num=kernel_num)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')['state_dict']\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model"
  },
  {
    "path": "yolo-improve/yolov5-backbone/ODConvFuse/odconv.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.autograd\n\ndef fuse_conv_bn(conv, bn):\n    # Fuse convolution and batchnorm layers https://tehnokv.com/posts/fusing-batchnorm-and-conv/\n    fusedconv = (\n        nn.Conv2d(\n            conv.in_channels,\n            conv.out_channels,\n            kernel_size=conv.kernel_size,\n            stride=conv.stride,\n            padding=conv.padding,\n            groups=conv.groups,\n            bias=True,\n        )\n        .requires_grad_(False)\n        .to(conv.weight.device)\n    )\n\n    # prepare filters\n    w_conv = conv.weight.clone().view(conv.out_channels, -1)\n    w_bn = torch.diag(bn.weight.div(torch.sqrt(bn.eps + bn.running_var)))\n    fusedconv.weight.copy_(torch.mm(w_bn, w_conv).view(fusedconv.weight.shape))\n\n    # prepare spatial bias\n    b_conv = (\n        torch.zeros(conv.weight.size(0), device=conv.weight.device)\n        if conv.bias is None\n        else conv.bias\n    )\n    b_bn = bn.bias - bn.weight.mul(bn.running_mean).div(\n        torch.sqrt(bn.running_var + bn.eps)\n    )\n    fusedconv.bias.copy_(torch.mm(w_bn, b_conv.reshape(-1, 1)).reshape(-1) + b_bn)\n    return fusedconv\n\nclass Attention(nn.Module):\n    def __init__(self, in_planes, out_planes, kernel_size, groups=1, reduction=0.0625, kernel_num=4, min_channel=16):\n        super(Attention, self).__init__()\n        attention_channel = max(int(in_planes * reduction), min_channel)\n        self.kernel_size = kernel_size\n        self.kernel_num = kernel_num\n        self.temperature = 1.0\n\n        self.avgpool = nn.AdaptiveAvgPool2d(1)\n        self.fc = nn.Conv2d(in_planes, attention_channel, 1, bias=False)\n        self.bn = nn.BatchNorm2d(attention_channel)\n        self.relu = nn.ReLU(inplace=True)\n\n        self.channel_fc = nn.Conv2d(attention_channel, in_planes, 1, bias=True)\n        self.func_channel = self.get_channel_attention\n\n        if in_planes == groups and 
in_planes == out_planes:  # depth-wise convolution\n            self.func_filter = self.skip\n        else:\n            self.filter_fc = nn.Conv2d(attention_channel, out_planes, 1, bias=True)\n            self.func_filter = self.get_filter_attention\n\n        if kernel_size == 1:  # point-wise convolution\n            self.func_spatial = self.skip\n        else:\n            self.spatial_fc = nn.Conv2d(attention_channel, kernel_size * kernel_size, 1, bias=True)\n            self.func_spatial = self.get_spatial_attention\n\n        if kernel_num == 1:\n            self.func_kernel = self.skip\n        else:\n            self.kernel_fc = nn.Conv2d(attention_channel, kernel_num, 1, bias=True)\n            self.func_kernel = self.get_kernel_attention\n\n        self._initialize_weights()\n\n    def _initialize_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n            if isinstance(m, nn.BatchNorm2d):\n                nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n\n    def update_temperature(self, temperature):\n        self.temperature = temperature\n\n    @staticmethod\n    def skip(_):\n        return 1.0\n\n    def get_channel_attention(self, x):\n        channel_attention = torch.sigmoid(self.channel_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)\n        return channel_attention\n\n    def get_filter_attention(self, x):\n        filter_attention = torch.sigmoid(self.filter_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)\n        return filter_attention\n\n    def get_spatial_attention(self, x):\n        spatial_attention = self.spatial_fc(x).view(x.size(0), 1, 1, 1, self.kernel_size, self.kernel_size)\n        spatial_attention = torch.sigmoid(spatial_attention / self.temperature)\n        return 
spatial_attention\n\n    def get_kernel_attention(self, x):\n        kernel_attention = self.kernel_fc(x).view(x.size(0), -1, 1, 1, 1, 1)\n        kernel_attention = F.softmax(kernel_attention / self.temperature, dim=1)\n        return kernel_attention\n\n    def forward(self, x):\n        x = self.avgpool(x)\n        x = self.fc(x)\n        if hasattr(self, 'bn'):\n            x = self.bn(x)\n        x = self.relu(x)\n        return self.func_channel(x), self.func_filter(x), self.func_spatial(x), self.func_kernel(x)\n    \n    def fuse(self):\n        self.fc = fuse_conv_bn(self.fc, self.bn)\n        del self.bn\n\n\nclass ODConv2d(nn.Module):\n    def __init__(self, in_planes, out_planes, kernel_size, stride=1, padding=0, dilation=1, groups=1,\n                 reduction=0.0625, kernel_num=4):\n        super(ODConv2d, self).__init__()\n        self.in_planes = in_planes\n        self.out_planes = out_planes\n        self.kernel_size = kernel_size\n        self.stride = stride\n        self.padding = padding\n        self.dilation = dilation\n        self.groups = groups\n        self.kernel_num = kernel_num\n        self.attention = Attention(in_planes, out_planes, kernel_size, groups=groups,\n                                   reduction=reduction, kernel_num=kernel_num)\n        self.weight = nn.Parameter(torch.randn(kernel_num, out_planes, in_planes//groups, kernel_size, kernel_size),\n                                   requires_grad=True)\n        self._initialize_weights()\n\n        if self.kernel_size == 1 and self.kernel_num == 1:\n            self._forward_impl = self._forward_impl_pw1x\n        else:\n            self._forward_impl = self._forward_impl_common\n\n    def _initialize_weights(self):\n        for i in range(self.kernel_num):\n            nn.init.kaiming_normal_(self.weight[i], mode='fan_out', nonlinearity='relu')\n\n    def update_temperature(self, temperature):\n        self.attention.update_temperature(temperature)\n\n    def 
_forward_impl_common(self, x):\n        # Applying channel (or filter) attention to the weights is equivalent to applying it to the feature maps;\n        # we apply it to the feature maps because that runs faster and uses less GPU memory.\n        channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)\n        batch_size, in_planes, height, width = x.size()\n        x = x * channel_attention\n        x = x.reshape(1, -1, height, width)\n        aggregate_weight = spatial_attention * kernel_attention * self.weight.unsqueeze(dim=0)\n        aggregate_weight = torch.sum(aggregate_weight, dim=1).view(\n            [-1, self.in_planes // self.groups, self.kernel_size, self.kernel_size])\n        output = F.conv2d(x, weight=aggregate_weight, bias=None, stride=self.stride, padding=self.padding,\n                          dilation=self.dilation, groups=self.groups * batch_size)\n        output = output.view(batch_size, self.out_planes, output.size(-2), output.size(-1))\n        output = output * filter_attention\n        return output\n\n    def _forward_impl_pw1x(self, x):\n        channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)\n        x = x * channel_attention\n        output = F.conv2d(x, weight=self.weight.squeeze(dim=0), bias=None, stride=self.stride, padding=self.padding,\n                          dilation=self.dilation, groups=self.groups)\n        output = output * filter_attention\n        return output\n\n    def forward(self, x):\n        return self._forward_impl(x)"
  },
  {
    "path": "yolo-improve/yolov5-backbone/PoolFormer/poolformer.py",
    "content": "# Copyright 2021 Garena Online Private Limited\n#\n# Licensed under the Apache License, Version 2.0 (the \"License\");\n# you may not use this file except in compliance with the License.\n# You may obtain a copy of the License at\n#\n#     http://www.apache.org/licenses/LICENSE-2.0\n#\n# Unless required by applicable law or agreed to in writing, software\n# distributed under the License is distributed on an \"AS IS\" BASIS,\n# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n# See the License for the specific language governing permissions and\n# limitations under the License.\n\"\"\"\nPoolFormer implementation\n\"\"\"\nimport os\nimport copy\nimport torch\nimport torch.nn as nn\nimport numpy as np\n\nfrom timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD\nfrom timm.models.layers import DropPath, trunc_normal_, to_2tuple\nfrom timm.models.registry import register_model\n\n__all__ = ['poolformer_s12', 'poolformer_s24', 'poolformer_s36', 'poolformer_m48', 'poolformer_m36']\n\ndef _cfg(url='', **kwargs):\n    return {\n        'url': url,\n        'num_classes': 1000, 'pool_size': None,\n        'crop_pct': .95, 'interpolation': 'bicubic',\n        'mean': IMAGENET_DEFAULT_MEAN, 'std': IMAGENET_DEFAULT_STD, \n        'classifier': 'head',\n        **kwargs\n    }\n\n\ndefault_cfgs = {\n    'poolformer_s': _cfg(crop_pct=0.9),\n    'poolformer_m': _cfg(crop_pct=0.95),\n}\n\n\nclass PatchEmbed(nn.Module):\n    \"\"\"\n    Patch Embedding that is implemented by a layer of conv. 
\n    Input: tensor in shape [B, C, H, W]\n    Output: tensor in shape [B, C, H/stride, W/stride]\n    \"\"\"\n    def __init__(self, patch_size=16, stride=16, padding=0, \n                 in_chans=3, embed_dim=768, norm_layer=None):\n        super().__init__()\n        patch_size = to_2tuple(patch_size)\n        stride = to_2tuple(stride)\n        padding = to_2tuple(padding)\n        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, \n                              stride=stride, padding=padding)\n        self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()\n\n    def forward(self, x):\n        x = self.proj(x)\n        x = self.norm(x)\n        return x\n\n\nclass LayerNormChannel(nn.Module):\n    \"\"\"\n    LayerNorm only for Channel Dimension.\n    Input: tensor in shape [B, C, H, W]\n    \"\"\"\n    def __init__(self, num_channels, eps=1e-05):\n        super().__init__()\n        self.weight = nn.Parameter(torch.ones(num_channels))\n        self.bias = nn.Parameter(torch.zeros(num_channels))\n        self.eps = eps\n\n    def forward(self, x):\n        u = x.mean(1, keepdim=True)\n        s = (x - u).pow(2).mean(1, keepdim=True)\n        x = (x - u) / torch.sqrt(s + self.eps)\n        x = self.weight.unsqueeze(-1).unsqueeze(-1) * x \\\n            + self.bias.unsqueeze(-1).unsqueeze(-1)\n        return x\n\n\nclass GroupNorm(nn.GroupNorm):\n    \"\"\"\n    Group Normalization with 1 group.\n    Input: tensor in shape [B, C, H, W]\n    \"\"\"\n    def __init__(self, num_channels, **kwargs):\n        super().__init__(1, num_channels, **kwargs)\n\n\nclass Pooling(nn.Module):\n    \"\"\"\n    Implementation of pooling for PoolFormer\n    --pool_size: pooling size\n    \"\"\"\n    def __init__(self, pool_size=3):\n        super().__init__()\n        self.pool = nn.AvgPool2d(\n            pool_size, stride=1, padding=pool_size//2, count_include_pad=False)\n\n    def forward(self, x):\n        return self.pool(x) - x\n\n\nclass 
Mlp(nn.Module):\n    \"\"\"\n    Implementation of MLP with 1*1 convolutions.\n    Input: tensor with shape [B, C, H, W]\n    \"\"\"\n    def __init__(self, in_features, hidden_features=None, \n                 out_features=None, act_layer=nn.GELU, drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Conv2d(in_features, hidden_features, 1)\n        self.act = act_layer()\n        self.fc2 = nn.Conv2d(hidden_features, out_features, 1)\n        self.drop = nn.Dropout(drop)\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Conv2d):\n            trunc_normal_(m.weight, std=.02)\n            if m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\n\nclass PoolFormerBlock(nn.Module):\n    \"\"\"\n    Implementation of one PoolFormer block.\n    --dim: embedding dim\n    --pool_size: pooling size\n    --mlp_ratio: mlp expansion ratio\n    --act_layer: activation\n    --norm_layer: normalization\n    --drop: dropout rate\n    --drop path: Stochastic Depth, \n        refer to https://arxiv.org/abs/1603.09382\n    --use_layer_scale, --layer_scale_init_value: LayerScale, \n        refer to https://arxiv.org/abs/2103.17239\n    \"\"\"\n    def __init__(self, dim, pool_size=3, mlp_ratio=4., \n                 act_layer=nn.GELU, norm_layer=GroupNorm, \n                 drop=0., drop_path=0., \n                 use_layer_scale=True, layer_scale_init_value=1e-5):\n\n        super().__init__()\n\n        self.norm1 = norm_layer(dim)\n        self.token_mixer = Pooling(pool_size=pool_size)\n        self.norm2 = norm_layer(dim)\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, 
hidden_features=mlp_hidden_dim, \n                       act_layer=act_layer, drop=drop)\n\n        # The following two techniques are useful to train deep PoolFormers.\n        self.drop_path = DropPath(drop_path) if drop_path > 0. \\\n            else nn.Identity()\n        self.use_layer_scale = use_layer_scale\n        if use_layer_scale:\n            self.layer_scale_1 = nn.Parameter(\n                layer_scale_init_value * torch.ones((dim)), requires_grad=True)\n            self.layer_scale_2 = nn.Parameter(\n                layer_scale_init_value * torch.ones((dim)), requires_grad=True)\n\n    def forward(self, x):\n        if self.use_layer_scale:\n            x = x + self.drop_path(\n                self.layer_scale_1.unsqueeze(-1).unsqueeze(-1)\n                * self.token_mixer(self.norm1(x)))\n            x = x + self.drop_path(\n                self.layer_scale_2.unsqueeze(-1).unsqueeze(-1)\n                * self.mlp(self.norm2(x)))\n        else:\n            x = x + self.drop_path(self.token_mixer(self.norm1(x)))\n            x = x + self.drop_path(self.mlp(self.norm2(x)))\n        return x\n\n\ndef basic_blocks(dim, index, layers, \n                 pool_size=3, mlp_ratio=4., \n                 act_layer=nn.GELU, norm_layer=GroupNorm, \n                 drop_rate=.0, drop_path_rate=0., \n                 use_layer_scale=True, layer_scale_init_value=1e-5):\n    \"\"\"\n    generate PoolFormer blocks for a stage\n    return: PoolFormer blocks \n    \"\"\"\n    blocks = []\n    for block_idx in range(layers[index]):\n        block_dpr = drop_path_rate * (\n            block_idx + sum(layers[:index])) / (sum(layers) - 1)\n        blocks.append(PoolFormerBlock(\n            dim, pool_size=pool_size, mlp_ratio=mlp_ratio, \n            act_layer=act_layer, norm_layer=norm_layer, \n            drop=drop_rate, drop_path=block_dpr, \n            use_layer_scale=use_layer_scale, \n            layer_scale_init_value=layer_scale_init_value, \n            
))\n    blocks = nn.Sequential(*blocks)\n\n    return blocks\n\n\nclass PoolFormer(nn.Module):\n    \"\"\"\n    PoolFormer, the main class of our model\n    --layers: [x,x,x,x], number of blocks for the 4 stages\n    --embed_dims, --mlp_ratios, --pool_size: the embedding dims, mlp ratios and \n        pooling size for the 4 stages\n    --downsamples: flags to apply downsampling or not\n    --norm_layer, --act_layer: define the types of normalization and activation\n    --num_classes: number of classes for the image classification\n    --in_patch_size, --in_stride, --in_pad: specify the patch embedding\n        for the input image\n    --down_patch_size --down_stride --down_pad: \n        specify the downsample (patch embed.)\n    --fork_feat: whether output features of the 4 stages, for dense prediction\n    --init_cfg, --pretrained: \n        for mmdetection and mmsegmentation to load pretrained weights\n    \"\"\"\n    def __init__(self, layers, embed_dims=None, \n                 mlp_ratios=None, downsamples=None, \n                 pool_size=3, \n                 norm_layer=GroupNorm, act_layer=nn.GELU, \n                 num_classes=1000,\n                 in_patch_size=7, in_stride=4, in_pad=2, \n                 down_patch_size=3, down_stride=2, down_pad=1, \n                 drop_rate=0., drop_path_rate=0.,\n                 use_layer_scale=True, layer_scale_init_value=1e-5, \n                 fork_feat=True,\n                 init_cfg=None, \n                 pretrained=None, \n                 **kwargs):\n\n        super().__init__()\n\n        if not fork_feat:\n            self.num_classes = num_classes\n        self.fork_feat = fork_feat\n\n        self.patch_embed = PatchEmbed(\n            patch_size=in_patch_size, stride=in_stride, padding=in_pad, \n            in_chans=3, embed_dim=embed_dims[0])\n\n        # set the main block in network\n        network = []\n        for i in range(len(layers)):\n            stage = basic_blocks(embed_dims[i], i, 
layers, \n                                 pool_size=pool_size, mlp_ratio=mlp_ratios[i],\n                                 act_layer=act_layer, norm_layer=norm_layer, \n                                 drop_rate=drop_rate, \n                                 drop_path_rate=drop_path_rate,\n                                 use_layer_scale=use_layer_scale, \n                                 layer_scale_init_value=layer_scale_init_value)\n            network.append(stage)\n            if i >= len(layers) - 1:\n                break\n            if downsamples[i] or embed_dims[i] != embed_dims[i+1]:\n                # downsampling between two stages\n                network.append(\n                    PatchEmbed(\n                        patch_size=down_patch_size, stride=down_stride, \n                        padding=down_pad, \n                        in_chans=embed_dims[i], embed_dim=embed_dims[i+1]\n                        )\n                    )\n\n        self.network = nn.ModuleList(network)\n\n        if self.fork_feat:\n            # add a norm layer for each output\n            self.out_indices = [0, 2, 4, 6]\n            for i_emb, i_layer in enumerate(self.out_indices):\n                if i_emb == 0 and os.environ.get('FORK_LAST3', None):\n                    # TODO: more elegant way\n                    \"\"\"For RetinaNet, `start_level=1`. 
The first norm layer will not be used.\n                    cmd: `FORK_LAST3=1 python -m torch.distributed.launch ...`\n                    \"\"\"\n                    layer = nn.Identity()\n                else:\n                    layer = norm_layer(embed_dims[i_emb])\n                layer_name = f'norm{i_layer}'\n                self.add_module(layer_name, layer)\n        else:\n            # Classifier head\n            self.norm = norm_layer(embed_dims[-1])\n            self.head = nn.Linear(\n                embed_dims[-1], num_classes) if num_classes > 0 \\\n                else nn.Identity()\n        self.init_cfg = copy.deepcopy(init_cfg)\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 224, 224))]\n\n    def reset_classifier(self, num_classes):\n        self.num_classes = num_classes\n        self.head = nn.Linear(\n            self.embed_dim, num_classes) if num_classes > 0 else nn.Identity()\n\n    def forward_embeddings(self, x):\n        x = self.patch_embed(x)\n        return x\n\n    def forward_tokens(self, x):\n        outs = []\n        for idx, block in enumerate(self.network):\n            x = block(x)\n            if self.fork_feat and idx in self.out_indices:\n                norm_layer = getattr(self, f'norm{idx}')\n                x_out = norm_layer(x)\n                outs.append(x_out)\n        return outs\n\n    def forward(self, x):\n        # input embedding\n        x = self.forward_embeddings(x)\n        # through backbone\n        x = self.forward_tokens(x)\n        return x\n\n\nmodel_urls = {\n    \"poolformer_s12\": \"https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_s12.pth.tar\",\n    \"poolformer_s24\": \"https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_s24.pth.tar\",\n    \"poolformer_s36\": \"https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_s36.pth.tar\",\n    \"poolformer_m36\": 
\"https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_m36.pth.tar\",\n    \"poolformer_m48\": \"https://github.com/sail-sg/poolformer/releases/download/v1.0/poolformer_m48.pth.tar\",\n}\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef poolformer_s12(pretrained=False, **kwargs):\n    \"\"\"\n    PoolFormer-S12 model, Params: 12M\n    --layers: [x,x,x,x], numbers of layers for the four stages\n    --embed_dims, --mlp_ratios: \n        embedding dims and mlp ratios for the four stages\n    --downsamples: flags to apply downsampling or not in four blocks\n    \"\"\"\n    layers = [2, 2, 6, 2]\n    embed_dims = [64, 128, 320, 512]\n    mlp_ratios = [4, 4, 4, 4]\n    downsamples = [True, True, True, True]\n    model = PoolFormer(\n        layers, embed_dims=embed_dims, \n        mlp_ratios=mlp_ratios, downsamples=downsamples, \n        **kwargs)\n    model.default_cfg = default_cfgs['poolformer_s']\n    if pretrained:\n        url = model_urls['poolformer_s12']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\", check_hash=True)\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint))\n    return model\n\ndef poolformer_s24(pretrained=False, **kwargs):\n    \"\"\"\n    PoolFormer-S24 model, Params: 21M\n    \"\"\"\n    layers = [4, 4, 12, 4]\n    embed_dims = [64, 128, 320, 512]\n    mlp_ratios = [4, 4, 4, 4]\n    downsamples = [True, True, True, True]\n    model = PoolFormer(\n        layers, embed_dims=embed_dims, \n        mlp_ratios=mlp_ratios, downsamples=downsamples, \n        **kwargs)\n    model.default_cfg = default_cfgs['poolformer_s']\n    if pretrained:\n        url 
= model_urls['poolformer_s24']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\", check_hash=True)\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint))\n    return model\n\ndef poolformer_s36(pretrained=False, **kwargs):\n    \"\"\"\n    PoolFormer-S36 model, Params: 31M\n    \"\"\"\n    layers = [6, 6, 18, 6]\n    embed_dims = [64, 128, 320, 512]\n    mlp_ratios = [4, 4, 4, 4]\n    downsamples = [True, True, True, True]\n    model = PoolFormer(\n        layers, embed_dims=embed_dims, \n        mlp_ratios=mlp_ratios, downsamples=downsamples, \n        layer_scale_init_value=1e-6, \n        **kwargs)\n    model.default_cfg = default_cfgs['poolformer_s']\n    if pretrained:\n        url = model_urls['poolformer_s36']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\", check_hash=True)\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint))\n    return model\n\ndef poolformer_m36(pretrained=False, **kwargs):\n    \"\"\"\n    PoolFormer-M36 model, Params: 56M\n    \"\"\"\n    layers = [6, 6, 18, 6]\n    embed_dims = [96, 192, 384, 768]\n    mlp_ratios = [4, 4, 4, 4]\n    downsamples = [True, True, True, True]\n    model = PoolFormer(\n        layers, embed_dims=embed_dims, \n        mlp_ratios=mlp_ratios, downsamples=downsamples, \n        layer_scale_init_value=1e-6, \n        **kwargs)\n    model.default_cfg = default_cfgs['poolformer_m']\n    if pretrained:\n        url = model_urls['poolformer_m36']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\", check_hash=True)\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint))\n    return model\n\n\n@register_model\ndef poolformer_m48(pretrained=False, **kwargs):\n    \"\"\"\n    PoolFormer-M48 model, Params: 73M\n    \"\"\"\n    layers = [8, 8, 24, 8]\n    embed_dims = [96, 192, 384, 768]\n    mlp_ratios = [4, 4, 4, 4]\n    downsamples = [True, 
True, True, True]\n    model = PoolFormer(\n        layers, embed_dims=embed_dims, \n        mlp_ratios=mlp_ratios, downsamples=downsamples, \n        layer_scale_init_value=1e-6, \n        **kwargs)\n    model.default_cfg = default_cfgs['poolformer_m']\n    if pretrained:\n        url = model_urls['poolformer_m48']\n        checkpoint = torch.hub.load_state_dict_from_url(url=url, map_location=\"cpu\", check_hash=True)\n        model.load_state_dict(update_weight(model.state_dict(), checkpoint))\n    return model\n\nif __name__ == '__main__':\n    model = poolformer_s12(pretrained=True)\n    inputs = torch.randn((1, 3, 640, 640))\n    for i in model(inputs):\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/RIFormer/RIFormer.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nfrom typing import Sequence\nimport torch\nimport torch.nn as nn\nimport numpy as np\nfrom mmcv.cnn.bricks import DropPath, build_activation_layer, build_norm_layer\nfrom mmengine.model import BaseModule\n\n__all__ = ['RIFormer']\n\nclass Mlp(nn.Module):\n    \"\"\"Mlp implemented by with 1*1 convolutions.\n\n    Input: Tensor with shape [B, C, H, W].\n    Output: Tensor with shape [B, C, H, W].\n    Args:\n        in_features (int): Dimension of input features.\n        hidden_features (int): Dimension of hidden features.\n        out_features (int): Dimension of output features.\n        act_cfg (dict): The config dict for activation between pointwise\n            convolution. Defaults to ``dict(type='GELU')``.\n        drop (float): Dropout rate. Defaults to 0.0.\n    \"\"\"\n\n    def __init__(self,\n                 in_features,\n                 hidden_features=None,\n                 out_features=None,\n                 act_cfg=dict(type='GELU'),\n                 drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Conv2d(in_features, hidden_features, 1)\n        self.act = build_activation_layer(act_cfg)\n        self.fc2 = nn.Conv2d(hidden_features, out_features, 1)\n        self.drop = nn.Dropout(drop)\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\nclass PatchEmbed(nn.Module):\n    \"\"\"Patch Embedding module implemented by a layer of convolution.\n\n    Input: tensor in shape [B, C, H, W]\n    Output: tensor in shape [B, C, H/stride, W/stride]\n    Args:\n        patch_size (int): Patch size of the patch embedding. Defaults to 16.\n        stride (int): Stride of the patch embedding. 
Defaults to 16.\n        padding (int): Padding of the patch embedding. Defaults to 0.\n        in_chans (int): Input channels. Defaults to 3.\n        embed_dim (int): Output dimension of the patch embedding.\n            Defaults to 768.\n        norm_layer (module): Normalization module. Defaults to None (not use).\n    \"\"\"\n\n    def __init__(self,\n                 patch_size=16,\n                 stride=16,\n                 padding=0,\n                 in_chans=3,\n                 embed_dim=768,\n                 norm_layer=None):\n        super().__init__()\n        self.proj = nn.Conv2d(\n            in_chans,\n            embed_dim,\n            kernel_size=patch_size,\n            stride=stride,\n            padding=padding)\n        self.norm = norm_layer(embed_dim) if norm_layer else nn.Identity()\n\n    def forward(self, x):\n        x = self.proj(x)\n        x = self.norm(x)\n        return x\n\n\nclass Affine(nn.Module):\n    \"\"\"Affine Transformation module.\n\n    Args:\n        in_features (int): Input dimension.\n    \"\"\"\n\n    def __init__(self, in_features):\n        super().__init__()\n        self.affine = nn.Conv2d(\n            in_features,\n            in_features,\n            kernel_size=1,\n            stride=1,\n            padding=0,\n            groups=in_features,\n            bias=True)\n\n    def forward(self, x):\n        return self.affine(x) - x\n\n\nclass RIFormerBlock(BaseModule):\n    \"\"\"RIFormer Block.\n\n    Args:\n        dim (int): Embedding dim.\n        mlp_ratio (float): Mlp expansion ratio. Defaults to 4.\n        norm_cfg (dict): The config dict for norm layers.\n            Defaults to ``dict(type='GN', num_groups=1)``.\n        act_cfg (dict): The config dict for activation between pointwise\n            convolution. Defaults to ``dict(type='GELU')``.\n        drop (float): Dropout rate. Defaults to 0.\n        drop_path (float): Stochastic depth rate. 
Defaults to 0.\n        layer_scale_init_value (float): Init value for Layer Scale.\n            Defaults to 1e-5.\n        deploy (bool): Whether to switch the model structure to\n            deployment mode. Default: False.\n    \"\"\"\n\n    def __init__(self,\n                 dim,\n                 mlp_ratio=4.,\n                 norm_cfg=dict(type='GN', num_groups=1),\n                 act_cfg=dict(type='GELU'),\n                 drop=0.,\n                 drop_path=0.,\n                 layer_scale_init_value=1e-5,\n                 deploy=False):\n\n        super().__init__()\n\n        if deploy:\n            self.norm_reparam = build_norm_layer(norm_cfg, dim)[1]\n        else:\n            self.norm1 = build_norm_layer(norm_cfg, dim)[1]\n            self.token_mixer = Affine(in_features=dim)\n        self.norm2 = build_norm_layer(norm_cfg, dim)[1]\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(\n            in_features=dim,\n            hidden_features=mlp_hidden_dim,\n            act_cfg=act_cfg,\n            drop=drop)\n\n        # The following two techniques are useful to train deep RIFormers.\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
\\\n            else nn.Identity()\n        self.layer_scale_1 = nn.Parameter(\n            layer_scale_init_value * torch.ones((dim)), requires_grad=True)\n        self.layer_scale_2 = nn.Parameter(\n            layer_scale_init_value * torch.ones((dim)), requires_grad=True)\n        self.norm_cfg = norm_cfg\n        self.dim = dim\n        self.deploy = deploy\n\n    def forward(self, x):\n        if hasattr(self, 'norm_reparam'):\n            x = x + self.drop_path(\n                self.layer_scale_1.unsqueeze(-1).unsqueeze(-1) *\n                self.norm_reparam(x))\n            x = x + self.drop_path(\n                self.layer_scale_2.unsqueeze(-1).unsqueeze(-1) *\n                self.mlp(self.norm2(x)))\n        else:\n            x = x + self.drop_path(\n                self.layer_scale_1.unsqueeze(-1).unsqueeze(-1) *\n                self.token_mixer(self.norm1(x)))\n            x = x + self.drop_path(\n                self.layer_scale_2.unsqueeze(-1).unsqueeze(-1) *\n                self.mlp(self.norm2(x)))\n        return x\n\n    def fuse_affine(self, norm, token_mixer):\n        gamma_affn = token_mixer.affine.weight.reshape(-1)\n        gamma_affn = gamma_affn - torch.ones_like(gamma_affn)\n        beta_affn = token_mixer.affine.bias\n        gamma_ln = norm.weight\n        beta_ln = norm.bias\n        return (gamma_ln * gamma_affn), (beta_ln * gamma_affn + beta_affn)\n\n    def get_equivalent_scale_bias(self):\n        eq_s, eq_b = self.fuse_affine(self.norm1, self.token_mixer)\n        return eq_s, eq_b\n\n    def switch_to_deploy(self):\n        if self.deploy:\n            return\n        eq_s, eq_b = self.get_equivalent_scale_bias()\n        self.norm_reparam = build_norm_layer(self.norm_cfg, self.dim)[1]\n        self.norm_reparam.weight.data = eq_s\n        self.norm_reparam.bias.data = eq_b\n        self.__delattr__('norm1')\n        if hasattr(self, 'token_mixer'):\n            self.__delattr__('token_mixer')\n        self.deploy = 
True\n\n\ndef basic_blocks(dim,\n                 index,\n                 layers,\n                 mlp_ratio=4.,\n                 norm_cfg=dict(type='GN', num_groups=1),\n                 act_cfg=dict(type='GELU'),\n                 drop_rate=.0,\n                 drop_path_rate=0.,\n                 layer_scale_init_value=1e-5,\n                 deploy=False):\n    \"\"\"generate RIFormer blocks for a stage.\"\"\"\n    blocks = []\n    for block_idx in range(layers[index]):\n        block_dpr = drop_path_rate * (block_idx + sum(layers[:index])) / (\n            sum(layers) - 1)\n        blocks.append(\n            RIFormerBlock(\n                dim,\n                mlp_ratio=mlp_ratio,\n                norm_cfg=norm_cfg,\n                act_cfg=act_cfg,\n                drop=drop_rate,\n                drop_path=block_dpr,\n                layer_scale_init_value=layer_scale_init_value,\n                deploy=deploy,\n            ))\n    blocks = nn.Sequential(*blocks)\n\n    return blocks\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        k = k[9:]\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\nclass RIFormer(nn.Module):\n    \"\"\"RIFormer.\n\n    A PyTorch implementation of RIFormer introduced by:\n    `RIFormer: Keep Your Vision Backbone Effective But Removing Token Mixer <https://arxiv.org/abs/xxxx.xxxxx>`_\n\n    Args:\n        arch (str | dict): The model's architecture. If string, it should be\n            one of architecture in ``RIFormer.arch_settings``. 
And if dict, it\n            should include the following keys, of which ``layers`` and\n            ``embed_dims`` are required:\n\n            - layers (list[int]): Number of blocks at each stage.\n            - embed_dims (list[int]): The number of channels at each stage.\n            - mlp_ratios (list[int]): Expansion ratio of MLPs.\n            - layer_scale_init_value (float): Init value for Layer Scale.\n\n            Defaults to 's12'.\n\n        norm_cfg (dict): The config dict for norm layers.\n            Defaults to ``dict(type='GN', num_groups=1)``.\n        act_cfg (dict): The config dict for activation between pointwise\n            convolution. Defaults to ``dict(type='GELU')``.\n        in_patch_size (int): The patch size of input image patch embedding.\n            Defaults to 7.\n        in_stride (int): The stride of input image patch embedding.\n            Defaults to 4.\n        in_pad (int): The padding of input image patch embedding.\n            Defaults to 2.\n        down_patch_size (int): The patch size of downsampling patch embedding.\n            Defaults to 3.\n        down_stride (int): The stride of downsampling patch embedding.\n            Defaults to 2.\n        down_pad (int): The padding of downsampling patch embedding.\n            Defaults to 1.\n        drop_rate (float): Dropout rate. Defaults to 0.\n        drop_path_rate (float): Stochastic depth rate. Defaults to 0.\n        out_indices (Sequence | int): Output from which network position.\n            Index 0-6 respectively corresponds to\n            [stage1, downsampling, stage2, downsampling, stage3, downsampling, stage4]\n            Defaults to ``[0, 2, 4, 6]``, i.e. the outputs of the four stages.\n        frozen_stages (int): Stages to be frozen (all param fixed).\n            Defaults to -1, which means not freezing any parameters.\n        deploy (bool): Whether to switch the model structure to\n            deployment mode. 
Default: False.\n        init_cfg (dict, optional): Initialization config dict\n    \"\"\"  # noqa: E501\n\n    # --layers: [x,x,x,x], numbers of layers for the four stages\n    # --embed_dims, --mlp_ratios:\n    #     embedding dims and mlp ratios for the four stages\n    # --downsamples: flags to apply downsampling or not in four blocks\n    arch_settings = {\n        's12': {\n            'layers': [2, 2, 6, 2],\n            'embed_dims': [64, 128, 320, 512],\n            'mlp_ratios': [4, 4, 4, 4],\n            'layer_scale_init_value': 1e-5,\n        },\n        's24': {\n            'layers': [4, 4, 12, 4],\n            'embed_dims': [64, 128, 320, 512],\n            'mlp_ratios': [4, 4, 4, 4],\n            'layer_scale_init_value': 1e-5,\n        },\n        's36': {\n            'layers': [6, 6, 18, 6],\n            'embed_dims': [64, 128, 320, 512],\n            'mlp_ratios': [4, 4, 4, 4],\n            'layer_scale_init_value': 1e-6,\n        },\n        'm36': {\n            'layers': [6, 6, 18, 6],\n            'embed_dims': [96, 192, 384, 768],\n            'mlp_ratios': [4, 4, 4, 4],\n            'layer_scale_init_value': 1e-6,\n        },\n        'm48': {\n            'layers': [8, 8, 24, 8],\n            'embed_dims': [96, 192, 384, 768],\n            'mlp_ratios': [4, 4, 4, 4],\n            'layer_scale_init_value': 1e-6,\n        },\n    }\n\n    def __init__(self,\n                 arch='s12',\n                 weights = '',\n                 in_channels=3,\n                 norm_cfg=dict(type='GN', num_groups=1),\n                 act_cfg=dict(type='GELU'),\n                 in_patch_size=7,\n                 in_stride=4,\n                 in_pad=2,\n                 down_patch_size=3,\n                 down_stride=2,\n                 down_pad=1,\n                 drop_rate=0.,\n                 drop_path_rate=0.,\n                 out_indices=[0, 2, 4, 6],\n                 deploy=False):\n\n        super().__init__()\n\n        if 
isinstance(arch, str):\n            assert arch in self.arch_settings, \\\n                f'Unavailable arch, please choose from ' \\\n                f'({set(self.arch_settings)}) or pass a dict.'\n            arch = self.arch_settings[arch]\n        elif isinstance(arch, dict):\n            assert 'layers' in arch and 'embed_dims' in arch, \\\n                f'The arch dict must have \"layers\" and \"embed_dims\", ' \\\n                f'but got {list(arch.keys())}.'\n\n        layers = arch['layers']\n        embed_dims = arch['embed_dims']\n        mlp_ratios = arch['mlp_ratios'] \\\n            if 'mlp_ratios' in arch else [4, 4, 4, 4]\n        layer_scale_init_value = arch['layer_scale_init_value'] \\\n            if 'layer_scale_init_value' in arch else 1e-5\n\n        self.patch_embed = PatchEmbed(\n            patch_size=in_patch_size,\n            stride=in_stride,\n            padding=in_pad,\n            in_chans=in_channels,\n            embed_dim=embed_dims[0])\n\n        # set the main block in network\n        network = []\n        for i in range(len(layers)):\n            stage = basic_blocks(\n                embed_dims[i],\n                i,\n                layers,\n                mlp_ratio=mlp_ratios[i],\n                norm_cfg=norm_cfg,\n                act_cfg=act_cfg,\n                drop_rate=drop_rate,\n                drop_path_rate=drop_path_rate,\n                layer_scale_init_value=layer_scale_init_value,\n                deploy=deploy)\n            network.append(stage)\n            if i >= len(layers) - 1:\n                break\n            if embed_dims[i] != embed_dims[i + 1]:\n                # downsampling between two stages\n                network.append(\n                    PatchEmbed(\n                        patch_size=down_patch_size,\n                        stride=down_stride,\n                        padding=down_pad,\n                        in_chans=embed_dims[i],\n                        
embed_dim=embed_dims[i + 1]))\n\n        self.network = nn.ModuleList(network)\n\n        if isinstance(out_indices, int):\n            out_indices = [out_indices]\n        assert isinstance(out_indices, Sequence), \\\n            f'\"out_indices\" must be a sequence or int, ' \\\n            f'got {type(out_indices)} instead.'\n        for i, index in enumerate(out_indices):\n            if index < 0:\n                out_indices[i] = 7 + index\n                assert out_indices[i] >= 0, f'Invalid out_indices {index}'\n        self.out_indices = out_indices\n        if self.out_indices:\n            for i_layer in self.out_indices:\n                layer = build_norm_layer(norm_cfg,\n                                         embed_dims[(i_layer + 1) // 2])[1]\n                layer_name = f'norm{i_layer}'\n                self.add_module(layer_name, layer)\n\n        self.deploy = deploy\n        if weights:\n            self.load_state_dict(update_weight(self.state_dict(), torch.load(weights)['state_dict']))\n        # probe the per-stage output channels with a dummy input that matches in_channels\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, in_channels, 640, 640))]\n\n    def forward_embeddings(self, x):\n        x = self.patch_embed(x)\n        return x\n\n    def forward_tokens(self, x):\n        outs = []\n        for idx, block in enumerate(self.network):\n            x = block(x)\n            if idx in self.out_indices:\n                norm_layer = getattr(self, f'norm{idx}')\n                x_out = norm_layer(x)\n                outs.append(x_out)\n        return outs\n    \n    def forward(self, x):\n        # input embedding\n        x = self.forward_embeddings(x)\n        # through backbone\n        x = self.forward_tokens(x)\n        return x\n\nif __name__ == '__main__':\n    model = RIFormer('s12', 'riformer-s12_32xb128_in1k-384px_20230406-145eda4c.pth')\n    inputs = torch.randn((1, 3, 640, 640))\n    for i in model(inputs):\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/RepViT/repvit.py",
    "content": "import torch.nn as nn\nimport numpy as np\nfrom timm.models.layers import SqueezeExcite\nimport torch\n\n__all__ = ['repvit_m0_9', 'repvit_m1_0', 'repvit_m1_1', 'repvit_m1_5', 'repvit_m2_3']\n\ndef replace_batchnorm(net):\n    for child_name, child in net.named_children():\n        if hasattr(child, 'fuse_self'):\n            fused = child.fuse_self()\n            setattr(net, child_name, fused)\n            replace_batchnorm(fused)\n        elif isinstance(child, torch.nn.BatchNorm2d):\n            setattr(net, child_name, torch.nn.Identity())\n        else:\n            replace_batchnorm(child)\n\ndef _make_divisible(v, divisor, min_value=None):\n    \"\"\"\n    This function is taken from the original tf repo.\n    It ensures that all layers have a channel number that is divisible by 8\n    It can be seen here:\n    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py\n    :param v:\n    :param divisor:\n    :param min_value:\n    :return:\n    \"\"\"\n    if min_value is None:\n        min_value = divisor\n    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)\n    # Make sure that round down does not go down by more than 10%.\n    if new_v < 0.9 * v:\n        new_v += divisor\n    return new_v\n\nclass Conv2d_BN(torch.nn.Sequential):\n    def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1,\n                 groups=1, bn_weight_init=1, resolution=-10000):\n        super().__init__()\n        self.add_module('c', torch.nn.Conv2d(\n            a, b, ks, stride, pad, dilation, groups, bias=False))\n        self.add_module('bn', torch.nn.BatchNorm2d(b))\n        torch.nn.init.constant_(self.bn.weight, bn_weight_init)\n        torch.nn.init.constant_(self.bn.bias, 0)\n\n    @torch.no_grad()\n    def fuse_self(self):\n        c, bn = self._modules.values()\n        w = bn.weight / (bn.running_var + bn.eps)**0.5\n        w = c.weight * w[:, None, None, None]\n        b = bn.bias - 
bn.running_mean * bn.weight / \\\n            (bn.running_var + bn.eps)**0.5\n        m = torch.nn.Conv2d(w.size(1) * self.c.groups, w.size(\n            0), w.shape[2:], stride=self.c.stride, padding=self.c.padding, dilation=self.c.dilation, groups=self.c.groups,\n            device=c.weight.device)\n        m.weight.data.copy_(w)\n        m.bias.data.copy_(b)\n        return m\n\nclass Residual(torch.nn.Module):\n    def __init__(self, m, drop=0.):\n        super().__init__()\n        self.m = m\n        self.drop = drop\n\n    def forward(self, x):\n        if self.training and self.drop > 0:\n            return x + self.m(x) * torch.rand(x.size(0), 1, 1, 1,\n                                              device=x.device).ge_(self.drop).div(1 - self.drop).detach()\n        else:\n            return x + self.m(x)\n    \n    @torch.no_grad()\n    def fuse_self(self):\n        if isinstance(self.m, Conv2d_BN):\n            m = self.m.fuse_self()\n            assert(m.groups == m.in_channels)\n            identity = torch.ones(m.weight.shape[0], m.weight.shape[1], 1, 1)\n            identity = torch.nn.functional.pad(identity, [1,1,1,1])\n            m.weight += identity.to(m.weight.device)\n            return m\n        elif isinstance(self.m, torch.nn.Conv2d):\n            m = self.m\n            assert(m.groups != m.in_channels)\n            identity = torch.ones(m.weight.shape[0], m.weight.shape[1], 1, 1)\n            identity = torch.nn.functional.pad(identity, [1,1,1,1])\n            m.weight += identity.to(m.weight.device)\n            return m\n        else:\n            return self\n\nclass RepVGGDW(torch.nn.Module):\n    def __init__(self, ed) -> None:\n        super().__init__()\n        self.conv = Conv2d_BN(ed, ed, 3, 1, 1, groups=ed)\n        self.conv1 = torch.nn.Conv2d(ed, ed, 1, 1, 0, groups=ed)\n        self.dim = ed\n        self.bn = torch.nn.BatchNorm2d(ed)\n    \n    def forward(self, x):\n        return self.bn((self.conv(x) + self.conv1(x)) + 
x)\n    \n    @torch.no_grad()\n    def fuse_self(self):\n        conv = self.conv.fuse_self()\n        conv1 = self.conv1\n        \n        conv_w = conv.weight\n        conv_b = conv.bias\n        conv1_w = conv1.weight\n        conv1_b = conv1.bias\n        \n        conv1_w = torch.nn.functional.pad(conv1_w, [1,1,1,1])\n\n        identity = torch.nn.functional.pad(torch.ones(conv1_w.shape[0], conv1_w.shape[1], 1, 1, device=conv1_w.device), [1,1,1,1])\n\n        final_conv_w = conv_w + conv1_w + identity\n        final_conv_b = conv_b + conv1_b\n\n        conv.weight.data.copy_(final_conv_w)\n        conv.bias.data.copy_(final_conv_b)\n\n        bn = self.bn\n        w = bn.weight / (bn.running_var + bn.eps)**0.5\n        w = conv.weight * w[:, None, None, None]\n        b = bn.bias + (conv.bias - bn.running_mean) * bn.weight / \\\n            (bn.running_var + bn.eps)**0.5\n        conv.weight.data.copy_(w)\n        conv.bias.data.copy_(b)\n        return conv\n\nclass RepViTBlock(nn.Module):\n    def __init__(self, inp, hidden_dim, oup, kernel_size, stride, use_se, use_hs):\n        super(RepViTBlock, self).__init__()\n        assert stride in [1, 2]\n\n        self.identity = stride == 1 and inp == oup\n        assert(hidden_dim == 2 * inp)\n\n        if stride == 2:\n            self.token_mixer = nn.Sequential(\n                Conv2d_BN(inp, inp, kernel_size, stride, (kernel_size - 1) // 2, groups=inp),\n                SqueezeExcite(inp, 0.25) if use_se else nn.Identity(),\n                Conv2d_BN(inp, oup, ks=1, stride=1, pad=0)\n            )\n            self.channel_mixer = Residual(nn.Sequential(\n                    # pw\n                    Conv2d_BN(oup, 2 * oup, 1, 1, 0),\n                    nn.GELU() if use_hs else nn.GELU(),\n                    # pw-linear\n                    Conv2d_BN(2 * oup, oup, 1, 1, 0, bn_weight_init=0),\n                ))\n        else:\n            assert(self.identity)\n            self.token_mixer = 
nn.Sequential(\n                RepVGGDW(inp),\n                SqueezeExcite(inp, 0.25) if use_se else nn.Identity(),\n            )\n            self.channel_mixer = Residual(nn.Sequential(\n                    # pw\n                    Conv2d_BN(inp, hidden_dim, 1, 1, 0),\n                    nn.GELU() if use_hs else nn.GELU(),\n                    # pw-linear\n                    Conv2d_BN(hidden_dim, oup, 1, 1, 0, bn_weight_init=0),\n                ))\n\n    def forward(self, x):\n        return self.channel_mixer(self.token_mixer(x))\n\nclass RepViT(nn.Module):\n    def __init__(self, cfgs):\n        super(RepViT, self).__init__()\n        # setting of inverted residual blocks\n        self.cfgs = cfgs\n\n        # building first layer\n        input_channel = self.cfgs[0][2]\n        patch_embed = torch.nn.Sequential(Conv2d_BN(3, input_channel // 2, 3, 2, 1), torch.nn.GELU(),\n                           Conv2d_BN(input_channel // 2, input_channel, 3, 2, 1))\n        layers = [patch_embed]\n        # building inverted residual blocks\n        block = RepViTBlock\n        for k, t, c, use_se, use_hs, s in self.cfgs:\n            output_channel = _make_divisible(c, 8)\n            exp_size = _make_divisible(input_channel * t, 8)\n            layers.append(block(input_channel, exp_size, output_channel, k, s, use_se, use_hs))\n            input_channel = output_channel\n        self.features = nn.ModuleList(layers)\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n        \n    def forward(self, x):\n        input_size = x.size(2)\n        scale = [4, 8, 16, 32]\n        features = [None, None, None, None]\n        for f in self.features:\n            x = f(x)\n            if input_size // x.size(2) in scale:\n                features[scale.index(input_size // x.size(2))] = x\n        return features\n    \n    def switch_to_deploy(self):\n        replace_batchnorm(self)\n\ndef update_weight(model_dict, weight_dict):\n    
idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        # k = k[9:]\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef repvit_m0_9(weights=''):\n    \"\"\"\n    Constructs a RepViT-M0.9 model\n    \"\"\"\n    cfgs = [\n        # k, t, c, SE, HS, s \n        [3,   2,  48, 1, 0, 1],\n        [3,   2,  48, 0, 0, 1],\n        [3,   2,  48, 0, 0, 1],\n        [3,   2,  96, 0, 0, 2],\n        [3,   2,  96, 1, 0, 1],\n        [3,   2,  96, 0, 0, 1],\n        [3,   2,  96, 0, 0, 1],\n        [3,   2,  192, 0, 1, 2],\n        [3,   2,  192, 1, 1, 1],\n        [3,   2,  192, 0, 1, 1],\n        [3,   2,  192, 1, 1, 1],\n        [3,   2, 192, 0, 1, 1],\n        [3,   2, 192, 1, 1, 1],\n        [3,   2, 192, 0, 1, 1],\n        [3,   2, 192, 1, 1, 1],\n        [3,   2, 192, 0, 1, 1],\n        [3,   2, 192, 1, 1, 1],\n        [3,   2, 192, 0, 1, 1],\n        [3,   2, 192, 1, 1, 1],\n        [3,   2, 192, 0, 1, 1],\n        [3,   2, 192, 1, 1, 1],\n        [3,   2, 192, 0, 1, 1],\n        [3,   2, 192, 0, 1, 1],\n        [3,   2, 384, 0, 1, 2],\n        [3,   2, 384, 1, 1, 1],\n        [3,   2, 384, 0, 1, 1]\n    ]\n    model = RepViT(cfgs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\ndef repvit_m1_0(weights=''):\n    \"\"\"\n    Constructs a RepViT-M1.0 model\n    \"\"\"\n    cfgs = [\n        # k, t, c, SE, HS, s \n        [3,   2,  56, 1, 0, 1],\n        [3,   2,  56, 0, 0, 1],\n        [3,   2,  56, 0, 0, 1],\n        [3,   2,  112, 0, 0, 2],\n        [3,   2,  112, 1, 0, 1],\n        [3,   2,  112, 0, 0, 1],\n        [3,   2,  112, 0, 0, 1],\n        [3,   2,  224, 0, 1, 2],\n        [3,   2,  224, 1, 1, 1],\n        [3,   2,  224, 0, 1, 1],\n        [3,   2, 
 224, 1, 1, 1],\n        [3,   2, 224, 0, 1, 1],\n        [3,   2, 224, 1, 1, 1],\n        [3,   2, 224, 0, 1, 1],\n        [3,   2, 224, 1, 1, 1],\n        [3,   2, 224, 0, 1, 1],\n        [3,   2, 224, 1, 1, 1],\n        [3,   2, 224, 0, 1, 1],\n        [3,   2, 224, 1, 1, 1],\n        [3,   2, 224, 0, 1, 1],\n        [3,   2, 224, 1, 1, 1],\n        [3,   2, 224, 0, 1, 1],\n        [3,   2, 224, 0, 1, 1],\n        [3,   2, 448, 0, 1, 2],\n        [3,   2, 448, 1, 1, 1],\n        [3,   2, 448, 0, 1, 1]\n    ]\n    model = RepViT(cfgs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\ndef repvit_m1_1(weights=''):\n    \"\"\"\n    Constructs a RepViT-M1.1 model\n    \"\"\"\n    cfgs = [\n        # k, t, c, SE, HS, s \n        [3,   2,  64, 1, 0, 1],\n        [3,   2,  64, 0, 0, 1],\n        [3,   2,  64, 0, 0, 1],\n        [3,   2,  128, 0, 0, 2],\n        [3,   2,  128, 1, 0, 1],\n        [3,   2,  128, 0, 0, 1],\n        [3,   2,  128, 0, 0, 1],\n        [3,   2,  256, 0, 1, 2],\n        [3,   2,  256, 1, 1, 1],\n        [3,   2,  256, 0, 1, 1],\n        [3,   2,  256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 512, 0, 1, 2],\n        [3,   2, 512, 1, 1, 1],\n        [3,   2, 512, 0, 1, 1]\n    ]\n    model = RepViT(cfgs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\ndef repvit_m1_5(weights=''):\n    \"\"\"\n    Constructs a RepViT-M1.5 model\n    \"\"\"\n    cfgs = [\n        # k, t, c, SE, HS, s \n        [3,   2,  64, 1, 0, 1],\n        [3,   2,  64, 0, 0, 1],\n        
[3,   2,  64, 1, 0, 1],\n        [3,   2,  64, 0, 0, 1],\n        [3,   2,  64, 0, 0, 1],\n        [3,   2,  128, 0, 0, 2],\n        [3,   2,  128, 1, 0, 1],\n        [3,   2,  128, 0, 0, 1],\n        [3,   2,  128, 1, 0, 1],\n        [3,   2,  128, 0, 0, 1],\n        [3,   2,  128, 0, 0, 1],\n        [3,   2,  256, 0, 1, 2],\n        [3,   2,  256, 1, 1, 1],\n        [3,   2,  256, 0, 1, 1],\n        [3,   2,  256, 1, 1, 1],\n        [3,   2,  256, 0, 1, 1],\n        [3,   2,  256, 1, 1, 1],\n        [3,   2,  256, 0, 1, 1],\n        [3,   2,  256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 1, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 256, 0, 1, 1],\n        [3,   2, 512, 0, 1, 2],\n        [3,   2, 512, 1, 1, 1],\n        [3,   2, 512, 0, 1, 1],\n        [3,   2, 512, 1, 1, 1],\n        [3,   2, 512, 0, 1, 1]\n    ]\n    model = RepViT(cfgs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\ndef repvit_m2_3(weights=''):\n    \"\"\"\n    Constructs a RepViT-M2.3 model\n    \"\"\"\n    cfgs = [\n        # k, t, c, SE, HS, s \n        [3,   2,  80, 1, 0, 1],\n        [3,   2,  80, 0, 0, 1],\n        [3,   2,  80, 1, 0, 1],\n        [3,   2,  80, 0, 0, 1],\n        [3,   2,  80, 1, 0, 1],\n        [3,   2,  80, 0, 0, 1],\n        [3,   2,  80, 0, 0, 1],\n        [3,   2,  160, 0, 0, 2],\n        [3,   2,  160, 1, 0, 1],\n        [3,   2,  160, 0, 0, 1],\n        [3,   2,  160, 1, 0, 1],\n 
       [3,   2,  160, 0, 0, 1],\n        [3,   2,  160, 1, 0, 1],\n        [3,   2,  160, 0, 0, 1],\n        [3,   2,  160, 0, 0, 1],\n        [3,   2,  320, 0, 1, 2],\n        [3,   2,  320, 1, 1, 1],\n        [3,   2,  320, 0, 1, 1],\n        [3,   2,  320, 1, 1, 1],\n        [3,   2,  320, 0, 1, 1],\n        [3,   2,  320, 1, 1, 1],\n        [3,   2,  320, 0, 1, 1],\n        [3,   2,  320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 1, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        # [3,   2, 320, 1, 1, 1],\n        # [3,   2, 320, 0, 1, 1],\n        [3,   2, 320, 0, 1, 1],\n        [3,   2, 640, 0, 1, 2],\n        [3,   2, 640, 1, 1, 1],\n        [3,   2, 640, 0, 1, 1],\n        # [3,   2, 640, 1, 1, 1],\n        # [3,   2, 640, 0, 1, 1]\n    ]\n    model = RepViT(cfgs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\nif __name__ == '__main__':\n    model = repvit_m2_3('repvit_m2_3_distill_450e.pth')\n    inputs = torch.randn((1, 3, 640, 640))\n    res = model(inputs)\n    for i in res:\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/SwinTransformer/SwinTransformer.py",
    "content": "# --------------------------------------------------------\n# Swin Transformer\n# Copyright (c) 2021 Microsoft\n# Licensed under The MIT License [see LICENSE for details]\n# Written by Ze Liu, Yutong Lin, Yixuan Wei\n# --------------------------------------------------------\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.utils.checkpoint as checkpoint\nimport numpy as np\nfrom timm.models.layers import DropPath, to_2tuple, trunc_normal_\n\n__all__ = ['SwinTransformer_Tiny']\n\nclass Mlp(nn.Module):\n    \"\"\" Multilayer perceptron.\"\"\"\n\n    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Linear(in_features, hidden_features)\n        self.act = act_layer()\n        self.fc2 = nn.Linear(hidden_features, out_features)\n        self.drop = nn.Dropout(drop)\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\n\ndef window_partition(x, window_size):\n    \"\"\"\n    Args:\n        x: (B, H, W, C)\n        window_size (int): window size\n\n    Returns:\n        windows: (num_windows*B, window_size, window_size, C)\n    \"\"\"\n    B, H, W, C = x.shape\n    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)\n    windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C)\n    return windows\n\n\ndef window_reverse(windows, window_size, H, W):\n    \"\"\"\n    Args:\n        windows: (num_windows*B, window_size, window_size, C)\n        window_size (int): Window size\n        H (int): Height of image\n        W (int): Width of image\n\n    Returns:\n        x: (B, H, W, C)\n    \"\"\"\n    B = int(windows.shape[0] / (H * W / 
window_size / window_size))\n    x = windows.view(B, H // window_size, W // window_size, window_size, window_size, -1)\n    x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, -1)\n    return x\n\n\nclass WindowAttention(nn.Module):\n    \"\"\" Window based multi-head self attention (W-MSA) module with relative position bias.\n    It supports both of shifted and non-shifted window.\n\n    Args:\n        dim (int): Number of input channels.\n        window_size (tuple[int]): The height and width of the window.\n        num_heads (int): Number of attention heads.\n        qkv_bias (bool, optional):  If True, add a learnable bias to query, key, value. Default: True\n        qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set\n        attn_drop (float, optional): Dropout ratio of attention weight. Default: 0.0\n        proj_drop (float, optional): Dropout ratio of output. Default: 0.0\n    \"\"\"\n\n    def __init__(self, dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0., proj_drop=0.):\n\n        super().__init__()\n        self.dim = dim\n        self.window_size = window_size  # Wh, Ww\n        self.num_heads = num_heads\n        head_dim = dim // num_heads\n        self.scale = qk_scale or head_dim ** -0.5\n\n        # define a parameter table of relative position bias\n        self.relative_position_bias_table = nn.Parameter(\n            torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads))  # 2*Wh-1 * 2*Ww-1, nH\n\n        # get pair-wise relative position index for each token inside the window\n        coords_h = torch.arange(self.window_size[0])\n        coords_w = torch.arange(self.window_size[1])\n        coords = torch.stack(torch.meshgrid([coords_h, coords_w]))  # 2, Wh, Ww\n        coords_flatten = torch.flatten(coords, 1)  # 2, Wh*Ww\n        relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :]  # 2, Wh*Ww, Wh*Ww\n        relative_coords = 
relative_coords.permute(1, 2, 0).contiguous()  # Wh*Ww, Wh*Ww, 2\n        relative_coords[:, :, 0] += self.window_size[0] - 1  # shift to start from 0\n        relative_coords[:, :, 1] += self.window_size[1] - 1\n        relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1\n        relative_position_index = relative_coords.sum(-1)  # Wh*Ww, Wh*Ww\n        self.register_buffer(\"relative_position_index\", relative_position_index)\n\n        self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias)\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n        trunc_normal_(self.relative_position_bias_table, std=.02)\n        self.softmax = nn.Softmax(dim=-1)\n\n    def forward(self, x, mask=None):\n        \"\"\" Forward function.\n\n        Args:\n            x: input features with shape of (num_windows*B, N, C)\n            mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None\n        \"\"\"\n        B_, N, C = x.shape\n        qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)\n        q, k, v = qkv[0], qkv[1], qkv[2]  # make torchscript happy (cannot use tensor as tuple)\n\n        q = q * self.scale\n        attn = (q @ k.transpose(-2, -1))\n\n        relative_position_bias = self.relative_position_bias_table[self.relative_position_index.view(-1)].view(\n            self.window_size[0] * self.window_size[1], self.window_size[0] * self.window_size[1], -1)  # Wh*Ww,Wh*Ww,nH\n        relative_position_bias = relative_position_bias.permute(2, 0, 1).contiguous()  # nH, Wh*Ww, Wh*Ww\n        attn = attn + relative_position_bias.unsqueeze(0)\n\n        if mask is not None:\n            nW = mask.shape[0]\n            attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0)\n            attn = attn.view(-1, self.num_heads, N, N)\n            attn = self.softmax(attn)\n        else:\n            attn = 
self.softmax(attn)\n\n        attn = self.attn_drop(attn)\n\n        x = (attn @ v).transpose(1, 2).reshape(B_, N, C)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n        return x\n\n\nclass SwinTransformerBlock(nn.Module):\n    \"\"\" Swin Transformer Block.\n\n    Args:\n        dim (int): Number of input channels.\n        num_heads (int): Number of attention heads.\n        window_size (int): Window size.\n        shift_size (int): Shift size for SW-MSA.\n        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim.\n        qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True\n        qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.\n        drop (float, optional): Dropout rate. Default: 0.0\n        attn_drop (float, optional): Attention dropout rate. Default: 0.0\n        drop_path (float, optional): Stochastic depth rate. Default: 0.0\n        act_layer (nn.Module, optional): Activation layer. Default: nn.GELU\n        norm_layer (nn.Module, optional): Normalization layer.  Default: nn.LayerNorm\n    \"\"\"\n\n    def __init__(self, dim, num_heads, window_size=7, shift_size=0,\n                 mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., drop_path=0.,\n                 act_layer=nn.GELU, norm_layer=nn.LayerNorm):\n        super().__init__()\n        self.dim = dim\n        self.num_heads = num_heads\n        self.window_size = window_size\n        self.shift_size = shift_size\n        self.mlp_ratio = mlp_ratio\n        assert 0 <= self.shift_size < self.window_size, \"shift_size must be in [0, window_size)\"\n\n        self.norm1 = norm_layer(dim)\n        self.attn = WindowAttention(\n            dim, window_size=to_2tuple(self.window_size), num_heads=num_heads,\n            qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop)\n\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity()\n        self.norm2 = norm_layer(dim)\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)\n\n        self.H = None\n        self.W = None\n\n    def forward(self, x, mask_matrix):\n        \"\"\" Forward function.\n\n        Args:\n            x: Input feature, tensor size (B, H*W, C).\n            H, W: Spatial resolution of the input feature.\n            mask_matrix: Attention mask for cyclic shift.\n        \"\"\"\n        B, L, C = x.shape\n        H, W = self.H, self.W\n        assert L == H * W, \"input feature has wrong size\"\n\n        shortcut = x\n        x = self.norm1(x)\n        x = x.view(B, H, W, C)\n\n        # pad feature maps to multiples of window size\n        pad_l = pad_t = 0\n        pad_r = (self.window_size - W % self.window_size) % self.window_size\n        pad_b = (self.window_size - H % self.window_size) % self.window_size\n        x = F.pad(x, (0, 0, pad_l, pad_r, pad_t, pad_b))\n        _, Hp, Wp, _ = x.shape\n\n        # cyclic shift\n        if self.shift_size > 0:\n            shifted_x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2))\n            attn_mask = mask_matrix.type(x.dtype)\n        else:\n            shifted_x = x\n            attn_mask = None\n\n        # partition windows\n        x_windows = window_partition(shifted_x, self.window_size)  # nW*B, window_size, window_size, C\n        x_windows = x_windows.view(-1, self.window_size * self.window_size, C)  # nW*B, window_size*window_size, C\n\n        # W-MSA/SW-MSA\n        attn_windows = self.attn(x_windows, mask=attn_mask)  # nW*B, window_size*window_size, C\n\n        # merge windows\n        attn_windows = attn_windows.view(-1, self.window_size, self.window_size, C)\n        shifted_x = window_reverse(attn_windows, self.window_size, Hp, Wp)  # B H' W' C\n\n        # reverse cyclic shift\n        if self.shift_size > 0:\n    
        x = torch.roll(shifted_x, shifts=(self.shift_size, self.shift_size), dims=(1, 2))\n        else:\n            x = shifted_x\n\n        if pad_r > 0 or pad_b > 0:\n            x = x[:, :H, :W, :].contiguous()\n\n        x = x.view(B, H * W, C)\n\n        # FFN\n        x = shortcut + self.drop_path(x)\n        x = x + self.drop_path(self.mlp(self.norm2(x)))\n\n        return x\n\n\nclass PatchMerging(nn.Module):\n    \"\"\" Patch Merging Layer\n\n    Args:\n        dim (int): Number of input channels.\n        norm_layer (nn.Module, optional): Normalization layer.  Default: nn.LayerNorm\n    \"\"\"\n    def __init__(self, dim, norm_layer=nn.LayerNorm):\n        super().__init__()\n        self.dim = dim\n        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)\n        self.norm = norm_layer(4 * dim)\n\n    def forward(self, x, H, W):\n        \"\"\" Forward function.\n\n        Args:\n            x: Input feature, tensor size (B, H*W, C).\n            H, W: Spatial resolution of the input feature.\n        \"\"\"\n        B, L, C = x.shape\n        assert L == H * W, \"input feature has wrong size\"\n\n        x = x.view(B, H, W, C)\n\n        # padding\n        pad_input = (H % 2 == 1) or (W % 2 == 1)\n        if pad_input:\n            x = F.pad(x, (0, 0, 0, W % 2, 0, H % 2))\n\n        x0 = x[:, 0::2, 0::2, :]  # B H/2 W/2 C\n        x1 = x[:, 1::2, 0::2, :]  # B H/2 W/2 C\n        x2 = x[:, 0::2, 1::2, :]  # B H/2 W/2 C\n        x3 = x[:, 1::2, 1::2, :]  # B H/2 W/2 C\n        x = torch.cat([x0, x1, x2, x3], -1)  # B H/2 W/2 4*C\n        x = x.view(B, -1, 4 * C)  # B H/2*W/2 4*C\n\n        x = self.norm(x)\n        x = self.reduction(x)\n\n        return x\n\n\nclass BasicLayer(nn.Module):\n    \"\"\" A basic Swin Transformer layer for one stage.\n\n    Args:\n        dim (int): Number of feature channels\n        depth (int): Depths of this stage.\n        num_heads (int): Number of attention head.\n        window_size (int): Local window size. 
Default: 7.\n        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4.\n        qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True\n        qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set.\n        drop (float, optional): Dropout rate. Default: 0.0\n        attn_drop (float, optional): Attention dropout rate. Default: 0.0\n        drop_path (float | tuple[float], optional): Stochastic depth rate. Default: 0.0\n        norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm\n        downsample (nn.Module | None, optional): Downsample layer at the end of the layer. Default: None\n        use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False.\n    \"\"\"\n\n    def __init__(self,\n                 dim,\n                 depth,\n                 num_heads,\n                 window_size=7,\n                 mlp_ratio=4.,\n                 qkv_bias=True,\n                 qk_scale=None,\n                 drop=0.,\n                 attn_drop=0.,\n                 drop_path=0.,\n                 norm_layer=nn.LayerNorm,\n                 downsample=None,\n                 use_checkpoint=False):\n        super().__init__()\n        self.window_size = window_size\n        self.shift_size = window_size // 2\n        self.depth = depth\n        self.use_checkpoint = use_checkpoint\n\n        # build blocks\n        self.blocks = nn.ModuleList([\n            SwinTransformerBlock(\n                dim=dim,\n                num_heads=num_heads,\n                window_size=window_size,\n                shift_size=0 if (i % 2 == 0) else window_size // 2,\n                mlp_ratio=mlp_ratio,\n                qkv_bias=qkv_bias,\n                qk_scale=qk_scale,\n                drop=drop,\n                attn_drop=attn_drop,\n                drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path,\n                
norm_layer=norm_layer)\n            for i in range(depth)])\n\n        # patch merging layer\n        if downsample is not None:\n            self.downsample = downsample(dim=dim, norm_layer=norm_layer)\n        else:\n            self.downsample = None\n\n    def forward(self, x, H, W):\n        \"\"\" Forward function.\n\n        Args:\n            x: Input feature, tensor size (B, H*W, C).\n            H, W: Spatial resolution of the input feature.\n        \"\"\"\n\n        # calculate attention mask for SW-MSA\n        Hp = int(np.ceil(H / self.window_size)) * self.window_size\n        Wp = int(np.ceil(W / self.window_size)) * self.window_size\n        img_mask = torch.zeros((1, Hp, Wp, 1), device=x.device)  # 1 Hp Wp 1\n        h_slices = (slice(0, -self.window_size),\n                    slice(-self.window_size, -self.shift_size),\n                    slice(-self.shift_size, None))\n        w_slices = (slice(0, -self.window_size),\n                    slice(-self.window_size, -self.shift_size),\n                    slice(-self.shift_size, None))\n        cnt = 0\n        for h in h_slices:\n            for w in w_slices:\n                img_mask[:, h, w, :] = cnt\n                cnt += 1\n\n        mask_windows = window_partition(img_mask, self.window_size)  # nW, window_size, window_size, 1\n        mask_windows = mask_windows.view(-1, self.window_size * self.window_size)\n        attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)\n        attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0))\n\n        for blk in self.blocks:\n            blk.H, blk.W = H, W\n            if self.use_checkpoint:\n                x = checkpoint.checkpoint(blk, x, attn_mask)\n            else:\n                x = blk(x, attn_mask)\n        if self.downsample is not None:\n            x_down = self.downsample(x, H, W)\n            Wh, Ww = (H + 1) // 2, (W + 1) // 2\n            return x, H, W, x_down, Wh, 
Ww\n        else:\n            return x, H, W, x, H, W\n\n\nclass PatchEmbed(nn.Module):\n    \"\"\" Image to Patch Embedding\n\n    Args:\n        patch_size (int): Patch token size. Default: 4.\n        in_chans (int): Number of input image channels. Default: 3.\n        embed_dim (int): Number of linear projection output channels. Default: 96.\n        norm_layer (nn.Module, optional): Normalization layer. Default: None\n    \"\"\"\n\n    def __init__(self, patch_size=4, in_chans=3, embed_dim=96, norm_layer=None):\n        super().__init__()\n        patch_size = to_2tuple(patch_size)\n        self.patch_size = patch_size\n\n        self.in_chans = in_chans\n        self.embed_dim = embed_dim\n\n        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)\n        if norm_layer is not None:\n            self.norm = norm_layer(embed_dim)\n        else:\n            self.norm = None\n\n    def forward(self, x):\n        \"\"\"Forward function.\"\"\"\n        # padding\n        _, _, H, W = x.size()\n        if W % self.patch_size[1] != 0:\n            x = F.pad(x, (0, self.patch_size[1] - W % self.patch_size[1]))\n        if H % self.patch_size[0] != 0:\n            x = F.pad(x, (0, 0, 0, self.patch_size[0] - H % self.patch_size[0]))\n\n        x = self.proj(x)  # B C Wh Ww\n        if self.norm is not None:\n            Wh, Ww = x.size(2), x.size(3)\n            x = x.flatten(2).transpose(1, 2)\n            x = self.norm(x)\n            x = x.transpose(1, 2).view(-1, self.embed_dim, Wh, Ww)\n\n        return x\n\nclass SwinTransformer(nn.Module):\n    \"\"\" Swin Transformer backbone.\n        A PyTorch impl of : `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows`  -\n          https://arxiv.org/pdf/2103.14030\n\n    Args:\n        pretrain_img_size (int): Input image size for training the pretrained model,\n            used in absolute position embedding. 
Default 224.\n        patch_size (int | tuple(int)): Patch size. Default: 4.\n        in_chans (int): Number of input image channels. Default: 3.\n        embed_dim (int): Number of linear projection output channels. Default: 96.\n        depths (tuple[int]): Depths of each Swin Transformer stage.\n        num_heads (tuple[int]): Number of attention head of each stage.\n        window_size (int): Window size. Default: 7.\n        mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4.\n        qkv_bias (bool): If True, add a learnable bias to query, key, value. Default: True\n        qk_scale (float): Override default qk scale of head_dim ** -0.5 if set.\n        drop_rate (float): Dropout rate.\n        attn_drop_rate (float): Attention dropout rate. Default: 0.\n        drop_path_rate (float): Stochastic depth rate. Default: 0.2.\n        norm_layer (nn.Module): Normalization layer. Default: nn.LayerNorm.\n        ape (bool): If True, add absolute position embedding to the patch embedding. Default: False.\n        patch_norm (bool): If True, add normalization after patch embedding. Default: True.\n        out_indices (Sequence[int]): Output from which stages.\n        frozen_stages (int): Stages to be frozen (stop grad and set eval mode).\n            -1 means not freezing any parameters.\n        use_checkpoint (bool): Whether to use checkpointing to save memory. 
Default: False.\n    \"\"\"\n\n    def __init__(self,\n                 pretrain_img_size=224,\n                 patch_size=4,\n                 in_chans=3,\n                 embed_dim=96,\n                 depths=[2, 2, 6, 2],\n                 num_heads=[3, 6, 12, 24],\n                 window_size=7,\n                 mlp_ratio=4.,\n                 qkv_bias=True,\n                 qk_scale=None,\n                 drop_rate=0.,\n                 attn_drop_rate=0.,\n                 drop_path_rate=0.2,\n                 norm_layer=nn.LayerNorm,\n                 ape=False,\n                 patch_norm=True,\n                 out_indices=(0, 1, 2, 3),\n                 frozen_stages=-1,\n                 use_checkpoint=False):\n        super().__init__()\n\n        self.pretrain_img_size = pretrain_img_size\n        self.num_layers = len(depths)\n        self.embed_dim = embed_dim\n        self.ape = ape\n        self.patch_norm = patch_norm\n        self.out_indices = out_indices\n        self.frozen_stages = frozen_stages\n\n        # split image into non-overlapping patches\n        self.patch_embed = PatchEmbed(\n            patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim,\n            norm_layer=norm_layer if self.patch_norm else None)\n\n        # absolute position embedding\n        if self.ape:\n            pretrain_img_size = to_2tuple(pretrain_img_size)\n            patch_size = to_2tuple(patch_size)\n            patches_resolution = [pretrain_img_size[0] // patch_size[0], pretrain_img_size[1] // patch_size[1]]\n\n            self.absolute_pos_embed = nn.Parameter(torch.zeros(1, embed_dim, patches_resolution[0], patches_resolution[1]))\n            trunc_normal_(self.absolute_pos_embed, std=.02)\n\n        self.pos_drop = nn.Dropout(p=drop_rate)\n\n        # stochastic depth\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]  # stochastic depth decay rule\n\n        # build layers\n        self.layers = 
nn.ModuleList()\n        for i_layer in range(self.num_layers):\n            layer = BasicLayer(\n                dim=int(embed_dim * 2 ** i_layer),\n                depth=depths[i_layer],\n                num_heads=num_heads[i_layer],\n                window_size=window_size,\n                mlp_ratio=mlp_ratio,\n                qkv_bias=qkv_bias,\n                qk_scale=qk_scale,\n                drop=drop_rate,\n                attn_drop=attn_drop_rate,\n                drop_path=dpr[sum(depths[:i_layer]):sum(depths[:i_layer + 1])],\n                norm_layer=norm_layer,\n                downsample=PatchMerging if (i_layer < self.num_layers - 1) else None,\n                use_checkpoint=use_checkpoint)\n            self.layers.append(layer)\n\n        num_features = [int(embed_dim * 2 ** i) for i in range(self.num_layers)]\n        self.num_features = num_features\n\n        # add a norm layer for each output\n        for i_layer in out_indices:\n            layer = norm_layer(num_features[i_layer])\n            layer_name = f'norm{i_layer}'\n            self.add_module(layer_name, layer)\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n\n    def forward(self, x):\n        \"\"\"Forward function.\"\"\"\n        x = self.patch_embed(x)\n\n        Wh, Ww = x.size(2), x.size(3)\n        if self.ape:\n            # interpolate the position embedding to the corresponding size\n            absolute_pos_embed = F.interpolate(self.absolute_pos_embed, size=(Wh, Ww), mode='bicubic')\n            x = (x + absolute_pos_embed).flatten(2).transpose(1, 2)  # B Wh*Ww C\n        else:\n            x = x.flatten(2).transpose(1, 2)\n        x = self.pos_drop(x)\n\n        outs = []\n        for i in range(self.num_layers):\n            layer = self.layers[i]\n            x_out, H, W, x, Wh, Ww = layer(x, Wh, Ww)\n\n            if i in self.out_indices:\n                norm_layer = getattr(self, f'norm{i}')\n                x_out = 
norm_layer(x_out)\n\n                out = x_out.view(-1, H, W, self.num_features[i]).permute(0, 3, 1, 2).contiguous()\n                outs.append(out)\n\n        return outs\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef SwinTransformer_Tiny(weights=''):\n    model = SwinTransformer(depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24])\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)['model']))\n    return model\n\nif __name__ == '__main__':\n    device = torch.device('cuda:0')\n    model = SwinTransformer().to(device)\n    model.half()\n    # model.load_state_dict(update_weight(model.state_dict(), torch.load('swin_tiny_patch4_window7_224_22k.pth')['model']))\n    inputs = torch.randn((1, 3, 640, 512)).to(device).half()\n    res = model(inputs)\n    for i in res:\n        print(i.size())\n    print(model.channel)"
  },
  {
    "path": "yolo-improve/yolov5-backbone/UniRepLKNet/unireplknet.py",
    "content": "# UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio, Video, Point Cloud, Time-Series and Image Recognition\n# Github source: https://github.com/AILab-CVC/UniRepLKNet\n# Licensed under The Apache License 2.0 License [see LICENSE for details]\n# Based on RepLKNet, ConvNeXt, timm, DINO and DeiT code bases\n# https://github.com/DingXiaoH/RepLKNet-pytorch\n# https://github.com/facebookresearch/ConvNeXt\n# https://github.com/rwightman/pytorch-image-models/tree/master/timm\n# https://github.com/facebookresearch/deit/\n# https://github.com/facebookresearch/dino\n# --------------------------------------------------------'\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom timm.layers import trunc_normal_, DropPath, to_2tuple\nfrom functools import partial\nimport torch.utils.checkpoint as checkpoint\nimport numpy as np\n\n__all__ = ['unireplknet_a', 'unireplknet_f', 'unireplknet_p', 'unireplknet_n', 'unireplknet_t', 'unireplknet_s', 'unireplknet_b', 'unireplknet_l', 'unireplknet_xl']\n\nclass GRNwithNHWC(nn.Module):\n    \"\"\" GRN (Global Response Normalization) layer\n    Originally proposed in ConvNeXt V2 (https://arxiv.org/abs/2301.00808)\n    This implementation is more efficient than the original (https://github.com/facebookresearch/ConvNeXt-V2)\n    We assume the inputs to this layer are (N, H, W, C)\n    \"\"\"\n    def __init__(self, dim, use_bias=True):\n        super().__init__()\n        self.use_bias = use_bias\n        self.gamma = nn.Parameter(torch.zeros(1, 1, 1, dim))\n        if self.use_bias:\n            self.beta = nn.Parameter(torch.zeros(1, 1, 1, dim))\n\n    def forward(self, x):\n        Gx = torch.norm(x, p=2, dim=(1, 2), keepdim=True)\n        Nx = Gx / (Gx.mean(dim=-1, keepdim=True) + 1e-6)\n        if self.use_bias:\n            return (self.gamma * Nx + 1) * x + self.beta\n        else:\n            return (self.gamma * Nx + 1) * x\n\n\nclass NCHWtoNHWC(nn.Module):\n    def 
__init__(self):\n        super().__init__()\n\n    def forward(self, x):\n        return x.permute(0, 2, 3, 1)\n\n\nclass NHWCtoNCHW(nn.Module):\n    def __init__(self):\n        super().__init__()\n\n    def forward(self, x):\n        return x.permute(0, 3, 1, 2)\n\n#================== This function decides which conv implementation (the native or iGEMM) to use\n#   Note that iGEMM large-kernel conv impl will be used if\n#       -   you attempt to do so (attempt_use_lk_impl=True), and\n#       -   it has been installed (follow https://github.com/AILab-CVC/UniRepLKNet), and\n#       -   the conv layer is depth-wise, stride = 1, non-dilated, kernel_size > 5, and padding == kernel_size // 2\n#   In this copy the iGEMM branch below is commented out, so the native nn.Conv2d is always returned.\ndef get_conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias,\n               attempt_use_lk_impl=True):\n    kernel_size = to_2tuple(kernel_size)\n    if padding is None:\n        padding = (kernel_size[0] // 2, kernel_size[1] // 2)\n    else:\n        padding = to_2tuple(padding)\n    need_large_impl = kernel_size[0] == kernel_size[1] and kernel_size[0] > 5 and padding == (kernel_size[0] // 2, kernel_size[1] // 2)\n\n    # if attempt_use_lk_impl and need_large_impl:\n    #     print('---------------- trying to import iGEMM implementation for large-kernel conv')\n    #     try:\n    #         from depthwise_conv2d_implicit_gemm import DepthWiseConv2dImplicitGEMM\n    #         print('---------------- found iGEMM implementation ')\n    #     except:\n    #         DepthWiseConv2dImplicitGEMM = None\n    #         print('---------------- found no iGEMM. use original conv. 
follow https://github.com/AILab-CVC/UniRepLKNet to install it.')\n    #     if DepthWiseConv2dImplicitGEMM is not None and need_large_impl and in_channels == out_channels \\\n    #             and out_channels == groups and stride == 1 and dilation == 1:\n    #         print(f'===== iGEMM Efficient Conv Impl, channels {in_channels}, kernel size {kernel_size} =====')\n    #         return DepthWiseConv2dImplicitGEMM(in_channels, kernel_size, bias=bias)\n    return nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size, stride=stride,\n                     padding=padding, dilation=dilation, groups=groups, bias=bias)\n\n\ndef get_bn(dim, use_sync_bn=False):\n    if use_sync_bn:\n        return nn.SyncBatchNorm(dim)\n    else:\n        return nn.BatchNorm2d(dim)\n\nclass SEBlock(nn.Module):\n    \"\"\"\n    Squeeze-and-Excitation Block proposed in SENet (https://arxiv.org/abs/1709.01507)\n    We assume the inputs to this layer are (N, C, H, W)\n    \"\"\"\n    def __init__(self, input_channels, internal_neurons):\n        super(SEBlock, self).__init__()\n        self.down = nn.Conv2d(in_channels=input_channels, out_channels=internal_neurons,\n                              kernel_size=1, stride=1, bias=True)\n        self.up = nn.Conv2d(in_channels=internal_neurons, out_channels=input_channels,\n                            kernel_size=1, stride=1, bias=True)\n        self.input_channels = input_channels\n        self.nonlinear = nn.ReLU(inplace=True)\n\n    def forward(self, inputs):\n        x = F.adaptive_avg_pool2d(inputs, output_size=(1, 1))\n        x = self.down(x)\n        x = self.nonlinear(x)\n        x = self.up(x)\n        x = F.sigmoid(x)\n        return inputs * x.view(-1, self.input_channels, 1, 1)\n\ndef fuse_bn(conv, bn):\n    conv_bias = 0 if conv.bias is None else conv.bias\n    std = (bn.running_var + bn.eps).sqrt()\n    return conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1), bn.bias + (conv_bias - bn.running_mean) 
* bn.weight / std\n\ndef convert_dilated_to_nondilated(kernel, dilate_rate):\n    identity_kernel = torch.ones((1, 1, 1, 1)).to(kernel.device)\n    if kernel.size(1) == 1:\n        #   This is a DW kernel\n        dilated = F.conv_transpose2d(kernel, identity_kernel, stride=dilate_rate)\n        return dilated\n    else:\n        #   This is a dense or group-wise (but not DW) kernel\n        slices = []\n        for i in range(kernel.size(1)):\n            dilated = F.conv_transpose2d(kernel[:,i:i+1,:,:], identity_kernel, stride=dilate_rate)\n            slices.append(dilated)\n        return torch.cat(slices, dim=1)\n\ndef merge_dilated_into_large_kernel(large_kernel, dilated_kernel, dilated_r):\n    large_k = large_kernel.size(2)\n    dilated_k = dilated_kernel.size(2)\n    equivalent_kernel_size = dilated_r * (dilated_k - 1) + 1\n    equivalent_kernel = convert_dilated_to_nondilated(dilated_kernel, dilated_r)\n    rows_to_pad = large_k // 2 - equivalent_kernel_size // 2\n    merged_kernel = large_kernel + F.pad(equivalent_kernel, [rows_to_pad] * 4)\n    return merged_kernel\n\n\nclass DilatedReparamBlock(nn.Module):\n    \"\"\"\n    Dilated Reparam Block proposed in UniRepLKNet (https://github.com/AILab-CVC/UniRepLKNet)\n    We assume the inputs to this block are (N, C, H, W)\n    \"\"\"\n    def __init__(self, channels, kernel_size, deploy, use_sync_bn=False, attempt_use_lk_impl=True):\n        super().__init__()\n        self.lk_origin = get_conv2d(channels, channels, kernel_size, stride=1,\n                                    padding=kernel_size//2, dilation=1, groups=channels, bias=deploy,\n                                    attempt_use_lk_impl=attempt_use_lk_impl)\n        self.attempt_use_lk_impl = attempt_use_lk_impl\n\n        #   Default settings. We did not tune them carefully. 
Different settings may work better.\n        if kernel_size == 17:\n            self.kernel_sizes = [5, 9, 3, 3, 3]\n            self.dilates = [1, 2, 4, 5, 7]\n        elif kernel_size == 15:\n            self.kernel_sizes = [5, 7, 3, 3, 3]\n            self.dilates = [1, 2, 3, 5, 7]\n        elif kernel_size == 13:\n            self.kernel_sizes = [5, 7, 3, 3, 3]\n            self.dilates = [1, 2, 3, 4, 5]\n        elif kernel_size == 11:\n            self.kernel_sizes = [5, 5, 3, 3, 3]\n            self.dilates = [1, 2, 3, 4, 5]\n        elif kernel_size == 9:\n            self.kernel_sizes = [5, 5, 3, 3]\n            self.dilates = [1, 2, 3, 4]\n        elif kernel_size == 7:\n            self.kernel_sizes = [5, 3, 3]\n            self.dilates = [1, 2, 3]\n        elif kernel_size == 5:\n            self.kernel_sizes = [3, 3]\n            self.dilates = [1, 2]\n        else:\n            raise ValueError('Dilated Reparam Block requires kernel_size >= 5')\n\n        if not deploy:\n            self.origin_bn = get_bn(channels, use_sync_bn)\n            for k, r in zip(self.kernel_sizes, self.dilates):\n                self.__setattr__('dil_conv_k{}_{}'.format(k, r),\n                                 nn.Conv2d(in_channels=channels, out_channels=channels, kernel_size=k, stride=1,\n                                           padding=(r * (k - 1) + 1) // 2, dilation=r, groups=channels,\n                                           bias=False))\n                self.__setattr__('dil_bn_k{}_{}'.format(k, r), get_bn(channels, use_sync_bn=use_sync_bn))\n\n    def forward(self, x):\n        if not hasattr(self, 'origin_bn'):      # deploy mode\n            return self.lk_origin(x)\n        out = self.origin_bn(self.lk_origin(x))\n        for k, r in zip(self.kernel_sizes, self.dilates):\n            conv = self.__getattr__('dil_conv_k{}_{}'.format(k, r))\n            bn = self.__getattr__('dil_bn_k{}_{}'.format(k, r))\n            out = out + bn(conv(x))\n        return 
out\n\n    def merge_dilated_branches(self):\n        if hasattr(self, 'origin_bn'):\n            origin_k, origin_b = fuse_bn(self.lk_origin, self.origin_bn)\n            for k, r in zip(self.kernel_sizes, self.dilates):\n                conv = self.__getattr__('dil_conv_k{}_{}'.format(k, r))\n                bn = self.__getattr__('dil_bn_k{}_{}'.format(k, r))\n                branch_k, branch_b = fuse_bn(conv, bn)\n                origin_k = merge_dilated_into_large_kernel(origin_k, branch_k, r)\n                origin_b += branch_b\n            merged_conv = get_conv2d(origin_k.size(0), origin_k.size(0), origin_k.size(2), stride=1,\n                                    padding=origin_k.size(2)//2, dilation=1, groups=origin_k.size(0), bias=True,\n                                    attempt_use_lk_impl=self.attempt_use_lk_impl)\n            merged_conv.weight.data = origin_k\n            merged_conv.bias.data = origin_b\n            self.lk_origin = merged_conv\n            self.__delattr__('origin_bn')\n            for k, r in zip(self.kernel_sizes, self.dilates):\n                self.__delattr__('dil_conv_k{}_{}'.format(k, r))\n                self.__delattr__('dil_bn_k{}_{}'.format(k, r))\n\n\nclass UniRepLKNetBlock(nn.Module):\n\n    def __init__(self,\n                 dim,\n                 kernel_size,\n                 drop_path=0.,\n                 layer_scale_init_value=1e-6,\n                 deploy=False,\n                 attempt_use_lk_impl=True,\n                 with_cp=False,\n                 use_sync_bn=False,\n                 ffn_factor=4):\n        super().__init__()\n        self.with_cp = with_cp\n        # if deploy:\n        #     print('------------------------------- Note: deploy mode')\n        # if self.with_cp:\n        #     print('****** note with_cp = True, reduce memory consumption but may slow down training ******')\n\n        self.need_contiguous = (not deploy) or kernel_size >= 7\n\n        if kernel_size == 0:\n            
self.dwconv = nn.Identity()\n            self.norm = nn.Identity()\n        elif deploy:\n            self.dwconv = get_conv2d(dim, dim, kernel_size=kernel_size, stride=1, padding=kernel_size // 2,\n                                     dilation=1, groups=dim, bias=True,\n                                     attempt_use_lk_impl=attempt_use_lk_impl)\n            self.norm = nn.Identity()\n        elif kernel_size >= 7:\n            self.dwconv = DilatedReparamBlock(dim, kernel_size, deploy=deploy,\n                                              use_sync_bn=use_sync_bn,\n                                              attempt_use_lk_impl=attempt_use_lk_impl)\n            self.norm = get_bn(dim, use_sync_bn=use_sync_bn)\n        elif kernel_size == 1:\n            self.dwconv = nn.Conv2d(dim, dim, kernel_size=kernel_size, stride=1, padding=kernel_size // 2,\n                                    dilation=1, groups=1, bias=deploy)\n            self.norm = get_bn(dim, use_sync_bn=use_sync_bn)\n        else:\n            assert kernel_size in [3, 5]\n            self.dwconv = nn.Conv2d(dim, dim, kernel_size=kernel_size, stride=1, padding=kernel_size // 2,\n                                    dilation=1, groups=dim, bias=deploy)\n            self.norm = get_bn(dim, use_sync_bn=use_sync_bn)\n\n        self.se = SEBlock(dim, dim // 4)\n\n        ffn_dim = int(ffn_factor * dim)\n        self.pwconv1 = nn.Sequential(\n            NCHWtoNHWC(),\n            nn.Linear(dim, ffn_dim))\n        self.act = nn.Sequential(\n            nn.GELU(),\n            GRNwithNHWC(ffn_dim, use_bias=not deploy))\n        if deploy:\n            self.pwconv2 = nn.Sequential(\n                nn.Linear(ffn_dim, dim),\n                NHWCtoNCHW())\n        else:\n            self.pwconv2 = nn.Sequential(\n                nn.Linear(ffn_dim, dim, bias=False),\n                NHWCtoNCHW(),\n                get_bn(dim, use_sync_bn=use_sync_bn))\n\n        self.gamma = nn.Parameter(layer_scale_init_value * 
torch.ones(dim),\n                                  requires_grad=True) if (not deploy) and layer_scale_init_value is not None \\\n                                                         and layer_scale_init_value > 0 else None\n        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()\n\n    def forward(self, inputs):\n\n        def _f(x):\n            if self.need_contiguous:\n                x = x.contiguous()\n            y = self.se(self.norm(self.dwconv(x)))\n            y = self.pwconv2(self.act(self.pwconv1(y)))\n            if self.gamma is not None:\n                y = self.gamma.view(1, -1, 1, 1) * y\n            return self.drop_path(y) + x\n\n        if self.with_cp and inputs.requires_grad:\n            return checkpoint.checkpoint(_f, inputs)\n        else:\n            return _f(inputs)\n\n    def reparameterize(self):\n        if hasattr(self.dwconv, 'merge_dilated_branches'):\n            self.dwconv.merge_dilated_branches()\n        if hasattr(self.norm, 'running_var') and hasattr(self.dwconv, 'lk_origin'):\n            std = (self.norm.running_var + self.norm.eps).sqrt()\n            self.dwconv.lk_origin.weight.data *= (self.norm.weight / std).view(-1, 1, 1, 1)\n            self.dwconv.lk_origin.bias.data = self.norm.bias + (self.dwconv.lk_origin.bias - self.norm.running_mean) * self.norm.weight / std\n            self.norm = nn.Identity()\n        if self.gamma is not None:\n            final_scale = self.gamma.data\n            self.gamma = None\n        else:\n            final_scale = 1\n        if self.act[1].use_bias and len(self.pwconv2) == 3:\n            grn_bias = self.act[1].beta.data\n            self.act[1].__delattr__('beta')\n            self.act[1].use_bias = False\n            linear = self.pwconv2[0]\n            grn_bias_projected_bias = (linear.weight.data @ grn_bias.view(-1, 1)).squeeze()\n            bn = self.pwconv2[2]\n            std = (bn.running_var + bn.eps).sqrt()\n            new_linear = 
nn.Linear(linear.in_features, linear.out_features, bias=True)\n            new_linear.weight.data = linear.weight * (bn.weight / std * final_scale).view(-1, 1)\n            linear_bias = 0 if linear.bias is None else linear.bias.data\n            linear_bias += grn_bias_projected_bias\n            new_linear.bias.data = (bn.bias + (linear_bias - bn.running_mean) * bn.weight / std) * final_scale\n            self.pwconv2 = nn.Sequential(new_linear, self.pwconv2[1])\n\n\n\ndefault_UniRepLKNet_A_F_P_kernel_sizes = ((3, 3),\n                                      (13, 13),\n                                      (13, 13, 13, 13, 13, 13),\n                                      (13, 13))\ndefault_UniRepLKNet_N_kernel_sizes = ((3, 3),\n                                      (13, 13),\n                                      (13, 13, 13, 13, 13, 13, 13, 13),\n                                      (13, 13))\ndefault_UniRepLKNet_T_kernel_sizes = ((3, 3, 3),\n                                      (13, 13, 13),\n                                      (13, 3, 13, 3, 13, 3, 13, 3, 13, 3, 13, 3, 13, 3, 13, 3, 13, 3),\n                                      (13, 13, 13))\ndefault_UniRepLKNet_S_B_L_XL_kernel_sizes = ((3, 3, 3),\n                                             (13, 13, 13),\n                                             (13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3, 13, 3, 3),\n                                             (13, 13, 13))\nUniRepLKNet_A_F_P_depths = (2, 2, 6, 2)\nUniRepLKNet_N_depths = (2, 2, 8, 2)\nUniRepLKNet_T_depths = (3, 3, 18, 3)\nUniRepLKNet_S_B_L_XL_depths = (3, 3, 27, 3)\n\ndefault_depths_to_kernel_sizes = {\n    UniRepLKNet_A_F_P_depths: default_UniRepLKNet_A_F_P_kernel_sizes,\n    UniRepLKNet_N_depths: default_UniRepLKNet_N_kernel_sizes,\n    UniRepLKNet_T_depths: default_UniRepLKNet_T_kernel_sizes,\n    UniRepLKNet_S_B_L_XL_depths: default_UniRepLKNet_S_B_L_XL_kernel_sizes\n}\n\nclass UniRepLKNet(nn.Module):\n    
r\"\"\" UniRepLKNet\n        A PyTorch impl of UniRepLKNet\n\n    Args:\n        in_chans (int): Number of input image channels. Default: 3\n        num_classes (int): Number of classes for classification head. Default: 1000\n        depths (tuple(int)): Number of blocks at each stage. Default: (3, 3, 27, 3)\n        dims (tuple(int)): Feature dimension at each stage. Default: (96, 192, 384, 768)\n        drop_path_rate (float): Stochastic depth rate. Default: 0.\n        layer_scale_init_value (float): Init value for Layer Scale. Default: 1e-6.\n        head_init_scale (float): Init scaling value for classifier weights and biases. Default: 1.\n        kernel_sizes (tuple(tuple(int))): Kernel size for each block. None means using the default settings. Default: None.\n        deploy (bool): deploy = True means using the inference structure. Default: False\n        with_cp (bool): with_cp = True means using torch.utils.checkpoint to save GPU memory. Default: False\n        init_cfg (dict): Weights to load; the easiest way to use UniRepLKNet with the OpenMMLab family. Default: None\n        attempt_use_lk_impl (bool): Try to load the efficient iGEMM large-kernel impl. Setting it to False disables the iGEMM impl. Default: True\n        use_sync_bn (bool): use_sync_bn = True means using sync BN. Use it if your batch size is small. 
Default: False\n    \"\"\"\n    def __init__(self,\n                 in_chans=3,\n                 num_classes=1000,\n                 depths=(3, 3, 27, 3),\n                 dims=(96, 192, 384, 768),\n                 drop_path_rate=0.,\n                 layer_scale_init_value=1e-6,\n                 head_init_scale=1.,\n                 kernel_sizes=None,\n                 deploy=False,\n                 with_cp=False,\n                 init_cfg=None,\n                 attempt_use_lk_impl=True,\n                 use_sync_bn=False,\n                 **kwargs\n                 ):\n        super().__init__()\n\n        depths = tuple(depths)\n        if kernel_sizes is None:\n            if depths in default_depths_to_kernel_sizes:\n                # print('=========== use default kernel size ')\n                kernel_sizes = default_depths_to_kernel_sizes[depths]\n            else:\n                raise ValueError('no default kernel size settings for the given depths, '\n                                 'please specify kernel sizes for each block, e.g., '\n                                 '((3, 3), (13, 13), (13, 13, 13, 13, 13, 13), (13, 13))')\n        # print(kernel_sizes)\n        for i in range(4):\n            assert len(kernel_sizes[i]) == depths[i], 'kernel sizes do not match the depths'\n\n        self.with_cp = with_cp\n\n        dp_rates = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]\n        # print('=========== drop path rates: ', dp_rates)\n\n        self.downsample_layers = nn.ModuleList()\n        self.downsample_layers.append(nn.Sequential(\n            nn.Conv2d(in_chans, dims[0] // 2, kernel_size=3, stride=2, padding=1),\n            LayerNorm(dims[0] // 2, eps=1e-6, data_format=\"channels_first\"),\n            nn.GELU(),\n            nn.Conv2d(dims[0] // 2, dims[0], kernel_size=3, stride=2, padding=1),\n            LayerNorm(dims[0], eps=1e-6, data_format=\"channels_first\")))\n\n        for i in range(3):\n            
self.downsample_layers.append(nn.Sequential(\n                nn.Conv2d(dims[i], dims[i + 1], kernel_size=3, stride=2, padding=1),\n                LayerNorm(dims[i + 1], eps=1e-6, data_format=\"channels_first\")))\n\n        self.stages = nn.ModuleList()\n\n        cur = 0\n        for i in range(4):\n            main_stage = nn.Sequential(\n                *[UniRepLKNetBlock(dim=dims[i], kernel_size=kernel_sizes[i][j], drop_path=dp_rates[cur + j],\n                                   layer_scale_init_value=layer_scale_init_value, deploy=deploy,\n                                   attempt_use_lk_impl=attempt_use_lk_impl,\n                                   with_cp=with_cp, use_sync_bn=use_sync_bn) for j in\n                  range(depths[i])])\n            self.stages.append(main_stage)\n            cur += depths[i]\n\n        self.output_mode = 'features'\n        norm_layer = partial(LayerNorm, eps=1e-6, data_format=\"channels_first\")\n        for i_layer in range(4):\n            layer = norm_layer(dims[i_layer])\n            layer_name = f'norm{i_layer}'\n            self.add_module(layer_name, layer)\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, (nn.Conv2d, nn.Linear)):\n            trunc_normal_(m.weight, std=.02)\n            if hasattr(m, 'bias') and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n\n    def forward(self, x):\n        if self.output_mode == 'logits':\n            for stage_idx in range(4):\n                x = self.downsample_layers[stage_idx](x)\n                x = self.stages[stage_idx](x)\n            x = self.norm(x.mean([-2, -1]))\n            x = self.head(x)\n            return x\n        elif self.output_mode == 'features':\n            outs = []\n            for stage_idx in range(4):\n                x = self.downsample_layers[stage_idx](x)\n                x = 
self.stages[stage_idx](x)\n                outs.append(self.__getattr__(f'norm{stage_idx}')(x))\n            return outs\n        else:\n            raise ValueError('Defined new output mode?')\n\n    def switch_to_deploy(self):\n        for m in self.modules():\n            if hasattr(m, 'reparameterize'):\n                m.reparameterize()\n\n\n\nclass LayerNorm(nn.Module):\n    r\"\"\" LayerNorm implementation used in ConvNeXt\n    LayerNorm that supports two data formats: channels_last (default) or channels_first.\n    The ordering of the dimensions in the inputs. channels_last corresponds to inputs with\n    shape (batch_size, height, width, channels) while channels_first corresponds to inputs\n    with shape (batch_size, channels, height, width).\n    \"\"\"\n\n    def __init__(self, normalized_shape, eps=1e-6, data_format=\"channels_last\", reshape_last_to_first=False):\n        super().__init__()\n        self.weight = nn.Parameter(torch.ones(normalized_shape))\n        self.bias = nn.Parameter(torch.zeros(normalized_shape))\n        self.eps = eps\n        self.data_format = data_format\n        if self.data_format not in [\"channels_last\", \"channels_first\"]:\n            raise NotImplementedError\n        self.normalized_shape = (normalized_shape,)\n        self.reshape_last_to_first = reshape_last_to_first\n\n    def forward(self, x):\n        if self.data_format == \"channels_last\":\n            return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)\n        elif self.data_format == \"channels_first\":\n            u = x.mean(1, keepdim=True)\n            s = (x - u).pow(2).mean(1, keepdim=True)\n            x = (x - u) / torch.sqrt(s + self.eps)\n            x = self.weight[:, None, None] * x + self.bias[:, None, None]\n            return x\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == 
np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef unireplknet_a(weights='', **kwargs):\n    model = UniRepLKNet(depths=UniRepLKNet_A_F_P_depths, dims=(40, 80, 160, 320), **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)))\n    return model\n\ndef unireplknet_f(weights='', **kwargs):\n    model = UniRepLKNet(depths=UniRepLKNet_A_F_P_depths, dims=(48, 96, 192, 384), **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)))\n    return model\n\ndef unireplknet_p(weights='', **kwargs):\n    model = UniRepLKNet(depths=UniRepLKNet_A_F_P_depths, dims=(64, 128, 256, 512), **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)))\n    return model\n\ndef unireplknet_n(weights='', **kwargs):\n    model = UniRepLKNet(depths=UniRepLKNet_N_depths, dims=(80, 160, 320, 640), **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)))\n    return model\n\ndef unireplknet_t(weights='', **kwargs):\n    model = UniRepLKNet(depths=UniRepLKNet_T_depths, dims=(80, 160, 320, 640), **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)))\n    return model\n\ndef unireplknet_s(weights='', **kwargs):\n    model = UniRepLKNet(depths=UniRepLKNet_S_B_L_XL_depths, dims=(96, 192, 384, 768), **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)))\n    return model\n\ndef unireplknet_b(weights='', **kwargs):\n    model = UniRepLKNet(depths=UniRepLKNet_S_B_L_XL_depths, dims=(128, 256, 512, 1024), **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)))\n    return 
model\n\ndef unireplknet_l(weights='', **kwargs):\n    model = UniRepLKNet(depths=UniRepLKNet_S_B_L_XL_depths, dims=(192, 384, 768, 1536), **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)))\n    return model\n\ndef unireplknet_xl(weights='', **kwargs):\n    model = UniRepLKNet(depths=UniRepLKNet_S_B_L_XL_depths, dims=(256, 512, 1024, 2048), **kwargs)\n    if weights:\n        model.load_state_dict(update_weight(model.state_dict(), torch.load(weights)))\n    return model\n\nif __name__ == '__main__':\n    inputs = torch.randn((1, 3, 640, 640))\n    model = unireplknet_a('unireplknet_a_in1k_224_acc77.03.pth')\n    res = model(inputs)[-1]\n    model.switch_to_deploy()\n    res_fuse = model(inputs)[-1]\n    print(torch.mean(res_fuse - res))"
  },
  {
    "path": "yolo-improve/yolov5-backbone/VanillaNet/VanillaNet.py",
    "content": "#Copyright (C) 2023. Huawei Technologies Co., Ltd. All rights reserved.\n\n#This program is free software; you can redistribute it and/or modify it under the terms of the MIT License.\n\n#This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the MIT License for more details.\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom timm.models.layers import weight_init, DropPath\nimport numpy as np\n\n__all__ = ['vanillanet_5', 'vanillanet_6', 'vanillanet_7', 'vanillanet_8', 'vanillanet_9', 'vanillanet_10', 'vanillanet_11', 'vanillanet_12', 'vanillanet_13', 'vanillanet_13_x1_5', 'vanillanet_13_x1_5_ada_pool']\n\nclass activation(nn.ReLU):\n    def __init__(self, dim, act_num=3, deploy=False):\n        super(activation, self).__init__()\n        self.deploy = deploy\n        self.weight = torch.nn.Parameter(torch.randn(dim, 1, act_num*2 + 1, act_num*2 + 1))\n        self.bias = None\n        self.bn = nn.BatchNorm2d(dim, eps=1e-6)\n        self.dim = dim\n        self.act_num = act_num\n        weight_init.trunc_normal_(self.weight, std=.02)\n\n    def forward(self, x):\n        if self.deploy:\n            return torch.nn.functional.conv2d(\n                super(activation, self).forward(x), \n                self.weight, self.bias, padding=(self.act_num*2 + 1)//2, groups=self.dim)\n        else:\n            return self.bn(torch.nn.functional.conv2d(\n                super(activation, self).forward(x),\n                self.weight, padding=self.act_num, groups=self.dim))\n\n    def _fuse_bn_tensor(self, weight, bn):\n        kernel = weight\n        running_mean = bn.running_mean\n        running_var = bn.running_var\n        gamma = bn.weight\n        beta = bn.bias\n        eps = bn.eps\n        std = (running_var + eps).sqrt()\n        t = (gamma / std).reshape(-1, 1, 1, 1)\n        return kernel 
* t, beta + (0 - running_mean) * gamma / std\n    \n    def switch_to_deploy(self):\n        if not self.deploy:\n            kernel, bias = self._fuse_bn_tensor(self.weight, self.bn)\n            self.weight.data = kernel\n            self.bias = torch.nn.Parameter(torch.zeros(self.dim))\n            self.bias.data = bias\n            self.__delattr__('bn')\n            self.deploy = True\n\n\nclass Block(nn.Module):\n    def __init__(self, dim, dim_out, act_num=3, stride=2, deploy=False, ada_pool=None):\n        super().__init__()\n        self.act_learn = 1\n        self.deploy = deploy\n        if self.deploy:\n            self.conv = nn.Conv2d(dim, dim_out, kernel_size=1)\n        else:\n            self.conv1 = nn.Sequential(\n                nn.Conv2d(dim, dim, kernel_size=1),\n                nn.BatchNorm2d(dim, eps=1e-6),\n            )\n            self.conv2 = nn.Sequential(\n                nn.Conv2d(dim, dim_out, kernel_size=1),\n                nn.BatchNorm2d(dim_out, eps=1e-6)\n            )\n\n        if not ada_pool:\n            self.pool = nn.Identity() if stride == 1 else nn.MaxPool2d(stride)\n        else:\n            self.pool = nn.Identity() if stride == 1 else nn.AdaptiveMaxPool2d((ada_pool, ada_pool))\n\n        self.act = activation(dim_out, act_num)\n \n    def forward(self, x):\n        if self.deploy:\n            x = self.conv(x)\n        else:\n            x = self.conv1(x)\n            x = torch.nn.functional.leaky_relu(x,self.act_learn)\n            x = self.conv2(x)\n\n        x = self.pool(x)\n        x = self.act(x)\n        return x\n\n    def _fuse_bn_tensor(self, conv, bn):\n        kernel = conv.weight\n        bias = conv.bias\n        running_mean = bn.running_mean\n        running_var = bn.running_var\n        gamma = bn.weight\n        beta = bn.bias\n        eps = bn.eps\n        std = (running_var + eps).sqrt()\n        t = (gamma / std).reshape(-1, 1, 1, 1)\n        return kernel * t, beta + (bias - running_mean) * 
gamma / std\n    \n    def switch_to_deploy(self):\n        if not self.deploy:\n            kernel, bias = self._fuse_bn_tensor(self.conv1[0], self.conv1[1])\n            self.conv1[0].weight.data = kernel\n            self.conv1[0].bias.data = bias\n            # kernel, bias = self.conv2[0].weight.data, self.conv2[0].bias.data\n            kernel, bias = self._fuse_bn_tensor(self.conv2[0], self.conv2[1])\n            self.conv = self.conv2[0]\n            self.conv.weight.data = torch.matmul(kernel.transpose(1,3), self.conv1[0].weight.data.squeeze(3).squeeze(2)).transpose(1,3)\n            self.conv.bias.data = bias + (self.conv1[0].bias.data.view(1,-1,1,1)*kernel).sum(3).sum(2).sum(1)\n            self.__delattr__('conv1')\n            self.__delattr__('conv2')\n            self.act.switch_to_deploy()\n            self.deploy = True\n    \n\nclass VanillaNet(nn.Module):\n    def __init__(self, in_chans=3, num_classes=1000, dims=[96, 192, 384, 768], \n                 drop_rate=0, act_num=3, strides=[2,2,2,1], deploy=False, ada_pool=None, **kwargs):\n        super().__init__()\n        self.deploy = deploy\n        if self.deploy:\n            self.stem = nn.Sequential(\n                nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4),\n                activation(dims[0], act_num)\n            )\n        else:\n            self.stem1 = nn.Sequential(\n                nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4),\n                nn.BatchNorm2d(dims[0], eps=1e-6),\n            )\n            self.stem2 = nn.Sequential(\n                nn.Conv2d(dims[0], dims[0], kernel_size=1, stride=1),\n                nn.BatchNorm2d(dims[0], eps=1e-6),\n                activation(dims[0], act_num)\n            )\n\n        self.act_learn = 1\n\n        self.stages = nn.ModuleList()\n        for i in range(len(strides)):\n            if not ada_pool:\n                stage = Block(dim=dims[i], dim_out=dims[i+1], act_num=act_num, stride=strides[i], 
deploy=deploy)\n            else:\n                stage = Block(dim=dims[i], dim_out=dims[i+1], act_num=act_num, stride=strides[i], deploy=deploy, ada_pool=ada_pool[i])\n            self.stages.append(stage)\n        self.depth = len(strides)\n\n        self.apply(self._init_weights)\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n\n    def _init_weights(self, m):\n        if isinstance(m, (nn.Conv2d, nn.Linear)):\n            weight_init.trunc_normal_(m.weight, std=.02)\n            nn.init.constant_(m.bias, 0)\n\n    def change_act(self, m):\n        for i in range(self.depth):\n            self.stages[i].act_learn = m\n        self.act_learn = m\n\n    def forward(self, x):\n        input_size = x.size(2)\n        scale = [4, 8, 16, 32]\n        features = [None, None, None, None]\n        if self.deploy:\n            x = self.stem(x)\n        else:\n            x = self.stem1(x)\n            x = torch.nn.functional.leaky_relu(x,self.act_learn)\n            x = self.stem2(x)\n        if input_size // x.size(2) in scale:\n            features[scale.index(input_size // x.size(2))] = x\n        for i in range(self.depth):\n            x = self.stages[i](x)\n            if input_size // x.size(2) in scale:\n                features[scale.index(input_size // x.size(2))] = x\n        return features\n\n    def _fuse_bn_tensor(self, conv, bn):\n        kernel = conv.weight\n        bias = conv.bias\n        running_mean = bn.running_mean\n        running_var = bn.running_var\n        gamma = bn.weight\n        beta = bn.bias\n        eps = bn.eps\n        std = (running_var + eps).sqrt()\n        t = (gamma / std).reshape(-1, 1, 1, 1)\n        return kernel * t, beta + (bias - running_mean) * gamma / std\n    \n    def switch_to_deploy(self):\n        if not self.deploy:\n            self.stem2[2].switch_to_deploy()\n            kernel, bias = self._fuse_bn_tensor(self.stem1[0], self.stem1[1])\n            
self.stem1[0].weight.data = kernel\n            self.stem1[0].bias.data = bias\n            kernel, bias = self._fuse_bn_tensor(self.stem2[0], self.stem2[1])\n            self.stem1[0].weight.data = torch.einsum('oi,icjk->ocjk', kernel.squeeze(3).squeeze(2), self.stem1[0].weight.data)\n            self.stem1[0].bias.data = bias + (self.stem1[0].bias.data.view(1,-1,1,1)*kernel).sum(3).sum(2).sum(1)\n            self.stem = torch.nn.Sequential(*[self.stem1[0], self.stem2[2]])\n            self.__delattr__('stem1')\n            self.__delattr__('stem2')\n\n            for i in range(self.depth):\n                self.stages[i].switch_to_deploy()\n\n            self.deploy = True\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... {idx}/{len(model_dict)} items')\n    return model_dict\n\ndef vanillanet_5(pretrained='',in_22k=False, **kwargs):\n    model = VanillaNet(dims=[128*4, 256*4, 512*4, 1024*4], strides=[2,2,2], **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), weights))\n    return model\n\ndef vanillanet_6(pretrained='',in_22k=False, **kwargs):\n    model = VanillaNet(dims=[128*4, 256*4, 512*4, 1024*4, 1024*4], strides=[2,2,2,1], **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), weights))\n    return model\n\ndef vanillanet_7(pretrained='',in_22k=False, **kwargs):\n    model = VanillaNet(dims=[128*4, 128*4, 256*4, 512*4, 1024*4, 1024*4], strides=[1,2,2,2,1], **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), 
weights))\n    return model\n\ndef vanillanet_8(pretrained='', in_22k=False, **kwargs):\n    model = VanillaNet(dims=[128*4, 128*4, 256*4, 512*4, 512*4, 1024*4, 1024*4], strides=[1,2,2,1,2,1], **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), weights))\n    return model\n\ndef vanillanet_9(pretrained='', in_22k=False, **kwargs):\n    model = VanillaNet(dims=[128*4, 128*4, 256*4, 512*4, 512*4, 512*4, 1024*4, 1024*4], strides=[1,2,2,1,1,2,1], **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), weights))\n    return model\n\ndef vanillanet_10(pretrained='', in_22k=False, **kwargs):\n    model = VanillaNet(\n        dims=[128*4, 128*4, 256*4, 512*4, 512*4, 512*4, 512*4, 1024*4, 1024*4],\n        strides=[1,2,2,1,1,1,2,1],\n        **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), weights))\n    return model\n\ndef vanillanet_11(pretrained='', in_22k=False, **kwargs):\n    model = VanillaNet(\n        dims=[128*4, 128*4, 256*4, 512*4, 512*4, 512*4, 512*4, 512*4, 1024*4, 1024*4],\n        strides=[1,2,2,1,1,1,1,2,1],\n        **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), weights))\n    return model\n\ndef vanillanet_12(pretrained='', in_22k=False, **kwargs):\n    model = VanillaNet(\n        dims=[128*4, 128*4, 256*4, 512*4, 512*4, 512*4, 512*4, 512*4, 512*4, 1024*4, 1024*4],\n        strides=[1,2,2,1,1,1,1,1,2,1],\n        **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), weights))\n    return model\n\ndef vanillanet_13(pretrained='', in_22k=False, **kwargs):\n    model = 
VanillaNet(\n        dims=[128*4, 128*4, 256*4, 512*4, 512*4, 512*4, 512*4, 512*4, 512*4, 512*4, 1024*4, 1024*4],\n        strides=[1,2,2,1,1,1,1,1,1,2,1],\n        **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), weights))\n    return model\n\ndef vanillanet_13_x1_5(pretrained='', in_22k=False, **kwargs):\n    model = VanillaNet(\n        dims=[128*6, 128*6, 256*6, 512*6, 512*6, 512*6, 512*6, 512*6, 512*6, 512*6, 1024*6, 1024*6],\n        strides=[1,2,2,1,1,1,1,1,1,2,1],\n        **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), weights))\n    return model\n\ndef vanillanet_13_x1_5_ada_pool(pretrained='', in_22k=False, **kwargs):\n    model = VanillaNet(\n        dims=[128*6, 128*6, 256*6, 512*6, 512*6, 512*6, 512*6, 512*6, 512*6, 512*6, 1024*6, 1024*6],\n        strides=[1,2,2,1,1,1,1,1,1,2,1],\n        ada_pool=[0,40,20,0,0,0,0,0,0,10,0],\n        **kwargs)\n    if pretrained:\n        weights = torch.load(pretrained)['model_ema']\n        model.load_state_dict(update_weight(model.state_dict(), weights))\n    return model\n\nif __name__ == '__main__':\n    inputs = torch.randn((1, 3, 640, 640))\n    model = vanillanet_10()\n    # weights = torch.load('vanillanet_5.pth')['model_ema']\n    # model.load_state_dict(update_weight(model.state_dict(), weights))\n    pred = model(inputs)\n    for i in pred:\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/fasternet/faster_cfg/fasternet_l.yaml",
    "content": "mlp_ratio: 2\nembed_dim: 192\ndepths: [3, 4, 18, 3]\nfeature_dim: 1280\npatch_size: 4\npatch_stride: 4\npatch_size2: 2\npatch_stride2: 2\nlayer_scale_init_value: 0 # no layer scale\ndrop_path_rate: 0.3\nnorm_layer:  BN\nact_layer: RELU\nn_div: 4"
  },
  {
    "path": "yolo-improve/yolov5-backbone/fasternet/faster_cfg/fasternet_m.yaml",
    "content": "mlp_ratio: 2\nembed_dim: 144\ndepths: [3, 4, 18, 3]\nfeature_dim: 1280\npatch_size: 4\npatch_stride: 4\npatch_size2: 2\npatch_stride2: 2\nlayer_scale_init_value: 0 # no layer scale\ndrop_path_rate: 0.2\nnorm_layer:  BN\nact_layer: RELU\nn_div: 4"
  },
  {
    "path": "yolo-improve/yolov5-backbone/fasternet/faster_cfg/fasternet_s.yaml",
    "content": "mlp_ratio: 2\nembed_dim: 128\ndepths: [1, 2, 13, 2]\nfeature_dim: 1280\npatch_size: 4\npatch_stride: 4\npatch_size2: 2\npatch_stride2: 2\nlayer_scale_init_value: 0 # no layer scale\ndrop_path_rate: 0.1\nnorm_layer:  BN\nact_layer: RELU\nn_div: 4"
  },
  {
    "path": "yolo-improve/yolov5-backbone/fasternet/faster_cfg/fasternet_t0.yaml",
    "content": "mlp_ratio: 2\nembed_dim: 40\ndepths: [1, 2, 8, 2]\nfeature_dim: 1280\npatch_size: 4\npatch_stride: 4\npatch_size2: 2\npatch_stride2: 2\nlayer_scale_init_value: 0 # no layer scale\ndrop_path_rate: 0.\nnorm_layer:  BN\nact_layer: GELU\nn_div: 4\n"
  },
  {
    "path": "yolo-improve/yolov5-backbone/fasternet/faster_cfg/fasternet_t1.yaml",
    "content": "mlp_ratio: 2\nembed_dim: 64\ndepths: [1, 2, 8, 2]\nfeature_dim: 1280\npatch_size: 4\npatch_stride: 4\npatch_size2: 2\npatch_stride2: 2\nlayer_scale_init_value: 0 # no layer scale\ndrop_path_rate: 0.02\nnorm_layer:  BN\nact_layer: GELU\nn_div: 4"
  },
  {
    "path": "yolo-improve/yolov5-backbone/fasternet/faster_cfg/fasternet_t2.yaml",
    "content": "mlp_ratio: 2\nembed_dim: 96\ndepths: [1, 2, 8, 2]\nfeature_dim: 1280\npatch_size: 4\npatch_stride: 4\npatch_size2: 2\npatch_stride2: 2\nlayer_scale_init_value: 0 # no layer scale\ndrop_path_rate: 0.05\nnorm_layer:  BN\nact_layer: RELU\nn_div: 4"
  },
  {
    "path": "yolo-improve/yolov5-backbone/fasternet/fasternet.py",
    "content": "# Copyright (c) Microsoft Corporation.\n# Licensed under the MIT License.\nimport torch, yaml\nimport torch.nn as nn\nfrom timm.models.layers import DropPath, to_2tuple, trunc_normal_\nfrom functools import partial\nfrom typing import List\nfrom torch import Tensor\nimport copy\nimport os\nimport numpy as np\n\n__all__ = ['fasternet_t0', 'fasternet_t1', 'fasternet_t2', 'fasternet_s', 'fasternet_m', 'fasternet_l']\n\nclass Partial_conv3(nn.Module):\n\n    def __init__(self, dim, n_div, forward):\n        super().__init__()\n        self.dim_conv3 = dim // n_div\n        self.dim_untouched = dim - self.dim_conv3\n        self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, 3, 1, 1, bias=False)\n\n        if forward == 'slicing':\n            self.forward = self.forward_slicing\n        elif forward == 'split_cat':\n            self.forward = self.forward_split_cat\n        else:\n            raise NotImplementedError\n\n    def forward_slicing(self, x: Tensor) -> Tensor:\n        # only for inference\n        x = x.clone()   # !!! Keep the original input intact for the residual connection later\n        x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])\n\n        return x\n\n    def forward_split_cat(self, x: Tensor) -> Tensor:\n        # for training/inference\n        x1, x2 = torch.split(x, [self.dim_conv3, self.dim_untouched], dim=1)\n        x1 = self.partial_conv3(x1)\n        x = torch.cat((x1, x2), 1)\n\n        return x\n\n\nclass MLPBlock(nn.Module):\n\n    def __init__(self,\n                 dim,\n                 n_div,\n                 mlp_ratio,\n                 drop_path,\n                 layer_scale_init_value,\n                 act_layer,\n                 norm_layer,\n                 pconv_fw_type\n                 ):\n\n        super().__init__()\n        self.dim = dim\n        self.mlp_ratio = mlp_ratio\n        self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity()\n        self.n_div = n_div\n\n        mlp_hidden_dim = int(dim * mlp_ratio)\n\n        mlp_layer: List[nn.Module] = [\n            nn.Conv2d(dim, mlp_hidden_dim, 1, bias=False),\n            norm_layer(mlp_hidden_dim),\n            act_layer(),\n            nn.Conv2d(mlp_hidden_dim, dim, 1, bias=False)\n        ]\n\n        self.mlp = nn.Sequential(*mlp_layer)\n\n        self.spatial_mixing = Partial_conv3(\n            dim,\n            n_div,\n            pconv_fw_type\n        )\n\n        if layer_scale_init_value > 0:\n            self.layer_scale = nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True)\n            self.forward = self.forward_layer_scale\n        else:\n            self.forward = self.forward\n\n    def forward(self, x: Tensor) -> Tensor:\n        shortcut = x\n        x = self.spatial_mixing(x)\n        x = shortcut + self.drop_path(self.mlp(x))\n        return x\n\n    def forward_layer_scale(self, x: Tensor) -> Tensor:\n        shortcut = x\n        x = self.spatial_mixing(x)\n        x = shortcut + self.drop_path(\n            self.layer_scale.unsqueeze(-1).unsqueeze(-1) * self.mlp(x))\n        return x\n\n\nclass BasicStage(nn.Module):\n\n    def __init__(self,\n                 dim,\n                 depth,\n                 n_div,\n                 mlp_ratio,\n                 drop_path,\n                 layer_scale_init_value,\n                 norm_layer,\n                 act_layer,\n                 pconv_fw_type\n                 ):\n\n        super().__init__()\n\n        blocks_list = [\n            MLPBlock(\n                dim=dim,\n                n_div=n_div,\n                mlp_ratio=mlp_ratio,\n                drop_path=drop_path[i],\n                layer_scale_init_value=layer_scale_init_value,\n                norm_layer=norm_layer,\n                act_layer=act_layer,\n                pconv_fw_type=pconv_fw_type\n            )\n            for i in range(depth)\n        
]\n\n        self.blocks = nn.Sequential(*blocks_list)\n\n    def forward(self, x: Tensor) -> Tensor:\n        x = self.blocks(x)\n        return x\n\n\nclass PatchEmbed(nn.Module):\n\n    def __init__(self, patch_size, patch_stride, in_chans, embed_dim, norm_layer):\n        super().__init__()\n        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_stride, bias=False)\n        if norm_layer is not None:\n            self.norm = norm_layer(embed_dim)\n        else:\n            self.norm = nn.Identity()\n\n    def forward(self, x: Tensor) -> Tensor:\n        x = self.norm(self.proj(x))\n        return x\n\n\nclass PatchMerging(nn.Module):\n\n    def __init__(self, patch_size2, patch_stride2, dim, norm_layer):\n        super().__init__()\n        self.reduction = nn.Conv2d(dim, 2 * dim, kernel_size=patch_size2, stride=patch_stride2, bias=False)\n        if norm_layer is not None:\n            self.norm = norm_layer(2 * dim)\n        else:\n            self.norm = nn.Identity()\n\n    def forward(self, x: Tensor) -> Tensor:\n        x = self.norm(self.reduction(x))\n        return x\n\n\nclass FasterNet(nn.Module):\n    def __init__(self,\n                 in_chans=3,\n                 num_classes=1000,\n                 embed_dim=96,\n                 depths=(1, 2, 8, 2),\n                 mlp_ratio=2.,\n                 n_div=4,\n                 patch_size=4,\n                 patch_stride=4,\n                 patch_size2=2,  # for subsequent layers\n                 patch_stride2=2,\n                 patch_norm=True,\n                 feature_dim=1280,\n                 drop_path_rate=0.1,\n                 layer_scale_init_value=0,\n                 norm_layer='BN',\n                 act_layer='RELU',\n                 init_cfg=None,\n                 pretrained=None,\n                 pconv_fw_type='split_cat',\n                 **kwargs):\n        super().__init__()\n\n        if norm_layer == 'BN':\n            norm_layer = 
nn.BatchNorm2d\n        else:\n            raise NotImplementedError\n\n        if act_layer == 'GELU':\n            act_layer = nn.GELU\n        elif act_layer == 'RELU':\n            act_layer = partial(nn.ReLU, inplace=True)\n        else:\n            raise NotImplementedError\n\n        self.num_stages = len(depths)\n        self.embed_dim = embed_dim\n        self.patch_norm = patch_norm\n        self.num_features = int(embed_dim * 2 ** (self.num_stages - 1))\n        self.mlp_ratio = mlp_ratio\n        self.depths = depths\n\n        # split image into non-overlapping patches\n        self.patch_embed = PatchEmbed(\n            patch_size=patch_size,\n            patch_stride=patch_stride,\n            in_chans=in_chans,\n            embed_dim=embed_dim,\n            norm_layer=norm_layer if self.patch_norm else None\n        )\n\n        # stochastic depth decay rule\n        dpr = [x.item()\n               for x in torch.linspace(0, drop_path_rate, sum(depths))]\n\n        # build layers\n        stages_list = []\n        for i_stage in range(self.num_stages):\n            stage = BasicStage(dim=int(embed_dim * 2 ** i_stage),\n                               n_div=n_div,\n                               depth=depths[i_stage],\n                               mlp_ratio=self.mlp_ratio,\n                               drop_path=dpr[sum(depths[:i_stage]):sum(depths[:i_stage + 1])],\n                               layer_scale_init_value=layer_scale_init_value,\n                               norm_layer=norm_layer,\n                               act_layer=act_layer,\n                               pconv_fw_type=pconv_fw_type\n                               )\n            stages_list.append(stage)\n\n            # patch merging layer\n            if i_stage < self.num_stages - 1:\n                stages_list.append(\n                    PatchMerging(patch_size2=patch_size2,\n                                 patch_stride2=patch_stride2,\n                             
    dim=int(embed_dim * 2 ** i_stage),\n                                 norm_layer=norm_layer)\n                )\n\n        self.stages = nn.Sequential(*stages_list)\n\n        # add a norm layer for each output\n        self.out_indices = [0, 2, 4, 6]\n        for i_emb, i_layer in enumerate(self.out_indices):\n            if i_emb == 0 and os.environ.get('FORK_LAST3', None):\n                raise NotImplementedError\n            else:\n                layer = norm_layer(int(embed_dim * 2 ** i_emb))\n            layer_name = f'norm{i_layer}'\n            self.add_module(layer_name, layer)\n        \n        # run one dummy forward pass to record the channel count of each output\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n\n    def forward(self, x: Tensor) -> List[Tensor]:\n        # output the features of four stages for dense prediction\n        x = self.patch_embed(x)\n        outs = []\n        for idx, stage in enumerate(self.stages):\n            x = stage(x)\n            if idx in self.out_indices:\n                norm_layer = getattr(self, f'norm{idx}')\n                x_out = norm_layer(x)\n                outs.append(x_out)\n        return outs\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... 
{idx}/{len(model_dict)} items')\n    return model_dict\n\ndef fasternet_t0(weights=None, cfg='models/faster_cfg/fasternet_t0.yaml'):\n    with open(cfg) as f:\n        cfg = yaml.load(f, Loader=yaml.SafeLoader)\n    model = FasterNet(**cfg)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef fasternet_t1(weights=None, cfg='models/faster_cfg/fasternet_t1.yaml'):\n    with open(cfg) as f:\n        cfg = yaml.load(f, Loader=yaml.SafeLoader)\n    model = FasterNet(**cfg)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef fasternet_t2(weights=None, cfg='models/faster_cfg/fasternet_t2.yaml'):\n    with open(cfg) as f:\n        cfg = yaml.load(f, Loader=yaml.SafeLoader)\n    model = FasterNet(**cfg)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef fasternet_s(weights=None, cfg='models/faster_cfg/fasternet_s.yaml'):\n    with open(cfg) as f:\n        cfg = yaml.load(f, Loader=yaml.SafeLoader)\n    model = FasterNet(**cfg)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef fasternet_m(weights=None, cfg='models/faster_cfg/fasternet_m.yaml'):\n    with open(cfg) as f:\n        cfg = yaml.load(f, Loader=yaml.SafeLoader)\n    model = FasterNet(**cfg)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\ndef fasternet_l(weights=None, 
cfg='models/faster_cfg/fasternet_l.yaml'):\n    with open(cfg) as f:\n        cfg = yaml.load(f, Loader=yaml.SafeLoader)\n    model = FasterNet(**cfg)\n    if weights is not None:\n        pretrain_weight = torch.load(weights, map_location='cpu')\n        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))\n    return model\n\nif __name__ == '__main__':\n    # yaml is already imported at module level\n    model = fasternet_t0(weights='fasternet_t0-epoch.281-val_acc1.71.9180.pth', cfg='cfg/fasternet_t0.yaml')\n    print(model.channel)\n    inputs = torch.randn((1, 3, 640, 640))\n    for i in model(inputs):\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/inceptionnext/inceptionnext.py",
    "content": "\"\"\"\nInceptionNeXt implementation, paper: https://arxiv.org/abs/2303.16900\nSome code is borrowed from timm: https://github.com/huggingface/pytorch-image-models\n\"\"\"\n\nfrom functools import partial\n\nimport torch\nimport torch.nn as nn\nimport numpy as np\n\nfrom timm.data import IMAGENET_DEFAULT_MEAN, IMAGENET_DEFAULT_STD\nfrom timm.models import checkpoint_seq, to_2tuple\nfrom timm.models.layers import trunc_normal_, DropPath\nfrom timm.models.registry import register_model\n\n__all__ = ['inceptionnext_tiny', 'inceptionnext_small', 'inceptionnext_base', 'inceptionnext_base_384']\n\nclass InceptionDWConv2d(nn.Module):\n    \"\"\" Inception depthwise convolution\n    \"\"\"\n    def __init__(self, in_channels, square_kernel_size=3, band_kernel_size=11, branch_ratio=0.125):\n        super().__init__()\n        \n        gc = int(in_channels * branch_ratio) # number of channels per convolution branch\n        self.dwconv_hw = nn.Conv2d(gc, gc, square_kernel_size, padding=square_kernel_size//2, groups=gc)\n        self.dwconv_w = nn.Conv2d(gc, gc, kernel_size=(1, band_kernel_size), padding=(0, band_kernel_size//2), groups=gc)\n        self.dwconv_h = nn.Conv2d(gc, gc, kernel_size=(band_kernel_size, 1), padding=(band_kernel_size//2, 0), groups=gc)\n        self.split_indexes = (in_channels - 3 * gc, gc, gc, gc)\n        \n    def forward(self, x):\n        x_id, x_hw, x_w, x_h = torch.split(x, self.split_indexes, dim=1)\n        return torch.cat(\n            (x_id, self.dwconv_hw(x_hw), self.dwconv_w(x_w), self.dwconv_h(x_h)), \n            dim=1,\n        )\n\n\nclass ConvMlp(nn.Module):\n    \"\"\" MLP using 1x1 convs that keeps spatial dims\n    copied from timm: https://github.com/huggingface/pytorch-image-models/blob/v0.6.11/timm/models/layers/mlp.py\n    \"\"\"\n    def __init__(\n            self, in_features, hidden_features=None, out_features=None, act_layer=nn.ReLU,\n            norm_layer=None, bias=True, drop=0.):\n        
super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        bias = to_2tuple(bias)\n\n        self.fc1 = nn.Conv2d(in_features, hidden_features, kernel_size=1, bias=bias[0])\n        self.norm = norm_layer(hidden_features) if norm_layer else nn.Identity()\n        self.act = act_layer()\n        self.drop = nn.Dropout(drop)\n        self.fc2 = nn.Conv2d(hidden_features, out_features, kernel_size=1, bias=bias[1])\n\n    def forward(self, x):\n        x = self.fc1(x)\n        x = self.norm(x)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        return x\n\n\nclass MlpHead(nn.Module):\n    \"\"\" MLP classification head\n    \"\"\"\n    def __init__(self, dim, num_classes=1000, mlp_ratio=3, act_layer=nn.GELU,\n        norm_layer=partial(nn.LayerNorm, eps=1e-6), drop=0., bias=True):\n        super().__init__()\n        hidden_features = int(mlp_ratio * dim)\n        self.fc1 = nn.Linear(dim, hidden_features, bias=bias)\n        self.act = act_layer()\n        self.norm = norm_layer(hidden_features)\n        self.fc2 = nn.Linear(hidden_features, num_classes, bias=bias)\n        self.drop = nn.Dropout(drop)\n\n    def forward(self, x):\n        x = x.mean((2, 3)) # global average pooling\n        x = self.fc1(x)\n        x = self.act(x)\n        x = self.norm(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        return x\n\n\nclass MetaNeXtBlock(nn.Module):\n    \"\"\" MetaNeXtBlock Block\n    Args:\n        dim (int): Number of input channels.\n        drop_path (float): Stochastic depth rate. Default: 0.0\n        ls_init_value (float): Init value for Layer Scale. 
Default: 1e-6.\n    \"\"\"\n\n    def __init__(\n            self,\n            dim,\n            token_mixer=InceptionDWConv2d,\n            norm_layer=nn.BatchNorm2d,\n            mlp_layer=ConvMlp,\n            mlp_ratio=4,\n            act_layer=nn.GELU,\n            ls_init_value=1e-6,\n            drop_path=0.,\n            \n    ):\n        super().__init__()\n        self.token_mixer = token_mixer(dim)\n        self.norm = norm_layer(dim)\n        self.mlp = mlp_layer(dim, int(mlp_ratio * dim), act_layer=act_layer)\n        self.gamma = nn.Parameter(ls_init_value * torch.ones(dim)) if ls_init_value else None\n        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()\n\n    def forward(self, x):\n        shortcut = x\n        x = self.token_mixer(x)\n        x = self.norm(x)\n        x = self.mlp(x)\n        if self.gamma is not None:\n            x = x.mul(self.gamma.reshape(1, -1, 1, 1))\n        x = self.drop_path(x) + shortcut\n        return x\n\n\nclass MetaNeXtStage(nn.Module):\n    def __init__(\n            self,\n            in_chs,\n            out_chs,\n            ds_stride=2,\n            depth=2,\n            drop_path_rates=None,\n            ls_init_value=1.0,\n            act_layer=nn.GELU,\n            norm_layer=None,\n            mlp_ratio=4,\n    ):\n        super().__init__()\n        self.grad_checkpointing = False\n        if ds_stride > 1:\n            self.downsample = nn.Sequential(\n                norm_layer(in_chs),\n                nn.Conv2d(in_chs, out_chs, kernel_size=ds_stride, stride=ds_stride),\n            )\n        else:\n            self.downsample = nn.Identity()\n\n        drop_path_rates = drop_path_rates or [0.] 
* depth\n        stage_blocks = []\n        for i in range(depth):\n            stage_blocks.append(MetaNeXtBlock(\n                dim=out_chs,\n                drop_path=drop_path_rates[i],\n                ls_init_value=ls_init_value,\n                act_layer=act_layer,\n                norm_layer=norm_layer,\n                mlp_ratio=mlp_ratio,\n            ))\n            in_chs = out_chs\n        self.blocks = nn.Sequential(*stage_blocks)\n\n    def forward(self, x):\n        x = self.downsample(x)\n        if self.grad_checkpointing and not torch.jit.is_scripting():\n            x = checkpoint_seq(self.blocks, x)\n        else:\n            x = self.blocks(x)\n        return x\n\n\nclass MetaNeXt(nn.Module):\n    r\"\"\" MetaNeXt\n        A PyTorch impl of: `InceptionNeXt: When Inception Meets ConvNeXt`  - https://arxiv.org/pdf/2203.xxxxx.pdf\n    Args:\n        in_chans (int): Number of input image channels. Default: 3\n        num_classes (int): Number of classes for classification head. Default: 1000\n        depths (tuple(int)): Number of blocks at each stage. Default: (3, 3, 9, 3)\n        dims (tuple(int)): Feature dimension at each stage. Default: (96, 192, 384, 768)\n        token_mixers: Token mixer function. Default: nn.Identity\n        norm_layer: Normalization layer. Default: nn.BatchNorm2d\n        act_layer: Activation function for MLP. Default: nn.GELU\n        mlp_ratios (int or tuple(int)): MLP ratios. Default: (4, 4, 4, 3)\n        head_fn: classifier head\n        drop_rate (float): Head dropout rate\n        drop_path_rate (float): Stochastic depth rate. Default: 0.\n        ls_init_value (float): Init value for Layer Scale. 
Default: 1e-6.\n    \"\"\"\n\n    def __init__(\n            self,\n            in_chans=3,\n            num_classes=1000,\n            depths=(3, 3, 9, 3),\n            dims=(96, 192, 384, 768),\n            token_mixers=nn.Identity,\n            norm_layer=nn.BatchNorm2d,\n            act_layer=nn.GELU,\n            mlp_ratios=(4, 4, 4, 3),\n            head_fn=MlpHead,\n            drop_rate=0.,\n            drop_path_rate=0.,\n            ls_init_value=1e-6,\n            **kwargs,\n    ):\n        super().__init__()\n\n        num_stage = len(depths)\n        if not isinstance(token_mixers, (list, tuple)):\n            token_mixers = [token_mixers] * num_stage\n        if not isinstance(mlp_ratios, (list, tuple)):\n            mlp_ratios = [mlp_ratios] * num_stage\n\n\n        self.num_classes = num_classes\n        self.drop_rate = drop_rate\n        self.stem = nn.Sequential(\n            nn.Conv2d(in_chans, dims[0], kernel_size=4, stride=4),\n            norm_layer(dims[0])\n        )\n\n        self.stages = nn.Sequential()\n        dp_rates = [x.tolist() for x in torch.linspace(0, drop_path_rate, sum(depths)).split(depths)]\n        stages = []\n        prev_chs = dims[0]\n        # feature resolution stages, each consisting of multiple residual blocks\n        for i in range(num_stage):\n            out_chs = dims[i]\n            stages.append(MetaNeXtStage(\n                prev_chs,\n                out_chs,\n                ds_stride=2 if i > 0 else 1, \n                depth=depths[i],\n                drop_path_rates=dp_rates[i],\n                ls_init_value=ls_init_value,\n                act_layer=act_layer,\n                norm_layer=norm_layer,\n                mlp_ratio=mlp_ratios[i],\n            ))\n            prev_chs = out_chs\n        self.stages = nn.Sequential(*stages)\n        self.num_features = prev_chs\n        self.apply(self._init_weights)\n        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]\n\n 
   @torch.jit.ignore\n    def set_grad_checkpointing(self, enable=True):\n        for s in self.stages:\n            s.grad_checkpointing = enable\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return {'norm'}\n    \n    def forward(self, x):\n        input_size = x.size(2)\n        scale = [4, 8, 16, 32]\n        features = [None, None, None, None]\n        x = self.stem(x)\n        features[scale.index(input_size // x.size(2))] = x\n        for idx, layer in enumerate(self.stages):\n            x = layer(x)\n            if input_size // x.size(2) in scale:\n                features[scale.index(input_size // x.size(2))] = x\n        return features\n\n    def _init_weights(self, m):\n        if isinstance(m, (nn.Conv2d, nn.Linear)):\n            trunc_normal_(m.weight, std=.02)\n            if m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n\ndef _cfg(url='', **kwargs):\n    return {\n        'url': url,\n        'num_classes': 1000, 'input_size': (3, 224, 224), 'pool_size': (7, 7),\n        'crop_pct': 0.875, 'interpolation': 'bicubic',\n        'mean': IMAGENET_DEFAULT_MEAN, 'std': IMAGENET_DEFAULT_STD,\n        'first_conv': 'stem.0', 'classifier': 'head.fc',\n        **kwargs\n    }\n\ndef update_weight(model_dict, weight_dict):\n    idx, temp_dict = 0, {}\n    for k, v in weight_dict.items():\n        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):\n            temp_dict[k] = v\n            idx += 1\n    model_dict.update(temp_dict)\n    print(f'loading weights... 
{idx}/{len(model_dict)} items')\n    return model_dict\n\ndefault_cfgs = dict(\n    inceptionnext_tiny=_cfg(\n        url='https://github.com/sail-sg/inceptionnext/releases/download/model/inceptionnext_tiny.pth',\n    ),\n    inceptionnext_small=_cfg(\n        url='https://github.com/sail-sg/inceptionnext/releases/download/model/inceptionnext_small.pth',\n    ),\n    inceptionnext_base=_cfg(\n        url='https://github.com/sail-sg/inceptionnext/releases/download/model/inceptionnext_base.pth',\n    ),\n    inceptionnext_base_384=_cfg(\n        url='https://github.com/sail-sg/inceptionnext/releases/download/model/inceptionnext_base_384.pth',\n        input_size=(3, 384, 384), crop_pct=1.0,\n    ),\n)\n\ndef inceptionnext_tiny(pretrained=False, **kwargs):\n    model = MetaNeXt(depths=(3, 3, 9, 3), dims=(96, 192, 384, 768), \n                      token_mixers=InceptionDWConv2d,\n                      **kwargs\n    )\n    model.default_cfg = default_cfgs['inceptionnext_tiny']\n    if pretrained:\n        state_dict = torch.hub.load_state_dict_from_url(url=model.default_cfg['url'], map_location=\"cpu\", check_hash=True)\n        # this backbone builds no classifier head, so load only tensors whose name and shape match\n        model.load_state_dict(update_weight(model.state_dict(), state_dict))\n    return model\n\ndef inceptionnext_small(pretrained=False, **kwargs):\n    model = MetaNeXt(depths=(3, 3, 27, 3), dims=(96, 192, 384, 768), \n                      token_mixers=InceptionDWConv2d,\n                      **kwargs\n    )\n    model.default_cfg = default_cfgs['inceptionnext_small']\n    if pretrained:\n        state_dict = torch.hub.load_state_dict_from_url(url=model.default_cfg['url'], map_location=\"cpu\", check_hash=True)\n        # this backbone builds no classifier head, so load only tensors whose name and shape match\n        model.load_state_dict(update_weight(model.state_dict(), state_dict))\n    return model\n\ndef inceptionnext_base(pretrained=False, **kwargs):\n    model = MetaNeXt(depths=(3, 3, 27, 3), dims=(128, 256, 512, 1024), \n                      token_mixers=InceptionDWConv2d,\n                      **kwargs\n    )\n    model.default_cfg = default_cfgs['inceptionnext_base']\n    if pretrained:\n        
state_dict = torch.hub.load_state_dict_from_url(url=model.default_cfg['url'], map_location=\"cpu\", check_hash=True)\n        # this backbone builds no classifier head, so load only tensors whose name and shape match\n        model.load_state_dict(update_weight(model.state_dict(), state_dict))\n    return model\n\ndef inceptionnext_base_384(pretrained=False, **kwargs):\n    model = MetaNeXt(depths=[3, 3, 27, 3], dims=[128, 256, 512, 1024], \n                      mlp_ratios=[4, 4, 4, 3],\n                      token_mixers=InceptionDWConv2d,\n                      **kwargs\n    )\n    model.default_cfg = default_cfgs['inceptionnext_base_384']\n    if pretrained:\n        state_dict = torch.hub.load_state_dict_from_url(url=model.default_cfg['url'], map_location=\"cpu\", check_hash=True)\n        # this backbone builds no classifier head, so load only tensors whose name and shape match\n        model.load_state_dict(update_weight(model.state_dict(), state_dict))\n    return model\n\nif __name__ == '__main__':\n    model = inceptionnext_tiny(pretrained=False)\n    inputs = torch.randn((1, 3, 640, 640))\n    for i in model(inputs):\n        print(i.size())"
  },
  {
    "path": "yolo-improve/yolov5-backbone/main.py",
    "content": "import torch, timm\nfrom thop import clever_format, profile\n\n# print(timm.list_models())\n\ndevice = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\ndummy_input = torch.randn(1, 3, 640, 640).to(device)\n\n# model = timm.create_model('edgenext_small', pretrained=False, features_only=True)\nmodel = timm.create_model('vovnet39a', pretrained=False, features_only=True)\nmodel.to(device)\nmodel.eval()\n\nprint(model.feature_info.channels())\nfor feature in model(dummy_input):\n    print(feature.size())\n\nflops, params = profile(model.to(device), (dummy_input,), verbose=False)\nflops, params = clever_format([flops * 2, params], \"%.3f\")\nprint('Total FLOPS: %s' % (flops))\nprint('Total params: %s' % (params))"
  },
  {
    "path": "yolo-improve/yolov5-backbone/yolo.py",
    "content": "def parse_model(d, ch):  # model_dict, input_channels(3)\n    # Parse a YOLOv5 model.yaml dictionary\n    LOGGER.info(f\"\\n{'':>3}{'from':>18}{'n':>3}{'params':>10}  {'module':<40}{'arguments':<30}\")\n    anchors, nc, gd, gw, act = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple'], d.get('activation')\n    if act:\n        Conv.default_act = eval(act)  # redefine default activation, i.e. Conv.default_act = nn.SiLU()\n        LOGGER.info(f\"{colorstr('activation:')} {act}\")  # print\n    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors\n    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)\n\n    is_backbone = False\n    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out\n    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args\n        try:\n            t = m\n            m = eval(m) if isinstance(m, str) else m  # eval strings\n        except:\n            pass\n        for j, a in enumerate(args):\n            with contextlib.suppress(NameError):\n                try:\n                    args[j] = eval(a) if isinstance(a, str) else a  # eval strings\n                except:\n                    args[j] = a\n\n        n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain\n        if m in {\n                Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF, DWConv, MixConv2d, Focus, CrossConv,\n                BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, nn.ConvTranspose2d, DWConvTranspose2d, C3x}:\n            c1, c2 = ch[f], args[0]\n            if c2 != no:  # if not output\n                c2 = make_divisible(c2 * gw, 8)\n\n            args = [c1, c2, *args[1:]]\n            if m in {BottleneckCSP, C3, C3TR, C3Ghost, C3x}:\n                args.insert(2, n)  # number of repeats\n                n = 1\n        elif m is nn.BatchNorm2d:\n            args = [ch[f]]\n        elif m is Concat:\n            c2 = 
sum(ch[x] for x in f)\n        # TODO: channel, gw, gd\n        elif m in {Detect, Segment}:\n            args.append([ch[x] for x in f])\n            if isinstance(args[1], int):  # number of anchors\n                args[1] = [list(range(args[1] * 2))] * len(f)\n            if m is Segment:\n                args[3] = make_divisible(args[3] * gw, 8)\n        elif m is Contract:\n            c2 = ch[f] * args[0] ** 2\n        elif m is Expand:\n            c2 = ch[f] // args[0] ** 2\n        elif isinstance(m, str):\n            t = m\n            m = timm.create_model(m, pretrained=args[0], features_only=True)\n            c2 = m.feature_info.channels()\n        # elif m in {}:\n        #     m = m(*args)\n        #     c2 = m.channel\n        else:\n            c2 = ch[f]\n        if isinstance(c2, list):\n            is_backbone = True\n            m_ = m\n            m_.backbone = True\n        else:\n            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module\n            t = str(m)[8:-2].replace('__main__.', '')  # module type\n        np = sum(x.numel() for x in m_.parameters())  # number params\n        m_.i, m_.f, m_.type, m_.np = i + 4 if is_backbone else i, f, t, np  # attach index, 'from' index, type, number params\n        LOGGER.info(f'{i:>3}{str(f):>18}{n_:>3}{np:10.0f}  {t:<40}{str(args):<30}')  # print\n        save.extend(x % (i + 4 if is_backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist\n        layers.append(m_)\n        if i == 0:\n            ch = []\n        if isinstance(c2, list):\n            ch.extend(c2)\n            for _ in range(5 - len(ch)):\n                ch.insert(0, 0)\n        else:\n            ch.append(c2)\n    return nn.Sequential(*layers), sorted(save)\n\ndef _forward_once(self, x, profile=False, visualize=False):\n    y, dt = [], []  # outputs\n    for m in self.model:\n        if m.f != -1:  # if not from previous layer\n            x = 
y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers\n        if profile:\n            self._profile_one_layer(m, x, dt)\n        if hasattr(m, 'backbone'):\n            x = m(x)\n            for _ in range(5 - len(x)):\n                x.insert(0, None)\n            for i_idx, i in enumerate(x):\n                if i_idx in self.save:\n                    y.append(i)\n                else:\n                    y.append(None)\n            x = x[-1]\n        else:\n            x = m(x)  # run\n            y.append(x if m.i in self.save else None)  # save output\n        if visualize:\n            feature_visualization(x, m.type, m.i, save_dir=visualize)\n    return x"
  },
  {
    "path": "yolo-improve/yolov5-backbone/yolov5-custom.yaml",
    "content": "# YOLOv5 🚀 by Ultralytics, GPL-3.0 license\n\n# Parameters\nnc: 80  # number of classes\ndepth_multiple: 0.33  # model depth multiple\nwidth_multiple: 0.25  # layer channel multiple\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# 0-P1/2\n# 1-P2/4\n# 2-P3/8\n# 3-P4/16\n# 4-P5/32\n\n# YOLOv5 v6.0 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, vovnet39a, [False]], # 4\n   [-1, 1, SPPF, [1024, 5]],  # 5\n  ]\n\n# YOLOv5 v6.0 head\nhead:\n  [[-1, 1, Conv, [512, 1, 1]], # 6\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 7\n   [[-1, 3], 1, Concat, [1]],  # cat backbone P4 8\n   [-1, 3, C3, [512, False]],  # 9\n\n   [-1, 1, Conv, [256, 1, 1]], # 10\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 11\n   [[-1, 2], 1, Concat, [1]],  # cat backbone P3 12\n   [-1, 3, C3, [256, False]],  # 13 (P3/8-small)\n\n   [-1, 1, Conv, [256, 3, 2]], # 14\n   [[-1, 10], 1, Concat, [1]],  # cat head P4 15\n   [-1, 3, C3, [512, False]],  # 16 (P4/16-medium)\n\n   [-1, 1, Conv, [512, 3, 2]], # 17\n   [[-1, 5], 1, Concat, [1]],  # cat head P5 18\n   [-1, 3, C3, [1024, False]],  # 19 (P5/32-large)\n\n   [[13, 16, 19], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n  ]"
  },
  {
    "path": "yolo-improve/yolov5-dyhead.py",
    "content": "# Copyright (c) OpenMMLab. All rights reserved.\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom mmcv.cnn import build_activation_layer, build_norm_layer\nfrom mmcv.ops.modulated_deform_conv import ModulatedDeformConv2d\nfrom mmengine.model import constant_init, normal_init\n\ndef _make_divisible(v, divisor, min_value=None):\n    if min_value is None:\n        min_value = divisor\n    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)\n    # Make sure that round down does not go down by more than 10%.\n    if new_v < 0.9 * v:\n        new_v += divisor\n    return new_v\n\n\nclass swish(nn.Module):\n    def forward(self, x):\n        return x * torch.sigmoid(x)\n\n\nclass h_swish(nn.Module):\n    def __init__(self, inplace=False):\n        super(h_swish, self).__init__()\n        self.inplace = inplace\n\n    def forward(self, x):\n        return x * F.relu6(x + 3.0, inplace=self.inplace) / 6.0\n\n\nclass h_sigmoid(nn.Module):\n    def __init__(self, inplace=True, h_max=1):\n        super(h_sigmoid, self).__init__()\n        self.relu = nn.ReLU6(inplace=inplace)\n        self.h_max = h_max\n\n    def forward(self, x):\n        return self.relu(x + 3) * self.h_max / 6\n\n\nclass DyReLU(nn.Module):\n    def __init__(self, inp, reduction=4, lambda_a=1.0, K2=True, use_bias=True, use_spatial=False,\n                 init_a=[1.0, 0.0], init_b=[0.0, 0.0]):\n        super(DyReLU, self).__init__()\n        self.oup = inp\n        self.lambda_a = lambda_a * 2\n        self.K2 = K2\n        self.avg_pool = nn.AdaptiveAvgPool2d(1)\n\n        self.use_bias = use_bias\n        if K2:\n            self.exp = 4 if use_bias else 2\n        else:\n            self.exp = 2 if use_bias else 1\n        self.init_a = init_a\n        self.init_b = init_b\n\n        # determine squeeze\n        if reduction == 4:\n            squeeze = inp // reduction\n        else:\n            squeeze = _make_divisible(inp // reduction, 4)\n      
  # print('reduction: {}, squeeze: {}/{}'.format(reduction, inp, squeeze))\n        # print('init_a: {}, init_b: {}'.format(self.init_a, self.init_b))\n\n        self.fc = nn.Sequential(\n            nn.Linear(inp, squeeze),\n            nn.ReLU(inplace=True),\n            nn.Linear(squeeze, self.oup * self.exp),\n            h_sigmoid()\n        )\n        if use_spatial:\n            self.spa = nn.Sequential(\n                nn.Conv2d(inp, 1, kernel_size=1),\n                nn.BatchNorm2d(1),\n            )\n        else:\n            self.spa = None\n\n    def forward(self, x):\n        if isinstance(x, list):\n            x_in = x[0]\n            x_out = x[1]\n        else:\n            x_in = x\n            x_out = x\n        b, c, h, w = x_in.size()\n        y = self.avg_pool(x_in).view(b, c)\n        y = self.fc(y).view(b, self.oup * self.exp, 1, 1)\n        if self.exp == 4:\n            a1, b1, a2, b2 = torch.split(y, self.oup, dim=1)\n            a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0\n            a2 = (a2 - 0.5) * self.lambda_a + self.init_a[1]\n\n            b1 = b1 - 0.5 + self.init_b[0]\n            b2 = b2 - 0.5 + self.init_b[1]\n            out = torch.max(x_out * a1 + b1, x_out * a2 + b2)\n        elif self.exp == 2:\n            if self.use_bias:  # bias but not PL\n                a1, b1 = torch.split(y, self.oup, dim=1)\n                a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0\n                b1 = b1 - 0.5 + self.init_b[0]\n                out = x_out * a1 + b1\n\n            else:\n                a1, a2 = torch.split(y, self.oup, dim=1)\n                a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0\n                a2 = (a2 - 0.5) * self.lambda_a + self.init_a[1]\n                out = torch.max(x_out * a1, x_out * a2)\n\n        elif self.exp == 1:\n            a1 = y\n            a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0\n            out = x_out * a1\n\n        if self.spa:\n         
   ys = self.spa(x_in).view(b, -1)\n            ys = F.softmax(ys, dim=1).view(b, 1, h, w) * h * w\n            ys = F.hardtanh(ys, 0, 3, inplace=True)/3\n            out = out * ys\n\n        return out\n\nclass DyDCNv2(nn.Module):\n    \"\"\"ModulatedDeformConv2d with normalization layer used in DyHead.\n    This module cannot be configured with `conv_cfg=dict(type='DCNv2')`\n    because DyHead calculates offset and mask from middle-level feature.\n    Args:\n        in_channels (int): Number of input channels.\n        out_channels (int): Number of output channels.\n        stride (int | tuple[int], optional): Stride of the convolution.\n            Default: 1.\n        norm_cfg (dict, optional): Config dict for normalization layer.\n            Default: dict(type='GN', num_groups=16, requires_grad=True).\n    \"\"\"\n\n    def __init__(self,\n                 in_channels,\n                 out_channels,\n                 stride=1,\n                 norm_cfg=dict(type='GN', num_groups=16, requires_grad=True)):\n        super().__init__()\n        self.with_norm = norm_cfg is not None\n        bias = not self.with_norm\n        self.conv = ModulatedDeformConv2d(\n            in_channels, out_channels, 3, stride=stride, padding=1, bias=bias)\n        if self.with_norm:\n            self.norm = build_norm_layer(norm_cfg, out_channels)[1]\n\n    def forward(self, x, offset, mask):\n        \"\"\"Forward function.\"\"\"\n        x = self.conv(x.contiguous(), offset, mask)\n        if self.with_norm:\n            x = self.norm(x)\n        return x\n\n\nclass DyHeadBlock(nn.Module):\n    \"\"\"DyHead Block with three types of attention.\n    HSigmoid arguments in default act_cfg follow official code, not paper.\n    https://github.com/microsoft/DynamicHead/blob/master/dyhead/dyrelu.py\n    \"\"\"\n\n    def __init__(self,\n                 in_channels,\n                 norm_type='GN',\n                 zero_init_offset=True,\n                 
act_cfg=dict(type='HSigmoid', bias=3.0, divisor=6.0)):\n        super().__init__()\n        self.zero_init_offset = zero_init_offset\n        # (offset_x, offset_y, mask) * kernel_size_y * kernel_size_x\n        self.offset_and_mask_dim = 3 * 3 * 3\n        self.offset_dim = 2 * 3 * 3\n\n        if norm_type == 'GN':\n            norm_dict = dict(type='GN', num_groups=16, requires_grad=True)\n        elif norm_type == 'BN':\n            norm_dict = dict(type='BN', requires_grad=True)\n        \n        self.spatial_conv_high = DyDCNv2(in_channels, in_channels, norm_cfg=norm_dict)\n        self.spatial_conv_mid = DyDCNv2(in_channels, in_channels)\n        self.spatial_conv_low = DyDCNv2(in_channels, in_channels, stride=2)\n        self.spatial_conv_offset = nn.Conv2d(\n            in_channels, self.offset_and_mask_dim, 3, padding=1)\n        self.scale_attn_module = nn.Sequential(\n            nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_channels, 1, 1),\n            nn.ReLU(inplace=True), build_activation_layer(act_cfg))\n        self.task_attn_module = DyReLU(in_channels)\n        self._init_weights()\n\n    def _init_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                normal_init(m, 0, 0.01)\n        if self.zero_init_offset:\n            constant_init(self.spatial_conv_offset, 0)\n\n    def forward(self, x):\n        \"\"\"Forward function.\"\"\"\n        outs = []\n        for level in range(len(x)):\n            # calculate offset and mask of DCNv2 from middle-level feature\n            offset_and_mask = self.spatial_conv_offset(x[level])\n            offset = offset_and_mask[:, :self.offset_dim, :, :]\n            mask = offset_and_mask[:, self.offset_dim:, :, :].sigmoid()\n\n            mid_feat = self.spatial_conv_mid(x[level], offset, mask)\n            sum_feat = mid_feat * self.scale_attn_module(mid_feat)\n            summed_levels = 1\n            if level > 0:\n                low_feat = 
self.spatial_conv_low(x[level - 1], offset, mask)\n                sum_feat += low_feat * self.scale_attn_module(low_feat)\n                summed_levels += 1\n            if level < len(x) - 1:\n                # this upsample order is weird, but faster than natural order\n                # https://github.com/microsoft/DynamicHead/issues/25\n                high_feat = F.interpolate(\n                    self.spatial_conv_high(x[level + 1], offset, mask),\n                    size=x[level].shape[-2:],\n                    mode='bilinear',\n                    align_corners=True)\n                sum_feat += high_feat * self.scale_attn_module(high_feat)\n                summed_levels += 1\n            outs.append(self.task_attn_module(sum_feat / summed_levels))\n\n        return outs\n\n# Integration snippets, kept as comments because they cannot run at module level.\n#\n# Model YAML head entries:\n#   [17, 1, Conv, [128, 1, 1]],\n#   [20, 1, Conv, [128, 1, 1]],\n#   [23, 1, Conv, [128, 1, 1]],\n#   [[24, 25, 26], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)\n#\n# Module-side snippet (expects `self`, `ch`, and `x` from the enclosing class):\n#   self.dyhead = nn.Sequential(*[DyHeadBlock(ch[0]) for i in range(2)])\n#   for dyhead_layer in self.dyhead:\n#       x = dyhead_layer(x)"
  },
  {
    "path": "yolo-improve/yolov5-res2block.py",
    "content": "import math\n\nimport torch\nimport torch.nn as nn\n\n# Conv and C3 come from YOLOv5's models/common.py; paste this snippet there or import them.\n\nclass Bottle2neck(nn.Module):\n    expansion = 1\n\n    def __init__(self, inplanes, planes, shortcut, baseWidth=26, scale=4):\n        \"\"\" Constructor\n        Args:\n            inplanes: input channel dimensionality\n            planes: output channel dimensionality\n            shortcut: whether to add the residual connection\n            baseWidth: basic width of conv3x3\n            scale: number of scale.\n        \"\"\"\n        super(Bottle2neck, self).__init__()\n\n        width = int(math.floor(planes * (baseWidth / 64.0)))\n        self.conv1 = Conv(inplanes, width * scale, k=1)\n\n        if scale == 1:\n            self.nums = 1\n        else:\n            self.nums = scale - 1\n        convs = []\n        for i in range(self.nums):\n            convs.append(Conv(width, width, k=3))\n        self.convs = nn.ModuleList(convs)\n\n        self.conv3 = Conv(width * scale, planes * self.expansion, k=1, act=False)\n\n        self.silu = nn.SiLU(inplace=True)\n        self.scale = scale\n        self.width = width\n        self.shortcut = shortcut\n\n    def forward(self, x):\n        if self.shortcut:\n            residual = x\n        out = self.conv1(x)\n        # split into `scale` groups of `width` channels each\n        spx = torch.split(out, self.width, 1)\n        for i in range(self.nums):\n            if i == 0:\n                sp = spx[i]\n            else:\n                sp = sp + spx[i]  # feed the previous group's output into the next group\n            sp = self.convs[i](sp)\n            if i == 0:\n                out = sp\n            else:\n                out = torch.cat((out, sp), 1)\n        if self.scale != 1:\n            out = torch.cat((out, spx[self.nums]), 1)  # last group passes through unchanged\n\n        out = self.conv3(out)\n        if self.shortcut:\n            out += residual\n        out = self.silu(out)\n        return out\n\nclass C3_Res2Block(C3):\n    # CSP Bottleneck with 3 convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__(c1, c2, n, shortcut, g, e)\n        c_ = int(c2 * e)  # hidden channels\n        self.m = 
nn.Sequential(*(Bottle2neck(c_, c_, shortcut) for _ in range(n)))"
  },
  {
    "path": "yolo-improve/yolov5-softnms.py",
    "content": "def box_iou_for_nms(box1, box2, GIoU=False, DIoU=False, CIoU=False, SIoU=False, EIou=False, eps=1e-7):\n    # Returns Intersection over Union (IoU) of box1(1,4) to box2(n,4)\n\n    b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)\n    b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)\n    w1, h1 = b1_x2 - b1_x1, (b1_y2 - b1_y1).clamp(eps)\n    w2, h2 = b2_x2 - b2_x1, (b2_y2 - b2_y1).clamp(eps)\n\n    # Intersection area\n    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp(0) * \\\n            (b1_y2.minimum(b2_y2) - b1_y1.maximum(b2_y1)).clamp(0)\n\n    # Union Area\n    union = w1 * h1 + w2 * h2 - inter + eps\n\n    # IoU\n    iou = inter / union\n    if CIoU or DIoU or GIoU or EIou:\n        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width\n        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height\n        if CIoU or DIoU or EIou:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1\n            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared\n            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center dist ** 2\n            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47\n                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)\n                with torch.no_grad():\n                    alpha = v / (v - iou + (1 + eps))\n                return iou - (rho2 / c2 + v * alpha)  # CIoU\n            elif EIou:\n                rho_w2 = ((b2_x2 - b2_x1) - (b1_x2 - b1_x1)) ** 2\n                rho_h2 = ((b2_y2 - b2_y1) - (b1_y2 - b1_y1)) ** 2\n                cw2 = cw ** 2 + eps\n                ch2 = ch ** 2 + eps\n                return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2)\n            return iou - rho2 / c2  # DIoU\n        c_area = cw * ch + eps  # convex area\n        return iou - (c_area - union) / c_area  # GIoU 
https://arxiv.org/pdf/1902.09630.pdf\n    elif SIoU:\n        # SIoU Loss https://arxiv.org/pdf/2205.12740.pdf\n        # the convex box is only computed in the branch above, so compute it here as well\n        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width\n        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height\n        s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5 + eps\n        s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5 + eps\n        sigma = torch.pow(s_cw ** 2 + s_ch ** 2, 0.5)\n        sin_alpha_1 = torch.abs(s_cw) / sigma\n        sin_alpha_2 = torch.abs(s_ch) / sigma\n        threshold = pow(2, 0.5) / 2\n        sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)\n        angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)\n        rho_x = (s_cw / cw) ** 2\n        rho_y = (s_ch / ch) ** 2\n        gamma = angle_cost - 2\n        distance_cost = 2 - torch.exp(gamma * rho_x) - torch.exp(gamma * rho_y)\n        omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)\n        omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)\n        shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(1 - torch.exp(-1 * omiga_h), 4)\n        return iou - 0.5 * (distance_cost + shape_cost)\n    return iou  # IoU\n\ndef soft_nms(bboxes, scores, iou_thresh=0.5, sigma=0.5, score_threshold=0.25):\n    order = torch.arange(0, scores.size(0)).to(bboxes.device)\n    keep = []\n\n    while order.numel() > 0:\n        if order.numel() == 1:  # last remaining box: keep it and stop\n            keep.append(order[0])\n            break\n        else:\n            i = order[0]\n            keep.append(i)\n\n        # reshape(-1) keeps these tensors 1-D even when a single box is left\n        iou = box_iou_for_nms(bboxes[i], bboxes[order[1:]]).reshape(-1)\n\n        idx = (iou > iou_thresh).nonzero().reshape(-1)\n        if idx.numel() > 0:\n            iou = iou[idx]\n            newScores = torch.exp(-torch.pow(iou, 2) / sigma)  # Gaussian score decay\n            scores[order[idx + 1]] *= newScores\n\n        newOrder = (scores[order[1:]] > score_threshold).nonzero().reshape(-1)\n        if newOrder.numel() == 0: \n            break\n        else:\n            maxScoreIndex = torch.argmax(scores[order[newOrder+1]]) \n            if maxScoreIndex != 0: \n            
    newOrder[[0,maxScoreIndex],] = newOrder[[maxScoreIndex,0],]\n            order = order[newOrder+1]\n    \n    return torch.LongTensor(keep)"
  },
  {
    "path": "yolo-improve/yolov5v7-light.md",
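    "content": "# YOLOV5, YOLOV7 Pruning and Distillation Project Overview (v8 is not included, but anyone who has bought this pruning project will get a matching discount on the later v8 version)\n\n##### I reply to nearly all pruning-related questions in the group chat and give advice on the pruning problems raised there.  \n\n### What is pruning?  \nModel pruning is a deep learning technique that optimizes a model's efficiency and performance by removing unnecessary parameters and connections from the neural network. It falls into two types: structured pruning and parameter pruning.  \n\n### Why prune?  \nPruning balances how lightweight a model is against its accuracy in a way that swapping in lightweight modules cannot match. For example, pruning can cut 30% of the computation with only a 1% drop in accuracy. Reaching the same 30% reduction by replacing modules usually makes inference slower, because most lightweight modules trade time for space, and accuracy tends to drop more as well. Pruning largely avoids this problem.\n\n### The pruning project currently includes:\n1. yolov5-PAGCP\n2. yolov7-PAGCP\n3. yolov7-prune\n4. yolov5-prune\n\n### Pruning methods included in prune:\n1. L1 \n2. Random \n3. Slim \n4. GroupSlim \n5. GroupNorm \n6. LAMP \n7. GroupSL \n8. GroupReg\n9. GroupHessian\n10. GroupTaylor\n\n### Further details of the prune series:\n1. The BN sparsity level and values can be visualized during sparse training.\n2. The sparsity coefficient is adjusted linearly during sparse training, so accuracy recovers more easily and more stably in the later stage of sparse training.\n3. A target speed-up ratio can be set; the model is compressed automatically and enters finetune once it reaches the target ratio or the maximum number of compression iterations.\n\n### Concerns about pruning\nThe most common question is whether a given structure can be pruned. All pruning here is based on the Torch_Pruning library: PAGCP uses an older version of Torch_Pruning, while the prune series uses the latest version, so PAGCP is less compatible than the prune series. The prune series can skip some layers that cannot be pruned (some complex structures may fail when the dynamic graph is built, in which case the only option is to change the structure). The project includes many examples and video tutorials showing how to prune your own structure and what to watch out for. This pruning project cannot guarantee that every structure can be pruned, so there is some risk; please decide for yourself whether to buy it!\n\n### Distillation methods currently included:\n1. Logical\n    1. L1\n    2. L2\n    3. AlignSoftTarget (developed in-house, partly based on [Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection, ICCV 2023](https://link.zhihu.com/?target=https%3A//arxiv.org//pdf/2308.14286))\n2. Feature\n    1. [Mimic](https://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Mimicking_Very_Efficient_CVPR_2017_paper.pdf)\n    2. [Masked Generative Distillation](https://link.zhihu.com/?target=https%3A//arxiv.org/pdf/2205.01529.pdf) (ECCV 2022)\n    3. [Channel-wise Distillation](https://arxiv.org/pdf/2011.13256.pdf) (ICCV 2021)\n    4. [ChSimLoss Distillation](https://openaccess.thecvf.com/content/ICCV2021/html/Liu_Exploring_Inter-Channel_Correlation_for_Diversity-Preserved_Knowledge_Distillation_ICCV_2021_paper.html) (ICCV2021)\n    5. [SPKDLoss Distillation](https://arxiv.org/pdf/1907.09682.pdf) (ICCV2019)\n\n### Knowledge distillation details (video walkthroughs are provided with the project)\n1. Feature distillation lets you choose which layers to distill.\n2. 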
The distillation loss supports dynamic adjustment with a constant, linear, or cosine schedule.\n3. Logical and Feature distillation can be used together.\n4. The Logical and Feature losses are printed during training so that users can adjust the corresponding loss coefficients in time.\n5. Distillation is supported both during normal training and during finetuning after pruning.\n\n# Example experiment results (the commands, video tutorials, and experiment data for the examples below are all included in the project)\n### Sparse: sparse training is required.\n### 2.0x: the target is a 2x speed-up (4.0x likewise); once the compressed model reaches the target ratio, it automatically enters the finetune stage.\n### Values in parentheses: percentage of the baseline value (Parameters, GFLOPs, Model Size) or change relative to the baseline (mAP).\n\n### Yolov7 experiments\n#### Mode:Prune Dataset:CrowdHuman 20%  Model:Yolov7-Tiny using OTA  \n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 6,010,302 | 13.0 | 12.0m | 0.76 | 0.429 | 0.6ms |\n| PAGCP-EXP1 | 3,239,782(53.9%) | 7.5(57.6%) | 6.4m(53.3%) | 0.747(-0.013) | 0.409(-0.02) | 0.5ms |\n| PAGCP-EXP2 | 2,035,468(33.8%) | 5.0(38.4%) | 4.1m(34.2%) | 0.731(-0.029) | 0.393(-0.026) | 0.5ms |\n| Slim(Sparse) 2.0x | 920,155(15.3%) | 6.2(47.7%) | 2.0m(16.7%) | 0.773(+0.013) | 0.429(0.0) | 0.6ms |\n| Slim(Sparse) 4.0x | 375,449(6.2%) | 3.2(24.6%) | 1.0m(8.3%) | 0.73(-0.03) | 0.376(-0.053) | 0.4ms |\n| GroupSlim (Sparse) 2.0x | 915,589(15.2%) | 6.4(49.2%) | 2.0m(16.7%) | 0.772(+0.012) | 0.43(+0.001) | 0.6ms |\n| GroupSlim (Sparse) 4.0x | 375,298(6.3%) | 3.2(24.6%) | 1.0m(8.3%) | 0.727(-0.033) | 0.372(-0.057) | 0.5ms |\n| LAMP 2.0x | 1,310,893(21.81%) | 6.5(50.0%) | 2.9m(24.1%) | 0.766(+0.006) | 0.423(-0.006) | 0.6ms |\n| GroupNorm 2.0x | 2,580,758(42.9%) | 6.5(50.0%) | 5.4m(41.5%) | 0.74(-0.02) | 0.398(-0.021) | 0.6ms |\n| Random 2.0x | 2,950,989(49.1%) | 6.5(50.0%) | 6.1m(46.9%) | 0.742(-0.018) | 0.399(-0.02) | 0.6ms |\n| L1 2.0x | 3,226,567(53.7%) | 6.4(49.2%) | 6.4m(56.3%) | 0.72(-0.04) | 0.387(-0.042) | 0.6ms |\n\n#### Mode:Prune Dataset:CrowdHuman 20%  Model:Yolov7-Tiny+MobileNetV3_Small+LSKBlock+TSOCDE+RepConv\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 24,665,523 | 33.0 | 48.0m | 0.68 | 0.36 | 1.5ms |\n| LAMP 2.0x | 8,963,220(36.3%) | 16.4(49.7%) | 18.0m(37.5%) | 0.676(-0.004) 
| 0.354(-0.006) | 1.3ms |\n| GroupSlim (Sparse) 2.0x | 10,686,041(43.3%) | 16.2(49.1%) | 22.0m(45.8%) | 0.641(-0.039) | 0.319(-0.041) | 1.4ms |\n| Slim (Sparse) 2.0x |9,211,532(37.3%) | 16.3(49.4%) | 19.0m(39.6%) | 0.669(-0.011) | 0.342(-0.018) | 1.4ms |\n| L1 1.5x | 21,384,927(86.7%) | 21.8(66.1%) | 42.0m(87.5%) | 0.45(-0.23) | 0.185(-0.175) | 1.4ms |\n\n#### Mode:Prune Dataset:CrowdHuman 20%  Model:Yolov7-Tiny+DCN+AFPN\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 4,564,641 | 11.7 | 9.1m | 0.716 | 0.388 | 0.8ms | \n| LAMP 2.0x | 2,323,337(50.9%) | 5.8(49.6%) | 4.8m(52.7%) | 0.7(-0.016) | 0.372(-0.016) | 0.7ms | \n| L1 2.0x | 3,469,961(76.0%) | 5.8(49.6%) | 7.0m(76.9%) | 0.54(-0.176) | 0.268(-0.12) | 0.7ms | \n| Slim (Sparse) 2.0x | 2,385,252(52.2%) | 5.8(49.6%) | 5.8m(64.8%) | 0.641(-0.075) | 0.327(-0.061) | 0.7ms | \n\n#### Mode:Prune Dataset:CrowdHuman 20%  Model:Yolov7-Tiny+FasterNet+DiverseBranchBlock\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 4,092,258 | 8.5 | 9.8m | 0.69 | 0.358 | 0.6ms | \n| LAMP 2.0x | 1,392,932(34.0%) | 3.6(42.3%) | 4.4m(44.9%) | 0.67(-0.02) | 0.339(-0.019) | 0.5ms | \n| Slim (Sparse) 2.0x | 1,541,346(37.7%) | 3.6(42.3%) | 4.7m(48.0%) | 0.669(-0.176) | 0.337(-0.021) | 0.5ms | \n| GroupSlim (Sparse) 2.0x | 1,545,707(37.8%) | 3.6(42.3%) | 4.7m(48.0%) | 0.674(-0.016) | 0.342(-0.016) | 0.5ms | \n| GroupNorm 2.0x | 2,141,255(52.3%) | 3.7(43.5%) | 5.8m(59.2%) | 0.214(-0.476) | 0.0535(-0.305) | 0.5ms | \n\n#### Mode:Prune Dataset:CrowdHuman 20%  Model:Yolov7-Tiny+ReXNet(CVPR2021)+VoVGSCSP+DyHead+DecoupledHead\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 6,858,519 | 14.8 | 13.6m 
| 0.731 | 0.405 | 0.14s | \n| LAMP 1.5x | 3,840,822(56.0%) | 9.9(66.9%) | 7.8m(57.3%) | 0.7(-0.031) | 0.379(-0.019) | 0.09s | \n| LAMP 2.0x | 2,821,109(41.1%) | 7.4(50.0%) | 5.8m(42.6%) | 0.681(-0.06) | 0.359(-0.046) | 0.08s | \n\n#### Mode:Prune Dataset:CrowdHuman 20%  Model:Yolov7-Tiny+ReXNet(CVPR2021)+VoVGSCSP+DecoupledHead\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 6,512,095 | 11.3 | 12.9m | 0.715 | 0.383 | 0.091s | \n| LAMP 2.0x | 2,930,100(45.0%) | 5.6(49.6%) | 6.0m(46.5%) | 0.627(-0.088) | 0.32(-0.063) | 0.039s | \n| Slim (Sparse) 2.0x | 2,821,109(43.3%) | 5.6(49.6%) | 6.3m(48.8%) | 0.728(+0.013) | 0.373(+0.01) | 0.052s | \n| GroupSlim (Sparse) 2.0x | 3,304,167(50.7%) | 5.7(50.4%) | 6.8m(52.7%) | 0.724(+0.009) | 0.369(-0.014) | 0.053s | \n| GroupSl (Sparse) 2.0x Exp1 | 2,178,723(33.5%) | 5.7(50.4%) | 4.6m(35.7%) | 0.669(-0.046) | 0.341(-0.042) | 0.055s | \n| GroupSl (Sparse) 2.0x Exp2 | 2,060,599(31.6%) | 5.6(49.6%) | 4.4m(34.1%) | 0.761(+0.046) | 0.407(+0.024) | 0.056s | \n| GroupSl (Sparse) 3.0x Exp2 | 1,283,982(19.7%) | 3.7(32.7%) | 2.9m(22.5%) | 0.679(-0.036) | 0.342(-0.041) | 0.041s | \n\n#### Mode:Distill+Prune Dataset:VisDrone(20% of the training set; full validation and test sets) Teacher:Yolov7-Tiny\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine(Yolov7-Tiny) | 6,031,950 | 13.1 | 11.7m | 0.189 | 0.0948 | 0.00121s | \n| LAMP 2.0x | 1,309,098 | 6.5 | 2.7m | 0.186(-0.003) | 0.0903(-0.0045) | 0.00089s | \n| LAMP 3.0x | 615,877 | 4.3 | 1.4m | 0.151(-0.038) | 0.0691(-0.0257) | 0.00070s | \n| LAMP 3.0x + CWD exp1 | 615,877 | 4.3 | 1.4m | 0.158(-0.031) | 0.0715(-0.0233) | 0.00070s | \n| LAMP 3.0x + CWD exp2 | 615,877 | 4.3 | 1.4m | 0.155(-0.034)  | 0.0686(-0.0262) | 0.00070s | \n\n### Yolov5 Experiments\n#### Mode:Prune Dataset:CrowdHuman 20%  
Model:Yolov5n\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 1,761,871 | 4.1 | 3.7m | 0.715 | 0.399 | 0.02s | \n| LAMP 2.0x | 296,498(16.8%) | 2.0(48.8%) | 0.9m(24.3%) | 0.694(-0.021) | 0.368(-0.031) | 0.0164s | \n| Slim (Sparse) 2.0x | 398,607(22.6%) | 2.0(48.8%) | 1.1m(29.7%) | 0.707(-0.008) | 0.38(-0.019) | 0.0166s | \n| GroupSlim (Sparse) 2.0x | 366,230(20.8%) | 2.0(48.8%) | 1.0m(27.0%) | 0.704(-0.011) | 0.381(-0.018) | 0.0165s | \n| GroupNorm 2.0x | 1,016,400(57.7%) | 2.1(51.2%) | 2.3m(62.2%) | 0.617(-0.098) | 0.312(-0.087) | 0.0134s | \n| GroupSl (Sparse) 2.0x | 474,024(26.9%) | 2.0(48.8%) | 1.3m(35.1%) | 0.711(-0.004) | 0.387(-0.012) | 0.0167s | \n\n#### Mode:Prune Dataset:CrowdHuman 20%  Model:Yolov5n+C3-Faster+RepConv\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 1,614,495 | 3.7 | 3.4m | 0.711 | 0.388 | 0.021s | \n| LAMP 2.0x | 285,554(17.7%) | 1.8(48.6%) | 0.9m(26.5%) | 0.687(-0.024) | 0.359(-0.029) | 0.017s | \n| Slim (Sparse) 2.0x | 418,550(25.9%) | 1.8(48.6%) | 1.2m(35.3%) | 0.695(-0.026) | 0.365(-0.023) | 0.168s | \n| GroupSlim (Sparse) 2.0x | 434,440(26.9%) | 1.8(48.6%) | 1.2m(35.3%) | 0.698(-0.013) | 0.369(-0.019) | 0.017s | \n| GroupSl (Sparse) 2.0x | 447,587(27.7%) | 1.8(48.6%) | 1.2m(35.3%) | 0.704(-0.007) | 0.376(-0.012) | 0.016s | \n| GroupNorm 2.0x | 935,451(57.9%) | 1.8(48.6%) | 2.1m(61.8%) | 0.652(-0.059) | 0.335(-0.053) | 0.015s | \n\n#### Mode:Distill Dataset:VisDrone(20% of the training set; full validation and test sets) Teacher:Yolov5s+OTA Student:Yolov5n\n#### Epoch:300 BatchSize:64 Device:RTX3090\n| model | GFLOPs | mAP50(test set) | mAP50-95(test set) |\n| :----: | :----: | :----: | :----: |\n| yolov5n | 4.2 | 0.171 | 0.0834 |\n| yolov5s | 15.8 | 0.263 | 0.136 |\n| yolov5n cwd exp1 | 4.2 | 0.181(+0.01) | 
0.0898(+0.0064) |\n| yolov5n cwd exp2 | 4.2 | 0.188(+0.017) | 0.0931(+0.0097) |\n| yolov5n cwd exp3 | 4.2 | 0.176(+0.005) | 0.0845(+0.0011) |\n| yolov5n cwd exp4 | 4.2 | 0.175(+0.004) | 0.0852(+0.0018) |\n| yolov5n mgd exp1 | 4.2 | 0.181(+0.01) | 0.0883(+0.0049) |\n| yolov5n mgd exp2 | 4.2 | 0.166(-0.005) | 0.0795(-0.0039) |\n| yolov5n mimic exp1 | 4.2 | 0.178(+0.007) | 0.0865(+0.0031) |\n| yolov5n mimic exp1 | 4.2 | 0.172(+0.001) | 0.0833(-0.0001) |\n| yolov5n l2 exp1 | 4.2 | 0.178(+0.007) | 0.0844(+0.001) |\n| yolov5n l2 exp2 | 4.2 | 0.179(+0.008) | 0.0834(0.0) |\n| yolov5n l2 exp3 | 4.2 | 0.176(+0.005) | 0.0795(-0.0039) |\n| yolov5n ast exp1 | 4.2 | 0.185(+0.014) | 0.0899(+0.0065) |\n| yolov5n ast exp2 | 4.2 | 0.189(+0.018) | 0.0908(+0.0074) |\n| yolov5n mgd+ast exp1 | 4.2 | 0.182(+0.011) | 0.0867(+0.0033) |\n| yolov5n mgd+ast exp2 | 4.2 | 0.185(+0.014) | 0.0902(+0.0068) |\n| yolov5n mgd+ast exp3 | 4.2 | 0.183(+0.012) | 0.0886(+0.0052) |\n\n#### Mode:Distill+Prune Dataset:VisDrone(20% of the training set; full validation and test sets) Teacher:Yolov5s+OTA\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine(Yolov5n) | 1,772,695 | 4.2 | 3.7m | 0.171 | 0.0834 | 0.020s | \n| LAMP 2.0x | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.149(-0.022) | 0.0676(-0.0158) | 0.016s | \n| LAMP 2.0x + cwd exp1 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.163(+0.014) | 0.0745(+0.0069) | 0.016s | \n| LAMP 2.0x + cwd exp2 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.158(+0.009) | 0.0728(+0.0052) | 0.016s | \n| LAMP 2.0x + cwd exp3 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.164(+0.015) | 0.0742(+0.0066) | 0.016s | \n| LAMP 2.0x + mgd exp1 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.148(-0.001) | 0.066(-0.0016) | 0.016s | \n| LAMP 2.0x + mgd exp2 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.148(-0.001) | 0.0673(-0.0003) | 0.016s | \n| LAMP 2.0x + mgd exp3 | 301,033(16.98%) | 
2.1(50%) | 0.8m(21.62%) | 0.152(+0.003) | 0.0687(+0.0011) | 0.016s | \n| LAMP 2.0x + l2 exp1 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.137(-0.012) | 0.0542(-0.0134) | 0.016s | \n| LAMP 2.0x + l2 exp2 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.149(+0.000) | 0.0638(+0.0011) | 0.016s | \n| LAMP 2.0x + ast exp1 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.154(+0.005) | 0.0679(+0.0003) | 0.016s | \n| LAMP 2.0x + ast exp2 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.152(+0.003) | 0.0693(+0.0017) | 0.016s | \n| LAMP 2.0x + ast exp3 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.154(+0.005) | 0.0652(-0.0024) | 0.016s | \n| LAMP 2.0x + ast exp4 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.125(-0.024) | 0.0547(-0.0129) | 0.016s | \n| LAMP 2.0x + ast exp5 | 301,033(16.98%) | 2.1(50%) | 0.8m(21.62%) | 0.141(-0.008) | 0.0635(-0.0041) | 0.016s | \n\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine(Yolov5n) | 1,772,695 | 4.2 | 3.7m | 0.171 | 0.0834 | 0.020s | \n| GroupSl (Sparse) 2.0x | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.162(-0.009) | 0.0754(-0.008) | 0.017s | \n| GroupSl (Sparse) 2.0x + cwd exp1 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.174(+0.012) | 0.0817(+0.0063) | 0.017s | \n| GroupSl (Sparse) 2.0x + cwd exp2 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.177(+0.015) | 0.0815(+0.0061) | 0.017s | \n| GroupSl (Sparse) 2.0x + cwd exp3 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.177(+0.015) | 0.08(+0.0046) | 0.017s | \n| GroupSl (Sparse) 2.0x + cwd exp4 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.174(+0.012) | 0.0813(+0.0059) | 0.017s | \n| GroupSl (Sparse) 2.0x + cwd exp5 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.173(+0.011) | 0.0808(+0.0054) | 0.017s | \n| GroupSl (Sparse) 2.0x + mgd exp1 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.151(-0.011) | 0.0662(-0.0092) | 0.017s | \n| GroupSl (Sparse) 2.0x + mgd 
exp2 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.164(+0.002) | 0.0771(+0.0017) | 0.017s | \n| GroupSl (Sparse) 2.0x + mgd exp3 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.154(-0.08) | 0.0691(-0.0063) | 0.017s | \n| GroupSl (Sparse) 2.0x + mgd exp4 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.166(+0.004) | 0.0774(+0.002) | 0.017s | \n| GroupSl (Sparse) 2.0x + ast exp1 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.172(+0.01) | 0.0776(+0.0022) | 0.017s | \n| GroupSl (Sparse) 2.0x + ast exp2 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.167(+0.005) | 0.0763(+0.0009) | 0.017s | \n| GroupSl (Sparse) 2.0x + ast exp3 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.17(+0.008) | 0.0754(+0.0) | 0.017s | \n| GroupSl (Sparse) 2.0x + cwd + ast exp1 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.169(+0.007) | 0.0746(-0.008) | 0.017s | \n| GroupSl (Sparse) 2.0x + cwd + ast exp2 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.172(+0.01) | 0.078(+0.0026) | 0.017s | \n| GroupSl (Sparse) 2.0x + cwd + ast exp3 | 330,322(18.63%) | 2.1(50%) | 0.8m(21.62%) | 0.172(+0.01) | 0.0786(+0.0032) | 0.017s | \n\n#### Mode:Prune Dataset:CrowdHuman 20%train  Model:Yolov5n+RepViT+C2f\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine(Yolov5n) | 1,761,871 | 4.1 | 3.7M | 0.692 | 0.37 | 0.00062s |\n| Yolov5n+RepVit+C2f | 6,001,647(340.6%) | 16.2(395.1%) | 12.1M(327.0%) | 0.711(+0.019) | 0.386(+0.016) | 0.00262s |\n| Yolov5n+RepVit+C2f Lamp 2.0x | 2,318,239(131.5%) | 8.2(200%) | 5.0M(135.1%) | 0.721(+0.029) | 0.398(+0.028) | 0.00218s |\n| Yolov5n+RepVit+C2f Lamp 3.0x | 1,446,593(82.1%) | 5.6(136.6%) | 3.3M(89.2%) | 0.712(+0.02) | 0.388(+0.018) | 0.00197s |\n| Yolov5n+RepVit+C2f Lamp 3.5x | 1,231,668(69.9%) | 4.8(117.1%) | 2.9M(78.4%) | 0.71(+0.018) | 0.383(+0.013) | 0.00189s |\n| Yolov5n+RepVit+C2f Lamp 4.0x | 1,082,684(61.5%) | 4.3(104.9%) | 2.7M(73.0%) | 0.705(+0.013) | 
0.378(+0.008) | 0.00185s |\n| Yolov5n+RepVit+C2f Lamp 5.0x | 897,472(50.9%) | 3.4(82.9%) | 2.3M(62.2%) | 0.69(-0.002) | 0.364(-0.006) | 0.00178s |\n| Yolov5n+RepVit+C2f GroupSl (Sparse) 2.0x | 1,695,853(96.3%) | 8.2(200%) | 3.8M(102.7%) | 0.694(+0.002) | 0.364(-0.006) | 0.022s |\n| Yolov5n+RepVit+C2f Slim (Sparse) 2.0x | 3,006,781(170.7%) | 8.1(197.6%) | 6.3M(170.3%) | 0.707(+0.015) | 0.376(+0.006) | 0.00206s |\n| Yolov5n+RepVit+C2f Slim (Sparse) 3.0x | 1,945,689(110.4%) | 5.6(136.6%) | 4.3M(116.2%) | 0.683(-0.009) | 0.348(-0.022) | 0.00189s |\n| Yolov5n+RepVit+C2f Slim (Sparse) 4.0x | 1,411,170(80.1%) | 4.2(102.4%) | 3.3M(89.2%) | 0.662(-0.03) | 0.331(-0.039) | 0.0018s |\n\n#### Mode:Prune Dataset:CrowdHuman 20%train  Model:Yolov5n+Fasternet+GoldYOLO+ASF+OTA\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine(Yolov5n) | 1,761,871 | 4.1 | 3.7M | 0.688 | 0.365 | 0.00062s |\n| Improve(Yolov5n+Fasternet+GoldYOLO+ASF+OTA) | 6,442,926(365.7%) | 10.5(256.1%) | 12.8M(345.9%) | 0.739(+0.051) | 0.395(+0.03) | 0.00221s(356.4%) |\n| Improve Lamp 2.0x | 3,753,930(213.1%) | 5.2(126.8%) | 7.6M(205.4%) | 0.732(+0.044) | 0.391(+0.026) | 0.00117s(188.7%) |\n| Improve Lamp 2.5x | 3,414,584(193.8%) | 4.2(102.4%) | 7.0M(189.2%) | 0.721(+0.033) | 0.377(+0.012) | 0.00097s(156.5%) |\n| Improve Lamp 3.0x | 3,198,691(181.6%) | 3.5(85.3%) | 6.6M(178.4%) | 0.7(+0.012) | 0.357(-0.08) | 0.00083s(133.9%) |"
  },
  {
    "path": "yolo-improve/yolov7-CoordConv.py",
    "content": "class AddCoords(nn.Module):\n    def __init__(self, with_r=False):\n        super().__init__()\n        self.with_r = with_r\n\n    def forward(self, input_tensor):\n        \"\"\"\n        Args:\n            input_tensor: shape(batch, channel, x_dim, y_dim)\n        \"\"\"\n        batch_size, _, x_dim, y_dim = input_tensor.size()\n\n        xx_channel = torch.arange(x_dim).repeat(1, y_dim, 1)\n        yy_channel = torch.arange(y_dim).repeat(1, x_dim, 1).transpose(1, 2)\n\n        xx_channel = xx_channel.float() / (x_dim - 1)\n        yy_channel = yy_channel.float() / (y_dim - 1)\n\n        xx_channel = xx_channel * 2 - 1\n        yy_channel = yy_channel * 2 - 1\n\n        xx_channel = xx_channel.repeat(batch_size, 1, 1, 1).transpose(2, 3)\n        yy_channel = yy_channel.repeat(batch_size, 1, 1, 1).transpose(2, 3)\n\n        ret = torch.cat([\n            input_tensor,\n            xx_channel.type_as(input_tensor),\n            yy_channel.type_as(input_tensor)], dim=1)\n\n        if self.with_r:\n            rr = torch.sqrt(torch.pow(xx_channel.type_as(input_tensor) - 0.5, 2) + torch.pow(yy_channel.type_as(input_tensor) - 0.5, 2))\n            ret = torch.cat([ret, rr], dim=1)\n\n        return ret\n\nclass CoordConv(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1, with_r=False):\n        super().__init__()\n        self.addcoords = AddCoords(with_r=with_r)\n        in_channels += 2\n        if with_r:\n            in_channels += 1\n        self.conv = Conv(in_channels, out_channels, k=kernel_size, s=stride)\n\n    def forward(self, x):\n        x = self.addcoords(x)\n        x = self.conv(x)\n        return x\n\n# yolov7 head\nhead:\n  [[-1, 1, SPPCSPC, [512]], # 51\n  \n   [-1, 1, CoordConv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [37, 1, CoordConv, [256, 1, 1]], # route backbone P4\n   [[-1, -2], 1, Concat, [1]],\n   \n   [-1, 1, Conv, [256, 1, 1]],\n   [-2, 1, Conv, [256, 1, 
1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [256, 1, 1]], # 63\n   \n   [-1, 1, CoordConv, [128, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [24, 1, CoordConv, [128, 1, 1]], # route backbone P3\n   [[-1, -2], 1, Concat, [1]],\n   \n   [-1, 1, Conv, [128, 1, 1]],\n   [-2, 1, Conv, [128, 1, 1]],\n   [-1, 1, Conv, [64, 3, 1]],\n   [-1, 1, Conv, [64, 3, 1]],\n   [-1, 1, Conv, [64, 3, 1]],\n   [-1, 1, Conv, [64, 3, 1]],\n   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [128, 1, 1]], # 75\n      \n   [-1, 1, MP, []],\n   [-1, 1, Conv, [128, 1, 1]],\n   [-3, 1, Conv, [128, 1, 1]],\n   [-1, 1, Conv, [128, 3, 2]],\n   [[-1, -3, 63], 1, Concat, [1]],\n   \n   [-1, 1, Conv, [256, 1, 1]],\n   [-2, 1, Conv, [256, 1, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [256, 1, 1]], # 88\n      \n   [-1, 1, MP, []],\n   [-1, 1, Conv, [256, 1, 1]],\n   [-3, 1, Conv, [256, 1, 1]],\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, -3, 51], 1, Concat, [1]],\n   \n   [-1, 1, Conv, [512, 1, 1]],\n   [-2, 1, Conv, [512, 1, 1]],\n   [-1, 1, Conv, [256, 3, 1]],\n   [-1, 1, Conv, [256, 3, 1]],\n   [-1, 1, Conv, [256, 3, 1]],\n   [-1, 1, Conv, [256, 3, 1]],\n   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [512, 1, 1]], # 101\n   \n   [75, 1, CoordConv, [256, 3, 1]],\n   [88, 1, CoordConv, [512, 3, 1]],\n   [101, 1, CoordConv, [1024, 3, 1]],\n\n   [[102,103,104], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov7-DBB.py",
    "content": "import torch.nn.functional as F\ndef transI_fusebn(kernel, bn):\n    gamma = bn.weight\n    std = (bn.running_var + bn.eps).sqrt()\n    return kernel * ((gamma / std).reshape(-1, 1, 1, 1)), bn.bias - bn.running_mean * gamma / std\n\ndef transII_addbranch(kernels, biases):\n    return sum(kernels), sum(biases)\n\ndef transIII_1x1_kxk(k1, b1, k2, b2, groups):\n    if groups == 1:\n        k = F.conv2d(k2, k1.permute(1, 0, 2, 3))      #\n        b_hat = (k2 * b1.reshape(1, -1, 1, 1)).sum((1, 2, 3))\n    else:\n        k_slices = []\n        b_slices = []\n        k1_T = k1.permute(1, 0, 2, 3)\n        k1_group_width = k1.size(0) // groups\n        k2_group_width = k2.size(0) // groups\n        for g in range(groups):\n            k1_T_slice = k1_T[:, g*k1_group_width:(g+1)*k1_group_width, :, :]\n            k2_slice = k2[g*k2_group_width:(g+1)*k2_group_width, :, :, :]\n            k_slices.append(F.conv2d(k2_slice, k1_T_slice))\n            b_slices.append((k2_slice * b1[g*k1_group_width:(g+1)*k1_group_width].reshape(1, -1, 1, 1)).sum((1, 2, 3)))\n        k, b_hat = transIV_depthconcat(k_slices, b_slices)\n    return k, b_hat + b2\n\ndef transIV_depthconcat(kernels, biases):\n    return torch.cat(kernels, dim=0), torch.cat(biases)\n\ndef transV_avg(channels, kernel_size, groups):\n    input_dim = channels // groups\n    k = torch.zeros((channels, input_dim, kernel_size, kernel_size))\n    k[np.arange(channels), np.tile(np.arange(input_dim), groups), :, :] = 1.0 / kernel_size ** 2\n    return k\n\n#   This has not been tested with non-square kernels (kernel.size(2) != kernel.size(3)) nor even-size kernels\ndef transVI_multiscale(kernel, target_kernel_size):\n    H_pixels_to_pad = (target_kernel_size - kernel.size(2)) // 2\n    W_pixels_to_pad = (target_kernel_size - kernel.size(3)) // 2\n    return F.pad(kernel, [H_pixels_to_pad, H_pixels_to_pad, W_pixels_to_pad, W_pixels_to_pad])\n\ndef conv_bn(in_channels, out_channels, kernel_size, stride=1, 
padding=0, dilation=1, groups=1,\n                   padding_mode='zeros'):\n    conv_layer = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,\n                           stride=stride, padding=padding, dilation=dilation, groups=groups,\n                           bias=False, padding_mode=padding_mode)\n    bn_layer = nn.BatchNorm2d(num_features=out_channels, affine=True)\n    se = nn.Sequential()\n    se.add_module('conv', conv_layer)\n    se.add_module('bn', bn_layer)\n    return se\n\n\nclass IdentityBasedConv1x1(nn.Conv2d):\n    def __init__(self, channels, groups=1):\n        super(IdentityBasedConv1x1, self).__init__(in_channels=channels, out_channels=channels, kernel_size=1, stride=1, padding=0, groups=groups, bias=False)\n\n        assert channels % groups == 0\n        input_dim = channels // groups\n        id_value = np.zeros((channels, input_dim, 1, 1))\n        for i in range(channels):\n            id_value[i, i % input_dim, 0, 0] = 1\n        self.id_tensor = torch.from_numpy(id_value).type_as(self.weight)\n        nn.init.zeros_(self.weight)\n\n    def forward(self, input):\n        kernel = self.weight + self.id_tensor.to(self.weight.device).type_as(self.weight)\n        result = F.conv2d(input, kernel, None, stride=1, padding=0, dilation=self.dilation, groups=self.groups)\n        return result\n\n    def get_actual_kernel(self):\n        return self.weight + self.id_tensor.to(self.weight.device)\n\n\nclass BNAndPadLayer(nn.Module):\n    def __init__(self,\n                 pad_pixels,\n                 num_features,\n                 eps=1e-5,\n                 momentum=0.1,\n                 affine=True,\n                 track_running_stats=True):\n        super(BNAndPadLayer, self).__init__()\n        self.bn = nn.BatchNorm2d(num_features, eps, momentum, affine, track_running_stats)\n        self.pad_pixels = pad_pixels\n\n    def forward(self, input):\n        output = self.bn(input)\n        if 
self.pad_pixels > 0:\n            if self.bn.affine:\n                pad_values = self.bn.bias.detach() - self.bn.running_mean * self.bn.weight.detach() / torch.sqrt(self.bn.running_var + self.bn.eps)\n            else:\n                pad_values = - self.bn.running_mean / torch.sqrt(self.bn.running_var + self.bn.eps)\n            output = F.pad(output, [self.pad_pixels] * 4)\n            pad_values = pad_values.view(1, -1, 1, 1)\n            output[:, :, 0:self.pad_pixels, :] = pad_values\n            output[:, :, -self.pad_pixels:, :] = pad_values\n            output[:, :, :, 0:self.pad_pixels] = pad_values\n            output[:, :, :, -self.pad_pixels:] = pad_values\n        return output\n\n    @property\n    def weight(self):\n        return self.bn.weight\n\n    @property\n    def bias(self):\n        return self.bn.bias\n\n    @property\n    def running_mean(self):\n        return self.bn.running_mean\n\n    @property\n    def running_var(self):\n        return self.bn.running_var\n\n    @property\n    def eps(self):\n        return self.bn.eps\n\n\nclass DiverseBranchBlock(nn.Module):\n    def __init__(self, in_channels, out_channels, k,\n                 s=1, p=None, g=1, act=None,\n                 internal_channels_1x1_3x3=None,\n                 deploy=False, single_init=False):\n        super(DiverseBranchBlock, self).__init__()\n        self.deploy = deploy\n\n        self.nonlinear = act\n\n        self.kernel_size = k\n        self.out_channels = out_channels\n        self.groups = g\n        \n        if p is None:\n            p = autopad(k, p)\n        assert p == k // 2\n\n        if deploy:\n            self.dbb_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=k, stride=s, padding=p, groups=g, bias=True)\n\n        else:\n\n            self.dbb_origin = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=k, stride=s, padding=p, groups=g)\n\n            self.dbb_avg = nn.Sequential()\n     
       if g < out_channels:\n                self.dbb_avg.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=1, padding=0, groups=g, bias=False))\n                self.dbb_avg.add_module('bn', BNAndPadLayer(pad_pixels=p, num_features=out_channels))\n                self.dbb_avg.add_module('avg', nn.AvgPool2d(kernel_size=k, stride=s, padding=0))\n                self.dbb_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=s, padding=0, groups=g)\n            else:\n                self.dbb_avg.add_module('avg', nn.AvgPool2d(kernel_size=k, stride=s, padding=p))\n\n            self.dbb_avg.add_module('avgbn', nn.BatchNorm2d(out_channels))\n\n\n            if internal_channels_1x1_3x3 is None:\n                internal_channels_1x1_3x3 = in_channels if g < out_channels else 2 * in_channels   # For mobilenet, it is better to have 2X internal channels\n\n            self.dbb_1x1_kxk = nn.Sequential()\n            if internal_channels_1x1_3x3 == in_channels:\n                self.dbb_1x1_kxk.add_module('idconv1', IdentityBasedConv1x1(channels=in_channels, groups=g))\n            else:\n                self.dbb_1x1_kxk.add_module('conv1', nn.Conv2d(in_channels=in_channels, out_channels=internal_channels_1x1_3x3, kernel_size=1, stride=1, padding=0, groups=g, bias=False))\n            self.dbb_1x1_kxk.add_module('bn1', BNAndPadLayer(pad_pixels=p, num_features=internal_channels_1x1_3x3, affine=True))\n            self.dbb_1x1_kxk.add_module('conv2', nn.Conv2d(in_channels=internal_channels_1x1_3x3, out_channels=out_channels, kernel_size=k, stride=s, padding=0, groups=g, bias=False))\n            self.dbb_1x1_kxk.add_module('bn2', nn.BatchNorm2d(out_channels))\n\n        #   The experiments reported in the paper used the default initialization of bn.weight (all as 1). 
But changing the initialization may be useful in some cases.\n        if single_init:\n            #   Initialize the bn.weight of dbb_origin as 1 and others as 0. This is not the default setting.\n            self.single_init()\n\n    def get_equivalent_kernel_bias(self):\n        k_origin, b_origin = transI_fusebn(self.dbb_origin.conv.weight, self.dbb_origin.bn)\n\n        if hasattr(self, 'dbb_1x1'):\n            k_1x1, b_1x1 = transI_fusebn(self.dbb_1x1.conv.weight, self.dbb_1x1.bn)\n            k_1x1 = transVI_multiscale(k_1x1, self.kernel_size)\n        else:\n            k_1x1, b_1x1 = 0, 0\n\n        if hasattr(self.dbb_1x1_kxk, 'idconv1'):\n            k_1x1_kxk_first = self.dbb_1x1_kxk.idconv1.get_actual_kernel()\n        else:\n            k_1x1_kxk_first = self.dbb_1x1_kxk.conv1.weight\n        k_1x1_kxk_first, b_1x1_kxk_first = transI_fusebn(k_1x1_kxk_first, self.dbb_1x1_kxk.bn1)\n        k_1x1_kxk_second, b_1x1_kxk_second = transI_fusebn(self.dbb_1x1_kxk.conv2.weight, self.dbb_1x1_kxk.bn2)\n        k_1x1_kxk_merged, b_1x1_kxk_merged = transIII_1x1_kxk(k_1x1_kxk_first, b_1x1_kxk_first, k_1x1_kxk_second, b_1x1_kxk_second, groups=self.groups)\n\n        k_avg = transV_avg(self.out_channels, self.kernel_size, self.groups)\n        k_1x1_avg_second, b_1x1_avg_second = transI_fusebn(k_avg.to(self.dbb_avg.avgbn.weight.device), self.dbb_avg.avgbn)\n        if hasattr(self.dbb_avg, 'conv'):\n            k_1x1_avg_first, b_1x1_avg_first = transI_fusebn(self.dbb_avg.conv.weight, self.dbb_avg.bn)\n            k_1x1_avg_merged, b_1x1_avg_merged = transIII_1x1_kxk(k_1x1_avg_first, b_1x1_avg_first, k_1x1_avg_second, b_1x1_avg_second, groups=self.groups)\n        else:\n            k_1x1_avg_merged, b_1x1_avg_merged = k_1x1_avg_second, b_1x1_avg_second\n\n        return transII_addbranch((k_origin, k_1x1, k_1x1_kxk_merged, k_1x1_avg_merged), (b_origin, b_1x1, b_1x1_kxk_merged, b_1x1_avg_merged))\n\n    def switch_to_deploy(self):\n        if hasattr(self, 
'dbb_reparam'):\n            return\n        kernel, bias = self.get_equivalent_kernel_bias()\n        self.dbb_reparam = nn.Conv2d(in_channels=self.dbb_origin.conv.in_channels, out_channels=self.dbb_origin.conv.out_channels,\n                                     kernel_size=self.dbb_origin.conv.kernel_size, stride=self.dbb_origin.conv.stride,\n                                     padding=self.dbb_origin.conv.padding, dilation=self.dbb_origin.conv.dilation, groups=self.dbb_origin.conv.groups, bias=True)\n        self.dbb_reparam.weight.data = kernel\n        self.dbb_reparam.bias.data = bias\n        for para in self.parameters():\n            para.detach_()\n        self.__delattr__('dbb_origin')\n        self.__delattr__('dbb_avg')\n        if hasattr(self, 'dbb_1x1'):\n            self.__delattr__('dbb_1x1')\n        self.__delattr__('dbb_1x1_kxk')\n\n    def forward(self, inputs):\n        # act defaults to None in __init__; treat that as no activation instead of calling None\n        nonlinear = self.nonlinear if self.nonlinear is not None else (lambda x: x)\n        if hasattr(self, 'dbb_reparam'):\n            return nonlinear(self.dbb_reparam(inputs))\n\n        out = self.dbb_origin(inputs)\n        if hasattr(self, 'dbb_1x1'):\n            out += self.dbb_1x1(inputs)\n        out += self.dbb_avg(inputs)\n        out += self.dbb_1x1_kxk(inputs)\n        return nonlinear(out)\n\n    def init_gamma(self, gamma_value):\n        if hasattr(self, \"dbb_origin\"):\n            torch.nn.init.constant_(self.dbb_origin.bn.weight, gamma_value)\n        if hasattr(self, \"dbb_1x1\"):\n            torch.nn.init.constant_(self.dbb_1x1.bn.weight, gamma_value)\n        if hasattr(self, \"dbb_avg\"):\n            torch.nn.init.constant_(self.dbb_avg.avgbn.weight, gamma_value)\n        if hasattr(self, \"dbb_1x1_kxk\"):\n            torch.nn.init.constant_(self.dbb_1x1_kxk.bn2.weight, gamma_value)\n\n    def single_init(self):\n        self.init_gamma(0.0)\n        if hasattr(self, \"dbb_origin\"):\n            torch.nn.init.constant_(self.dbb_origin.bn.weight, 1.0)"
  },
  {
    "path": "yolo-improve/yolov7-DCN.py",
    "content": "class DCNv2(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size, stride=1,\n                 padding=1, groups=1, act=True, dilation=1, deformable_groups=1):\n        super(DCNv2, self).__init__()\n\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.kernel_size = (kernel_size, kernel_size)\n        self.stride = (stride, stride)\n        self.padding = (autopad(kernel_size, padding), autopad(kernel_size, padding))\n        self.dilation = (dilation, dilation)\n        self.groups = groups\n        self.deformable_groups = deformable_groups\n\n        self.weight = nn.Parameter(\n            torch.empty(out_channels, in_channels, *self.kernel_size)\n        )\n        self.bias = nn.Parameter(torch.empty(out_channels))\n\n        out_channels_offset_mask = (self.deformable_groups * 3 *\n                                    self.kernel_size[0] * self.kernel_size[1])\n        self.conv_offset_mask = nn.Conv2d(\n            self.in_channels,\n            out_channels_offset_mask,\n            kernel_size=self.kernel_size,\n            stride=self.stride,\n            padding=self.padding,\n            bias=True,\n        )\n        self.bn = nn.BatchNorm2d(out_channels)\n        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())\n        self.reset_parameters()\n\n    def forward(self, x):\n        offset_mask = self.conv_offset_mask(x)\n        o1, o2, mask = torch.chunk(offset_mask, 3, dim=1)\n        offset = torch.cat((o1, o2), dim=1)\n        mask = torch.sigmoid(mask)\n        x = torch.ops.torchvision.deform_conv2d(\n            x,\n            self.weight,\n            offset,\n            mask,\n            self.bias,\n            self.stride[0], self.stride[1],\n            self.padding[0], self.padding[1],\n            self.dilation[0], self.dilation[1],\n            self.groups,\n            self.deformable_groups,\n            
True\n        )\n        x = self.bn(x)\n        x = self.act(x)\n        return x\n\n    def reset_parameters(self):\n        n = self.in_channels\n        for k in self.kernel_size:\n            n *= k\n        std = 1. / math.sqrt(n)\n        self.weight.data.uniform_(-std, std)\n        self.bias.data.zero_()\n        self.conv_offset_mask.weight.data.zero_()\n        self.conv_offset_mask.bias.data.zero_()"
  },
  {
    "path": "yolo-improve/yolov7-DCNV3.py",
    "content": "from models.ops_dcnv3.modules import DCNv3\nclass DCNV3_YoLo(nn.Module):\n    def __init__(self, inc, ouc, k=1, s=1, p=None, g=1, act=True):\n        super().__init__()\n        \n        self.conv = Conv(inc, ouc, k=1)\n        self.dcnv3 = DCNv3(ouc, kernel_size=k, stride=s, group=g)\n        self.bn = nn.BatchNorm2d(ouc)\n        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())\n    \n    def forward(self, x):\n        x = self.conv(x)\n        x = x.permute(0, 2, 3, 1)\n        x = self.dcnv3(x)\n        x = x.permute(0, 3, 1, 2)\n        x = self.act(self.bn(x))\n        return x\n\nif isinstance(m, Detect):\n    s = 256  # 2x min stride\n    self.model.to(torch.device('cuda'))\n    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s).to(torch.device('cuda')))]).cpu()  # forward\n    self.model.cpu()\n    check_anchor_order(m)\n    m.anchors /= m.stride.view(-1, 1, 1)\n    self.stride = m.stride\n    self._initialize_biases()  # only run once\n    # print('Strides: %s' % m.stride.tolist())\nif isinstance(m, IDetect):\n    s = 256  # 2x min stride\n    self.model.to(torch.device('cuda'))\n    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s).to(torch.device('cuda')))]).cpu()  # forward\n    self.model.cpu()\n    check_anchor_order(m)\n    m.anchors /= m.stride.view(-1, 1, 1)\n    self.stride = m.stride\n    self._initialize_biases()  # only run once\n    # print('Strides: %s' % m.stride.tolist())\nif isinstance(m, IAuxDetect):\n    s = 256  # 2x min stride\n    self.model.to(torch.device('cuda'))\n    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s).to(torch.device('cuda')))[:4]]).cpu()  # forward\n    self.model.cpu()\n    #print(m.stride)\n    check_anchor_order(m)\n    m.anchors /= m.stride.view(-1, 1, 1)\n    self.stride = m.stride\n    self._initialize_aux_biases()  # only run once\n  
  # print('Strides: %s' % m.stride.tolist())"
  },
  {
    "path": "yolo-improve/yolov7-DSConv.py",
    "content": "import torch.nn.functional as F\nfrom torch.nn.modules.conv import _ConvNd\nfrom torch.nn.modules.utils import _pair\n\nclass DSConv(_ConvNd):\n    def __init__(self, in_channels, out_channels, kernel_size, stride=1,\n                 padding=None, dilation=1, groups=1, padding_mode='zeros', bias=False, block_size=32, KDSBias=False, CDS=False):\n        padding = _pair(autopad(kernel_size, padding))\n        kernel_size = _pair(kernel_size)\n        stride = _pair(stride)\n        dilation = _pair(dilation)\n\n        blck_numb = math.ceil(((in_channels)/(block_size*groups)))\n        super(DSConv, self).__init__(\n            in_channels, out_channels, kernel_size, stride, padding, dilation,\n            False, _pair(0), groups, bias, padding_mode)\n\n        # KDS weight From Paper\n        self.intweight = torch.Tensor(out_channels, in_channels, *kernel_size)\n        self.alpha = torch.Tensor(out_channels, blck_numb, *kernel_size)\n\n        # KDS bias From Paper\n        self.KDSBias = KDSBias\n        self.CDS = CDS\n\n        if KDSBias:\n            self.KDSb = torch.Tensor(out_channels, blck_numb, *kernel_size)\n        if CDS:\n            self.CDSw = torch.Tensor(out_channels)\n            self.CDSb = torch.Tensor(out_channels)\n\n        self.reset_parameters()\n\n    def get_weight_res(self):\n        # Include expansion of alpha and multiplication with weights to include in the convolution layer here\n        alpha_res = torch.zeros(self.weight.shape).to(self.alpha.device)\n\n        # Include KDSBias\n        if self.KDSBias:\n            KDSBias_res = torch.zeros(self.weight.shape).to(self.alpha.device)\n\n        # Handy definitions:\n        nmb_blocks = self.alpha.shape[1]\n        total_depth = self.weight.shape[1]\n        bs = total_depth//nmb_blocks\n\n        llb = total_depth-(nmb_blocks-1)*bs\n\n        # Casting the Alpha values as same tensor shape as weight\n        for i in range(nmb_blocks):\n            length_blk = 
llb if i==nmb_blocks-1 else bs\n\n            shp = self.alpha.shape # Notice this is the same shape for the bias as well\n            to_repeat=self.alpha[:, i, ...].view(shp[0],1,shp[2],shp[3]).clone()\n            repeated = to_repeat.expand(shp[0], length_blk, shp[2], shp[3]).clone()\n            alpha_res[:, i*bs:(i*bs+length_blk), ...] = repeated.clone()\n\n            if self.KDSBias:\n                to_repeat = self.KDSb[:, i, ...].view(shp[0], 1, shp[2], shp[3]).clone()\n                repeated = to_repeat.expand(shp[0], length_blk, shp[2], shp[3]).clone()\n                KDSBias_res[:, i*bs:(i*bs+length_blk), ...] = repeated.clone()\n\n        if self.CDS:\n            to_repeat = self.CDSw.view(-1, 1, 1, 1)\n            repeated = to_repeat.expand_as(self.weight)\n            # NOTE: the expanded CDS scale is computed here but is not applied to weight_res below\n\n        # Element-wise multiplication of alpha and weight\n        weight_res = torch.mul(alpha_res, self.weight)\n        if self.KDSBias:\n            weight_res = torch.add(weight_res, KDSBias_res)\n        return weight_res\n\n    def forward(self, input):\n        # Get resulting weight\n        #weight_res = self.get_weight_res()\n\n        # Returning convolution\n        return F.conv2d(input, self.weight, self.bias,\n                            self.stride, self.padding, self.dilation,\n                            self.groups)\n\nclass DSConv2D(Conv):\n    def __init__(self, inc, ouc, k=1, s=1, p=None, g=1, act=True):\n        super().__init__(inc, ouc, k, s, p, g, act)\n        # Pass g by keyword: the sixth positional parameter of DSConv is dilation, not groups\n        self.conv = DSConv(inc, ouc, k, s, p, groups=g)"
  },
  {
    "path": "yolo-improve/yolov7-DecoupledHead.py",
    "content": "class IDetect_Decoupled(nn.Module):\n    stride = None  # strides computed during build\n    export = False  # onnx export\n    end2end = False\n    include_nms = False\n    concat = False\n\n    def __init__(self, nc=80, anchors=(), ch=()):  # detection layer\n        super(IDetect_Decoupled, self).__init__()\n        self.nc = nc  # number of classes\n        self.no = nc + 5  # number of outputs per anchor\n        self.nl = len(anchors)  # number of detection layers\n        self.na = len(anchors[0]) // 2  # number of anchors\n        self.grid = [torch.zeros(1)] * self.nl  # init grid\n        a = torch.tensor(anchors).float().view(self.nl, -1, 2)\n        self.register_buffer('anchors', a)  # shape(nl,na,2)\n        self.register_buffer('anchor_grid', a.clone().view(self.nl, 1, -1, 1, 1, 2))  # shape(nl,1,na,1,1,2)\n        \n        self.m_stem = nn.ModuleList(Conv(x, x, 1) for x in ch)  # stem conv\n        self.m_cls = nn.ModuleList(nn.Sequential(Conv(x, x, 3), nn.Conv2d(x, self.na * self.nc, 1)) for x in ch)  # cls conv\n        self.m_reg_conf = nn.ModuleList(Conv(x, x, 3) for x in ch)  # reg_conf stem conv\n        self.m_reg = nn.ModuleList(nn.Conv2d(x, self.na * 4, 1) for x in ch)  # reg conv\n        self.m_conf = nn.ModuleList(nn.Conv2d(x, self.na * 1, 1) for x in ch)  # conf conv\n        \n        self.ia_cls = nn.ModuleList(ImplicitA(x) for x in ch)\n        self.ia_reg = nn.ModuleList(ImplicitA(x) for x in ch)\n        self.ia_conf = nn.ModuleList(ImplicitA(x) for x in ch)\n        \n        self.im_cls = nn.ModuleList(ImplicitM(self.nc * self.na) for _ in ch)\n        self.im_reg = nn.ModuleList(ImplicitM(4 * self.na) for _ in ch)\n        self.im_conf = nn.ModuleList(ImplicitM(1 * self.na) for _ in ch)\n\n    def forward(self, x):\n        # x = x.copy()  # for profiling\n        z = []  # inference output\n        self.training |= self.export\n        for i in range(self.nl):\n            x[i] = self.m_stem[i](x[i])  # conv\n  
          \n            bs, _, ny, nx = x[i].shape\n            x_cls = self.im_cls[i](self.m_cls[i](self.ia_cls[i](x[i]))).view(bs, self.na, self.nc, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x_reg_conf = self.m_reg_conf[i](x[i])\n            x_reg = self.im_reg[i](self.m_reg[i](self.ia_reg[i](x_reg_conf))).view(bs, self.na, 4, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x_conf = self.im_conf[i](self.m_conf[i](self.ia_conf[i](x_reg_conf))).view(bs, self.na, 1, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x[i] = torch.cat([x_reg, x_conf, x_cls], dim=4)\n\n            if not self.training:  # inference\n                if self.grid[i].shape[2:4] != x[i].shape[2:4]:\n                    self.grid[i] = self._make_grid(nx, ny).to(x[i].device)\n\n                y = x[i].sigmoid()\n                y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy\n                y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n                z.append(y.view(bs, -1, self.no))\n\n        return x if self.training else (torch.cat(z, 1), x)\n    \n    def fuseforward(self, x):\n        # x = x.copy()  # for profiling\n        z = []  # inference output\n        self.training |= self.export\n        for i in range(self.nl):\n            x[i] = self.m_stem[i](x[i])  # conv\n            \n            bs, _, ny, nx = x[i].shape\n            x_cls = self.m_cls[i](x[i]).view(bs, self.na, self.nc, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x_reg_conf = self.m_reg_conf[i](x[i])\n            x_reg = self.m_reg[i](x_reg_conf).view(bs, self.na, 4, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x_conf = self.m_conf[i](x_reg_conf).view(bs, self.na, 1, ny, nx).permute(0, 1, 3, 4, 2).contiguous()\n            x[i] = torch.cat([x_reg, x_conf, x_cls], dim=4)\n\n            if not self.training:  # inference\n                if self.grid[i].shape[2:4] != x[i].shape[2:4]:\n                    
self.grid[i] = self._make_grid(nx, ny).to(x[i].device)\n\n                y = x[i].sigmoid()\n                if not torch.onnx.is_in_onnx_export():\n                    y[..., 0:2] = (y[..., 0:2] * 2. - 0.5 + self.grid[i]) * self.stride[i]  # xy\n                    y[..., 2:4] = (y[..., 2:4] * 2) ** 2 * self.anchor_grid[i]  # wh\n                else:\n                    xy, wh, conf = y.split((2, 2, self.nc + 1), 4)  # y.tensor_split((2, 4, 5), 4)  # torch 1.8.0\n                    xy = xy * (2. * self.stride[i]) + (self.stride[i] * (self.grid[i] - 0.5))  # new xy\n                    wh = wh ** 2 * (4 * self.anchor_grid[i].data)  # new wh\n                    y = torch.cat((xy, wh, conf), 4)\n                z.append(y.view(bs, -1, self.no))\n\n        if self.training:\n            out = x\n        elif self.end2end:\n            out = torch.cat(z, 1)\n        elif self.include_nms:\n            z = self.convert(z)\n            out = (z, )\n        elif self.concat:\n            out = torch.cat(z, 1)            \n        else:\n            out = (torch.cat(z, 1), x)\n\n        return out\n    \n    def fuse(self):\n        print(\"IDetect.fuse\")\n        # fuse ImplicitA and Convolution\n        for i in range(len(self.m_cls)):\n            c1,c2,_,_ = self.m_cls[i][-1].weight.shape\n            c1_,c2_, _,_ = self.ia_cls[i].implicit.shape\n            self.m_cls[i][-1].bias += torch.matmul(self.m_cls[i][-1].weight.reshape(c1,c2),self.ia_cls[i].implicit.reshape(c2_,c1_)).squeeze(1)\n        \n        for i in range(len(self.m_reg)):\n            c1,c2,_,_ = self.m_reg[i].weight.shape\n            c1_,c2_, _,_ = self.ia_reg[i].implicit.shape\n            self.m_reg[i].bias += torch.matmul(self.m_reg[i].weight.reshape(c1,c2),self.ia_reg[i].implicit.reshape(c2_,c1_)).squeeze(1)\n        \n        for i in range(len(self.m_conf)):\n            c1,c2,_,_ = self.m_conf[i].weight.shape\n            c1_,c2_, _,_ = self.ia_conf[i].implicit.shape\n            
self.m_conf[i].bias += torch.matmul(self.m_conf[i].weight.reshape(c1,c2),self.ia_conf[i].implicit.reshape(c2_,c1_)).squeeze(1)\n\n        # fuse ImplicitM and Convolution\n        for i in range(len(self.m_cls)):\n            c1,c2, _,_ = self.im_cls[i].implicit.shape\n            self.m_cls[i][-1].bias *= self.im_cls[i].implicit.reshape(c2)\n            self.m_cls[i][-1].weight *= self.im_cls[i].implicit.transpose(0,1)\n        \n        for i in range(len(self.m_reg)):\n            c1,c2, _,_ = self.im_reg[i].implicit.shape\n            self.m_reg[i].bias *= self.im_reg[i].implicit.reshape(c2)\n            self.m_reg[i].weight *= self.im_reg[i].implicit.transpose(0,1)\n        \n        for i in range(len(self.m_conf)):\n            c1,c2, _,_ = self.im_conf[i].implicit.shape\n            self.m_conf[i].bias *= self.im_conf[i].implicit.reshape(c2)\n            self.m_conf[i].weight *= self.im_conf[i].implicit.transpose(0,1)\n            \n    @staticmethod\n    def _make_grid(nx=20, ny=20):\n        yv, xv = torch.meshgrid([torch.arange(ny), torch.arange(nx)])\n        return torch.stack((xv, yv), 2).view((1, 1, ny, nx, 2)).float()\n\n    def convert(self, z):\n        z = torch.cat(z, 1)\n        box = z[:, :, :4]\n        conf = z[:, :, 4:5]\n        score = z[:, :, 5:]\n        score *= conf\n        convert_matrix = torch.tensor([[1, 0, 1, 0], [0, 1, 0, 1], [-0.5, 0, 0.5, 0], [0, -0.5, 0, 0.5]],\n                                           dtype=torch.float32,\n                                           device=z.device)\n        box @= convert_matrix                          \n        return (box, score)\n\ndef _initialize_biases(self, cf=None):  # initialize biases into Detect(), cf is class frequency\n    # https://arxiv.org/abs/1708.02002 section 3.3\n    # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1.\n    m = self.model[-1]  # Detect() module\n    \n    if isinstance(m, IDetect):\n        for mi, s in 
zip(m.m, m.stride):  # from\n            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)\n            b.data[:, 4] += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)\n            b.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls\n            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)\n    elif isinstance(m, IDetect_Decoupled):\n        for mi, s in zip(m.m_conf, m.stride):  # from\n            b = mi.bias.view(m.na, -1)  # conv.bias(na) to (na,1)\n            b.data += math.log(8 / (640 / s) ** 2)  # obj (8 objects per 640 image)\n            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)\n\n        for mi, s in zip(m.m_cls, m.stride):  # from\n            b = mi[-1].bias.view(m.na, -1)  # conv.bias(na*nc) to (na,nc)\n            b.data += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum())  # cls\n            mi[-1].bias = torch.nn.Parameter(b.view(-1), requires_grad=True)\n\n# Model.__init__ (build strides and anchors)\nif isinstance(m, IDetect_Decoupled):\n    s = 256  # 2x min stride\n    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])  # forward\n    check_anchor_order(m)\n    m.anchors /= m.stride.view(-1, 1, 1)\n    self.stride = m.stride\n    self._initialize_biases()  # only run once\n    # print('Strides: %s' % m.stride.tolist())"
  },
  {
    "path": "yolo-improve/yolov7-DySnakeConv.py",
    "content": "class DySnakeConv(nn.Module):\n    def __init__(self, inc, ouc, k=3, act=True) -> None:\n        super().__init__()\n        \n        self.conv_0 = Conv(inc, ouc, k, act=act)\n        self.conv_x = DSConv(inc, ouc, 0, k, act=True)\n        self.conv_y = DSConv(inc, ouc, 1, k, act=True)\n        self.conv_1x1 = Conv(ouc * 3, ouc, 1, act=act)\n    \n    def forward(self, x):\n        return self.conv_1x1(torch.cat([self.conv_0(x), self.conv_x(x), self.conv_y(x)], dim=1))\n\nclass DSConv(nn.Module):\n    def __init__(self, in_ch, out_ch, morph, kernel_size=3, if_offset=True, extend_scope=1, act=True):\n        \"\"\"\n        The Dynamic Snake Convolution\n        :param in_ch: input channel\n        :param out_ch: output channel\n        :param kernel_size: the size of kernel\n        :param extend_scope: the range to expand (default 1 for this method)\n        :param morph: the morphology of the convolution kernel is mainly divided into two types\n                        along the x-axis (0) and the y-axis (1) (see the paper for details)\n        :param if_offset: whether deformation is required, if it is False, it is the standard convolution kernel\n        \"\"\"\n        super(DSConv, self).__init__()\n        # use the <offset_conv> to learn the deformable offset\n        self.offset_conv = nn.Conv2d(in_ch, 2 * kernel_size, 3, padding=1)\n        self.bn = nn.BatchNorm2d(2 * kernel_size)\n        self.kernel_size = kernel_size\n\n        # two types of the DSConv (along x-axis and y-axis)\n        self.dsc_conv_x = nn.Conv2d(\n            in_ch,\n            out_ch,\n            kernel_size=(kernel_size, 1),\n            stride=(kernel_size, 1),\n            padding=0,\n        )\n        self.dsc_conv_y = nn.Conv2d(\n            in_ch,\n            out_ch,\n            kernel_size=(1, kernel_size),\n            stride=(1, kernel_size),\n            padding=0,\n        )\n\n        self.gn = nn.GroupNorm(out_ch // 4, out_ch)\n        self.act = 
nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())\n\n        self.extend_scope = extend_scope\n        self.morph = morph\n        self.if_offset = if_offset\n\n    def forward(self, f):\n        offset = self.offset_conv(f)\n        offset = self.bn(offset)\n        # We need a range of deformation between -1 and 1 to mimic the snake's swing\n        offset = torch.tanh(offset)\n        input_shape = f.shape\n        dsc = DSC(input_shape, self.kernel_size, self.extend_scope, self.morph)\n        deformed_feature = dsc.deform_conv(f, offset, self.if_offset)\n        if self.morph == 0:\n            x = self.dsc_conv_x(deformed_feature.type(f.dtype))\n            x = self.gn(x)\n            x = self.act(x)\n            return x\n        else:\n            x = self.dsc_conv_y(deformed_feature.type(f.dtype))\n            x = self.gn(x)\n            x = self.act(x)\n            return x\n\n\n# Core code, for ease of understanding, we mark the dimensions of input and output next to the code\nclass DSC(object):\n    def __init__(self, input_shape, kernel_size, extend_scope, morph):\n        self.num_points = kernel_size\n        self.width = input_shape[2]\n        self.height = input_shape[3]\n        self.morph = morph\n        self.extend_scope = extend_scope  # offset (-1 ~ 1) * extend_scope\n\n        # define feature map shape\n        \"\"\"\n        B: Batch size  C: Channel  W: Width  H: Height\n        \"\"\"\n        self.num_batch = input_shape[0]\n        self.num_channels = input_shape[1]\n\n    \"\"\"\n    input: offset [B,2*K,W,H]  K: Kernel size (2*K: 2D image, deformation contains <x_offset> and <y_offset>)\n    output_x: [B,1,W,K*H]   coordinate map\n    output_y: [B,1,K*W,H]   coordinate map\n    \"\"\"\n\n    def _coordinate_map_3D(self, offset, if_offset):\n        device = offset.device\n        # offset\n        y_offset, x_offset = torch.split(offset, self.num_points, dim=1)\n\n        y_center = 
torch.arange(0, self.width).repeat([self.height])\n        y_center = y_center.reshape(self.height, self.width)\n        y_center = y_center.permute(1, 0)\n        y_center = y_center.reshape([-1, self.width, self.height])\n        y_center = y_center.repeat([self.num_points, 1, 1]).float()\n        y_center = y_center.unsqueeze(0)\n\n        x_center = torch.arange(0, self.height).repeat([self.width])\n        x_center = x_center.reshape(self.width, self.height)\n        x_center = x_center.permute(0, 1)\n        x_center = x_center.reshape([-1, self.width, self.height])\n        x_center = x_center.repeat([self.num_points, 1, 1]).float()\n        x_center = x_center.unsqueeze(0)\n\n        if self.morph == 0:\n            \"\"\"\n            Initialize the kernel and flatten the kernel\n                y: only need 0\n                x: -num_points//2 ~ num_points//2 (Determined by the kernel size)\n                !!! The related PPT will be submitted later, and the PPT will contain the whole changes of each step\n            \"\"\"\n            y = torch.linspace(0, 0, 1)\n            x = torch.linspace(\n                -int(self.num_points // 2),\n                int(self.num_points // 2),\n                int(self.num_points),\n            )\n\n            y, x = torch.meshgrid(y, x)\n            y_spread = y.reshape(-1, 1)\n            x_spread = x.reshape(-1, 1)\n\n            y_grid = y_spread.repeat([1, self.width * self.height])\n            y_grid = y_grid.reshape([self.num_points, self.width, self.height])\n            y_grid = y_grid.unsqueeze(0)  # [B*K*K, W,H]\n\n            x_grid = x_spread.repeat([1, self.width * self.height])\n            x_grid = x_grid.reshape([self.num_points, self.width, self.height])\n            x_grid = x_grid.unsqueeze(0)  # [B*K*K, W,H]\n\n            y_new = y_center + y_grid\n            x_new = x_center + x_grid\n\n            y_new = y_new.repeat(self.num_batch, 1, 1, 1).to(device)\n            x_new = 
x_new.repeat(self.num_batch, 1, 1, 1).to(device)\n\n            y_offset_new = y_offset.detach().clone()\n\n            if if_offset:\n                y_offset = y_offset.permute(1, 0, 2, 3)\n                y_offset_new = y_offset_new.permute(1, 0, 2, 3)\n                center = int(self.num_points // 2)\n\n                # The center position remains unchanged and the rest of the positions begin to swing\n                # This part is quite simple. The main idea is that \"offset is an iterative process\"\n                y_offset_new[center] = 0\n                for index in range(1, center):\n                    y_offset_new[center + index] = (y_offset_new[center + index - 1] + y_offset[center + index])\n                    y_offset_new[center - index] = (y_offset_new[center - index + 1] + y_offset[center - index])\n                y_offset_new = y_offset_new.permute(1, 0, 2, 3).to(device)\n                y_new = y_new.add(y_offset_new.mul(self.extend_scope))\n\n            y_new = y_new.reshape(\n                [self.num_batch, self.num_points, 1, self.width, self.height])\n            y_new = y_new.permute(0, 3, 1, 4, 2)\n            y_new = y_new.reshape([\n                self.num_batch, self.num_points * self.width, 1 * self.height\n            ])\n            x_new = x_new.reshape(\n                [self.num_batch, self.num_points, 1, self.width, self.height])\n            x_new = x_new.permute(0, 3, 1, 4, 2)\n            x_new = x_new.reshape([\n                self.num_batch, self.num_points * self.width, 1 * self.height\n            ])\n            return y_new, x_new\n\n        else:\n            \"\"\"\n            Initialize the kernel and flatten the kernel\n                y: -num_points//2 ~ num_points//2 (Determined by the kernel size)\n                x: only need 0\n            \"\"\"\n            y = torch.linspace(\n                -int(self.num_points // 2),\n                int(self.num_points // 2),\n                
int(self.num_points),\n            )\n            x = torch.linspace(0, 0, 1)\n\n            y, x = torch.meshgrid(y, x)\n            y_spread = y.reshape(-1, 1)\n            x_spread = x.reshape(-1, 1)\n\n            y_grid = y_spread.repeat([1, self.width * self.height])\n            y_grid = y_grid.reshape([self.num_points, self.width, self.height])\n            y_grid = y_grid.unsqueeze(0)\n\n            x_grid = x_spread.repeat([1, self.width * self.height])\n            x_grid = x_grid.reshape([self.num_points, self.width, self.height])\n            x_grid = x_grid.unsqueeze(0)\n\n            y_new = y_center + y_grid\n            x_new = x_center + x_grid\n\n            y_new = y_new.repeat(self.num_batch, 1, 1, 1)\n            x_new = x_new.repeat(self.num_batch, 1, 1, 1)\n\n            y_new = y_new.to(device)\n            x_new = x_new.to(device)\n            x_offset_new = x_offset.detach().clone()\n\n            if if_offset:\n                x_offset = x_offset.permute(1, 0, 2, 3)\n                x_offset_new = x_offset_new.permute(1, 0, 2, 3)\n                center = int(self.num_points // 2)\n                x_offset_new[center] = 0\n                for index in range(1, center):\n                    x_offset_new[center + index] = (x_offset_new[center + index - 1] + x_offset[center + index])\n                    x_offset_new[center - index] = (x_offset_new[center - index + 1] + x_offset[center - index])\n                x_offset_new = x_offset_new.permute(1, 0, 2, 3).to(device)\n                x_new = x_new.add(x_offset_new.mul(self.extend_scope))\n\n            y_new = y_new.reshape(\n                [self.num_batch, 1, self.num_points, self.width, self.height])\n            y_new = y_new.permute(0, 3, 1, 4, 2)\n            y_new = y_new.reshape([\n                self.num_batch, 1 * self.width, self.num_points * self.height\n            ])\n            x_new = x_new.reshape(\n                [self.num_batch, 1, self.num_points, self.width, 
self.height])\n            x_new = x_new.permute(0, 3, 1, 4, 2)\n            x_new = x_new.reshape([\n                self.num_batch, 1 * self.width, self.num_points * self.height\n            ])\n            return y_new, x_new\n\n    \"\"\"\n    input: input feature map [N,C,D,W,H]；coordinate map [N,K*D,K*W,K*H] \n    output: [N,1,K*D,K*W,K*H]  deformed feature map\n    \"\"\"\n    def _bilinear_interpolate_3D(self, input_feature, y, x):\n        device = input_feature.device\n        y = y.reshape([-1]).float()\n        x = x.reshape([-1]).float()\n\n        zero = torch.zeros([]).int()\n        max_y = self.width - 1\n        max_x = self.height - 1\n\n        # find 8 grid locations\n        y0 = torch.floor(y).int()\n        y1 = y0 + 1\n        x0 = torch.floor(x).int()\n        x1 = x0 + 1\n\n        # clip out coordinates exceeding feature map volume\n        y0 = torch.clamp(y0, zero, max_y)\n        y1 = torch.clamp(y1, zero, max_y)\n        x0 = torch.clamp(x0, zero, max_x)\n        x1 = torch.clamp(x1, zero, max_x)\n\n        input_feature_flat = input_feature.flatten()\n        input_feature_flat = input_feature_flat.reshape(\n            self.num_batch, self.num_channels, self.width, self.height)\n        input_feature_flat = input_feature_flat.permute(0, 2, 3, 1)\n        input_feature_flat = input_feature_flat.reshape(-1, self.num_channels)\n        dimension = self.height * self.width\n\n        base = torch.arange(self.num_batch) * dimension\n        base = base.reshape([-1, 1]).float()\n\n        repeat = torch.ones([self.num_points * self.width * self.height\n                             ]).unsqueeze(0)\n        repeat = repeat.float()\n\n        base = torch.matmul(base, repeat)\n        base = base.reshape([-1])\n\n        base = base.to(device)\n\n        base_y0 = base + y0 * self.height\n        base_y1 = base + y1 * self.height\n\n        # top rectangle of the neighbourhood volume\n        index_a0 = base_y0 - base + x0\n        index_c0 
= base_y0 - base + x1\n\n        # bottom rectangle of the neighbourhood volume\n        index_a1 = base_y1 - base + x0\n        index_c1 = base_y1 - base + x1\n\n        # get 8 grid values\n        value_a0 = input_feature_flat[index_a0.type(torch.int64)].to(device)\n        value_c0 = input_feature_flat[index_c0.type(torch.int64)].to(device)\n        value_a1 = input_feature_flat[index_a1.type(torch.int64)].to(device)\n        value_c1 = input_feature_flat[index_c1.type(torch.int64)].to(device)\n\n        # find 8 grid locations\n        y0 = torch.floor(y).int()\n        y1 = y0 + 1\n        x0 = torch.floor(x).int()\n        x1 = x0 + 1\n\n        # clip out coordinates exceeding feature map volume\n        y0 = torch.clamp(y0, zero, max_y + 1)\n        y1 = torch.clamp(y1, zero, max_y + 1)\n        x0 = torch.clamp(x0, zero, max_x + 1)\n        x1 = torch.clamp(x1, zero, max_x + 1)\n\n        x0_float = x0.float()\n        x1_float = x1.float()\n        y0_float = y0.float()\n        y1_float = y1.float()\n\n        vol_a0 = ((y1_float - y) * (x1_float - x)).unsqueeze(-1).to(device)\n        vol_c0 = ((y1_float - y) * (x - x0_float)).unsqueeze(-1).to(device)\n        vol_a1 = ((y - y0_float) * (x1_float - x)).unsqueeze(-1).to(device)\n        vol_c1 = ((y - y0_float) * (x - x0_float)).unsqueeze(-1).to(device)\n\n        outputs = (value_a0 * vol_a0 + value_c0 * vol_c0 + value_a1 * vol_a1 +\n                   value_c1 * vol_c1)\n\n        if self.morph == 0:\n            outputs = outputs.reshape([\n                self.num_batch,\n                self.num_points * self.width,\n                1 * self.height,\n                self.num_channels,\n            ])\n            outputs = outputs.permute(0, 3, 1, 2)\n        else:\n            outputs = outputs.reshape([\n                self.num_batch,\n                1 * self.width,\n                self.num_points * self.height,\n                self.num_channels,\n            ])\n            outputs = 
outputs.permute(0, 3, 1, 2)\n        return outputs\n\n    def deform_conv(self, input, offset, if_offset):\n        y, x = self._coordinate_map_3D(offset, if_offset)\n        deformed_feature = self._bilinear_interpolate_3D(input, y, x)\n        return deformed_feature"
  },
  {
    "path": "yolo-improve/yolov7-EVC.py",
    "content": "# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# yolov7-tiny backbone\nbackbone:\n  # [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True\n  [[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 0-P1/2  \n  \n   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 1-P2/4    \n\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 2\n\n   [-1, 1, MP, []],  # 3-P3/8\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 4\n\n   [-1, 1, MP, []],  # 5-P4/16\n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 6\n\n   [-1, 1, MP, []],  # 7-P5/32\n   [-1, 1, Yolov7_Tiny_E_ELAN, [512, 256, nn.LeakyReLU(0.1)]], # 8\n  ]\n\n# yolov7-tiny head\nhead:\n  [[-1, 1, Yolov7_Tiny_SPP, [256, nn.LeakyReLU(0.1)]], # 9-Yolov7-tiny-spp\n   \n   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], \n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [6, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P4\n   [-1, 1, EVCBlock, []],\n   [[-1, -3], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 15\n\n   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [4, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P3\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 20\n   \n   [-1, 1, Conv, [128, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 15], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 23\n   \n   [-1, 1, Conv, [256, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 9], 1, Concat, [1]],\n   \n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 26\n\n   [20, 1, Conv, [128, 3, 1, 
None, 1, nn.LeakyReLU(0.1)]], # 27-P3\n   [23, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 28-P4\n   [26, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 29-P5\n\n   [[27, 28, 29], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)\n  ]"
  },
  {
    "path": "yolo-improve/yolov7-MPDiou.py",
    "content": "def bbox_mpdiou(box1, box2, x1y1x2y2=True, mpdiou_hw=None, grid=None, eps=1e-7):\n    # Returns the MPDIoU of box1 to box2. box1 is 4xn, box2 is nx4\n    # mpdiou_hw: squared diagonal (h^2 + w^2) of the feature map; grid: 2xn cell offsets added to the xy of both boxes\n    # NOTE: box1 and box2 are shifted in place, so the tensors passed by the caller are modified\n    box2 = box2.T\n    box1[:2] += grid\n    box2[:2] += grid\n\n    # Get the coordinates of bounding boxes\n    if x1y1x2y2:  # x1, y1, x2, y2 = box1\n        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]\n        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]\n    else:  # transform from xywh to xyxy\n        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2\n        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2\n        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2\n        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2\n    \n    # Intersection area\n    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \\\n            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)\n\n    # Union Area\n    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps\n    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps\n    union = w1 * h1 + w2 * h2 - inter + eps\n\n    iou = inter / union\n    d1 = (b2_x1 - b1_x1) ** 2 + (b2_y1 - b1_y1) ** 2\n    d2 = (b2_x2 - b1_x2) ** 2 + (b2_y2 - b1_y2) ** 2\n    return iou - d1 / mpdiou_hw - d2 / mpdiou_hw  # MPDIoU\n\n# ComputeLoss\niou = bbox_mpdiou(pbox.T, tbox[i], x1y1x2y2=False, mpdiou_hw=pi.size(2) ** 2 + pi.size(3) ** 2, grid=torch.stack([gj, gi]))  # iou(prediction, target)\n\n# ComputeLossOTA\niou = bbox_mpdiou(pbox.T, selected_tbox, x1y1x2y2=False, mpdiou_hw=pi.size(2) ** 2 + pi.size(3) ** 2, grid=torch.stack([gj, gi]))  # iou(prediction, target)"
  },
  {
    "path": "yolo-improve/yolov7-NWD.py",
    "content": "def wasserstein_loss(pred, target, eps=1e-7, constant=12.8):\n    r\"\"\"Normalized Wasserstein distance (NWD) between paired boxes.\n\n    Each box is compared through its squared center distance plus\n    ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4, and the square root of that sum\n    is mapped to a similarity by exp(-sqrt(.) / constant).\n    Code is modified from https://github.com/Zzh-tju/CIoU\n    (CIoU paper: https://arxiv.org/abs/2005.03572).\n    Args:\n        pred (Tensor): Predicted bboxes of format (x_center, y_center, w, h),\n            shape (n, 4).\n        target (Tensor): Corresponding gt bboxes, shape (n, 4).\n        eps (float): Small value added for numerical stability.\n        constant (float): Normalizing constant in the exponent.\n    Return:\n        Tensor: Similarity in (0, 1), shape (n,); use 1 - value as the loss.\n    \"\"\"\n\n    center1 = pred[:, :2]\n    center2 = target[:, :2]\n\n    whs = center1[:, :2] - center2[:, :2]\n\n    center_distance = whs[:, 0] * whs[:, 0] + whs[:, 1] * whs[:, 1] + eps  # squared center distance\n\n    w1 = pred[:, 2]  + eps\n    h1 = pred[:, 3]  + eps\n    w2 = target[:, 2] + eps\n    h2 = target[:, 3] + eps\n\n    wh_distance = ((w1 - w2) ** 2 + (h1 - h2) ** 2) / 4\n\n    wasserstein_2 = center_distance + wh_distance\n    return torch.exp(-torch.sqrt(wasserstein_2) / constant)\n\n# ComputeLoss\nnwd = wasserstein_loss(pbox, tbox[i])\niou_ratio = 0.5\nlbox += (1 - iou_ratio) * (1.0 - nwd).mean() + iou_ratio * (1.0 - iou).mean()  # iou loss\n\n# Objectness\niou = (iou.detach() * iou_ratio + nwd.detach() * (1 - iou_ratio)).clamp(0, 1).type(tobj.dtype)"
  },
  {
    "path": "yolo-improve/yolov7-PConv.py",
    "content": "class PConv(nn.Module):\n    def __init__(self, dim, ouc, n_div=4, forward='split_cat'):\n        super().__init__()\n        self.dim_conv3 = dim // n_div\n        self.dim_untouched = dim - self.dim_conv3\n        self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, 3, 1, 1, bias=False)\n        self.conv = Conv(dim, ouc, k=1)\n\n        if forward == 'slicing':\n            self.forward = self.forward_slicing\n        elif forward == 'split_cat':\n            self.forward = self.forward_split_cat\n        else:\n            raise NotImplementedError\n\n    def forward_slicing(self, x):\n        # only for inference\n        x = x.clone()   # !!! Keep the original input intact for the residual connection later\n        x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])\n        x = self.conv(x)\n        return x\n\n    def forward_split_cat(self, x):\n        # for training/inference\n        x1, x2 = torch.split(x, [self.dim_conv3, self.dim_untouched], dim=1)\n        x1 = self.partial_conv3(x1)\n        x = torch.cat((x1, x2), 1)\n        x = self.conv(x)\n        return x\n\n\n\n# !!!!!!!!!!!!!!!!!!!!!! 
yolov7-PConv.yaml\n# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors:\n  - [12,16, 19,36, 40,28]  # P3/8\n  - [36,75, 76,55, 72,146]  # P4/16\n  - [142,110, 192,243, 459,401]  # P5/32\n\n# yolov7 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [32, 3, 1]],  # 0\n  \n   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2      \n   [-1, 1, Conv, [64, 3, 1]],\n   \n   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  \n   [-1, 1, Conv, [64, 1, 1]],\n   [-2, 1, Conv, [64, 1, 1]],\n   [-1, 1, PConv, [64]],\n   [-1, 1, PConv, [64]],\n   [-1, 1, PConv, [64]],\n   [-1, 1, PConv, [64]],\n   [[-1, -3, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [256, 1, 1]],  # 11\n         \n   [-1, 1, MP, []],\n   [-1, 1, Conv, [128, 1, 1]],\n   [-3, 1, Conv, [128, 1, 1]],\n   [-1, 1, Conv, [128, 3, 2]],\n   [[-1, -3], 1, Concat, [1]],  # 16-P3/8  \n   [-1, 1, Conv, [128, 1, 1]],\n   [-2, 1, Conv, [128, 1, 1]],\n   [-1, 1, PConv, [128]],\n   [-1, 1, PConv, [128]],\n   [-1, 1, PConv, [128]],\n   [-1, 1, PConv, [128]],\n   [[-1, -3, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [512, 1, 1]],  # 24\n         \n   [-1, 1, MP, []],\n   [-1, 1, Conv, [256, 1, 1]],\n   [-3, 1, Conv, [256, 1, 1]],\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, -3], 1, Concat, [1]],  # 29-P4/16  \n   [-1, 1, Conv, [256, 1, 1]],\n   [-2, 1, Conv, [256, 1, 1]],\n   [-1, 1, PConv, [256]],\n   [-1, 1, PConv, [256]],\n   [-1, 1, PConv, [256]],\n   [-1, 1, PConv, [256]],\n   [[-1, -3, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [1024, 1, 1]],  # 37\n         \n   [-1, 1, MP, []],\n   [-1, 1, Conv, [512, 1, 1]],\n   [-3, 1, Conv, [512, 1, 1]],\n   [-1, 1, Conv, [512, 3, 2]],\n   [[-1, -3], 1, Concat, [1]],  # 42-P5/32  \n   [-1, 1, Conv, [256, 1, 1]],\n   [-2, 1, Conv, [256, 1, 1]],\n   [-1, 1, PConv, [256]],\n   [-1, 1, PConv, [256]],\n   [-1, 1, PConv, [256]],\n   [-1, 1, PConv, [256]],\n   [[-1, -3, -5, -6], 1, Concat, 
[1]],\n   [-1, 1, Conv, [1024, 1, 1]],  # 50\n  ]\n\n# yolov7 head\nhead:\n  [[-1, 1, SPPCSPC, [512]], # 51\n  \n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [37, 1, Conv, [256, 1, 1]], # route backbone P4\n   [[-1, -2], 1, Concat, [1]],\n   \n   [-1, 1, Conv, [256, 1, 1]],\n   [-2, 1, Conv, [256, 1, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [256, 1, 1]], # 63\n   \n   [-1, 1, Conv, [128, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [24, 1, Conv, [128, 1, 1]], # route backbone P3\n   [[-1, -2], 1, Concat, [1]],\n   \n   [-1, 1, Conv, [128, 1, 1]],\n   [-2, 1, Conv, [128, 1, 1]],\n   [-1, 1, Conv, [64, 3, 1]],\n   [-1, 1, Conv, [64, 3, 1]],\n   [-1, 1, Conv, [64, 3, 1]],\n   [-1, 1, Conv, [64, 3, 1]],\n   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [128, 1, 1]], # 75\n      \n   [-1, 1, MP, []],\n   [-1, 1, Conv, [128, 1, 1]],\n   [-3, 1, Conv, [128, 1, 1]],\n   [-1, 1, Conv, [128, 3, 2]],\n   [[-1, -3, 63], 1, Concat, [1]],\n   \n   [-1, 1, Conv, [256, 1, 1]],\n   [-2, 1, Conv, [256, 1, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [-1, 1, Conv, [128, 3, 1]],\n   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [256, 1, 1]], # 88\n      \n   [-1, 1, MP, []],\n   [-1, 1, Conv, [256, 1, 1]],\n   [-3, 1, Conv, [256, 1, 1]],\n   [-1, 1, Conv, [256, 3, 2]],\n   [[-1, -3, 51], 1, Concat, [1]],\n   \n   [-1, 1, Conv, [512, 1, 1]],\n   [-2, 1, Conv, [512, 1, 1]],\n   [-1, 1, Conv, [256, 3, 1]],\n   [-1, 1, Conv, [256, 3, 1]],\n   [-1, 1, Conv, [256, 3, 1]],\n   [-1, 1, Conv, [256, 3, 1]],\n   [[-1, -2, -3, -4, -5, -6], 1, Concat, [1]],\n   [-1, 1, Conv, [512, 1, 1]], # 101\n   \n   [75, 1, RepConv, [256, 3, 1]],\n   [88, 1, RepConv, [512, 3, 1]],\n   [101, 1, RepConv, [1024, 3, 
1]],\n\n   [[102,103,104], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov7-RFEM.py",
    "content": "class TridentBlock(nn.Module):\n    def __init__(self, c1, c2, stride=1, c=False, e=0.5, padding=[1, 2, 3], dilate=[1, 2, 3], bias=False):\n        super(TridentBlock, self).__init__()\n        self.stride = stride\n        self.c = c\n        c_ = int(c2 * e)\n        self.padding = padding\n        self.dilate = dilate\n        self.share_weightconv1 = nn.Parameter(torch.Tensor(c_, c1, 1, 1))\n        self.share_weightconv2 = nn.Parameter(torch.Tensor(c2, c_, 3, 3))\n\n        self.bn1 = nn.BatchNorm2d(c_)\n        self.bn2 = nn.BatchNorm2d(c2)\n\n        self.act = nn.SiLU()\n\n        nn.init.kaiming_uniform_(self.share_weightconv1, nonlinearity=\"relu\")\n        nn.init.kaiming_uniform_(self.share_weightconv2, nonlinearity=\"relu\")\n\n        if bias:\n            self.bias = nn.Parameter(torch.Tensor(c2))\n        else:\n            self.bias = None\n\n        if self.bias is not None:\n            nn.init.constant_(self.bias, 0)\n\n    def forward_for_small(self, x):\n        residual = x\n        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)\n        out = self.bn1(out)\n        out = self.act(out)\n\n        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[0],\n                                   dilation=self.dilate[0])\n        out = self.bn2(out)\n        out += residual\n        out = self.act(out)\n\n        return out\n\n    def forward_for_middle(self, x):\n        residual = x\n        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)\n        out = self.bn1(out)\n        out = self.act(out)\n\n        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[1],\n                                   dilation=self.dilate[1])\n        out = self.bn2(out)\n        out += residual\n        out = self.act(out)\n\n        return out\n\n    def forward_for_big(self, x):\n        
residual = x\n        out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)\n        out = self.bn1(out)\n        out = self.act(out)\n\n        out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[2],\n                                   dilation=self.dilate[2])\n        out = self.bn2(out)\n        out += residual\n        out = self.act(out)\n\n        return out\n\n    def forward(self, x):\n        xm = x\n        base_feat = []\n        if self.c is not False:\n            x1 = self.forward_for_small(x)\n            x2 = self.forward_for_middle(x)\n            x3 = self.forward_for_big(x)\n        else:\n            x1 = self.forward_for_small(xm[0])\n            x2 = self.forward_for_middle(xm[1])\n            x3 = self.forward_for_big(xm[2])\n\n        base_feat.append(x1)\n        base_feat.append(x2)\n        base_feat.append(x3)\n\n        return base_feat\n\nclass RFEM(nn.Module):\n    def __init__(self, c1, c2, n=1, e=0.5, stride=1):\n        super(RFEM, self).__init__()\n        c = True\n        layers = []\n        layers.append(TridentBlock(c1, c2, stride=stride, c=c, e=e))\n        c1 = c2\n        for i in range(1, n):\n            layers.append(TridentBlock(c1, c2))\n        self.layer = nn.Sequential(*layers)\n        self.bn = nn.BatchNorm2d(c2)\n        self.act = nn.SiLU()\n\n    def forward(self, x):\n        out = self.layer(x)\n        out = out[0] + out[1] + out[2] + x\n        out = self.act(self.bn(out))\n        return out\n\n# Yolov7-RFEM\n# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors:\n  - [12,16, 19,36, 40,28]  # P3/8\n  - [36,75, 76,55, 72,146]  # P4/16\n  - [142,110, 192,243, 459,401]  # P5/32\n\n# yolov7 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [32, 3, 1]],  # 0\n  \n   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2      \n   
[-1, 1, Conv, [64, 3, 1]],\n   \n   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  \n   [-1, 1, Yolov7_E_ELAN, [256, 64]], # 4\n         \n   [-1, 1, V7DownSampling, [128]],  # 5-P3/8  \n   [-1, 1, Yolov7_E_ELAN, [512, 128]], # 6\n         \n   [-1, 1, V7DownSampling, [256]],  # 7-P4/16  \n   [-1, 1, Yolov7_E_ELAN, [1024, 256]], # 8\n         \n   [-1, 1, V7DownSampling, [512]],  # 9-P5/32  \n   [-1, 1, Yolov7_E_ELAN, [1024, 256]],  # 10\n  ]\n\n# yolov7 head\nhead:\n  [[-1, 1, SPPCSPC, [512]], # 11\n   [-1, 1, RFEM, [512]], # 12\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [8, 1, Conv, [256, 1, 1]], # 15 route backbone P4\n   [[-1, -2], 1, Concat, [1]], # 16\n   \n   [-1, 1, Yolov7_E_ELAN_NECK, [256, 128]], # 17\n   \n   [-1, 1, Conv, [128, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [6, 1, Conv, [128, 1, 1]], # 20 route backbone P3\n   [[-1, -2], 1, Concat, [1]], # 21\n   \n   [-1, 1, Yolov7_E_ELAN_NECK, [128, 64]], # 22\n      \n   [[-1, 17], 1, V7DownSampling_Neck, [128]], # 23\n   \n   [-1, 1, Yolov7_E_ELAN_NECK, [256, 128]], # 24\n      \n   [[-1, 12], 1, V7DownSampling_Neck, [256]], # 25\n   \n   [-1, 1, Yolov7_E_ELAN_NECK, [512, 256]], # 26\n   \n   [22, 1, RepConv, [256, 3, 1]], # 27-P3\n   [24, 1, RepConv, [512, 3, 1]], # 28-P4\n   [26, 1, RepConv, [1024, 3, 1]], # 29-P5\n\n   [[27, 28, 29], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)\n]\n"
  },
  {
    "path": "yolo-improve/yolov7-RepNCSPELAN.py",
    "content": "class RepConvN(nn.Module):\n    \"\"\"RepConv is a basic rep-style block, including training and deploy status\n    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py\n    \"\"\"\n    default_act = nn.SiLU()  # default activation\n\n    def __init__(self, c1, c2, k=3, s=1, p=1, g=1, d=1, act=True, bn=False, deploy=False):\n        super().__init__()\n        assert k == 3 and p == 1\n        self.g = g\n        self.c1 = c1\n        self.c2 = c2\n        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()\n\n        self.bn = None\n        self.conv1 = Conv(c1, c2, k, s, p=p, g=g, act=False)\n        self.conv2 = Conv(c1, c2, 1, s, p=(p - k // 2), g=g, act=False)\n\n    def forward_fuse(self, x):\n        \"\"\"Forward process\"\"\"\n        return self.act(self.conv(x))\n\n    def forward(self, x):\n        \"\"\"Forward process\"\"\"\n        id_out = 0 if self.bn is None else self.bn(x)\n        return self.act(self.conv1(x) + self.conv2(x) + id_out)\n\n    def get_equivalent_kernel_bias(self):\n        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1)\n        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2)\n        kernelid, biasid = self._fuse_bn_tensor(self.bn)\n        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid\n\n    def _avg_to_3x3_tensor(self, avgp):\n        channels = self.c1\n        groups = self.g\n        kernel_size = avgp.kernel_size\n        input_dim = channels // groups\n        k = torch.zeros((channels, input_dim, kernel_size, kernel_size))\n        k[np.arange(channels), np.tile(np.arange(input_dim), groups), :, :] = 1.0 / kernel_size ** 2\n        return k\n\n    def _pad_1x1_to_3x3_tensor(self, kernel1x1):\n        if kernel1x1 is None:\n            return 0\n        else:\n            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])\n\n    def _fuse_bn_tensor(self, branch):\n   
     if branch is None:\n            return 0, 0\n        if isinstance(branch, Conv):\n            kernel = branch.conv.weight\n            running_mean = branch.bn.running_mean\n            running_var = branch.bn.running_var\n            gamma = branch.bn.weight\n            beta = branch.bn.bias\n            eps = branch.bn.eps\n        elif isinstance(branch, nn.BatchNorm2d):\n            if not hasattr(self, 'id_tensor'):\n                input_dim = self.c1 // self.g\n                kernel_value = np.zeros((self.c1, input_dim, 3, 3), dtype=np.float32)\n                for i in range(self.c1):\n                    kernel_value[i, i % input_dim, 1, 1] = 1\n                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)\n            kernel = self.id_tensor\n            running_mean = branch.running_mean\n            running_var = branch.running_var\n            gamma = branch.weight\n            beta = branch.bias\n            eps = branch.eps\n        std = (running_var + eps).sqrt()\n        t = (gamma / std).reshape(-1, 1, 1, 1)\n        return kernel * t, beta - running_mean * gamma / std\n\n    def fuse_convs(self):\n        if hasattr(self, 'conv'):\n            return\n        kernel, bias = self.get_equivalent_kernel_bias()\n        self.conv = nn.Conv2d(in_channels=self.conv1.conv.in_channels,\n                              out_channels=self.conv1.conv.out_channels,\n                              kernel_size=self.conv1.conv.kernel_size,\n                              stride=self.conv1.conv.stride,\n                              padding=self.conv1.conv.padding,\n                              dilation=self.conv1.conv.dilation,\n                              groups=self.conv1.conv.groups,\n                              bias=True).requires_grad_(False)\n        self.conv.weight.data = kernel\n        self.conv.bias.data = bias\n        for para in self.parameters():\n            para.detach_()\n        self.__delattr__('conv1')\n  
      self.__delattr__('conv2')\n        if hasattr(self, 'nm'):\n            self.__delattr__('nm')\n        if hasattr(self, 'bn'):\n            self.__delattr__('bn')\n        if hasattr(self, 'id_tensor'):\n            self.__delattr__('id_tensor')\n\nclass RepNBottleneck(nn.Module):\n    # Standard bottleneck\n    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5, act=True):  # ch_in, ch_out, shortcut, kernels, groups, expand\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = RepConvN(c1, c_, k[0], 1, act=act)\n        self.cv2 = Conv(c_, c2, k[1], 1, g=g, act=act)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\nclass RepNCSP(nn.Module):\n    # CSP Bottleneck with 3 convolutions\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, act=True):  # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1, act=act)\n        self.cv2 = Conv(c1, c_, 1, 1, act=act)\n        self.cv3 = Conv(2 * c_, c2, 1, act=act)  # optional act=FReLU(c2)\n        self.m = nn.Sequential(*(RepNBottleneck(c_, c_, shortcut, g, e=1.0, act=act) for _ in range(n)))\n\n    def forward(self, x):\n        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))\n\nclass RepNCSPELAN4(nn.Module):\n    # csp-elan\n    def __init__(self, c1, c2, c3, c4, c5=1, act=True):  # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__()\n        self.c = c3//2\n        self.cv1 = Conv(c1, c3, 1, 1, act=act)\n        self.cv2 = nn.Sequential(RepNCSP(c3//2, c4, c5, act=act), Conv(c4, c4, 3, 1, act=act))\n        self.cv3 = nn.Sequential(RepNCSP(c4, c4, c5, act=act), Conv(c4, c4, 3, 1, act=act))\n        self.cv4 = Conv(c3+(2*c4), c2, 1, 1, act=act)\n\n    def forward(self, x):\n        y = 
list(self.cv1(x).chunk(2, 1))\n        y.extend((m(y[-1])) for m in [self.cv2, self.cv3])\n        return self.cv4(torch.cat(y, 1))\n\n    def forward_split(self, x):\n        y = list(self.cv1(x).split((self.c, self.c), 1))\n        y.extend(m(y[-1]) for m in [self.cv2, self.cv3])\n        return self.cv4(torch.cat(y, 1))\n\n# ------------------------------yolo----------------------------\nif hasattr(m, 'fuse_convs'):\n    m.fuse_convs()\n    m.forward = m.forward_fuse\n\n# ------------------------------yolov7-tiny----------------------------------------\n# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# yolov7-tiny backbone\nbackbone:\n  # [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True\n  [[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 0-P1/2  \n  \n   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 1-P2/4    \n\n   [-1, 1, RepNCSPELAN4, [64, 32, 32, 1, nn.LeakyReLU(0.1)]], # 2\n\n   [-1, 1, MP, []],  # 3-P3/8\n   [-1, 1, RepNCSPELAN4, [128, 64, 32, 1, nn.LeakyReLU(0.1)]], # 4\n\n   [-1, 1, MP, []],  # 5-P4/16\n   [-1, 1, RepNCSPELAN4, [256, 128, 64, 1, nn.LeakyReLU(0.1)]], # 6\n\n   [-1, 1, MP, []],  # 7-P5/32\n   [-1, 1, RepNCSPELAN4, [512, 256, 128, 1, nn.LeakyReLU(0.1)]], # 8\n  ]\n\n# yolov7-tiny head\nhead:\n  [[-1, 1, Yolov7_Tiny_SPP, [256, nn.LeakyReLU(0.1)]], # 9-Yolov7-tiny-spp\n   \n   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], \n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [6, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P4\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, RepNCSPELAN4, [128, 64, 32, 1, nn.LeakyReLU(0.1)]], # 14\n\n   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [4, 1, Conv, [64, 
1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P3\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, RepNCSPELAN4,[64, 32, 32, 1, nn.LeakyReLU(0.1)]], # 19\n   \n   [-1, 1, Conv, [128, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 14], 1, Concat, [1]],\n   [-1, 1, RepNCSPELAN4, [128, 64, 32, 1, nn.LeakyReLU(0.1)]], # 22\n\n   [-1, 1, Conv, [256, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 9], 1, Concat, [1]],\n   [-1, 1, RepNCSPELAN4, [256, 128, 64, 1, nn.LeakyReLU(0.1)]], # 25\n\n   [19, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 26-P3\n   [22, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 27-P4\n   [25, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 28-P5\n\n   [[26, 27, 28], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)\n  ]\n\n\n# -----------------------------yolov7--------------------------------\n# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors:\n  - [12,16, 19,36, 40,28]  # P3/8\n  - [36,75, 76,55, 72,146]  # P4/16\n  - [142,110, 192,243, 459,401]  # P5/32\n\n# yolov7 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [32, 3, 1]],  # 0\n  \n   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2      \n   [-1, 1, Conv, [64, 3, 1]],\n   \n   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  \n   [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 4\n         \n   [-1, 1, V7DownSampling, [128]],  # 5-P3/8  \n   [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 6\n         \n   [-1, 1, V7DownSampling, [256]],  # 7-P4/16  \n   [-1, 1, RepNCSPELAN4, [1024, 512, 256, 1]], # 8\n         \n   [-1, 1, V7DownSampling, [512]],  # 9-P5/32  \n   [-1, 1, RepNCSPELAN4, [1024, 512, 256, 1]],  # 10\n  ]\n\n# yolov7 head\nhead:\n  [[-1, 1, SPPCSPC, [512]], # 11\n\n   [-1, 1, Conv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [8, 1, Conv, [256, 1, 1]], # 14 route backbone P4\n   [[-1, -2], 1, Concat, [1]], # 15\n   \n   [-1, 1, RepNCSPELAN4, 
[256, 128, 64, 1]], # 16\n   \n   [-1, 1, Conv, [128, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [6, 1, Conv, [128, 1, 1]], # 19 route backbone P3\n   [[-1, -2], 1, Concat, [1]], # 20\n   \n   [-1, 1, RepNCSPELAN4, [128, 64, 32, 1]], # 21\n      \n   [[-1, 16], 1, V7DownSampling_Neck, [128]], # 22\n   \n   [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]], # 23\n      \n   [[-1, 11], 1, V7DownSampling_Neck, [256]], # 24\n   \n   [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]], # 25\n   \n   [21, 1, RepConv, [256, 3, 1]], # 26-P3\n   [23, 1, RepConv, [512, 3, 1]], # 27-P4\n   [25, 1, RepConv, [1024, 3, 1]], # 28-P5\n\n   [[26, 27, 28], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov7-SAConv.py",
    "content": "class ConvAWS2d(nn.Conv2d):\n    def __init__(self,\n                 in_channels,\n                 out_channels,\n                 kernel_size,\n                 stride=1,\n                 padding=0,\n                 dilation=1,\n                 groups=1,\n                 bias=True):\n        super().__init__(\n            in_channels,\n            out_channels,\n            kernel_size,\n            stride=stride,\n            padding=padding,\n            dilation=dilation,\n            groups=groups,\n            bias=bias)\n        self.register_buffer('weight_gamma', torch.ones(self.out_channels, 1, 1, 1))\n        self.register_buffer('weight_beta', torch.zeros(self.out_channels, 1, 1, 1))\n\n    def _get_weight(self, weight):\n        weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2,\n                                  keepdim=True).mean(dim=3, keepdim=True)\n        weight = weight - weight_mean\n        std = torch.sqrt(weight.view(weight.size(0), -1).var(dim=1) + 1e-5).view(-1, 1, 1, 1)\n        weight = weight / std\n        weight = self.weight_gamma * weight + self.weight_beta\n        return weight\n\n    def forward(self, x):\n        weight = self._get_weight(self.weight)\n        return super()._conv_forward(x, weight, None)\n\n    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,\n                              missing_keys, unexpected_keys, error_msgs):\n        self.weight_gamma.data.fill_(-1)\n        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,\n                                      missing_keys, unexpected_keys, error_msgs)\n        if self.weight_gamma.data.mean() > 0:\n            return\n        weight = self.weight.data\n        weight_mean = weight.data.mean(dim=1, keepdim=True).mean(dim=2,\n                                       keepdim=True).mean(dim=3, keepdim=True)\n        self.weight_beta.data.copy_(weight_mean)\n        std = 
torch.sqrt(weight.view(weight.size(0), -1).var(dim=1) + 1e-5).view(-1, 1, 1, 1)\n        self.weight_gamma.data.copy_(std)\n    \nclass SAConv2d(ConvAWS2d):\n    def __init__(self,\n                 in_channels,\n                 out_channels,\n                 kernel_size,\n                 s=1,\n                 p=None,\n                 g=1,\n                 d=1,\n                 act=True,\n                 bias=True):\n        super().__init__(\n            in_channels,\n            out_channels,\n            kernel_size,\n            stride=s,\n            padding=autopad(kernel_size, p),\n            dilation=d,\n            groups=g,\n            bias=bias)\n        self.switch = torch.nn.Conv2d(\n            self.in_channels,\n            1,\n            kernel_size=1,\n            stride=s,\n            bias=True)\n        self.switch.weight.data.fill_(0)\n        self.switch.bias.data.fill_(1)\n        self.weight_diff = torch.nn.Parameter(torch.Tensor(self.weight.size()))\n        self.weight_diff.data.zero_()\n        self.pre_context = torch.nn.Conv2d(\n            self.in_channels,\n            self.in_channels,\n            kernel_size=1,\n            bias=True)\n        self.pre_context.weight.data.fill_(0)\n        self.pre_context.bias.data.fill_(0)\n        self.post_context = torch.nn.Conv2d(\n            self.out_channels,\n            self.out_channels,\n            kernel_size=1,\n            bias=True)\n        self.post_context.weight.data.fill_(0)\n        self.post_context.bias.data.fill_(0)\n        \n        self.bn = nn.BatchNorm2d(out_channels)\n        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())\n\n    def forward(self, x):\n        # pre-context\n        avg_x = torch.nn.functional.adaptive_avg_pool2d(x, output_size=1)\n        avg_x = self.pre_context(avg_x)\n        avg_x = avg_x.expand_as(x)\n        x = x + avg_x\n        # switch\n        avg_x = torch.nn.functional.pad(x, 
pad=(2, 2, 2, 2), mode=\"reflect\")\n        avg_x = torch.nn.functional.avg_pool2d(avg_x, kernel_size=5, stride=1, padding=0)\n        switch = self.switch(avg_x)\n        # sac\n        weight = self._get_weight(self.weight)\n        out_s = super()._conv_forward(x, weight, None)\n        ori_p = self.padding\n        ori_d = self.dilation\n        self.padding = tuple(3 * p for p in self.padding)\n        self.dilation = tuple(3 * d for d in self.dilation)\n        weight = weight + self.weight_diff\n        out_l = super()._conv_forward(x, weight, None)\n        out = switch * out_s + (1 - switch) * out_l\n        self.padding = ori_p\n        self.dilation = ori_d\n        # post-context\n        avg_x = torch.nn.functional.adaptive_avg_pool2d(out, output_size=1)\n        avg_x = self.post_context(avg_x)\n        avg_x = avg_x.expand_as(out)\n        out = out + avg_x\n        return self.act(self.bn(out))\n"
  },
  {
    "path": "yolo-improve/yolov7-asf.py",
    "content": "import torch.nn.functional as F\nclass Zoom_cat(nn.Module):\n    def __init__(self):\n        super().__init__()\n\n    def forward(self, x):\n        \"\"\"l, m, s are the large-, medium-, and small-scale feature maps; all three are fused at the scale of m.\"\"\"\n        l, m, s = x[0], x[1], x[2]\n        tgt_size = m.shape[2:]\n        l = F.adaptive_max_pool2d(l, tgt_size) + F.adaptive_avg_pool2d(l, tgt_size)\n        s = F.interpolate(s, m.shape[2:], mode='nearest')\n        lms = torch.cat([l, m, s], dim=1)\n        return lms\n\nclass ScalSeq(nn.Module):\n    def __init__(self, inc, channel):\n        super(ScalSeq, self).__init__()\n        self.conv0 = Conv(inc[0], channel, 1)\n        self.conv1 =  Conv(inc[1], channel,1)\n        self.conv2 =  Conv(inc[2], channel,1)\n        self.conv3d = nn.Conv3d(channel,channel,kernel_size=(1,1,1))\n        self.bn = nn.BatchNorm3d(channel)\n        self.act = nn.LeakyReLU(0.1)\n        self.pool_3d = nn.MaxPool3d(kernel_size=(3,1,1))\n\n    def forward(self, x):\n        p3, p4, p5 = x[0],x[1],x[2]\n        p3 = self.conv0(p3)\n        p4_2 = self.conv1(p4)\n        p4_2 = F.interpolate(p4_2, p3.size()[2:], mode='nearest')\n        p5_2 = self.conv2(p5)\n        p5_2 = F.interpolate(p5_2, p3.size()[2:], mode='nearest')\n        p3_3d = torch.unsqueeze(p3, -3)\n        p4_3d = torch.unsqueeze(p4_2, -3)\n        p5_3d = torch.unsqueeze(p5_2, -3)\n        combine = torch.cat([p3_3d,p4_3d,p5_3d],dim = 2)\n        conv_3d = self.conv3d(combine)\n        bn = self.bn(conv_3d)\n        act = self.act(bn)\n        x = self.pool_3d(act)\n        x = torch.squeeze(x, 2)\n        return x\n    \nclass Add(nn.Module):\n    # Element-wise sum of two input tensors\n    def __init__(self):\n        super().__init__()\n\n    def forward(self, x):\n        input1,input2 = x[0],x[1]\n        x = input1 + input2\n        return x\n\nclass channel_att(nn.Module):\n    def __init__(self, channel, b=1, gamma=2):\n        super(channel_att, self).__init__()\n        kernel_size = 
int(abs((math.log(channel, 2) + b) / gamma))\n        kernel_size = kernel_size if kernel_size % 2 else kernel_size + 1\n        \n        self.avg_pool = nn.AdaptiveAvgPool2d(1)\n        self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size, padding=(kernel_size - 1) // 2, bias=False) \n        self.sigmoid = nn.Sigmoid()\n\n    def forward(self, x):\n        y = self.avg_pool(x)\n        y = y.squeeze(-1)\n        y = y.transpose(-1, -2)\n        y = self.conv(y).transpose(-1, -2).unsqueeze(-1)\n        y = self.sigmoid(y)\n        return x * y.expand_as(x)\n    \nclass local_att(nn.Module):\n    def __init__(self, channel, reduction=16):\n        super(local_att, self).__init__()\n        \n        self.conv_1x1 = nn.Conv2d(in_channels=channel, out_channels=channel//reduction, kernel_size=1, stride=1, bias=False)\n \n        self.relu   = nn.ReLU()\n        self.bn     = nn.BatchNorm2d(channel//reduction)\n \n        self.F_h = nn.Conv2d(in_channels=channel//reduction, out_channels=channel, kernel_size=1, stride=1, bias=False)\n        self.F_w = nn.Conv2d(in_channels=channel//reduction, out_channels=channel, kernel_size=1, stride=1, bias=False)\n \n        self.sigmoid_h = nn.Sigmoid()\n        self.sigmoid_w = nn.Sigmoid()\n \n    def forward(self, x):\n        _, _, h, w = x.size()\n        \n        x_h = torch.mean(x, dim = 3, keepdim = True).permute(0, 1, 3, 2)\n        x_w = torch.mean(x, dim = 2, keepdim = True)\n \n        x_cat_conv_relu = self.relu(self.bn(self.conv_1x1(torch.cat((x_h, x_w), 3))))\n \n        x_cat_conv_split_h, x_cat_conv_split_w = x_cat_conv_relu.split([h, w], 3)\n \n        s_h = self.sigmoid_h(self.F_h(x_cat_conv_split_h.permute(0, 1, 3, 2)))\n        s_w = self.sigmoid_w(self.F_w(x_cat_conv_split_w))\n \n        out = x * s_h.expand_as(x) * s_w.expand_as(x)\n        return out\n    \nclass attention_model(nn.Module):\n    # Concatenate a list of tensors along dimension\n    def __init__(self, ch = 256):\n        
super().__init__()\n        self.channel_att = channel_att(ch)\n        self.local_att = local_att(ch)\n    def forward(self, x):\n        input1,input2 = x[0],x[1]\n        input1 = self.channel_att(input1)\n        x = input1 + input2\n        x = self.local_att(x)\n        return x\n\nelif m is Zoom_cat:\n    c2 = sum(ch[x] for x in f)\nelif m is Add:\n    c2 = ch[f[-1]]\nelif m is attention_model:\n    c2 = ch[f[-1]]\n    args = [c2]\nelif m is ScalSeq:\n    c1 = [ch[x] for x in f]\n    c2 = make_divisible(args[0] * gw, 8)\n    args = [c1, c2]\n\n##################################################### YOLOV7-TINY #####################################################\n# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors:\n  - [10,13, 16,30, 33,23]  # P3/8\n  - [30,61, 62,45, 59,119]  # P4/16\n  - [116,90, 156,198, 373,326]  # P5/32\n\n# yolov7-tiny backbone\nbackbone:\n  # [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True\n  [[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 0-P1/2  \n  \n   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 1-P2/4    \n\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 2\n\n   [-1, 1, MP, []],  # 3-P3/8\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 4\n\n   [-1, 1, MP, []],  # 5-P4/16\n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 6\n\n   [-1, 1, MP, []],  # 7-P5/32\n   [-1, 1, Yolov7_Tiny_E_ELAN, [512, 256, nn.LeakyReLU(0.1)]], # 8\n  ]\n\n# yolov7-tiny head\nhead:\n  [[-1, 1, Yolov7_Tiny_SPP, [256, nn.LeakyReLU(0.1)]], # 9-Yolov7-tiny-spp\n   \n   [-1, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]], \n   [4, 1, Conv, [256, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 6, -2], 1, Zoom_cat, []], # route backbone P4\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 13\n\n   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   
[2, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 15\n   [[-1, 4, -2], 1, Zoom_cat, []],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 17\n   \n   [-1, 1, Conv, [128, 3, 2, None, 1, nn.LeakyReLU(0.1)]], # 18\n   [[-1, 13], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 20\n\n   [-1, 1, Conv, [256, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 9], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 23\n\n   [[4, 6, 8], 1, ScalSeq, [64]], #24 args[inchane]\n   [[17, -1], 1, attention_model, []], #25\n\n   [25, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 26-P3\n   [23, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 27-P4\n   [20, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 28-P5\n\n   [[26,27,28], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)\n  ]\n\n\n##################################################### YOLOV7 #####################################################\n# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors:\n  - [12,16, 19,36, 40,28]  # P3/8\n  - [36,75, 76,55, 72,146]  # P4/16\n  - [142,110, 192,243, 459,401]  # P5/32\n\n# yolov7 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [32, 3, 1]],  # 0\n  \n   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2      \n   [-1, 1, Conv, [64, 3, 1]],\n   \n   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  \n   [-1, 1, Yolov7_E_ELAN, [256, 64]], # 4\n         \n   [-1, 1, V7DownSampling, [128]],  # 5-P3/8  \n   [-1, 1, Yolov7_E_ELAN, [512, 128]], # 6\n         \n   [-1, 1, V7DownSampling, [256]],  # 7-P4/16  \n   [-1, 1, Yolov7_E_ELAN, [1024, 256]], # 8\n         \n   [-1, 1, V7DownSampling, [512]],  # 9-P5/32  \n   [-1, 1, Yolov7_E_ELAN, [1024, 256]],  # 10\n  ]\n\n# yolov7 head\nhead:\n  [[-1, 1, SPPCSPC, [512]], # 11\n\n   [-1, 1, Conv, [1024, 1, 1, None, 1, nn.LeakyReLU(0.1)]], \n   [6, 1, 
Conv, [1024, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 8, -2], 1, Zoom_cat, []], # route backbone P4\n   [-1, 1, Yolov7_E_ELAN_NECK, [256, 128]], # 15\n   \n   [-1, 1, Conv, [512, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   [4, 1, Conv, [512, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # 17\n   [[-1, 6, -2], 1, Zoom_cat, []], # 18\n   [-1, 1, Yolov7_E_ELAN_NECK, [128, 64]], # 19\n      \n   [[-1, 15], 1, V7DownSampling_Neck, [128]], # 20\n   \n   [-1, 1, Yolov7_E_ELAN_NECK, [256, 128]], # 21\n      \n   [[-1, 11], 1, V7DownSampling_Neck, [256]], # 22\n   \n   [-1, 1, Yolov7_E_ELAN_NECK, [512, 256]], # 23\n   \n   [[6, 8, 10], 1, ScalSeq, [128]], #24 args[inchane]\n   [[19, -1], 1, attention_model, []], #25\n\n   [25, 1, RepConv, [256, 3, 1]], # 26-P3\n   [21, 1, RepConv, [512, 3, 1]], # 27-P4\n   [23, 1, RepConv, [1024, 3, 1]], # 28-P5\n\n   [[26, 27, 28], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)\n  ]\n"
  },
  {
    "path": "yolo-improve/yolov7-head/yolov7-tiny-5-heads.yaml",
    "content": "# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors: 3\n\n# yolov7-tiny backbone\nbackbone:\n  # [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True\n  [[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 0-P1/2  \n  \n   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 1-P2/4    \n\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 2\n\n   [-1, 1, MP, []],  # 3-P3/8\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 4\n\n   [-1, 1, MP, []],  # 5-P4/16\n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 6\n\n   [-1, 1, MP, []],  # 7-P5/32\n   [-1, 1, Yolov7_Tiny_E_ELAN, [512, 256, nn.LeakyReLU(0.1)]], # 8\n  ]\n\n# yolov7-tiny head\nhead:\n  [[-1, 1, Yolov7_Tiny_SPP, [256, nn.LeakyReLU(0.1)]], # 9-Yolov7-tiny-spp\n   \n   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], \n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [6, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P4\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 14\n\n   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [4, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P3\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 19\n   \n   [-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P2\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 24\n\n   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 19], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 27\n\n   [-1, 1, Conv, [128, 3, 2, None, 1, 
nn.LeakyReLU(0.1)]],\n   [[-1, 14], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 30\n\n   [-1, 1, Conv, [256, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 9], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 33\n\n   [24, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 34-P2\n   [27, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 35-P3\n   [30, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 36-P4\n   [33, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 37-P5\n\n   [33, 1, MP, []],  # 38-P5/32\n   [-1, 1, Yolov7_Tiny_E_ELAN, [512, 256, nn.LeakyReLU(0.1)]], # 39\n\n   [[34, 35, 36, 37, 39], 1, IDetect, [nc, anchors]],   # Detect(P2, P3, P4, P5, P6)\n  ]"
  },
  {
    "path": "yolo-improve/yolov7-head/yolov7-tiny-P2.yaml",
    "content": "# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors: 3\n\n# yolov7-tiny backbone\nbackbone:\n  # [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True\n  [[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 0-P1/2  \n  \n   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 1-P2/4    \n\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 2\n\n   [-1, 1, MP, []],  # 3-P3/8\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 4\n\n   [-1, 1, MP, []],  # 5-P4/16\n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 6\n\n   [-1, 1, MP, []],  # 7-P5/32\n   [-1, 1, Yolov7_Tiny_E_ELAN, [512, 256, nn.LeakyReLU(0.1)]], # 8\n  ]\n\n# yolov7-tiny head\nhead:\n  [[-1, 1, Yolov7_Tiny_SPP, [256, nn.LeakyReLU(0.1)]], # 9-Yolov7-tiny-spp\n   \n   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], \n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [6, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P4\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 14\n\n   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [4, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P3\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 19\n   \n   [-1, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [2, 1, Conv, [32, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P2\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 24\n\n   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 19], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 27\n\n   [-1, 1, Conv, [128, 3, 2, None, 1, 
nn.LeakyReLU(0.1)]],\n   [[-1, 14], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 30\n\n   [-1, 1, Conv, [256, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 9], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 33\n\n   [24, 1, Conv, [64, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 34-P2\n   [27, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 35-P3\n   [30, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 36-P4\n   [33, 1, Conv, [512, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 37-P5\n\n   [[34, 35, 36, 37], 1, IDetect, [nc, anchors]],   # Detect(P2, P3, P4, P5)\n  ]"
  },
  {
    "path": "yolo-improve/yolov7-head/yolov7-tiny-P6.yaml",
    "content": "# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors: 3\n\n# yolov7-tiny backbone\nbackbone:\n  # [from, number, module, args] c2, k=1, s=1, p=None, g=1, act=True\n  [[-1, 1, Conv, [32, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 0-P1/2  \n  \n   [-1, 1, Conv, [64, 3, 2, None, 1, nn.LeakyReLU(0.1)]],  # 1-P2/4    \n\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 2\n\n   [-1, 1, MP, []],  # 3-P3/8\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 4\n\n   [-1, 1, MP, []],  # 5-P4/16\n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 6\n\n   [-1, 1, MP, []],  # 7-P5/32\n   [-1, 1, Yolov7_Tiny_E_ELAN, [512, 256, nn.LeakyReLU(0.1)]], # 8\n  ]\n\n# yolov7-tiny head\nhead:\n  [[-1, 1, Yolov7_Tiny_SPP, [256, nn.LeakyReLU(0.1)]], # 9-Yolov7-tiny-spp\n   \n   [-1, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], \n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [6, 1, Conv, [128, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P4\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 14\n\n   [-1, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [4, 1, Conv, [64, 1, 1, None, 1, nn.LeakyReLU(0.1)]], # route backbone P3\n   [[-1, -2], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [64, 32, nn.LeakyReLU(0.1)]], # 19\n   \n   [-1, 1, Conv, [128, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 14], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [128, 64, nn.LeakyReLU(0.1)]], # 22\n\n   [-1, 1, Conv, [256, 3, 2, None, 1, nn.LeakyReLU(0.1)]],\n   [[-1, 9], 1, Concat, [1]],\n   [-1, 1, Yolov7_Tiny_E_ELAN, [256, 128, nn.LeakyReLU(0.1)]], # 25\n\n   [19, 1, Conv, [128, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 26-P3\n   [22, 1, Conv, [256, 3, 1, None, 1, nn.LeakyReLU(0.1)]], # 27-P4\n   [25, 1, Conv, [512, 3, 1, None, 1, 
nn.LeakyReLU(0.1)]], # 28-P5\n\n   [25, 1, MP, []],  # 29-P6/64\n   [-1, 1, Yolov7_Tiny_E_ELAN, [512, 256, nn.LeakyReLU(0.1)]], # 30\n\n   [[26, 27, 28, 30], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5, P6)\n  ]"
  },
  {
    "path": "yolo-improve/yolov7-iou.py",
    "content": "import numpy as np\nimport torch, math\n\nclass WIoU_Scale:\n    ''' monotonous: {\n            None: origin v1\n            True: monotonic FM v2\n            False: non-monotonic FM v3\n        }\n        momentum: The momentum of running mean'''\n    \n    iou_mean = 1.\n    monotonous = False\n    _momentum = 1 - 0.5 ** (1 / 7000)\n    _is_train = True\n\n    def __init__(self, iou):\n        self.iou = iou\n        self._update(self)\n    \n    @classmethod\n    def _update(cls, self):\n        if cls._is_train: cls.iou_mean = (1 - cls._momentum) * cls.iou_mean + \\\n                                         cls._momentum * self.iou.detach().mean().item()\n    \n    @classmethod\n    def _scaled_loss(cls, self, gamma=1.9, delta=3):\n        if isinstance(self.monotonous, bool):\n            if self.monotonous:\n                return (self.iou.detach() / self.iou_mean).sqrt()\n            else:\n                beta = self.iou.detach() / self.iou_mean\n                alpha = delta * torch.pow(gamma, beta - delta)\n                return beta / alpha\n        return 1\n\ndef bbox_iou(box1, box2, x1y1x2y2=True, GIoU=False, DIoU=False, CIoU=False, SIoU=False, EIoU=False, WIoU=False, Focal=False, alpha=1, gamma=0.5, scale=False, eps=1e-7):\n    # Returns the IoU of box1 to box2. 
box1 is 4, box2 is nx4\n    box2 = box2.T\n\n    # Get the coordinates of bounding boxes\n    if x1y1x2y2:  # x1, y1, x2, y2 = box1\n        b1_x1, b1_y1, b1_x2, b1_y2 = box1[0], box1[1], box1[2], box1[3]\n        b2_x1, b2_y1, b2_x2, b2_y2 = box2[0], box2[1], box2[2], box2[3]\n    else:  # transform from xywh to xyxy\n        b1_x1, b1_x2 = box1[0] - box1[2] / 2, box1[0] + box1[2] / 2\n        b1_y1, b1_y2 = box1[1] - box1[3] / 2, box1[1] + box1[3] / 2\n        b2_x1, b2_x2 = box2[0] - box2[2] / 2, box2[0] + box2[2] / 2\n        b2_y1, b2_y2 = box2[1] - box2[3] / 2, box2[1] + box2[3] / 2\n\n    # Intersection area\n    inter = (torch.min(b1_x2, b2_x2) - torch.max(b1_x1, b2_x1)).clamp(0) * \\\n            (torch.min(b1_y2, b2_y2) - torch.max(b1_y1, b2_y1)).clamp(0)\n\n    # Union Area\n    w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps\n    w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps\n    union = w1 * h1 + w2 * h2 - inter + eps\n    if scale:\n        self = WIoU_Scale(1 - (inter / union))\n\n    # IoU\n    # iou = inter / union # ori iou\n    iou = torch.pow(inter/(union + eps), alpha) # alpha iou\n    if CIoU or DIoU or GIoU or EIoU or SIoU or WIoU:\n        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width\n        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height\n        if CIoU or DIoU or EIoU or SIoU or WIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1\n            c2 = (cw ** 2 + ch ** 2) ** alpha + eps  # convex diagonal squared\n            rho2 = (((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4) ** alpha  # center dist ** 2\n            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47\n                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)\n                with torch.no_grad():\n                    alpha_ciou = v / (v - iou + (1 + eps))\n                if Focal:\n     
               return iou - (rho2 / c2 + torch.pow(v * alpha_ciou + eps, alpha)), torch.pow(inter/(union + eps), gamma)  # Focal_CIoU\n                else:\n                    return iou - (rho2 / c2 + torch.pow(v * alpha_ciou + eps, alpha))  # CIoU\n            elif EIoU:\n                rho_w2 = ((b2_x2 - b2_x1) - (b1_x2 - b1_x1)) ** 2\n                rho_h2 = ((b2_y2 - b2_y1) - (b1_y2 - b1_y1)) ** 2\n                cw2 = torch.pow(cw ** 2 + eps, alpha)\n                ch2 = torch.pow(ch ** 2 + eps, alpha)\n                if Focal:\n                    return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2), torch.pow(inter/(union + eps), gamma) # Focal_EIou\n                else:\n                    return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2) # EIou\n            elif SIoU:\n                # SIoU Loss https://arxiv.org/pdf/2205.12740.pdf\n                s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5 + eps\n                s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5 + eps\n                sigma = torch.pow(s_cw ** 2 + s_ch ** 2, 0.5)\n                sin_alpha_1 = torch.abs(s_cw) / sigma\n                sin_alpha_2 = torch.abs(s_ch) / sigma\n                threshold = pow(2, 0.5) / 2\n                sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)\n                angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)\n                rho_x = (s_cw / cw) ** 2\n                rho_y = (s_ch / ch) ** 2\n                dist_gamma = angle_cost - 2  # local name, so the Focal exponent `gamma` argument is not shadowed\n                distance_cost = 2 - torch.exp(dist_gamma * rho_x) - torch.exp(dist_gamma * rho_y)\n                omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)\n                omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)\n                shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(1 - torch.exp(-1 * omiga_h), 4)\n                if Focal:\n                    return iou - torch.pow(0.5 * (distance_cost + shape_cost) + eps, alpha), 
torch.pow(inter/(union + eps), gamma) # Focal_SIou\n                else:\n                    return iou - torch.pow(0.5 * (distance_cost + shape_cost) + eps, alpha) # SIou\n            elif WIoU:\n                if Focal:\n                    raise RuntimeError(\"WIoU do not support Focal.\")\n                elif scale:\n                    return getattr(WIoU_Scale, '_scaled_loss')(self), (1 - iou) * torch.exp((rho2 / c2)), iou # WIoU https://arxiv.org/abs/2301.10051\n                else:\n                    return iou, torch.exp((rho2 / c2)) # WIoU v1\n            if Focal:\n                return iou - rho2 / c2, torch.pow(inter/(union + eps), gamma)  # Focal_DIoU\n            else:\n                return iou - rho2 / c2  # DIoU\n        c_area = cw * ch + eps  # convex area\n        if Focal:\n            return iou - torch.pow((c_area - union) / c_area + eps, alpha), torch.pow(inter/(union + eps), gamma)  # Focal_GIoU https://arxiv.org/pdf/1902.09630.pdf\n        else:\n            return iou - torch.pow((c_area - union) / c_area + eps, alpha)  # GIoU https://arxiv.org/pdf/1902.09630.pdf\n    if Focal:\n        return iou, torch.pow(inter/(union + eps), gamma)  # Focal_IoU\n    else:\n        return iou  # IoU\n\n\n### yolov7\nif type(iou) is tuple:\n    if len(iou) == 2:\n        lbox += (iou[1].detach() * (1 - iou[0])).mean()\n        iou = iou[0]\n    else:\n        lbox += (iou[0] * iou[1]).mean()\n        iou = iou[-1]\nelse:\n    lbox += (1.0 - iou).mean()  # iou loss"
  },
  {
    "path": "yolo-improve/yolov7-odconv.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.autograd\nfrom models.common import Conv, autopad\n\nclass Attention(nn.Module):\n    def __init__(self, in_planes, out_planes, kernel_size, groups=1, reduction=0.0625, kernel_num=4, min_channel=16):\n        super(Attention, self).__init__()\n        attention_channel = max(int(in_planes * reduction), min_channel)\n        self.kernel_size = kernel_size\n        self.kernel_num = kernel_num\n        self.temperature = 1.0\n\n        self.avgpool = nn.AdaptiveAvgPool2d(1)\n        self.fc = Conv(in_planes, attention_channel, act=nn.ReLU(inplace=True))\n\n        self.channel_fc = nn.Conv2d(attention_channel, in_planes, 1, bias=True)\n        self.func_channel = self.get_channel_attention\n\n        if in_planes == groups and in_planes == out_planes:  # depth-wise convolution\n            self.func_filter = self.skip\n        else:\n            self.filter_fc = nn.Conv2d(attention_channel, out_planes, 1, bias=True)\n            self.func_filter = self.get_filter_attention\n\n        if kernel_size == 1:  # point-wise convolution\n            self.func_spatial = self.skip\n        else:\n            self.spatial_fc = nn.Conv2d(attention_channel, kernel_size * kernel_size, 1, bias=True)\n            self.func_spatial = self.get_spatial_attention\n\n        if kernel_num == 1:\n            self.func_kernel = self.skip\n        else:\n            self.kernel_fc = nn.Conv2d(attention_channel, kernel_num, 1, bias=True)\n            self.func_kernel = self.get_kernel_attention\n\n        self._initialize_weights()\n\n    def _initialize_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n                if m.bias is not None:\n                    nn.init.constant_(m.bias, 0)\n            if isinstance(m, nn.BatchNorm2d):\n                
nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n\n    def update_temperature(self, temperature):\n        self.temperature = temperature\n\n    @staticmethod\n    def skip(_):\n        return 1.0\n\n    def get_channel_attention(self, x):\n        channel_attention = torch.sigmoid(self.channel_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)\n        return channel_attention\n\n    def get_filter_attention(self, x):\n        filter_attention = torch.sigmoid(self.filter_fc(x).view(x.size(0), -1, 1, 1) / self.temperature)\n        return filter_attention\n\n    def get_spatial_attention(self, x):\n        spatial_attention = self.spatial_fc(x).view(x.size(0), 1, 1, 1, self.kernel_size, self.kernel_size)\n        spatial_attention = torch.sigmoid(spatial_attention / self.temperature)\n        return spatial_attention\n\n    def get_kernel_attention(self, x):\n        kernel_attention = self.kernel_fc(x).view(x.size(0), -1, 1, 1, 1, 1)\n        kernel_attention = F.softmax(kernel_attention / self.temperature, dim=1)\n        return kernel_attention\n\n    def forward(self, x):\n        x = self.avgpool(x)\n        x = self.fc(x)\n        return self.func_channel(x), self.func_filter(x), self.func_spatial(x), self.func_kernel(x)\n\n\nclass ODConv2d(nn.Module):\n    def __init__(self, in_planes, out_planes, k, s=1, p=None, g=1, act=True, d=1,\n                 reduction=0.0625, kernel_num=1):\n        super(ODConv2d, self).__init__()\n        self.in_planes = in_planes\n        self.out_planes = out_planes\n        self.kernel_size = k\n        self.stride = s\n        self.padding = autopad(k, p)\n        self.dilation = d\n        self.groups = g\n        self.kernel_num = kernel_num\n        self.attention = Attention(in_planes, out_planes, k, groups=g,\n                                   reduction=reduction, kernel_num=kernel_num)\n        self.weight = nn.Parameter(torch.randn(kernel_num, out_planes, in_planes//g, k, k),\n            
                       requires_grad=True)\n        self._initialize_weights()\n        self.bn = nn.BatchNorm2d(out_planes)\n        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())\n\n        if self.kernel_size == 1 and self.kernel_num == 1:\n            self._forward_impl = self._forward_impl_pw1x\n        else:\n            self._forward_impl = self._forward_impl_common\n\n    def _initialize_weights(self):\n        for i in range(self.kernel_num):\n            nn.init.kaiming_normal_(self.weight[i], mode='fan_out', nonlinearity='relu')\n\n    def update_temperature(self, temperature):\n        self.attention.update_temperature(temperature)\n\n    def _forward_impl_common(self, x):\n        # Multiplying channel attention (or filter attention) to weights and feature maps are equivalent,\n        # while we observe that when using the latter method the models will run faster with less gpu memory cost.\n        channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)\n        batch_size, in_planes, height, width = x.size()\n        x = x * channel_attention\n        x = x.reshape(1, -1, height, width)\n        aggregate_weight = spatial_attention * kernel_attention * self.weight.unsqueeze(dim=0)\n        aggregate_weight = torch.sum(aggregate_weight, dim=1).view(\n            [-1, self.in_planes // self.groups, self.kernel_size, self.kernel_size])\n        output = F.conv2d(x, weight=aggregate_weight, bias=None, stride=self.stride, padding=self.padding,\n                          dilation=self.dilation, groups=self.groups * batch_size)\n        output = output.view(batch_size, self.out_planes, output.size(-2), output.size(-1))\n        output = output * filter_attention\n        return output\n\n    def _forward_impl_pw1x(self, x):\n        channel_attention, filter_attention, spatial_attention, kernel_attention = self.attention(x)\n        x = x * channel_attention\n        
output = F.conv2d(x, weight=self.weight.squeeze(dim=0), bias=None, stride=self.stride, padding=self.padding,\n                          dilation=self.dilation, groups=self.groups)\n        output = output * filter_attention\n        return output\n\n    def forward(self, x):\n        return self.act(self.bn(self._forward_impl(x)))"
  },
  {
    "path": "yolo-improve/yolov7-slimneck.py",
    "content": "class GSConv(nn.Module):\n    # GSConv https://github.com/AlanLi1997/slim-neck-by-gsconv\n    # act参数在yolov7-tiny上记得修改为nn.LeakyReLU(0.1)\n    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):\n        super().__init__()\n        c_ = c2 // 2\n        self.cv1 = Conv(c1, c_, k, s, p, g, act)\n        self.cv2 = Conv(c_, c_, 5, 1, p, c_, act)\n\n    def forward(self, x):\n        x1 = self.cv1(x)\n        x2 = torch.cat((x1, self.cv2(x1)), 1)\n        # shuffle\n        # y = x2.reshape(x2.shape[0], 2, x2.shape[1] // 2, x2.shape[2], x2.shape[3])\n        # y = y.permute(0, 2, 1, 3, 4)\n        # return y.reshape(y.shape[0], -1, y.shape[3], y.shape[4])\n\n        b, n, h, w = x2.size()\n        b_n = b * n // 2\n        y = x2.reshape(b_n, 2, h * w)\n        y = y.permute(1, 0, 2)\n        y = y.reshape(2, -1, n // 2, h, w)\n\n        return torch.cat((y[0], y[1]), 1)\n\nclass GSBottleneck(nn.Module):\n    # GS Bottleneck https://github.com/AlanLi1997/slim-neck-by-gsconv\n    def __init__(self, c1, c2, k=3, s=1, e=0.5):\n        super().__init__()\n        c_ = int(c2*e)\n        # for lighting\n        self.conv_lighting = nn.Sequential(\n            GSConv(c1, c_, 1, 1),\n            GSConv(c_, c2, 3, 1, act=False))\n        self.shortcut = Conv(c1, c2, 1, 1, act=False)\n\n    def forward(self, x):\n        return self.conv_lighting(x) + self.shortcut(x)\n\nclass GSBottleneckC(GSBottleneck):\n    # cheap GS Bottleneck https://github.com/AlanLi1997/slim-neck-by-gsconv\n    def __init__(self, c1, c2, k=3, s=1):\n        super().__init__(c1, c2, k, s)\n        self.shortcut = DWConv(c1, c2, k, s, act=False)\n\nclass VoVGSCSP(nn.Module):\n    # VoVGSCSP module with GSBottleneck\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, c_, 1, 1)\n        self.cv2 = Conv(c1, c_, 1, 1)\n        self.gsb = nn.Sequential(*(GSBottleneck(c_, 
c_, e=1.0) for _ in range(n)))\n        self.res = Conv(c_, c_, 3, 1, act=False)\n        self.cv3 = Conv(2 * c_, c2, 1)  #\n\n\n    def forward(self, x):\n        x1 = self.gsb(self.cv1(x))\n        y = self.cv2(x)\n        return self.cv3(torch.cat((y, x1), dim=1))\n\nclass VoVGSCSPC(VoVGSCSP):\n    # cheap VoVGSCSP module with GSBottleneck\n    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):\n        super().__init__(c1, c2)\n        c_ = int(c2 * 0.5)  # hidden channels\n        self.gsb = GSBottleneckC(c_, c_, 1, 1)\n\n\n# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n\n# anchors\nanchors:\n  - [12,16, 19,36, 40,28]  # P3/8\n  - [36,75, 76,55, 72,146]  # P4/16\n  - [142,110, 192,243, 459,401]  # P5/32\n\n# yolov7 backbone\nbackbone:\n  # [from, number, module, args]\n  [[-1, 1, Conv, [32, 3, 1]],  # 0\n  \n   [-1, 1, Conv, [64, 3, 2]],  # 1-P1/2      \n   [-1, 1, Conv, [64, 3, 1]],\n   \n   [-1, 1, Conv, [128, 3, 2]],  # 3-P2/4  \n   [-1, 1, Yolov7_E_ELAN, [256, 64]], # 4\n         \n   [-1, 1, V7DownSampling, [128]],  # 5-P3/8  \n   [-1, 1, Yolov7_E_ELAN, [512, 128]], # 6\n         \n   [-1, 1, V7DownSampling, [256]],  # 7-P4/16  \n   [-1, 1, Yolov7_E_ELAN, [1024, 256]], # 8\n         \n   [-1, 1, V7DownSampling, [512]],  # 9-P5/32  \n   [-1, 1, Yolov7_E_ELAN, [1024, 256]],  # 10\n  ]\n\n# yolov7 head\nhead:\n  [[-1, 1, SPPCSPC, [512]], # 11\n\n   [-1, 1, GSConv, [256, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [8, 1, GSConv, [256, 1, 1]], # 14 route backbone P4\n   [[-1, -2], 1, Concat, [1]], # 15\n   \n   [-1, 1, VoVGSCSP, [256]], # 16\n   \n   [-1, 1, GSConv, [128, 1, 1]],\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']],\n   [6, 1, GSConv, [128, 1, 1]], # 19 route backbone P3\n   [[-1, -2], 1, Concat, [1]], # 20\n   \n   [-1, 1, VoVGSCSP, [128]], # 21\n      \n   [[-1, 16], 1, V7DownSampling_Neck, [128]], # 22\n   \n   [-1, 1, VoVGSCSP, 
[256]], # 23\n      \n   [[-1, 11], 1, V7DownSampling_Neck, [256]], # 24\n   \n   [-1, 1, VoVGSCSP, [512]], # 25\n   \n   [21, 1, RepConv, [256, 3, 1]], # 26-P3\n   [23, 1, RepConv, [512, 3, 1]], # 27-P4\n   [25, 1, RepConv, [1024, 3, 1]], # 28-P5\n\n   [[26, 27, 28], 1, IDetect, [nc, anchors]],   # Detect(P3, P4, P5)\n  ]"
  },
  {
    "path": "yolo-improve/yolov7-softnms.py",
    "content": "def box_iou_for_nms(box1, box2, GIoU=False, DIoU=False, CIoU=False, SIoU=False, EIou=False, eps=1e-7):\n    # Returns Intersection over Union (IoU) of box1(1,4) to box2(n,4)\n\n    b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)\n    b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)\n    w1, h1 = b1_x2 - b1_x1, (b1_y2 - b1_y1).clamp(eps)\n    w2, h2 = b2_x2 - b2_x1, (b2_y2 - b2_y1).clamp(eps)\n\n    # Intersection area\n    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp(0) * \\\n            (b1_y2.minimum(b2_y2) - b1_y1.maximum(b2_y1)).clamp(0)\n\n    # Union Area\n    union = w1 * h1 + w2 * h2 - inter + eps\n\n    # IoU\n    iou = inter / union\n    if CIoU or DIoU or GIoU or EIou:\n        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width\n        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height\n        if CIoU or DIoU or EIou:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1\n            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared\n            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center dist ** 2\n            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47\n                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)\n                with torch.no_grad():\n                    alpha = v / (v - iou + (1 + eps))\n                return iou - (rho2 / c2 + v * alpha)  # CIoU\n            elif EIou:\n                rho_w2 = ((b2_x2 - b2_x1) - (b1_x2 - b1_x1)) ** 2\n                rho_h2 = ((b2_y2 - b2_y1) - (b1_y2 - b1_y1)) ** 2\n                cw2 = cw ** 2 + eps\n                ch2 = ch ** 2 + eps\n                return iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2)\n            return iou - rho2 / c2  # DIoU\n        c_area = cw * ch + eps  # convex area\n        return iou - (c_area - union) / c_area  # GIoU 
https://arxiv.org/pdf/1902.09630.pdf\n    elif SIoU:\n        # SIoU Loss https://arxiv.org/pdf/2205.12740.pdf\n        s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5 + eps\n        s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5 + eps\n        sigma = torch.pow(s_cw ** 2 + s_ch ** 2, 0.5)\n        sin_alpha_1 = torch.abs(s_cw) / sigma\n        sin_alpha_2 = torch.abs(s_ch) / sigma\n        threshold = pow(2, 0.5) / 2\n        sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)\n        angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)\n        rho_x = (s_cw / cw) ** 2\n        rho_y = (s_ch / ch) ** 2\n        gamma = angle_cost - 2\n        distance_cost = 2 - torch.exp(gamma * rho_x) - torch.exp(gamma * rho_y)\n        omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)\n        omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)\n        shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(1 - torch.exp(-1 * omiga_h), 4)\n        return iou - 0.5 * (distance_cost + shape_cost)\n    return iou  # IoU\n\ndef soft_nms(bboxes, scores, iou_thresh=0.5,sigma=0.5,score_threshold=0.25):\n    order = scores.argsort(descending=True).to(bboxes.device)\n    keep = []\n    \n    while order.numel() > 1:\n        if order.numel() == 1:\n            keep.append(order[0])\n            break\n        else:\n            i = order[0]\n            keep.append(i)\n        \n        iou = box_iou_for_nms(bboxes[i], bboxes[order[1:]]).squeeze()\n        \n        idx = (iou > iou_thresh).nonzero().squeeze()\n        if idx.numel() > 0: \n            iou = iou[idx] \n            newScores = torch.exp(-torch.pow(iou,2)/sigma)\n            scores[order[idx+1]] *= newScores\n        \n        newOrder = (scores[order[1:]] > score_threshold).nonzero().squeeze() \n        if newOrder.numel() == 0: \n            break\n        else:\n            maxScoreIndex = torch.argmax(scores[order[newOrder+1]]) \n            if maxScoreIndex != 0: \n            
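    newOrder[[0,maxScoreIndex],] = newOrder[[maxScoreIndex,0],]\n            order = order[newOrder+1].reshape(-1)  # stay 1-D even when only one index survives\n    \n    # The loop stops once a single box is left; that box passed the score threshold, so keep it\n    if order.numel() == 1:\n        keep.append(order[0])\n    return torch.LongTensor(keep)"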
  },
  {
    "path": "yolo-improve/yolov8-DCN.py",
    "content": "class DCNv2(nn.Module):\n    def __init__(self, in_channels, out_channels, kernel_size, stride=1,\n                 padding=1, dilation=1, groups=1, deformable_groups=1):\n        super(DCNv2, self).__init__()\n\n        self.in_channels = in_channels\n        self.out_channels = out_channels\n        self.kernel_size = (kernel_size, kernel_size)\n        self.stride = (stride, stride)\n        self.padding = (padding, padding)\n        self.dilation = (dilation, dilation)\n        self.groups = groups\n        self.deformable_groups = deformable_groups\n\n        self.weight = nn.Parameter(\n            torch.empty(out_channels, in_channels, *self.kernel_size)\n        )\n        self.bias = nn.Parameter(torch.empty(out_channels))\n\n        out_channels_offset_mask = (self.deformable_groups * 3 *\n                                    self.kernel_size[0] * self.kernel_size[1])\n        self.conv_offset_mask = nn.Conv2d(\n            self.in_channels,\n            out_channels_offset_mask,\n            kernel_size=self.kernel_size,\n            stride=self.stride,\n            padding=self.padding,\n            bias=True,\n        )\n        self.bn = nn.BatchNorm2d(out_channels)\n        self.act = Conv.default_act\n        self.reset_parameters()\n\n    def forward(self, x):\n        offset_mask = self.conv_offset_mask(x)\n        o1, o2, mask = torch.chunk(offset_mask, 3, dim=1)\n        offset = torch.cat((o1, o2), dim=1)\n        mask = torch.sigmoid(mask)\n        x = torch.ops.torchvision.deform_conv2d(\n            x,\n            self.weight,\n            offset,\n            mask,\n            self.bias,\n            self.stride[0], self.stride[1],\n            self.padding[0], self.padding[1],\n            self.dilation[0], self.dilation[1],\n            self.groups,\n            self.deformable_groups,\n            True\n        )\n        x = self.bn(x)\n        x = self.act(x)\n        return x\n\n    def reset_parameters(self):\n       
 n = self.in_channels\n        for k in self.kernel_size:\n            n *= k\n        std = 1. / math.sqrt(n)\n        self.weight.data.uniform_(-std, std)\n        self.bias.data.zero_()\n        self.conv_offset_mask.weight.data.zero_()\n        self.conv_offset_mask.bias.data.zero_()\n\nclass Bottleneck_DCN(nn.Module):\n    # Standard bottleneck with DCN\n    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand\n        super().__init__()\n        c_ = int(c2 * e)  # hidden channels\n        if k[0] == 3:\n            self.cv1 = DCNv2(c1, c_, k[0], 1)\n        else:\n            self.cv1 = Conv(c1, c_, k[0], 1)\n        if k[1] == 3:\n            self.cv2 = DCNv2(c_, c2, k[1], 1, groups=g)\n        else:\n            self.cv2 = Conv(c_, c2, k[1], 1, g=g)\n        self.add = shortcut and c1 == c2\n\n    def forward(self, x):\n        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))\n\nclass C2f_DCN(nn.Module):\n    # CSP Bottleneck with 2 convolutions\n    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion\n        super().__init__()\n        self.c = int(c2 * e)  # hidden channels\n        self.cv1 = Conv(c1, 2 * self.c, 1, 1)\n        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)\n        self.m = nn.ModuleList(Bottleneck_DCN(self.c, self.c, shortcut, g, k=(3, 3), e=1.0) for _ in range(n))\n\n    def forward(self, x):\n        y = list(self.cv1(x).split((self.c, self.c), 1))\n        y.extend(m(y[-1]) for m in self.m)\n        return self.cv2(torch.cat(y, 1))"
  },
  {
    "path": "yolo-improve/yolov8-compress.md",
    "content": "# YOLOV8V10V11剪枝项目介绍\n\n## 对于群里的剪枝相关问题,我基本都会回复,对于一些剪枝问题,我都会给出建议。  \n\n### 首先剪枝是什么？  \n模型剪枝是深度学习中的一种技术，旨在通过减少神经网络中不必要的参数和连接，来优化模型的效率和性能。模型剪枝可以分为结构剪枝和参数剪枝两种类型。  \n\n### 为什么需要剪枝？  \n剪枝可以很好地衡量模型轻量化程度与精度的关系,是替换轻量化结构完全没办法比的,比如我模型剪枝可以压缩百分之30的计算量,精度只下降了百分之1,但是你通过换模块来达到压缩百分之30的计算量,一般时间就会变长,因为大部分轻量化模块都是由时间换空间,而且精度还会下降得比较多,但是剪枝可以很好地避免这个问题.\n\n### 目前剪枝项目包含以下剪枝方法：\n1. L1 \n2. Random \n3. Slim(需要稀疏训练) \n4. GroupSlim(需要稀疏训练) \n5. GroupNorm \n6. LAMP \n7. GroupSL(需要稀疏训练) \n8. GroupReg(需要稀疏训练)\n9. GroupHessian\n10. GroupTaylor\n\n### 其中prune系列还有一些细节：\n1. 支持稀疏训练时候可视化BN稀疏程度和数值。\n2. 稀疏训练的稀疏系数会进行线性调整，让稀疏训练后期精度更容易回升，更稳定。\n3. 支持设定加速比例，模型会进行自动压缩，压缩到指定比例或者达到最大压缩次数后会自动进入finetune。\n\n### 剪枝的一些顾虑\n大家关心最多的一个问题就是，我的结构能不能剪之类的，剪枝对模型复杂度的要求比较高，目前剪枝都是基于Torch_Pruning库进行剪枝，prune系列的可以跳过一些不能剪枝的层(某些复杂的结构可能在构建动态图的时候失败,这些就只能换结构)，这个项目会有比较多的示例和视频教程教大家如何去剪自己的结构,注意点在哪里等等。这个剪枝项目是没办法保证所有的结构都能剪，有一定的风险，是否入手请自行考虑！  \n[yolov5v7剪枝](https://github.com/z1069614715/objectdetection_script/blob/master/yolo-improve/yolov5v7-light.md)这里面的结构都经过实验是可剪的.\n\n### 那些人群建议入手剪枝\n1. 原始的算法精度很高,没办法再提升精度,只能走轻量化路线,这种建议配合一些轻量化模块+剪枝来增加你的工作量和创新度.\n2. 需要部署到嵌入式或者手机端等低算力设备,这类本身模型就不能太复杂,而且以轻量化为主,剪枝是非常适合的.\n3. 
以后需从事深度学习方面的工作,模型轻量化(蒸馏、量化、剪枝)基本是必须要会的技能.\n\n### Yolov8 相关实验 GPU-Device:RTX3090\n#### Dataset:VisDrone 30%TrainingData Model:Yolov8n\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 3,007,598 | 8.1 | 5.9m | 0.225 | 0.124 | 0.00099s |\n| Lamp Exp1 2.0X | 1,513,245(50.3%) | 4.0(50%) | 3.1m(52.5%) | 0.197(-0.018) | 0.106(-0.018) | 0.00075s(75.8%) |\n| Lamp Exp2 2.0X | 679,484(22.6%) | 4.0(50%) | 1.5m(25.4%) | 0.231(+0.006) | 0.126(+0.002) | 0.00073s(73.7%) |\n| Lamp Exp3 2.5X | 503,959(16.8%) | 3.2(39.5%) | 1.2m(20.3%) | 0.225(0.0) | 0.123(-0.001) | 0.00068s(68.7%) |\n| Group-Taylor Exp1 2.0X | 1,093,305(36.4%) | 4.0(50%) | 2.3m(39%) | 0.203(-0.022) | 0.11(-0.014) | 0.00074s(74.8%) |\n| Group-Taylor Exp2 2.0X | 1,513,245(50.3%) | 4.0(50%) | 3.1m(52.5%) | 0.196(-0.029) | 0.105(-0.019) | 0.00075s(75.8%) |\n| Group-Hessian Exp1 2.0X | 1,436,390(47.8%) | 4.0(50%) | 3.0m(50.8%) | 0.168(-0.057) | 0.0883(-0.041) | 0.00071s(71.7%) |\n| Group-Sl Exp1 2.0X | 1,556,422(51.7%) | 4.0(50%) | 3.1m(52.5%) | 0.173(-0.052) | 0.0901(-0.0339) | 0.00066s(66.7%) |\n| Group-Slim Exp1 2.0X | 1,113,000(37%) | 4.0(50%) | 2.3m(39%) | 0.201(-0.024) | 0.108(-0.016) | 0.00075s(75.8%) |\n| Slim Exp1 2.0X | 932,902(31%) | 4.0(50%) | 2.0m(33.9%) | 0.21(-0.015) | 0.114(-0.01) | 0.00075s(75.8%) |\n\n#### Dataset:VisDrone 30%TrainingData Model:yolov8-Faster-GFPN-P2-EfficientHead\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 3,457,400 | 12.1 | 7.2M | 0.241 | 0.133 | 0.00188s |\n| Lamp Exp1 2.0X | 903,894(26.1%) | 5.9(48.6%) | 2.3M(32%) | 0.226(-0.015) | 0.127(-0.006) | 0.00150s(83.3%) |\n| GroupTaylor Exp1 2.0X | 1,699,046(49.1%) | 5.9(48.6%) | 3.9M(54.2%) | 0.212(-0.029) | 0.115(-0.028) | 0.00142s(75.5%) |\n| GroupTaylor Exp2 2.0X | 1,751,941(51%) | 6.0(49.6%) | 
4.0M(55.6%) | 0.216(-0.025) | 0.119(-0.024) | 0.00147s(78.2%) |\n| GroupHessian Exp1 2.0X | 1,751,941(51%) | 6.0(49.6%) | 2.3M(32%) | 0.214(-0.023) | 0.118(-0.025) | 0.00147s(78.2%) |\n\n#### Dataset:Seaship BaseLine:Yolov8n Light:yolov8-BIFPN-EfficientRepHead.yaml(C2f-EMBC,BIFPN,EfficientRepHead)\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 3,006,818 | 8.1 | 5.9M | 0.986 | 0.813 | 0.00098s |\n| Light | 1,809,166(60.2%) | 5.6(69.1%) | 4.5M(76.3%) | 0.981(-0.005) | 0.787(-0.026) | 0.00109s(112.2%) |\n| Light Lamp Exp1 2.0X | 729,717(24.3%) | 2.4(30%) | 2.3M(39%) | 0.981(-0.005) | 0.777(-0.036) | 0.00080s(81.6%) |\n| Light Lamp Exp2 2.5X | 492,731(16.4%) | 1.6(19.8%) | 1.8M(31%) | 0.973(-0.013) | 0.746(-0.067) | 0.00062s(63.3%) |\n\n#### Dataset:VisDrone 100%TrainingData Model:yolov8-ASF-P2\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 2,490,488 | 12.0 | 5.0M | 0.295 | 0.166 | 0.00199s |\n| Lamp Exp1 2.0X | 664,162(26.7%) | 5.9(49.2%) | 2.3M(46%) | 0.277(-0.018) | 0.154(-0.012) | 0.00153s(76.9%) |\n| Lamp Exp2 1.5X | 1,065,363(42.8%) | 7.9(65.8%) | 2.4M(48%) | 0.296(+0.001) | 0.165(-0.001) | 0.00168s(84.4%) |\n| Lamp Exp3 1.7X | 885,911(35.6%) | 7.0(58.3%) | 2.3M(46%) | 0.29(-0.005) | 0.161(-0.005) | 0.00162s(81.4%) |\n\n#### Dataset:VisDrone 30%TrainingData Model:yolov8-GHostHGNetV2-SlimNeck-ASF\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 2,236,610 | 6.8 | 4.6M | 0.206 | 0.111 | 0.00137s |\n| LAMP Exp1 2.0X | 951,571(42.5%) | 3.4(50%) | 2.1M(45.7%) | 0.207(+0.001) | 0.112(+0.001) | 0.00092s(67.2%) |\n\n#### Dataset:CrowdHuman 20%TrainingData Model:yolov8-convnextv2-goldyolo-ASF\n| model | Parameters | 
GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 8,712,945 | 16.7 | 17.0M | 0.747 | 0.431 | 0.00461s |\n| LAMP Exp1 2.0X | 4,493,135(51.6%) | 8.3(49.7%) | 9.0M(52.9%) | 0.747(0.0) | 0.434(+0.003) | 0.00261s(56.6%) |\n| LAMP Exp2 2.5X | 3,899,980(44.8%) | 6.6(39.5%) | 7.9M(46.5%) | 0.742(-0.005) | 0.431(0.0) | 0.00219s(47.5%) |\n\n#### Dataset:CrowdHuman 20%TrainingData Model:yolov8-DyHead\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 3,485,458 | 9.6 | 6.9M | 0.743 | 0.436 | 0.00173s |\n| LAMP Exp1 2.0X | 1,167,932(33.5%) | 4.8(50%) | 2.5M(65.8%) | 0.745(+0.002) | 0.439(+0.003) | 0.00124s(71.7%) |\n| LAMP Exp1 2.5X | 815,035(23.4%) | 3.8(39.6%) | 1.8M(26.1%) | 0.74(-0.003) | 0.432(-0.004) | 0.00106s(61.3%) |\n| LAMP Exp1 3.0X | 628,561(18%) | 3.2(33.3%) | 1.5M(21.7%) | 0.733(-0.01) | 0.426(-0.01) | 0.00098s(56.6%) |\n\n#### Dataset:CrowdHuman 20%TrainingData Model:yolov8-repvit(CVPR2024)-RepNCSPELAN\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 6,288,382 | 17.6 | 12.7M | 0.74 | 0.431 | 0.00220s |\n| LAMP Exp1 2.0X | 2,300,482(36.6%) | 8.7(49.4%) | 5.0M(39.4%) | 0.747(+0.007) | 0.438(+0.007) | 0.00167s(76%) |\n| LAMP Exp2 3.0X | 1,536,813(24.4%) | 5.7(32.4%) | 3.6M(28.3%) | 0.732(-0.008) | 0.424(-0.007) | 0.00143s(65%) |\n| LAMP Exp3 3.5X | 1,328,534(21.1%) | 4.8(27.3%) | 3.2M(25.2%) | 0.73(-0.01) | 0.421(-0.01) | 0.00137s(63%) |\n| LAMP Exp4 4.0X | 1,179,757(18.8%) | 4.2(24.1%) | 2.9M(22.8%) | 0.738(-0.02) | 0.425(-0.006) | 0.00132s(61%) |\n| GROUP-TAYLOR Exp1 2.0X | 3,235,020(51.4%) | 8.7(49.4%) | 6.8M(53.5%) | 0.704(-0.036) | 0.396(-0.035) | 0.00154s(70%) |\n| GROUP-TAYLOR Exp2 2.0X | 3,197,034(50.8%) | 8.7(49.4%) | 6.7M(52.7%) 
| 0.707(-0.033) | 0.405(-0.026) | 0.00158s(72%) |\n\n#### Dataset:WIDER-FACE Model:yolov8n-pose (the validation set of this dataset has no pose annotations, so all pose metrics are 0)\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 3,078,128 | 8.3 | 6.1M | 0.639 | 0.334 | 0.00102s |\n| LAMP Exp1 2.0X | 731,605(23.8%) | 4.1(49.3%) | 1.6M(26.2%) | 0.636(-0.003) | 0.333(-0.001) | 0.00080s(78.4%) |\n\n#### Dataset:Seaship Model:yolov8-starnet-C2f-Star-LSCD.yaml\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 1,369,689 | 4.5 | 2.8M | 0.992 | 0.815 | 0.00079s |\n| LAMP Exp1 2.0X | 232,498(17%) | 2.2(49%) | 0.6M(21.4%) | 0.98(-0.012) | 0.791(-0.024) | 0.00047s(59.5%) |\n| LAMP Exp2 2.5X | 136,375(10%) | 1.8(40%) | 0.5M(17.9%) | 0.965(-0.027) | 0.736(-0.079) | 0.00035s(44.3%) |\n| LAMP Exp3 3.0X | 98,051(7.2%) | 1.5(33.3%) | 0.4M(14.3%) | 0.912(-0.08) | 0.629(-0.186) | 0.00024s(30.4%) |\n\n### Yolov10 experiments GPU-Device:RTX3090\n#### Dataset:Visdrone2019 Model:yolov10n.yaml\n| model | Parameters | GFLOPs | Model Size | mAP50 | mAP50-95 | Inference Time(bs:32) |\n| :----: | :----: | :----: | :----: | :----: | :----: | :----: |\n| BaseLine | 2,267,118 | 6.5 | 5.5M | 0.271 | 0.151 | 0.00107s |\n| LAMP Exp1 2.0X | 788,635(34.8%) | 3.5(53.8%) | 2.1M(38.2%) | 0.271(0.0) | 0.148(-0.003) | 0.00084s(78.5%) |\n| LAMP Exp2 2.5X | 614,698(27.1%) | 2.8(43.1%) | 1.7M(30.9%) | 0.258(-0.013) | 0.14(-0.011) | 0.00077s(72%) |"
  },
  {
    "path": "yolo-improve/yolov8-distill.md",
    "content": "# YOLOV8V10V11蒸馏项目介绍\n\n## 对于群里的蒸馏相关问题,我基本都会回复,对于一些蒸馏问题,我都会给出建议。\n\n### 首先蒸馏是什么？  \n模型蒸馏（Model Distillation）是一种用于在计算机视觉中提高模型性能和效率的技术。在模型蒸馏中，通常存在两个模型，即“教师模型”和“学生模型”。\n\n### 为什么需要蒸馏？  \n1. 在不增加模型计算量和参数量的情况下提升精度，也即是可以无损提高精度。\n2. 配合剪枝一起使用，可以尽量达到无损降低模型参数量、计算量，提高FPS的情况下，还能保持模型精度没有下降甚至上升，这是改进网络结构无法达到的高度。\n3. 论文中的保底手段，因为剪枝和蒸馏的特殊性，其都不会增加参数量和计算量，可以在最后一个点上大幅度增加实验和工作量，因为本身蒸馏也需要做大量实验。\n\n### 目前蒸馏方法包含：\n1. Logical\n    1. L1\n    2. L2\n    3. [BCKD](https://link.zhihu.com/?target=https%3A//arxiv.org//pdf/2308.14286)(Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection,ICCV 2023)\n    4. Double distillation strategy.(针对yolov10的结构开发)\n2. Feature\n    1. [Mimic](https://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Mimicking_Very_Efficient_CVPR_2017_paper.pdf)\n    2. [Masked Generative Distillation](https://link.zhihu.com/?target=https%3A//arxiv.org/pdf/2205.01529.pdf) (ECCV 2022)\n    3. [Channel-wise Distillation](https://arxiv.org/pdf/2011.13256.pdf) (ICCV 2021)\n    4. [ChSimLoss Distillation](https://openaccess.thecvf.com/content/ICCV2021/html/Liu_Exploring_Inter-Channel_Correlation_for_Diversity-Preserved_Knowledge_Distillation_ICCV_2021_paper.html) (ICCV2021)\n    5. [SPKDLoss Distillation](https://arxiv.org/pdf/1907.09682.pdf) (ICCV2019)\n\n### 知识蒸馏的一些细节(具体项目会提供视频讲解)\n1. Feature蒸馏可以自定义选择层进行蒸馏.\n2. 蒸馏损失支持常数,线性,余弦进行动调整.\n3. 支持Logical和Feature一起使用.\n4. 过程中会输出Logical和Feature的损失,让用户可以及时调整对应的损失系数.\n5. 支持正常训练模型时候进行蒸馏和剪枝后finetune蒸馏.\n6. 
支持自蒸馏.\n\n# 实验示例结果.(以下示例实验相关命令,视频教程,实验数据都在项目里面)\n#### Dataset:VisDrone(训练集只用了百分之30的数据,验证集和测试集用了全量的数据) Teacher:yolov8s Student:yolov8n (no pretrained weight)\n| model | GFLOPs | mAP50(test set) | mAP50-95(test set) |\n| :----: | :----: | :----: | :----: |\n| yolov8n | 8.1 | 0.202 | 0.108 |\n| yolov8s | 28.5 | 0.234 | 0.128 |\n| yolov8n CWD Exp1 | 8.1 | 0.211(+0.009) | 0.114(+0.006) |\n| yolov8n CWD Exp2 | 8.1 | 0.208(+0.006) | 0.112(+0.004) |\n| yolov8n CWD Exp3 | 8.1 | 0.21(+0.008) | 0.112(+0.004) |\n| yolov8n Mimic Exp1 | 8.1 | 0.203(+0.001) | 0.108(+0.0) |\n| yolov8n Mimic Exp2 | 8.1 | 0.204(+0.002) | 0.107(-0.001) |\n| yolov8n l2 Exp1 | 8.1 | 0.196(-0.006) | 0.106(-0.002) |\n| yolov8n BCKD Exp1 | 8.1 | 0.208(+0.006) | 0.112(+0.004) |\n| yolov8n BCKD Exp2 | 8.1 | 0.206(+0.004) | 0.106(-0.002) |\n| yolov8n BCKD Exp3 | 8.1 | 0.209(+0.007) | 0.113(+0.005) |\n| yolov8n BCKD Exp4 | 8.1 | 0.204(+0.002) | 0.11(+0.002) |\n| yolov8n BCKD+CWD Exp1 | 8.1 | 0.204(+0.002) | 0.109(+0.001) |\n| yolov8n BCKD+CWD Exp2 | 8.1 | 0.214(+0.012) | 0.115(+0.007) |\n| yolov8n BCKD+CWD Exp3 | 8.1 | 0.21(+0.008) | 0.114(+0.006) |\n| yolov8n BCKD+CWD Exp4 | 8.1 | 0.208(+0.006) | 0.113(+0.005) |\n\n#### Dataset:VisDrone(训练集只用了百分之30的数据,验证集和测试集用了全量的数据) Teacher:yolov8s Student:yolov8n-lamp (use pretrained weight)\n| model | GFLOPs | mAP50(test set) | mAP50-95(test set) |\n| :----: | :----: | :----: | :----: |\n| yolov8n | 8.1 | 0.225 | 0.124 |\n| yolov8n-lamp | 3.2 | 0.225 | 0.123(-0.001) |\n| yolov8s | 28.5 | 0.259 | 0.146 |\n| yolov8n-lamp cwd exp1 | 3.2 | 0.23(+0.005) | 0.124(0.0) |\n\n#### Dataset:VisDrone(训练集只用了百分之30的数据,验证集和测试集用了全量的数据) Teacher:yolov8s-asf-p2 Student:yolov8s-asf-p2\n| model | GFLOPs | mAP50(test set) | mAP50-95(test set) |\n| :----: | :----: | :----: | :----: |\n| yolov8n-asf-p2 | 12.0 | 0.237 | 0.127 |\n| yolov8s-asf-p2 | 35.8 | 0.282 | 0.155 |\n| yolov8n-asf-p2 cwd exp1 | 12.0 | 0.24(+0.003) | 0.129(+0.002) |\n| yolov8n-asf-p2 cwd exp2 | 12.0 | 0.239(+0.002) | 
0.128(+0.001) |\n| yolov8n-asf-p2 cwd exp3 | 12.0 | 0.236(-0.001) | 0.125(-0.002) |\n| yolov8n-asf-p2 cwd exp4 | 12.0 | 0.239(+0.002) | 0.128(+0.001) |\n| yolov8n-asf-p2 cwd exp5 | 12.0 | 0.234(-0.004) | 0.125(-0.002) |\n| yolov8n-asf-p2 mgd exp1 | 12.0 | 0.234(-0.004) | 0.125(-0.002) |\n| yolov8n-asf-p2 mgd exp2 | 12.0 | 0.238(+0.001) | 0.127(0.0) |\n| yolov8n-asf-p2 BCKD exp1 | 12.0 | 0.241(+0.004) | 0.131(+0.004) |\n| yolov8n-asf-p2 BCKD exp2 | 12.0 | 0.24(+0.003) | 0.13(+0.003) |\n| yolov8n-asf-p2 cwd+BCKD exp1 | 12.0 | 0.241(+0.004) | 0.131(+0.004) |\n| yolov8n-asf-p2 cwd+BCKD exp2 | 12.0 | 0.239(+0.002) | 0.128(+0.001) |"
  },
  {
    "path": "yolo-improve/yolov8-erf.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nwarnings.simplefilter('ignore')\nimport torch, yaml, cv2, os, shutil, sys, glob\nimport numpy as np\nnp.random.seed(0)\nimport matplotlib.pyplot as plt\nfrom tqdm import trange\nfrom PIL import Image\nfrom ultralytics.nn.tasks import attempt_load_weights\nfrom timm.utils import AverageMeter\nimport matplotlib.pyplot as plt\nplt.rcParams[\"font.family\"] = \"Times New Roman\"\nimport seaborn as sns\n\ndef get_activation(feat, backbone_idx=-1):\n    def hook(model, inputs, outputs):\n        if backbone_idx != -1:\n            for _ in range(5 - len(outputs)): outputs.insert(0, None)\n            feat.append(outputs[backbone_idx])\n        else:\n            feat.append(outputs)\n    return hook\n\ndef letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):\n    # Resize and pad image while meeting stride-multiple constraints\n    shape = im.shape[:2]  # current shape [height, width]\n    if isinstance(new_shape, int):\n        new_shape = (new_shape, new_shape)\n\n    # Scale ratio (new / old)\n    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])\n    if not scaleup:  # only scale down, do not scale up (for better val mAP)\n        r = min(r, 1.0)\n\n    # Compute padding\n    ratio = r, r  # width, height ratios\n    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))\n    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding\n    if auto:  # minimum rectangle\n        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding\n    elif scaleFill:  # stretch\n        dw, dh = 0.0, 0.0\n        new_unpad = (new_shape[1], new_shape[0])\n        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios\n\n    dw /= 2  # divide padding into 2 sides\n    dh /= 2\n\n    if shape[::-1] != new_unpad:  # resize\n        im = cv2.resize(im, new_unpad, 
interpolation=cv2.INTER_LINEAR)\n    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))\n    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))\n    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border\n    return im, ratio, (dw, dh)\n\ndef get_rectangle(data, thresh):\n    h, w = data.shape\n    all_sum = np.sum(data)\n    for i in range(1, h // 2):\n        selected_area = data[h // 2 - i:h // 2 + 1 + i, w // 2 - i:w // 2 + 1 + i]\n        area_sum = np.sum(selected_area)\n        if area_sum / all_sum > thresh:\n            return i * 2 + 1, (i * 2 + 1) / h * (i * 2 + 1) / w\n    return None\n\ndef heatmap(data, camp='RdYlGn', figsize=(10, 10.75), ax=None, save_path=None):\n    plt.figure(figsize=figsize, dpi=40)\n    ax = sns.heatmap(data,\n                xticklabels=False,\n                yticklabels=False, cmap=camp,\n                center=0, annot=False, ax=ax, cbar=True, annot_kws={\"size\": 24}, fmt='.2f')\n    plt.tight_layout()\n    plt.savefig(save_path)\n\nclass yolov8_erf:\n    feature, hooks = [], []\n    \n    def __init__(self, weight, device, layer, dataset, num_images, save_path) -> None:\n        device = torch.device(device)\n        ckpt = torch.load(weight)\n        model = attempt_load_weights(weight, device)\n        model.info()\n        for p in model.parameters():\n            p.requires_grad_(True)\n        model.eval()\n        optimizer = torch.optim.SGD(model.parameters(), lr=0, weight_decay=0)\n        meter = AverageMeter()\n        optimizer.zero_grad()\n        \n        if '-' in layer:\n            layer_first, layer_second = layer.split('-')\n            self.hooks.append(model.model[int(layer_first)].register_forward_hook(get_activation(self.feature, backbone_idx=int(layer_second))))\n        else:\n            self.hooks.append(model.model[int(layer)].register_forward_hook(get_activation(self.feature)))\n    \n        self.__dict__.update(locals())\n    \n    
def get_input_grad(self, samples):\n        # Forward pass; the registered hook stores the chosen layer's output in self.feature\n        _ = self.model(samples)\n        outputs = self.feature[-1]\n        self.feature.clear()\n        out_size = outputs.size()\n        # Sum the positive activations at the spatial center of the feature map\n        central_point = torch.nn.functional.relu(outputs[:, :, out_size[2] // 2, out_size[3] // 2]).sum()\n        # Gradient of the center response w.r.t. the input image; keep positive contributions only\n        grad = torch.autograd.grad(central_point, samples)\n        grad = grad[0]\n        grad = torch.nn.functional.relu(grad)\n        aggregated = grad.sum((0, 1))\n        grad_map = aggregated.cpu().numpy()\n        return grad_map\n    \n    def process(self):\n        for image_path in os.listdir(self.dataset):\n            if self.meter.count == self.num_images:\n                break\n            \n            img = cv2.imread(f'{self.dataset}/{image_path}')\n            if img is None:  # skip files that OpenCV cannot decode\n                continue\n            img = letterbox(img, auto=False)[0]\n            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)\n            img = np.float32(img) / 255.0\n            samples = torch.from_numpy(np.transpose(img, axes=[2, 0, 1])).unsqueeze(0).to(self.device)\n            samples.requires_grad = True\n            self.optimizer.zero_grad()\n            contribution_scores = self.get_input_grad(samples)\n            \n            if np.isnan(np.sum(contribution_scores)):\n                print('got NAN, next image')\n                continue\n            else:\n                print(f'{self.meter.count}/{self.num_images} calculate....')\n                self.meter.update(contribution_scores)\n        \n        #   Set figure parameters\n        large = 24; med = 24; small = 24\n        params = {'axes.titlesize': large,\n                'legend.fontsize': med,\n                'figure.figsize': (16, 10),\n                'axes.labelsize': med,\n                'xtick.labelsize': med,\n                'ytick.labelsize': med,\n                'figure.titlesize': large}\n        plt.rcParams.update(params)\n        # matplotlib >= 3.6 renamed the bundled seaborn styles to 'seaborn-v0_8-*'\n        style = 'seaborn-v0_8-whitegrid' if 'seaborn-v0_8-whitegrid' in plt.style.available else 'seaborn-whitegrid'\n        plt.style.use(style)\n        sns.set_style(\"white\")\n        plt.rc('font', **{'family': 'Times New Roman'})\n        plt.rcParams['axes.unicode_minus'] = False\n        \n        data = self.meter.avg\n        print(f'max value:{np.max(data):.3f} min value:{np.min(data):.3f}')\n        \n        data = np.log10(data + 1)       #   the scores differ in magnitude. take the logarithm for better readability\n        data = data / np.max(data)      #   rescale to [0,1] for the comparability among models\n        print('======================= the high-contribution area ratio =====================')\n        for thresh in [0.2, 0.3, 0.5, 0.99]:\n            rect = get_rectangle(data, thresh)\n            if rect is None:  # no centered square reaches this share of the total contribution\n                print(f'thresh {thresh}: no centered square reaches this ratio')\n                continue\n            side_length, area_ratio = rect\n            print('thresh, rectangle side length, area ratio: ', thresh, side_length, area_ratio)\n        heatmap(data, save_path=self.save_path)\n\n\ndef get_params():\n    params = {\n        'weight': 'yolov8n.pt', # only the weight file needs to be specified\n        'device': 'cuda:0',\n        'layer': '10', # string\n        'dataset': '',\n        'num_images': 50,\n        'save_path': 'result.png'\n    }\n    return params\n\nif __name__ == '__main__':\n    cfg = get_params()\n    yolov8_erf(**cfg).process()"
  },
  {
    "path": "yolo-improve/yolov8-objectcount.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nimport cv2, os, shutil\nimport numpy as np\nfrom ultralytics import YOLO\n\ndef get_video_cfg(path):\n    video = cv2.VideoCapture(path)\n    size = (int(video.get(cv2.CAP_PROP_FRAME_WIDTH)), int(video.get(cv2.CAP_PROP_FRAME_HEIGHT)))\n    fps = int(video.get(cv2.CAP_PROP_FPS))\n    return cv2.VideoWriter_fourcc(*'XVID'), size, fps\n\ndef plot_and_counting(result):\n    image_plot = result.plot()\n    box_count = result.boxes.shape[0]\n    cv2.putText(image_plot, f'Object Counts:{box_count}', (10, 50), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 4)\n    return image_plot\n\nif __name__ == '__main__':\n    output_dir = 'result'\n    if os.path.exists(output_dir):\n        shutil.rmtree(output_dir)\n    os.makedirs(output_dir, exist_ok=True)\n    \n    model = YOLO('yolov8n.pt') # select your model.pt path\n    \n    # ----------------------for images or images-folder----------------------\n    for result in model.predict(source='ultralytics/assets',\n                  stream=True,\n                  imgsz=640,\n                  save=False,\n                  # conf=0.2,\n                  ):\n        image_plot = plot_and_counting(result)\n        cv2.imwrite(f'{output_dir}/{os.path.basename(result.path)}', image_plot)\n    \n    # ----------------------for video-folder----------------------\n    # video_base_path = 'video'\n    # for video_path in os.listdir(video_base_path):\n    #     fourcc, size, fps = get_video_cfg(f'{video_base_path}/{video_path}')\n    #     video_output = cv2.VideoWriter(f'{output_dir}/{video_path}', fourcc, fps, size)\n    #     for result in model.predict(source=f'{video_base_path}/{video_path}',\n    #                   stream=True,\n    #                   imgsz=640,\n    #                   save=False,\n    #                   # conf=0.2,\n    #                   ):\n    #         image_plot = plot_and_counting(result)\n    #         video_output.write(image_plot)\n   
 #     video_output.release()"
  },
  {
    "path": "yolo-improve/yolov8-track.py",
    "content": "import warnings\nwarnings.filterwarnings('ignore')\nimport cv2, os, shutil\nimport numpy as np\nfrom pathlib import Path\nfrom ultralytics import YOLO\nfrom boxmot import DeepOCSORT, BYTETracker, BoTSORT, StrongSORT, OCSORT, HybridSORT\n\ndef get_video_cfg(path):\n    video = cv2.VideoCapture(path)\n    size = (int(video.get(cv2.CAP_PROP_FRAME_WIDTH)), int(video.get(cv2.CAP_PROP_FRAME_HEIGHT)))\n    fps = int(video.get(cv2.CAP_PROP_FPS))\n    return cv2.VideoWriter_fourcc(*'XVID'), size, fps\n\ndef counting(image_plot, result):\n    box_count = result.boxes.shape[0]\n    cv2.putText(image_plot, f'Object Counts:{box_count}', (10, 50), cv2.FONT_HERSHEY_SIMPLEX, 1.5, (0, 0, 255), 4)\n    return image_plot\n\ndef transform_mot(result):\n    mot_result = []\n    for i in range(result.boxes.shape[0]):\n        mot_result.append(result.boxes.xyxy[i].cpu().detach().cpu().numpy().tolist() + [float(result.boxes.conf[i]), float(result.boxes.cls[i])])\n    return np.array(mot_result)\n\n# boxmot                        10.0.57\nif __name__ == '__main__':\n    output_dir = 'result'\n    if os.path.exists(output_dir):\n        shutil.rmtree(output_dir)\n    os.makedirs(output_dir, exist_ok=True)\n    \n    model = YOLO('runs/train/yolov8m-crowdhuman/weights/best.pt') # select your model.pt path\n    \n    video_base_path = 'video'\n    for video_path in os.listdir(video_base_path):\n        \n        tracker = DeepOCSORT(\n        model_weights=Path('osnet_x1_0_msmt17_256x128_amsgrad_ep150_stp60_lr0.0015_b64_fb10_softmax_labelsmooth_flip.pt'), # which ReID model to use\n        device='cuda:0',\n        fp16=False,\n        )\n        # tracker = BoTSORT(\n        #     model_weights=Path('osnet_x1_0_msmt17_256x128_amsgrad_ep150_stp60_lr0.0015_b64_fb10_softmax_labelsmooth_flip.pt'), # which ReID model to use\n        #     device='cuda:0',\n        #     fp16=False,\n        # )\n        # tracker = StrongSORT(\n        #     
model_weights=Path('osnet_x1_0_msmt17_256x128_amsgrad_ep150_stp60_lr0.0015_b64_fb10_softmax_labelsmooth_flip.pt'), # which ReID model to use\n        #     device='cuda:0',\n        #     fp16=False,\n        # )\n        # tracker = HybridSORT(\n        #     reid_weights=Path('osnet_x1_0_msmt17_256x128_amsgrad_ep150_stp60_lr0.0015_b64_fb10_softmax_labelsmooth_flip.pt'), # which ReID model to use\n        #     device='cuda:0',\n        #     half=False,\n        #     det_thresh=0.3,\n        # )\n        # tracker = BYTETracker()\n        # tracker = OCSORT()\n        \n        fourcc, size, fps = get_video_cfg(f'{video_base_path}/{video_path}')\n        video_output = cv2.VideoWriter(f'{output_dir}/{video_path}', fourcc, fps, size)\n        for result in model.predict(source=f'{video_base_path}/{video_path}',\n                      stream=True,\n                      imgsz=640,\n                      save=False,\n                      # conf=0.2,\n                      classes=1\n                      ):\n            image_plot = result.orig_img\n            mot_input = transform_mot(result)\n            try:\n                tracker.update(mot_input, image_plot)\n                tracker.plot_results(image_plot, show_trajectories=True)\n            except:\n                continue\n            counting(image_plot, result)\n            video_output.write(image_plot)\n        video_output.release()"
  },
  {
    "path": "yolo-improve/yolov8.py",
    "content": "from ultralytics import YOLO\n\n# 安装命令\n# python setup.py develop\n\n# 数据集示例百度云链接\n# 链接：https://pan.baidu.com/s/19FM7XnKEFC83vpiRdtNA8A?pwd=n93i \n# 提取码：n93i \n\nif __name__ == '__main__':\n    # 直接使用预训练模型创建模型.\n    model = YOLO('yolov8n.pt')\n    model.train(**{'cfg':'ultralytics/cfg/exp1.yaml', 'data':'dataset/data.yaml'})\n    \n    # 使用yaml配置文件来创建模型,并导入预训练权重.\n    model = YOLO('ultralytics/cfg/models/v8/yolov8.yaml')\n    model.load('yolov8n.pt')\n    model.train(**{'cfg':'ultralytics/cfg/exp1.yaml', 'data':'dataset/data.yaml'})\n    \n    # 模型验证\n    model = YOLO('runs/detect/yolov8n_exp/weights/best.pt')\n    model.val(**{'data':'dataset/data.yaml'})\n    \n    # 模型推理\n    model = YOLO('runs/detect/yolov8n_exp/weights/best.pt')\n    model.predict(source='dataset/images/test', **{'save':True})"
  },
  {
    "path": "yolo-improve/yolov8v10-project.md",
    "content": "# [基于Ultralytics的YOLOV8V10改进项目.(69.9¥)](https://github.com/z1069614715/objectdetection_script)\n\n# 目前自带的一些改进方案(目前拥有合计300+个改进点！持续更新！)\n\n# 为了感谢各位对本项目的支持,本项目的赠品是yolov5-PAGCP通道剪枝算法.[具体使用教程](https://www.bilibili.com/video/BV1yh4y1Z7vz/)\n\n# 专栏改进汇总\n\n## YOLOV8系列\n### 二次创新系列\n1. ultralytics/cfg/models/v8/yolov8-RevCol.yaml\n\n    使用(ICLR2023)Reversible Column Networks对yolov8主干进行重设计,里面的支持更换不同的C2f-Block.\n2. EMASlideLoss\n\n    使用EMA思想与SlideLoss进行相结合.\n3. ultralytics/cfg/models/v8/yolov8-dyhead-DCNV3.yaml\n\n    使用[DCNV3](https://github.com/OpenGVLab/InternImage)替换DyHead中的DCNV2.\n4. ultralytics/cfg/models/v8/yolov8-C2f-EMBC.yaml\n\n    使用[Efficientnet](https://blog.csdn.net/weixin_43334693/article/details/131114618?spm=1001.2014.3001.5501)中的MBConv与EffectiveSE改进C2f.\n5. ultralytics/cfg/models/v8/yolov8-GhostHGNetV2.yaml\n\n    使用Ghost_HGNetV2作为YOLOV8的backbone.\n6. ultralytics/cfg/models/v8/yolov8-RepHGNetV2.yaml\n\n    使用Rep_HGNetV2作为YOLOV8的backbone.\n7. ultralytics/cfg/models/v8/yolov8-C2f-DWR-DRB.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock对[DWRSeg](https://arxiv.org/abs/2212.01173)中的Dilation-wise Residual(DWR)的模块进行二次创新后改进C2f.\n8. ultralytics/cfg/models/v8/yolov8-ASF-P2.yaml\n\n    在ultralytics/cfg/models/v8/yolov8-ASF.yaml的基础上进行二次创新，引入P2检测层并对网络结构进行优化.\n9. ultralytics/cfg/models/v8/yolov8-CSP-EDLAN.yaml\n\n    使用[DualConv](https://github.com/ChipsGuardian/DualConv)打造CSP Efficient Dual Layer Aggregation Networks改进yolov8.\n10. ultralytics/cfg/models/v8/yolov8-bifpn-SDI.yaml\n\n    使用[U-NetV2](https://github.com/yaoppeng/U-Net_v2)中的 Semantics and Detail Infusion Module对BIFPN进行二次创新.\n11. ultralytics/cfg/models/v8/yolov8-goldyolo-asf.yaml\n\n    利用华为2023最新GOLD-YOLO中的Gatherand-Distribute与[ASF-YOLO](https://github.com/mkang315/ASF-YOLO)中的Attentional Scale Sequence Fusion进行二次创新改进yolov8的neck.\n12. 
ultralytics/cfg/models/v8/yolov8-dyhead-DCNV4.yaml\n\n    Reworks DyHead with [DCNV4](https://github.com/OpenGVLab/DCNv4). (Disable AMP when training; for usage, see the 20240116 release notes.)\n13. ultralytics/cfg/models/v8/yolov8-HSPAN.yaml\n\n    Reworks the HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR) into HSPAN to improve the yolov8 neck.\n14. ultralytics/cfg/models/v8/yolov8-GDFPN.yaml\n\n    Combines RepGFPN from [DAMO-YOLO](https://github.com/tinyvision/DAMO-YOLO) with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085) to improve the neck.\n15. ultralytics/cfg/models/v8/yolov8-HSPAN-DySample.yaml\n\n    Takes the HSPAN derived from the HS-FPN in [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR) one step further by improving its upsampling module with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085).\n16. ultralytics/cfg/models/v8/yolov8-ASF-DySample.yaml\n\n    Combines Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO) with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085) to obtain Dynamic Sample Attentional Scale Sequence Fusion.\n\n17. ultralytics/cfg/models/v8/yolov8-C2f-DCNV2-Dynamic.yaml\n\n    Uses the in-house attention mechanism MPCA to strengthen the offset and mask in DCNV2.\n\n18. ultralytics/cfg/models/v8/yolov8-C2f-iRMB-Cascaded.yaml\n\n    Reworks the iRMB from [EMO ICCV2023](https://github.com/zhangzjn/EMO) with CascadedGroupAttention from [EfficientViT CVPR2023](https://github.com/microsoft/Cream/tree/main/EfficientViT) to improve C2f.\n\n19. ultralytics/cfg/models/v8/yolov8-C2f-iRMB-DRB.yaml\n\n    Reworks the iRMB from [EMO ICCV2023](https://github.com/zhangzjn/EMO) with the DilatedReparamBlock from [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main) to improve C2f.\n\n20. ultralytics/cfg/models/v8/yolov8-C2f-iRMB-SWC.yaml\n\n    Reworks the iRMB from [EMO ICCV2023](https://github.com/zhangzjn/EMO) with [shift-wise conv](https://arxiv.org/abs/2401.12736) to improve C2f.\n\n21. ultralytics/cfg/models/v8/yolov8-DBBNCSPELAN.yaml\n\n    Reworks the RepNCSPELAN from [YOLOV9](https://github.com/WongKinYiu/yolov9) with [Diverse Branch Block CVPR2021](https://arxiv.org/abs/2103.13425) to improve yolov8.\n\n22. 
ultralytics/cfg/models/v8/yolov8-OREPANCSPELAN.yaml\n\n    Reworks the RepNCSPELAN from [YOLOV9](https://github.com/WongKinYiu/yolov9) with [Online Convolutional Re-parameterization (CVPR2022)](https://github.com/JUGGHM/OREPA_CVPR2022/tree/main) to improve yolov8.\n\n23. ultralytics/cfg/models/v8/yolov8-DRBNCSPELAN.yaml\n\n    Reworks the RepNCSPELAN from [YOLOV9](https://github.com/WongKinYiu/yolov9) with the DilatedReparamBlock from [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main) to improve yolov8.\n\n24. ultralytics/cfg/models/v8/yolov8-DynamicHGNetV2.yaml\n\n    Reworks the HGBlock from [CVPR2024 RTDETR](https://arxiv.org/abs/2304.08069) with the DynamicConv from [CVPR2024 parameternet](https://arxiv.org/pdf/2306.14525v2.pdf).\n\n25. ultralytics/cfg/models/v8/yolov8-C2f-RVB-EMA.yaml\n\n    Improves C2f with the RepViTBlock from [CVPR2024 RepViT](https://github.com/THU-MIG/RepViT/tree/main) and the EMA attention mechanism.\n\n26. ultralytics/cfg/models/v8/yolov8-ELA-HSFPN.yaml\n\n    Improves HSFPN with [Efficient Local Attention](https://arxiv.org/abs/2403.01123).\n\n27. ultralytics/cfg/models/v8/yolov8-CA-HSFPN.yaml\n\n    Improves HSFPN with [Coordinate Attention CVPR2021](https://github.com/houqb/CoordAttention).\n\n28. ultralytics/cfg/models/v8/yolov8-CAA-HSFPN.yaml\n\n    Improves HSFPN with the CAA module from [CVPR2024 PKINet](https://github.com/PKINet/PKINet).\n\n29. ultralytics/cfg/models/v8/yolov8-CSMHSA.yaml\n\n    Extends Multi-Head Self-Attention into Cross-Scale Multi-Head Self-Attention.\n    1. High-dimensional features usually carry higher-level semantic information, while low-dimensional features carry more detail. The high-dimensional information therefore serves as the query and the low-dimensional information as the key and value. Combining the two lets the high-dimensional features help filter the low-dimensional features finely, giving a more complete and richer feature representation.\n    2. Using the upsampled high-dimensional information as the query captures the target's global information better, which helps the model recognize and localize targets.\n\n30. ultralytics/cfg/models/v8/yolov8-CAFMFusion.yaml\n\n    Uses CAFM from [HCANet](https://github.com/summitgao/HCANet), an attention mechanism that captures both global and local information, to rework content-guided attention fusion.\n\n31. ultralytics/cfg/models/v8/yolov8-C2f-Faster-CGLU.yaml\n\n    Reworks the CVPR2023 FasterNet with the Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n32. 
ultralytics/cfg/models/v8/yolov8-C2f-Star-CAA.yaml\n\n    Improves C2f with StarBlock from [StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main) and CAA from [CVPR2024 PKINet](https://github.com/PKINet/PKINet).\n\n33. ultralytics/cfg/models/v8/yolov8-bifpn-GLSA.yaml\n\n    Reworks bifpn with the [GLSA](https://github.com/Barrett-python/DuAT) module.\n\n34. ultralytics/cfg/models/v8/yolov8-BIMAFPN.yaml\n\n    Applies the BIFPN idea to the MAFPN from [MAF-YOLO](https://arxiv.org/pdf/2407.04381) to obtain BIMAFPN.\n\n35. ultralytics/cfg/models/v8/yolov8-C2f-AdditiveBlock-CGLU.yaml\n\n    Improves C2f with AdditiveBlock from [CAS-ViT](https://github.com/Tianfang-Zhang/CAS-ViT) and the Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n36. ultralytics/cfg/models/v8/yolov8-C2f-MSMHSA-CGLU.yaml\n\n    Improves C2f with M2SA from [CMTFNet](https://github.com/DrWuHonglin/CMTFNet/tree/main) and the Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n37. ultralytics/cfg/models/v8/yolov8-C2f-IdentityFormer-CGLU.yaml\n\n    Improves C2f with IdentityFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n38. ultralytics/cfg/models/v8/yolov8-C2f-RandomMixing-CGLU.yaml\n\n    Improves C2f with RandomMixing from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n39. ultralytics/cfg/models/v8/yolov8-C2f-PoolingFormer-CGLU.yaml\n\n    Improves C2f with PoolingFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n40. ultralytics/cfg/models/v8/yolov8-C2f-ConvFormer-CGLU.yaml\n\n    Improves C2f with ConvFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n41. 
ultralytics/cfg/models/v8/yolov8-C2f-CaFormer-CGLU.yaml\n\n    Improves C2f with CaFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n42. ultralytics/cfg/models/v8/yolov8-MAN-Faster.yaml\n\n    Combines the Mixed Aggregation Network from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804) with the Faster-Block from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) to improve yolov8.\n\n43. ultralytics/cfg/models/v8/yolov8-MAN-FasterCGLU.yaml\n\n    Combines the Mixed Aggregation Network from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804), the Faster-Block from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) and the Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve yolov8.\n\n44. ultralytics/cfg/models/v8/yolov8-MAN-Star.yaml\n\n    Combines the Mixed Aggregation Network from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804) with StarBlock from [StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main) to improve yolov8.\n\n45. ultralytics/cfg/models/v8/yolov8-MutilBackbone-MSGA.yaml\n\n    Further reworks the in-house MutilBackbone with the Multi-Scale Adaptive Spatial Attention Gate from [MSA^2 Net](https://github.com/xmindflow/MSA-2Net).\n\n46. ultralytics/cfg/models/v8/yolov8-slimneck-WFU.yaml\n\n    Reworks slimneck with Wavelet Feature Upgrade from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN).\n\n47. ultralytics/cfg/models/v8/yolov8-MAN-FasterCGLU-WFU.yaml\n\n    Combines Wavelet Feature Upgrade from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN), the Mixed Aggregation Network from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804), the Faster-Block from [FasterNet CVPR2023](https://github.com/JierunChen/FasterNet) and the Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve yolov8.\n\n48. ultralytics/cfg/models/v8/yolov8-CDFA.yaml\n\n    Combines WaveletConv from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN) with ContrastDrivenFeatureAggregation from [AAAI2025 ConDSeg](https://github.com/Mengqi-Lei/ConDSeg) to improve yolov8.\n\n49. 
ultralytics/cfg/models/v8/yolov8-C2f-StripCGLU.yaml\n\n    Improves C2f with StripBlock from [Strip R-CNN](https://arxiv.org/pdf/2501.03775) and the Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n50. ultralytics/cfg/models/v8/yolov8-C2f-Faster-KAN.yaml\n\n    Reworks the FasterBlock from fasternet (CVPR2023) with KAN from [ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat).\n\n51. ultralytics/cfg/models/v8/yolov8-C2f-DIMB-KAN.yaml\n\n    Builds on yolov8-C2f-DIMB.yaml and replaces the mlp module with KAN from [ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat).\n\n52. Localization Quality Estimation - Lightweight Shared Convolutional Detection Head\n\n    The Localization Quality Estimation module comes from [GFocalV2](https://arxiv.org/abs/2011.12885).\n    detect:ultralytics/cfg/models/v8/yolov8-LSCD-LQE.yaml\n    seg:ultralytics/cfg/models/v8/yolov8-seg-LSCD-LQE.yaml\n    pose:ultralytics/cfg/models/v8/yolov8-pose-LSCD-LQE.yaml\n    obb:ultralytics/cfg/models/v8/yolov8-obb-LSCD-LQE.yaml\n\n53. ultralytics/cfg/models/v8/yolov8-C2f-EfficientVIM-CGLU.yaml\n\n    Improves C2f with EfficientViMBlock from [CVPR2025 EfficientViM](https://github.com/mlvlab/EfficientViM) and the Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n54. ultralytics/cfg/models/v8/yolov8-EUCB-SC.yaml\n\n    Improves yolov8 upsampling with EUCB from [CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD) and ShiftChannelMix from [CVPR2025 BHViT](https://github.com/IMRL/BHViT).\n\n55. ultralytics/cfg/models/v8/yolov8-EMBSFPN-SC.yaml\n\n    Adds ShiftChannelMix from [CVPR2025 BHViT](https://github.com/IMRL/BHViT) to the ultralytics/cfg/models/v8/yolov8-EMBSFPN.yaml design.\n\n56. ultralytics/cfg/models/v8/yolov8-MFMMAFPN.yaml\n\n    Reworks the MAFPN from [MAF-YOLO](https://arxiv.org/pdf/2407.04381) with MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet).\n\n57. 
ultralytics/cfg/models/v8/yolov8-MBSMFFPN.yaml\n\n    Further reworks yolov8-EMBSFPN.yaml with MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet) to obtain the Multi-Branch&Scale Modulation-Fusion FPN.\n\n58. ultralytics/cfg/models/v8/yolov8-C2f-mambaout-LSConv.yaml\n\n    Combines LSConv from [CVPR2025 LSNet](https://github.com/THU-MIG/lsnet) with MambaOutBlock from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut) to improve C2f.\n\n59. ultralytics/cfg/models/v8/yolov8-SOEP-RFPN-MFM.yaml\n\n    Further reworks the original SOEP improvement with SNI and GSConvE from [ECCV2024 rethinking-fpn](https://github.com/AlanLi1997/rethinking-fpn) and MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet).\n\n60. ultralytics/cfg/models/v8/yolov8-SOEP-PST.yaml\n\n    Reworks SOEP with the Pyramid Sparse Transformer from [Pyramid Sparse Transformer](https://arxiv.org/abs/2505.12772).\n\n61. ultralytics/cfg/models/v8/yolov8-MAN-GCConv.yaml\n\n    Improves the Mixed Aggregation Network from [Hyper-YOLO TPAMI2025](https://www.arxiv.org/pdf/2408.04804) with GCConv from [CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet).\n\n### In-house series\n1. ultralytics/cfg/models/v8/yolov8-LAWDS.yaml\n\n    Light Adaptive-weight downsampling. In-house module; for a detailed explanation, see the video in the Baidu Cloud link.\n\n2. ultralytics/cfg/models/v8/yolov8-C2f-EMSC.yaml\n\n    Efficient Multi-Scale Conv. In-house module; for a detailed explanation, see the video in the Baidu Cloud link.\n\n3. ultralytics/cfg/models/v8/yolov8-C2f-EMSCP.yaml\n\n    Efficient Multi-Scale Conv Plus. In-house module; for a detailed explanation, see the video in the Baidu Cloud link.\n\n4. Lightweight Shared Convolutional Detection Head\n\n    In-house lightweight detection head.\n    detect:ultralytics/cfg/models/v8/yolov8-LSCD.yaml\n    seg:ultralytics/cfg/models/v8/yolov8-seg-LSCD.yaml\n    pose:ultralytics/cfg/models/v8/yolov8-pose-LSCD.yaml\n    obb:ultralytics/cfg/models/v8/yolov8-obb-LSCD.yaml\n    1. The FCOS paper showed that GroupNorm improves the localization and classification performance of the detection head.\n    2. Shared convolutions greatly reduce the parameter count, which makes the model lighter, especially on resource-constrained devices.\n    3. Because each detection head handles targets of a different scale, a Scale layer rescales the features when the convolutions are shared.\n    Taken together, these let the detection head use fewer parameters and less computation while keeping the accuracy loss as small as possible.\n\n5. 
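Task Align Dynamic Detection Head\n\n    In-house task-aligned dynamic detection head.\n    detect:ultralytics/cfg/models/v8/yolov8-TADDH.yaml\n    seg:ultralytics/cfg/models/v8/yolov8-seg-TADDH.yaml\n    pose:ultralytics/cfg/models/v8/yolov8-pose-TADDH.yaml\n    obb:ultralytics/cfg/models/v8/yolov8-obb-TADDH.yaml\n    1. The FCOS paper showed that GroupNorm improves the localization and classification performance of the detection head.\n    2. Shared convolutions greatly reduce the parameter count, which makes the model lighter, especially on resource-constrained devices. Because each detection head handles targets of a different scale, a Scale layer rescales the features when the convolutions are shared.\n    3. Following the idea of TOOD, we apply task alignment not only in the label assignment strategy but also in the structure of the detection head. Existing detector heads usually use independent classification and localization branches, which leaves the two tasks without interaction. TADDH uses a feature extractor to learn task-interactive features from several convolutional layers and obtain joint features. The localization branch uses DCNV2 and generates the DCNV2 offset and mask from the interactive features, while the classification branch uses the interactive features for dynamic feature selection.\n\n6. ultralytics/cfg/models/v8/yolov8-FDPN.yaml\n\n    In-house Focusing Diffusion Pyramid Network.\n    1. A custom feature focusing module and a feature diffusion mechanism give the features at every scale detailed contextual information, which benefits subsequent detection and classification.\n    2. The custom feature focusing module accepts inputs at three scales. It contains an Inception-style module that uses a set of parallel depthwise convolutions to capture rich information across multiple scales.\n    3. The diffusion mechanism spreads the context-rich features to every detection scale.\n\n7. ultralytics/cfg/models/v8/yolov8-FDPN-DASI.yaml\n\n    Further reworks the in-house Focusing Diffusion Pyramid Network with the Dimension-Aware Selective Integration Module from [HCFNet](https://github.com/zhengshuchen/HCFNet).\n\n8. ultralytics/cfg/models/v8/yolov8-RGCSPELAN.yaml\n\n    In-house RepGhostCSPELAN.\n    1. Following the idea behind GhostNet (the intermediate feature maps computed by mainstream CNNs are widely redundant), cheap operations generate part of the redundant feature maps, which reduces computation and parameters.\n    2. The BottleNeck commonly used in yolov5 and yolov8 is dropped. To compensate for the performance lost by removing the residual block, RepConv is used on the gradient-flow branch to strengthen feature extraction and gradient flow. RepConv can also be fused at inference time.\n    3. A scaling factor controls the size of RGCSPELAN, so it suits both small and large models.\n\n9. Lightweight Shared Convolutional Separated BN Detection Head\n\n    Builds on the in-house lightweight detection head and, following the design of NASFPN, replaces GN with BN whose parameters are not shared.\n    detect:ultralytics/cfg/models/v8/yolov8-LSCSBD.yaml\n    seg:ultralytics/cfg/models/v8/yolov8-seg-LSCSBD.yaml\n    pose:ultralytics/cfg/models/v8/yolov8-pose-LSCSBD.yaml\n    obb:ultralytics/cfg/models/v8/yolov8-obb-LSCSBD.yaml\n    1. Feature statistics still differ between pyramid levels, so a normalization layer remains necessary. Introducing BN directly into a parameter-sharing detection head would distort its running averages, while GN adds inference overhead. We therefore follow NASFPN: the detection heads share the convolutional layers, and BN is computed separately for each level.\n\n10. ultralytics/cfg/models/v8/yolov8-EIEStem.yaml\n\n    1. 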
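The SobelConv branch extracts edge information from the image. Because the Sobel filter detects abrupt changes in intensity, it captures edge features well. These edge features matter in many computer vision tasks, such as image segmentation and object detection.\n    2. The EIEStem module also incorporates spatial information. Besides edge information, a pooling branch extracts and preserves important spatial information. Combining edge and spatial information helps the model understand image content better.\n    3. The Sobel operator is implemented efficiently with 3D group convolution.\n\n11. ultralytics/cfg/models/v8/yolov8-C2f-EIEM.yaml\n\n    Proposes a new EIEStem module intended as an efficient front-end module for image recognition tasks. It combines a SobelConv branch that extracts edge information with a convolution branch that extracts spatial information, so it learns a richer image feature representation.\n    1. Edge information learning: convolutional neural networks (CNNs) are usually good at learning spatial information but can fall short at extracting edges. The EIEStem module extracts edge features explicitly through the SobelConv branch. The Sobel filter is a classic edge detection filter that captures abrupt intensity changes and therefore important edge information.\n    2. Spatial information preservation: spatial information is as important as edge information. The EIEStem module extracts it through an additional convolution branch (conv_branch). Unlike the SobelConv branch, conv_branch extracts features of the original image and preserves rich spatial detail.\n    3. Feature fusion: the EIEStem module fuses (concatenates) the features from the SobelConv branch and conv_branch. The fused representation contains both rich edge information and spatial information, so it describes image content more completely.\n\n12. ultralytics/cfg/models/v8/yolov8-ContextGuideFPN.yaml\n\n    The Context Guide Fusion Module (CGFM) is a feature fusion module designed to improve the feature pyramid network (FPN) in YOLOv8. Its design addresses context guidance and adaptive adjustment during multi-scale feature fusion.\n    1. Effective fusion of context: through the SE attention mechanism, the module captures and uses important contextual information during feature fusion. This strengthens the feature representation and guides the model toward information about the detection targets, which improves detection accuracy.\n    2. Feature enhancement: a weighted feature reorganization step strengthens important features and suppresses unimportant ones, making the feature maps more discriminative.\n    3. Simple and efficient: the module structure is relatively simple and adds little computational overhead, so it suits real-time object detection.\n    A video walkthrough is on Bilibili: https://www.bilibili.com/video/BV1Vx4y1n7hZ/\n\n13. ultralytics/cfg/models/v8/yolov8-LSDECD.yaml\n\n    Builds on the in-house lightweight detection head (LSCD) and uses detail-enhanced convolution to improve the head's ability to capture detail and further raise detection accuracy.\n    detect:ultralytics/cfg/models/v8/yolov8-LSDECD.yaml\n    segment:ultralytics/cfg/models/v8/yolov8-seg-LSDECD.yaml\n    pose:ultralytics/cfg/models/v8/yolov8-pose-LSDECD.yaml\n    obb:ultralytics/cfg/models/v8/yolov8-obb-LSDECD.yaml\n    1. DEA-Net introduces a detail-enhanced convolution (DEConv). DEConv integrates prior information into an ordinary convolutional layer to strengthen representation and generalization. Through re-parameterization, DEConv is then converted into an equivalent ordinary convolution with no extra parameters or computational cost.\n\n14. ultralytics/cfg/models/v8/yolov8-C2f-SMPCGLU.yaml\n\n    Improves C2f with the Self-moving Point Convolutional GLU module.\n    SMP comes from [CVPR2023-SMPConv](https://github.com/sangnekim/SMPConv), and Convolutional GLU comes from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n    1. 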
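Ordinary convolutions may fail to capture effective features when the data is diverse and complex. We therefore adopt SMPConv, whose adaptive point-moving mechanism captures local features better and makes feature extraction more flexible and accurate.\n    2. CGLU is added after SMPConv. Convolutional GLU combines convolution with a gating mechanism and passes information channels selectively, which makes feature extraction more effective and flexible.\n\n15. Re-CalibrationFPN\n\n    To strengthen the interaction between shallow and deep features, we introduce the re-calibration feature pyramid network (Re-CalibrationFPN).\n    P2345: ultralytics/cfg/models/v8/yolov8-ReCalibrationFPN-P2345.yaml (ReCalibrationFPN with a small-object detection head)\n    P345: ultralytics/cfg/models/v8/yolov8-ReCalibrationFPN-P345.yaml\n    P3456: ultralytics/cfg/models/v8/yolov8-ReCalibrationFPN-P3456.yaml (ReCalibrationFPN with a large-object detection head)\n    1. Shallow layers carry less semantics but rich detail, with sharper boundaries and less distortion, whereas deep layers carry rich semantic information. Directly fusing low-level features with high-level ones can therefore cause redundancy and inconsistency. To address this, we propose the SBA module, which selectively aggregates boundary and semantic information to delineate finer-grained object contours and recalibrate object positions.\n    2. Compared with the traditional FPN structure, the SBA module adds bidirectional fusion between high-resolution and low-resolution features, so information passes between them more fully and multi-scale feature fusion improves.\n    3. Through an adaptive attention mechanism, the SBA module adjusts feature weights according to the resolution and content of the feature maps, which captures the multi-scale features of targets better.\n\n16. ultralytics/cfg/models/v8/yolov8-CSP-PTB.yaml\n\n    Cross Stage Partial - Partially Transformer Block\n    In computer vision tasks, the Transformer has attracted wide attention for its strong global feature extraction. Its computational complexity is high, however, so applying it to all channels adds significant overhead. To keep feature extraction effective while lowering the computational cost, we design a hybrid structure that splits the input feature map into two parts, one processed by a CNN and the other by a Transformer.\n    The resulting module, CSP_PTB (Cross Stage Partial - Partially Transformer Block), combines the strengths of CNNs and Transformers and balances computational efficiency against feature extraction by assigning only part of the input channels to the Transformer.\n    1. Fusing local and global features: several studies show that the limited receptive field of a CNN restricts it to local features, whereas the MHSA in a Transformer extracts global features. The module uses the strengths of both.\n    2. Effective feature extraction at lower cost: to bring in the Transformer for global features without greatly increasing computational complexity, the Partially Transformer Block applies the TransformerBlock to only part of the channels.\n    3. MHSA_CGLU contains Multi-Head Self-Attention and [ConvolutionalGLU(TransNext CVPR2024)](https://github.com/DaiShiResearch/TransNeXt). Multi-Head Self-Attention extracts global features, and ConvolutionalGLU strengthens nonlinear feature expression and performs better than a traditional FFN.\n    4. The number of channels given to the Transformer can be adjusted for different model sizes and runtime conditions.\n\n17. 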
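ultralytics/cfg/models/v8/yolov8-SOEP.yaml  \n    \n    Small objects are hard to detect on the standard P3, P4 and P5 detection layers. The conventional fix is to add a P2 detection layer, but that brings its own problems, such as a much higher computational cost and slower post-processing, which motivates a new feature pyramid that is effective for small objects. We build on the original PAFPN and propose SmallObjectEnhancePyramid. Instead of adding a P2 detection layer, we pass the P2 feature layer through SPDConv to obtain features rich in small-object information and fuse them into P3. We then integrate the features with CSP-OmniKernel, which combines the CSP idea with [AAAI2024 OmniKernel](https://ojs.aaai.org/index.php/AAAI/article/view/27907). The OmniKernel module consists of three branches (global, large and local) that learn feature representations from global to local, which ultimately improves small-object detection. (This module requires disabling amp in train.py and setting self.args.half to False near line 115 of ultralytics/engine/validator.py. Remember to revert these changes before running other improvements!)\n    If you hit RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR on a 40-series GPU, upgrade torch to a version above 2.0 and cuda to a version above 12.0.\n\n18. ultralytics/cfg/models/v8/yolov8-CGRFPN.yaml\n\n    Context-Guided Spatial Feature Reconstruction Feature Pyramid Network.\n    1. Borrows the Rectangular Self-Calibration Module from [ECCV2024-CGRSeg](https://github.com/nizhenliang/CGRSeg), which is designed for spatial feature reconstruction and pyramid context extraction. It captures global context in the horizontal and vertical directions and uses the resulting axial global context to model rectangular key regions explicitly.\n    2. The PyramidContextExtraction module integrates feature information from different levels and improves the model's context awareness.\n    3. FuseBlockMulti and DynamicInterpolationFusion fuse multi-scale features. Through dynamic interpolation and multi-feature fusion, they improve the multi-scale feature representation and the recognition of targets against complex backgrounds.\n\n19. ultralytics/cfg/models/v8/yolov8-FeaturePyramidSharedConv.yaml\n\n    1. Multi-scale feature extraction\n        Convolutional layers with different dilation rates let the module extract features at different scales, which helps capture information of different sizes and contexts in the image.\n        Low dilation rates capture local detail, and high dilation rates capture global context.\n    2. Parameter sharing\n        The shared convolutional layer self.share_conv greatly reduces the number of trainable parameters. Compared with a separate convolutional layer per dilation rate, the shared layer reduces redundancy and improves model efficiency.\n        It lowers the model's storage and computation overhead.\n    3. Efficient channel transformation\n        The 1x1 convolutional layers self.cv1 and self.cv2 adjust the channel count and fuse features efficiently. A 1x1 convolution reduces parameters while retaining important feature information.\n    4. Finer-grained feature extraction\n        FeaturePyramidSharedConv extracts features with convolutions and can capture finer-grained features. By contrast, the pooling in SPPF may lose some detail.\n        Convolution is more flexible and expressive for feature extraction and captures details and complex patterns in the image better.\n\n20. APT(Adaptive Power Transformation)-TAL.\n\n    To make the matching quality and loss weights of different gt-prediction pairs more discriminative, a custom PowerTransformer significantly raises the weight of high-quality predicted boxes and suppresses the influence of low-quality ones, so the model pays more attention to high-quality predicted boxes during learning.\n\n21. 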
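ultralytics/cfg/models/v8/yolov8-EMBSFPN.yaml\n\n    Proposes a new Efficient Multi-Branch&Scale FPN based on BIFPN, [MAF-YOLO](https://arxiv.org/pdf/2407.04381) and [CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD).\n    Efficient Multi-Branch&Scale FPN offers a lightweight design, weighted multi-scale feature fusion, an efficient multi-scale convolution module, an efficient upsampling module and a global heterogeneous kernel selection mechanism.\n    1. It has an efficient multi-scale convolution module and a global heterogeneous kernel selection mechanism. Research on TridentNet shows that networks with larger receptive fields suit larger objects, while smaller targets benefit from smaller receptive fields. In the FPN stage we therefore choose different multi-scale convolution kernels for feature layers of different scales, to adapt to and progressively gather multi-scale receptive field information.\n    2. Borrowing the weighted multi-scale feature fusion from BIFPN, it replaces Concat with Add to reduce parameters and computation, and still performs adaptive weighted fusion according to the importance of features at each scale.\n    3. The efficient upsampling module is the EUCB from CVPR2024-EMCAD, which stays efficient while maintaining reasonable quality.\n\n22. ultralytics/cfg/models/v8/yolov8-CSP-PMSFA.yaml\n\n    In-house module: CSP-Partial Multi-Scale Feature Aggregation.\n    1. Partial multi-scale feature extraction: following the ideas of CVPR2020-GhostNet and CVPR2023-FasterNet, the module uses the efficient PartialConv. It extracts feature information at several scales from the input, but on only part of the channels rather than all of them, which improves computational efficiency.\n    2. Enhanced feature fusion: the final 1x1 convolutional layer fuses the features from different scales, and a residual connection adds the input features to the processed ones. This preserves the original information while adding new multi-scale information, which improves the model's expressive power.\n\n23. ultralytics/cfg/models/v8/yolov8-MutilBackbone-DAF.yaml\n\n    In-house MutilBackbone-DynamicAlignFusion.\n    1. To avoid spending too much computation on shallow feature maps, MutilBackbone shares the output of a single stem, which keeps computation and inference time from growing too large.\n    2. To handle the spatial differences between features from different backbones when they are fused, we design DynamicAlignFusion. It first fuses the features learned by the two modules, then generates a DynamicAlignWeight to adjust each set of features, and finally applies a learnable channel weight that adjusts the weights of the two paths dynamically according to the input features, which improves the model's adaptability to different features.\n\n24. Rep Shared Convolutional Detection Head\n\n    In-house re-parameterized lightweight detection head.\n    detect:ultralytics/cfg/models/v8/yolov8-RSCD.yaml\n    seg:ultralytics/cfg/models/v8/yolov8-seg-RSCD.yaml\n    pose:ultralytics/cfg/models/v8/yolov8-pose-RSCD.yaml\n    obb:ultralytics/cfg/models/v8/yolov8-obb-RSCD.yaml\n    1. Shared convolutions greatly reduce the parameter count, which makes the model lighter, especially on resource-constrained devices. Sharing parameters can limit expressive power, however, because different features may need different kernels to capture complex patterns, and shared parameters may not capture those differences fully. To offset this cost of the lightweight design, we use re-parameterizable convolutions. The additional learnable parameters let the network extract features from data more effectively and recover accuracy that the lightweight model might lose. Re-parameterized convolutions also raise parameter utilization considerably and are identical to ordinary convolutions at inference time, so the gain comes at no inference cost.\n    2. Because each detection head handles targets of a different scale, a Scale layer rescales the features when the convolutions are shared.\n\n25. 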
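ultralytics/cfg/models/v8/yolov8-CSP-FreqSpatial.yaml\n\n    FreqSpatial is a convolutional neural network (CNN) module that fuses spatial-domain and frequency-domain features. By extracting features in both domains, it aims to capture spatial and frequency information at different levels and make the model more robust and expressive on image data. Its main characteristic is combining the Scharr operator (used for edge detection) with spatial-domain convolution and frequency-domain convolution, so that the structural features of the image are captured from several perspectives.\n    1. Spatial-domain feature extraction: extracts features based on spatial structure from the original image, mainly capturing details and edge information.\n    2. Frequency-domain feature extraction: extracts frequency-related patterns from the frequency domain and captures the low-frequency and high-frequency components of the image, which helps the model extract information at both global and local scales.\n    3. Feature fusion: the spatial-domain and frequency-domain features are combined by weighted addition to produce the final output feature map. This weighted fusion lets the model consider spatial structure and frequency information together, which improves its performance across a range of scenes.\n\n26. ultralytics/cfg/models/v8/yolov8-C2f-MutilScaleEdgeInformationSelect.yaml\n\n    Further reworks the in-house CSP-MutilScaleEdgeInformationEnhance.\n    We propose a multi-scale edge information selection module (MutilScaleEdgeInformationSelect) that efficiently selects, from multi-scale edge information, the key features most relevant to the target task. To do so, we introduce an attention mechanism that focuses on more important regions: [ICCV2023 DualDomainSelectionMechanism, DSM](https://github.com/c-yn/FocalNet). By focusing on the more important regions of the image (such as complex edges and high-frequency signal regions), it adaptively filters the multi-scale features for those with higher task relevance, which markedly improves the precision of feature selection and overall model performance.\n\n27. GlobalEdgeInformationTransfer\n\n    Implementation 1: ultralytics/cfg/models/v8/yolov8-GlobalEdgeInformationTransfer1.yaml\n    Implementation 2: ultralytics/cfg/models/v8/yolov8-GlobalEdgeInformationTransfer2.yaml\n    Implementation 3: ultralytics/cfg/models/v8/yolov8-GlobalEdgeInformationTransfer3.yaml\n    Bounding-box localization is well known to depend heavily on object edge information, yet a conventional object detection network has no component that raises its attention to object edges. We need a way to fuse edge information into the features extracted at every scale. We therefore propose a module named GlobalEdgeInformationTransfer (GEIT), which passes the edge information extracted from shallow features to the whole backbone and fuses it with features at different scales.\n    1. The original image contains a large amount of background information, so extracting edge information directly from it and passing that to the whole backbone would add noise to learning, whereas the shallow convolutional layers help filter out unnecessary background. We therefore place a module named MutilScaleEdgeInfoGenetator in the shallow part of the network. It uses the shallow feature layers to generate edge-information feature maps at several scales and feeds them into each scale of the backbone for fusion.\n    2. The downsampling method needs careful choice. Our goal is to preserve and strengthen edge information while downsampling, so MaxPool is the better fit. It keeps the strongest feature in each local region and represents edge information better. AvgPool is better suited to cases that need smoothed or averaged features, and it preserves detail and edge information less well than MaxPool.\n    3. For fusion, ConvEdgeFusion combines edge information with ordinary convolutional features in a new cross-channel feature fusion scheme. First, conv_channel_fusion fuses edge information and ordinary convolutional features across channels, helping the model integrate features from different sources. Next, conv_3x3_feature_extract extracts features from the fused result to strengthen the capture of local detail. Finally, conv_1x1 adjusts the output feature dimension.\n\n28. ultralytics/cfg/models/v8/yolov8-C2f-DIMB.yaml\n\n    In-house module DynamicInceptionDWConv2d. (For details, see 配置文件.md in the project.)\n\n29. 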
ultralytics/cfg/models/v8/yolov8-HAFB-1.yaml\n    \n    In-house Hierarchical Attention Fusion Block. (For details, see 配置文件.md in the project.)\n\n30. ultralytics/cfg/models/v8/yolov8-HAFB-2.yaml\n\n    Another way to use HAFB.\n\n31. ultralytics/cfg/models/v8/yolov8-MutilBackbone-HAFB.yaml\n    \n    Adds HAFB on top of yolov8-MutilBackbone-DAF.yaml.\n\n### BackBone series\n1. ultralytics/cfg/models/v8/yolov8-efficientViT.yaml\n    \n    Replaces the yolov8 backbone with efficientViT (CVPR2023).\n2. ultralytics/cfg/models/v8/yolov8-fasternet.yaml\n\n    Replaces the yolov8 backbone with fasternet (CVPR2023).\n3. ultralytics/cfg/models/v8/yolov8-timm.yaml\n\n    Replaces the yolov8 backbone with any backbone supported by timm.\n\n4. ultralytics/cfg/models/v8/yolov8-convnextv2.yaml\n\n    Replaces the yolov8 backbone with convnextv2.\n5. ultralytics/cfg/models/v8/yolov8-EfficientFormerV2.yaml\n\n    Replaces the yolov8 backbone with EfficientFormerV2. (See [item 5 of Common Errors and Solutions](#a).)  \n6. ultralytics/cfg/models/v8/yolov8-vanillanet.yaml\n\n    Replaces the yolov8 backbone with vanillanet.\n7. ultralytics/cfg/models/v8/yolov8-LSKNet.yaml\n\n    Replaces the yolov8 backbone with LSKNet (a 2023 SOTA backbone for rotated object detection).\n8. ultralytics/cfg/models/v8/yolov8-swintransformer.yaml\n\n    Replaces the yolov8 backbone with SwinTransformer-Tiny.\n9. ultralytics/cfg/models/v8/yolov8-repvit.yaml\n\n    Replaces the yolov8 backbone with [RepViT](https://github.com/THU-MIG/RepViT/tree/main).\n10. ultralytics/cfg/models/v8/yolov8-CSwinTransformer.yaml\n\n    Replaces the yolov8 backbone with [CSWin-Transformer(CVPR2022)](https://github.com/microsoft/CSWin-Transformer/tree/main). (See [item 5 of Common Errors and Solutions](#a).)\n11. ultralytics/cfg/models/v8/yolov8-HGNetV2.yaml\n\n    Uses HGNetV2 as the YOLOV8 backbone.\n12. ultralytics/cfg/models/v8/yolov8-unireplknet.yaml\n\n    Replaces the yolov8 backbone with [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main).\n13. ultralytics/cfg/models/v8/yolov8-TransNeXt.yaml\n\n    Improves the yolov8 backbone with [TransNeXt](https://github.com/DaiShiResearch/TransNeXt). (See [item 5 of Common Errors and Solutions](#a).)   \n14. ultralytics/cfg/models/rt-detr/yolov8-rmt.yaml\n\n    Improves the rtdetr backbone with [CVPR2024 RMT](https://arxiv.org/abs/2309.11523).\n15. 
ultralytics/cfg/models/v8/yolov8-pkinet.yaml\n\n    Improves the backbone with [CVPR2024 PKINet](https://github.com/PKINet/PKINet). (Requires mmcv and mmengine.)\n16. ultralytics/cfg/models/v8/yolov8-mobilenetv4.yaml\n\n    Improves the yolov8 backbone with [MobileNetV4](https://github.com/jaiwei98/MobileNetV4-pytorch/tree/main).\n17. ultralytics/cfg/models/v8/yolov8-starnet.yaml\n\n    Improves the yolov8 backbone with [StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main).\n18. ultralytics/cfg/models/v8/yolov8-mambaout.yaml\n     \n    Replaces the backbone with MambaOut from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut).\n19. ultralytics/cfg/models/v8/yolov8-lsnet.yaml\n\n    Replaces the yolov8 backbone with lsnet from [CVPR2025 LSNet](https://github.com/THU-MIG/lsnet).\n20. ultralytics/cfg/models/v8/yolov8-overlock.yaml\n\n    Replaces the backbone with the overlock-backbone from [CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087).\n\n### SPPF series\n1. ultralytics/cfg/models/v8/yolov8-FocalModulation.yaml\n\n    Replaces SPPF with [Focal Modulation](https://github.com/microsoft/FocalNet).\n2. ultralytics/cfg/models/v8/yolov8-SPPF-LSKA.yaml\n\n    Improves SPPF with the [LSKA](https://github.com/StevenLauHKHK/Large-Separable-Kernel-Attention) attention mechanism to strengthen multi-scale feature extraction.\n3. ultralytics/cfg/models/v8/yolov8-AIFI.yaml\n\n    Improves yolov8 with the Attention-based Intrascale Feature Interaction (AIFI) from [RT-DETR](https://arxiv.org/pdf/2304.08069.pdf).\n4. ultralytics/cfg/models/v8/yolov8-AIFIRepBN.yaml\n\n    Improves AIFI with RepBN from [ICML-2024 SLAB](https://github.com/xinghaochen/SLAB).\n5. ultralytics/cfg/models/v8/yolov8-ASSR.yaml\n     \n    Improves yolov8 with the Attentive State Space Group from [CVPR2025 MambaIR](https://github.com/csguoh/MambaIR).\n\n### Neck series\n1. ultralytics/cfg/models/v8/yolov8-bifpn.yaml\n\n    Adds BIFPN to yolov8.  \n    BIFPN has three optional parameters:\n    1. Fusion  \n        The Fusion module in BIFPN supports five modes: weight, adaptive, concat, bifpn(default), SDI  \n        weight, adaptive and concat come from [the paper, Figure 3](https://openreview.net/pdf?id=q2ZaVU6bEsT), and SDI comes from [U-NetV2](https://github.com/yaoppeng/U-Net_v2)\n    2. 
node_mode  \n        Supports most C2f-XXX structures.\n    3. head_channel  \n        The number of channels in BIFPN; the default is 256.\n2. ultralytics/cfg/models/v8/yolov8-slimneck.yaml\n\n    Replaces C2f and Conv in the yolov8 neck with VoVGSCSP\\VoVGSCSPC and GSConv.\n3. Asymptotic Feature Pyramid Network[reference](https://github.com/gyyang23/AFPN/tree/master)\n\n    a. ultralytics/cfg/models/v8/yolov8-AFPN-P345.yaml  \n    b. ultralytics/cfg/models/v8/yolov8-AFPN-P345-Custom.yaml  \n    c. ultralytics/cfg/models/v8/yolov8-AFPN-P2345.yaml  \n    d. ultralytics/cfg/models/v8/yolov8-AFPN-P2345-Custom.yaml  \n    The block in the Custom variants supports most C2f-XXX structures.\n4. ultralytics/cfg/models/v8/yolov8-RCSOSA.yaml\n\n    Replaces C2f with RCSOSA from [RCS-YOLO](https://github.com/mkang315/RCS-YOLO/tree/main).\n5. ultralytics/cfg/models/v8/yolov8-goldyolo.yaml\n\n    Improves the feature fusion module with the Gather-and-Distribute mechanism from Huawei's 2023 GOLD-YOLO.\n6. ultralytics/cfg/models/v8/yolov8-GFPN.yaml\n\n    Improves the neck with RepGFPN from [DAMO-YOLO](https://github.com/tinyvision/DAMO-YOLO).\n7. ultralytics/cfg/models/v8/yolov8-EfficientRepBiPAN.yaml\n\n    Improves the neck with EfficientRepBiPAN from [YOLOV6](https://github.com/meituan/YOLOv6/tree/main).\n8. ultralytics/cfg/models/v8/yolov8-ASF.yaml\n\n    Improves yolov8 with the Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO).\n9. ultralytics/cfg/models/v8/yolov8-SDI.yaml\n\n    Redesigns the feature fusion part of yolov8 with the Semantics and Detail Infusion Module from [U-NetV2](https://github.com/yaoppeng/U-Net_v2).\n10. ultralytics/cfg/models/v8/yolov8-HSFPN.yaml\n\n    Improves the yolov8 neck with HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR).\n11. ultralytics/cfg/models/v8/yolov8-CSFCN.yaml\n\n    Improves yolov8 with the Context and Spatial Feature Calibration module from [Context and Spatial Feature Calibration for Real-Time Semantic Segmentation](https://github.com/kaigelee/CSFCN/tree/main).\n12. ultralytics/cfg/models/v8/yolov8-CGAFusion.yaml\n\n    Improves the yolov8 neck with content-guided attention fusion from [DEA-Net](https://github.com/cecret3350/DEA-Net).\n13. 
ultralytics/cfg/models/v8/yolov8-SDFM.yaml\n\n    Improves the yolov8 neck with the superficial detail fusion module from [PSFusion](https://github.com/Linfeng-Tang/PSFusion).\n\n14. ultralytics/cfg/models/v8/yolov8-PSFM.yaml\n\n    Improves the yolov8 neck with the profound semantic fusion module from [PSFusion](https://github.com/Linfeng-Tang/PSFusion).\n\n15. ultralytics/cfg/models/v8/yolov8-GLSA.yaml\n\n    Improves the yolov8 neck with the [GLSA](https://github.com/Barrett-python/DuAT) module.\n\n16. ultralytics/cfg/models/v8/yolov8-CTrans.yaml\n\n    Improves the yolov8 neck with ChannelTransformer from [[AAAI2022] UCTransNet](https://github.com/McGregorWwww/UCTransNet/tree/main). (See [item 5 of Common Errors and Solutions](#a).)  \n\n17. ultralytics/cfg/models/v8/yolov8-p6-CTrans.yaml\n\n    Improves the yolov8 neck with ChannelTransformer from [[AAAI2022] UCTransNet](https://github.com/McGregorWwww/UCTransNet/tree/main). (p6 version.) (See [item 5 of Common Errors and Solutions](#a).)  \n\n18. ultralytics/cfg/models/v8/yolov8-MAFPN.yaml\n\n    Improves the neck with MAFPN from [MAF-YOLO](https://arxiv.org/pdf/2407.04381).\n\n19. Cross-Layer Feature Pyramid Transformer.   \n\n    P345:ultralytics/cfg/models/v8/yolov8-CFPT.yaml\n    P2345:ultralytics/cfg/models/v8/yolov8-CFPT-P2345.yaml\n    P3456:ultralytics/cfg/models/v8/yolov8-CFPT-P3456.yaml\n    P23456:ultralytics/cfg/models/v8/yolov8-CFPT-P23456.yaml\n\n    Improves the neck with [CFPT](https://github.com/duzw9311/CFPT/tree/main).\n\n20. ultralytics/cfg/models/v8/yolov8-hyper.yaml\n\n    Improves yolov8 with Hypergraph Computation in Semantic Space from [Hyper-YOLO TPAMI2025](https://www.arxiv.org/pdf/2408.04804).\n\n21. ultralytics/cfg/models/v8/yolov8-msga.yaml\n\n    Improves the yolov8 neck with the Multi-Scale Adaptive Spatial Attention Gate from [MSA^2 Net](https://github.com/xmindflow/MSA-2Net).\n\n22. ultralytics/cfg/models/v8/yolov8-WFU.yaml\n\n    Improves the yolov8 neck with Wavelet Feature Upgrade from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN).\n\n23. 
ultralytics/cfg/models/v8/yolov8-mscafsa.yaml\n\n    Improves the yolov8 neck with Frequency-Spatial Attention and Multi-scale Progressive Channel Attention from [BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation](https://github.com/nkicsl/SF-UNet).\n\n24. ultralytics/cfg/models/v8/yolov8-fsa.yaml\n\n    Improves yolov8 with Frequency-Spatial Attention from [BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation](https://github.com/nkicsl/SF-UNet).\n\n25. ultralytics/cfg/models/v8/yolov8-MFM.yaml\n\n    Improves the neck with MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet).\n\n26. ultralytics/cfg/models/v8/yolov8-GDSAFusion.yaml\n\n    Improves the neck with GDSAFusion from [CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087).\n\n27. ultralytics/cfg/models/v8/yolov8-RFPN.yaml\n\n    Improves the YOLOV8 neck with SNI and GSConvE from [ECCV2024 rethinking-fpn](https://github.com/AlanLi1997/rethinking-fpn).\n\n28. ultralytics/cfg/models/v8/yolov8-PST.yaml\n\n    Improves the neck with the Pyramid Sparse Transformer from [Pyramid Sparse Transformer](https://arxiv.org/abs/2505.12772).\n\n29. ultralytics/cfg/models/v8/yolov8-HS-FPN.yaml\n\n    Improves the yolo neck with HFP and SDP from [AAAI2025 HS-FPN](https://github.com/ShiZican/HS-FPN/tree/main).\n\n30. ultralytics/cfg/models/v8/yolov8-LCA.yaml\n\n    Improves the yolov8 neck with LCA from [CVPR2025 HVI](https://arxiv.org/pdf/2502.20272).\n\n31. ultralytics/cfg/models/v8/yolov8-HFFE.yaml\n\n    Improves the yolov8 neck with HFFE from [TGRS2025 HAFNet](https://ieeexplore.ieee.org/document/11154006).\n\n### Head series\n1. ultralytics/cfg/models/v8/yolov8-dyhead.yaml\n\n    Adds an attention-based object detection head to yolov8.\n2. ultralytics/cfg/models/v8/yolov8-EfficientHead.yaml\n\n    Redesigns the detection head and supports 10 lightweight detection heads. For details, see the Detect_Efficient class in ultralytics/nn/extra_modules/head.py.\n3. ultralytics/cfg/models/v8/yolov8-aux.yaml\n\n    Following YOLOV7-Aux, adds an auxiliary training head to YOLOV8 that takes part in training and is removed for final inference.  \n    The loss weight of the auxiliary head is set by self.aux_loss_ratio in the __init__ function of class v8DetectionLoss in ultralytics/utils/loss.py; the default, taken from yolov7, is 0.25.\n4. 
ultralytics/cfg/models/v8/yolov8-seg-EfficientHead.yaml (instance segmentation)\n\n    Redesigns the detection head and supports 10 lightweight detection heads. For details, see the Detect_Efficient class in ultralytics/nn/extra_modules/head.py. \n5. ultralytics/cfg/models/v8/yolov8-SEAMHead.yaml\n\n    Improves the head with the occlusion-aware attention from [YOLO-Face V2](https://arxiv.org/pdf/2208.02019v2.pdf) so that it handles occluded scenes effectively.\n6. ultralytics/cfg/models/v8/yolov8-MultiSEAMHead.yaml\n\n    Improves the head with the occlusion-aware attention from [YOLO-Face V2](https://arxiv.org/pdf/2208.02019v2.pdf) so that it handles occluded scenes effectively.\n7. ultralytics/cfg/models/v8/yolov8-PGI.yaml\n\n    Improves YOLOV8 with the programmable gradient information from [YOLOV9](https://github.com/WongKinYiu/yolov9). (The PGI module can be removed after training.)\n8. Lightweight Asymmetric Detection Head\n\n    detect:ultralytics/cfg/models/v8/yolov8-LADH.yaml\n    segment:ultralytics/cfg/models/v8/yolov8-seg-LADH.yaml\n    pose:ultralytics/cfg/models/v8/yolov8-pose-LADH.yaml\n    obb:ultralytics/cfg/models/v8/yolov8-obb-LADH.yaml\n    Improves the yolov8 head with the Lightweight Asymmetric Detection Head from [Faster and Lightweight: An Improved YOLOv5 Object Detector for Remote Sensing Images](https://www.mdpi.com/2072-4292/15/20/4974).\n9. Localization Quality Estimation Head\n\n    This module comes from [GFocalV2](https://arxiv.org/abs/2011.12885).\n    detect:ultralytics/cfg/models/v8/yolov8-LQEHead.yaml\n    segment:ultralytics/cfg/models/v8/yolov8-seg-LQE.yaml\n    pose:ultralytics/cfg/models/v8/yolov8-pose-LQE.yaml\n    obb:ultralytics/cfg/models/v8/yolov8-obb-LQE.yaml\n\n### Label Assign series\n1. Adaptive Training Sample Selection matching strategy.\n\n    Select the corresponding self.assigner in class v8DetectionLoss in ultralytics/utils/loss.py.\n\n### PostProcess series\n1. soft-nms(IoU,GIoU,DIoU,CIoU,EIoU,SIoU,ShapeIoU)\n\n    Replaces nms with soft-nms. (Recommended only when running val.py; for how to switch, see the 20240122 release notes.)\n\n2. ultralytics/cfg/models/v8/yolov8-nmsfree.yaml\n\n    Following the idea of yolov10, trains with dual label assignment and a consistent matching metric, so post-processing needs no NMS.\n\n### Upsampling and downsampling operators\n1. ultralytics/cfg/models/v8/yolov8-ContextGuidedDown.yaml\n\n    Downsamples with the Light-weight Context Guided DownSample from [CGNet](https://github.com/wutianyiRosun/CGNet/tree/master).\n2. 
ultralytics/cfg/models/v8/yolov8-SPDConv.yaml\n\n    Downsamples with [SPDConv](https://github.com/LabSAINT/SPD-Conv/tree/main).\n3. ultralytics/cfg/models/v8/yolov8-dysample.yaml\n\n    Improves upsampling in the yolov8 neck with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085).\n\n4. ultralytics/cfg/models/v8/yolov8-CARAFE.yaml\n\n    Improves upsampling in the yolov8 neck with [ICCV2019 CARAFE](https://arxiv.org/abs/1905.02188).\n\n5. ultralytics/cfg/models/v8/yolov8-HWD.yaml\n\n    Improves yolov8 downsampling with [Haar wavelet downsampling](https://www.sciencedirect.com/science/article/abs/pii/S0031320323005174). (Use with AMP disabled.)\n\n6. ultralytics/cfg/models/v8/yolov8-v7DS.yaml\n\n    Improves downsampling in YOLOV8 with the downsampling structure from [YOLOV7 CVPR2023](https://arxiv.org/abs/2207.02696).\n\n7. ultralytics/cfg/models/v8/yolov8-ADown.yaml\n\n    Improves downsampling in YOLOV8 with the downsampling structure from [YOLOV9](https://github.com/WongKinYiu/yolov9).\n\n8. ultralytics/cfg/models/v8/yolov8-SRFD.yaml\n\n    Improves yolov8 downsampling with [A Robust Feature Downsampling Module for Remote Sensing Visual Tasks](https://ieeexplore.ieee.org/document/10142024).\n\n9. ultralytics/cfg/models/v8/yolov8-WaveletPool.yaml\n\n    Improves YOLOV8 upsampling and downsampling with [Wavelet Pooling](https://openreview.net/forum?id=rkhlb8lCZ).\n\n10. ultralytics/cfg/models/v8/yolov8-LDConv.yaml\n\n    Improves downsampling with [LDConv](https://github.com/CV-ZhangXin/LDConv/tree/main).\n\n11. ultralytics/cfg/models/v8/yolov8-PSConv.yaml\n\n    Improves yolov8 with the Pinwheel-shaped Convolution from [AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection](https://github.com/JN-Yang/PConv-SDloss-Data).\n\n12. ultralytics/cfg/models/v8/yolov8-EUCB.yaml\n\n    Improves yolov8 upsampling with EUCB from [CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD).\n\n13. ultralytics/cfg/models/v8/yolov8-LoGStem.yaml\n\n    Improves the stem (the first and second convolutional layers) with LoGStem from [LEGNet](https://github.com/lwCVer/LEGNet).\n\n14. 
ultralytics/cfg/models/v8/yolov8-FourierConv.yaml\n\n    使用[MIA2025 Fourier Convolution Block with global receptive field for MRI reconstruction](https://www.sciencedirect.com/science/article/abs/pii/S1361841524002743)中的FourierConv改进Conv.\n\n15. ultralytics/cfg/models/v8/yolov8-GCConv.yaml\n\n    使用[CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet)中的GCConv改进下采样.\n\n16. ultralytics/cfg/models/v8/yolov8-RepStem.yaml\n\n    使用[ICCV2023 FastVit](https://arxiv.org/pdf/2303.14189)中的RepStem改进yolov8下采样.\n\n### YOLOV8-C2f系列\n1. ultralytics/cfg/models/v8/yolov8-C2f-Faster.yaml\n\n    使用C2f-Faster替换C2f.(使用FasterNet中的FasterBlock替换C2f中的Bottleneck)\n2. ultralytics/cfg/models/v8/yolov8-C2f-ODConv.yaml\n\n    使用C2f-ODConv替换C2f.(使用ODConv替换C2f中的Bottleneck中的Conv)\n3. ultralytics/cfg/models/v8/yolov8-C2f-ODConv.yaml\n\n    使用C2f-ODConv替换C2f.(使用ODConv替换C2f中的Bottleneck中的Conv)\n4. ultralytics/cfg/models/v8/yolov8-C2f-Faster-EMA.yaml\n\n    使用C2f-Faster-EMA替换C2f.(C2f-Faster-EMA推荐可以放在主干上,Neck和head部分可以选择C2f-Faster)\n5. ultralytics/cfg/models/v8/yolov8-C2f-DBB.yaml\n\n    使用C2f-DBB替换C2f.(使用DiverseBranchBlock替换C2f中的Bottleneck中的Conv)\n6. ultralytics/cfg/models/v8/yolov8-C2f-CloAtt.yaml\n\n    使用C2f-CloAtt替换C2f.(使用CloFormer中的具有全局和局部特征的注意力机制添加到C2f中的Bottleneck中)(需要看[常见错误和解决方案的第五点](#a))\n7. ultralytics/cfg/models/v8/yolov8-C2f-SCConv.yaml\n\n    SCConv(CVPR2020 http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf)与C2f融合.\n8. ultralytics/cfg/models/v8/yolov8-C2f-SCcConv.yaml\n\n    ScConv(CVPR2023 https://openaccess.thecvf.com/content/CVPR2023/papers/Li_SCConv_Spatial_and_Channel_Reconstruction_Convolution_for_Feature_Redundancy_CVPR_2023_paper.pdf)与C2f融合.  \n    (取名为SCcConv的原因是在windows下命名是不区分大小写的)\n9. ultralytics/cfg/models/v8/yolov8-KernelWarehouse.yaml\n    \n    使用[Towards Parameter-Efficient Dynamic Convolution](https://github.com/OSVAI/KernelWarehouse)添加到yolov8中.  \n    使用此模块需要注意,在epoch0-20的时候精度会非常低,过了20epoch会正常.\n10. 
ultralytics/cfg/models/v8/yolov8-C2f-DySnakeConv.yaml\n\n    [DySnakeConv](https://github.com/YaoleiQi/DSCNet)与C2f融合.\n11. ultralytics/cfg/models/v8/yolov8-C2f-DCNV2.yaml\n\n    使用C2f-DCNV2替换C2f.(DCNV2为可变形卷积V2)\n12. ultralytics/cfg/models/v8/yolov8-C2f-DCNV3.yaml\n\n    使用C2f-DCNV3替换C2f.([DCNV3](https://github.com/OpenGVLab/InternImage)为可变形卷积V3(CVPR2023,众多排行榜的SOTA))  \n    官方中包含了一些指定版本的DCNV3 whl包,下载后直接pip install xxx即可.具体和安装DCNV3可看百度云链接中的视频.\n13. ultralytics/cfg/models/v8/yolov8-C2f-OREPA.yaml\n\n    使用C2f-OREPA替换C2f.[Online Convolutional Re-parameterization (CVPR2022)](https://github.com/JUGGHM/OREPA_CVPR2022/tree/main)\n14. ultralytics/cfg/models/v8/yolov8-C2f-REPVGGOREPA.yaml\n\n    使用C2f-REPVGGOREPA替换C2f.[Online Convolutional Re-parameterization (CVPR2022)](https://github.com/JUGGHM/OREPA_CVPR2022/tree/main)\n15. ultralytics/cfg/models/v8/yolov8-C2f-DCNV4.yaml\n\n    使用[DCNV4](https://github.com/OpenGVLab/DCNv4)改进C2f.(请关闭AMP进行训练,使用教程请看20240116版本更新说明)\n16. ultralytics/cfg/models/v8/yolov8-C2f-ContextGuided.yaml\n\n    使用[CGNet](https://github.com/wutianyiRosun/CGNet/tree/master)中的Light-weight Context Guided改进C2f.\n17. ultralytics/cfg/models/v8/yolov8-C2f-MSBlock.yaml\n\n    使用[YOLO-MS](https://github.com/FishAndWasabi/YOLO-MS/tree/main)中的MSBlock改进C2f.\n18. ultralytics/cfg/models/v8/yolov8-C2f-DLKA.yaml\n\n    使用[deformableLKA](https://github.com/xmindflow/deformableLKA)改进C2f.\n19. ultralytics/cfg/models/v8/yolov8-C2f-DAttention.yaml\n\n    使用[Vision Transformer with Deformable Attention(CVPR2022)](https://github.com/LeapLabTHU/DAT)改进C2f.(需要看[常见错误和解决方案的第五点](#a))  \n    使用注意点请看百度云视频.(DAttention(Vision Transformer with Deformable Attention CVPR2022)使用注意说明.)\n20. 使用[ParC-Net](https://github.com/hkzhang-git/ParC-Net/tree/main)中的ParC_Operator改进C2f.(需要看[常见错误和解决方案的第五点](#a))  \n    使用注意点请看百度云视频.(20231031更新说明)    \n21. ultralytics/cfg/models/v8/yolov8-C2f-DWR.yaml\n\n    使用[DWRSeg](https://arxiv.org/abs/2212.01173)中的Dilation-wise Residual(DWR)模块,加强从网络高层的可扩展感受野中提取特征.\n22. 
ultralytics/cfg/models/v8/yolov8-C2f-RFAConv.yaml\n\n    使用[RFAConv](https://github.com/Liuchen1997/RFAConv/tree/main)中的RFAConv改进yolov8.\n\n23. ultralytics/cfg/models/v8/yolov8-C2f-RFCBAMConv.yaml\n\n    使用[RFAConv](https://github.com/Liuchen1997/RFAConv/tree/main)中的RFCBAMConv改进yolov8.\n\n24. ultralytics/cfg/models/v8/yolov8-C2f-RFCAConv.yaml\n\n    使用[RFAConv](https://github.com/Liuchen1997/RFAConv/tree/main)中的RFCAConv改进yolov8.\n25. ultralytics/cfg/models/v8/yolov8-C2f-FocusedLinearAttention.yaml\n\n    使用[FLatten Transformer(ICCV2023)](https://github.com/LeapLabTHU/FLatten-Transformer)中的FocusedLinearAttention改进C2f.(需要看[常见错误和解决方案的第五点](#a))    \n    使用注意点请看百度云视频.(20231114版本更新说明.)\n26. ultralytics/cfg/models/v8/yolov8-C2f-MLCA.yaml\n\n    使用[Mixed Local Channel Attention 2023](https://github.com/wandahangFY/MLCA/tree/master)改进C2f.(用法请看百度云视频-20231129版本更新说明)\n\n27. ultralytics/cfg/models/v8/yolov8-C2f-AKConv.yaml\n\n    使用[AKConv 2023](https://github.com/CV-ZhangXin/AKConv)改进C2f.(用法请看百度云视频-20231129版本更新说明)\n28. ultralytics/cfg/models/v8/yolov8-C2f-UniRepLKNetBlock.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的UniRepLKNetBlock改进C2f.\n29. ultralytics/cfg/models/v8/yolov8-C2f-DRB.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock改进C2f.\n30. ultralytics/cfg/models/v8/yolov8-C2f-AggregatedAtt.yaml\n\n    使用[TransNeXt](https://github.com/DaiShiResearch/TransNeXt)中的聚合感知注意力改进C2f.(需要看[常见错误和解决方案的第五点](#a))   \n\n31. ultralytics/cfg/models/v8/yolov8-C2f-SWC.yaml\n\n    使用[shift-wise conv](https://arxiv.org/abs/2401.12736)改进yolov8中的C2f.\n\n32. ultralytics/cfg/models/v8/yolov8-C2f-iRMB.yaml\n\n    使用[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB改进C2f.\n\n33. ultralytics/cfg/models/v8/yolov8-C2f-VSS.yaml\n\n    使用最新的Mamba架构[Mamba-UNet中的VSS](https://github.com/ziyangwang007/Mamba-UNet)对C2f中的BottleNeck进行改进,使其能更有效地捕获图像中的复杂细节和更广泛的语义上下文.\n\n34. 
ultralytics/cfg/models/v8/yolov8-C2f-LVMB.yaml\n\n    使用最新的Mamba架构[Mamba-UNet中的VSS](https://github.com/ziyangwang007/Mamba-UNet)与Cross Stage Partial进行结合,使其能更有效地捕获图像中的复杂细节和更广泛的语义上下文.\n\n35. ultralytics/cfg/models/v8/yolov8-RepNCSPELAN.yaml\n\n    使用[YOLOV9](https://github.com/WongKinYiu/yolov9)中的RepNCSPELAN进行改进yolov8.\n\n36. ultralytics/cfg/models/v8/yolov8-C2f-DynamicConv.yaml\n\n    使用[CVPR2024 parameternet](https://arxiv.org/pdf/2306.14525v2.pdf)中的DynamicConv改进C2f.\n\n37. ultralytics/cfg/models/v8/yolov8-C2f-GhostDynamicConv.yaml\n\n    使用[CVPR2024 parameternet](https://arxiv.org/pdf/2306.14525v2.pdf)中的GhostModule改进C2f.\n\n38. ultralytics/cfg/models/v8/yolov8-C2f-RVB.yaml\n\n    使用[CVPR2024 RepViT](https://github.com/THU-MIG/RepViT/tree/main)中的RepViTBlock改进C2f.\n\n39. ultralytics/cfg/models/v8/yolov8-DGCST.yaml\n\n    使用[Lightweight Object Detection](https://arxiv.org/abs/2403.01736)中的Dynamic Group Convolution Shuffle Transformer改进yolov8.\n\n40. ultralytics/cfg/models/v8/yolov8-C2f-RetBlock.yaml\n\n    使用[CVPR2024 RMT](https://arxiv.org/abs/2309.11523)中的RetBlock改进C2f.\n\n41. ultralytics/cfg/models/v8/yolov8-C2f-PKI.yaml\n\n    使用[CVPR2024 PKINet](https://github.com/PKINet/PKINet)中的PKIModule和CAA模块改进C2f.\n\n42. ultralytics/cfg/models/v8/yolov8-RepNCSPELAN_CAA.yaml\n\n    使用[CVPR2024 PKINet](https://github.com/PKINet/PKINet)中的CAA模块改进RepNCSPELAN.\n\n43. ultralytics/cfg/models/v8/yolov8-C2f-fadc.yaml\n\n    使用[CVPR2024 Frequency-Adaptive Dilated Convolution](https://github.com/Linwei-Chen/FADC)改进C2f.\n\n44. ultralytics/cfg/models/v8/yolov8-C2f-PPA.yaml\n\n    使用[HCFNet](https://github.com/zhengshuchen/HCFNet)中的Parallelized Patch-Aware Attention Module改进C2f.\n\n45. ultralytics/cfg/models/v8/yolov8-C2f-Star.yaml\n\n    使用[StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main)中的StarBlock改进C2f.\n\n46. ultralytics/cfg/models/v8/yolov8-C2f-KAN.yaml\n\n    KAN In! Mamba Out! Kolmogorov-Arnold Networks.\n    目前支持:\n    1. FastKANConv2DLayer\n    2. 
KANConv2DLayer\n    3. KALNConv2DLayer\n    4. KACNConv2DLayer\n    5. KAGNConv2DLayer\n\n47. ultralytics/cfg/models/v8/yolov8-C2f-DEConv.yaml\n\n    使用[DEA-Net](https://github.com/cecret3350/DEA-Net)中的detail-enhanced convolution改进C2f.\n\n48. ultralytics/cfg/models/v8/yolov8-C2f-Heat.yaml\n\n    使用[vHeat](https://github.com/MzeroMiko/vHeat/tree/main)中的HeatBlock改进C2f.\n\n49. ultralytics/cfg/models/v8/yolov8-C2f-WTConv.yaml\n\n    使用[ECCV2024 Wavelet Convolutions for Large Receptive Fields](https://github.com/BGU-CS-VIL/WTConv)中的WTConv改进C2f-BottleNeck.\n\n50. ultralytics/cfg/models/v8/yolov8-C2f-FMB.yaml\n\n    使用[ECCV2024 SMFANet](https://github.com/Zheng-MJ/SMFANet/tree/main)的Feature Modulation block改进C2f.\n\n51. ultralytics/cfg/models/v8/yolov8-C2f-gConv.yaml\n\n    使用[Rethinking Performance Gains in Image Dehazing Networks](https://arxiv.org/abs/2209.11448)的gConvblock改进C2f.\n\n52. ultralytics/cfg/models/v8/yolov8-C2f-WDBB.yaml\n\n    使用[YOLO-MIF](https://github.com/wandahangFY/YOLO-MIF)中的WDBB改进c2f.\n\n53. ultralytics/cfg/models/v8/yolov8-C2f-DeepDBB.yaml\n\n    使用[YOLO-MIF](https://github.com/wandahangFY/YOLO-MIF)中的DeepDBB改进c2f.\n\n54. ultralytics/cfg/models/v8/yolov8-C2f-AdditiveBlock.yaml\n\n    使用[CAS-ViT](https://github.com/Tianfang-Zhang/CAS-ViT)中的AdditiveBlock改进c2f.\n\n55. ultralytics/cfg/models/v8/yolov8-C2f-MogaBlock.yaml\n\n    使用[MogaNet ICLR2024](https://github.com/Westlake-AI/MogaNet)中的MogaBlock改进C2f.\n\n56. ultralytics/cfg/models/v8/yolov8-C2f-IdentityFormer.yaml\n\n    使用[Metaformer TPAMI2024](https://github.com/sail-sg/metaformer)中的IdentityFormer改进c2f.\n\n57. ultralytics/cfg/models/v8/yolov8-C2f-RandomMixing.yaml\n\n    使用[Metaformer TPAMI2024](https://github.com/sail-sg/metaformer)中的RandomMixingFormer改进c2f.(需要看[常见错误和解决方案的第五点](#a))\n\n58. ultralytics/cfg/models/v8/yolov8-C2f-PoolingFormer.yaml\n\n    使用[Metaformer TPAMI2024](https://github.com/sail-sg/metaformer)中的PoolingFormer改进c2f.\n\n59. 
ultralytics/cfg/models/v8/yolov8-C2f-ConvFormer.yaml\n\n    Uses ConvFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) to improve C2f.\n\n60. ultralytics/cfg/models/v8/yolov8-C2f-CaFormer.yaml\n\n    Uses CaFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) to improve C2f.\n\n61. ultralytics/cfg/models/v8/yolov8-C2f-SFHF.yaml\n\n    Uses the block from [SFHformer ECCV2024](https://github.com/deng-ai-lab/SFHformer) to improve C2f.\n\n62. ultralytics/cfg/models/v8/yolov8-C2f-MSM.yaml\n\n    Uses MSM from [Revitalizing Convolutional Network for Image Restoration TPAMI2024](https://zhuanlan.zhihu.com/p/720777160) to improve C2f.\n\n63. ultralytics/cfg/models/v8/yolov8-C2f-RAB.yaml\n\n    Uses RAB (residual attention block) from [Pattern Recognition 2024|DRANet](https://github.com/WenCongWu/DRANet) to improve C2f.\n\n64. ultralytics/cfg/models/v8/yolov8-C2f-HDRAB.yaml\n\n    Uses HDRAB (hybrid dilated residual attention block) from [Pattern Recognition 2024|DRANet](https://github.com/WenCongWu/DRANet) to improve C2f.\n\n65. ultralytics/cfg/models/v8/yolov8n-C2f-LFE.yaml\n\n    Uses Local feature extraction from [Efficient Long-Range Attention Network for Image Super-resolution ECCV2022](https://github.com/xindongzhang/ELAN) to improve C2f.\n\n66. ultralytics/cfg/models/v8/yolov8-C2f-SFA.yaml\n\n    Uses Frequency-aware Cascade Attention-SFA from [FreqFormer](https://github.com/JPWang-CS/FreqFormer) to improve C2f.\n\n67. ultralytics/cfg/models/v8/yolov8-C2f-CTA.yaml\n\n    Uses Frequency-aware Cascade Attention-CTA from [FreqFormer](https://github.com/JPWang-CS/FreqFormer) to improve C2f.\n\n68. ultralytics/cfg/models/v8/yolov8-C2f-CAMixer.yaml\n\n    Uses CAMixer from [CAMixerSR CVPR2024](https://github.com/icandle/CAMixerSR) to improve C2f.\n\n69. ultralytics/cfg/models/v8/yolov8-MAN.yaml\n\n    Uses the Mixed Aggregation Network from [Hyper-YOLO TPAMI2025](https://www.arxiv.org/pdf/2408.04804) to improve yolov8.\n\n70. ultralytics/cfg/models/v8/yolov8-C2f-HFERB.yaml\n\n    Uses the high-frequency enhancement residual block from [ICCV2023 CRAFT-SR](https://github.com/AVC2-UESTC/CRAFT-SR) to improve C2f.\n\n71. 
ultralytics/cfg/models/v8/yolov8-C2f-DTAB.yaml\n\n    使用[AAAI2025 TBSN](https://github.com/nagejacob/TBSN)中的DTAB改进C2f.\n\n72. ultralytics/cfg/models/v8/yolov8-C2f-JDPM.yaml\n\n    使用[ECCV2024 FSEL](https://github.com/CSYSI/FSEL)中的joint domain perception module改进C2f.\n\n73. ultralytics/cfg/models/v8/yolov8-C2f-ETB.yaml\n\n    使用[ECCV2024 FSEL](https://github.com/CSYSI/FSEL)中的entanglement transformer block改进C2f.\n\n74. ultralytics/cfg/models/v8/yolov8-C2f-AP.yaml\n\n    使用[AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection](https://github.com/JN-Yang/PConv-SDloss-Data)中的Asymmetric Padding bottleneck改进C2f.\n\n75. ultralytics/cfg/models/v8/yolov8-C2f-Strip.yaml\n\n    使用[Strip R-CNN](https://arxiv.org/pdf/2501.03775)中的StripBlock改进C2f.\n\n76. ultralytics/cfg/models/v8/yolov8-C2f-Kat.yaml\n\n    使用[ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat)中的KAT改进C2f.\n\n77. ultralytics/cfg/models/v8/yolov8-C2f-GlobalFilter.yaml\n\n    使用[T-PAMI Global Filter Networks for Image Classification](https://github.com/raoyongming/GFNet)中的GlobalFilterBlock和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU改进C2f.\n\n78. ultralytics/cfg/models/v8/yolov8-C2f-DynamicFilter.yaml\n\n    使用[AAAI2024 FFT-Based Dynamic Token Mixer for Vision](https://github.com/okojoalg/dfformer)中的DynamicFilter改进C2f.\n\n79. ultralytics/cfg/models/v8/yolov8-RepHMS.yaml\n    \n    使用[MHAF-YOLO](https://github.com/yang-0201/MHAF-YOLO)中的RepHMS改进yolov8.\n\n80. ultralytics/cfg/models/v8/yolov8-C2f-SAVSS.yaml\n\n    使用[CVPR2025 SCSegamba](https://github.com/Karl1109/SCSegamba)中的Structure-Aware Scanning Strategy改进C2f.\n\n81. ultralytics/cfg/models/v8/yolov8-C2f-mambaout.yaml\n     \n     使用[CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut)中的MambaOutBlock改进C2f.\n\n82. 
ultralytics/cfg/models/v8/yolov8-C2f-EfficientVIM.yaml\n\n    使用[CVPR2025 EfficientViM](https://github.com/mlvlab/EfficientViM)中的EfficientViMBlock改进C2f.\n\n83. ultralytics/cfg/models/v8/yolov8-C2f-LEGM.yaml\n\n    使用[CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet)中的LEGM改进C2f.\n\n84. ultralytics/cfg/models/v8/yolov8-C2f-LSBlock.yaml\n\n    使用[CVPR2025 LSNet](https://github.com/THU-MIG/lsnet)中的LSBlock改进C2f.\n\n85. ultralytics/cfg/models/v8/yolov8-C2f-LFEM.yaml\n\n    使用[LEGNet](https://github.com/lwCVer/LEGNet)中的LFEModule改进C2f.\n\n86. ultralytics/cfg/models/v8/yolov8-C2f-RCB.yaml\n\n    使用[CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087)中的RepConvBlock改进C2f.\n\n87. ultralytics/cfg/models/v8/yolov8-C2f-TransMamba.yaml\n\n    使用[TransMamba](https://github.com/sunshangquan/TransMamba)的TransMamba改进C2f\n\n88. ultralytics/cfg/models/v8/yolov8-C2f-EVS.yaml\n\n    使用[CVPR2025 EVSSM](https://github.com/kkkls/EVSSM)中的EVS改进C2f\n\n89. ultralytics/cfg/models/v8/yolov8-C2f-EBlock.yaml\n\n    使用[CVPR2025 DarkIR](https://github.com/cidautai/DarkIR)中的EBlock改进C2f.\n\n90. ultralytics/cfg/models/v8/yolov8-C2f-DBlock.yaml\n\n    使用[CVPR2025 DarkIR](https://github.com/cidautai/DarkIR)中的DBlock改进C2f.\n\n91. ultralytics/cfg/models/v8/yolov8-C2f-SFSConv.yaml\n\n    使用[CVPR2024 SFSConv](https://github.com/like413/SFS-Conv)的SFSConv改进C2f.\n\n92. ultralytics/cfg/models/v8/yolov8-FCM.yaml\n\n    使用[AAAI2025 FBRT-YOLO](https://github.com/galaxy-oss/FCM)的模块改进yolov8.\n\n93. ultralytics/cfg/models/v8/yolov8-C2f-GroupMamba.yaml\n\n    使用[CVPR2025 GroupMamba](https://github.com/Amshaker/GroupMamba)中的GroupMambaBlock改进C2f.\n\n94. ultralytics/cfg/models/v8/yolov8-C2f-MambaVision.yaml\n\n    使用[CVPR2025 MambaVision](https://github.com/NVlabs/MambaVision)中的MambaVision改进C2f.\n\n95. 
ultralytics/cfg/models/v8/yolov8-C2f-FourierConv.yaml\n\n    使用[MIA2025 Fourier Convolution Block with global receptive field for MRI reconstruction](https://www.sciencedirect.com/science/article/abs/pii/S1361841524002743)中的FourierConv改进C2f.\n\n96. ultralytics/cfg/models/v8/yolov8-C2f-GLVSS.yaml\n\n    使用[TGRS2025 UMFormer](https://github.com/takeyoutime/UMFormer)中的GLVSS改进C2f.\n\n97. ultralytics/cfg/models/v8/yolov8-C2f-ESC.yaml\n\n    使用[ICCV2025 ESC: Emulating Self-attention with Convolution for Efficient Image Super-Resolution](https://github.com/dslisleedh/ESC)中的ESC改进C2f.\n\n98. ultralytics/cfg/models/v8/yolov8-C2f-ConvAttn.yaml\n\n    使用[ICCV2025 ESC: Emulating Self-attention with Convolution for Efficient Image Super-Resolution](https://github.com/dslisleedh/ESC)中的ConvAttn改进C2f.\n\n99. ultralytics/cfg/models/v8/yolov8-C2f-UniConv.yaml\n\n    使用[ICCV2025 UniConvBlock](https://github.com/ai-paperwithcode/UniConvNet)中的UniConvBlock改进C2f.\n\n100. ultralytics/cfg/models/v8/yolov8-C2f-GCConv.yaml\n\n    使用[CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet)中的GCConv改进C2f.\n\n101. ultralytics/cfg/models/v8/yolov8-C2f-CFBlock.yaml\n\n    使用[AAAI2024 SCTNet](https://arxiv.org/pdf/2312.17071)中的CFBlock改进C2f.\n\n102. ultralytics/cfg/models/v8/yolov8-C2f-CSSC.yaml\n\n    使用[TGRS2025 ASCNet](https://ieeexplore.ieee.org/document/10855453)中的CSSC改进C2f.\n\n103. ultralytics/cfg/models/v8/yolov8-C2f-CNCM.yaml\n\n    使用[TGRS2025 ASCNet](https://ieeexplore.ieee.org/document/10855453)中的CNCM改进C2f.\n\n104. ultralytics/cfg/models/v8/yolov8-C2f-HFRB.yaml\n\n    使用[ICCV2025 HFRB](https://arxiv.org/pdf/2507.10689)中的HFRB改进C2f.\n\n105. ultralytics/cfg/models/v8/yolov8-C2f-EVA.yaml\n\n    使用[ICIP2025 BEVANET](https://arxiv.org/pdf/2508.07300)中的EVA改进C2f.\n\n106. ultralytics/cfg/models/v8/yolov8-C2f-RMBC.yaml\n\n    使用[PlainUSR](https://arxiv.org/pdf/2409.13435)中的RepMBConv改进C2f.\n\n107. 
ultralytics/cfg/models/v8/yolov8-C2f-RMBC-LA.yaml\n\n    Uses RepMBConv and Local Importance-based Attention from [PlainUSR](https://arxiv.org/pdf/2409.13435) to improve C2f.\n\n108. ultralytics/cfg/models/v8/yolov8-C2f-IEL.yaml\n\n    Uses IEL from [CVPR2025 HVI](https://arxiv.org/pdf/2502.20272) to improve C2f.\n\n### Combination Series\n1. ultralytics/cfg/models/v8/yolov8-fasternet-bifpn.yaml\n\n    Combines fasternet with bifpn.  \n    BIFPN has three optional parameters:\n    1. Fusion  \n        The Fusion module in BIFPN supports five modes: weight, adaptive, concat, bifpn(default), SDI  \n        weight, adaptive, and concat come from [paper link, Figure 3](https://openreview.net/pdf?id=q2ZaVU6bEsT); SDI comes from [U-NetV2](https://github.com/yaoppeng/U-Net_v2)\n    2. node_mode  \n        Currently supports these [structures](#b) (more will be added later)\n    3. head_channel  \n        The number of channels in BIFPN; the default is 256.\n\n2. ultralytics/cfg/models/v8/yolov8-ELA-HSFPN-TADDH.yaml\n\n    Uses [Efficient Local Attention](https://arxiv.org/abs/2403.01123) to improve HSFPN, and the self-developed Task Align Dynamic Detection Head to improve the Head.\n\n3. ultralytics/cfg/models/v8/yolov8-FDPN-TADDH.yaml\n\n    A fusion of self-developed structures.\n    1. Self-developed Focusing Diffusion Pyramid Network\n    2. Self-developed Task Align Dynamic Detection Head\n\n4. ultralytics/cfg/models/v8/yolov8-starnet-C2f-Star-LSCD.yaml\n\n    A lightweight model combination.\n    1. CVPR2024-StarNet Backbone.\n    2. C2f-Star.\n    3. Lightweight Shared Convolutional Detection Head.\n\n## YOLOV10 Series\n#### All configuration files below are based on v10n. To use another model size (s,m,b,l,x), see the YOLOV10 model size switching tutorial in the project videos (Baidu Cloud link).\n\n### Secondary Innovation Series\n1. SlideLoss and EMASlideLoss.[Yolo-Face V2](https://github.com/Krasjet-Yu/YOLO-FaceV2/blob/master/utils/loss.py)\n\n    Configured in class v8DetectionLoss in ultralytics/utils/loss.py.\n\n2. ultralytics/cfg/models/v10/yolov10n-RevCol.yaml\n\n    Uses [(ICLR2023)Reversible Column Networks](https://github.com/megvii-research/RevCol) to redesign the yolov10 backbone; the C2f-Block inside it can be swapped for different variants.\n\n3. ultralytics/cfg/models/v10/yolov10n-BIMAFPN.yaml\n\n    Applies the idea of BIFPN to the MAFPN from [MAF-YOLO](https://arxiv.org/pdf/2407.04381), giving BIMAFPN.\n\n4. 
ultralytics/cfg/models/v10/yolov10n-C2f-AdditiveBlock-CGLU.yaml\n\n    Uses AdditiveBlock from [CAS-ViT](https://github.com/Tianfang-Zhang/CAS-ViT) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve C2f.\n\n5. ultralytics/cfg/models/v10/yolov10n-ASF-P2.yaml\n\n    A secondary innovation on ultralytics/cfg/models/v8/yolov8-ASF.yaml that introduces a P2 detection layer and optimizes the network structure.\n\n6. ultralytics/cfg/models/v10/yolov10n-ASF-DySample.yaml\n\n    Combines Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO) with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085), giving Dynamic Sample Attentional Scale Sequence Fusion.\n\n7. ultralytics/cfg/models/v10/yolov10n-goldyolo-asf.yaml\n\n    Combines Gather-and-Distribute from Huawei's GOLD-YOLO (2023) with Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO) as a secondary innovation that improves the yolov10 neck.\n\n8. ultralytics/cfg/models/v10/yolov10n-C2f-MSMHSA-CGLU.yaml\n\n    Uses M2SA from [CMTFNet](https://github.com/DrWuHonglin/CMTFNet/tree/main) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve C2f.\n\n9. ultralytics/cfg/models/v10/yolov10n-C2f-IdentityFormer-CGLU.yaml\n\n    Uses IdentityFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve C2f.\n\n10. ultralytics/cfg/models/v10/yolov10n-C2f-RandomMixing-CGLU.yaml\n\n    Uses RandomMixing from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve C2f.\n\n11. ultralytics/cfg/models/v10/yolov10n-C2f-PoolingFormer-CGLU.yaml\n\n    Uses PoolingFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer) and CGLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve C2f.\n\n12. 
ultralytics/cfg/models/v10/yolov10n-C2f-ConvFormer-CGLU.yaml\n\n    使用[Metaformer TPAMI2024](https://github.com/sail-sg/metaformer)中的ConvFormer和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的CGLU改进c2f.\n\n13. ultralytics/cfg/models/v10/yolov10n-C2f-CaFormer-CGLU.yaml\n\n    使用[Metaformer TPAMI2024](https://github.com/sail-sg/metaformer)中的CaFormer和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的CGLU改进c2f.\n\n14. ultralytics/cfg/models/v10/yolov10n-dyhead-DCNV3.yaml\n\n    使用[DCNV3](https://github.com/OpenGVLab/InternImage)替换DyHead中的DCNV2.\n\n15. ultralytics/cfg/models/v10/yolov10n-dyhead-DCNV4.yaml\n\n    使用[DCNV4](https://github.com/OpenGVLab/DCNv4)对DyHead进行二次创新.\n\n16. ultralytics/cfg/models/v10/yolov10n-C2f-iRMB-Cascaded.yaml\n\n    使用[EfficientViT CVPR2023](https://github.com/microsoft/Cream/tree/main/EfficientViT)中的CascadedGroupAttention对[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB进行二次创新来改进C2f.\n\n17. ultralytics/cfg/models/v10/yolov10n-C2f-iRMB-DRB.yaml\n\n    使用[UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main)中的DilatedReparamBlock对[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB进行二次创新来改进C2f.\n\n18. ultralytics/cfg/models/v10/yolov10n-C2f-iRMB-SWC.yaml\n\n    使用[shift-wise conv](https://arxiv.org/abs/2401.12736)对[EMO ICCV2023](https://github.com/zhangzjn/EMO)中的iRMB进行二次创新来改进C2f.\n\n19. ultralytics/cfg/models/v10/yolov10n-ELA-HSFPN.yaml\n\n    使用[Efficient Local Attention](https://arxiv.org/abs/2403.01123)改进HSFPN.\n\n20. ultralytics/cfg/models/v10/yolov10n-CA-HSFPN.yaml\n\n    使用[Coordinate Attention CVPR2021](https://github.com/houqb/CoordAttention)改进HSFPN.\n\n21. ultralytics/cfg/models/v10/yolov10n-CAA-HSFPN.yaml\n\n    使用[CVPR2024 PKINet](https://github.com/PKINet/PKINet)中的CAA模块HSFPN.\n\n22. 
ultralytics/cfg/models/v10/yolov10n-MAN-Faster.yaml\n\n    使用[Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804)中的 Mixed Aggregation Network和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block进行二次创新改进yolov10.\n\n23. ultralytics/cfg/models/v10/yolov10n-MAN-FasterCGLU.yaml\n\n    使用[Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804)中的 Mixed Aggregation Network和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU进行二次创新改进yolov10.\n\n24. ultralytics/cfg/models/v10/yolov10n-MAN-Star.yaml\n\n    使用[Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804)中的 Mixed Aggregation Network和[StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main)中的StarBlock进行二次创新改进yolov10.\n\n25. ultralytics/cfg/models/v10/yolov10n-MutilBackbone-MSGA.yaml\n\n    使用[MSA^2 Net](https://github.com/xmindflow/MSA-2Net)中的Multi-Scale Adaptive Spatial Attention Gate对自研系列MutilBackbone再次创新.\n\n26. ultralytics/cfg/models/v10/yolov10n-slimneck-WFU.yaml\n\n    使用[ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN)中的Wavelet Feature Upgrade对slimneck二次创新.\n\n27. ultralytics/cfg/models/v10/yolov10n-MAN-FasterCGLU-WFU.yaml\n\n    使用[ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN)中的Wavelet Feature Upgrade和[Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804)中的 Mixed Aggregation Network和[FasterNet CVPR2023](https://github.com/JierunChen/FasterNet)中的Faster-Block和[TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt)中的Convolutional GLU进行二次创新改进yolov10.\n\n28. ultralytics/cfg/models/v10/yolov10n-CDFA.yaml\n\n    使用[ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN)中的WaveletConv与[AAAI2025 ConDSeg](https://github.com/Mengqi-Lei/ConDSeg)的ContrastDrivenFeatureAggregation结合改进yolov10.\n\n29. 
ultralytics/cfg/models/v10/yolov10n-C2f-StripCGLU.yaml\n\n    Uses StripBlock from [Strip R-CNN](https://arxiv.org/pdf/2501.03775) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve C2f.\n\n30. ultralytics/cfg/models/v10/yolov10n-C2f-Faster-KAN.yaml\n\n    Uses KAN from [ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat) as a secondary innovation on the FasterBlock from fasternet (CVPR2023).\n\n31. ultralytics/cfg/models/v10/yolov10n-C2f-DIMB-KAN.yaml\n\n    Builds on yolov10n-C2f-DIMB.yaml by replacing the mlp module with KAN from [ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat).\n\n32. ultralytics/cfg/models/v10/yolov10n-C2f-EfficientVIM-CGLU.yaml\n\n    Uses EfficientViMBlock from [CVPR2025 EfficientViM](https://github.com/mlvlab/EfficientViM) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt) to improve C2f.\n\n33. ultralytics/cfg/models/v10/yolov10n-LSCD-LQE.yaml\n\n    Localization Quality Estimation Head-LSCD-NMSFree. The Localization Quality Estimation module comes from [GFocalV2](https://arxiv.org/abs/2011.12885).\n\n34. ultralytics/cfg/models/v10/yolov10n-EUCB-SC.yaml\n\n    Uses EUCB from [CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD) and ShiftChannelMix from [CVPR2025 BHViT](https://github.com/IMRL/BHViT) to improve upsampling in yolov10.\n\n35. ultralytics/cfg/models/v10/yolov10n-EMBSFPN-SC.yaml\n\n    Adds ShiftChannelMix from [CVPR2025 BHViT](https://github.com/IMRL/BHViT) to the ultralytics/cfg/models/v10/yolov10n-EMBSFPN.yaml design.\n\n36. ultralytics/cfg/models/v10/yolov10n-MFMMAFPN.yaml\n\n    Uses MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet) as a secondary innovation on the MAFPN from [MAF-YOLO](https://arxiv.org/pdf/2407.04381).\n\n37. ultralytics/cfg/models/v10/yolov10n-MBSMFFPN.yaml\n\n    Uses MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet) to further innovate on yolov10n-EMBSFPN.yaml, giving Multi-Branch&Scale Modulation-Fusion FPN.\n\n38. 
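ultralytics/cfg/models/v10/yolov10n-C2f-mambaout-LSConv.yaml\n\n    Combines LSConv from [CVPR2025 LSNet](https://github.com/THU-MIG/lsnet) with MambaOutBlock from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut) as a secondary innovation that improves C2f.\n\n39. ultralytics/cfg/models/v10/yolov10n-SOEP-RFPN-MFM.yaml\n\n    Uses SNI and GSConvE from [ECCV2024 rethinking-fpn](https://github.com/AlanLi1997/rethinking-fpn) and MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet) to further innovate on the original SOEP improvement.\n\n40. ultralytics/cfg/models/v10/yolov10n-SOEP-PST.yaml\n\n    Uses the Pyramid Sparse Transformer from [Pyramid Sparse Transformer](https://arxiv.org/abs/2505.12772) to improve SOEP.\n\n41. ultralytics/cfg/models/v10/yolov10n-MAN-GCConv.yaml\n\n    Uses GCConv from [CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet) to improve the Mixed Aggregation Network from [Hyper-YOLO TPAMI2025](https://www.arxiv.org/pdf/2408.04804).\n\n### Self-Developed Series\n\n1. ultralytics/cfg/models/v10/yolov10n-C2f-EMSC.yaml\n\n    Efficient Multi-Scale Conv. A self-developed module; for a detailed explanation, see the video in the Baidu Cloud link.\n\n2. ultralytics/cfg/models/v10/yolov10n-C2f-EMSCP.yaml\n\n    Efficient Multi-Scale Conv Plus. A self-developed module; for a detailed explanation, see the video in the Baidu Cloud link.\n\n3. ultralytics/cfg/models/v10/yolov10n-LAWDS.yaml\n\n    Light Adaptive-weight downsampling. A self-developed module; for a detailed explanation, see the video in the Baidu Cloud link.\n\n4. ultralytics/cfg/models/v10/yolov10n-LSCD.yaml\n\n    Self-developed lightweight detection head (Lightweight Shared Convolutional Detection Head).\n    1. The FCOS paper has shown that GroupNorm improves the localization and classification performance of the detection head.\n    2. Shared convolutions greatly reduce the parameter count, making the model lighter, especially on resource-constrained devices.\n    3. Because the detection heads handle targets at different scales, a Scale layer rescales the features when shared convolutions are used.\n    Taken together, these let the detection head use fewer parameters and less computation while keeping the loss in accuracy as small as possible.\n\n5. ultralytics/cfg/models/v10/yolov10n-CGRFPN.yaml\n\n    Context-Guided Spatial Feature Reconstruction Feature Pyramid Network.\n    1. Borrows the Rectangular Self-Calibration Module from [ECCV2024-CGRSeg](https://github.com/nizhenliang/CGRSeg), which is designed for spatial feature reconstruction and pyramid context extraction. It captures global context in the horizontal and vertical directions and uses the resulting axial global context to explicitly model rectangular key regions.\n    2. PyramidContextExtraction Module: the pyramid context extraction module (PyramidContextExtraction) integrates feature information from different levels and improves the model's context awareness.\n    3. 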
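FuseBlockMulti and DynamicInterpolationFusion: these modules fuse multi-scale features. Through dynamic interpolation and multi-feature fusion, they further improve the model's multi-scale feature representation and its ability to recognize targets against complex backgrounds.\n\n6. ultralytics/cfg/models/v10/yolov10n-FeaturePyramidSharedConv.yaml\n\n    1. Multi-scale feature extraction\n        By using convolution layers with different dilation rates, the module extracts features at different scales. This helps capture information about objects of different sizes and in different contexts.\n        Low dilation rates capture local detail; high dilation rates capture global context.\n    2. Parameter sharing\n        A shared convolution layer, self.share_conv, greatly reduces the number of trainable parameters. Compared with a separate convolution layer per dilation rate, sharing reduces redundancy and improves model efficiency.\n        It lowers the model's storage and compute overhead.\n    3. Efficient channel transformation\n        The 1x1 convolution layers self.cv1 and self.cv2 adjust the channel count efficiently and fuse features. 1x1 convolutions keep important feature information while reducing the parameter count.\n    4. Finer-grained feature extraction\n        FeaturePyramidSharedConv extracts features with convolutions and can capture finer-grained features. By comparison, the pooling operations in SPPF may lose some detail.\n        Convolution is more flexible and expressive for feature extraction and can better capture detail and complex patterns in the image.\n\n7. APT(Adaptive Power Transformation)-TAL.\n\n    To make the matching quality and loss weights of different gt-prediction pairs more discriminative, we use a custom PowerTransformer to significantly increase the weight of high-quality predicted boxes and suppress the influence of low-quality ones, so that the model focuses more on high-quality predicted boxes during training.\n\n8. ultralytics/cfg/models/v10/yolov10n-SOEP.yaml\n\n    Small objects are hard to detect on the standard P3, P4, and P5 detection layers. The conventional remedy is to add a P2 detection layer, but this brings its own problems, such as excessive computation and slower post-processing, which motivates a new feature pyramid that is effective for small objects. We improve on the original PAFPN and propose SmallObjectEnhancePyramid. Instead of adding a P2 detection layer, we pass the P2 feature layer through SPDConv to obtain features rich in small-object information and fuse them into P3. We then combine the CSP idea with [OmniKernel (AAAI2024)](https://ojs.aaai.org/index.php/AAAI/article/view/27907) to obtain CSP-OmniKernel for feature integration. The OmniKernel module consists of three branches (a global branch, a large branch, and a local branch) that learn feature representations from global to local, which improves small-object detection performance.\n\n9. ultralytics/cfg/models/v10/yolov10n-EMBSFPN.yaml\n\n    Based on BIFPN, [MAF-YOLO](https://arxiv.org/pdf/2407.04381), and [CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD), we propose a new Efficient Multi-Branch&Scale FPN.\n    Efficient Multi-Branch&Scale FPN offers a lightweight design, weighted multi-scale feature fusion, an efficient multi-scale convolution module, an efficient upsampling module, and a global heterogeneous kernel selection mechanism.\n    1. Efficient multi-scale convolution module and global heterogeneous kernel selection mechanism: research on the Trident network shows that networks with larger receptive fields are better suited to detecting larger objects, while smaller objects benefit from smaller receptive fields. In the FPN stage we therefore choose different multi-scale convolution kernels for feature layers of different scales, progressively gathering multi-scale receptive-field information.\n    2. Weighted multi-scale feature fusion borrowed from BIFPN: replacing Concat with Add reduces parameters and computation, and the fusion weights adapt to the importance of the features at each scale.\n    3. The efficient upsampling module comes from EUCB in CVPR2024-EMCAD and stays efficient while maintaining reasonable accuracy.\n\n10. 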
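ultralytics/cfg/models/v10/yolov10n-CSP-PMSFA.yaml\n\n    Self-developed module: CSP-Partial Multi-Scale Feature Aggregation.\n    1. Partial multi-scale feature extraction: following the ideas of CVPR2020-GhostNet and CVPR2023-FasterNet, the module uses an efficient PartialConv. It extracts feature information at multiple scales from the input, but only on part of the channels rather than all of them, which improves computational efficiency.\n    2. Enhanced feature fusion: a final 1x1 convolution fuses the features from different scales, and a residual connection adds the input features to the processed features. This preserves the original information while introducing new multi-scale information, improving the model's expressive power.\n\n11. ultralytics/cfg/models/v10/yolov10n-MutilBackbone-DAF.yaml\n\n    Self-developed MutilBackbone-DynamicAlignFusion.\n    1. To avoid spending too much computation on shallow feature maps, the MutilBackbone branches share a single stem. This keeps the computation and inference time from growing too large.\n    2. To handle the spatial discrepancies between features from different backbones when they are fused, we designed DynamicAlignFusion. It first fuses the features learned by the two modules, then generates a DynamicAlignWeight to adjust each feature, and finally applies a learnable channel weight that dynamically adjusts the weights of the two paths according to the input features, improving the model's adaptability to different features.\n\n12. ultralytics/cfg/models/v10/yolov10n-TADDH.yaml\n\n    Self-developed Task Align Dynamic Detection Head.\n    1. The FCOS paper has shown that GroupNorm improves the localization and classification performance of the detection head.\n    2. Shared convolutions greatly reduce the parameter count, making the model lighter, especially on resource-constrained devices. Because the detection heads handle targets at different scales, a Scale layer rescales the features when shared convolutions are used.\n    3. Following TOOD, we apply task alignment not only in the label assignment strategy but also in the structure of the detection head. Existing detector heads usually use independent classification and localization branches, which leaves the two tasks without interaction. TADDH uses a feature extractor to learn task-interactive features from multiple convolution layers and obtain joint features. The localization branch uses DCNV2, with its offset and mask generated from the interactive features, and the classification branch uses the interactive features for dynamic feature selection.\n\n13. ultralytics/cfg/models/v10/yolov10n-C2f-MutilScaleEdgeInformationEnhance.yaml\n\n    Self-developed CSP-MutilScaleEdgeInformationEnhance.\n    The MutilScaleEdgeInformationEnhance module combines multi-scale feature extraction, edge information enhancement, and convolution. It extracts features at different scales, highlights edge information, integrates the multi-scale features, and outputs the enhanced features through a convolution layer. Building on feature extraction and edge enhancement, the module has strong representational capacity.\n    1. Multi-scale feature extraction: nn.AdaptiveAvgPool2d pools at multiple scales to extract local information of different sizes, which helps capture multi-level features of the image.\n    2. Edge enhancement: the EdgeEnhancer module is dedicated to extracting edge information, making the network more sensitive to edges, which matters for many vision tasks such as object detection and semantic segmentation.\n    3. Feature fusion: features extracted at different scales are aligned to the same scale by interpolation, concatenated, and then fused by a convolution layer into a unified representation, improving the model's perception of multi-scale features.\n\n14. ultralytics/cfg/models/v10/yolov10n-RSCD.yaml\n\n    Self-developed re-parameterized lightweight detection head (Rep Shared Convolutional Detection Head).\n    1. 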
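Shared convolutions greatly reduce the parameter count, which makes the model lighter, especially on resource-constrained devices. However, sharing parameters may limit the model's expressive power, because different features may need different kernels to capture complex patterns, and shared parameters may not fully capture these differences. To offset the drawbacks of the shared convolutions adopted for a lightweight design, we use re-parameterizable convolutions. By introducing more learnable parameters, the network extracts features from the data more effectively, which compensates for the accuracy loss a lightweight model may incur. Re-parameterized convolutions also greatly improve parameter utilization and are identical to ordinary convolutions at inference time, giving the model a lossless optimization.\n    2. Because each detection head handles targets at a different scale, a Scale layer rescales the features when shared convolutions are used.\n\n15. ultralytics/cfg/models/v10/yolov10n-CSP-FreqSpatial.yaml\n\n    FreqSpatial is a convolutional neural network (CNN) module that fuses spatial-domain and frequency-domain features. It extracts features in both domains in order to capture spatial and frequency information at different levels, improving the model's robustness and representational power on image data. Its main characteristic is combining the Scharr operator (used for edge detection) with spatial-domain convolution and frequency-domain convolution to capture the structural features of the image from multiple perspectives.\n    1. Spatial-domain feature extraction: extracts features based on spatial structure from the original image, mainly capturing details and edge information.\n    2. Frequency-domain feature extraction: extracts frequency-related patterns in the frequency domain, capturing the low-frequency and high-frequency components of the image and helping the model extract information at both global and local scales.\n    3. Feature fusion: the spatial-domain and frequency-domain features are combined by a weighted sum to produce the final output feature map. This weighted fusion lets the model consider spatial structure and frequency information together, improving its performance across a variety of scenarios.\n\n16. ultralytics/cfg/models/v10/yolov10n-C2f-MutilScaleEdgeInformationSelect.yaml\n\n    A further innovation on the self-developed CSP-MutilScaleEdgeInformationEnhance.\n    We propose a multi-scale edge information selection module (MutilScaleEdgeInformationSelect), whose goal is to efficiently select, from multi-scale edge information, the key features most relevant to the target task. To this end we introduce an attention mechanism that focuses on more important regions, [ICCV2023 DualDomainSelectionMechanism, DSM](https://github.com/c-yn/FocalNet). By focusing on the more important regions of the image (such as complex edges and high-frequency signal regions), it adaptively selects the features with higher task relevance from the multi-scale features, which significantly improves the precision of feature selection and overall model performance.\n\n17. ultralytics/cfg/models/v10/yolov10n-LSDECD.yaml\n\n    Builds on the self-developed lightweight detection head (LSCD) and further improves it with detail-enhanced convolution, strengthening the head's ability to capture detail and further improving detection accuracy.\n    On why DEConv reports more computation after re-parameterization than before: the thop library miscounts before re-parameterization, so refer to the figures reported after re-parameterization.\n    1. DEA-Net designs a detail-enhanced convolution (DEConv). Specifically, DEConv integrates prior information into an ordinary convolutional layer to improve representation and generalization. With re-parameterization, DEConv is then converted equivalently into an ordinary convolution, with no extra parameters or computational cost.\n\n18. ultralytics/cfg/models/v10/yolov10n-ContextGuideFPN.yaml\n\n    Context Guide Fusion Module (CGFM) is a novel feature fusion module designed to improve the feature pyramid network (FPN) in YOLOv8. Its design addresses the guidance and adaptive adjustment of contextual information during multi-scale feature fusion.\n    1. Effective fusion of contextual information: through the SE attention mechanism, the module captures and uses important contextual information during feature fusion, which makes the feature representation more effective, guides the model toward information about the detection targets, and improves detection accuracy.\n    2. Feature enhancement: through weighted feature reorganization, the module strengthens important features and suppresses unimportant ones, improving the discriminative power of the feature maps.\n    3. Simple and efficient: the module structure is relatively simple and adds little computational overhead, making it suitable for real-time object detection.\n\n19. 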
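Re-CalibrationFPN\n\n    To strengthen the interaction between shallow and deep features, we introduce a re-calibration feature pyramid network (Re-CalibrationFPN).\n    P2345: ultralytics/cfg/models/v10/yolov10n-ReCalibrationFPN-P2345.yaml (ReCalibrationFPN with a small-object detection head)\n    P345: ultralytics/cfg/models/v10/yolov10n-ReCalibrationFPN-P345.yaml\n    P3456: ultralytics/cfg/models/v10/yolov10n-ReCalibrationFPN-P3456.yaml (ReCalibrationFPN with a large-object detection head)\n    1. Shallow features carry less semantics but are rich in detail, with clearer boundaries and less distortion, while deep features carry rich semantic information. Directly fusing low-level features with high-level features can therefore cause redundancy and inconsistency. To address this, we propose the [SBA](https://github.com/Barrett-python/DuAT) module, which selectively aggregates boundary information and semantic information to depict finer-grained object contours and re-calibrate object positions.\n    2. Compared with the traditional FPN structure, the [SBA](https://github.com/Barrett-python/DuAT) module introduces a bidirectional fusion mechanism between high-resolution and low-resolution features, so information passes between features more fully and multi-scale feature fusion improves further.\n    3. Through an adaptive attention mechanism, the [SBA](https://github.com/Barrett-python/DuAT) module adjusts feature weights according to the resolution and content of each feature map, and thus better captures the multi-scale features of the target.\n\n20. ultralytics/cfg/models/v10/yolov10n-CSP-PTB.yaml\n\n    Cross Stage Partial - Partially Transformer Block\n    In computer vision tasks, Transformer architectures have attracted wide attention for their strong global feature extraction. However, because of their high computational complexity, applying them directly to all channels incurs significant overhead. To reduce compute cost while keeping feature extraction efficient, we designed a hybrid structure that splits the input feature map into two parts, processed by a CNN and a Transformer respectively, combining the two mechanisms to strengthen feature extraction.\n    We propose a module named CSP_PTB (Cross Stage Partial - Partially Transformer Block) that combines the strengths of CNNs and Transformers and optimizes computational efficiency and feature extraction by allocating only part of the input channels to each.\n    1. Fusing local and global features: several studies show that the small receptive field of CNNs limits them to local features, whereas the MHSA in a Transformer extracts global features; the module exploits the strengths of both.\n    2. Efficient feature extraction at lower compute cost: to bring in a Transformer structure for global features without greatly increasing computational complexity, we propose the Partially Transformer Block, which applies the TransformerBlock to only part of the channels.\n    3. MHSA_CGLU contains Mutil-Head-Self-Attention and [ConvolutionalGLU(TransNext CVPR2024)](https://github.com/DaiShiResearch/TransNeXt). Mutil-Head-Self-Attention extracts global features, and ConvolutionalGLU enhances nonlinear feature expression; ConvolutionalGLU performs better than a traditional FFN.\n    4. The number of channels given to the Transformer can be adjusted to the model size and the actual runtime conditions.\n\n21. 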
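GlobalEdgeInformationTransfer\n\n    Implementation 1: ultralytics/cfg/models/v10/yolov10n-GlobalEdgeInformationTransfer1.yaml\n    Implementation 2: ultralytics/cfg/models/v10/yolov10n-GlobalEdgeInformationTransfer2.yaml\n    Implementation 3: ultralytics/cfg/models/v10/yolov10n-GlobalEdgeInformationTransfer3.yaml\n    As is well known, localizing an object's bounding box depends heavily on the object's edge information, yet a conventional object detection network has no component that raises the network's attention to edges. We need a way to fuse edge information into the features extracted at every scale, so we propose a module named GlobalEdgeInformationTransfer (GEIT), which passes the edge information extracted from shallow features to the whole backbone and fuses it with features at different scales.\n    1. Because the original image contains a large amount of background information, extracting edge information directly from it and passing it to the whole backbone would add noise to learning, whereas the shallow convolutional layers help filter out unnecessary background. We therefore place a module named MutilScaleEdgeInfoGenetator in the shallow part of the network; it uses the shallow feature layers to generate edge-information feature maps at multiple scales and delivers them to each scale of the backbone for fusion.\n    2. The choice of downsampling requires care. Our goal is to preserve and enhance edge information while downsampling, so MaxPool is the better fit: it keeps the strongest response in each local region and therefore represents edges better. AvgPool is better suited to scenarios that need smoothed or averaged features, but it preserves detail and edges less well than MaxPool.\n    3. For fusion, ConvEdgeFusion combines edge information with ordinary convolutional features in a new cross-channel fusion scheme. First, conv_channel_fusion fuses the edge information and the ordinary convolutional features across channels, helping the model integrate features from different sources. Then conv_3x3_feature_extract further extracts the fused features to strengthen the capture of local detail. Finally, conv_1x1 adjusts the output feature dimension.\n\n22. ultralytics/cfg/models/v10/yolov10n-C2f-DIMB.yaml\n\n    Self-developed module DynamicInceptionDWConv2d. (For details, see 配置文件.md in the project.)\n\n23. ultralytics/cfg/models/v10/yolov10n-HAFB-1.yaml\n\n    Self-developed Hierarchical Attention Fusion Block. (For details, see 配置文件.md in the project.)\n\n24. ultralytics/cfg/models/v10/yolov10n-HAFB-2.yaml\n\n    Another way of using HAFB.\n\n25. ultralytics/cfg/models/v10/yolov10n-MutilBackbone-HAFB.yaml\n\n    Applies HAFB on top of yolov10n-MutilBackbone-DAF.yaml.\n\n### BackBone series\n\n1. ultralytics/cfg/models/v10/yolov10n-efficientViT.yaml\n\n    Replaces the yolov10 backbone with (CVPR2023) efficientViT.\n\n2. ultralytics/cfg/models/v10/yolov10n-fasternet.yaml\n\n    Replaces the yolov10 backbone with (CVPR2023) fasternet.\n\n3. ultralytics/cfg/models/v10/yolov10n-timm.yaml\n\n    Replaces the yolov10 backbone with a backbone network supported by timm.\n\n4. ultralytics/cfg/models/v10/yolov10n-convnextv2.yaml\n\n    Replaces the yolov10 backbone with convnextv2.\n\n5. ultralytics/cfg/models/v10/yolov10n-EfficientFormerV2.yaml\n\n    Replaces the yolov10 backbone with EfficientFormerV2. (See [point 5 of Common Errors and Solutions](#a).)\n\n6. 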
ultralytics/cfg/models/v10/yolov10n-vanillanet.yaml\n\n    Replaces the yolov10 backbone with vanillanet.\n\n7. ultralytics/cfg/models/v10/yolov10n-LSKNet.yaml\n\n    Replaces the yolov10 backbone with LSKNet (the 2023 SOTA backbone for rotated object detection).\n\n8. ultralytics/cfg/models/v10/yolov10n-swintransformer.yaml\n\n    Replaces the yolov10 backbone with SwinTransformer-Tiny.\n\n9. ultralytics/cfg/models/v10/yolov10n-repvit.yaml\n\n    Replaces the yolov10 backbone with [CVPR2024 RepViT](https://github.com/THU-MIG/RepViT/tree/main).\n\n10. ultralytics/cfg/models/v10/yolov10n-CSwinTransformer.yaml\n\n    Replaces the yolov10 backbone with [CSWin-Transformer(CVPR2022)](https://github.com/microsoft/CSWin-Transformer/tree/main). (See [point 5 of Common Errors and Solutions](#a).)\n\n11. ultralytics/cfg/models/v10/yolov10n-HGNetV2.yaml\n\n    Uses HGNetV2 as the YOLOV10 backbone.\n\n12. ultralytics/cfg/models/v10/yolov10n-unireplknet.yaml\n\n    Replaces the yolov10 backbone with [UniRepLKNet](https://github.com/AILab-CVC/UniRepLKNet/tree/main).\n\n13. ultralytics/cfg/models/v10/yolov10n-TransNeXt.yaml\n\n    Improves the yolov10 backbone with [TransNeXt](https://github.com/DaiShiResearch/TransNeXt). (See [point 5 of Common Errors and Solutions](#a).)\n\n14. ultralytics/cfg/models/v10/yolov10n-rmt.yaml\n\n    Improves the yolov10 backbone with [CVPR2024 RMT](https://arxiv.org/abs/2309.11523).\n\n15. ultralytics/cfg/models/v10/yolov10n-pkinet.yaml\n\n    Improves the backbone with [CVPR2024 PKINet](https://github.com/PKINet/PKINet). (Requires mmcv and mmengine.)\n\n16. ultralytics/cfg/models/v10/yolov10n-mobilenetv4.yaml\n\n    Improves the yolov10 backbone with [MobileNetV4](https://github.com/jaiwei98/MobileNetV4-pytorch/tree/main).\n\n17. ultralytics/cfg/models/v10/yolov10n-starnet.yaml\n\n    Improves the yolov10 backbone with [StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main).\n\n18. ultralytics/cfg/models/v10/yolov10n-mambaout.yaml\n\n    Replaces the backbone with MambaOut from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut).\n\n19. ultralytics/cfg/models/v10/yolov10n-lsnet.yaml\n\n    Replaces the yolov10 backbone with lsnet from [CVPR2025 LSNet](https://github.com/THU-MIG/lsnet).\n\n20. 
ultralytics/cfg/models/v10/yolov10n-overlock.yaml\n\n    Replaces the backbone with the overlock-backbone from [CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087).\n\n### SPPF series\n\n1. ultralytics/cfg/models/v10/yolov10n-FocalModulation.yaml\n\n    Replaces SPPF with [Focal Modulation](https://github.com/microsoft/FocalNet).\n\n2. ultralytics/cfg/models/v10/yolov10n-SPPF-LSKA.yaml\n\n    Improves SPPF with the [LSKA](https://github.com/StevenLauHKHK/Large-Separable-Kernel-Attention) attention mechanism, strengthening multi-scale feature extraction.\n\n3. ultralytics/cfg/models/v10/yolov10n-AIFIRep.yaml\n\n    Improves yolov10 with [ICML-2024 SLAB](https://github.com/xinghaochen/SLAB) and AIFI.\n\n### Neck series\n\n1. ultralytics/cfg/models/v10/yolov10n-bifpn.yaml\n\n    Adds BIFPN to yolov10.  \n    BIFPN has three optional parameters:\n    1. Fusion  \n        The Fusion module in BIFPN supports five modes: weight, adaptive, concat, bifpn(default), SDI  \n        weight, adaptive and concat come from [paper link, Figure 3](https://openreview.net/pdf?id=q2ZaVU6bEsT); SDI comes from [U-NetV2](https://github.com/yaoppeng/U-Net_v2)\n    2. node_mode  \n        Supports these [structures](#b)\n    3. head_channel  \n        The number of channels in BIFPN, 256 by default.\n\n2. ultralytics/cfg/models/v10/yolov10n-slimneck.yaml\n\n    Replaces C2f and Conv in the yolov10 neck with [VoVGSCSP\\VoVGSCSPC and GSConv](https://github.com/AlanLi1997/slim-neck-by-gsconv).\n\n3. ultralytics/cfg/models/v10/yolov10n-goldyolo.yaml\n\n    Improves the feature fusion module with Gather-and-Distribute from Huawei's 2023 GOLD-YOLO.\n\n4. ultralytics/cfg/models/v10/yolov10n-MAFPN.yaml\n\n    Improves the Neck with MAFPN from [MAF-YOLO](https://arxiv.org/pdf/2407.04381).\n\n5. ultralytics/cfg/models/v10/yolov10n-ASF.yaml\n\n    Improves yolov10 with Attentional Scale Sequence Fusion from [ASF-YOLO](https://github.com/mkang315/ASF-YOLO).\n\n6. Cross-Layer Feature Pyramid Transformer.\n\n    P345:ultralytics/cfg/models/v10/yolov10n-CFPT.yaml\n    P2345:ultralytics/cfg/models/v10/yolov10n-CFPT-P2345.yaml\n    P3456:ultralytics/cfg/models/v10/yolov10n-CFPT-P3456.yaml\n    P23456:ultralytics/cfg/models/v10/yolov10n-CFPT-P23456.yaml\n\n    Improves the neck with [CFPT](https://github.com/duzw9311/CFPT/tree/main).\n7. 
ultralytics/cfg/models/v10/yolov10n-RCSOSA.yaml\n\n    Replaces C2f with RCSOSA from [RCS-YOLO](https://github.com/mkang315/RCS-YOLO/tree/main).\n\n8. ultralytics/cfg/models/v10/yolov10n-GFPN.yaml\n\n    Improves the Neck with RepGFPN from [DAMO-YOLO](https://github.com/tinyvision/DAMO-YOLO).\n\n9. ultralytics/cfg/models/v10/yolov10n-EfficientRepBiPAN.yaml\n\n    Improves the Neck with EfficientRepBiPAN from [YOLOV6](https://github.com/meituan/YOLOv6/tree/main).\n\n10. ultralytics/cfg/models/v10/yolov10n-HSFPN.yaml\n\n    Improves the yolov10 neck with HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR).\n\n11. ultralytics/cfg/models/v10/yolov10n-hyper.yaml\n\n    Improves yolov10 with Hypergraph Computation in Semantic Space from [Hyper-YOLO](https://www.arxiv.org/pdf/2408.04804).\n\n12. ultralytics/cfg/models/v10/yolov10n-msga.yaml\n\n    Improves the yolov10 neck with the Multi-Scale Adaptive Spatial Attention Gate from [MSA^2 Net](https://github.com/xmindflow/MSA-2Net).\n\n13. ultralytics/cfg/models/v10/yolov10n-CGAFusion.yaml\n\n    Improves the yolov10 neck with content-guided attention fusion from [DEA-Net](https://github.com/cecret3350/DEA-Net).\n\n14. ultralytics/cfg/models/v10/yolov10n-WFU.yaml\n\n    Improves the yolov10 neck with Wavelet Feature Upgrade from [ACMMM2024 WFEN](https://github.com/PRIS-CV/WFEN).\n\n15. ultralytics/cfg/models/v10/yolov10n-fsa.yaml\n\n    Improves yolov10 with Frequency-Spatial Attention from [BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation](https://github.com/nkicsl/SF-UNet).\n\n16. ultralytics/cfg/models/v10/yolov10n-mscafsa.yaml\n\n    Improves the yolov10 neck with Frequency-Spatial Attention and Multi-scale Progressive Channel Attention from [BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation](https://github.com/nkicsl/SF-UNet).\n\n17. ultralytics/cfg/models/v10/yolov10n-MFM.yaml\n\n    Improves the neck with MFM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet).\n\n18. ultralytics/cfg/models/v10/yolov10n-GDSAFusion.yaml\n\n    Improves the neck with GDSAFusion from [CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087).\n\n19. 
ultralytics/cfg/models/v10/yolov10n-RFPN.yaml\n\n    Improves the YOLOV10n neck with SNI and GSConvE from [ECCV2024 rethinking-fpn](https://github.com/AlanLi1997/rethinking-fpn).\n\n20. ultralytics/cfg/models/v10/yolov10n-PST.yaml\n\n    Improves the neck with the Pyramid Sparse Transformer from [Pyramid Sparse Transformer](https://arxiv.org/abs/2505.12772).\n\n21. ultralytics/cfg/models/v10/yolov10n-HS-FPN.yaml\n\n    Improves the yolo neck with HFP and SDP from [AAAI2025 HS-FPN](https://github.com/ShiZican/HS-FPN/tree/main).\n\n22. ultralytics/cfg/models/v10/yolov10n-LCA.yaml\n\n    Improves the yolov10 neck with LCA from [CVPR2025 HVI](https://arxiv.org/pdf/2502.20272).\n\n23. ultralytics/cfg/models/v10/yolov10n-HFFE.yaml\n\n    Improves the yolov10 neck with HFFE from [TGRS2025 HAFNet](https://ieeexplore.ieee.org/document/11154006).\n\n### Head series\n\n1. ultralytics/cfg/models/v10/yolov10n-dyhead.yaml\n\n    Adds an attention-based object detection head to yolov10.\n\n2. ultralytics/cfg/models/v10/yolov10n-LQE.yaml\n\n    Localization Quality Estimation Head-NMSFree. The Localization Quality Estimation module comes from [GFocalV2](https://arxiv.org/abs/2011.12885).\n\n### Label Assign series\n### PostProcess series\n\n### Upsampling and downsampling operators\n\n1. ultralytics/cfg/models/v10/yolov10n-ContextGuidedDown.yaml\n\n    Downsamples with the Light-weight Context Guided DownSample from [CGNet](https://github.com/wutianyiRosun/CGNet/tree/master).\n\n2. ultralytics/cfg/models/v10/yolov10n-SPDConv.yaml\n\n    Downsamples with [SPDConv](https://github.com/LabSAINT/SPD-Conv/tree/main).\n\n3. ultralytics/cfg/models/v10/yolov10n-dysample.yaml\n\n    Improves upsampling in the yolov10 neck with [ICCV2023 DySample](https://arxiv.org/abs/2308.15085).\n\n4. ultralytics/cfg/models/v10/yolov10n-CARAFE.yaml\n\n    Improves upsampling in the yolov10 neck with [ICCV2019 CARAFE](https://arxiv.org/abs/1905.02188).\n\n5. ultralytics/cfg/models/v10/yolov10n-HWD.yaml\n\n    Improves yolov10 downsampling with [Haar wavelet downsampling](https://www.sciencedirect.com/science/article/abs/pii/S0031320323005174). (Use with AMP disabled.)\n\n6. ultralytics/cfg/models/v10/yolov10n-v7DS.yaml\n\n    Improves YOLOV10 downsampling with the downsampling structure from [YOLOV7 CVPR2023](https://arxiv.org/abs/2207.02696).\n\n7. 
ultralytics/cfg/models/v10/yolov10n-ADown.yaml\n\n    Improves YOLOV10 downsampling with the downsampling structure from [YOLOV9](https://github.com/WongKinYiu/yolov9).\n\n8. ultralytics/cfg/models/v10/yolov10n-SRFD.yaml\n\n    Improves yolov10 downsampling with [A Robust Feature Downsampling Module for Remote Sensing Visual Tasks](https://ieeexplore.ieee.org/document/10142024).\n\n9. ultralytics/cfg/models/v10/yolov10n-WaveletPool.yaml\n\n    Improves YOLOV10 upsampling and downsampling with [Wavelet Pooling](https://openreview.net/forum?id=rkhlb8lCZ).\n\n10. ultralytics/cfg/models/v10/yolov10n-LDConv.yaml\n\n    Improves downsampling with [LDConv](https://github.com/CV-ZhangXin/LDConv/tree/main).\n\n11. ultralytics/cfg/models/v10/yolov10n-PSConv.yaml\n\n    Improves yolov10 with Pinwheel-shaped Convolution from [AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection](https://github.com/JN-Yang/PConv-SDloss-Data).\n\n12. ultralytics/cfg/models/v10/yolov10n-EUCB.yaml\n\n    Improves yolov10 upsampling with EUCB from [CVPR2024 EMCAD](https://github.com/SLDGroup/EMCAD).\n\n13. ultralytics/cfg/models/v10/yolov10n-LoGStem.yaml\n\n    Improves the Stem (the first and second convolution layers) with LoGStem from [LEGNet](https://github.com/lwCVer/LEGNet).\n\n14. ultralytics/cfg/models/v10/yolov10n-FourierConv.yaml\n\n    Improves Conv with FourierConv from [MIA2025 Fourier Convolution Block with global receptive field for MRI reconstruction](https://www.sciencedirect.com/science/article/abs/pii/S1361841524002743).\n\n15. ultralytics/cfg/models/v10/yolov10n-RepStem.yaml\n\n    Improves yolov10 downsampling with RepStem from [ICCV2023 FastVit](https://arxiv.org/pdf/2303.14189).\n\n16. ultralytics/cfg/models/v10/yolov10n-C2f-GCConv.yaml\n\n    Improves C2f with GCConv from [CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet).\n\n### C2f series\n\n1. ultralytics/cfg/models/v10/yolov10n-C2f-WTConv.yaml\n\n    Improves C2f-BottleNeck with WTConv from [ECCV2024 Wavelet Convolutions for Large Receptive Fields](https://github.com/BGU-CS-VIL/WTConv).\n\n2. ultralytics/cfg/models/v10/yolov10n-attention.yaml\n\n    See the project video on how to add an attention layer in a yaml config file.  \n    Using various attention mechanisms in yolov10. 
[GitHub repository of the attention mechanisms](https://github.com/z1069614715/objectdetection_script/tree/master/cv-attention)  \n    For the attention mechanisms currently integrated, see [this link](#c)\n\n3. ultralytics/cfg/models/v10/yolov10n-C2f-FMB.yaml\n\n    Improves C2f with the Feature Modulation block from [ECCV2024 SMFANet](https://github.com/Zheng-MJ/SMFANet/tree/main).\n\n4. ultralytics/cfg/models/v10/yolov10n-C2f-Faster.yaml\n\n    Replaces C2f with C2f-Faster. (Replaces the Bottleneck in C2f with the FasterBlock from FasterNet.)\n\n5. ultralytics/cfg/models/v10/yolov10n-C2f-ODConv.yaml\n\n    Replaces C2f with C2f-ODConv. (Replaces the Conv in the C2f Bottleneck with ODConv.)\n\n6. ultralytics/cfg/models/v10/yolov10n-C2f-Faster-EMA.yaml\n\n    Replaces C2f with C2f-Faster-EMA. (C2f-Faster-EMA is recommended for the backbone; the Neck and head can use C2f-Faster.)\n\n7. ultralytics/cfg/models/v10/yolov10n-C2f-DBB.yaml\n\n    Replaces C2f with C2f-DBB. (Replaces the Conv in the C2f Bottleneck with DiverseBranchBlock.)\n\n8. ultralytics/cfg/models/v10/yolov10n-C2f-CloAtt.yaml\n\n    Replaces C2f with C2f-CloAtt. (Adds the attention mechanism from CloFormer, which captures both global and local features, to the Bottleneck in C2f.) (See [point 5 of Common Errors and Solutions](#a).)\n\n9. ultralytics/cfg/models/v10/yolov10n-C2f-gConv.yaml\n\n    Improves C2f with the gConvblock from [Rethinking Performance Gains in Image Dehazing Networks](https://arxiv.org/abs/2209.11448).\n\n10. ultralytics/cfg/models/v10/yolov10n-C2f-SCConv.yaml\n\n    Fuses SCConv (CVPR2020 http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf) with C2f.\n\n11. ultralytics/cfg/models/v10/yolov10n-C2f-SCcConv.yaml\n\n    Fuses ScConv (CVPR2023 https://openaccess.thecvf.com/content/CVPR2023/papers/Li_SCConv_Spatial_and_Channel_Reconstruction_Convolution_for_Feature_Redundancy_CVPR_2023_paper.pdf) with C2f.  \n    (It is named SCcConv because file names on Windows are case-insensitive.)\n\n12. ultralytics/cfg/models/v10/yolov10n-KernelWarehouse.yaml\n\n    Adds [Towards Parameter-Efficient Dynamic Convolution](https://github.com/OSVAI/KernelWarehouse) to yolov10.  \n    Note that with this module accuracy is very low during epochs 0-20 and returns to normal after epoch 20.\n\n13. ultralytics/cfg/models/v10/yolov10n-C2f-DySnakeConv.yaml\n\n    Fuses [DySnakeConv](https://github.com/YaoleiQi/DSCNet) with C2f.\n\n14. 
ultralytics/cfg/models/v10/yolov10n-C2f-WDBB.yaml\n\n    Improves C2f with WDBB from [YOLO-MIF](https://github.com/wandahangFY/YOLO-MIF).\n\n15. ultralytics/cfg/models/v10/yolov10n-C2f-DeepDBB.yaml\n\n    Improves C2f with DeepDBB from [YOLO-MIF](https://github.com/wandahangFY/YOLO-MIF).\n\n16. ultralytics/cfg/models/v10/yolov10n-C2f-AdditiveBlock.yaml\n\n    Improves C2f with AdditiveBlock from [CAS-ViT](https://github.com/Tianfang-Zhang/CAS-ViT).\n\n17. ultralytics/cfg/models/v10/yolov10n-C2f-MogaBlock.yaml\n\n    Improves C2f with MogaBlock from [MogaNet ICLR2024](https://github.com/Westlake-AI/MogaNet).\n\n18. ultralytics/cfg/models/v10/yolov10n-C2f-IdentityFormer.yaml\n\n    Improves C2f with IdentityFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer).\n\n19. ultralytics/cfg/models/v10/yolov10n-C2f-RandomMixing.yaml\n\n    Improves C2f with RandomMixingFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer). (See [point 5 of Common Errors and Solutions](#a).)\n\n20. ultralytics/cfg/models/v10/yolov10n-C2f-PoolingFormer.yaml\n\n    Improves C2f with PoolingFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer).\n\n21. ultralytics/cfg/models/v10/yolov10n-C2f-ConvFormer.yaml\n\n    Improves C2f with ConvFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer).\n\n22. ultralytics/cfg/models/v10/yolov10n-C2f-CaFormer.yaml\n\n    Improves C2f with CaFormer from [Metaformer TPAMI2024](https://github.com/sail-sg/metaformer).\n\n23. ultralytics/cfg/models/v10/yolov10n-C2f-FFCM.yaml\n\n    Improves C2f with Fused_Fourier_Conv_Mixer from [Efficient Frequency-Domain Image Deraining with Contrastive Regularization ECCV2024](https://github.com/deng-ai-lab/FADformer).\n\n25. ultralytics/cfg/models/v10/yolov10n-C2f-SFHF.yaml\n\n    Improves C2f with the block from [SFHformer ECCV2024](https://github.com/deng-ai-lab/SFHformer).\n\n26. ultralytics/cfg/models/v10/yolov10n-C2f-MSM.yaml\n\n    Improves C2f with MSM from [Revitalizing Convolutional Network for Image Restoration TPAMI2024](https://zhuanlan.zhihu.com/p/720777160).\n\n27. 
ultralytics/cfg/models/v10/yolov10n-C2f-iRMB.yaml\n\n    Improves C2f with iRMB from [EMO ICCV2023](https://github.com/zhangzjn/EMO).\n\n30. ultralytics/cfg/models/v10/yolov10n-C2f-RAB.yaml\n\n    Improves C2f with RAB (residual attention block) from [Pattern Recognition 2024|DRANet](https://github.com/WenCongWu/DRANet).\n\n31. ultralytics/cfg/models/v10/yolov10n-C2f-HDRAB.yaml\n\n    Improves C2f with HDRAB (hybrid dilated residual attention block) from [Pattern Recognition 2024|DRANet](https://github.com/WenCongWu/DRANet).\n\n32. ultralytics/cfg/models/v10/yolov10n-C2f-LFE.yaml\n\n    Improves C2f with Local feature extraction from [Efficient Long-Range Attention Network for Image Super-resolution ECCV2022](https://github.com/xindongzhang/ELAN).\n\n32. ultralytics/cfg/models/v10/yolov10n-C2f-SFA.yaml\n\n    Improves C2f with Frequency-aware Cascade Attention-SFA from [FreqFormer](https://github.com/JPWang-CS/FreqFormer).\n\n33. ultralytics/cfg/models/v10/yolov10n-C2f-CTA.yaml\n\n    Improves C2f with Frequency-aware Cascade Attention-CTA from [FreqFormer](https://github.com/JPWang-CS/FreqFormer).\n\n34. ultralytics/cfg/models/v10/yolov10n-C2f-CAMixer.yaml\n\n    Improves C2f with CAMixer from [CAMixerSR CVPR2024](https://github.com/icandle/CAMixerSR).\n\n35. ultralytics/cfg/models/v10/yolov10n-MAN.yaml\n\n    Improves yolov10 with the Mixed Aggregation Network from [Hyper-YOLO TPAMI2025](https://www.arxiv.org/pdf/2408.04804).\n\n36. ultralytics/cfg/models/v10/yolov10n-C2f-HFERB.yaml\n\n    Improves C2f with the high-frequency enhancement residual block from [ICCV2023 CRAFT-SR](https://github.com/AVC2-UESTC/CRAFT-SR).\n\n37. ultralytics/cfg/models/v10/yolov10n-C2f-DTAB.yaml\n\n    Improves C2f with DTAB from [AAAI2025 TBSN](https://github.com/nagejacob/TBSN).\n\n38. ultralytics/cfg/models/v10/yolov10n-C2f-JDPM.yaml\n\n    Improves C2f with the joint domain perception module from [ECCV2024 FSEL](https://github.com/CSYSI/FSEL).\n\n39. ultralytics/cfg/models/v10/yolov10n-C2f-ETB.yaml\n\n    Improves C2f with the entanglement transformer block from [ECCV2024 FSEL](https://github.com/CSYSI/FSEL).\n\n40. 
ultralytics/cfg/models/v10/yolov10n-C2f-AP.yaml\n\n    Improves C2f with the Asymmetric Padding bottleneck from [AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection](https://github.com/JN-Yang/PConv-SDloss-Data).\n\n41. ultralytics/cfg/models/v10/yolov10n-C2f-Kat.yaml\n\n    Improves C2f with KAT from [ICLR2025 Kolmogorov-Arnold Transformer](https://github.com/Adamdad/kat).\n\n42. ultralytics/cfg/models/v10/yolov10n-C2f-GlobalFilter.yaml\n\n    Improves C2f with GlobalFilterBlock from [T-PAMI Global Filter Networks for Image Classification](https://github.com/raoyongming/GFNet) and Convolutional GLU from [TransNeXt CVPR2024](https://github.com/DaiShiResearch/TransNeXt).\n\n43. ultralytics/cfg/models/v10/yolov10n-C2f-DynamicFilter.yaml\n\n    Improves C2f with DynamicFilter from [AAAI2024 FFT-Based Dynamic Token Mixer for Vision](https://github.com/okojoalg/dfformer).\n\n44. ultralytics/cfg/models/v10/yolov10n-RepHMS.yaml\n\n    Improves yolov10 with RepHMS from [MHAF-YOLO](https://github.com/yang-0201/MHAF-YOLO).\n\n45. ultralytics/cfg/models/v10/yolov10n-C2f-SAVSS.yaml\n\n    Improves C2f with the Structure-Aware Scanning Strategy from [CVPR2025 SCSegamba](https://github.com/Karl1109/SCSegamba).\n\n46. ultralytics/cfg/models/v10/yolov10n-C2f-mambaout.yaml\n\n    Improves C2f with MambaOutBlock from [CVPR2025 MambaOut](https://github.com/yuweihao/MambaOut).\n\n47. ultralytics/cfg/models/v10/yolov10n-C2f-EfficientVIM.yaml\n\n    Improves C2f with EfficientViMBlock from [CVPR2025 EfficientViM](https://github.com/mlvlab/EfficientViM).\n\n48. ultralytics/cfg/models/v10/yolov10n-C2f-LEGM.yaml\n\n    Improves C2f with LEGM from [CVPR2024 DCMPNet](https://github.com/zhoushen1/DCMPNet).\n\n49. ultralytics/cfg/models/v10/yolov10n-C2f-RCB.yaml\n\n    Improves C2f with RepConvBlock from [CVPR2025 OverLock](https://arxiv.org/pdf/2502.20087).\n\n50. ultralytics/cfg/models/v10/yolov10n-C2f-LFEM.yaml\n\n    Improves C2f with LFEModule from [LEGNet](https://github.com/lwCVer/LEGNet).\n\n51. 
ultralytics/cfg/models/v10/yolov10n-C2f-LSBlock.yaml\n\n    Improves C2f with LSBlock from [CVPR2025 LSNet](https://github.com/THU-MIG/lsnet).\n\n52. ultralytics/cfg/models/v10/yolov10n-C2f-TransMamba.yaml\n\n    Improves C2f with TransMamba from [TransMamba](https://github.com/sunshangquan/TransMamba).\n\n53. ultralytics/cfg/models/v10/yolov10n-C2f-EVS.yaml\n\n    Improves C2f with EVS from [CVPR2025 EVSSM](https://github.com/kkkls/EVSSM). (For the build tutorial, see the 20240219 release notes.)\n\n54. ultralytics/cfg/models/v10/yolov10n-C2f-EBlock.yaml\n\n    Improves C2f with EBlock from [CVPR2025 DarkIR](https://github.com/cidautai/DarkIR).\n\n55. ultralytics/cfg/models/v10/yolov10n-C2f-DBlock.yaml\n\n    Improves C2f with DBlock from [CVPR2025 DarkIR](https://github.com/cidautai/DarkIR).\n\n56. ultralytics/cfg/models/v10/yolov10n-C2f-SFSConv.yaml\n\n    Improves C2f with SFSConv from [CVPR2024 SFSConv](https://github.com/like413/SFS-Conv).\n\n57. ultralytics/cfg/models/v10/yolov10n-FCM.yaml\n\n    Improves yolov10 with the modules from [AAAI2025 FBRT-YOLO](https://github.com/galaxy-oss/FCM).\n\n58. ultralytics/cfg/models/v10/yolov10n-C2f-GroupMamba.yaml\n\n    Improves C2f with GroupMambaBlock from [CVPR2025 GroupMamba](https://github.com/Amshaker/GroupMamba).\n\n59. ultralytics/cfg/models/v10/yolov10n-C2f-MambaVision.yaml\n\n    Improves C2f with MambaVision from [CVPR2025 MambaVision](https://github.com/NVlabs/MambaVision).\n\n60. ultralytics/cfg/models/v10/yolov10n-C2f-FourierConv.yaml\n\n    Improves C2f with FourierConv from [MIA2025 Fourier Convolution Block with global receptive field for MRI reconstruction](https://www.sciencedirect.com/science/article/abs/pii/S1361841524002743).\n\n61. ultralytics/cfg/models/v10/yolov10n-C2f-GLVSS.yaml\n\n    Improves C2f with GLVSS from [TGRS2025 UMFormer](https://github.com/takeyoutime/UMFormer).\n\n62. ultralytics/cfg/models/v10/yolov10n-C2f-ESC.yaml\n\n    Improves C2f with ESC from [ICCV2025 ESC: Emulating Self-attention with Convolution for Efficient Image Super-Resolution](https://github.com/dslisleedh/ESC).\n\n63. 
ultralytics/cfg/models/v10/yolov10n-C2f-ConvAttn.yaml\n\n    Improves C2f with ConvAttn from [ICCV2025 ESC: Emulating Self-attention with Convolution for Efficient Image Super-Resolution](https://github.com/dslisleedh/ESC).\n\n64. ultralytics/cfg/models/v10/yolov10n-C2f-UniConv.yaml\n\n    Improves C2f with UniConvBlock from [ICCV2025 UniConvBlock](https://github.com/ai-paperwithcode/UniConvNet).\n\n65. ultralytics/cfg/models/v10/yolov10n-C2f-GCConv.yaml\n\n    Improves C2f with GCConv from [CVPR2025 Golden Cudgel Network](https://github.com/gyyang23/GCNet).\n\n66. ultralytics/cfg/models/v10/yolov10n-C2f-CFBlock.yaml\n\n    Improves C2f with CFBlock from [AAAI2024 SCTNet](https://arxiv.org/pdf/2312.17071).\n\n67. ultralytics/cfg/models/v10/yolov10n-C2f-CSSC.yaml\n\n    Improves C2f with CSSC from [TGRS2025 ASCNet](https://ieeexplore.ieee.org/document/10855453).\n\n68. ultralytics/cfg/models/v10/yolov10n-C2f-CNCM.yaml\n\n    Improves C2f with CNCM from [TGRS2025 ASCNet](https://ieeexplore.ieee.org/document/10855453).\n\n69. ultralytics/cfg/models/v10/yolov10n-C2f-HFRB.yaml\n\n    Improves C2f with HFRB from [ICCV2025 HFRB](https://arxiv.org/pdf/2507.10689).\n\n70. ultralytics/cfg/models/v10/yolov10n-C2f-EVA.yaml\n\n    Improves C2f with EVA from [ICIP2025 BEVANET](https://arxiv.org/pdf/2508.07300).\n\n71. ultralytics/cfg/models/v10/yolov10n-C2f-RMBC.yaml\n\n    Improves C2f with RepMBConv from [PlainUSR](https://arxiv.org/pdf/2409.13435).\n\n72. ultralytics/cfg/models/v10/yolov10n-C2f-RMBC-LA.yaml\n\n    Improves C2f with RepMBConv and Local Importance-based Attention from [PlainUSR](https://arxiv.org/pdf/2409.13435).\n\n73. ultralytics/cfg/models/v10/yolov10n-C2f-IEL.yaml\n\n    Improves C2f with IEL from [CVPR2025 HVI](https://arxiv.org/pdf/2502.20272).\n\n### PSA series\n\n1. ultralytics/cfg/models/v10/yolov10n-PTSSA.yaml\n\n    Improves PSA with Token Statistics Self-Attention from [Token Statistics Transformer](https://github.com/RobinWu218/ToST).\n\n2. ultralytics/cfg/models/v10/yolov10n-ASSR.yaml\n\n    Improves yolov10 with the Attentive State Space Group from [CVPR2025 MambaIR](https://github.com/csguoh/MambaIR).\n\n### Combination series\n\n1. 
ultralytics/cfg/models/v10/yolov10n-starnet-bifpn.yaml\n\n    Improves yolov10 with [StarNet CVPR2024](https://github.com/ma-xu/Rewrite-the-Stars/tree/main) and bifpn.\n\n2. ultralytics/cfg/models/v10/yolov10n-ELA-HSFPN-TADDH.yaml\n\n    Improves HSFPN with [Efficient Local Attention](https://arxiv.org/abs/2403.01123) and improves the Head with the self-developed task-aligned dynamic detection head (TADDH).\n\n# Mamba-YOLO\n1. [Mamba-YOLO](https://github.com/HZAI-ZJNU/Mamba-YOLO)\n\n    Integrates Mamba-YOLO. (Compilation is required; see the Baidu Cloud video: 20240619 release notes.)\n    ultralytics/cfg/models/mamba-yolo/Mamba-YOLO-T.yaml\n    ultralytics/cfg/models/mamba-yolo/Mamba-YOLO-B.yaml\n    ultralytics/cfg/models/mamba-yolo/Mamba-YOLO-L.yaml\n    ultralytics/cfg/models/mamba-yolo/yolo-mamba-seg.yaml\n\n# Hyper-YOLO\n1. ultralytics/cfg/models/hyper-yolo/hyper-yolo.yaml\n2. ultralytics/cfg/models/hyper-yolo/hyper-yolot.yaml\n3. ultralytics/cfg/models/hyper-yolo/hyper-yolo-seg.yaml\n\n\n# Attention series\n1. EMA\n2. SimAM\n3. SpatialGroupEnhance\n4. BiLevelRoutingAttention, BiLevelRoutingAttention_nchw\n5. TripletAttention\n6. CoordAtt\n7. CBAM\n8. BAMBlock\n9. EfficientAttention (the attention from CloFormer)\n10. LSKBlock\n11. SEAttention\n12. CPCA\n13. deformable_LKA\n14. EffectiveSEModule\n15. LSKA\n16. SegNext_Attention\n17. DAttention(Vision Transformer with Deformable Attention CVPR2022)\n18. FocusedLinearAttention(ICCV2023)\n19. MLCA\n20. TransNeXt_AggregatedAttention\n21. LocalWindowAttention (the CascadedGroupAttention from EfficientViT)\n22. Efficient Local Attention[Efficient Local Attention](https://arxiv.org/abs/2403.01123)\n23. CAA (the attention from CVPR2024 PKINet)\n24. CAFM\n25. AFGCAttention[Neural Networks ECCV2024](https://www.sciencedirect.com/science/article/abs/pii/S0893608024002387)\n\n# Loss series\n1. SlideLoss,EMASlideLoss. (Dynamically adjusts the coefficients of positive and negative samples so that the model focuses more on hard-to-classify and misclassified samples.)\n2. IoU,GIoU,DIoU,CIoU,EIoU,SIoU,MPDIoU,ShapeIoU.\n3. Inner-IoU,Inner-GIoU,Inner-DIoU,Inner-CIoU,Inner-EIoU,Inner-SIoU,Inner-ShapeIoU.\n4. Wise-IoU(v1,v2,v3) series (IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU).\n5. 
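Inner-Wise-IoU(v1,v2,v3) series (IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU).\n6. FocalLoss,VarifocalLoss,QualityfocalLoss\n7. Focaler-IoU series (IoU,GIoU,DIoU,CIoU,EIoU,SIoU,WIoU,MPDIoU,ShapeIoU)\n8. Powerful-IoU,Powerful-IoUV2,Inner-Powerful-IoU,Inner-Powerful-IoUV2,Focaler-Powerful-IoU,Focaler-Powerful-IoUV2,Wise-Powerful-IoU(v1,v2,v3),Wise-Powerful-IoUV2(v1,v2,v3)[paper link](https://www.sciencedirect.com/science/article/abs/pii/S0893608023006640)\n9. Normalized Gaussian Wasserstein Distance.\n10. Gaussian Combined Distance.\n\n# Release notes\n\n- **20230620-yolov8-v1.1**\n    1. Added EMA and C2f-Faster-EMA.\n    2. Added batch selection to val.py.\n    3. Added resume (continue training from a checkpoint) to train.py.\n\n- **20230625-yolov8-v1.2**\n    1. Added a resume-training tutorial to the usage guide and videos.\n    2. Added C2f-DBB as a replacement for C2f. (DiverseBranchBlock replaces the Conv in the C2f Bottleneck.) C2f-DBB can also be used as the node in bifpn.\n    3. Added common errors and solutions to the usage guide.\n\n- **20230627-yolov8-v1.3**\n    1. Added the Adaptive Training Sample Selection matching strategy.\n    2. Added the save_txt parameter to val.py.\n    3. Updated the tutorial.\n\n- **20230701-yolov8-v1.4**\n    1. Added the imgsz parameter to val.py to customize the image size used during validation; the default is 640.\n    2. Added plot_result.py for plotting comparison curves; see point 13 of the usage guide.\n    3. Added support for computing COCO metrics; see point 12 of the usage guide.\n    4. Added yolov8-slimneck. VoVGSCSP\\VoVGSCSPC can be used in bifpn, and GSConv replacement is supported.\n\n- **20230703-yolov8-v1.5**\n    1. Fixed the gflops calculation.\n    2. Added the YOLOV5-AnchorFree improvement; see 使用教程.md for details.\n    3. Added yolov8-attention.yaml, with a video on how to add an attention layer in yaml.\n    4. Updated the --info option of train.py: it now prints the parameters of every layer and compares the layer count, parameter count and computation before and after model fusion.\n\n- **20230705-yolov8-v1.6**\n    1. yolov5 and yolov8 now support Asymptotic Feature Pyramid Network.\n\n- **20230714-yolov8-v1.7**\n    1. Moved all added modules to ultralytics/nn/extra_modules to make later code syncing easier.\n    2. Added yolov5-bifpn.\n    3. Corrected ultralytics/models/v8/yolov8-efficientViT.yaml. Users reported that two papers share the name EfficientViT; the EfficientViT in this update is better suited to object detection, while the earlier efficientViT was originally proposed for semantic segmentation.\n    4. Updated the tutorial.\n    5. Updated the import logic: mmcv is no longer required, which lowers the barrier to getting started, although using dyhead without mmcv installed raises an error.\n\n- **20230717-yolov8-v1.8**\n    1. Fixed a bug where GFLOPs could not be computed after fusing the vanillanet backbone.\n    2. Added yolov8-C2f-CloAtt and yolov5-C3-CloAtt.\n    3. 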
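Added yolov8-vanillanet.yaml.\n\n- **20230723-yolov8-v1.9**\n    1. Redesigned the yolov5 and yolov8 architectures with (ICLR2023) Reversible Column Networks.\n    2. Added support for the LSKNet backbone, the 2023 SOTA for rotated object detection.\n    3. Added support for the LSKBlock attention mechanism from the LSKNet backbone.\n    4. Updated the common errors in the tutorial.\n    5. Added a FAQ to the tutorial.\n\n- **20230730-yolov8-v1.10**\n    1. Added yolov8-C2f-SCConv,yolov5-C3-SCConv. (CVPR 2020 http://mftp.mmcheng.net/Papers/20cvprSCNet.pdf)\n    2. Added yolov8-C2f-ScConv,yolov5-C3-ScConv. (CVPR 2023 https://openaccess.thecvf.com/content/CVPR2023/papers/Li_SCConv_Spatial_and_Channel_Reconstruction_Convolution_for_Feature_Redundancy_CVPR_2023_paper.pdf)\n    3. Updated the tutorial.\n    4. Updated the Baidu Cloud video link and added tutorials for SCConv and ScConv.\n\n- **20230730-yolov8-v1.11**\n    1. Renamed yolov8-C2f-ScConv and yolov5-C3-ScConv to yolov8-C2f-SCcConv and yolov5-C3-SCcConv, because file names on Windows are case-insensitive, which triggered overwrite prompts when unzipping.\n    2. Added support for MPDiou; see the tutorial for how to apply it.\n\n- **20230802-yolov8-v1.11.1**\n    1. Removed drop_last from the dataloader (ultralytics/yolo/data/build.py, build_dataloader func).\n    2. Fixed MPDiou.\n\n- **20230806-yolov8-v1.12**\n    1. Added a new self-developed module (Light Adaptive-weight downsampling); see the tutorial for details.\n\n- **20230808-yolov8-v1.13**\n    1. Added new self-developed modules (EMSC, EMSCP); see the tutorial for details.\n    2. Added RCSOSA from RCS-YOLO to yolov5 and yolov8.\n    3. Updated the tutorial.\n\n- **20230824-yolov8-v1.14**\n    1. Added support for SlideLoss and EMASlideLoss (which uses Exponential Moving Average to optimize mean iou and can serve as a self-developed innovation module); see the tutorial for usage.\n    2. Added support for KernelWarehouse:Towards Parameter-Efficient Dynamic Convolution (a dynamic convolution newly released in 2023).\n    3. Added support for the latest deformable convolution, Dynamic Snake Convolution.\n    4. Added support for Normalized Gaussian Wasserstein Distance(NWD).\n    5. Added the CPCA attention mechanism from CPCANet.\n    6. Updated the tutorial.\n\n- **20230830-yolov8-v1.15**\n    1. Redesigned the detection head, supporting 10 detection heads (with fewer parameters and less computation); see the tutorial for details.\n\n- **20230904-yolov8-v1.16**\n    1. Added support for DCNV2 and DCNV3; see the project's Baidu Cloud video for details.\n    2. Improved DyHead with DCNV3. (ultralytics/models/v5/yolov5-dyhead-DCNV3.yaml,ultralytics/models/v8/yolov8-dyhead-DCNV3.yaml)\n    3. Following the auxiliary training head idea of YOLOV7-AUX, improved YOLOV8 with an auxiliary training head that takes part in training and is removed at detection time. (ultralytics/models/v5/yolov5-AuxHead.yaml, ultralytics/models/v8/yolov8-AuxHead.yaml)\n    4. 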
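Add C3-Faster(ultralytics/models/v5/yolov5-C3-Faster.yaml).\n    5. Add C3-ODConv(ultralytics/models/v5/yolov5-C3-ODConv.yaml).\n    6. Add C3-Faster-EMA(ultralytics/models/v5/yolov5-C3-Faster-EMA.yaml).\n    7. Update the usage tutorial.\n\n- **20230909-yolov8-v1.17**\n    1. Optimize part of the auxiliary training head code.\n    2. Fix some bugs in multi-GPU training.\n    3. Update the usage tutorial. (The Baidu Cloud videos now explain how to port C3-XXX and C2f-XXX to the official yolov5.)\n    4. Support using NWD in the TAL label assignment strategy (see the usage tutorial).\n\n- **20230915-yolov8-v1.18**\n    1. Add Online Convolutional Re-parameterization (CVPR2022). (Surpasses DBB and RepVGG) (C3-OREPA,C3-REPVGGOREPA,C2f-OREPA,C2f-REPVGGOREPA)\n    2. Add FocalModulation.\n    3. Support the RepViT and SwinTransformer-Tiny backbones.\n    4. Optimize the self-developed modules (EMSC,EMSCP) with OREPA.\n    5. Update the usage tutorial and the Baidu Cloud videos.\n\n- **20230916-yolov8-v1.19**\n    1. Remove OREPA_1x1; this structure prevents the model from converging or produces NaN.\n    2. Add yolov8-fasternet-bifpn and yolov5-fasternet-bifpn.\n    3. Update the usage tutorial and the Baidu Cloud videos. (Updated the OREPA video and added a video on how to read the code structure, using C2f-Faster-EMA as an example.)\n\n- **20230919-yolov8-v1.19.1**\n    1. Fix the abnormal accuracy of C2f-ODConv after 20 epochs.\n    2. Fix the padding issue in the BAM attention mechanism.\n    3. Fix the issue that EfficientAttention (the attention in CloFormer) could not be added in the config file.\n    4. Remove C2f-EMSP-OREPA,C2f-EMSCP-OREPA,C3-EMSP-OREPA,C3-EMSCP-OREPA; these are unstable and prone to NaN.\n    5. Add to the group announcement a link to the Baidu Cloud video that must be watched before use.\n\n- **20230924-yolov8-v1.20**\n    1. Add the self-developed attention mechanism MPCA (based on the CVPR2021 CA attention mechanism); see the Baidu Cloud videos.\n    2. Use the self-developed MPCA attention to strengthen offset and mask generation in DCNV2; see the Baidu Cloud videos and the usage tutorial.\n    3. Change the pretrained-weights parameter in the timm config files to False, so pretrained weights are not downloaded or used by default.\n    4. Improve the feature fusion module with Gather-and-Distribute from Huawei's 2023 GOLD-YOLO.\n\n- **20230927-yolov8-v1.21**\n    1. Improve the C2f and C3 modules with MSBlock from YOLO-MS; see the usage tutorial.\n    2. Improve the C2f and C3 modules with Light-weight Context Guided from GCNet; see the usage tutorial.\n    3. Replace the downsampling module in YOLO with Light-weight Context Guided Down from GCNet; see the usage tutorial.\n\n- **20231010-yolov8-v1.22**\n    1. Sync RepViT with the official source code.\n    2. Experiments show that using C2f-MSBlock and C3-MSBlock throughout the network is unstable, so the Neck still uses C2f or C3; see the corresponding config files.\n    3. Support the deformableLKA attention mechanism and use it to improve C2f and C3, yielding C2f_DLKA and C3_DLKA.\n    4. Improve the yolov8 Neck with RepGFPN from DAMO-YOLO.\n    5. Improve the yolov8 Neck with EfficientRepBiPAN from YOLOV6.\n    6. Add support for downsampling with SPDConv.\n    7. 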
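Improve C2f and C3 with MBConv from Efficientnet and EffectiveSE.\n\n- **20231020-yolov8-v1.23**\n    1. Update the usage tutorial and the Baidu Cloud videos. (Updated the DAttention usage video.)\n    2. Add LSKA, SegNext_Attention, DAttention(Vision Transformer with Deformable Attention CVPR2022).\n    3. Improve SPPF with LSKA to strengthen multi-scale feature extraction.\n    4. Improve C2f and C3 with [Vision Transformer with Deformable Attention(CVPR2022)].\n\n- **20231107-yolov8-v1.24**\n    1. Add the CVPR2022-CSwinTransformer backbone.\n    2. Add yolov5-AIFI.yaml and yolov8-AIFI.yaml.\n    3. Add improvements to C3 and C2f using the position-aware circular convolution from ParC-Net.\n    4. Add the Dilation-wise Residual(DWR) module from DWRSeg, which strengthens feature extraction from the expandable receptive field in the network's high layers.(yolov5-C3-DWR.yaml,yolov8-C2f-DWR.yaml)\n    5. Sync all current improvements to ultralytics-8.0.202.\n    6. Update the Baidu Cloud link videos for the new version.\n    7. Add heatmap and FPS scripts.\n\n- **20231114-yolov8-v1.25**\n    1. Add EIou and SIou.\n    2. Add Inner-IoU,Inner-GIoU,Inner-DIoU,Inner-CIoU,Inner-EIoU,Inner-SIoU.\n    3. Combine this year's latest MPDIoU with Inner-IoU to obtain Inner-MPDIoU.\n    4. Add improvements to C3 and C2f using FocusedLinearAttention from [FLatten Transformer(ICCV2023)](https://github.com/LeapLabTHU/FLatten-Transformer).\n    5. Update how the get_FPS script imports the model to avoid some device errors.\n    6. Update the Baidu Cloud link videos: 20231114 release notes.\n\n- **20231114-yolov8-v1.26**\n    1. Fix the mpdiou_hw parameter in MPDIOU.\n    2. Update the usage tutorial.\n\n- **20231129-yolov8-v1.27**\n    1. Add Mixed Local Channel Attention to improve C2f and C3.\n    2. Add AKConv to improve C2f and C3.\n    3. Update the usage tutorial.\n    4. Update the Baidu Cloud link videos: 20231129 release notes.\n\n- **20231207-yolov8-v1.28**\n    1. Add support for UniRepLKNet, the upgraded version of the large-kernel CNN architecture RepLKNet, new in 2023.\n    2. Add improvements to C3 and C2f using [UniRepLKNetBlock, DilatedReparamBlock] from UniRepLKNet.\n    3. Apply secondary innovation to the Dilation-wise Residual(DWR) module from DWRSeg using DilatedReparamBlock from UniRepLKNet, then use it to improve C3 and C2f.\n    4. Fix get_FPS.py not fusing the model before measuring speed.\n    5. Update the usage tutorial.\n    6. Update the Baidu Cloud link videos: 20231207 release notes.\n\n- **20231217-yolov8-v1.29**\n    1. Add Attentional Scale Sequence Fusion from ASF-YOLO, and on top of it add a P2 detection layer and optimize the network structure.\n    2. Add CSP Efficient Dual Layer Aggregation Networks built with DualConv.\n    3. Update the usage tutorial.\n    4. Update the Baidu Cloud link videos: 20231217 release notes.\n\n- **20231227-yolov8-v1.30**\n    1. Add support for the TransNeXt backbone and the focus-aware attention mechanism from TransNeXt.\n    2. 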
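Add the Semantics and Detail Infusion Module from U-NetV2, applying secondary innovation to the feature fusion parts of BIFPN and PAFPN respectively.\n    3. Update the usage tutorial.\n    4. Update the Baidu Cloud link videos: 20231227 release notes.\n\n- **20240104-yolov8-v1.31**\n    1. Add Shape-IoU and Inner-Shape-IoU.\n    2. Update the usage tutorial.\n    3. Update the Baidu Cloud link videos: 20230104 release notes.\n\n- **20240111-yolov8-v1.32**\n    1. Support FocalLoss,VarifocalLoss,QualityfocalLoss.\n    2. Support the Wise-IoU(v1,v2,v3) series (IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU).\n    3. Support the Inner-Wise-IoU(v1,v2,v3) series (IoU,WIoU,EIoU,GIoU,DIoU,CIoU,SIoU,MPDIoU,ShapeIoU).\n    4. Update the usage tutorial.\n    5. Update the Baidu Cloud link videos: 20230111 release notes.\n\n- **20240116-yolov8-v1.33**\n    1. Combine Attentional Scale Sequence Fusion from ASF-YOLO with Gather-and-Distribute from GOLD-YOLO as a secondary innovation.\n    2. Support the latest DCNV4,C2f-DCNV4,C3-DCNV4, and apply secondary innovation to DyHead with DCNV4 (DyHead_DCNV4).\n    3. Fix the resume-training bug that occurred when wise was not used.\n    4. Update the usage tutorial.\n    5. Update the Baidu Cloud link videos: 20230116 release notes.\n\n- **20240122-yolov8-v1.34**\n    1. Improve the Neck of YOLOV5 and YOLOV8 with HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR).\n    2. Apply secondary innovation to HS-FPN from [MFDS-DETR](https://github.com/JustlfC03/MFDS-DETR) to obtain HSPAN, and use it to improve the Neck of YOLOV5 and YOLOV8.\n    3. Add the lightweight upsampling operator CARAFE.\n    4. Add the dynamic upsampling operator DySample(ICCV2023).\n    5. Add the Haar wavelet downsampling operator.\n    6. Support soft-nms.(IoU,GIoU,DIoU,CIoU,EIoU,SIoU,ShapeIoU)\n    7. Update the usage tutorial.\n    8. Update the Baidu Cloud link videos: 20230122 release notes.\n\n- **20240203-yolov8-v1.35**\n    1. Add Focaler-IoU(IoU,GIoU,DIoU,CIoU,EIoU,SIoU,WIoU,MPDIoU,ShapeIoU).\n    2. Add a secondary-innovation combination of RepGFPN and DySample.\n    3. Add a secondary-innovation combination of ASSF from ASF-YOLO and DySample.\n    4. Add a secondary-innovation combination of HS-PAN and DySample.\n    5. Improve the Head with the occlusion-aware attention SEAM and MultiSEAM, obtaining the occlusion-aware SEAMHead and MultiSEAMHead.\n    6. Optimize plot_result.py: use linear interpolation to fill inf or nan data, reducing the chance of garbled output.\n    7. Update the usage tutorial.\n    8. Update the Baidu Cloud link videos: 20230203 release notes.\n\n- **20240208-yolov8-v1.36**\n    1. Sync all improvement code to 8.1.9.\n\n- **20240216-yolov8-v1.37**\n    1. Add the iRMB module from the EMO model, and apply secondary innovation to it with CascadedAttention from (EfficientViT-CVPR2023) to obtain iRMB_Cascaded.\n    2. Add Shift-ConvNets related improvements.(rtdetr-SWC.yaml,rtdetr-R50-SWC.yaml,yolov8-detr-C2f-SWC.yaml,yolov5-detr-C3-SWC.yaml)\n    3. 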
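Apply secondary innovation to iRMB from EMO using DilatedReparamBlock from UniRepLKNet.\n    4. Apply secondary innovation to iRMB from EMO using the shift-operation convolution from Shift-ConvNets.\n    5. Fix some known issues.\n    6. Update the usage tutorial.\n    7. Add the 20240216 update notes to the Baidu Cloud videos.\n\n- **20240219-yolov8-v1.38**\n    1. Improve C2f with the latest Mamba architecture (a new architecture claimed to surpass Transformer); two improvement variants are provided.\n    2. Add the Powerful-IoU,Powerful-IoUV2,Inner-Powerful-IoU,Inner-Powerful-IoUV2,Focaler-Powerful-IoU,Focaler-Powerful-IoUV2,Wise-Powerful-IoU(v1,v2,v3),Wise-Powerful-IoUV2(v1,v2,v3) series.\n    3. Fix some known issues.\n    4. Update the usage tutorial.\n    5. Add the 20240219 update notes to the Baidu Cloud videos.\n\n- **20240222-yolov8-v1.39**\n    1. Add the RepNCSPELAN module from YOLOV9.\n    2. Apply secondary innovation to the RepNCSPELAN module from YOLOV9 using DBB,OREPA,DilatedReparamBlock.\n    3. Update the usage tutorial.\n    4. Add the 20240222 update notes to the Baidu Cloud videos.\n\n- **20240229-yolov8-v1.40**\n    1. Add the ADown downsampling module from YOLOV9.\n    2. Add the downsampling module from YOLOV7.\n    3. Add programmable gradient information from YOLOV9; the PGI module can be removed after training.\n    4. Update the usage tutorial.\n    5. Add the 20240229 update notes to the Baidu Cloud videos.\n\n- **20240303-yolov8-v1.41**\n    1. Add GhostModule and DynamicConv from CVPR2024-parameternet.\n    2. Apply secondary innovation to HGBlock from CVPR2024-RTDETR using DynamicConv from CVPR2024-parameternet.\n    3. Update the usage tutorial.\n    4. Add the 20240303 update notes to the Baidu Cloud videos.\n\n- **20240309-yolov8-v1.42**\n    1. Split out the block from CVPR2024 RepVIT and propose C2f-RVB and C2f-RVB-EMA.\n    2. Add the Dynamic Group Convolution Shuffle Transformer from the Lightweight Object Detection paper.\n    3. Add the self-developed Lightweight Shared Convolutional Detection Head, supporting Detect, Seg, Pose, and Obb.\n    4. Update the usage tutorial.\n    5. Add the 20240309 update notes to the Baidu Cloud videos.\n\n- **20240314-yolov8-v1.43**\n    1. Add the self-developed Task Align Dynamic Detection Head, supporting Detect, Seg, Pose, and Obb.\n    2. Update the usage tutorial and add answers to several common questions.\n    3. Fix shapeiou having no effect when enabled.\n    4. Add the 20240314 update notes to the Baidu Cloud videos.\n\n- **20240323-yolov8-v1.44**\n    1. Add the CVPR2024-RMT backbone, and support improving C3 and C2f with RetBlock.\n    2. Add Efficient Local Attention, new in 2024, apply it as a secondary innovation to HSFPN, and add the self-developed detection head TADDH.\n    3. Apply secondary innovation to HSFPN using CVPR2021-CoordAttention.\n    4. Update the usage tutorial with answers to more common questions.\n    5. Add the 20240323 update notes to the Baidu Cloud videos.\n\n- **20240330-yolov8-v1.45**\n    1. Add the CVPR2024 PKINet backbone.\n    2. Add PKIModule and the CAA module from CVPR2024 PKINet, and propose C2f-PKI.\n    3. 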
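Improve RepNCSPELAN and HSFPN with Context Anchor Attention from CVPR2024 PKINet.\n    4. Update the usage tutorial.\n    5. Add the 20240330 update notes to the Baidu Cloud videos.\n\n- **20240406-yolov8-v1.46**\n    1. Add CVPR2024 Frequency-Adaptive Dilated Convolution.\n    2. Add the self-developed Focusing Diffusion Pyramid Network.\n    3. Update the usage tutorial.\n    4. Add the 20240406 update notes to the Baidu Cloud videos.\n\n- **20240408-yolov8-v1.47**\n    1. Fix a minor bug in the self-developed Focusing Diffusion Pyramid Network.\n    2. Add the config file yolov8-FDPN-TADDH.yaml, which combines the self-developed Focusing Diffusion Pyramid Network with the self-developed Task Align Dynamic Detection Head.\n    3. Add improvements to C2f using the Parallelized Patch-Aware Attention Module from HCFNet, which targets small-object segmentation.\n    4. Further innovate on the self-developed Focusing Diffusion Pyramid Network using the Dimension-Aware Selective Integration Module from HCFNet, which targets small-object segmentation.\n    5. Update the usage tutorial.\n    6. Add the 20240408 update notes to the Baidu Cloud videos.\n\n- **20240414-yolov8-v1.48**\n    1. Add Cross-Scale Multi-Head Self-Attention, a secondary innovation on Multi-Head Self-Attention.\n    2. Update the usage tutorial.\n    3. Add the 20240414 update notes to the Baidu Cloud videos.\n\n- **20240420-yolov8-v1.49**\n    1. Add the downsampling module from A Robust Feature Downsampling Module for Remote Sensing Visual Tasks.\n    2. Add Context and Spatial Feature Calibration from Context and Spatial Feature Calibration for Real-Time Semantic Segmentation.\n    3. Update the usage tutorial.\n    4. Add the 20240420 update notes to the Baidu Cloud videos.\n\n- **20240428-yolov8-v1.50**\n    1. Fix the wrong index of Context and Spatial Feature Calibration in the 20240420 update.\n    2. Add support for mobilenetv4-backbone.\n    3. Add support for improving yolov8-neck with content-guided attention fusion.\n    4. Add support for a secondary improvement of CGAFusion using CAFM, yielding CAFMFusion to improve yolov8-neck.\n    5. Update the usage tutorial.\n    6. Add the 20240428 update notes to the Baidu Cloud videos.\n\n- **20240501-yolov8-v1.51**\n    1. The get_FPS.py script can now measure inference speed from a yaml.\n    2. Add the self-developed RGCSPELAN, which has fewer parameters, lower computation, and faster inference than C3, ELAN, C2f, and RepNCSPELAN.\n    3. Update the usage tutorial.\n    4. Add the 20240501 update notes to the Baidu Cloud videos.\n\n- **20240505-yolov8-v1.52**\n    1. Add LADH (Lightweight Asymmetric Detection Head).\n    2. Apply secondary innovation to CVPR2023-FasterBlock using Convolutional GLU from CVPR2024-TransNext.\n    3. Update the usage tutorial.\n    4. Add the 20240505 update notes to the Baidu Cloud videos.\n\n- **20240512-yolov8-v1.53**\n    1. Further improve the self-developed lightweight detection head LSCD to obtain LSCSBD.\n    2. Add improvements to yolov8-neck using the superficial detail fusion module and the profound semantic fusion module from PSFusion.\n    3. Update the usage tutorial.\n    4. 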
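Add the 20240512 update notes to the Baidu Cloud videos.\n\n- **20240513-yolov8-v1.54**\n    1. Support CVPR2024-StarNet, a new-generation SOTA lightweight model.\n    2. Innovate on C2f with CVPR2024-StarNet to obtain C2f-Star.\n    3. Combine CVPR2024-StarNet with CVPR2024-PKINet to obtain C2f-Star-CAA.\n    4. Add a lightweight combined config file that fuses StarNet, C2f-Star, and LSCD.\n    5. Update the usage tutorial.\n    6. Add the 20240513 update notes to the Baidu Cloud videos.\n\n- **20240523-yolov8-v1.55**\n    1. KAN In! Mamba Out! Integrate pytorch-kan-conv, supporting multiple KAN variants!\n    2. Sync the latest DCNV4-CVPR2024 code.\n    3. Fix AIFI raising errors in some combinations.\n    4. Update the usage tutorial.\n    5. Add the 20240523 update notes to the Baidu Cloud videos.\n\n- **20240526-yolov8-v1.56**\n    1. Support YOLOV8-NMSFree: following the idea of yolov10, train with dual label assignment and a consistent matching metric, so post-processing needs no NMS!\n    2. Add the self-developed edge information enhancement modules EIEStem and EIEM.\n    3. Update the usage tutorial.\n    4. Add the 20240526 update notes to the Baidu Cloud videos.\n\n- **20240601-yolov8-v1.57**\n    1. Add the self-developed ContextGuideFPN.\n    2. Add detail-enhanced convolution to improve c2f.\n    3. Add the self-developed LSDECD, which introduces re-parameterizable detail-enhanced convolution on top of LSCD.\n    4. Add the self-developed SMPCGLU, whose modules come from CVPR2023 and CVPR2024 respectively.\n    5. Update the usage tutorial.\n    6. Add the 20240601 update notes to the Baidu Cloud videos.\n\n- **20240609-yolov8-v1.58**\n    1. Add support for vHeatBlock from vHeat, a visual representation model inspired by physical heat conduction.\n    2. Add the self-developed re-calibration feature pyramid network (Re-CalibrationFPN), released in several versions (P2345,P345,P3456).\n    3. Update the usage tutorial.\n    4. Add the 20240609 update notes to the Baidu Cloud videos.\n\n- **20240613-yolov8-v1.59**\n    1. Add WaveletPool to improve upsampling and downsampling.\n    2. Add the self-developed Cross Stage Partial - Partially Transformer Block module.\n    3. Update the usage tutorial.\n    4. Add the 20240613 update notes to the Baidu Cloud videos.\n\n- **20240619-yolov8-v1.60**\n    1. Integrate mamba-yolo.\n    2. Add GLSA to improve yolov8-neck.\n    3. Add a secondary innovation on BIFPN using GLSA.\n    4. Update the usage tutorial.\n    5. Add the 20240619 update notes to the Baidu Cloud videos.\n\n- **20240627-yolov8-v1.61**\n    1. Add ChannelTransformer from UCTransNet to improve yolov8-neck.\n    2. Add the self-developed SmallObjectEnhancePyramid.\n    3. Update the usage tutorial.\n    4. Add the 20240627 update notes to the Baidu Cloud videos.\n\n- **20240707-yolov8-v1.62**\n    1. Update the usage tutorial and add common questions.  \n\n- **20240713-ultralytics-v1.63**\n    1. The ultralytics version has been updated to 8.2.50; improvements for YOLOv8 and YOLOv10 will follow.\n    2. Add YOLOV10 improvements; the V10 config files will be updated step by step. (So far the backbone series and some self-developed improvements have been ported to v10.)\n    3. Update the usage tutorial.\n    4. Add the 20240713 update notes to the Baidu Cloud videos.\n    5. Update the Baidu Cloud videos (resume-training tutorial, COCO metrics tutorial, plot_result.py tutorial, the must-watch project usage tutorial series, YOLOV10 version switching tutorial part 1).\n    6. 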
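Add structure diagrams for EMSC and EMSCP.\n\n- **20240720-ultralytics-v1.64**\n    1. Fix some known issues.\n    2. Add the self-developed Context-Guided Spatial Feature Reconstruction Feature Pyramid Network.\n    3. Add WTConv from Wavelet Convolutions for Large Receptive Fields to improve C2f.\n    4. Add Adaptive Fine-Grained Channel Attention from UBRFC-Net.\n    5. Update the usage tutorial.\n    6. Add the 20240720 update notes to the Baidu Cloud videos.\n    7. Add several v10 improvements, mainly the upsampling and downsampling series.\n\n- **20240729-ultralytics-v1.65**\n    1. Add the self-developed FeaturePyramidSharedConv.\n    2. Add the Feature Modulation block from ECCV2024-SMFANet.\n    3. Add several v10 improvements.\n    4. Update the usage tutorial.\n    5. Add the 20240729 update notes to the Baidu Cloud videos.\n\n- **20240803-ultralytics-v1.66**\n    1. Add LDConv.\n    2. Add gConv from Rethinking Performance Gains in Image Dehazing Networks.\n    3. Add MAFPN from MAF-YOLO, and apply secondary innovation to MAFPN using the BIFPN idea to obtain BIMAFPN.\n    4. Update the usage tutorial.\n    5. Add the 20240803 update notes to the Baidu Cloud videos.\n\n- **20240813-ultralytics-v1.67**\n    1. Add the APT-TAL label assignment strategy.\n    2. Add the WDBB and DeepDBB re-parameterization modules from YOLO-MIF.\n    3. Add RepBN from SLAB to improve AIFI.\n    4. Update the usage tutorial.\n    5. Add the 20240813 update notes to the Baidu Cloud videos.\n\n- **20240822-ultralytics-v1.68**\n    1. Add AdditiveBlock from CAS-ViT.\n    2. Add a secondary innovation on the AdditiveBlock from CAS-ViT using Convolutional GLU from TransNeXt.\n    3. Add the self-developed Efficient Multi-Branch&Scale FPN.\n    4. Add several v10 improvements.\n    5. Update the usage tutorial.\n    6. Add the 20240822 update notes to the Baidu Cloud videos.\n\n- **20240831-ultralytics-v1.69**\n    1. Add a secondary-innovation module combining CMTFUnet and TransNext.\n    2. Add the self-developed CSP-Partial Multi-Scale Feature Aggregation.\n    3. Update the usage tutorial.\n    4. Add the 20240831 update notes to the Baidu Cloud videos.\n\n- **20240908-ultralytics-v1.70**\n    1. Add CFPT from Cross-Layer Feature Pyramid Transformer for Small Object Detection in Aerial Images.\n    2. Add MogaBlock from ICLR2024.\n    3. Add several v10 improvements.\n    4. Update the usage tutorial.\n    5. Add the 20240908 update notes to the Baidu Cloud videos.\n\n- **20240920-ultralytics-v1.71**\n    1. Add SHSABlock from CVPR2024-SHViT and its secondary innovation.\n    2. Add SMAFormerBlock from BIBM2024-SMAFormer and its secondary innovation.\n    3. Add FreqFusion from TPAMI2024-FreqFusion to improve the Neck.\n    4. Add several v10 improvements.\n    5. Update the usage tutorial.\n    6. Add the 20240920 update notes to the Baidu Cloud videos.\n\n- **20241007-ultralytics-v1.72**\n    1. Add the self-developed MutilBackBone-DynamicAlignFusion.\n    2. 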
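Add IdentityFormer, RandomMixingFormer, PoolingFormer, ConvFormer, and CaFormer from Metaformer TPAMI2024 to improve C2f.\n    3. Add secondary-innovation modules that combine IdentityFormer, RandomMixingFormer, PoolingFormer, ConvFormer, and CaFormer from Metaformer TPAMI2024 with CVPR2024-TranXNet to improve C2f.\n    4. Update the usage tutorial.\n    5. Add the 20241007 update notes to the Baidu Cloud videos.\n\n- **20241024-ultralytics-v1.73**\n    1. Add several v10 improvements.\n    2. Add the self-developed CSP-MutilScaleEdgeInformationEnhance.\n    3. Add Fused_Fourier_Conv_Mixer from Efficient Frequency-Domain Image Deraining with Contrastive Regularization.\n    4. Update the usage tutorial.\n    5. Add the 20241024 update notes to the Baidu Cloud videos.\n\n- **20241031-ultralytics-v1.74**\n    1. Add the self-developed Rep Shared Convolutional Detection Head for v8 and v10.\n    2. Update the usage tutorial.\n    3. Add the 20241031 update notes to the Baidu Cloud videos.\n\n- **20241109-ultralytics-v1.75**\n    1. Add the self-developed CSP-FreqSpatial.\n    2. Add the block from SFHformer ECCV2024 to improve C2f.\n    3. Add MSM from Revitalizing Convolutional Network for Image Restoration TPAMI2024 to improve C2f.\n    4. Add several v10 improvements.\n    5. Update the usage tutorial.\n    6. Add the 20241109 update notes to the Baidu Cloud videos.\n\n- **20241122-ultralytics-v1.76**\n    1. Further innovate on the self-developed CSP-MutilScaleEdgeInformationEnhance to obtain CSP-MutilScaleEdgeInformationSelect.\n    2. Add the HDRAB and RAB modules from Pattern Recognition 2024|DRANet to improve C2f.\n    3. Add Local feature extraction from ECCV2022-ELAN to improve C2f.\n    4. Add several v10 improvements.\n    5. Update the usage tutorial.\n    6. Add the 20241122 update notes to the Baidu Cloud videos.\n\n- **20241204-ultralytics-v1.77**\n    1. Add the self-developed GlobalEdgeInformationTransfer.\n    2. Add Frequency-aware Cascade Attention from FreqFormer to improve C2f.\n    3. Update the usage tutorial.\n    4. Add the 20241204 update notes to the Baidu Cloud videos.\n\n- **20241219-ultralytics-v1.78**\n    1. Add CAMixer from CAMixerSR to improve C2f.\n    2. Add support for Hyper-YOLO, which can itself be improved with the project's built-in improvements.\n    3. Add improvements based on Hypergraph Computation in Semantic Space and Mixed Aggregation Network from Hyper-YOLO.\n    4. Update the usage tutorial.\n    5. Add the 20241219 update notes to the Baidu Cloud videos.\n\n- **20250101-ultralytics-v1.79**\n    1. Add three secondary-improvement series based on Mixed Aggregation Network from Hyper-YOLO.\n    2. Add improvements to yolo11-neck using the Multi-Scale Adaptive Spatial Attention Gate from MSA^2 Net.\n    3. Add improvements to the self-developed MutilBackbone series using the Multi-Scale Adaptive Spatial Attention Gate from MSA^2 Net.\n    4. Update the usage tutorial.\n    5. 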
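Add the 20250101 update notes to the Baidu Cloud videos.\n\n- **20250119-ultralytics-v1.80**\n    1. Add the high-frequency enhancement residual block from CRAFT-SR.\n    2. Add DTAB from AAAI2025-TBSN.\n    3. Add several modules from ECCV2024-FSEL.\n    4. Add the wavelet-transform feature fusion from ACMMM2024-WFEN.\n    5. Add Pinwheel-shaped Convolution type improvements from AAAI2025 Pinwheel-shaped Convolution and Scale-based Dynamic Loss for Infrared Small Target Detection.\n    6. Add an innovation that combines ContrastDrivenFeatureAggregation from AAAI2025 ConDSeg with the wavelet transform from ACMMM2024 WFEN.\n    7. Update the usage tutorial.\n    8. Add the 20250119 update notes to the Baidu Cloud videos.\n\n- **20250207-ultralytics-v1.81**\n    1. Add StripBlock from Strip R-CNN (remote-sensing object detection) and its secondary innovation.\n    2. Add Frequency-Spatial Attention and Multi-scale Progressive Channel Attention from BIBM2024 Spatial-Frequency Dual Domain Attention Network For Medical Image Segmentation.\n    3. Add KAT from ICLR2025 Kolmogorov-Arnold Transformer and its secondary innovation with FasterBlock.<this module requires compilation>\n    4. Update the usage tutorial.\n    5. Add the 20250207 update notes to the Baidu Cloud videos.\n\n- **20250220-ultralytics-v1.82**\n    1. Add the self-developed module DynamicInceptionDWConv2d.\n    2. Add GlobalFilter and DynamicFilter.\n    3. Update the usage tutorial.\n    4. Add the 20250220 update notes to the Baidu Cloud videos.\n\n- **20250308-ultralytics-v1.83**\n    1. Add the self-developed module Hierarchical Attention Fusion, with multiple ways to use it.\n    2. Add ICLR2025-Token Statistics Transformer to improve PSA.\n    3. Add RepHMS from MHAF-YOLO.<new work by a PhD student in the YOLO chat group>\n    4. Update the usage tutorial.\n    5. Add the 20250308 update notes to the Baidu Cloud videos.\n\n- **20250323-ultralytics-v1.84**\n    1. Add modules from CVPR2025-MambaIR.\n    2. Add modules from CVPR2025-SCSegamba.\n    3. Add modules from CVPR2025-MambaOut.\n    4. Update the usage tutorial.\n    5. Add the 20250323 update notes to the Baidu Cloud videos.\n\n- **20250406-ultralytics-v1.85**\n    1. Add Localization Quality Estimation from CVPR2025-DEIM to improve YOLOHead, so that its classification head provides both a classification score and a predicted-box quality score.\n    2. Add Localization Quality Estimation - Lightweight Shared Convolutional Detection Head.\n    3. Add CVPR2025-EfficientViM and the module obtained from its secondary innovation with CVPR2024-TransNeXt.\n    4. Update the usage tutorial.\n    5. Add the 20250406 update notes to the Baidu Cloud videos.\n\n- **20250426-ultralytics-v1.86**\n    1. Add EUCB upsampling from CVPR2024-EMCAD.\n    2. Add a secondary-innovation module combining CVPR2024-EMCAD and CVPR2025-BHViT.\n    3. Add several modules from CVPR2024-DCMPNet and their secondary-innovation modules.\n    4. Add a script that collects the computation and parameter counts of the config files and sorts them.\n    5. 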
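Update the usage tutorial.\n    6. Add the 20250426 update notes to the Baidu Cloud videos.\n\n- **20250514-ultralytics-v1.87**\n    1. Add LoGStem and LFEModule from LEGNet.\n    2. Add multiple improvements and secondary-innovation improvements based on LSNet and LSConv from CVPR2025-LSNet, a new-generation lightweight SOTA.\n    3. Add several modules from CVPR2025-OverLock.\n    4. Change the weight-saving logic: after training finishes (only when training ends normally, not when it is stopped manually), four models are saved: best.pt, last.pt, best_fp32.pt, and last_fp32.pt. The ones without the fp32 suffix are saved in fp16, but some modules are very sensitive to fp16, which can lead to zero accuracy when running val.py afterwards; in that case, test with the fp32-suffixed weights.\n    5. Update the usage tutorial.\n    6. Add the 20250514 update notes to the Baidu Cloud videos.\n\n- **20250601-ultralytics-v1.88**\n    1. Add TransMamba improvements.\n    2. Add CVPR2025-DarkIR improvements.\n    3. Add CVPR2025-EVSSM improvements.\n    4. Update the usage tutorial.\n    5. Add the 20250601 update notes to the Baidu Cloud videos.\n\n- **20250629-ultralytics-v1.89**\n    1. Add modules from ECCV2024-rethinkingfpn, and innovate again on the original improvement SOEP.\n    2. Add modules from CVPR2024-SFSConv.\n    3. Add modules from CVPR2025-GroupMamba.\n    4. Add modules from CVPR2025-MambaVision.\n    5. Add modules from AAAI2025-FBRTYOLO.\n    6. Update the usage tutorial.\n    7. Add the 20250629 update notes to the Baidu Cloud videos.\n    8. Fix model loading failures on torch2.6.0 and above.\n\n- **20250727-ultralytics-v1.90**\n    1. Add Pyramid Sparse Transformer to improve yolo11-neck.\n    2. Add a further innovation on SOEP using Pyramid Sparse Transformer.\n    3. Add MIA2025-FourierConv.\n    4. Add HS-FPN from AAAI2025.\n    5. Add modules from TGRS2025-UMFormer.\n    6. Update the usage tutorial.\n    7. Add the 20250727 update notes to the Baidu Cloud videos.\n\n- **20250822-ultralytics-v1.91**\n    1. Add several improvements from ICCV2025-ESC.\n    2. Add improvements from ICCV2025-UniConvBlock.\n    3. Update the usage tutorial.\n    4. Add the 20250822 update notes to the Baidu Cloud videos.\n\n- **20250919-ultralytics-v1.92**\n    1. Add the CVPR2025-GCConv module.\n    2. Add the AAAI2024-CFBlock module.\n    3. Add the RepStem module from ICCV2023-FastViT.\n    4. Update the usage tutorial.\n    5. Add the 20250919 update notes to the Baidu Cloud videos.\n\n- **20251028-ultralytics-v1.93**\n    1. Add modules from TGRS2025-ASCNet.\n    2. Add the ICCV2025-HFRB module.\n    3. Add modules from ICIP2025-BEVANET.\n    4. Update the usage tutorial.\n    5. Add the 20251028 update notes to the Baidu Cloud videos.\n\n- **20251129-ultralytics-v1.94**\n    1. Add GRSL2025-Gaussian Combined Distance, which can be used in both the bounding-box loss and the label assignment strategy; see LOSS改进系列.md for details.\n    2. Add modules from ACCV2024-PlainUSR.\n    3. Update the usage tutorial.\n    4. Add the 20251129 update notes to the Baidu Cloud videos.\n\n- **20260118-ultralytics-v1.95**\n    1. Add the LCA and IEL modules from CVPR2025-HVI.\n    2. Add the HFFE module from TGRS2025-HAFNet.\n    3. Update the usage tutorial.\n    4. 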
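Add the 20260118 update notes to the Baidu Cloud videos.\n\n- **20260227-ultralytics-v1.96**\n    1. Improve the feature map saving mechanism in detect.py so that it can save each channel's feature map separately as well as the feature map summed over all channels.\n    2. Improve the training output by adding mAP75 to the metrics printed during training."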
  },
  {
    "path": "yolo-improve/yolov9-backbone/yolo.py",
    "content": "def _forward_once(self, x, profile=False, visualize=False):\n        y, dt = [], []  # outputs\n        for m in self.model:\n            if m.f != -1:  # if not from previous layer\n                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers\n            if profile:\n                self._profile_one_layer(m, x, dt)\n            if hasattr(m, 'backbone'):\n                x = m(x)\n                for _ in range(5 - len(x)):\n                    x.insert(0, None)\n                have_silence = False\n                if len(y) == 1:\n                    have_silence = True\n                for i_idx, i in enumerate(x):\n                    if have_silence:\n                        i_idx += 1\n                    if i_idx in self.save:\n                        y.append(i)\n                    else:\n                        y.append(None)\n                x = x[-1]\n            else:\n                x = m(x)  # run\n                y.append(x if m.i in self.save else None)  # save output\n            if visualize:\n                feature_visualization(x, m.type, m.i, save_dir=visualize)\n        return x\n\ndef parse_model(d, ch):  # model_dict, input_channels(3)\n    # Parse a YOLO model.yaml dictionary\n    LOGGER.info(f\"\\n{'':>3}{'from':>18}{'n':>3}{'params':>10}  {'module':<40}{'arguments':<30}\")\n    anchors, nc, gd, gw, act = d['anchors'], d['nc'], d['depth_multiple'], d['width_multiple'], d.get('activation')\n    if act:\n        Conv.default_act = eval(act)  # redefine default activation, i.e. Conv.default_act = nn.SiLU()\n        RepConvN.default_act = eval(act)  # redefine default activation, i.e. 
RepConvN.default_act = nn.SiLU()\n        LOGGER.info(f\"{colorstr('activation:')} {act}\")  # print\n    na = (len(anchors[0]) // 2) if isinstance(anchors, list) else anchors  # number of anchors\n    no = na * (nc + 5)  # number of outputs = anchors * (classes + 5)\n\n    is_backbone = False\n    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out\n    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args\n        try:\n            t = m\n            m = eval(m) if isinstance(m, str) else m  # eval strings\n        except Exception:  # unresolved names (e.g. timm model names) stay as strings\n            pass\n        for j, a in enumerate(args):\n            with contextlib.suppress(NameError):\n                args[j] = eval(a) if isinstance(a, str) else a  # eval strings\n\n        n = n_ = max(round(n * gd), 1) if n > 1 else n  # depth gain\n        if m in {\n            Conv, AConv, ConvTranspose, \n            Bottleneck, SPP, SPPF, DWConv, BottleneckCSP, nn.ConvTranspose2d, DWConvTranspose2d, SPPCSPC, ADown,\n            RepNCSPELAN4, SPPELAN}:\n            c1, c2 = ch[f], args[0]\n            if c2 != no:  # if not output\n                c2 = make_divisible(c2 * gw, 8)\n\n            args = [c1, c2, *args[1:]]\n            if m in {BottleneckCSP, SPPCSPC}:\n                args.insert(2, n)  # number of repeats\n                n = 1\n        elif m is nn.BatchNorm2d:\n            args = [ch[f]]\n        elif m is Concat:\n            c2 = sum(ch[x] for x in f)\n        elif m is Shortcut:\n            c2 = ch[f[0]]\n        elif m is ReOrg:\n            c2 = ch[f] * 4\n        elif m is CBLinear:\n            c2 = args[0]\n            c1 = ch[f]\n            args = [c1, c2, *args[1:]]\n        elif m is CBFuse:\n            c2 = ch[f[-1]]\n        # TODO: channel, gw, gd\n        elif m in {Detect, DualDetect, TripleDetect, DDetect, DualDDetect, TripleDDetect, Segment, DSegment, DualDSegment, Panoptic}:\n            args.append([ch[x] for x in f])\n            # if 
isinstance(args[1], int):  # number of anchors\n            #     args[1] = [list(range(args[1] * 2))] * len(f)\n            if m in {Segment, DSegment, DualDSegment, Panoptic}:\n                args[2] = make_divisible(args[2] * gw, 8)\n        elif m is Contract:\n            c2 = ch[f] * args[0] ** 2\n        elif m is Expand:\n            c2 = ch[f] // args[0] ** 2\n        elif isinstance(m, str):\n            t = m\n            m = timm.create_model(m, pretrained=args[0], features_only=True)\n            c2 = m.feature_info.channels()\n        # elif m in {}:\n        #     m = m(*args)\n        #     c2 = m.channel\n        else:\n            c2 = ch[f]\n\n        if isinstance(c2, list) and m not in {CBLinear, }:\n            is_backbone = True\n            m_ = m\n            m_.backbone = True\n        else:\n            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module\n            t = str(m)[8:-2].replace('__main__.', '')  # module type\n        np = sum(x.numel() for x in m_.parameters())  # number params\n        m_.i, m_.f, m_.type, m_.np = i + 4 if is_backbone else i, f, t, np  # attach index, 'from' index, type, number params\n        LOGGER.info(f'{i:>3}{str(f):>18}{n_:>3}{np:10.0f}  {t:<40}{str(args):<30}')  # print\n        save.extend(x % (i + 4 if is_backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist\n        layers.append(m_)\n        if i == 0:\n            ch = []\n        if isinstance(c2, list) and m not in {CBLinear, }:\n            for _ in range(5 - len(c2)):\n                c2.insert(0, 0)\n            ch.extend(c2)\n        else:\n            ch.append(c2)\n    return nn.Sequential(*layers), sorted(save)"
  },
  {
    "path": "yolo-improve/yolov9-backbone/yolov9-c-custom.yaml",
    "content": "# YOLOv9\n\n# parameters\nnc: 80  # number of classes\ndepth_multiple: 1.0  # model depth multiple\nwidth_multiple: 1.0  # layer channel multiple\n#activation: nn.LeakyReLU(0.1)\n#activation: nn.ReLU()\n\n# anchors\nanchors: 3\n\n# 1-P1/2\n# 2-P2/4\n# 3-P3/8\n# 4-P4/16\n# 5-P5/32\n\n# YOLOv9 backbone\nbackbone:\n  [\n   [-1, 1, Silence, []], # 0\n   [-1, 1, mobilenetv2_035, [False]] # 5\n  ]\n\n# YOLOv9 head\nhead:\n  [\n   # elan-spp block\n   [-1, 1, SPPELAN, [512, 256]],  # 6\n\n   # up-concat merge\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 7\n   [[-1, 4], 1, Concat, [1]],  # cat backbone P4 8\n\n   # elan-2 block\n   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 9\n\n   # up-concat merge\n   [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 10\n   [[-1, 3], 1, Concat, [1]],  # cat backbone P3 11\n\n   # elan-2 block\n   [-1, 1, RepNCSPELAN4, [256, 256, 128, 1]],  # 12 (P3/8-small)\n\n   # avg-conv-down merge\n   [-1, 1, ADown, [256]],  # 13\n   [[-1, 9], 1, Concat, [1]],  # cat head P4 14\n\n   # elan-2 block\n   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 15 (P4/16-medium)\n\n   # avg-conv-down merge\n   [-1, 1, ADown, [512]],  # 16\n   [[-1, 6], 1, Concat, [1]],  # cat head P5 17\n\n   # elan-2 block\n   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 18 (P5/32-large)\n   \n   \n   # multi-level reversible auxiliary branch\n   \n   # routing\n   [3, 1, CBLinear, [[256]]], # 19\n   [4, 1, CBLinear, [[256, 512]]], # 20\n   [5, 1, CBLinear, [[256, 512, 512]]], # 21\n   \n   # conv down\n   [0, 1, Conv, [64, 3, 2]],  # 22-P1/2\n\n   # conv down\n   [-1, 1, Conv, [128, 3, 2]],  # 23-P2/4\n\n   # elan-1 block\n   [-1, 1, RepNCSPELAN4, [256, 128, 64, 1]],  # 24\n\n   # avg-conv down fuse\n   [-1, 1, ADown, [256]],  # 25-P3/8\n   [[19, 20, 21, -1], 1, CBFuse, [[0, 0, 0]]], # 26\n\n   # elan-2 block\n   [-1, 1, RepNCSPELAN4, [512, 256, 128, 1]],  # 27\n\n   # avg-conv down fuse\n   [-1, 1, ADown, [512]],  # 28-P4/16\n   [[20, 21, -1], 1, CBFuse, 
[[1, 1]]], # 29\n\n   # elan-2 block\n   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 30\n\n   # avg-conv down fuse\n   [-1, 1, ADown, [512]],  # 31-P5/32\n   [[21, -1], 1, CBFuse, [[2]]], # 32\n\n   # elan-2 block\n   [-1, 1, RepNCSPELAN4, [512, 512, 256, 1]],  # 33\n   \n   # detect\n   [[27, 30, 33, 12, 15, 18], 1, DualDDetect, [nc]],  # DualDDetect(A3, A4, A5, P3, P4, P5)\n  ]\n"
  }
]