Repository: dmarx/video-killed-the-radio-star Branch: main Commit: 7d1053356aa8 Files: 14 Total size: 270.1 KB Directory structure: gitextract_j7_i1yfm/ ├── .github/ │ └── workflows/ │ └── python-publish.yml ├── .gitignore ├── LICENSE ├── README.md ├── VERSION ├── Video_Killed_The_Radio_Star_Defusion.ipynb ├── pyproject.toml └── vktrs/ ├── __init__.py ├── api.py ├── asr.py ├── hf.py ├── tsp.py ├── utils.py └── youtube.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .github/workflows/python-publish.yml ================================================ # This workflow will upload a Python Package using Twine when a release is created # For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries # This workflow uses actions that are not certified by GitHub. # They are provided by a third-party and are governed by # separate terms of service, privacy policy, and support # documentation. name: Upload Python Package on: release: types: [published] permissions: contents: read jobs: deploy: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 - name: Set up Python uses: actions/setup-python@v3 with: python-version: '3.x' - name: Install dependencies run: | python -m pip install --upgrade pip pip install build - name: Build package run: python -m build - name: Publish package uses: pypa/gh-action-pypi-publish@27b31702a0e7fc50959f5ad993c78deac1bdfc29 with: user: __token__ password: ${{ secrets.PYPI_TOKEN }} ================================================ FILE: .gitignore ================================================ _venv *.srv2 *.vtt *.webm *.mp3 *.mp4 *.pyc *.egg-info/ **/frames **/archive *.yaml *.whl *.tar.gz dist # local huggingface model directories and files feature_extractor safety_checker scheduler text_encoder tokenizer unet vae model_index.json ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2022 David Marx Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
================================================
FILE: README.md
================================================
# Video Killed The Radio Star

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dmarx/video-killed-the-radio-star/blob/main/Video_Killed_The_Radio_Star_Defusion.ipynb)

## Requirements

* ffmpeg - https://ffmpeg.org/
* pytorch - https://pytorch.org/get-started/locally/
* vktrs - (this repo) - `pip install vktrs[api]`
* stability_sdk api token - https://beta.dreamstudio.ai/ > circular icon in top right > membership > API Key
* whisper - `pip install git+https://github.com/openai/whisper`

## FAQ

**What is this?**

TLDR: Automated music video maker, given an mp3 or a youtube URL.

**How does this animation technique work?**

For each text prompt you provide, the notebook will...

1. Generate an image based on that text prompt (using stable diffusion)
2. Use the generated image as the `init_image` to recombine with the text prompt to generate variations similar to the first image. This produces a sequence of extremely similar images based on the original text prompt
3. Images are then intelligently reordered to find the smoothest animation sequence of those frames
4. This image sequence is then repeated to pad out the animation duration as needed

(Rough code sketches of these steps appear under "Process sketches" below.)

The technique demonstrated in this notebook was inspired by a [video](https://www.youtube.com/watch?v=WJaxFbdjm8c) created by Ben Gillin.

**How are lyrics transcribed?**

This notebook uses OpenAI's recently released 'whisper' model for performing automatic speech recognition. OpenAI was kind enough to offer several different sizes of this model, each of which has its own pros and cons. This notebook uses the largest whisper model for transcribing the actual lyrics. Additionally, we use the smallest model for performing the lyric segmentation. Neither of these models is perfect, but the results so far seem pretty decent. (A sketch of this two-model setup also appears under "Process sketches" below.)

The first draft of this notebook relied on subtitles from youtube videos to determine timing, which was then aligned with user-provided lyrics. Youtube's automated captions are powerful and I'll update the notebook shortly to leverage those again, but for the time being we're just using whisper for everything and not referencing user-provided captions at all.

**Something didn't work quite right in the transcription process. How do I fix the timing or the actual lyrics?**

The notebook is divided into several steps. Between each step, a "storyboard" file is updated. If you want to make modifications, you can edit this file directly and those edits should be reflected when you next load the file. Depending on what you changed and what step you run next, your changes may be ignored or even overwritten. Still playing with different solutions here.

**Can I provide my own images to 'bring to life' and associate with certain lyrics/sequences?**

Yes, you can! As described above: you just need to modify the storyboard. Will describe this functionality in greater detail after the implementation stabilizes a bit more.

**This gave me an idea and I'd like to use just a part of your process here. What's the best way to reuse just some of the machinery you've developed here?**

Most of the functionality in this notebook has been offloaded to a library I published to pypi called `vktrs`. I strongly encourage you to import anything you need from there rather than cutting and pasting functions into a notebook. Similarly, if you have ideas for improvements, please don't hesitate to submit a PR!
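## Process sketches

The snippets below are rough, hedged sketches of the steps described in the FAQ above; they are not the notebook's exact code. First, the "variations" loop: generate an anchor image from the prompt, then repeatedly recombine that anchor with the same prompt via img2img to produce near-duplicate frames. The model id, `strength`, and frame count here are illustrative assumptions, and the img2img parameter names assume a reasonably recent `diffusers` release.

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

model_id = "runwayml/stable-diffusion-v1-5"  # assumption: any SD checkpoint should work
device = "cuda" if torch.cuda.is_available() else "cpu"

# (in practice you'd share components between the two pipelines to save memory)
txt2img = StableDiffusionPipeline.from_pretrained(model_id).to(device)
img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id).to(device)

prompt = "a lighthouse in a thunderstorm, oil painting"

# step 1: generate the anchor image for this lyric/scene from the prompt alone
anchor = txt2img(prompt).images[0]

# step 2: recombine the anchor with the same prompt at low-ish strength to get
# a sequence of near-duplicate variations of it
frames = [anchor]
for _ in range(8):  # frame count is arbitrary here
    frames.append(img2img(prompt=prompt, image=anchor, strength=0.4).images[0])
```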
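The reordering step (step 3 in the FAQ above) mirrors the notebook's own `tsp_sort`: each frame is flattened into a pixel vector, pairwise cosine distances between frames form a distance matrix, and an exact traveling-salesman solve over that matrix yields the smoothest visual ordering. Note that the exact dynamic-programming solver scales poorly past roughly 15 frames.

```python
import numpy as np
from python_tsp.exact import solve_tsp_dynamic_programming
from scipy.spatial.distance import pdist, squareform

def tsp_sort(frames):
    """Return the frame ordering that minimizes total cosine distance."""
    # frames are assumed to be PIL images of identical dimensions
    frames_m = np.array([np.array(f).ravel() for f in frames])
    dmat = squareform(pdist(frames_m, metric="cosine"))
    permutation, _ = solve_tsp_dynamic_programming(dmat)
    return permutation

# usage, continuing from the variations sketch:
# frames = [frames[i] for i in tsp_sort(frames)]
```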
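Finally, transcription. As described above, a large whisper model transcribes the lyrics while a small one drives segmentation timing. A minimal sketch of that idea follows; the `word_timestamps` flag (available in recent `openai-whisper` releases) and the exact model names are assumptions, not the notebook's literal configuration.

```python
import whisper

lyrics_model = whisper.load_model("large")  # best transcription quality
timing_model = whisper.load_model("tiny")   # fast pass for word-level timing

lyrics = lyrics_model.transcribe("song.mp3")["text"]
segments = timing_model.transcribe("song.mp3", word_timestamps=True)["segments"]
# each segment carries start/end times (and per-word timing) used to cut scenes
```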
## Dev notes

```
!pip install --upgrade setuptools build
!git clone https://github.com/dmarx/video-killed-the-radio-star/
!cd video-killed-the-radio-star; python -m build; python -m pip install -e .[api,hf]
!pip install ipykernel ipywidgets panel prefetch_generator
```

================================================
FILE: VERSION
================================================
0.1.8

================================================
FILE: Video_Killed_The_Radio_Star_Defusion.ipynb
================================================
{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "mgXxoDhMAiti" }, "source": [ "# $ \\text{Video Killed The Radio Star}$ $\\color{red}{...Diffusion}$\n", "\n", "\n", "\n", "\n", "\n", "Notebook by David Marx ([@DigThatData](https://twitter.com/digthatdata))\n", "\n", "Shared under MIT license\n", "\n", "Last updated: 2023-06-15\n", "\n", "Latest stable notebook revision: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dmarx/video-killed-the-radio-star/blob/main/Video_Killed_The_Radio_Star_Defusion.ipynb)\n", "\n", "# Introduction\n", "\n", "VKTRS is a tool for planning and generating complex animations using audio with spoken/sung words as conditioning input. The planning work can be used independently from generating an animation: all of your work is saved to a human-readable and modifiable `storyboard.yaml` file. Parameter names and notation are also compatible with other AI animation tools, such as a1111-deforum.\n", "\n", "The inspiration for this notebook and the \"variations\" animation modes was [this video](https://www.youtube.com/watch?v=WJaxFbdjm8c) created by Ben Gillin.\n", "\n", "# General Workflow\n", "\n", "If you are reading this in colab, open the \"Table of contents\" tab on the left sidebar for a more detailed enumeration of the steps described below.\n", "\n", "1. Pick an audio source, like a youtube video\n", "2. The notebook will download the video if necessary\n", "3. OpenAI's \"whisper\" model is used to transcribe the lyrics\n", "4. The timing of this transcription is used to segment the timeline into a sequence of \"scenes\"\n", "5. The musical structure of the audio is analyzed to group thematically similar sections into \"themes\"\n", "6. Scene/theme specific settings (prompts, camera motion) can then be added\n", "7. The audio can be further processed to isolate signals for driving animation parameters for \"audioreactive\" effects. This includes isolating instruments (demucs) and chaining manipulations (librosa)\n", "8. Generate or specify starting images for each scene\n", "9. Generate the remaining frames to animate each scene\n", "\n", "**NB: the `img2img` animation mode is currently only supported when using the stability ai animation api (DreamStudio) to generate animations**. The audioreactivity features are currently only relevant in img2img mode. If you don't have a DreamStudio account, you can still use this notebook to parameterize an animation, you'll just need to cut-and-paste settings out of the \"export settings\" cell (deforum compatible) or the `storyboard.yaml` file.\n", "\n", "# The \"Storyboard\"\n", "\n", "Start at the top of the notebook and work your way down. Each decision you make contributes information to a file called \"the storyboard\", which persists project state.
The notebook is designed to facilitate working iteratively: once you've made it past a given decision point, you should generally be able to jump back to that point to tweak whatever the decision was and return to where you were.\n", "\n", "The storyboard isn't just a logical abstraction, it's a physical file you can modify directly, and is reasonably human readable.\n", "\n", "Every project you start will create a new storyboard. If you set your project name to one you used previously, the notebook will attempt to find that project and load its storyboard.\n", "\n", "The original motivation behind VKTRS was to be a toy for automating generative music videos. It has evolved quite a bit since, and I now mostly think about it as a tool for building and manipulating these storyboards. The storyboard parameterizes an animation, but the actual animation doesn't need to be generated by this tool.\n", "\n", "\n", "# $\\text{FAQ}$\n", "\n", "**Why the name?**\n", "\n", "The notebook's name is an ironic homage to the first music video played on MTV: [The Buggles - Video Killed The Radio Star](https://www.youtube.com/watch?v=W8r-tXRLazs). Quoting [wikipedia](https://en.wikipedia.org/wiki/Video_Killed_the_Radio_Star):\n", "\n", "> \"The song relates to concerns about, and mixed attitudes toward 20th-century inventions and machines for the media arts.\"\n", "\n", "**Something didn't work quite right in the transcription process. How do I fix the timing or the actual lyrics?**\n", "\n", "The notebook is divided into several steps. Between each step, a \"storyboard\" file is updated. If you want to\n", "make modifications, you can edit this file directly and those edits should be reflected when you next load the\n", "file. Depending on what you changed and what step you run next, your changes may be ignored or even overwritten.\n", "Still playing with different solutions here.\n", "\n", "**Can I provide my own images to 'bring to life' and associate with certain lyrics/sequences?**\n", "\n", "Yes, you can! As described above: you just need to modify the storyboard. Will describe this functionality in\n", "greater detail after the implementation stabilizes a bit more.\n", "\n", "**How can I support your work or work like it?**\n", "\n", "This notebook was made possible thanks to ongoing support from [stability.ai](https://stability.ai/). The best way to support my work is to share it with your friends, [report bugs](https://github.com/dmarx/video-killed-the-radio-star/issues/new), [suggest features](https://github.com/dmarx/video-killed-the-radio-star/discussions) or to donate to open source non-profits :)\n", "\n", "# Examples of content made with VKTRS\n", "\n", "[](https://www.youtube.com/watch?v=dx8LmqalrmU)\n", "\n", "**The Sudden - Fine (Official Music Video)** `variations` animation mode and theme prompts.\n", "\n", "
\n", "\n", " **Hi-Standard - Asian Pride** `img2img` animation mode (via the Stability.AI animation API), with camera movement and simple audioreactivity (vocals stem driving `strength_curve` and `noise_curve`). This was one of my test animations and doesn't do the notebook justice, but you'll get the idea.\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "sM147HP4kAdY" }, "source": [ "## $0.$ Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "ZnTe8clZuZuj", "tags": [] }, "outputs": [], "source": [ "# @title # 📊 Check GPU Status\n", "\n", "import pandas as pd\n", "import subprocess\n", "\n", "def gpu_info():\n", " outv = subprocess.run([\n", " 'nvidia-smi',\n", " # these lines concatenate into a single query string\n", " '--query-gpu='\n", " 'timestamp,'\n", " 'name,'\n", " 'utilization.gpu,'\n", " 'utilization.memory,'\n", " 'memory.used,'\n", " 'memory.free,'\n", " ,\n", " '--format=csv'\n", " ],\n", " stdout=subprocess.PIPE).stdout.decode('utf-8')\n", "\n", " header, rec = outv.split('\\n')[:-1]\n", " return pd.DataFrame({' '.join(k.strip().split('.')).capitalize():v for k,v in zip(header.split(','), rec.split(','))}, index=[0]).T\n", "\n", "gpu_info()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "FIJ-gPjcVXby", "tags": [] }, "outputs": [], "source": [ "#%%capture\n", "\n", "# @title # 🛠️ Setup: Install Dependencies\n", "\n", "# Install dependencies\n", "\n", "try:\n", " import google.colab\n", " local=False\n", "except:\n", " local=True\n", "\n", "# TODO: pin versions\n", "\n", "# local only additional dependencies\n", "if local:\n", " %pip install pandas torch pillow beautifulsoup4 scipy toolz numpy lxml librosa scikit-learn rich\n", "\n", "# dependencies for both colab and local\n", "%pip install yt-dlp python-tsp stability-sdk[anim_ui] diffusers transformers ftfy accelerate omegaconf\n", "%pip install openai-whisper panel huggingface_hub ipywidgets safetensors keyframed demucs parse" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "OrHUOTwdgCfK", "tags": [], "cellView": "form" }, "outputs": [], "source": [ "#%%capture\n", "\n", "# @title # 🛠️ Setup: Imports and Definitions\n", "\n", "# Definitions and imports\n", "\n", "from collections import defaultdict\n", "import copy\n", "import datetime as dt\n", "import gc\n", "import io\n", "from itertools import chain, cycle\n", "import json\n", "import os\n", "from pathlib import Path\n", "import random\n", "import re\n", "import shutil\n", "import string\n", "import subprocess\n", "from subprocess import Popen, PIPE\n", "import time\n", "import warnings\n", "\n", "from bokeh.models.widgets.tables import (\n", " NumberFormatter,\n", " BooleanFormatter,\n", " CheckboxEditor,\n", ")\n", "from diffusers import (\n", " StableDiffusionImg2ImgPipeline,\n", " StableDiffusionPipeline,\n", ")\n", "from IPython.display import display\n", "import keyframed\n", "import keyframed as kf # TODO...\n", "import keyframed.dsl\n", "from keyframed.serialization import from_dict as load_curve\n", "import librosa\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "from omegaconf import OmegaConf, DictConfig\n", "import pandas as pd\n", "import panel as pn\n", "import parse\n", "import PIL\n", "from PIL import Image\n", "from PIL import Image, ImageDraw, ImageFont\n", "from python_tsp.exact import solve_tsp_dynamic_programming\n", "import rich\n", "from safetensors.numpy import save_file as save_safetensors\n", 
"from safetensors.numpy import load_file as load_safetensors\n", "import scipy\n", "from scipy.spatial.distance import pdist, squareform\n", "import sklearn.cluster\n", "import textwrap\n", "from tqdm.autonotebook import tqdm\n", "import torch\n", "from torch import autocast\n", "import whisper\n", "\n", "from stability_sdk.api import Context\n", "from stability_sdk.animation import AnimationArgs, Animator\n", "\n", "from stability_sdk.animation import (\n", " AnimationArgs,\n", " Animator,\n", " AnimationSettings,\n", " BasicSettings,\n", " CoherenceSettings,\n", " ColorSettings,\n", " DepthSettings,\n", " InpaintingSettings,\n", " Rendering3dSettings,\n", " CameraSettings,\n", " VideoInputSettings,\n", " VideoOutputSettings,\n", ")\n", "\n", "try:\n", " import google.colab\n", " local=False\n", "except:\n", " local=True\n", "\n", "\n", "def sanitize_folder_name(fp):\n", " outv = ''\n", " whitelist = string.ascii_letters + string.digits + '-_'\n", " for token in str(fp):\n", " if token not in whitelist:\n", " token = '-'\n", " outv += token\n", " return outv\n", "\n", "# to do: is there a way to check if this is in the env already?\n", "#pn.extension('tabulator')\n", "\n", "\n", "def establish_workspace(\n", " use_stability_api,\n", " mount_gdrive,\n", " application_name=\"VideoKilledTheRadioStar\",\n", " active_project=None,\n", "):\n", " \"\"\"\n", " This function constructs a local file called `config.yaml` that maintains state that will be used elsewhere.\n", " It mostly sets the names of project folders and a handful of settings. The reason for doing things this way\n", " is to facilitate \"resume\" functionality and creating new projects without overwriting previously created assets.\n", "\n", " By convention, when loaded the config.yaml is referred to as the `workspace` object.\n", "\n", " Most project-specific content will be located in a project-specific config -- `storyboard.yaml` -- which should be\n", " located in the folder path given by `workspace.project_root`. By convention, when loaded this is referred to as the\n", " `storyboard` object.\n", "\n", " If everything is set up correctly, you should be able to load the currently configured workspace and storyboard via:\n", "\n", " workspace, storyboard = load_storyboard()\n", " \"\"\"\n", " # yeah... so... 
this shouldn't be necessary....\n", " import os\n", "\n", " # infer if we're on colab or not, since this impacts gdrive mounting\n", " try:\n", " import google.colab\n", " local=False\n", " except:\n", " local=True\n", "\n", " if local:\n", " mount_gdrive=False\n", "\n", " # Infer directory locations\n", " os.environ['XDG_CACHE_HOME'] = os.environ.get(\n", " 'XDG_CACHE_HOME',\n", " str(Path('~/.cache').expanduser())\n", " )\n", " if mount_gdrive:\n", " from google.colab import drive\n", " drive.mount('/content/drive')\n", " Path('/content/drive/MyDrive/AI/models/.cache/').mkdir(parents=True, exist_ok=True)\n", " os.environ['XDG_CACHE_HOME']='/content/drive/MyDrive/AI/models/.cache'\n", "\n", " model_dir_str=str(Path(os.environ['XDG_CACHE_HOME']))\n", " proj_root_str = '${active_project}'\n", " application_root = str(Path('.').absolute())\n", " if mount_gdrive:\n", " application_root = '/content/drive/MyDrive/AI/VideoKilledTheRadioStar'\n", "\n", "\n", " # Build config file that defines the \"workspace\" abstraction\n", " workspace = OmegaConf.create({\n", " 'active_project': active_project if active_project else str(time.time()),\n", " 'application_root':application_root,\n", " 'project_root':\"${application_root}/${active_project}\",\n", " 'shared_assets_root':\"${application_root}/shared_assets\",\n", " 'gdrive_mounted':mount_gdrive,\n", " 'use_stability_api':use_stability_api,\n", " 'model_dir':model_dir_str,\n", " 'output_dir':'${project_root}/frames'\n", " })\n", "\n", " Path(workspace.project_root).mkdir(parents=True, exist_ok=True)\n", " Path(workspace.model_dir).mkdir(parents=True, exist_ok=True)\n", " Path(workspace.output_dir).mkdir(parents=True, exist_ok=True)\n", "\n", " ###################\n", "\n", " # Assign tracking locations for A/V assets and generally useful outputs\n", "\n", " assets_dir = Path(workspace.shared_assets_root)\n", " assets_dir.mkdir(parents=True, exist_ok=True)\n", "\n", " # TODO: yaml -> jsonl ?\n", " video_assets_meta_fname = assets_dir / 'video_assets_meta.yaml'\n", " if not video_assets_meta_fname.exists():\n", " video_assets_meta = OmegaConf.create()\n", " video_assets_meta.videos = []\n", " with video_assets_meta_fname.open('w') as fp:\n", " OmegaConf.save(config=video_assets_meta, f=fp.name)\n", " else:\n", " video_assets_meta = OmegaConf.load(video_assets_meta_fname)\n", "\n", " audio_assets_meta_fname = assets_dir / 'audio_assets_meta.yaml'\n", " if not audio_assets_meta_fname.exists():\n", " audio_assets_meta = OmegaConf.create()\n", " audio_assets_meta.content = []\n", " with audio_assets_meta_fname.open('w') as fp:\n", " OmegaConf.save(config=audio_assets_meta, f=fp.name)\n", " else:\n", " audio_assets_meta = OmegaConf.load(audio_assets_meta_fname)\n", "\n", " ###################\n", "\n", " # Request user provide credentials as needed\n", "\n", " # if use_stability_api:\n", " # import os, getpass\n", " # if not os.environ.get('STABILITY_KEY'):\n", " # os.environ['STABILITY_KEY'] = getpass.getpass('Enter your Stability API Key, then press enter to continue')\n", " # else:\n", " # # TODO: check for HF token in environment\n", " # if not local:\n", " # from google.colab import output\n", " # output.enable_custom_widget_manager()\n", "\n", " # from huggingface_hub import notebook_login\n", " # notebook_login()\n", "\n", " ###################\n", "\n", " with open('config.yaml','w') as fp:\n", " OmegaConf.save(config=workspace, f=fp.name)\n", "\n", " return workspace\n", "\n", "########################\n", "\n", "# wrap some of the loading logic 
for portability\n", "\n", "\n", "def resolve_scene_ids_and_start_end_times(storyboard):\n", "    \"\"\"\n", "    1. Force first scene to start at frame 0\n", "    2. Force last scene to end in accordance with duration\n", "    3. Force each scene's `end` attr to correspond to the `start` attr of the subsequent scene\n", "    4. Force scene_id to correspond to scenes index position\n", "    \"\"\"\n", "    # nothing to see here, move along.\n", "    if not storyboard.params.get('video_duration'):\n", "        return storyboard\n", "\n", "    storyboard.prompt_starts[0]['start']=0\n", "    storyboard.prompt_starts[-1]['end']=max(storyboard.params.video_duration, storyboard.prompt_starts[-1]['end'])\n", "    for idx, rec in enumerate(storyboard.prompt_starts):\n", "        rec['scene_id']=idx\n", "        if idx==0:\n", "            prev_rec = rec\n", "            continue\n", "        prev_rec['end'] = rec['start']\n", "        prev_rec = rec  # advance so every scene's end gets set, not just the first\n", "    for rec in storyboard.prompt_starts:\n", "        rec['duration_'] = rec['end'] - rec['start']\n", "    return storyboard\n", "\n", "def save_storyboard(storyboard):\n", "    #if storyboard.params.get('video_duration'):\n", "    if storyboard.prompt_starts:\n", "        storyboard = resolve_scene_ids_and_start_end_times(storyboard)\n", "    root = Path(load_workspace().project_root)\n", "    root.mkdir(parents=True, exist_ok=True)\n", "    storyboard_fname = root / 'storyboard.yaml'\n", "    with open(storyboard_fname, 'w') as fp:\n", "        OmegaConf.save(config=storyboard, f=fp.name)\n", "\n", "def load_workspace():\n", "    return OmegaConf.load('config.yaml')\n", "\n", "def load_storyboard():\n", "    workspace = load_workspace()\n", "    root = Path(workspace.project_root)\n", "    storyboard_fname = root / 'storyboard.yaml'\n", "    storyboard = OmegaConf.load(storyboard_fname)\n", "    return workspace, storyboard\n", "\n", "def load_audio_meta(workspace, storyboard):\n", "    assets_dir = Path(workspace.shared_assets_root)\n", "    audio_assets_meta_fname = assets_dir / 'audio_assets_meta.yaml'\n", "    audio_assets_meta = OmegaConf.load(audio_assets_meta_fname)\n", "    audio_meta=dict()\n", "    for idx, rec in enumerate(audio_assets_meta.content):\n", "        if rec.audio_fpath == storyboard.params.audio_fpath:\n", "            audio_meta = rec\n", "            break\n", "    return audio_meta\n", "\n", "#######################\n", "\n", "# EXTRA SEGMENTATION STUFF\n", "\n", "\n", "def calculate_interword_gaps(segment):\n", "    end_prev = -1\n", "    gaps = []\n", "    for word in segment['words']:\n", "        if end_prev < 0:\n", "            end_prev = word['end']\n", "            continue\n", "        gap = word['start'] - end_prev\n", "        gaps.append(gap)\n", "        end_prev = word['end']\n", "    return gaps\n", "\n", "def trivial_subsegmentation(segment, threshold=0, gaps=None):\n", "    \"\"\"\n", "    split on gaps in detected vocal activity.\n", "    Contiguity = gap between adjacent tokens is less than the input threshold.\n", "    \"\"\"\n", "    if gaps is None:\n", "        gaps = calculate_interword_gaps(segment)\n", "    out_segments = []\n", "    this_segment = [segment['words'][0]]\n", "    for word, preceding_pause in zip(segment['words'][1:], gaps):\n", "        if preceding_pause <= threshold:\n", "            this_segment.append(word)\n", "        else:\n", "            out_segments.append(this_segment)\n", "            this_segment = [word]\n", "    out_segments.append(this_segment)\n", "\n", "    outv = [dict(\n", "        start=seg[0]['start'],\n", "        end=seg[-1]['end'],\n", "        text=''.join([w['word'] for w in seg]).strip(),\n", "    ) for seg in out_segments]\n", "\n", "    return outv\n", "\n", "##############################################################\n", "\n", "# audio processing\n", "\n", "\n", "def analyze_audio_structure(\n", "    audio_fpath,\n", "    BINS_PER_OCTAVE = 12 * 3, 
# should be a multiple of twelve: https://github.com/MTG/essentia/blob/master/src/examples/python/tutorial_spectral_constantq-nsg.ipynb\n", " N_OCTAVES = 7,\n", "):\n", " \"\"\"\n", " via librosa docs\n", " https://librosa.org/doc/latest/auto_examples/plot_segmentation.html#sphx-glr-auto-examples-plot-segmentation-py\n", " cites: McFee and Ellis, 2014 - https://brianmcfee.net/papers/ismir2014_spectral.pdf\n", " \"\"\"\n", " y, sr = librosa.load(audio_fpath)\n", "\n", " C = librosa.amplitude_to_db(np.abs(librosa.cqt(y=y, sr=sr,\n", " bins_per_octave=BINS_PER_OCTAVE,\n", " n_bins=N_OCTAVES * BINS_PER_OCTAVE)),\n", " ref=np.max)\n", "\n", " # reduce dimensionality via beat-synchronization\n", " tempo, beats = librosa.beat.beat_track(y=y, sr=sr, trim=False)\n", " Csync = librosa.util.sync(C, beats, aggregate=np.median)\n", "\n", " # I have concerns about this frame fixing operation\n", " beat_times = librosa.frames_to_time(librosa.util.fix_frames(beats, x_min=0), sr=sr)\n", "\n", " # width=3 prevents links within the same bar\n", " # mode=’affinity’ here implements S_rep (after Eq. 8)\n", " R = librosa.segment.recurrence_matrix(Csync, width=3, mode='affinity', sym=True)\n", " # Enhance diagonals with a median filter (Equation 2)\n", " df = librosa.segment.timelag_filter(scipy.ndimage.median_filter)\n", " Rf = df(R, size=(1, 7))\n", " # build the sequence matrix (S_loc) using mfcc-similarity\n", " mfcc = librosa.feature.mfcc(y=y, sr=sr)\n", " Msync = librosa.util.sync(mfcc, beats)\n", " path_distance = np.sum(np.diff(Msync, axis=1)**2, axis=0)\n", " sigma = np.median(path_distance)\n", " path_sim = np.exp(-path_distance / sigma)\n", " R_path = np.diag(path_sim, k=1) + np.diag(path_sim, k=-1)\n", " # compute the balanced combination\n", " deg_path = np.sum(R_path, axis=1)\n", " deg_rec = np.sum(Rf, axis=1)\n", " mu = deg_path.dot(deg_path + deg_rec) / np.sum((deg_path + deg_rec)**2)\n", " A = mu * Rf + (1 - mu) * R_path\n", "\n", " # compute normalized laplacian and its spectrum\n", " L = scipy.sparse.csgraph.laplacian(A, normed=True)\n", " evals, evecs = scipy.linalg.eigh(L)\n", " # clean this up with a median filter. 
can help smooth over discontinuities\n", " evecs = scipy.ndimage.median_filter(evecs, size=(9, 1))\n", " return dict(\n", " y=y,\n", " sr=np.array(sr).astype(np.uint32),\n", " tempo=tempo,\n", " beats=beats,\n", " beat_times=beat_times,\n", " evecs=evecs,\n", " )\n", "\n", "\n", "def laplacian_segmentation(\n", " audio_fpath=None,\n", " evecs=None,\n", " n_clusters = 5,\n", " n_spectral_features = None,\n", "):\n", " \"\"\"\n", " segment audio by clustering a self-similarity matrix.\n", " via librosa docs\n", " https://librosa.org/doc/latest/auto_examples/plot_segmentation.html#sphx-glr-auto-examples-plot-segmentation-py\n", " cites: McFee and Ellis, 2014 - https://brianmcfee.net/papers/ismir2014_spectral.pdf\n", " \"\"\"\n", " if evecs is None:\n", " if audio_fpath is None:\n", " raise Exception(\"One of `audio_fpath` or `evecs` must be provided\")\n", " features = analyze_audio_structure(audio_fpath)\n", " evecs = features['evecs']\n", "\n", " if n_clusters < 2:\n", " seg_ids = np.zeros(evecs.shape[0], dtype=int)\n", " return seg_ids\n", "\n", " if n_spectral_features is None:\n", " n_spectral_features = n_clusters\n", "\n", " # cumulative normalization is needed for symmetric normalize laplacian eigenvectors\n", " Cnorm = np.cumsum(evecs**2, axis=1)**0.5\n", " k = n_spectral_features\n", " X = evecs[:, :k] / Cnorm[:, k-1:k]\n", "\n", "\n", " # use these k components to cluster beats into segments\n", " KM = sklearn.cluster.KMeans(n_clusters=n_clusters, n_init=\"auto\")\n", " seg_ids = KM.fit_predict(X)\n", "\n", " return seg_ids #, beat_times, tempo\n", "\n", "\n", "# for video duration\n", "def get_audio_duration_seconds(audio_fpath):\n", " outv = subprocess.run([\n", " 'ffprobe'\n", " ,'-i',audio_fpath\n", " ,'-show_entries', 'format=duration'\n", " ,'-v','quiet'\n", " ,'-of','csv=p=0'\n", " ],\n", " stdout=subprocess.PIPE\n", " ).stdout.decode('utf-8')\n", " return float(outv.strip())\n", "\n", "\n", "##########################################\n", "\n", "# animation stuff\n", "\n", "# TODO: update this stuff to reflect updates to API/sdk\n", "def get_image_for_prompt_sai(prompt, max_retries=5, **kargs):\n", " stability_api = client.StabilityInference(\n", " key=os.environ['STABILITY_KEY'],\n", " verbose=False,\n", " )\n", "\n", " # auto-retry if mitigation triggered\n", " while max_retries:\n", " try:\n", " answers = stability_api.generate(prompt=prompt, **kargs)\n", " response = process_response(answers)\n", " for img in response:\n", " yield img\n", " break\n", "\n", " # TODO: better regen handling\n", " except RuntimeError:\n", " print(\"runtime error\")\n", " max_retries -= 1\n", " warnings.warn(f\"mitigation triggered, retries remaining: {max_retries}\")\n", "\n", "def process_response(answers):\n", " for resp in answers:\n", " for artifact in resp.artifacts:\n", " if artifact.finish_reason == generation.FILTER:\n", " warnings.warn(\n", " \"Your request activated the API's safety filters and could not be processed.\"\n", " \"Please modify the prompt and try again.\")\n", " raise RuntimeError\n", " if artifact.type == generation.ARTIFACT_IMAGE:\n", " img = Image.open(io.BytesIO(artifact.binary))\n", " yield img\n", "\n", "\n", "########################################\n", "\n", "# misc utils\n", "\n", "def rand_str(n_char=5):\n", " return ''.join(random.choice(string.ascii_lowercase) for i in range(n_char))\n", "\n", "def save_frame(\n", " img: Image,\n", " idx:int=0,\n", " root_path=Path('./frames'),\n", " name=None,\n", "):\n", " root_path.mkdir(parents=True, exist_ok=True)\n", 
" if name is None:\n", " name = rand_str()\n", " outpath = root_path / f\"{idx}-{name}.png\"\n", " img.save(outpath)\n", " return str(outpath)\n", "\n", "def get_image_sequence(idx, root, init_first=True):\n", " root = Path(root)\n", " images = (root / 'frames' ).glob(f'{idx}-*.png')\n", " images = [str(fp) for fp in images]\n", " if init_first:\n", " init_image = None\n", " images2 = []\n", " for i, fp in enumerate(images):\n", " if 'anchor' in fp:\n", " init_image = fp\n", " else:\n", " images2.append(fp)\n", " if not init_image:\n", " try:\n", " init_image, images2 = images2[0], images2[1:]\n", " images = [init_image] + images2\n", " except IndexError:\n", " images = images2\n", " return images\n", "\n", "def archive_images(idx, root, archive_root = None):\n", " root = Path(root)\n", " if archive_root is None:\n", " archive_root = root / 'archive'\n", " archive_root = Path(archive_root)\n", " archive_root.mkdir(parents=True, exist_ok=True)\n", " old_images = get_image_sequence(idx, root=root)\n", " if not old_images:\n", " return\n", " print(f\"moving {len(old_images)} old images for scene {idx} to {archive_root}\")\n", " for old_fp in old_images:\n", " old_fp = Path(old_fp)\n", " im_name = Path(old_fp.name)\n", " new_path = archive_root / im_name\n", " if new_path.exists():\n", " im_name = f\"{im_name.stem}-{time.time()}{im_name.suffix}\"\n", " new_path = archive_root / im_name\n", " old_fp.rename(new_path)\n", "\n", "\n", "############################\n", "\n", "# video compilation stuff\n", "\n", "# TODO: Sorting algorithm that can tolerate more than 15-ish frames (GPU?)\n", "def tsp_sort(frames):\n", " frames_m = np.array([np.array(f).ravel() for f in frames])\n", " dmat = pdist(frames_m, metric='cosine')\n", " dmat = squareform(dmat)\n", " permutation, _ = solve_tsp_dynamic_programming(dmat)\n", " return permutation\n", "\n", "def add_caption2image(\n", " image,\n", " caption,\n", " text_font='LiberationSans-Regular.ttf',\n", " font_size=20,\n", " fill_color=(255, 255, 255),\n", " stroke_color=(0, 0, 0), #stroke_fill\n", " stroke_width=2,\n", " align='center',\n", " ):\n", " # via https://stackoverflow.com/a/59104505/819544\n", " wrapper = textwrap.TextWrapper(width=50)\n", " word_list = wrapper.wrap(text=caption)\n", " caption_new = ''\n", " for ii in word_list[:-1]:\n", " caption_new = caption_new + ii + '\\n'\n", " caption_new += word_list[-1]\n", "\n", " draw = ImageDraw.Draw(image)\n", "\n", " # Download the Font and Replace the font with the font file.\n", " font = ImageFont.truetype(text_font, size=font_size)\n", " w,h = draw.textsize(caption_new, font=font, stroke_width=stroke_width)\n", " W,H = image.size\n", " x,y = 0.5*(W-w),0.90*H-h\n", " draw.text(\n", " (x,y),\n", " caption_new,\n", " font=font,\n", " fill=fill_color,\n", " stroke_fill=stroke_color,\n", " stroke_width=stroke_width,\n", " align=align,\n", " )\n", "\n", " return image\n", "\n", "##########################################################\n", "\n", "# audioreactivity stuff\n", "\n", "\n", "def full_width_plot():\n", " ax = plt.gca()\n", " ax.figure.set_figwidth(20)\n", " plt.show()\n", "\n", "def display_signal(y, sr, show_spec=True, title=None, start_time=0, end_time=9999):\n", "\n", "# if show_spec:\n", "# frame_time = librosa.samples_to_time(np.arange(len(normalized_signal)), sr=sr)\n", "# else:\n", "# frame_time = librosa.frames_to_time(np.arange(len(normalized_signal)), sr=sr)\n", "\n", " if show_spec:\n", " #librosa.display.waveshow(y, sr=sr)\n", " times = librosa.samples_to_time(np.arange(len(y)), 
sr=sr)\n", " else:\n", " #times = librosa.times_like(y, sr=sr).ravel()\n", " times = librosa.frames_to_time(np.arange(len(y)), sr=sr).ravel()\n", "\n", " start_idx = np.argmax(start_time <= times)\n", " #end_idx = len(times) - np.argmax([end_time <= times][::-1])\n", " end_idx = np.argmax(end_time <= times)\n", " if start_idx >= end_idx:\n", " end_idx = -1\n", "\n", " times = times[start_idx:end_idx]\n", " y = y[start_idx:end_idx]\n", "\n", " plt.plot(times, y)\n", " if title:\n", " plt.title(title)\n", " full_width_plot()\n", "\n", " if show_spec:\n", " try:\n", " M = librosa.feature.melspectrogram(y=y, sr=sr)\n", " librosa.display.specshow(librosa.power_to_db(M, ref=np.max),\n", " y_axis='mel', x_axis='time')\n", " full_width_plot()\n", "\n", " except:\n", " pass\n", "\n", " # plt.plot(frame_time, y)\n", " # if title:\n", " # plt.title(title)\n", " # full_width_plot()\n", "\n", "\n", "# https://github.com/pytti-tools/pytti-core/blob/9e8568365cfdc123d2d2fbc20d676ca0f8715341/src/pytti/AudioParse.py#L95\n", "from scipy.signal import butter, sosfilt, sosfreqz\n", "\n", "def butter_bandpass(lowcut, highcut, fs, order):\n", " nyq = 0.5 * fs\n", " low = lowcut / nyq\n", " high = highcut / nyq\n", " sos = butter(order, [low, high], analog=False, btype='bandpass', output='sos')\n", " return sos\n", "\n", "def butter_bandpass_filter(y, sr, lowcut, highcut, order=10):\n", " sos = butter_bandpass(lowcut, highcut, sr, order=order)\n", " y = sosfilt(sos, y)\n", " return y\n", "\n", "########################################################################\n", "\n", "def is_multi_valued_curve(curve_str):\n", " try:\n", " return (\": (\" in curve_str) and (\"), \" in curve_str)\n", " except:\n", " return False\n", "\n", "def show_storyboard(storyboard=None):\n", " if storyboard is None:\n", " workspace, storyboard = load_storyboard()\n", " storyboard = resolve_scene_ids_and_start_end_times(storyboard)\n", "\n", " # TODO: fix this...\n", " reactive_signal_map = {}\n", " if storyboard.get('audioreactive'):\n", " reactive_signal_map = storyboard.audioreactive.get('reactive_signal_map')\n", "\n", " # TODO: should just invoke compile for basically all of this.\n", " try:\n", " # ... 
borks on ifps...\n", " settings = compile_storyboard(ignore_defaults=True)\n", " except:\n", " settings = []\n", "\n", " for idx, rec in enumerate(storyboard.prompt_starts):\n", " report = f\"scene: {idx}\\t start: {rec['start']:.2f}\"\n", " if rec.get('duration_'):\n", " report += f\"\\t duration: {rec.get('duration_'):.2f}\"\n", " report += f\"\\nspoken text: {rec.get('text')}\\n\"\n", "\n", " # TODO: wrap prompt construction logic in a function (better yet use omegaconf substitution variables)\n", "\n", " #'_theme':'theme', 'structural_segmentation_label':\n", " if rec.get('_theme'):\n", " report += f\"theme prompt: {rec['_theme']}\\n\"\n", " #f\"image prompt: {rec['_prompt']}\\n\"\n", " prompt = rec.get('prompt')\n", " #if not prompt:\n", " # prompt = ...\n", " if prompt:\n", " report += f\"image prompt: {rec['_prompt']}\\n\"\n", "\n", " if rec.get('animation_mode'):\n", " report += f\"animation mode: {rec['animation_mode']}\"\n", " print(report)\n", " im_path = rec.get('frame0_fpath')\n", " if im_path and Path(im_path).exists():\n", " display(Image.open(rec['frame0_fpath']))\n", "\n", " #if reactive_signal_map:\n", " n = rec['frames']\n", " if n <1:\n", " continue\n", "\n", " if not settings:\n", " continue\n", " scene_settings = settings[idx]\n", " #for signal_name in reactive_signal_map.keys():\n", " for signal_name, signal_val in scene_settings.items():\n", " if not is_multi_valued_curve(signal_val):\n", " continue\n", "\n", " curve = kf.dsl.curve_from_cn_string(signal_val)\n", " xs = [i for i in range(n)]\n", " ys = [curve[i] for i in xs]\n", " plt.plot(xs, ys, label=signal_name)\n", " plt.title(f\"scene {idx} - {signal_name}\")\n", " plt.xlabel(\"frame index within scene\")\n", " plt.legend()\n", " plt.show()\n", "\n", "#########################################\n", "\n", "def get_path_to_stems():\n", " workspace, storyboard = load_storyboard()\n", " assets_root = Path(workspace.application_root) / 'shared_assets'\n", " #stems_path = root / \"stems\"\n", " stems_path = assets_root / \"stems\"\n", " stems_outpath = stems_path / 'htdemucs_ft' / Path(storyboard.params.audio_fpath).stem\n", " return stems_outpath\n", "\n", "def ensure_stems_separated():\n", " stems_outpath = get_path_to_stems()\n", " stems_path = str(stems_outpath.parent.parent)\n", " if not stems_outpath.exists():\n", " !demucs -n htdemucs_ft -o \"{stems_path}\" \"{storyboard.params.audio_fpath}\"\n", "\n", "def get_stem(instrument_name):\n", " ensure_stems_separated()\n", " stems_outpath = get_path_to_stems()\n", " stem_fpaths = list(stems_outpath.glob('*.wav'))\n", "\n", " for stem_fpath in stem_fpaths:\n", " if instrument_name in str(stem_fpath):\n", " y, sr = librosa.load(stem_fpath)\n", " return y, sr\n", " raise ValueError(\n", " f\"Unable to locate stem for instrument: {instrument_name}\\n\"\n", " f\"in folder: {stems_outpath}\"\n", " )\n", "\n", "\n", "##########################################################################################################\n", "\n", "# deforum compatibility sprint\n", "\n", "\n", "import math\n", "import numpy as np\n", "\n", "def build_eval_scope(storyboard):\n", " # preload eval scope with math stuff\n", " math_env = {\n", " \"abs\": abs,\n", " \"max\": max,\n", " \"min\": min,\n", " \"pow\": pow,\n", " \"round\": round,\n", " \"np\": np,\n", " \"__builtins__\": None,\n", " }\n", " math_env.update(\n", " {key: getattr(math, key) for key in dir(math) if \"_\" not in key}\n", " )\n", "\n", " # add signals to scope\n", " for signal_name, sig_curve in 
storyboard.signals.items():\n", " sig_curve = OmegaConf.to_container(sig_curve) # zomg...\n", " curve = load_curve(sig_curve)\n", " math_env[signal_name] = curve\n", " return math_env\n", "\n", "\n", "#eval(signal_mappings['noise_curve'], math_env, t=0)\n", "#math_env['t']=0\n", "#eval(signal_mappings['noise_curve'], math_env)\n", "\n", "\n", "#################\n", "\n", "\n", "true=True\n", "false=False\n", "DEFORUM_DEFAULTS = {\n", " \"W\": 512,\n", " \"H\": 512,\n", " \"show_info_on_ui\": true,\n", " \"tiling\": false,\n", " \"restore_faces\": false,\n", " \"seed_resize_from_w\": 0,\n", " \"seed_resize_from_h\": 0,\n", " \"seed\": -1,\n", " \"sampler\": \"Euler a\",\n", " \"steps\": 25,\n", " \"batch_name\": \"Deforum_{timestring}\",\n", " \"seed_behavior\": \"iter\",\n", " \"seed_iter_N\": 1,\n", " \"use_init\": false,\n", " \"strength\": 0.8,\n", " \"strength_0_no_init\": true,\n", " \"init_image\": \"https://deforum.github.io/a1/I1.png\",\n", " \"use_mask\": false,\n", " \"use_alpha_as_mask\": false,\n", " \"mask_file\": \"https://deforum.github.io/a1/M1.jpg\",\n", " \"invert_mask\": false,\n", " \"mask_contrast_adjust\": 1.0,\n", " \"mask_brightness_adjust\": 1.0,\n", " \"overlay_mask\": true,\n", " \"mask_overlay_blur\": 4,\n", " \"fill\": 1,\n", " \"full_res_mask\": true,\n", " \"full_res_mask_padding\": 4,\n", " \"reroll_blank_frames\": \"ignore\",\n", " \"reroll_patience\": 10.0,\n", " # \"prompts\": {\n", " # \"0\": \"tiny cute bunny, vibrant diffraction, highly detailed, intricate, ultra hd, sharp photo, crepuscular rays, in focus, by tomasz alen kopera\",\n", " # \"30\": \"anthropomorphic clean cat, surrounded by fractals, epic angle and pose, symmetrical, 3d, depth of field, ruan jia and fenghua zhong\",\n", " # \"60\": \"a beautiful coconut --neg photo, realistic\",\n", " # \"90\": \"a beautiful durian, trending on Artstation\"\n", " # },\n", " \"animation_prompts_positive\": \"\",\n", " \"animation_prompts_negative\": \"nsfw, nude\",\n", " \"animation_mode\": \"2D\",\n", " \"max_frames\": 1,\n", " \"border\": \"replicate\",\n", " \"angle\": \"0: (0)\",\n", " \"zoom\": \"0: (1.0025+0.002*sin(1.25*3.14*t/30))\",\n", " \"translation_x\": \"0: (0)\",\n", " \"translation_y\": \"0: (0)\",\n", " \"translation_z\": \"0: (1.75)\",\n", " \"transform_center_x\": \"0: (0.5)\",\n", " \"transform_center_y\": \"0: (0.5)\",\n", " \"rotation_3d_x\": \"0: (0)\",\n", " \"rotation_3d_y\": \"0: (0)\",\n", " \"rotation_3d_z\": \"0: (0)\",\n", " \"enable_perspective_flip\": false,\n", " \"perspective_flip_theta\": \"0: (0)\",\n", " \"perspective_flip_phi\": \"0: (0)\",\n", " \"perspective_flip_gamma\": \"0: (0)\",\n", " \"perspective_flip_fv\": \"0: (53)\",\n", " \"noise_schedule\": \"0: (0.065)\",\n", " \"strength_schedule\": \"0: (0.65)\",\n", " \"contrast_schedule\": \"0: (1.0)\",\n", " \"cfg_scale_schedule\": \"0: (7)\",\n", " \"enable_steps_scheduling\": false,\n", " \"steps_schedule\": \"0: (25)\",\n", " \"fov_schedule\": \"0: (70)\",\n", " \"aspect_ratio_schedule\": \"0: (1)\",\n", " \"aspect_ratio_use_old_formula\": false,\n", " \"near_schedule\": \"0: (200)\",\n", " \"far_schedule\": \"0: (10000)\",\n", " \"seed_schedule\": \"0:(s), 1:(-1), \\\"max_f-2\\\":(-1), \\\"max_f-1\\\":(s)\",\n", " \"pix2pix_img_cfg_scale_schedule\": \"0:(1.5)\",\n", " \"enable_subseed_scheduling\": false,\n", " \"subseed_schedule\": \"0: (1)\",\n", " \"subseed_strength_schedule\": \"0: (0)\",\n", " \"enable_sampler_scheduling\": false,\n", " \"sampler_schedule\": \"0: (\\\"Euler a\\\")\",\n", " 
\"use_noise_mask\": false,\n", " \"mask_schedule\": \"0: (\\\"{video_mask}\\\")\",\n", " \"noise_mask_schedule\": \"0: (\\\"{video_mask}\\\")\",\n", " \"enable_checkpoint_scheduling\": false,\n", " \"checkpoint_schedule\": \"0: (\\\"model1.ckpt\\\"), 100: (\\\"model2.safetensors\\\")\",\n", " \"enable_clipskip_scheduling\": false,\n", " \"clipskip_schedule\": \"0: (2)\",\n", " \"enable_noise_multiplier_scheduling\": true,\n", " \"noise_multiplier_schedule\": \"0: (1.05)\",\n", " \"resume_from_timestring\": false,\n", " \"resume_timestring\": \"20230621175028\",\n", " \"enable_ddim_eta_scheduling\": false,\n", " \"ddim_eta_schedule\": \"0: (0)\",\n", " \"enable_ancestral_eta_scheduling\": false,\n", " \"ancestral_eta_schedule\": \"0: (1)\",\n", " \"amount_schedule\": \"0: (0.1)\",\n", " \"kernel_schedule\": \"0: (5)\",\n", " \"sigma_schedule\": \"0: (1)\",\n", " \"threshold_schedule\": \"0: (0)\",\n", " \"color_coherence\": \"LAB\",\n", " \"color_coherence_image_path\": \"\",\n", " \"color_coherence_video_every_N_frames\": 1,\n", " \"color_force_grayscale\": false,\n", " \"legacy_colormatch\": false,\n", " \"diffusion_cadence\": 2,\n", " \"optical_flow_cadence\": \"None\",\n", " \"cadence_flow_factor_schedule\": \"0: (1)\",\n", " \"optical_flow_redo_generation\": \"None\",\n", " \"redo_flow_factor_schedule\": \"0: (1)\",\n", " \"diffusion_redo\": \"0\",\n", " \"noise_type\": \"perlin\",\n", " \"perlin_octaves\": 4,\n", " \"perlin_persistence\": 0.5,\n", " \"use_depth_warping\": true,\n", " \"depth_algorithm\": \"Midas-3-Hybrid\",\n", " \"midas_weight\": 0.2,\n", " \"padding_mode\": \"border\",\n", " \"sampling_mode\": \"bicubic\",\n", " \"save_depth_maps\": false,\n", " \"video_init_path\": \"https://deforum.github.io/a1/V1.mp4\",\n", " \"extract_nth_frame\": 1,\n", " \"extract_from_frame\": 0,\n", " \"extract_to_frame\": -1,\n", " \"overwrite_extracted_frames\": false,\n", " \"use_mask_video\": false,\n", " \"video_mask_path\": \"https://deforum.github.io/a1/VM1.mp4\",\n", " \"hybrid_comp_alpha_schedule\": \"0:(0.5)\",\n", " \"hybrid_comp_mask_blend_alpha_schedule\": \"0:(0.5)\",\n", " \"hybrid_comp_mask_contrast_schedule\": \"0:(1)\",\n", " \"hybrid_comp_mask_auto_contrast_cutoff_high_schedule\": \"0:(100)\",\n", " \"hybrid_comp_mask_auto_contrast_cutoff_low_schedule\": \"0:(0)\",\n", " \"hybrid_flow_factor_schedule\": \"0:(1)\",\n", " \"hybrid_generate_inputframes\": false,\n", " \"hybrid_generate_human_masks\": \"None\",\n", " \"hybrid_use_first_frame_as_init_image\": true,\n", " \"hybrid_motion\": \"None\",\n", " \"hybrid_motion_use_prev_img\": false,\n", " \"hybrid_flow_consistency\": false,\n", " \"hybrid_consistency_blur\": 2,\n", " \"hybrid_flow_method\": \"RAFT\",\n", " \"hybrid_composite\": \"None\",\n", " \"hybrid_use_init_image\": false,\n", " \"hybrid_comp_mask_type\": \"None\",\n", " \"hybrid_comp_mask_inverse\": false,\n", " \"hybrid_comp_mask_equalize\": \"None\",\n", " \"hybrid_comp_mask_auto_contrast\": false,\n", " \"hybrid_comp_save_extra_frames\": false,\n", " \"parseq_manifest\": \"\",\n", " \"parseq_use_deltas\": true,\n", " \"use_looper\": false,\n", " \"init_images\": \"{\\n \\\"0\\\": \\\"https://deforum.github.io/a1/Gi1.png\\\",\\n \\\"max_f/4-5\\\": \\\"https://deforum.github.io/a1/Gi2.png\\\",\\n \\\"max_f/2-10\\\": \\\"https://deforum.github.io/a1/Gi3.png\\\",\\n \\\"3*max_f/4-15\\\": \\\"https://deforum.github.io/a1/Gi4.jpg\\\",\\n \\\"max_f-20\\\": \\\"https://deforum.github.io/a1/Gi1.png\\\"\\n}\",\n", " \"image_strength_schedule\": \"0:(0.75)\",\n", " 
\"blendFactorMax\": \"0:(0.35)\",\n", " \"blendFactorSlope\": \"0:(0.25)\",\n", " \"tweening_frames_schedule\": \"0:(20)\",\n", " \"color_correction_factor\": \"0:(0.075)\",\n", " \"skip_video_creation\": false,\n", " \"fps\": 15,\n", " \"make_gif\": false,\n", " \"delete_imgs\": false,\n", " \"add_soundtrack\": \"None\",\n", " \"soundtrack_path\": \"https://deforum.github.io/a1/A1.mp3\",\n", " \"r_upscale_video\": false,\n", " \"r_upscale_factor\": \"x2\",\n", " \"r_upscale_model\": \"realesr-animevideov3\",\n", " \"r_upscale_keep_imgs\": true,\n", " \"store_frames_in_ram\": false,\n", " \"frame_interpolation_engine\": \"None\",\n", " \"frame_interpolation_x_amount\": 2,\n", " \"frame_interpolation_slow_mo_enabled\": false,\n", " \"frame_interpolation_slow_mo_amount\": 2,\n", " \"frame_interpolation_keep_imgs\": true,\n", " \"frame_interpolation_use_upscaled\": false,\n", " \"sd_model_name\": \"v1-5-pruned-emaonly.ckpt\",\n", " \"sd_model_hash\": \"81761151\",\n", " \"deforum_git_commit_id\": \"b58056f9\",\n", "}\n", "\n", "\n", "def resolve_prompt(idx, storyboard):\n", " prompt_lag = storyboard.params.get(\"prompt_lag\",True)\n", " rec = storyboard.prompt_starts[idx]\n", " if not rec.get('_prompt'):\n", " theme = rec.get('_theme')\n", " prompt = rec.get('prompt')\n", " if not prompt:\n", " prompt = f\"{rec['text']}, {theme}\"\n", "\n", " if prompt_lag and (idx > 0):\n", " rec_prev = storyboard.prompt_starts[idx -1]\n", " prev_text = rec_prev.get('text','')\n", " if not prev_text:\n", " prev_text = rec_prev.get('prompt','').split(',')[0]\n", " this_text = rec.get('text','')\n", " if this_text:\n", " prompt = f\"{prev_text} {this_text}, {theme}\"\n", " else:\n", " prompt = rec_prev['_prompt']\n", " rec['_prompt'] = prompt\n", "\n", "\n", "def resolve_signals(scene_id=0, storyboard=None):\n", " if storyboard is None:\n", " _, storyboard = load_storyboard()\n", " idx=scene_id\n", " rec = storyboard.prompt_starts[scene_id]\n", " # resolve signals\n", " default_mappings = storyboard.get('signal_mappings',{})\n", " signal_mappings = rec.get('signal_mappings', default_mappings)\n", "\n", " math_env = build_eval_scope(storyboard)\n", " curves = {}\n", " for param_name, param_meta in signal_mappings.items():\n", " param_expr, attr_hi, attr_low = param_meta['parameter_value'], param_meta['param_max'], param_meta['param_min']\n", " curve_chunks = []\n", " start=rec['start']\n", " for frame_idx in range(rec['frames']):\n", " curr_time = start + frame_idx * ifps\n", " # SUPPORTED VARIABLES\n", " # TODO: describe this somewhere\n", " math_env['t'] = curr_time\n", " math_env['scene_id'] = idx\n", " math_env['frame_id'] = frame_idx\n", " math_env['theme_id'] = rec.get('structural_segmentation_label',0)\n", " signal_value=eval(param_expr, math_env)\n", " #if inverse_relationship:\n", " if param_meta['invert_relationship_to_signal']:\n", " signal_value = 1-signal_value\n", " attr_value = signal_value*(attr_hi-attr_low)+attr_low\n", " curve_chunks.append(f\"{frame_idx}: ({attr_value})\")\n", " curve_str = ', '.join(curve_chunks)\n", " #scene_args[param_name] = curve_str\n", " #scene_args = OmegaConf.to_container(scene_args)\n", " curves[param_name] = curve_str\n", " return curves\n", "\n", "\n", "def compile_storyboard(storyboard=None, ignore_defaults=False):\n", " if storyboard is None:\n", " _, storyboard = load_storyboard()\n", " scenes = []\n", " story_defaults = storyboard.get('animation_args',{})\n", " for idx, rec in enumerate(storyboard.prompt_starts):\n", "\n", " # establish defaults\n", " 
scene_args=copy.deepcopy(story_defaults)\n", "        if not ignore_defaults:\n", "            scene_args = rec.get('animation_args',{})\n", "        try:\n", "            scene_args = OmegaConf.to_container(scene_args)\n", "        except ValueError:\n", "            pass\n", "\n", "        # translate settings to deforum names\n", "        resolve_prompt(idx, storyboard)\n", "        #scene_args['prompts'] = {\"0\":rec['_prompt']}\n", "        #scene_args['max_frames'] = rec['frames']\n", "        scene_args['_prompt'] = rec['_prompt']\n", "        scene_args['frames'] = rec['frames']\n", "        scene_args['init_image'] = rec.get('frame0_fpath')\n", "\n", "        curves = resolve_signals(idx, storyboard)\n", "        scene_args.update(curves)\n", "        scenes.append(scene_args)\n", "    return scenes\n", "\n", "# TODO: copy init images into this folder, add to settings.txt's\n", "\n", "\n", "\n", "def make_compatible_for_deforum(settings, backfill_deforum_defaults = True):\n", "    fields_mapping={\n", "        #'prompts':'prompts',\n", "        'fps':'fps',\n", "        'extract_nth_frame':'extract_nth_frame',\n", "        'angle':'angle',\n", "        'zoom':'zoom',\n", "        'translation_x': 'translation_x',\n", "        'translation_y': 'translation_y',\n", "        'translation_z': 'translation_z',\n", "        'rotation_x': 'rotation_3d_x',\n", "        'rotation_y': 'rotation_3d_y',\n", "        'rotation_z': 'rotation_3d_z',\n", "        'frames':'max_frames',\n", "        #'noise_add_curve':'noise_schedule',\n", "        'noise_curve':'noise_schedule',\n", "        'noise_scale_curve':'noise_multiplier_schedule',\n", "        'strength_curve':'strength_schedule',\n", "        'steps_curve':'steps_schedule',\n", "    }\n", "    # SAI_scalars_2_deforum_curves = {\n", "    # 'cfg_scale':'cfg_scale_schedule',\n", "    # 'seed':'seed_schedule',\n", "    # }\n", "    #keep_params = set(fields_mapping) + set(SAI_scalars_2_deforum_curves)\n", "\n", "    outv = {}\n", "    if backfill_deforum_defaults:\n", "        outv.update(DEFORUM_DEFAULTS)\n", "\n", "    for k,v in settings.items():\n", "        if k in fields_mapping:\n", "            outv[fields_mapping[k]] = v\n", "        elif k == '_prompt':\n", "            outv['prompts'] = {\"0\":v}\n", "        elif k == 'cfg_scale':\n", "            outv['cfg_scale_schedule'] = f\"0: ({v})\"\n", "        elif k == 'seed':\n", "            if v == -1:\n", "                outv['seed_schedule'] = \"0: (-1)\"\n", "            else:\n", "                seed_schedule_chunks = []\n", "                # frame count may be keyed 'frames' (storyboard) or 'max_frames' (deforum)\n", "                for i in range(settings.get('max_frames', settings.get('frames', 1))):\n", "                    #if v == -1:\n", "                    #    v = random.randrange(0, 4294967295)\n", "                    seed_schedule_chunks.append(f\"{i}: ({v})\")\n", "                    v+=1\n", "                outv['seed_schedule'] = ', '.join(seed_schedule_chunks)\n", "    return outv\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "PknSJ48jAmuP" }, "source": [ "## 1. 
📋💬 Build Base Storyboard\n", "* Initial setup\n", "* Infer keyframes for scene segmentation" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "id": "cM8cux9b7F4v", "tags": [] }, "outputs": [], "source": [ "# @title ### 📋 Attach Storyboard (create or resume project)\n", "\n", "project_name = '' # @param {type:'string'}\n", "\n", "# TODO: make it so I can change this value without restarting the kernel....\n", "use_stability_api = False # @param {type:'boolean'}\n", "mount_gdrive = True # @param {type:'boolean'}\n", "resume=True # @param {type:'boolean'}\n", "\n", "# TODO: add support for whisper API\n", "\n", "# @markdown Enter a unique `project_name`.\n", "# @markdown If left blank, the current unix timestamp will be used\n", "# @markdown (seconds since 1970-01-01 00:00).\n", "\n", "# @markdown If you use the name of an existing project, the workspace will switch to that project (even if `resume` is unchecked. Each project needs a unique name).\n", "\n", "# @markdown Non-alphanumeric characters (excluding '-' and '_') will be replaced with hyphens.\n", "\n", "# @markdown ---\n", "\n", "# @markdown # Detailed Discussion\n", "# @markdown In VKTRS, a \"project\" is encapsulated by a folder.\n", "# @markdown With google drive loaded, it will be the `