Repository: ADT109119/WhisperGUI
Branch: main
Commit: 24a34f397e70
Files: 10
Total size: 14.2 KB

Directory structure:
gitextract_zebwciym/

├── .gitignore
├── LICENSE
├── README.md
├── main.py
├── requirements.txt
├── run.bat
├── setup.bat
├── update.bat
├── version.txt
└── 使用說明.txt

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
venv/
output/
model/
config.json

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2023 ADT109119

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# WhisperGUI

![pic](https://user-images.githubusercontent.com/106337749/221340883-4b437d03-97fc-42ee-821e-dd04096323fe.png)

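> This project is a simple Whisper GUI plus quick-install package that I put together in about two hours. It lets you run Whisper without typing commands, and it spares you from installing a pile of dependencies by hand (you still need to install Python and FFmpeg yourself).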

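> 2023.03.30 update: Some users reported that the GPU could not be detected. My current guess is that this is related to the CUDA Toolkit. If you have not installed the CUDA Toolkit, try installing the Nvidia CUDA Toolkit first. If it is installed and the GPU still does not show up, try editing line 22 of the .bat file (presumably setup.bat, where line 22 installs Torch) so that the CUDA version of the installed Torch build matches the CUDA Toolkit version on your machine.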
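
To check what the note above describes, you can ask PyTorch which CUDA version it was built for and which GPUs it can see. The sketch below is only an illustration: `cuda_report` and `device_string` are hypothetical helper names, and the index-to-device mapping assumes the device list starts with `cpu` followed by the GPUs in order, as the device dropdown in main.py does.

```python
import importlib.util


def cuda_report():
    """Collect basic facts about the local PyTorch/CUDA setup (hypothetical helper)."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return {"torch_installed": False}

    import torch

    report = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this Torch build targets; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        "devices": [],
    }
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())
        ]
    return report


def device_string(index):
    """Map a dropdown index to a device name: 0 is the CPU, n is GPU n-1."""
    return "cpu" if index == 0 else "cuda:%d" % (index - 1)


print(cuda_report())
```

If `built_for_cuda` is `None`, or `cuda_available` is `False` on a machine with an Nvidia GPU, the installed Torch build and the local CUDA setup probably do not match.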

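> 2023.08.12 update: I have tested packaging this project as an executable. You no longer need to run setup.bat to create a virtual environment: download the archive, extract it, and run **main.exe** inside. You can get it from the releases page and try it out. (In testing, the packaged version did not require installing the CUDA Toolkit manually.)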

[OpenAI Whisper](https://github.com/openai/whisper)

## Purpose

This project gives you a Whisper environment that is quick to set up, and lets most users work entirely through the GUI without typing commands.


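## Features

The GUI currently exposes the following Whisper options:
1. Select multiple audio files
1. Choose the output location
1. Choose the model
1. Choose the transcription language
1. Choose the device (CPU or a specific GPU)
1. Translate the subtitles into English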
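
Each option above ends up as a flag on the `whisper` command line that main.py launches. The sketch below shows one way to express that mapping as an argument list. `build_whisper_args` is a hypothetical helper, not a function in this repository; the flags themselves are the ones main.py passes.

```python
def build_whisper_args(audio_path, model="small", device="cpu", language=None,
                       output_dir=None, translate=False, initial_prompt=None,
                       whisper_exe="venv\\Scripts\\whisper"):
    """Build the whisper command for one audio file (illustrative sketch)."""
    args = [whisper_exe, audio_path]
    if language:
        # Omitting --language lets whisper detect the language itself.
        args += ["--language", language]
    args += ["--device", device, "--fp16", "False"]
    args += ["--model", model, "-f", "srt"]
    if output_dir:
        args += ["--output_dir", output_dir]
    # Models are kept in the project's "model" folder.
    args += ["--model_dir", "model"]
    if translate:
        args += ["--task", "translate"]
    if initial_prompt:
        args += ["--initial_prompt", initial_prompt]
    return args


print(build_whisper_args("talk.mp3", language="English", translate=True))
```

Passing a list like this to `subprocess.run` avoids having to quote paths or prompts that contain spaces.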


## Screenshots

> Workflow: select files > choose the device to run the model on > done

![Select model](https://user-images.githubusercontent.com/106337749/218459288-0fd24ee4-4ed6-49c9-a3f4-1fd97976a89d.png)
![Select device](https://user-images.githubusercontent.com/106337749/218459323-faaf2d8d-0a68-4bfc-a6e3-62e45b94ad0f.png)
![Run complete](https://user-images.githubusercontent.com/106337749/218460468-a801fe68-0f01-479d-a4bd-4f04eea1af41.png)

## Installation

> Install Python 3.7 or later and FFmpeg yourself first.

The steps below walk you through running this project on your computer.

### Get the project

```bash
git clone git@github.com:ADT109119/WhisperGUI.git
```

**Or click Download ZIP on the GitHub page.**

### Confirm that Python and FFmpeg are installed

```bash
python --version
ffmpeg -version
```
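
The same check can be scripted. The following is a minimal sketch, assuming FFmpeg is reachable as `ffmpeg` on `PATH`; `check_prerequisites` is a hypothetical helper name, and the minimum version is the Python 3.7 stated above.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 7)):
    """Report whether the running Python is new enough and ffmpeg is on PATH."""
    return {
        "python_ok": sys.version_info >= min_python,
        "ffmpeg_found": shutil.which("ffmpeg") is not None,
    }


print(check_prerequisites())
```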

### Run setup.bat

Run setup.bat in the project folder and wait for the virtual environment to be set up.
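
For readers who want to see what setup.bat does without opening it, the sketch below lists its three commands as Python argument lists. It only builds and prints them; nothing is installed. `setup_commands` is a hypothetical helper, and the Torch version and index URL are copied from setup.bat as of this commit.

```python
from pathlib import PureWindowsPath


def setup_commands(venv_dir="venv"):
    """Return the commands setup.bat runs, in order (dry run, nothing executed)."""
    pip = str(PureWindowsPath(venv_dir) / "Scripts" / "pip")
    return [
        # 1. Create the virtual environment.
        ["python", "-m", "venv", venv_dir],
        # 2. Install a CUDA 11.6 build of Torch.
        [pip, "install", "torch==1.13.1", "torchvision", "torchaudio",
         "--extra-index-url", "https://download.pytorch.org/whl/cu116"],
        # 3. Install the remaining dependencies.
        [pip, "install", "-U", "-r", "requirements.txt"],
    ]


for cmd in setup_commands():
    print(" ".join(cmd))
```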

### Run the project

Run run.bat in the project folder. If no error is reported, the GUI window appears.

## Folders

- model - where models are stored
- output - default output folder
- venv - virtual environment folder
...
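
Alongside these folders, main.py stores its settings in a config.json file in the project root. As an illustration, here is one way such a file could be read so that keys missing from an older file fall back to defaults. `load_config` and `DEFAULTS` are hypothetical names; the keys and default values mirror the `config` dictionary in main.py.

```python
import json
import os

# Defaults mirroring the config dictionary in main.py.
DEFAULTS = {
    "outputDir": os.path.join(os.getcwd(), "output"),
    "usingModel": 2,
    "usingDevice": 0,
    "transcribeLanguage": 0,
    "autoCheckVersion": True,
    "prompt": "",
}


def load_config(path="config.json"):
    """Load settings from path, falling back to DEFAULTS for missing keys."""
    config = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path, "r", encoding="utf8") as f:
            config.update(json.load(f))
    return config
```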

## Tech stack

- Python
- tkinter
- ttkbootstrap

## Contact

You can reach me via:

- [Email: 2.jerry32262686@gmail.com](mailto:2.jerry32262686@gmail.com)
...

## License
This project is under the MIT License. See [LICENSE](https://github.com/ADT109119/WhisperGUI/blob/main/LICENSE) for further details.


================================================
FILE: main.py
================================================
import os
import easygui
import tkinter as tk
from tkinter import filedialog
import tkinter.ttk
from tkinter.constants import *
from tkinter import messagebox
import subprocess
import json

import torch
import time
import urllib.request

import ttkbootstrap as ttk
# from ttkbootstrap.constants import *

#easygui.fileopenbox()

import webbrowser

import threading

def callback(url):
    webbrowser.open_new(url)


def selectPhotoFolder():
    outputDir = easygui.diropenbox("字幕存放資料夾")
    if outputDir is None:
        # The dialog was cancelled; keep the current output folder.
        return
    output_dir.delete(0, 'end')
    output_dir.insert(0, outputDir)
    # saveConfig updates the in-memory config and writes config.json.
    saveConfig("outputDir", outputDir)


def selectAudioFile():
    paths = filedialog.askopenfilenames()
    for path in paths:
        displayAudioFilePath.insert(END, path)
    # File = easygui.fileopenbox("選擇音檔檔案")
    # displayAudioFilePath.delete(0, 'end')
    # displayAudioFilePath.insert(0, File)
    
def detectAvailableDevice():
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            devices.append(torch.cuda.get_device_name(i))
        
    usingDevice['value'] = devices
    return

def deviceChange(index, value, op):
    #print("%s %s %s"%(index, value, op))
    dev = devices.index(usingDevice.get())
    config["usingDevice"] = dev
    saveConfig("usingDevice", dev)
    # with open('config.json', 'w', encoding='utf8') as f:
    #     json.dump(config, f)

    if dev > 0:
        print("cuda:%s"%(dev-1))
    else:
        print("cpu")

def deviceDecode():
    dev = devices.index(usingDevice.get())
    if dev > 0:
        return("cuda:%s"%(dev-1))
    else:
        return("cpu")

def languageChange(index, value, op):
    transcribeLanguage = languages.index(usingLanguage.get())
    config["transcribeLanguage"] = transcribeLanguage
    saveConfig("transcribeLanguage", transcribeLanguage)
    # with open('config.json', 'w', encoding='utf8') as f:
    #     json.dump(config, f)

def modelChange(index, value, op):
    model = models.index(usingModel.get())
    config["usingModel"] = model
    saveConfig("usingModel", model)
    # with open('config.json', 'w', encoding='utf8') as f:
    #     json.dump(config, f)

def versionCheck(ori): # fetch app version
    try:
        url = 'https://raw.githubusercontent.com/ADT109119/WhisperGUI/main/version.txt'
        # Use a timeout so a stalled connection cannot block startup indefinitely.
        response = urllib.request.urlopen(url, timeout=10)
        # Strip surrounding whitespace so a trailing newline in version.txt
        # does not make the comparison below fail.
        fetchVersion = response.read().decode('utf-8').strip()
        if fetchVersion != "ver 1.9.1":
            checkVisitGithub = messagebox.askquestion(title="有新版本", message="目前最新版本為%s\n請問您是否想前往GitHub下載最新版本"%(fetchVersion))
            if checkVisitGithub == 'yes':
                callback("https://github.com/ADT109119/WhisperGUI")
        elif ori == 1:
            messagebox.showinfo(title="訊息", message="目前版本為最新")

    except Exception:
        print("無法獲取版本資訊")

def saveConfig(key, value):
    global config
    config[key] = value
    with open('config.json', 'w', encoding='utf8') as f:
        json.dump(config, f)

def start_process():
    t = threading.Thread(target=process)
    t.start()

def process():
    # File = displayAudioFilePath.get(0)
    if displayAudioFilePath.size() == 0:
        messagebox.showerror(title="錯誤", message='未選擇音檔')
        return 0

    start = time.time()

    baseCommandStr = "venv\\Scripts\\whisper"
    
    for i in range(displayAudioFilePath.size()):
        path = displayAudioFilePath.get(i)
        print(path)
        commandStr = baseCommandStr + ' "%s"'%path

        languageInput = usingLanguage.get()

        if languageInput != "自動偵測":
            # Quote the name: some entries, such as "Haitian Creole", contain a space.
            commandStr = commandStr + ' --language "%s" '%languageInput

        deviceInput = deviceDecode()
        commandStr = commandStr + " --device %s --fp16 False"%deviceInput

        commandStr = commandStr + " --model %s "%usingModel.get()

        commandStr = commandStr + " -f srt"

        output_dirInput = output_dir.get()

        if outputToTheSamePathAsInputVar.get()=="1":
            # Write the subtitles next to the input file.
            output_dirInput = os.path.dirname(path)

        if output_dirInput != "":
            commandStr = commandStr + ' --output_dir "%s" '%output_dirInput

        commandStr = commandStr + " --model_dir %s "%("model")

        if translateToEnglishVar.get() == '1':
            commandStr = commandStr + " --task %s "%("translate")


        if initial_prompt.get() != "":
            commandStr = commandStr + ' --initial_prompt "%s" '%initial_prompt.get()

        # print(os.system("echo %s"%commandStr))
        # print(commandStr)
        processButton["state"] = "disabled"
        
        # Run whisper and wait for it to finish before starting the next file.
        subprocess.run(commandStr)
    

    end = time.time()

    messagebox.showinfo(title="訊息", message="處理完成\n花費時間%.2f秒"%(end-start))
    processButton["state"] = "normal"
    saveConfig("prompt", initial_prompt.get())
    #outputPreviewVar.set(out)
    #print(out)

# config
config = {
    "outputDir": os.getcwd() + "\\output",
    "usingModel": 2,
    "usingDevice": 0,
    "transcribeLanguage": 0,
    "autoCheckVersion": True,
    "prompt": ""
}


if not os.path.exists(".\\config.json"):
    with open('config.json', 'w', encoding='utf8') as f:
        json.dump(config, f)
else:
    with open('config.json', 'r', encoding='utf8') as f:
        # Merge onto the defaults so that keys missing from an older
        # config.json do not raise KeyError later.
        config.update(json.load(f))

if config["autoCheckVersion"]:
    versionCheck(0)


heightFix_1 = 70

window = tk.Tk()
window.title('WhisperGUI By The Walking Fish')
window.geometry('580x330')
window.resizable(False, False)

label1 = tk.Label(text='選擇音檔')
label1.place(x=0, y=10)
displayAudioFilePath = tk.Listbox(width=60, height=5)
displayAudioFilePath.place(x=80, y=10)
selectAudioFileButton = ttk.Button(text='＋添加', command=selectAudioFile)
selectAudioFileButton.place(x=510, y=20)
deleteAudioFileButton = ttk.Button(text='－刪除', bootstyle='danger', command=lambda x=displayAudioFilePath: x.delete("active"))
deleteAudioFileButton.place(x=510, y=50)


label2 = tk.Label(text='字幕存放資料夾')
label2.place(x=0, y=30+heightFix_1)
output_dir = tk.Entry(width=55)
output_dir.place(x=120, y=30+heightFix_1)
output_dir.insert(0, config["outputDir"])
selectPhotoPathButton = tk.Button(text='....', command=selectPhotoFolder)
selectPhotoPathButton.place(x=500, y=30+heightFix_1)

outputToTheSamePathAsInputVar = tk.StringVar()
outputToTheSamePathAsInput = tk.Checkbutton(text="檔案輸出到與個別輸入檔案相同位置", variable=outputToTheSamePathAsInputVar, onvalue="1", offvalue="0")
outputToTheSamePathAsInput.deselect()
outputToTheSamePathAsInput.place(x=300, y=60+heightFix_1)

label_usingModel = tk.Label(text='使用模型')
label_usingModel.place(x=0, y=60+heightFix_1)
var = tk.StringVar()
var.trace_add("write", modelChange)
usingModel = tkinter.ttk.Combobox(window, textvariable=var)
models = ['tiny', 'base', 'small', 'medium', 'large', 'large-v1', 'large-v2', 'large-v3', 'turbo']
usingModel['value'] = models
usingModel.current(config["usingModel"])
usingModel.place(x=60, y=60+heightFix_1)

label_usingDevice = tk.Label(text='使用裝置')
label_usingDevice.place(x=0, y=90+heightFix_1)
deviceVar = tk.StringVar()
deviceVar.trace_add("write", deviceChange)
usingDevice = tkinter.ttk.Combobox(window, textvariable=deviceVar)
devices = ['cpu']
usingDevice['value'] = devices
detectAvailableDevice()
usingDevice.current(config["usingDevice"])
usingDevice.place(x=60, y=90+heightFix_1)

label_language = tk.Label(text='辨識語言')
label_language.place(x=0, y=120+heightFix_1)
languageVar = tk.StringVar()
languageVar.trace_add("write", languageChange)
usingLanguage = tkinter.ttk.Combobox(window, textvariable=languageVar)
languages = ['自動偵測', "Afrikaans","Albanian","Amharic","Arabic","Armenian","Assamese","Azerbaijani","Bashkir","Basque","Belarusian","Bengali","Bosnian","Breton","Bulgarian","Burmese","Castilian","Catalan","Chinese","Croatian","Czech","Danish","Dutch","English","Estonian","Faroese","Finnish","Flemish","French","Galician","Georgian","German","Greek","Gujarati","Haitian","Haitian Creole","Hausa","Hawaiian","Hebrew","Hindi","Hungarian","Icelandic","Indonesian","Italian","Japanese","Javanese","Kannada","Kazakh","Khmer","Korean","Lao","Latin","Latvian","Letzeburgesch","Lingala","Lithuanian","Luxembourgish","Macedonian","Malagasy","Malay","Malayalam","Maltese","Maori","Marathi","Moldavian","Moldovan","Mongolian","Myanmar","Nepali","Norwegian","Nynorsk","Occitan","Panjabi","Pashto","Persian","Polish","Portuguese","Punjabi","Pushto","Romanian","Russian","Sanskrit","Serbian","Shona","Sindhi","Sinhala","Sinhalese","Slovak","Slovenian","Somali","Spanish","Sundanese","Swahili","Swedish","Tagalog","Tajik","Tamil","Tatar","Telugu","Thai","Tibetan","Turkish","Turkmen","Ukrainian","Urdu","Uzbek","Valencian","Vietnamese","Welsh","Yiddish","Yoruba"]
usingLanguage['value'] = languages
usingLanguage.current(config["transcribeLanguage"])
usingLanguage.place(x=60, y=120+heightFix_1)

translateToEnglishVar = tk.StringVar()
translateToEnglish = tk.Checkbutton(text="將輸出字幕翻譯為英文", variable=translateToEnglishVar, onvalue="1", offvalue="0")
translateToEnglish.deselect()
translateToEnglish.place(x=0, y=150+heightFix_1)

initial_prompt_label = tk.Label(text='內容提示詞')
initial_prompt_label.place(x=0, y=180+heightFix_1)
initial_prompt_var = tk.StringVar(value=config.get("prompt"))
initial_prompt = tk.Entry(width=55, textvariable=initial_prompt_var)
initial_prompt.place(x=80, y=180+heightFix_1)

label_copyright = tk.Label(text='MIT License')
label_copyright.place(x=0, y=210+heightFix_1)
label_author = tk.Label(text='製作: The Walking Fish')
label_author.bind("<Button-1>", lambda e: callback("https://www.youtube.com/@the_walking_fish"))
label_author.place(x=0, y=230+heightFix_1)

processButton = tk.Button(text='執行', width=20, command=start_process)
processButton.place(anchor='center', x=290, y=240+heightFix_1)

# menu
menu = tk.Menu(window)
settingMenu = tk.Menu(menu)
autoCheckVar = tk.BooleanVar()
autoCheckVar.trace_add("write", lambda index, value, op: saveConfig("autoCheckVersion", autoCheckVar.get()))
autoCheckVar.set(config["autoCheckVersion"])
settingMenu.add_checkbutton(label="自動檢查版本", variable=autoCheckVar)
settingMenu.add_separator()
settingMenu.add_command(label="檢查版本", command=lambda: versionCheck(1))
menu.add_cascade(label="setting", menu=settingMenu)
window.config(menu=menu)

window.mainloop()


================================================
FILE: requirements.txt
================================================
easygui
ttkbootstrap
openai-whisper
requests

================================================
FILE: run.bat
================================================
venv\Scripts\python main.py


================================================
FILE: setup.bat
================================================
@echo off
goto :DOES_PYTHON_EXIST


:DOES_PYTHON_EXIST
python -V | find /v "Python" >NUL 2>NUL && (goto :PYTHON_DOES_NOT_EXIST)
python -V | find "Python"    >NUL 2>NUL && (goto :PYTHON_DOES_EXIST)
goto :EOF

:PYTHON_DOES_NOT_EXIST
echo Python is not installed on your system.
echo Now opening the download URL.
start "" "https://www.microsoft.com/store/productId/9PJPW5LDXLZ5"
PAUSE
goto :EOF

:PYTHON_DOES_EXIST
:: This will retrieve Python 3.8.0 for example.
for /f "delims=" %%V in ('python -V') do @set ver=%%V
echo Congrats, %ver% is installed...
python -m venv .\venv
venv\Scripts\pip install torch==1.13.1 torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
venv\Scripts\pip install -U -r requirements.txt

PAUSE
goto :EOF


================================================
FILE: update.bat
================================================
venv\Scripts\pip install -U openai-whisper

================================================
FILE: version.txt
================================================
ver 1.9.1

================================================
FILE: 使用說明.txt
================================================
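1. Run setup.bat first. It automatically creates the virtual environment and installs the required libraries.
2. Run run.bat to start using the program.
3. Find the resulting files in the output folder.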