English | 中文
This is a Gradio library and research repository that combines SOTA AI applications. It aims to help you achieve anything: all you need to do is provide prompts and make one click, and the SOTA models do the rest. You don't have to install every feature; install only the ones you want to use.

**Motivation**

Currently, the “Anything” AI intelligent agent backend is being built up for both engineering and research. This requires more multi-modal tasks and zero-shot models, not only to provide a multi-modal AI processing web UI, but also to gradually enrich its functionality.

The goal is that you can accomplish anything through this project. The sections below cover the development progress and plan. The final, complete intelligent agent, combined with the local GPT repository, is intended to call any AI task for you. Questions, stars, and forks are welcome, and you can also become a developer.

## Feature

1. (YOCO) It is not just a tool that can prompt anything

🔥 Data Engine:

YOCO relies on integrated multimodal models and auxiliary generators such as ChatGPT. Of course, it is not omnipotent. Through effective fully automatic annotation and Stable Diffusion series methods that produce and control data to meet the requirements, we complete the “data engine” and generate customized label formats that facilitate the training of conventional models. In addition, we will introduce video, audio, and 3D annotations in the future.

🔥 Model Training:

For each model, we not only use it, but also read its paper and fine-tuning methods and communicate with the original author, in order to try development work for improvement and better training. We use fine-tuned large models and the customized label formats generated by YOCO to train conventional models more efficiently.
2. 🚀 Interactive content creation and visual GPT
Integrates a variety of GPT models, currently mainly through the ChatGPT interface, and uses the open-source Tsinghua VisualGLM to deploy and fine-tune a local GPT, as well as to try improving the model structure. Dialogue and content creation are carried out through multimodal application tools.
A simple example (ASR -> LLM -> TTS -> Audio2Face app):
https://github.com/positive666/Prompt-Can-Anything/assets/28972473/c9cc64af-939d-480f-a684-08d8db34b25f
3. ⭐ 3D && 2D Avatar (coming soon)
Complete a role design interaction through a 3D engine combined with multimodal tasks such as GPT;
Complete a role design interaction through the SadTalker open-source project combined with multimodal tasks such as GPT.
4. 🔥🔥🚀 Unlimited potential “Anything”
Through continuous creativity and accumulation, we will integrate and learn from SOTA AI. We will record each integrated model and provide a detailed explanation and summary of it in an article. The author's AI-related knowledge and engineering experience will eventually be summarized into the local large model (this is the final feature and is still planned).
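As a rough illustration of the "customized label formats" mentioned under the Data Engine item of feature 1, the sketch below converts one detected box from pixel corner coordinates into a YOLO-style text label line (class index followed by normalized center x, center y, width, and height). It is similar in spirit to the normalized xywh lines that app.py writes when `save_txt` is enabled, but the helper name `to_yolo_line` and the sample numbers are assumptions made for this example and are not part of the repository.

```python
def to_yolo_line(cls_index, box_xyxy, img_w, img_h):
    """Return 'cls cx cy w h' with coordinates normalized to [0, 1].

    box_xyxy is (x1, y1, x2, y2) in pixels. Hypothetical helper for illustration.
    """
    x1, y1, x2, y2 = box_xyxy
    cx = (x1 + x2) / 2 / img_w   # normalized box center, x
    cy = (y1 + y2) / 2 / img_h   # normalized box center, y
    w = (x2 - x1) / img_w        # normalized box width
    h = (y2 - y1) / img_h        # normalized box height
    return f"{cls_index} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 200x200 pixel box inside a 400x500 image, labeled as class 0.
line = to_yolo_line(0, (100, 50, 300, 250), img_w=400, img_h=500)
print(line)
```

Normalizing by the image size keeps the label valid if the image is later resized, which is why this layout is common for training conventional detectors.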
| # | name | backbone | Data | Checkpoint | model-config |
|---|---|---|---|---|---|
| 1 | Tag2Text-Swin | Swin-Base | COCO, VG, SBU, CC-3M, CC-12M | Download link | |
| 2 | Segment-anything | vit | | Download link, Download link, Download link | |
| 3 | Lama | FFC | | Download link | |
| 4 | GroundingDINO-T | Swin-T | O365, GoldG, Cap4M | Github link, HF link | link |
| 5 | GroundingDINO-B | Swin-B | COCO, O365, GoldG, Cap4M, OpenImage, ODinW-35, RefCOCO | Github link, HF link | link |
"Prompt" control models output, example
2. webui(all)
```bash
python app.py
```
2.1 audio2face with LLM model (Beta)
In fact, the ASR, TTS, and LLM components are all freely replaceable.
This is a simple example that supports ChatGLM and ChatGPT (you can use any LLM, but you need to adapt it yourself).
Start ASR & TTS with audio2face:
You need to install the Audio2Face app in Omniverse; see
https://www.nvidia.cn/omniverse/
Step 1. In Audio2Face, open a demo and choose a Player. The TensorRT engine is built automatically (GTX 10xx GPUs are not supported). The latest version supports Chinese!
Get the Player's Prim path.

Step 2. In the web UI, set your Prim path as "Avatar_instance_A" in config_private.py, then click "start system" and "Speech_system".
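
Because the ASR, LLM, and TTS stages are described above as freely replaceable, one way to picture the chain is as three plain callables wired together. The sketch below is only an illustration under that assumption: `SpeechPipeline`, `fake_asr`, `fake_llm`, and `fake_tts` are hypothetical stand-ins, not names from this repository, and real stages (for example Whisper, ChatGLM or ChatGPT, and edge-tts) would take the place of the stubs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpeechPipeline:
    """Chains ASR -> LLM -> TTS; each stage is any callable with a matching signature."""
    asr: Callable[[bytes], str]   # audio in, recognized text out
    llm: Callable[[str], str]     # user text in, reply text out
    tts: Callable[[str], bytes]   # reply text in, synthesized audio out

    def run(self, audio: bytes) -> bytes:
        text = self.asr(audio)
        reply = self.llm(text)
        return self.tts(reply)


# Stub stages so the sketch runs without any model or network access.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str) -> str:
    return f"You said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = SpeechPipeline(asr=fake_asr, llm=fake_llm, tts=fake_tts)
result = pipeline.run(b"hello")
print(result)
```

Swapping a stage then means passing a different callable, and the audio returned by `run` is what would be pushed on to Audio2Face.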
## 🔨To Do List
- [x] Release demo and code.
- [x] web ui demo
- [x] Support ChatGPT/VISUALGLM/ASR/TTS
- [x] YOCO labeling fine-tuning of VISUALGLM demo[next week]
- [x] 3D && 2D avatar
- [ ] Complete the planned AI combination “Anything”
- [ ] Fine-tune the segmentation and ground detectors of SAM, and expand the input control of SAM
- [ ] Release training methods
- [ ] Knowledge cloning
## :cupid: Acknowledgements
- [gpt_academic](https://github.com/binary-husky/gpt_academic)
- [Segment Anything](https://github.com/facebookresearch/segment-anything)
- [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO)
- [Tag2text](https://github.com/xinyu1205/Tag2Text)
- [SadTalker](https://github.com/OpenTalker/SadTalker)
- [lama](https://github.com/advimman/lama)
- [VisualGLM-6B](https://github.com/THUDM/VisualGLM-6B.git)
Thanks for their great work!
================================================
FILE: README_zh.md
================================================
# Prompt-Can-Anything
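This is a web application library and research repository that combines SOTA AI. It aims to help you do anything: just provide a prompt and make one click. Through the prompts and creativity of SOTA models, you can do anything.
**Motivation**
Current state: the AI agent backend "Anything" (安尼森) is being built up for engineering and research. This requires more multimodal tasks and zero-shot models, not only to provide a multimodal AI processing web UI, but also to gradually enrich its functionality.
Goal: you can accomplish anything through it! Let's look at the development progress and plan of this project in detail. The final, complete agent, combined with a locally maintained GPT, can help you call any AI task! Questions, stars, and forks are welcome, as is a helping hand~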
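## Features
1. (YOCO) It is not just a tool that can prompt anything
🔥 Data engine:
In addition, we will introduce video, audio, and 3D annotation in the future. YOCO relies on integrated multimodal models and auxiliary generation by GPT and similar models. Of course, it is not omnipotent. Through effective fully automatic annotation and Stable Diffusion series methods that produce and control data to meet the requirements, we complete the "data engine" and generate customized label formats that make it easier to train conventional models.
🔥 Model training:
For each model, we not only use it, but also read its paper and fine-tuning methods and talk with the original author, trying development work for improvement and better training. We fine-tune large models and use the customized label formats generated by YOCO to train conventional models more efficiently.
2. 🚀 Interactive content creation and visual && speech GPT
Integrates a variety of GPT models, currently mainly through the ChatGPT interface. Using the open-source Tsinghua VisualGLM, we deploy and fine-tune a local GPT and try to improve the model structure. Dialogue and content creation are carried out through multimodal application tools, with support for speech recognition, speech synthesis, and sending audio to Audio2Face.
This is the simplest example:
https://github.com/positive666/Prompt-Can-Anything/assets/28972473/c9cc64af-939d-480f-a684-08d8db34b25f
3. ⭐ Role-play applications: 3D && 2D virtual humans (in development)
Complete a role design interaction by combining a 3D engine with multimodal tasks such as GPT;
Complete a role design interaction by combining the SadTalker open-source project with multimodal tasks such as GPT.
4. 🔥🔥🚀 Unlimited potential "Anything" (安尼森)
Through continuous creativity and accumulation, and the integration and study of SOTA AI, we will record each integrated model, explain it in detail, and summarize it in an article.
All of the author's AI-related knowledge and engineering experience will be summarized into the local large model (this is the final feature and is still planned).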
| # | Name | Backbone | Data | Weights | Model config |
|---|---|---|---|---|---|
| 1 | Tag2Text-Swin | Swin-Base | COCO, VG, SBU, CC-3M, CC-12M | Download link | |
| 2 | Segment-anything | vit | | Download link, Download link, Download link | |
| 3 | Lama | FFC | | Download link | |
| 4 | GroundingDINO-T | Swin-T | O365, GoldG, Cap4M | Github link, HF link | link |
"Prompt" control models output, example
python auto_label_demo.py --source --save-txt --save-mask --save-xml --save_caption
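
The flags in the command above could be parsed with `argparse` roughly as follows. This is a guess at the interface based only on the flag names shown; the actual `auto_label_demo.py` may define types, defaults, and further options differently. The `data/images` default is borrowed from `Auto_run` in app.py, and the help strings are assumptions.

```python
import argparse


def build_parser():
    # Flag names are taken from the command above; everything else is assumed.
    parser = argparse.ArgumentParser(description="Auto-labeling demo (sketch)")
    parser.add_argument("--source", default="data/images", help="image file or directory")
    parser.add_argument("--save-txt", action="store_true", help="write *.txt labels")
    parser.add_argument("--save-mask", action="store_true", help="write mask images")
    parser.add_argument("--save-xml", action="store_true", help="write *.xml labels")
    parser.add_argument("--save_caption", action="store_true", help="write captions")
    return parser


# argparse maps "--save-txt" to the attribute name "save_txt".
args = build_parser().parse_args(["--source", "data/images", "--save-txt", "--save-xml"])
print(args.source, args.save_txt, args.save_mask, args.save_xml, args.save_caption)
```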
2. webui
```bash
python app.py
```
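2. Speech LLM && driving A2F
This is a simple example. In practice the ASR, TTS, and LLM components can be swapped freely as long as you have basic development skills. The A2F service is driven by the language model and speech. You need to install the Omniverse software and the Audio2Face app, and the GPU must not be of the older Pascal architecture. For details, see https://www.nvidia.cn/omniverse/
Step 1. In Omniverse, click the example shown in the figure below and install a demo Player. It builds the TensorRT engine automatically. You can then get the Player's Prim Path as shown in the figure below.

Step 2. Once the program is running, copy the path obtained above into "Avatar_instance_A" in config_private. In the web UI, click 'start system' as shown in the figure below, then click to load "Speech_system" to start speech mode. Note that TTS is a network service.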
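
As a small illustration of Step 2, the fragment below shows what the relevant entry in `config_private.py` might look like, together with a sanity check on the Prim path. The example path is the default that appears in a2f.py. `validate_prim_path` is a hypothetical helper added for this sketch, and the rule it applies (an absolute, slash-separated path with no empty segments) is an assumption rather than something this repository enforces.

```python
# Entry as it might appear in config_private.py (value is the a2f.py default).
Avatar_instance_A = "/World/audio2face/PlayerStreaming"  # Prim path copied from Audio2Face


def validate_prim_path(path: str) -> bool:
    """Hypothetical check: absolute path, no trailing slash, no empty segments."""
    if not path.startswith("/") or path.endswith("/"):
        return False
    return all(segment for segment in path.split("/")[1:])


print(validate_prim_path(Avatar_instance_A))
```

A check like this catches a path pasted without its leading slash before any audio is sent.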
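## 🔨 To Do List
- [x] Release the initial version
- [x] Web UI adjustments
- [x] Support ChatGPT/VISUALGLM/ASR/TTS
- [x] YOCO one-click labeling and fine-tuning VISUALGLM demo
- [x] 3D && 2D avatar
- [ ] Complete the planned AI combination "Anything" (安尼森)
- [ ] Fine-tune the SAM segmenter and the grounding detector, and extend SAM's input control
- [ ] Release training methods
- [ ] Knowledge cloning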
## Acknowledgements
- [gpt_academic](https://github.com/binary-husky/gpt_academic)
- [Segment Anything](https://github.com/facebookresearch/segment-anything)
- [Grounding DINO](https://github.com/IDEA-Research/GroundingDINO)
- [Tag2text](https://github.com/xinyu1205/Tag2Text)
- [SadTalker](https://github.com/OpenTalker/SadTalker)
- [lama](https://github.com/advimman/lama)
- [VisualGLM-6B](https://github.com/THUDM/VisualGLM-6B.git)
Thanks for their great work!
================================================
FILE: a2f.py
================================================
import argparse
import functools
import os
import yaml
import numpy as np
import ffmpeg
import grpc
import audio2face_pb2
import audio2face_pb2_grpc
from pydub import AudioSegment
from pydub.silence import split_on_silence
import soundfile
from audio2face_streaming_utils import push_audio_track_stream,push_audio_track,push_stream
import pyaudio
import wave
from queue import Queue
import time
import whisper
import requests
#from llm_cards.bridge_chatgpt import predict
from config_private import API_KEY
import uuid
import re
import asyncio
import threading
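# Event used for synchronization between threads
send_event = threading.Event()


# Cut a segment out of an audio clip; times are given in seconds
def get_part_wav(sound, start_time, end_time, part_wav_path):
    save_path = os.path.dirname(part_wav_path)
    # Skip the mkdir when the path has no directory part; exist_ok avoids a race
    if save_path:
        os.makedirs(save_path, exist_ok=True)
    # pydub slices AudioSegment objects in milliseconds
    start_time = int(start_time) * 1000
    end_time = int(end_time) * 1000
    word = sound[start_time:end_time]
    word.export(part_wav_path, format="wav")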
def crop_wav(path, crop_len):
    """Split every .wav file in `path` into pieces of `crop_len` seconds."""
    for src_wav_path in os.listdir(path):
        wave_path = os.path.join(path, src_wav_path)
        if not wave_path.endswith('.wav'):
            continue
        # Read the frame count and sample rate, closing the file afterwards
        with wave.open(wave_path) as file:
            params = file.getparams()
        # Duration in whole seconds (a trailing partial second is dropped)
        t = int(params.nframes / params.framerate)
        print('Total duration: %d s' % t)
        # Load the audio
        sound = AudioSegment.from_wav(wave_path)
        for start_time in range(0, t, crop_len):
            save_path = os.path.join(path, os.path.basename(wave_path)[:-4], str(uuid.uuid1()) + '.wav')
            get_part_wav(sound, start_time, start_time + crop_len, save_path)
from concurrent.futures import ThreadPoolExecutor


def process_chunk(model, chunk, detect_language):
    # make log-Mel spectrogram and move to the same device as the model
    mel = whisper.log_mel_spectrogram(chunk).to(model.device)
    # default to Chinese unless language detection is requested
    speech_language = 'zh'
    if detect_language:
        _, probs = model.detect_language(mel)
        speech_language = max(probs, key=probs.get)
    # decode the audio
    options = whisper.DecodingOptions()
    result = whisper.decode(model, mel, options)
    return result.text, speech_language
def speech_recognition(inputs, model, stream_model=False, detect_language=False):
    """Transcribe audio with Whisper in 30-second chunks.

    `inputs` is a file path, or a (sample_rate, samples) tuple when
    `stream_model` is True.
    """
    all_result = ''
    speech_language = 'zh'
    results = []
    if not stream_model:
        audio, sr = soundfile.read(inputs, dtype='float32')
    else:
        sr, audio = inputs
        # Scale 16-bit PCM samples to [-1, 1). The original divisor 65538
        # looks like a typo; 32768 is the int16 full-scale value.
        audio = (audio / 32768.0).astype(np.float32)
    # NOTE: Whisper expects 16 kHz mono audio; other rates need resampling first.
    chunk_size = sr * 30
    with ThreadPoolExecutor() as executor:
        for i in range(0, len(audio), chunk_size):
            chunk_end = min(i + chunk_size, len(audio))
            chunk = whisper.pad_or_trim(audio[i:chunk_end])
            # submit the chunk to the thread pool for processing
            results.append(executor.submit(process_chunk, model, chunk, detect_language))
        # collect the recognized text and the detected language in order
        for result in results:
            text, language = result.result()
            all_result += text
            speech_language = language
    return all_result, speech_language
Avatar_instance_A='/World/audio2face/PlayerStreaming'
a2f_url = 'localhost:50051' # The audio2face url by default
sample_rate_Omniverse = 22050 # Audio frame rate
# Recording parameters
CHUNK = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 16000
RECORD_SECONDS =5
audio_file = "F:\\VoiceprintRecognition-Pytorch-develop\\error001.wav"
buffer_length=int(RATE / CHUNK * RECORD_SECONDS)
record_file='record.wav'
p = pyaudio.PyAudio()
def mic_audio(record_file="record.wav"):
    """Record from the microphone until the 's' key is pressed."""
    import keyboard
    # Open the recording stream
    stream = p.open(
        input_device_index=1,
        format=FORMAT,
        channels=CHANNELS,
        rate=RATE,
        input=True,
        frames_per_buffer=CHUNK)
    print("Recording...")
    frames = []
    while True:
        data = stream.read(CHUNK)
        frames.append(data)
        if keyboard.is_pressed('s'):
            break
    stream.stop_stream()
    stream.close()
    # The shared PyAudio instance `p` is left open so this function can be
    # called again; terminating it here would make later recordings fail.
    with wave.open(record_file, 'wb') as wf:
        wf.setnchannels(CHANNELS)
        wf.setsampwidth(p.get_sample_size(FORMAT))
        wf.setframerate(RATE)
        wf.writeframes(b''.join(frames))
    return 'OK'
import edge_tts


async def tts_send(text, onmiverse=False, send_file='voice_dir/send_a2f.wav'):
    if text is None:
        return
    sentences = re.split(r'[!?。: ]', text)
    sentences = [s.strip() for s in sentences if s.strip()]
    # Audio chunks keyed by sentence index so they can be joined in order
    audio_chunks = {}

    async def speak(sentence, worker_id):
        # Synthesize one sentence and collect its audio byte chunks
        communicate = edge_tts.Communicate(sentence, voice='zh-CN-YunxiNeural', rate='+1%', volume='+1%')
        async for package in communicate.stream():
            if package['type'] == 'audio':
                audio_chunks.setdefault(worker_id, []).append(package['data'])

    # Synthesize all sentences concurrently
    await asyncio.gather(*(speak(s, i) for i, s in enumerate(sentences)))
    # Join the per-sentence audio in the original sentence order
    audio_data = b''.join(
        chunk for i in range(len(sentences)) for chunk in audio_chunks.get(i, []))
    with open(send_file, 'wb') as f:
        f.write(audio_data)
    if onmiverse:
        audio_data, samplerate = soundfile.read(send_file, dtype="float32")
        if len(audio_data.shape) > 1:
            audio_data = np.average(audio_data, axis=1)
        push_audio_track_stream(a2f_url, audio_data, samplerate, Avatar_instance_A)
async def tts_a2f(text):
    import edge_tts
    import soundfile as sf
    import numpy as np
    from audio2face_streaming_utils import push_audio_track_stream
    generate_wave = edge_tts.Communicate(text, voice='zh-CN-YunxiNeural', rate='-5%', volume='+1%')
    await generate_wave.save('./voice_dir/send_frame.wav')
    try:
        audio_data, samplerate = sf.read('./voice_dir/send_frame.wav', dtype="float32")
        if len(audio_data.shape) > 1:
            audio_data = np.average(audio_data, axis=1)
        push_audio_track_stream(a2f_url, audio_data, samplerate, Avatar_instance_A)
        print("send done")
        return 'Send Done!'
    except Exception as e:
        # Report the underlying error instead of discarding it
        print(f"Check whether Omniverse is running: {e}")
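# NOTE: this definition shadows the push_stream imported from
# audio2face_streaming_utils at the top of the file.
def push_stream(url, player, dir="voice_dir/send_omniverse.wav"):
    from audio2face_streaming_utils import push_audio_track_stream
    import soundfile
    import numpy as np
    retry = 0
    while True:
        try:
            audio_data, sr = soundfile.read(dir, dtype='float32')
            break
        except Exception:
            print("TTS synthesis is a little slow, waiting...")
            retry += 1
            if retry >= 2:
                raise TimeoutError("TTS output file could not be read")
            print('Retrying')
            # Give the TTS process time to finish writing before retrying
            time.sleep(0.5)
    if len(audio_data.shape) > 1:
        audio_data = np.average(audio_data, axis=1)
    push_audio_track_stream(url, audio_data, sr, player)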
def audio_synthesis(gpt_replying_buffer, url, player):
    import threading
    threading.Thread(target=process_send_stream, args=(gpt_replying_buffer, url, player)).start()


def process_send_stream(gpt_replying_buffer, url, player):
    import subprocess
    dir = "voice_dir/send_omniverse.wav"
    # Pass arguments as a list (no shell) so that quotes or shell
    # metacharacters in the LLM reply cannot break or inject into the command
    cmd = ['edge-tts', '--voice', 'zh-CN-YunxiNeural',
           '--text', gpt_replying_buffer, '--write-media', dir]
    subprocess.run(cmd)
    time.sleep(0.5)
    push_stream(url, player, dir)
def receive_max(q, Text):
    global receive_flag
    receive_flag = True
    sentences = re.split(r'[!?。: ,]', Text)
    sentences = [s.strip() for s in sentences if s.strip()]
    # from VITS import
    while True:
        if len(sentences) > 0:
            #audio_data=vit_tts(sentences.pop(0)
            #audio_data=r'voice_dir/send_frame.wav'
            audio_data = edge_tts.Communicate(sentences.pop(0), voice='zh-CN-YunxiNeural', rate='+1%', volume='+1%')
            q.put((audio_data, True))
            print('done')
        else:
            print('Speech synthesis thread finished')
            receive_flag = False
            break
###--------Thread: collect data, relay the source buffer, then send it------------###
def send_stream2(q):
    global mess
    global receive_flag
    mess = False
    with grpc.insecure_channel(a2f_url) as channel:
        stub = audio2face_pb2_grpc.Audio2FaceStub(channel)

        # NOTE: the yield statements below are commented out, so this function
        # returns None instead of producing a stream of requests.
        def create_generator():
            global mess
            while True:
                if not q.empty():
                    # Take the audio item and its send flag from the queue
                    audio_data, send_flag = q.get()
                    if not send_flag:
                        # TODO: send the audio file
                        print('Sending audio...')
                        audio_data, sr = soundfile.read('voice_dir/send_framex.wav', dtype='float32')
                        if len(audio_data.shape) > 1:
                            audio_data = np.average(audio_data, axis=1)
                        #yield audio2face_pb2.PushAudioStreamRequest(start_marker=Avatar_instance_A)
                        #for i in range(len(audio_data) // sr//10 + 1):
                        #    chunk = audio_data[i * sr//10: i * sr//10+ sr//10]
                        #    yield audio2face_pb2.PushAudioStreamRequest(audio_data=chunk.astype(np.float32).tobytes())
                        push_audio_track_stream(a2f_url, audio_data, sr, Avatar_instance_A)
                        send_flag = True
                        # Reset the event state
                        send_event.clear()
                else:
                    if not receive_flag:
                        print("Sender thread finished")
                        break
                    else:
                        continue

        stub.PushAudioStream(create_generator())
def audio_chatbot(text):
    q = Queue()
    t1 = threading.Thread(target=receive_max, args=(q, text))
    t2 = threading.Thread(target=send_stream2, args=(q,))
    t1.start()
    t2.start()
    # t1.join()
    # t2.join()
    global receive_flag
    while True:
        # Take the audio item and its send flag from the queue
        audio, send_flag = q.get()
        if not send_flag:
            # Put the item back on the queue (sending happens in another thread)
            q.put((audio, False))
            # Set the event to tell the sender thread it may send this audio
            send_event.set()
        if not receive_flag:
            break
if __name__ == "__main__":
    text = "这里是一段较长的文本,需要拆分成多个句子来进行语音合成!句子也可以用问号来结尾吗?\
当然可以。我要实现一个人工智能,这里是一段较长的文本,需要拆分成多个句子来进行语音合成!句子也可以用问号来结尾吗?当然可以。我要实现一个人工智能,但是我需要很多时间和精力完成\
这里是一段较长的文本,需要拆分成多个句子来进行语音合成!句子也可以用问号来结尾吗?当然可以。我要实现一个人工智能,但是我需要很多时间和精力完成"
    # Run the main program
    audio_chatbot(text)
# t1=time.time()
# asyncio.run(tts_send(text))
# print(time.time()-t1)
# # t1 = threading.Thread(target=send_stream)
# t1=time.time()
# #asyncio.run(tts_a2f(text))
# print(time.time()-t1)
================================================
FILE: app.py
================================================
from model_cards.autoback import AutoBackend
import argparse
import os
import platform
import sys
from pathlib import Path
import numpy as np
import torch
import torch.backends.cudnn as cudnn
import matplotlib.pyplot as plt
from PIL import Image,ImageDraw,ImageFont
from utils.ops import (LOGGER, Profile, check_file, check_requirements, colorstr, cv2,
dilate_mask, increment_path , scale_boxes, xyxy2xywh,save_format)
from utils.plot import Annotator, save_one_box,show_box,show_mask,save_mask_data,Draw_img
from config_private import *
from llm_cards.bridge_all import predict_all,talk_all
from llm_cards.bridge_chatgpt import Talk_with_app
from llm_cards.core_functional import get_core_functions
from utils.toolbox import format_io, find_free_port, on_file_uploaded, on_report_generated, get_conf, ArgsGeneralWrapper, load_chat_cookies, DummyWith
from utils.torch_utils import select_device
from utils import VID_FORMATS,IMG_FORMATS,write_categories
import gradio as gr
import random
import json
import multiprocessing as mp
import asyncio
import concurrent.futures
from utils.colorful import *
functional = get_core_functions()
VisualGLM_dir=f"VisualGLM_6B"
sys.path.append(VisualGLM_dir)
FILE = Path(__file__).resolve()
ROOT = FILE.parents[0] # root directory
if str(ROOT) not in sys.path:
sys.path.append(str(ROOT)) # add ROOT to PATH
ROOT = Path(os.path.relpath(ROOT, Path.cwd())) # relative
# Class name -> class index, and class name -> RGB color, shared across runs.
# (`global` statements at module level have no effect, so they are dropped.)
categories = {}
category_colors = {}
# Initial class ids
class_ids = []
speech_AI = {'asr': {'whisper': None}, 'tts': {'tts_VITS': None, 'tts_edge': None}}  # speech models
models_config = {'tag2text': None, 'ram': None, 'lama': None, 'sam': None, 'grounded': None, 'sd': None,  # cv with text
                 'visual_glm': None, 'trans_zh': None, 'gligen': None}
NUM_WORKERS = 1
JSON_DATASETS = []
operation_running = False
def toggle_operation(flag):
    import whisper
    # `keyboard` is only imported inside a2f.mic_audio, so it cannot be
    # imported from a2f; import the package directly instead.
    import keyboard
    from a2f import speech_recognition, mic_audio
    if speech_AI['asr']['whisper'] is None:
        speech_AI['asr']['whisper'] = whisper.load_model("small", download_root="weights")
    print("ASR loaded, start recording!")
    text = []
    speech_text = ''
    while True:
        # Press 'q' to start recording; mic_audio stops when 's' is pressed
        if keyboard.is_pressed('q'):
            mic_audio('voice_dir/send_asr.wav')
            speech_text, __ = speech_recognition('voice_dir/send_asr.wav', speech_AI['asr']['whisper'], False)
            break
    print(speech_text)
    text.append(speech_text)
    return text
async def sadtalker_demo(checkpoint_path,config_path,source_image,
driven_audio,
preprocess_type,
is_still_mode,
enhancer,
batch_size,
size_of_image,
pose_style,
exp_weight):
sys.path.append('SadTalker')
from SadTalker.app import SadTalker
sadtaker_model=SadTalker(checkpoint_path, config_path, lazy_load=True)
output = await asyncio.to_thread(sadtaker_model.test, source_image,
driven_audio,
preprocess_type,
is_still_mode,
enhancer,
batch_size,
size_of_image,
pose_style,
exp_weight)
return output
def train_visualGLM(name,model_size,mode,train_iters,resume_data,
max_source_length,max_target_length,lora_rank,layer_range_s,layer_range_e,pre_seq_len,
train_data,valid_data,distributed_backend,lr_decay_style,warmup,
checkpoint_activations,save_interval,eval_interval,save_path,
split,eval_iters,eval_batch_size ,zero_stage,
lr,batch_size,accumulation_steps,method_type):
model_args=[max_source_length,max_target_length,lora_rank,layer_range_s,layer_range_e,pre_seq_len]
gpt_option=[name,int(model_size),mode,int(train_iters),resume_data, #23
train_data,valid_data,distributed_backend,lr_decay_style,warmup,
checkpoint_activations,int(save_interval),int(eval_interval),save_path,
int(split),int(eval_iters),int(eval_batch_size),int(zero_stage),
lr,int(batch_size),int(accumulation_steps)]
processes = []
for i in range(NUM_WORKERS):
p = mp.Process(target=start_finetuning_process, args=(gpt_option,model_args,method_type))
p.start()
processes.append(p)
for p in processes:
p.join()
return 'OK'
# The exact parameters still need fixing and tuning
def start_finetuning_process(gpt_option,model_args,method_type):
print('fine subprocess start')
script_path = os.path.abspath(__file__)
script_dir = os.path.dirname(script_path)
print(script_dir+'/'+VisualGLM_dir)
main_dir = os.path.dirname(script_dir)
model_args = f'--max_source_length {model_args[0]} --max_target_length {model_args[1]} --lora_rank {model_args[2]} --layer_range {model_args[3]} {model_args[4]} --pre_seq_len {model_args[5]}'
options_nccl = 'NCCL_DEBUG=info NCCL_IB_DISABLE=0 NCCL_NET_GDR_LEVEL=2'
host_file_path = 'hostfile_single'
gpt_option_prefix=f" \
--experiment-name finetune-{gpt_option[0]} \
--model-parallel-size {gpt_option[1]} \
--mode {gpt_option[2]} \
--train-iters {gpt_option[3]} \
--resume-dataloader \
{model_args} \
--train-data {gpt_option[5]} \
--valid-data {gpt_option[6]} \
--distributed-backend {gpt_option[7]} \
--lr-decay-style {gpt_option[8]}\
--warmup {gpt_option[9]} \
--checkpoint-activations \
--save-interval {gpt_option[11]} \
--eval-interval {gpt_option[12]} \
--save {gpt_option[13]} \
--split {gpt_option[14]}\
--eval-iters {gpt_option[15]} \
--eval-batch-size {gpt_option[16]}\
--zero-stage {gpt_option[17]} \
--lr {gpt_option[18]} \
--batch-size {gpt_option[19]} "
lora=f" \
--skip-init \
--fp16 \
--use_lora "
qlora=f"--gradient-accumulation-steps {gpt_option[20]} \
--skip-init \
--fp16 \
--use_qlora"
ptune=f" \
--skip-init \
--fp16 \
--use_ptuning"
if method_type=='use_qlora':
gpt_options=gpt_option_prefix+qlora
elif method_type=='use_lora':
gpt_options=gpt_option_prefix+lora
elif method_type=='use_ptuning':
gpt_options=gpt_option_prefix+ptune
else:
LOGGER.info("No training method selected!")
return
run_cmd = f'{options_nccl} deepspeed --master_port 16666 --hostfile {host_file_path} {VisualGLM_dir}/finetune_visualglm.py {gpt_options} '
os.system(run_cmd)
async def load_speech_model(asr_method, tts_method):
    import whisper
    global speech_AI
    if asr_method == 'whisper':
        speech_AI['asr']['whisper'] = whisper.load_model("small", download_root="weights")
        LOGGER.info('loads whisper')
    elif not asr_method and speech_AI['asr']['whisper']:
        LOGGER.info('free memory')
        speech_AI['asr']['whisper'] = None
    else:
        LOGGER.info('pass')
    if tts_method == "VITS":
        print('Under development, coming soon')
        # speech_AI['tts']['VITS'] =
        # LOGGER.info('loads whisper')
    elif not tts_method:
        LOGGER.info('pass')
    return 'Speech recognition model loaded'
def save_text2img_data(prompt,label,img_name,zh_select):
global JSON_DATASETS
if not prompt :
prompt=f"这张图片的背景里有什么内容?"
if not zh_select:
prompt=f'What contents are present in the background of this picture?'
example = {
"img": f"{img_name}",
"prompt": prompt,
"label": label
}
JSON_DATASETS.append(example)
async def load_auto_backend_models(lama, sam, det, tag2text, ram, trans_zh, visual_glm, device=0, quant=4, bar=None):
    try:
        with concurrent.futures.ThreadPoolExecutor() as pool:
            # Run the blocking loader in a worker thread so the event loop stays
            # responsive; awaiting the future directly lets loader errors reach
            # the except block (asyncio.wait would not re-raise them).
            loop = asyncio.get_running_loop()
            await loop.run_in_executor(pool, load_auto_backend_model, lama, sam, det, tag2text, ram, trans_zh, visual_glm, device, quant, bar)
    except Exception as e:
        # logging formats lazily with %-style placeholders
        LOGGER.error("An error occurred: %s", e)
        return 'Loading may fail on Windows; click the load button again, or check the backend logs'
    return 'Loads Done !'
def load_auto_backend_model(lama,sam,det,tag2text,ram,trans_zh,visual_glm,device,quant,bar):
"""
Load the model library.
"""
# Load model
global models_config
if visual_glm and not models_config['visual_glm']:
from VisualGLM_6B.chatglm import VisualGLM
models_config['visual_glm']=VisualGLM(gpu_device=int(device),quant=int(quant))
LOGGER.info(f'GPU{int(device)}: quantized VisualGLM model, int{int(quant)}')
elif not visual_glm:
LOGGER.info('no select visualGLM')
models_config['visual_glm']=None
else:
LOGGER.info('free or no visual_glm')
device = select_device(device)
if tag2text and not models_config['tag2text']:
models_config['tag2text'] = AutoBackend("tag2text",weights=Tag2Text_Model_Path,device=device)
elif not tag2text :
LOGGER.info('no tag2text')
models_config['tag2text'] =None
else :
LOGGER.info('free or tag2text pass')
if det and not models_config['grounded']:
models_config['grounded'] = AutoBackend("grounded-DINO",weights=GROUNED_MODEL_TYPE['S'], device=device,
args_config= 'model_cards/groundingdino/config/GroundingDINO_SwinT_OGC.py')
elif not det :
models_config['grounded'] =None
else :
LOGGER.info('free or grounded pass')
if sam and not models_config['sam']:
models_config['sam']= AutoBackend("segment-anything",weights=SAM_MODEL_TYPE['vit_h'] ,device=device)
elif not sam :
models_config['sam'] =None
else:
LOGGER.info("PASS SAM")
if ram and not models_config['ram']:
LOGGER.info("ram loads")
models_config['ram']= AutoBackend('ram',weights=Ram_Model_Path ,device=device)
elif not ram :
models_config['ram'] =None
else:
LOGGER.info("PASS ram")
if trans_zh and not models_config['trans_zh']:
from transformers import pipeline, AutoTokenizer, AutoModelForSeq2SeqLM
cn_tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh",cache_dir='weights')
cn_model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-zh",cache_dir='weights')
translator = pipeline("text2text-generation", model=cn_model, tokenizer=cn_tokenizer)
models_config['trans_zh']= translator
elif not trans_zh :
models_config['trans_zh'] =None
else :
LOGGER.info('zh model pass')
if lama and not models_config['lama']:
models_config['lama']= AutoBackend("lama",weights=None,args_config='model_cards/lama/configs/prediction/default.yaml',device=device)
elif not lama :
models_config['lama'] =None
else :
LOGGER.info('free or lama pass')
return 'OK'
def Auto_run(
source= 'data/images', # file/dir/URL/glob, 0 for webcam
img_input='',
input_prompt="Anything in this image",
conf_thres=0.3, # confidence threshold
iou_thres=0.5, # NMS IOU threshold
text_thres=0.2,
device='', # cuda device, i.e. 0 or 0,1,2,3 or cpu
quant=4,
save_conf=False, # save confidences in --save-txt labels
img_save=False, # do not save images/videos
visualize=False, # visualize features
project=ROOT / 'runs/detect', # save results to project/name
name='exp', # save results to project/name
exist_ok=False, # existing project/name ok, do not increment
lama=False, # use lama models
sam=True, # use segment-anythings
det=True, # use grounded detect model with text
tag2text=False,
ram=False,
save_txt=False, # save results to *.txt
save_xml=False, # save results to *.xml
save_mask=False,
save_caption=False,
batch_process=False,
color_flag=False,
zh_select=False,
record_audio=None,
up_audio=None,
process_name=0,
):
global models_config
global category_colors
global JSON_DATASETS
cls_index = -1 # default class index
if img_input:
source =img_input
source = str(source)
img_paths=None
if os.path.isdir(source):
img_paths = [os.path.join(source, f) for f in os.listdir(source) if
Path(f).suffix[1:] in (IMG_FORMATS + VID_FORMATS)]
else:
img_paths = [source]
# Directories
is_file = Path(source).suffix[1:] in (IMG_FORMATS + VID_FORMATS)
# save_img = img_save and not source.endswith('.txt') # save inference images
is_url = source.lower().startswith(('rtsp://', 'rtmp://', 'http://', 'https://'))
#webcam = source.isnumeric() or source.endswith('.streams') or (is_url )
if is_url and is_file:
source = check_file(source) # download
save_dir = increment_path(Path(project) / name, exist_ok=exist_ok) # increment run
(save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True) # make dir
(save_dir / 'xmls' if save_xml else save_dir).mkdir(parents=True, exist_ok=True) # make dir
(save_dir / 'masks' if save_mask else save_dir).mkdir(parents=True, exist_ok=True) # make dir
(save_dir / 'captions' if save_caption else save_dir).mkdir(parents=True, exist_ok=True) # make dir
p = Path(str(save_dir) ) # to Path
seen=0
# load data and run inference
caption=None
for source in (img_paths):
im = cv2.imread(source)
name_p = Path(source).name.split('.')[0] # also handles Windows path separators
img_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
preds=None
masks=[]
prompt=input_prompt
if tag2text:
LOGGER.info(f'text_prompt:{prompt}')
preds = models_config['tag2text'](im = img_rgb ,prompt=prompt,box_threshold=conf_thres,text_threshold=text_thres,iou_threshold=iou_thres)
# Currently ", " is better for detecting single tags
# while ". " is a little worse in some case
prompt=preds[0].replace(' |', ',')
caption=preds[2]
LOGGER.info(f"Caption: {caption}")
LOGGER.info(f"Tags: {prompt}")
if zh_select and prompt :
caption=models_config['trans_zh'](caption, max_length=1000, clean_up_tokenization_spaces=True)[0]["generated_text"]
if save_caption:
save_text2img_data(None, caption,name_p,zh_select)
#save_format(label_format="txt",save_path=f'{save_dir}/captions',img_name=name_p, results=caption)
if ram:
LOGGER.info(f'ram No need prompt:{prompt}')
en_tag,zh_tag = models_config['ram'](im = img_rgb,prompt=prompt,box_threshold=conf_thres,text_threshold=text_thres,iou_threshold=iou_thres)
prompt=en_tag.replace(' |', ',')
zh_tag=zh_tag.replace(' |', ', ')
#LOGGER.info(preds[1])
LOGGER.info(f"en_Tags: {prompt}")
print(f"zh_Tags : {zh_tag}")
# if zh_select and prompt :
# caption=models_config['trans_zh'](caption, max_length=1000, clean_up_tokenization_spaces=True)[0]["generated_text"]
# if save_caption:
# save_text2img_data(None, caption,name_p,zh_select)
if det:
if input_prompt:
prompt=input_prompt
LOGGER.info(f'your input prompt replaces the default: {prompt}')
preds= models_config['grounded'](im = img_rgb,prompt=prompt, box_threshold=conf_thres,text_threshold=text_thres, iou_threshold=iou_thres)
if sam and det :
if preds[0].numel()>0:
masks= models_config['sam'](im = img_rgb, prompt=preds[0],box_threshold=conf_thres,text_threshold=text_thres, iou_threshold=iou_thres)
if save_mask:
save_mask_data(str(save_dir)+'/masks', caption, masks, preds[0], preds[2],name_p)
# Write results
if img_save:
seen+=1
plt.figure(figsize=(20,18))
plt.imshow(img_rgb)
if det:
for box,label in zip(preds[0],preds[2]):
show_box(box.numpy(),plt.gca(),label)
if sam :
for mask in masks:
show_mask(mask.cpu().numpy(),plt.gca(),random_color=True)
if tag2text:
plt.title('Captioning: ' + caption + '\n' + 'Tagging:' + prompt + '\n')
plt.axis('off')
plt.savefig(f'{save_dir}/{seen}.jpg',bbox_inches='tight',dpi=600,pad_inches=0.0)
if lama and len(masks) > 0: # masks starts as an empty list, which has no .detach()
masks_prompts= masks.detach().cpu().numpy().astype(np.uint8) * 255
for idx, mask in enumerate(masks_prompts):
sub_mask = [dilate_mask(ma, 15) for ma in mask]
img_inpainted_p= f'{save_dir}/mask_{idx}.png'
idx=idx+1
img_inpainted = models_config['lama'](
im=img_rgb, prompt=sub_mask[0])
Image.fromarray(img_inpainted.astype(np.uint8)).save(img_inpainted_p)
img_rgb=img_inpainted
for category in categories:
if category not in category_colors:
category_colors[category] = (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))
gn = torch.tensor(im.shape)[[1, 0, 1, 0]] # normalization gain whwh
if (color_flag or save_txt) and(det ) :
seg_mask = np.zeros_like(img_rgb) # img_rgb is the array representation of the input image
category_color=[]
for xyxy, conf, cls,mask in zip(preds[0],preds[1],preds[2],masks): #per im boxes
xywh = (xyxy2xywh((xyxy).view(1,4)) / gn).view(-1).tolist() # normalized xywh
if cls not in categories:
categories.update({
str(cls): len(categories)})
write_categories(cls,f'{save_dir}/classes_id.txt')
cls_index = len(categories) - 1
category_colors.update({
str(cls): (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))})
category_color=category_colors[str(cls)]
else:
cls_index = categories[str(cls)]
if str(cls) not in category_colors:
category_colors.update({
str(cls): (random.randint(0, 255), random.randint(0, 255), random.randint(0, 255))})
category_color=category_colors[str(cls)]
line = (cls_index, xywh, conf) if save_conf else (cls_index, xywh) # label format
line = str(line).replace('[', '').replace(']', '').replace("(",'').replace(")"," ").replace(",", " " * 2)
if save_mask:
h, w = mask.shape[-2:]
mask_color = np.array(category_color).reshape((1, 1, -1))
seg_mask = seg_mask + mask.cpu().numpy().reshape(h, w, 1) * mask_color # add
if save_txt:
save_format(label_format="txt",save_path=f'{save_dir}/labels', img_name=name_p, results=line)
if save_mask:
plt.figure(figsize=(10,10))
plt.imshow(seg_mask)
#plt.title('Captioning: ' + caption + '\n' + 'Tagging:' + prompt + '\n')
plt.axis('off')
plt.savefig(os.path.join(f'{save_dir}/masks', f'{name_p}_cls.jpg'), bbox_inches="tight", dpi=300, pad_inches=0.0)
if save_xml:
h,w=im.shape[:2]
save_format("xml",f'{save_dir}/xmls' ,name_p, Path(source).parent, preds, h, w)
if det:
img_rgb= Image.fromarray(np.uint8(img_rgb), mode='RGB')
draw_img=ImageDraw.Draw(img_rgb)
for box,label in zip(preds[0],preds[2]):
Draw_img( box, draw_img,'box',label,category_colors[str(label)] if color_flag else None)
if sam:
img_mask=Image.new('RGBA',img_rgb.size,color=(0,0,0,0) )
draw_mask=ImageDraw.Draw(img_mask)
for mask,label in zip(masks,preds[2]): # pair each mask with its own label so colors match the boxes
Draw_img(mask[0].cpu().numpy(),draw_mask,'mask',None,category_colors[str(label)] if color_flag else None)
img_rgb.paste(img_mask, mask=img_mask)
#img_rgb.save(f'{save_dir}/{seen}.jpg')
if save_txt:
#class_ids.append(cls)
LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}/labels")
if save_xml:
LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}/xmls")
if save_caption:
with open(f'{save_dir}/captions/dataset.json', 'a',encoding='utf-8') as f:
json.dump(JSON_DATASETS,f,ensure_ascii=False)
f.write('\n')
LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}/captions")
if save_mask:
LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}/masks")
LOGGER.info('Done...')
return [[img_rgb],caption,prompt,len(categories)]
def visual_chat(prompt_input, temperature, top_p, image_prompt, result_text,record_audio,upload_audio,omniverse=False):
global models_config
print(f"Connect to omniverse: {omniverse}")
if models_config['visual_glm']:
if image_prompt and prompt_input:
__, result_text=(models_config['visual_glm'].request_model(prompt_input, temperature, top_p, image_prompt, result_text))
if omniverse:
from a2f import tts_a2f
asyncio.run(tts_a2f(result_text[-1][-1]))
return "",result_text
else :
LOGGER.info("Please check your input format and the GLM model parameter configuration!")
return "",result_text
else:
return result_text,"The deployed VisualGLM model is not loaded!"
def clear_fn_image(value):
return [("", "Hi, what do you want to know? Or what would you like to learn from the image?")]
if __name__ == "__main__":
#check_requirements(exclude=('tensorboard', 'thop'))
proxies, WEB_PORT, LLM_MODEL, CONCURRENT_COUNT, AUTHENTICATION, CHATBOT_HEIGHT, LAYOUT, AVAIL_LLM_MODELS, AUTO_CLEAR_TXT = \
get_conf('proxies', 'WEB_PORT', 'LLM_MODEL', 'CONCURRENT_COUNT', 'AUTHENTICATION', 'CHATBOT_HEIGHT', 'LAYOUT', 'AVAIL_LLM_MODELS', 'AUTO_CLEAR_TXT')
AUTO_CLEAR_TXT = get_conf('AUTO_CLEAR_TXT')
# If WEB_PORT is -1 (or any non-positive value), pick a free web port at random
PORT = find_free_port() if WEB_PORT <= 0 else WEB_PORT
functional = get_core_functions()
from themes.theme import adjust_theme, advanced_css, theme_declaration
# Advanced function plugins
from llm_cards.crazy_functional import get_crazy_functions
crazy_fns = get_crazy_functions()
import logging, uuid
os.makedirs("gpt_log", exist_ok=True)
# Python < 3.9 rejects the `encoding` argument with ValueError, so fall back to the default encoding
try:logging.basicConfig(filename="gpt_log/chat_secrets.log", level=logging.INFO, encoding="utf-8", format="%(asctime)s %(levelname)-8s %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
except ValueError:logging.basicConfig(filename="gpt_log/chat_secrets.log", level=logging.INFO, format="%(asctime)s %(levelname)-8s %(message)s", datefmt="%Y-%m-%d %H:%M:%S")
# Disable logging output from the 'httpx' logger
logging.getLogger("httpx").setLevel(logging.WARNING)
print("All query records are saved automatically to ./gpt_log/chat_secrets.log; please take care to protect your privacy!")
# Convert Markdown-formatted text for display
gr.Chatbot.postprocess = format_io
# Proxy and auto-update
from utils.check_proxy import check_proxy, auto_update, warm_up_modules
proxy_info = check_proxy(proxies)
voice_dir='voice_dir'
os.makedirs(voice_dir, exist_ok=True)
inputxs=[]
outputs=[]
cancel_handles = []
with gr.Blocks(title="Prompt-Can-Anythings",reload=True, theme=adjust_theme(), analytics_enabled=False,full_width=True,css=advanced_css) as block:
gr.HTML( f"#{node.range}{show_html}#
') else: f.write(f'{show_html}
')
            node = node.next
            if node is None: break

    for n in nodes: n.next = None   # break the links between nodes
    return_dict['nodes'] = nodes
    return_dict['segment_parts_for_gpt'] = segment_parts_for_gpt
    return return_dict


class LatexPaperSplit():
    """
    Break down a LaTeX file into a linked list;
    each node uses a preserve flag to indicate whether it should
    be processed by GPT.
    """
    def __init__(self) -> None:
        self.nodes = None
        self.msg = "*{\\scriptsize\\textbf{警告:该PDF由GPT-Academic开源项目调用大语言模型+Latex翻译插件一键生成," + \
            "版权归原文作者所有。翻译内容可靠性无保障,请仔细鉴别并以原文为准。" + \
            "项目Github地址 \\url{https://github.com/binary-husky/gpt_academic/}。"
        # Please do not delete or modify this warning unless you are the original author of the paper
        # (if you are, you are welcome to contact the developers via the QQ group listed in the README)
        self.msg_declare = "为了防止大语言模型的意外谬误产生扩散影响,禁止移除或修改此警告。}}\\\\"

    def merge_result(self, arr, mode, msg, buggy_lines=[], buggy_line_surgery_n_lines=10):
        """
        Merge the results after the GPT process has completed
        """
        result_string = ""
        node_cnt = 0
        line_cnt = 0

        for node in self.nodes:
            if node.preserve:
                line_cnt += node.string.count('\n')
                result_string += node.string
            else:
                translated_txt = fix_content(arr[node_cnt], node.string)
                begin_line = line_cnt
                end_line = line_cnt + translated_txt.count('\n')

                # revert to the original text if this segment lies near a line that failed to compile
                if any([begin_line-buggy_line_surgery_n_lines <= b_line <= end_line+buggy_line_surgery_n_lines for b_line in buggy_lines]):
                    translated_txt = node.string

                result_string += translated_txt
                node_cnt += 1
                line_cnt += translated_txt.count('\n')

        if mode == 'translate_zh':
            pattern = re.compile(r'\\begin\{abstract\}.*\n')
            match = pattern.search(result_string)
            if not match:
                # match \abstract{xxxx}
                pattern_compile = re.compile(r"\\abstract\{(.*?)\}", flags=re.DOTALL)
                match = pattern_compile.search(result_string)
                position = match.regs[1][0]
            else:
                # match \begin{abstract}xxxx\end{abstract}
                position = match.end()
            result_string = result_string[:position] + self.msg + msg + self.msg_declare + result_string[position:]
        return result_string

    def split(self, txt, project_folder, opts):
        """
        Break down a LaTeX file into a linked list;
        each node uses a preserve flag to indicate whether it should
        be processed by GPT.
        P.S. use multiprocessing to avoid timeout error
        """
        import multiprocessing
        manager = multiprocessing.Manager()
        return_dict = manager.dict()
        p = multiprocessing.Process(
            target=split_subprocess,
            args=(txt, project_folder, return_dict, opts))
        p.start()
        p.join()
        p.close()
        self.nodes = return_dict['nodes']
        self.sp = return_dict['segment_parts_for_gpt']
        return self.sp


class LatexPaperFileGroup():
    """
    Use a tokenizer to break down text according to max_token_limit
    """
    def __init__(self):
        self.file_paths = []
        self.file_contents = []
        self.sp_file_contents = []
        self.sp_file_index = []
        self.sp_file_tag = []

        # count_token
        from request_llm.bridge_all import model_info
        enc = model_info["gpt-3.5-turbo"]['tokenizer']
        def get_token_num(txt): return len(enc.encode(txt, disallowed_special=()))
        self.get_token_num = get_token_num

    def run_file_split(self, max_token_limit=1900):
        """
        Use a tokenizer to break down text according to max_token_limit
        """
        for index, file_content in enumerate(self.file_contents):
            if self.get_token_num(file_content) < max_token_limit:
                self.sp_file_contents.append(file_content)
                self.sp_file_index.append(index)
                self.sp_file_tag.append(self.file_paths[index])
            else:
                from ..crazy_utils import breakdown_txt_to_satisfy_token_limit_for_pdf
                segments = breakdown_txt_to_satisfy_token_limit_for_pdf(file_content, self.get_token_num, max_token_limit)
                for j, segment in enumerate(segments):
                    self.sp_file_contents.append(segment)
                    self.sp_file_index.append(index)
                    self.sp_file_tag.append(self.file_paths[index] + f".part-{j}.tex")
        print('Segmentation: done')

    def merge_result(self):
        self.file_result = ["" for _ in range(len(self.file_paths))]
        for r, k in zip(self.sp_file_result, self.sp_file_index):
            self.file_result[k] += r

    def write_result(self):
        manifest = []
        for path, res in zip(self.file_paths, self.file_result):
            with open(path + '.polish.tex', 'w', encoding='utf8') as f:
                manifest.append(path + '.polish.tex')
                f.write(res)
        return manifest


def Latex精细分解与转化(file_manifest, project_folder, llm_kwargs, plugin_kwargs, chatbot, history, system_prompt, mode='proofread', switch_prompt=None, opts=[]):
    import time, os, re
    from ..crazy_utils import request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency
    from .latex_actions import LatexPaperFileGroup, LatexPaperSplit

    #  <-------- Find the main tex file ---------->
    maintex = find_main_tex_file(file_manifest, mode)
    chatbot.append((f"Locating the main LaTeX file", f'[Local Message] Analysis result: the main LaTeX file of this project is {maintex}. If this is wrong, terminate the program immediately, delete or modify the ambiguous files, and retry. The main program is about to start, please wait.'))
    yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
    time.sleep(3)

    #  <-------- Read the LaTeX files and merge the multi-file tex project into one large tex file ---------->
    main_tex_basename = os.path.basename(maintex)
    assert main_tex_basename.endswith('.tex')
    main_tex_basename_bare = main_tex_basename[:-4]
    may_exist_bbl = pj(project_folder, f'{main_tex_basename_bare}.bbl')
    if os.path.exists(may_exist_bbl):
        shutil.copyfile(may_exist_bbl, pj(project_folder, f'merge.bbl'))
        shutil.copyfile(may_exist_bbl, pj(project_folder, f'merge_{mode}.bbl'))
        shutil.copyfile(may_exist_bbl, pj(project_folder, f'merge_diff.bbl'))

    with open(maintex, 'r', encoding='utf-8', errors='replace') as f:
        content = f.read()
        merged_content = merge_tex_files(project_folder, content, mode)

    with open(project_folder + '/merge.tex', 'w', encoding='utf-8', errors='replace') as f:
        f.write(merged_content)

    #  <-------- Split the LaTeX file into fine-grained segments ---------->
    chatbot.append((f"LaTeX files merged", f'[Local Message] Splitting the LaTeX file into fine-grained segments. This takes a while, and longer documents take longer; please be patient.'))
    yield from update_ui(chatbot=chatbot, history=history) # refresh the UI
    lps = LatexPaperSplit()
    res = lps.split(merged_content, project_folder, opts) # time-consuming call

    #  <-------- Break up LaTeX segments that are too long ---------->
    pfg = LatexPaperFileGroup()
    for index, r in enumerate(res):
        pfg.file_paths.append('segment-' + str(index))
        pfg.file_contents.append(r)

    pfg.run_file_split(max_token_limit=1024)
    n_split = len(pfg.sp_file_contents)

    #  <-------- Switch the prompt as needed ---------->
    inputs_array, sys_prompt_array = switch_prompt(pfg, mode)
    inputs_show_user_array = [f"{mode} {f}" for f in pfg.sp_file_tag]

    if os.path.exists(pj(project_folder,'temp.pkl')):

        #  <-------- [Debug only] If a debug cache file exists, skip the GPT requests ---------->
        pfg = objload(file=pj(project_folder,'temp.pkl'))

    else:
        #  <-------- Multi-threaded GPT requests ---------->
        gpt_response_collection = yield from request_gpt_model_multi_threads_with_very_awesome_ui_and_high_efficiency(
            inputs_array=inputs_array,
            inputs_show_user_array=inputs_show_user_array,
            llm_kwargs=llm_kwargs,
            chatbot=chatbot,
            history_array=[[""] for _ in range(n_split)],
            sys_prompt_array=sys_prompt_array,
            # max_workers=5,  # limit on parallel tasks: at most 5 run at once, the rest wait in a queue
            scroller_max_len = 40
        )

        #  <-------- Reassemble the text fragments into complete tex segments ---------->
        pfg.sp_file_result = []
        for i_say, gpt_say, orig_content in zip(gpt_response_collection[0::2], gpt_response_collection[1::2], pfg.sp_file_contents):
            pfg.sp_file_result.append(gpt_say)
        pfg.merge_result()

        # <-------- Temporary cache for debugging ---------->
        pfg.get_token_num = None
        objdump(pfg, file=pj(project_folder,'temp.pkl'))

    write_html(pfg.sp_file_contents, pfg.sp_file_result, chatbot=chatbot, project_folder=project_folder)

    #  <-------- Write the output file ---------->
    # This note is embedded in the translated (Chinese) document, so it stays in Chinese
    msg = f"当前大语言模型: {llm_kwargs['llm_model']},当前语言模型温度设定: {llm_kwargs['temperature']}。"
    final_tex = lps.merge_result(pfg.file_result, mode, msg)
    objdump((lps, pfg.file_result, mode, msg), file=pj(project_folder,'merge_result.pkl'))

    with open(project_folder + f'/merge_{mode}.tex', 'w', encoding='utf-8', errors='replace') as f:
        if mode != 'translate_zh' or "binary" in final_tex: f.write(final_tex)

    #  <-------- Wrap up and exit ---------->
    chatbot.append((f"Finished?", 'GPT results have been written; PDF compilation is about to start'))
    yield from update_ui(chatbot=chatbot, history=history) # refresh the UI

    #  <-------- Return ---------->
    return project_folder + f'/merge_{mode}.tex'


def remove_buggy_lines(file_path, log_path, tex_name, tex_name_pure, n_fix, work_folder_modified, fixed_line=[]):
    try:
        with open(log_path, 'r', encoding='utf-8', errors='replace') as f:
            log = f.read()
        import re
        buggy_lines = re.findall(tex_name+':([0-9]{1,5}):', log)
        buggy_lines = [int(l) for l in buggy_lines]
        buggy_lines = sorted(buggy_lines)
        buggy_line = buggy_lines[0]-1
        print("reverting tex line that has errors", buggy_line)

        # Rebuild the document, reverting the paragraph that failed
        if buggy_line not in fixed_line:
            fixed_line.append(buggy_line)

        lps, file_result, mode, msg = objload(file=pj(work_folder_modified,'merge_result.pkl'))
        final_tex = lps.merge_result(file_result, mode, msg, buggy_lines=fixed_line, buggy_line_surgery_n_lines=5*n_fix)

        with open(pj(work_folder_modified, f"{tex_name_pure}_fix_{n_fix}.tex"), 'w', encoding='utf-8', errors='replace') as f:
            f.write(final_tex)

        return True, f"{tex_name_pure}_fix_{n_fix}", buggy_lines
    except Exception:
        print("Fatal error occurred, but we cannot identify error, please download zip, read latex log, and compile manually.")
        return False, -1, [-1]


def 编译Latex(chatbot, history, main_file_original, main_file_modified, work_folder_original, work_folder_modified, work_folder, mode='default'):
    import os, time
    n_fix = 1
    fixed_line = []
    max_try = 32
    chatbot.append([f"Compiling the PDF document", f'Compilation has started. The current working directory is {work_folder}. If the program stalls for more than 5 minutes, retrieve the translation results directly from that path, or restart and try again ...'])
    yield from update_ui(chatbot=chatbot, history=history)
    chatbot.append([f"Compiling the PDF document", '...'])
    yield from update_ui(chatbot=chatbot, history=history)
    time.sleep(1)
    chatbot[-1] = list(chatbot[-1]) # refresh the UI
    yield from update_ui_lastest_msg('Compilation has started...', chatbot, history)   # refresh the Gradio front end

    while True:
        may_exist_bbl = pj(work_folder_modified, f'merge.bbl')
        target_bbl = pj(work_folder_modified, f'{main_file_modified}.bbl')
        if os.path.exists(may_exist_bbl) and not os.path.exists(target_bbl):
            shutil.copyfile(may_exist_bbl, target_bbl)

        # https://stackoverflow.com/questions/738755/dont-make-me-manually-abort-a-latex-compile-when-theres-an-error
        yield from update_ui_lastest_msg(f'Compilation attempt {n_fix}/{max_try}: compiling the original PDF ...', chatbot, history)   # refresh the Gradio front end
        ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex', work_folder_original)

        yield from update_ui_lastest_msg(f'Compilation attempt {n_fix}/{max_try}: compiling the converted PDF ...', chatbot, history)   # refresh the Gradio front end
        ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex', work_folder_modified)

        if ok and os.path.exists(pj(work_folder_modified, f'{main_file_modified}.pdf')):
            # Continue only if the second step succeeded
            yield from update_ui_lastest_msg(f'Compilation attempt {n_fix}/{max_try}: running BibTeX ...', chatbot, history)    # refresh the Gradio front end
            if not os.path.exists(pj(work_folder_original, f'{main_file_original}.bbl')):
                ok = compile_latex_with_timeout(f'bibtex {main_file_original}.aux', work_folder_original)
            if not os.path.exists(pj(work_folder_modified, f'{main_file_modified}.bbl')):
                ok = compile_latex_with_timeout(f'bibtex {main_file_modified}.aux', work_folder_modified)

            yield from update_ui_lastest_msg(f'Compilation attempt {n_fix}/{max_try}: resolving bibliography cross-references ...', chatbot, history)  # refresh the Gradio front end
            ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex', work_folder_original)
            ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex', work_folder_modified)
            ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_original}.tex', work_folder_original)
            ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error {main_file_modified}.tex', work_folder_modified)

            if mode!='translate_zh':
                yield from update_ui_lastest_msg(f'Compilation attempt {n_fix}/{max_try}: generating a before/after comparison with latexdiff ...', chatbot, history) # refresh the Gradio front end
                print(f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')
                ok = compile_latex_with_timeout(f'latexdiff --encoding=utf8 --append-safecmd=subfile {work_folder_original}/{main_file_original}.tex {work_folder_modified}/{main_file_modified}.tex --flatten > {work_folder}/merge_diff.tex')

                yield from update_ui_lastest_msg(f'Compilation attempt {n_fix}/{max_try}: compiling the comparison PDF ...', chatbot, history)   # refresh the Gradio front end
                ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex', work_folder)
                ok = compile_latex_with_timeout(f'bibtex merge_diff.aux', work_folder)
                ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex', work_folder)
                ok = compile_latex_with_timeout(f'pdflatex -interaction=batchmode -file-line-error merge_diff.tex', work_folder)

        # <---------- Check the results ----------->
        results_ = ""
        original_pdf_success = os.path.exists(pj(work_folder_original, f'{main_file_original}.pdf'))
        modified_pdf_success = os.path.exists(pj(work_folder_modified, f'{main_file_modified}.pdf'))
        diff_pdf_success     = os.path.exists(pj(work_folder, f'merge_diff.pdf'))
        results_ += f"Original PDF compiled successfully: {original_pdf_success};"
        results_ += f"Converted PDF compiled successfully: {modified_pdf_success};"
        results_ += f"Comparison PDF compiled successfully: {diff_pdf_success};"
        yield from update_ui_lastest_msg(f'第{n_fix}编译结束: