Repository: sunnysavita10/Generative-AI-Indepth-Basic-to-Advance
Branch: main
Commit: 59e7a0a31b08
Files: 86
Total size: 3.0 MB
Directory structure:
gitextract_kzntgr93/
├── Access_APIs_Using_Langchain/
│ ├── LangChain_Complete_Course.ipynb
│ └── requirements.txt
├── Advance RAG Hybrid Search/
│ └── Hybrid_Search_in_RAG.ipynb
├── Advance RAG Reranking from Scratch/
│ └── Reranking_from_Scratch.ipynb
├── Advance RAG with Hybrid Search and Reranker/
│ └── Hybrid_Search_and_reranking_in_RAG.ipynb
├── Chat with Multiple Doc using Astradb and Langchain/
│ └── Chat_With_Multiple_Doc(pdfs,_docs,_txt,_pptx)_using_AstraDB_and_Langchain.ipynb
├── Child_to_Parent_Retrieval.ipynb
├── ConversationEntityMemory.ipynb
├── Conversational_Summary_Memory.ipynb
├── FlashRerankPractical.ipynb
├── Generative AI Dataset/
│ ├── llama3.txt
│ └── state_of_the_union.txt
├── Generative AI Interview Questions/
│ └── Generative_AI_Interview_Questions.docx
├── Google Gemini API with Python/
│ └── GeminiAPI_With_Python.ipynb
├── LCEL(Langchain_Expression_Language).ipynb
├── Langchain_memory_classes.ipynb
├── MergerRetriever_and_LongContextReorder.ipynb
├── MongoDB with Pinecone/
│ ├── Mongodb_with_Pinecone_Realtime_RAG_Pipeline_yt.ipynb
│ └── Mongodb_with_Pinecone_Realtime_RAG_Pipeline_yt_Part2.ipynb
├── MultiModal RAG/
│ ├── Extract_Image,Table,Text_from_Document_MultiModal_Summrizer_AAG_App_YT.ipynb
│ ├── Extract_Image,Table,Text_from_Document_MultiModal_Summrizer_RAG_App.ipynb
│ ├── MultiModal RAG using Vertex AI AstraDB(Cassandra) & Langchain.ipynb
│ ├── MultiModal_RAG_with_llamaIndex_and_LanceDB.ipynb
│ └── Multimodal_RAG_with_Gemini_Langchain_and_Google_AI_Studio_Yt.ipynb
├── MultiModal RAG with Vertex AI/
│ └── MultiModal RAG using Vertex AI AstraDB(Cassandra) & Langchain.ipynb
├── Multilingual AI based Voice Assistant/
│ ├── .gitignore
│ ├── README.md
│ ├── app.py
│ ├── genai_AI_Project.egg-info/
│ │ ├── PKG-INFO
│ │ ├── SOURCES.txt
│ │ ├── dependency_links.txt
│ │ └── top_level.txt
│ ├── multilingual_assistant.egg-info/
│ │ ├── PKG-INFO
│ │ ├── SOURCES.txt
│ │ ├── dependency_links.txt
│ │ ├── requires.txt
│ │ └── top_level.txt
│ ├── requirements.txt
│ ├── research/
│ │ └── trials.ipynb
│ ├── setup.py
│ ├── src/
│ │ ├── __init__.py
│ │ └── helper.py
│ └── template.py
├── QA_With_Doc_Using_LlamaIndex_Gemini/
│ ├── Data/
│ │ └── MLDOC.txt
│ ├── Exception.py
│ ├── Experiments/
│ │ ├── ChatWithDoc.ipynb
│ │ └── storage/
│ │ ├── default__vector_store.json
│ │ ├── docstore.json
│ │ ├── graph_store.json
│ │ ├── image__vector_store.json
│ │ └── index_store.json
│ ├── Logger.py
│ ├── QAWithPDF/
│ │ ├── __init__.py
│ │ ├── data_ingestion.py
│ │ ├── embeddings.py
│ │ └── model_api.py
│ ├── StreamlitApp.py
│ ├── Template.py
│ ├── logs/
│ │ ├── 02_15_2024_16_21_43.log
│ │ ├── 02_15_2024_16_22_49.log
│ │ ├── 02_15_2024_16_23_52.log
│ │ ├── 02_15_2024_16_26_42.log
│ │ ├── 02_15_2024_16_27_41.log
│ │ ├── 02_15_2024_16_45_53.log
│ │ └── 02_15_2024_16_58_10.log
│ ├── requirements.txt
│ ├── setup.py
│ └── storage/
│ ├── default__vector_store.json
│ ├── docstore.json
│ ├── graph_store.json
│ ├── image__vector_store.json
│ └── index_store.json
├── RAG App using Haystack & OpenAI/
│ └── RAG_Application_Using_Haystack_and_OpenAI.ipynb
├── RAG App using LLAMAINDEX & MistralAI/
│ └── RAG_Application_Using_LlamaIndex_and_Mistral_AI.ipynb
├── RAG App using Langchain Mistral Weaviate/
│ └── RAG_Application_Using_LangChain_Mistral_and_Weviate.ipynb
├── RAG App using Langchain OpenAI FAISS/
│ ├── RAG_Application_using_Langchain_OpenAI_API_and_FAISS.ipynb
│ └── state_of_the_union.txt
├── RAG App with Mongo Vector Search & Gemma/
│ └── rag_with_huggingface_and_mongodb.ipynb
├── RAG Pipeline from Scratch/
│ └── RAG_Implementation_from _Scartch.ipynb
├── RAG_Fusion.ipynb
├── RAG_With_Knowledge_graph(Neo4j).ipynb
├── RAG_with_LLAMA3_1.ipynb
├── README.md
├── Roadmap of Generative AI/
│ └── Generative_AI_Roadmap.pptx
├── basic_retrieval_and_contextual_compression_retrieval.ipynb
└── self_query_retrieval.ipynb
================================================
FILE CONTENTS
================================================
================================================
FILE: Access_APIs_Using_Langchain/LangChain_Complete_Course.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "0",
"metadata": {},
"outputs": [],
"source": [
"import langchain\n",
"print(\"ok!\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1",
"metadata": {},
"outputs": [],
"source": [
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv() # take environment variables from .env."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"GOOGLE_API_KEY=os.getenv(\"GOOGLE_API_KEY\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3",
"metadata": {},
"outputs": [],
"source": [
"HUGGINGFACE_TOKEN=os.getenv(\"HUGGINGFACE_TOKEN\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4",
"metadata": {},
"outputs": [],
"source": [
"HUGGINGFACE_TOKEN\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5",
"metadata": {},
"outputs": [],
"source": [
"OPENAI_API_KEY=os.getenv(\"OPENAI_API_KEY\")"
]
},
{
"cell_type": "markdown",
"id": "6",
"metadata": {},
"source": [
"# LangChain with OpenAI API"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7",
"metadata": {},
"outputs": [],
"source": [
"import openai"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8",
"metadata": {},
"outputs": [],
"source": [
"from langchain.llms import OpenAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9",
"metadata": {},
"outputs": [],
"source": [
"llm=OpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "10",
"metadata": {},
"outputs": [],
"source": [
"text=\"Can you tell me about China?\""
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "11",
"metadata": {},
"outputs": [],
"source": [
"print(llm.predict(text))"
]
},
{
"cell_type": "markdown",
"id": "12",
"metadata": {},
"source": [
"# LangChain with Hugging Face Hub"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "13",
"metadata": {},
"outputs": [],
"source": [
"from langchain import HuggingFaceHub"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "14",
"metadata": {},
"outputs": [],
"source": [
"llm2=HuggingFaceHub(repo_id=\"google/flan-t5-large\",huggingfacehub_api_token=HUGGINGFACE_TOKEN)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "15",
"metadata": {},
"outputs": [],
"source": [
"print(llm2(\"'How old are you?' Please translate it into Hindi.\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "16",
"metadata": {},
"outputs": [],
"source": [
"llm3=HuggingFaceHub(repo_id=\"mistralai/Mistral-7B-Instruct-v0.2\",huggingfacehub_api_token=HUGGINGFACE_TOKEN)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "17",
"metadata": {},
"outputs": [],
"source": [
"print(llm3(\"what is the capital city of India?\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "18",
"metadata": {},
"outputs": [],
"source": [
"print(llm3.predict(\"Can you give me a 200-line summary of the capital city of India?\"))"
]
},
{
"cell_type": "markdown",
"id": "19",
"metadata": {},
"source": [
"# LangChain with Gemini API"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "20",
"metadata": {},
"outputs": [],
"source": [
"from langchain_google_genai import ChatGoogleGenerativeAI"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "21",
"metadata": {},
"outputs": [],
"source": [
"llm4=ChatGoogleGenerativeAI(model=\"gemini-pro\",google_api_key=GOOGLE_API_KEY)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "22",
"metadata": {},
"outputs": [],
"source": [
"llm4.predict(\"What is the capital of the USA?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "23",
"metadata": {},
"outputs": [],
"source": [
"llm4.invoke(\"What is the capital of the USA?\").content"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "24",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.13"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
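The notebook above repeats one pattern throughout: read a provider key from the environment with `os.getenv`, build a LangChain wrapper, and call it. A minimal sketch of the key-loading step, runnable without any provider account (the helper name `load_api_keys` is illustrative, not part of the notebook):

```python
import os


def load_api_keys(names):
    """Collect the API keys the notebook expects from the environment.

    Returns a dict mapping each name to its value, plus a list of any
    names that are unset, so the caller can fail fast instead of hitting
    a provider with an empty key.
    """
    keys = {name: os.getenv(name) for name in names}
    missing = [name for name, value in keys.items() if not value]
    return keys, missing


# The three providers used in the notebook above.
keys, missing = load_api_keys(
    ["OPENAI_API_KEY", "HUGGINGFACE_TOKEN", "GOOGLE_API_KEY"]
)
if missing:
    print(f"Set these before running the notebook: {missing}")
```

Checking for missing keys up front gives a clearer error than the provider-side authentication failure the notebook would otherwise surface mid-run.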
================================================
FILE: Access_APIs_Using_Langchain/requirements.txt
================================================
langchain
openai
huggingface_hub
langchain_google_genai
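The Hybrid_Search_in_RAG notebook that follows combines a BM25 (sparse) retriever with a vector (dense) retriever via LangChain's `EnsembleRetriever`. The linear blending formula its markdown cell quotes, `hybrid_score = (1 - alpha) * sparse_score + alpha * dense_score`, can be sketched directly (function names here are illustrative and not from the notebook; `EnsembleRetriever` itself performs its own weighted fusion given the `weights` list):

```python
def hybrid_score(sparse_score, dense_score, alpha=0.5):
    """Blend a keyword (sparse) score with a vector (dense) score.

    alpha is the weight given to the dense score; alpha=0 reduces to
    pure keyword search, alpha=1 to pure vector search.
    """
    return (1 - alpha) * sparse_score + alpha * dense_score


def rank_hybrid(sparse_scores, dense_scores, alpha=0.5):
    """Return document indices sorted by blended score, best first."""
    blended = [
        hybrid_score(s, d, alpha)
        for s, d in zip(sparse_scores, dense_scores)
    ]
    return sorted(range(len(blended)), key=lambda i: blended[i], reverse=True)
```

With `alpha=1.0` the ranking follows the dense scores alone; intermediate values trade keyword precision against semantic recall.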
================================================
FILE: Advance RAG Hybrid Search/Hybrid_Search_in_RAG.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/sunnysavita10/Indepth-GENAI/blob/main/Hybrid_Search_in_RAG.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ZHzAavdZ3VNX"
},
"outputs": [],
"source": [
"from sklearn.feature_extraction.text import TfidfVectorizer\n",
"from sklearn.metrics.pairwise import cosine_similarity\n",
"import numpy as np"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nYRfi-RmDbp3"
},
"outputs": [],
"source": [
"# Sample documents\n",
"documents = [\n",
" \"This is a list containing sample documents.\",\n",
" \"Keywords are important for keyword-based search.\",\n",
" \"Document analysis involves extracting keywords.\",\n",
" \"Keyword-based search relies on sparse embeddings.\"\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "H4MwrCZ_DmrA"
},
"outputs": [],
"source": [
"query=\"keyword-based search\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NhzyM3v3Du2R"
},
"outputs": [],
"source": [
"import re\n",
"def preprocess_text(text):\n",
" # Convert text to lowercase\n",
" text = text.lower()\n",
" # Remove punctuation\n",
" text = re.sub(r'[^\\w\\s]', '', text)\n",
" return text\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "y2ni_SqXD0Vd"
},
"outputs": [],
"source": [
"preprocess_documents=[preprocess_text(doc) for doc in documents]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "j8V1C_9tEBMQ",
"outputId": "7b32b1e6-9a86-46cc-ce34-69853884e2bf"
},
"outputs": [],
"source": [
"preprocess_documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gIOe6cD3EEsR",
"outputId": "f8d7ed10-52fd-4017-d609-b2d23c5db662"
},
"outputs": [],
"source": [
"print(\"Preprocessed Documents:\")\n",
"for doc in preprocess_documents:\n",
" print(doc)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "YsE3-_29EQZ4",
"outputId": "928dc874-96c1-43df-ad6c-bc2012537f7f"
},
"outputs": [],
"source": [
"print(\"Preprocessed Query:\")\n",
"print(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "SHeGaVJWESI-"
},
"outputs": [],
"source": [
"preprocessed_query = preprocess_text(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "M0KhXDLiEcCI",
"outputId": "d191b0de-17db-44e8-de9a-e32b1166e7ab"
},
"outputs": [],
"source": [
"preprocessed_query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DxMRTcYiEdHG"
},
"outputs": [],
"source": [
"vector=TfidfVectorizer()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "08jzr0KsEmDX"
},
"outputs": [],
"source": [
"X=vector.fit_transform(preprocess_documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "J_dkpYYZErZv",
"outputId": "1cb63639-5057-4d47-b1db-d7772f021e75"
},
"outputs": [],
"source": [
"X.toarray()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Qzz9npHZE0oV",
"outputId": "02716dd3-9e0e-4d69-c48c-55b643cd6062"
},
"outputs": [],
"source": [
"X.toarray()[0]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LckZUiA4E4ft"
},
"outputs": [],
"source": [
"query_embedding=vector.transform([preprocessed_query])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "aiNDyXHJFEZu",
"outputId": "6021c89a-d268-47bb-c582-de2a3e0769bc"
},
"outputs": [],
"source": [
"query_embedding.toarray()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XXBAHj3nFGXh"
},
"outputs": [],
"source": [
"similarities = cosine_similarity(X, query_embedding)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "mrsvAIehHhIf",
"outputId": "95d2b3dd-f983-4f4c-b91e-6e7339ff5c83"
},
"outputs": [],
"source": [
"similarities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Juj5TN8GHzpV",
"outputId": "9d081198-b336-4f24-cffc-3665d37c7529"
},
"outputs": [],
"source": [
"np.argsort(similarities,axis=0)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RHj8jNt2IPzU"
},
"outputs": [],
"source": [
"# ranked_indices is needed here; compute it so this cell runs standalone\n",
"ranked_indices = np.argsort(similarities, axis=0)[::-1].flatten()\n",
"ranked_documents = [documents[i] for i in ranked_indices]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gRmz-mQVHh-u"
},
"outputs": [],
"source": [
"#Ranking\n",
"ranked_indices=np.argsort(similarities,axis=0)[::-1].flatten()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "tqcS1JjmICiX",
"outputId": "5686d7b5-d395-4f1b-9115-dab500b4a561"
},
"outputs": [],
"source": [
"ranked_indices\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Wsr1s-vcIEGm",
"outputId": "8b98886b-0d39-4580-efcf-541a871ded6b"
},
"outputs": [],
"source": [
"# Output the ranked documents\n",
"for i, doc in enumerate(ranked_documents):\n",
" print(f\"Rank {i+1}: {doc}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "P4bJxZwAILue",
"outputId": "288b18fa-cf8f-4f4f-ef7c-fc3dc03fbe88"
},
"outputs": [],
"source": [
"query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JVa9FNvtJADx"
},
"outputs": [],
"source": [
"documents = [\n",
" \"This is a list containing sample documents.\",\n",
" \"Keywords are important for keyword-based search.\",\n",
" \"Document analysis involves extracting keywords.\",\n",
" \"Keyword-based search relies on sparse embeddings.\"\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "hU93ANjGJDLt"
},
"outputs": [],
"source": [
"#https://huggingface.co/sentence-transformers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "c2Eh8p_MIVAV"
},
"outputs": [],
"source": [
"document_embeddings = np.array([\n",
" [0.634, 0.234, 0.867, 0.042, 0.249],\n",
" [0.123, 0.456, 0.789, 0.321, 0.654],\n",
" [0.987, 0.654, 0.321, 0.123, 0.456]\n",
"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YHKoe1BBIw1j"
},
"outputs": [],
"source": [
"# Sample search query (represented as a dense vector)\n",
"query_embedding = np.array([[0.789, 0.321, 0.654, 0.987, 0.123]])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "-EYl_pwbIyvN"
},
"outputs": [],
"source": [
"# Calculate cosine similarity between query and documents\n",
"similarities = cosine_similarity(document_embeddings, query_embedding)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IMNMKcChLjkE",
"outputId": "2e582a10-31bb-4c99-9966-35b21ac0f901"
},
"outputs": [],
"source": [
"similarities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Vk1EdOJBI0S1"
},
"outputs": [],
"source": [
"ranked_indices = np.argsort(similarities, axis=0)[::-1].flatten()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cA8La-wuI1rV",
"outputId": "f5e5ceb8-1533-4cee-b50c-d510a64acc8a"
},
"outputs": [],
"source": [
"ranked_indices"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "T_DQrmU9I2b2",
"outputId": "f8abc51c-7bbe-4a46-88f5-e7cb3e1fcddb"
},
"outputs": [],
"source": [
"# Output the ranked documents\n",
"for i, idx in enumerate(ranked_indices):\n",
" print(f\"Rank {i+1}: Document {idx+1}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "bonW5T3DI343"
},
"outputs": [],
"source": [
"doc_path=\"/content/Retrieval-Augmented-Generation-for-NLP.pdf\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4i1BwkuaJdUG",
"outputId": "b56b6dca-172f-4e11-9204-369e45d0420b"
},
"outputs": [],
"source": [
"!pip install pypdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1IG4zizRJgWW",
"outputId": "898c9837-265b-409e-a684-eadef1844a97"
},
"outputs": [],
"source": [
"!pip install langchain_community"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "uYdubydrJmUH"
},
"outputs": [],
"source": [
"from langchain_community.document_loaders import PyPDFLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "2f9DJUCzJprn"
},
"outputs": [],
"source": [
"loader=PyPDFLoader(doc_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B98wvocsJvTN"
},
"outputs": [],
"source": [
"docs=loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "v7l4fCgvJxUW"
},
"outputs": [],
"source": [
"from langchain.text_splitter import RecursiveCharacterTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "WepxAdEdJ_nW"
},
"outputs": [],
"source": [
"splitter = RecursiveCharacterTextSplitter(chunk_size=200,chunk_overlap=30)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "lwvamrKDKCn_"
},
"outputs": [],
"source": [
"chunks = splitter.split_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "jeYdtmSQKFII",
"outputId": "cf3c4288-aeea-4f6f-d29f-d37dd6220d55"
},
"outputs": [],
"source": [
"chunks"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9ELPWtoiKGj_"
},
"outputs": [],
"source": [
"from langchain.embeddings import HuggingFaceInferenceAPIEmbeddings"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tie5VFKiKNLG"
},
"outputs": [],
"source": [
"HF_TOKEN=\"\" # Replace with your Hugging Face API token"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zUHbfW8kKOvP"
},
"outputs": [],
"source": [
"embeddings = HuggingFaceInferenceAPIEmbeddings(api_key=HF_TOKEN, model_name=\"BAAI/bge-base-en-v1.5\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ac6yOdC2KYRP",
"outputId": "f176c60f-ea0e-426e-ceb7-cc18cc6829ce"
},
"outputs": [],
"source": [
"!pip install chromadb"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Y0quqPhKKc22"
},
"outputs": [],
"source": [
"from langchain.vectorstores import Chroma"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Zfzae2UlKh9O"
},
"outputs": [],
"source": [
"vectorstore=Chroma.from_documents(chunks,embeddings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0ALPQsPUKpau"
},
"outputs": [],
"source": [
"vectorstore_retreiver = vectorstore.as_retriever(search_kwargs={\"k\": 3})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2FV-WXkyKx6P",
"outputId": "b6130974-ba6b-4296-9105-d750ab9c77d3"
},
"outputs": [],
"source": [
"vectorstore_retreiver"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "QT6vnCxHKyw9",
"outputId": "05a917ff-c00c-460c-bb49-a711f88e52d0"
},
"outputs": [],
"source": [
"!pip install rank_bm25"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "IqeQYitAK4ct"
},
"outputs": [],
"source": [
"from langchain.retrievers import BM25Retriever, EnsembleRetriever"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "K0Ysb2j7K8q-"
},
"outputs": [],
"source": [
"keyword_retriever = BM25Retriever.from_documents(chunks)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ns_BlaSPK_7G"
},
"outputs": [],
"source": [
"keyword_retriever.k = 3"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mgWvoTb6LFTu"
},
"outputs": [],
"source": [
"ensemble_retriever = EnsembleRetriever(retrievers=[vectorstore_retreiver,keyword_retriever],weights=[0.3, 0.7])"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "UofjUpUzLYep"
},
"source": [
"# Mixing vector search and keyword search for hybrid search\n",
"\n",
"## hybrid_score = (1 - alpha) * sparse_score + alpha * dense_score"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YcoWWuHCLRpI"
},
"outputs": [],
"source": [
"model_name = \"HuggingFaceH4/zephyr-7b-beta\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "npRU0vb2MID-",
"outputId": "9ed32b71-d556-4ce3-b173-4dde1adeffad"
},
"outputs": [],
"source": [
"!pip install bitsandbytes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1-5-EKRgMKIG",
"outputId": "92a5cc0e-a1d0-4632-feeb-c4fe330db197"
},
"outputs": [],
"source": [
"!pip install accelerate"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "j1hZfTx7MMvF"
},
"outputs": [],
"source": [
"import torch\n",
"from transformers import ( AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline, )\n",
"from langchain import HuggingFacePipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "wreWtbxiMjX2"
},
"outputs": [],
"source": [
"# function for loading 4-bit quantized model\n",
"def load_quantized_model(model_name: str):\n",
" \"\"\"\n",
" model_name: Name or path of the model to be loaded.\n",
" return: Loaded quantized model.\n",
" \"\"\"\n",
" bnb_config = BitsAndBytesConfig(\n",
" load_in_4bit=True,\n",
" bnb_4bit_use_double_quant=True,\n",
" bnb_4bit_quant_type=\"nf4\",\n",
" bnb_4bit_compute_dtype=torch.bfloat16,\n",
" )\n",
"\n",
" model = AutoModelForCausalLM.from_pretrained(\n",
" model_name,\n",
" torch_dtype=torch.bfloat16,\n",
" quantization_config=bnb_config,\n",
" )\n",
" return model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NwjY8MH2MlPy"
},
"outputs": [],
"source": [
"# initializing tokenizer\n",
"def initialize_tokenizer(model_name: str):\n",
" \"\"\"\n",
" model_name: Name or path of the model for tokenizer initialization.\n",
" return: Initialized tokenizer.\n",
" \"\"\"\n",
" tokenizer = AutoTokenizer.from_pretrained(model_name, return_token_type_ids=False)\n",
" tokenizer.bos_token_id = 1 # Set beginning of sentence token id\n",
" return tokenizer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 301,
"referenced_widgets": [
"80926a7d4df344508960da3bd0ca49f7",
"6c641853c1b74a41b184f51b87ae906f",
"8d2c6e157a924a26880515f7324b2c75",
"a0be9de9c0a74bf7a71990b7cf90fc81",
"af31aa045fc34aa988fd62c069526825",
"2a1bec229f7d47848a7b2295c6a268f4",
"26095125285d4e9ea83874f6ffe25942",
"08cf29966c294859bccd6c732c5f3d7f",
"9e1aa7119d43472d8aa59c7dab6c694a",
"16eb1f8b842d4826ba3ef2005ed31e6e",
"3c1d8fe6e80543a2882f85db87ea1ac0",
"0b1bcab0cf134f63a5e8266625a942bd",
"09826733b772426087d6637d792ba548",
"44e281bf0ce446c4b8498228561135de",
"59b548dd70b44891b3aa98de1791bfb5",
"384b571d3e39475b85ca761ab673ee73",
"447d6969157c43bcb0631a26b6c8cac0",
"071e4ae0b00c4f26be7211138a1181ae",
"f8be43f1c46c4faf8f051210a43f0bfb",
"a1a23db33c224752a1dc2196ba382ae6",
"a378202cc8d949f69d3d12d4fa73213e",
"f31d3074eae94590b898da18bac54d06",
"ad36707579d845b7a06f50cb63ed7b83",
"3a814f699a8343c3af2fad3f95de8de1",
"4ad0219054824fd0af5bbdb93442da57",
"71bc0d30d99a4e47a0404b1f6889eaf3",
"263d4bd8d0574212a6a321a1c7bdb196",
"fd6295669e164ce4a387bc2e62946a4f",
"ae1113c16973440d83b13200041c3951",
"21f3d9c1e06e4296926555bfdc1f06fa",
"458d94bad2094540a2f39a68ca453b69",
"3c67a6a263944cf6b4c5a16bb12f645c",
"582960e55cca46a2bb98c6262b4b8dac",
"cb616a7cdbe34801b06a02faa2e1bf63",
"840eaca5fb2946509606fbb200dc09f4",
"363406e6e0014ebc84b4dc59b02f02c5",
"85b93405ad3a4ba38e2556d6524d65fd",
"a742a5585675495a93f79f8637ca1280",
"de907241aa264c45934acfb7e9d24f57",
"615281b2b6784b98847b50f01192f1a2",
"9c4daf9db0034665a2594f47327b6788",
"88d9d8050d1e4969b098f6abf84c1fee",
"26b3044048b64a73b51d5898315d6dc5",
"538545500c344b779460fb35fe1518db",
"1d037ffc17d640548efc347135c3161c",
"092a08cdbb7642afbc7690fc24df984f",
"13625c7173694bf4aacdcc3d220d1987",
"21c9320ad08a483d8d157a54618b560b",
"fb5058f498e343109b9efc4e5b686abb",
"36acdc477dde484db4a62e3457d5e541",
"dc92e27c5fb54dfeba414b52de3e61ff",
"176dbed88ecc4ac48b8f5c8ee5f18954",
"11f30ee987e44ab3a314a8bfcb97ae65",
"c897c57b73b94813b98b404a09eb8c27",
"73a4375a86fc4d238183d1c5b8ec0947"
]
},
"id": "6jPsnRl1MnTT",
"outputId": "01daf0fc-3df3-4595-d25c-cbac7a854885"
},
"outputs": [],
"source": [
"tokenizer = initialize_tokenizer(model_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 450,
"referenced_widgets": [
"07d8c85868c449449b6baa5e01a73b29",
"e471c5568cb64517b04d6928bf8fe489",
"57fe89b8228949a38be75ef7e4de7839",
"77e99c47ee4e454eb52737d88bbd251d",
"34f2a684450948a9b3282c6ed6a942f8",
"2df56bd6d013457394f095f7059eb749",
"cd260bcc399844f4a1e8ba5f6e9c7583",
"a0966045942d45f4971f9cafe09ca3d9",
"202f976699894040976c79e074f3f872",
"afadf115dc5b48dd96c6116f79c5d1e2",
"63de9883e84a4b5da61f47973c20d9ef",
"4881c9bf45b3451db46fee4c157f0f04",
"2442fa158c45499d8cd8180a8315c17c",
"5f117ff65397456db4831a76762e6fc7",
"f51636a5b18446ec8f796bed4cac5235",
"70aabcb99c8d440ebf7e62fa8bef67ca",
"43b7aa07093e4743a1840b537e45ec49",
"bd3b9e1eec7f4e15a6fbb5a5008b0ed7",
"cdb4ac97ae9c4f9ba4acd34c8ad754e1",
"005034b8094743e6a66ba5f3a92e2528",
"f361d834ed24402aa92d7e68a5643baa",
"040e68a26cba4ed59cbbb645b415849c",
"1cdbc02ee2254e4d8d9f74f03877982f",
"2750c773ab1f4aba83a4bc49810e0b2d",
"c621c7507656401996f18f6d8838c10f",
"49df4a8fb9324b38beb88827dc616397",
"f5aa99bc46bd402c8f87d66974ab5381",
"e9e8dcede4384b909a3d4075ee0e81ad",
"03f5ae15263d430ab78f9b411b3a791c",
"842c86b1c46b4e52940fccbe4189e99c",
"b5f2d4c72b4844b893aa321beb203024",
"d3b49dd99b7f4bdc99916a8489d5a2b9",
"2962ac0a38fd4520a5d1a8dbff0d0017",
"c62b209db3d84077955d5f8098ba8e7e",
"03fead08af6e4045bd486059db06778e",
"07d9e2dc13e6425aabab9742595319f9",
"f8dcdea2c21746e8bd31d3545bec063e",
"6b09ce81e3d14ba393667f1d2107d664",
"520c039fd1ff42888eb2ccedcae2206b",
"ed6bb088584346f29abc2dd96a165f25",
"0b93285e9d6648a18895bcc97f8fd047",
"cc1faa52bbb349da949ba8685a02a634",
"2117b98dcd9145788388d65cfc226276",
"5b7b92c68df94ad4aa7a57c24392b881",
"15668490074e4146924c76d30d58b36d",
"a295f559a84548b99dbf37b543be5a3e",
"58e921fa323a469d9a0605ddbed59a75",
"7e27e52524774fc18cbb1be105a95754",
"63847ee284194d009825351d96a5b02b",
"865e74c7b5f74079af763e2263bc2327",
"2113668d3b97456bb12ce916a676feaf",
"9c800bd92ce74075b918c4d466e84863",
"52fa72ead88142e98c414a13859d6eab",
"af20ec59965f42b18c2f96351b5fb0ba",
"7971e0b059c64a23a389b43ddf387122",
"6f7c608396e44029991536f150818d16",
"d15b0846c4314acdaa7b3a1dcb71f0bc",
"f758e0d677a547c3841b80497c674978",
"ab34a0c84c3e46c1b4f68d1283f5935d",
"b48a6bba01984146a10e72505b3759a7",
"38f8636944b44bb49d36566ced632076",
"f9420643986c4ed4b8d9c07e27acd48f",
"8d145b507e28468f9c856014199d4db5",
"e46fcdd2b72b4e30b0e12c0d624dd98e",
"cdcb1c54936141dcac96fddd96f271a5",
"8434d230785e4dee8d148f68c6a888fb",
"50454394fabb4f31a45cb58b96cc26d5",
"977c8cfbd6a44a5d86c594d26911d557",
"f468056e46044349b8ce3a41d550ee78",
"7bd32807c89848e1bbf33f3caf1f387e",
"bd90f8108f084b87a4d430ba0e515cf2",
"b0519b9d03394d5eab8484f2abe3a70c",
"e05120f1c0a4434ab707344d1e383ed9",
"b050ac0e6aea4d79bb1d2490dfc3ae98",
"94bf0da48d34429886d0e4cce5450fd3",
"c17d1f96f8594112b765de1dbc6f3b74",
"e2aeb272fb374ef592c822c90ed8778d",
"165a5174fbab472c9ef25bd99e6ac28c",
"c1acc369a31e4e79a938a6c0cb36e559",
"73b31d036d464c82a3a1ff920f6a3449",
"4972967026934ea5acaf5f6ff7e85959",
"3dcb0ed858364e949e263d5d4826ef2a",
"a6076c8c267b4eec8178d409074903c7",
"73cac54fe96f4a279e0c207709a86eaf",
"e4bd4490dcec4d7d87be03a9a4d4382d",
"57b4dcff412b4b468a6de20591de26ce",
"4693d715c66547ea8d39cd1a6ba0336f",
"2a01c5ef1de4456aad63cca3b2069593",
"677a79a0338d422ca3368dd57f178b85",
"d862657320ee4504a7265c5c97c31081",
"04cfd5ba6ecc40c1952558acfbb2f4ce",
"b52cb5cffa3141419db3efa89113e814",
"cc6887ac097e4289a6ce4b1b1bf173e7",
"4a7a6b7c7d784185b57d67d7f54b2691",
"44bc78242bf94d79ac70fc791e3af16a",
"6666fa1158c346e29cad1357589fcaa6",
"db8b5eeea5754a399ce04f1870623cb7",
"b28becf405c34086b763f02b63260aee",
"69352bdd872d47649698abb7de37d3ef",
"aaabc037d2f646788594f91d50da1997",
"a2433ece64f547f39705001c3e30a6d9",
"92c03fe2ae76461790874e0018815a28",
"584662741e874eefacf19320737c2b59",
"a6801fca131c4230987a70253e4ec6d7",
"4b2a88c686e3410cbb0dea89f563a36a",
"32ca4c3fc1814b42a14f37b6f6fd0882",
"909b322452804f2abc0d2b4b56f0dfc5",
"37d95d871dc8470ea003d21eae074fc8",
"9029d7f32c234fdca72c18b30dbd43d6",
"3fc32e8724b643b8b99ed542902ffb50",
"809e6a3712b74ba2bee89cdefbbb5a8a",
"b9e89f8490404229b28b9bfbc1f07ed3",
"d7dbac36a72a4bc792127f08576906fd",
"29323042926c40fe9d6bfdf90a1a8461",
"236c12bc38d740b9a40e77b0257f614b",
"07c3636d7be347f7a5c7ecdfe20e3b6f",
"99437495138f4cff8fa55107bfb2c6e2",
"2b2e6307d78645fe84349bd7f84bba2d",
"e75fbee35ed940dca1d5c2f8c648180f",
"fe2e1a2e23c7469382035a693633530a",
"2998a1d726c44d13870fb56d0382414a",
"a915809bec36492e82b2ffe103dbaa38",
"00d6d459dc3e419d882aa81ae4f28154",
"4a1269dcfb8f426a980f5e0154d72a69",
"2b10ac1bc5e645609c9e94c611ab5d9d",
"154279dc043b484ab636061c284b999e",
"00e126aa54674c7f94645fc575d337a3",
"68473a26b8fa4e53a0323b87cf64c85b",
"88e62c14da344b09924afb5f76fb82f2",
"dfc1ce03a4264afdbfcbdc3e49da55f7",
"7180d98479b04fc7ac68016993803fd1",
"d6b7a480f8c841b484a5dc7f25fd07e6",
"d6ae442401a549a183fe9ee6acde7d6c",
"c5730f6901a94164b6d011b1be334779",
"083df6dcb00e4c4d8067aebdb17739f3",
"84c4aaf7020a42fea9195c6721140956",
"774958b959c144ad96ad9ed6cd5a65b8",
"2bab900d4d464e38b02e8428a9167a20",
"6f940eecd7a24898aec4adeb1c2ea9d7",
"5a1d235b8ed447718d6c99999b27f663",
"12c23e90d37d444bb5a6a29d282b0a48",
"1ee7eb7cb9c94e89a82e7d01008d1030",
"f6458c06ee6e472d8f48ebe902d6e420"
]
},
"id": "SlPXp-MdMoud",
"outputId": "a10228f2-d79e-4e87-8802-a5d2c4923ffe"
},
"outputs": [],
"source": [
"model = load_quantized_model(model_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "W92XMGCnMuuG"
},
"outputs": [],
"source": [
"text_generation_pipeline = pipeline(  # renamed so the imported pipeline() function is not shadowed on re-run\n",
" \"text-generation\",\n",
" model=model,\n",
" tokenizer=tokenizer,\n",
" use_cache=True,\n",
" device_map=\"auto\",\n",
" max_length=2048,\n",
" do_sample=True,\n",
" top_k=5,\n",
" num_return_sequences=1,\n",
" eos_token_id=tokenizer.eos_token_id,\n",
" pad_token_id=tokenizer.pad_token_id,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "c_9lkcQxMzRz"
},
"outputs": [],
"source": [
"llm = HuggingFacePipeline(pipeline=text_generation_pipeline)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xifUF7rhM0zw"
},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "SusMb1LuM2I9"
},
"outputs": [],
"source": [
"normal_chain = RetrievalQA.from_chain_type(\n",
" llm=llm, chain_type=\"stuff\", retriever=vectorstore_retreiver\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EryZWwp0OK1b"
},
"outputs": [],
"source": [
"hybrid_chain = RetrievalQA.from_chain_type(\n",
" llm=llm, chain_type=\"stuff\", retriever=ensemble_retriever\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8LfE83mROQPS"
},
"outputs": [],
"source": [
"response1 = normal_chain.invoke(\"What is Abstractive Question Answering?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "V9AD5METOTne",
"outputId": "ee2d24ca-4e41-4a09-e061-01c4a7a2fe5c"
},
"outputs": [],
"source": [
"response1"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3upJ2p95OSA2",
"outputId": "fab70c85-73b4-4aeb-b4af-de5a38f14bc0"
},
"outputs": [],
"source": [
"print(response1.get(\"result\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "05btkVByOVPA"
},
"outputs": [],
"source": [
"response2 = hybrid_chain.invoke(\"What is Abstractive Question Answering?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "8iTPRsBqO_o9",
"outputId": "213c4356-657f-4cef-a814-3885ce7c88e7"
},
"outputs": [],
"source": [
"response2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "TH4DKQYYPDuA",
"outputId": "2dd0f6e8-fa4a-464b-8c12-5605c26a2141"
},
"outputs": [],
"source": [
"print(response2.get(\"result\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "r3k6SAjmPH5X"
},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"authorship_tag": "ABX9TyN9J8sAFAwcZchZQM3mPc4J",
"gpuType": "T4",
"include_colab_link": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
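The notebook above compares a plain vector-retriever QA chain with a hybrid (ensemble) chain. One common way an ensemble retriever merges the dense and sparse result lists is reciprocal rank fusion; the sketch below shows that idea in pure Python. The function name and document ids are illustrative, not taken from the notebook or from LangChain's API.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids (best first) into one ranking.

    k is the usual RRF smoothing constant; a document earns 1/(k + rank + 1)
    credit from each list it appears in, so agreeing lists boost a document.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # sort document ids by fused score, highest first
    return sorted(scores, key=scores.get, reverse=True)

dense_order = ["d1", "d2", "d3"]   # toy order from a vector retriever
sparse_order = ["d1", "d4", "d2"]  # toy order from BM25
fused = reciprocal_rank_fusion([dense_order, sparse_order])
print(fused)  # ['d1', 'd2', 'd4', 'd3']
```

Documents ranked highly by both retrievers (here `d1`, then `d2`) rise to the top, which is the effect the hybrid chain relies on.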
================================================
FILE: Advance RAG Reranking from Scratch/Reranking_from_Scratch.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/sunnysavita10/Indepth-GENAI/blob/main/Reranking_from_Scratch.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "zkZpN87d4HJf"
},
"outputs": [],
"source": [
"documents = [\n",
"    \"This is a list containing sample documents.\",\n",
" \"Keywords are important for keyword-based search.\",\n",
" \"Document analysis involves extracting keywords.\",\n",
" \"Keyword-based search relies on sparse embeddings.\",\n",
" \"Understanding document structure aids in keyword extraction.\",\n",
" \"Efficient keyword extraction enhances search accuracy.\",\n",
" \"Semantic similarity improves document retrieval performance.\",\n",
" \"Machine learning algorithms can optimize keyword extraction methods.\"\n",
"]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MLF_E-ZQCYq_",
"outputId": "d6663d67-6aaa-4d05-e6d6-f93a38bee6d0"
},
"outputs": [],
"source": [
"!pip install sentence_transformers"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "A2TapY91Cde2",
"outputId": "59730e9c-973c-4e42-9e62-319b0c783df2"
},
"outputs": [],
"source": [
"from sentence_transformers import SentenceTransformer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YcMjOGquCkSu"
},
"outputs": [],
"source": [
"# Load pre-trained Sentence Transformer model\n",
"model_name = 'sentence-transformers/paraphrase-xlm-r-multilingual-v1'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 528,
"referenced_widgets": [
"a73efd61756445ee8f59a9d4c477a496",
"68ccd8382e3b4b67918c1185619569be",
"abdde99c2beb414a85ec34c5786d6bb4",
"268eb8843398482fafafc7872d0ecfe3",
"237e90ec67814d0dae44d87298d7832f",
"065e9d8dea90483e82a8b2245f4536ca",
"b79089b353334f93815eca389075e53e",
"4e4fceb1c2694de983367ffbc14cc6b1",
"78a1ec72ded34a38be5a53b826a3f47e",
"0e6e82e47c794f8c84635a9dbcfe2f41",
"9e2638ccb48848ecab369c3cfc3858d5",
"8a3090b91dda4a14a4ef1edcdcdcd169",
"c9108a02b3ca4b3fa1fd27498841f5fe",
"d61f57fb6cad482090fe874268bb7fb0",
"e994f58d6e7e4639b9feb59166e49ece",
"6116d2df9d8b4290981f7d45b0c1ece9",
"147152f3f8314d5f93c35194e96bf41f",
"1a0b0b6b1848413baea8804af947cf7f",
"d3603d3fdc314fca9132355c70948e28",
"d6258c5bfc474ef59d9c106245dfa5f6",
"918d839c76564c41ba9373b576de1cfd",
"47dad9a831044689b1926f2b9e608787",
"e767c41657d345a5ba8569b89d3347f7",
"ea8879768efa416d9a032ac56760039f",
"61ad01c8d10d41579300176be2388cb5",
"7a2e7317b9d64b07b032e42fd1a7f021",
"3a0feffa1d4b4f49bf875eeece9eda44",
"eaf1862044564eefb3ae75a3f1d9b5cb",
"8f337c9564e748d5825b0338cbfe61ed",
"83362efd58e543b3b37fef730a92e472",
"a766e351652e480ba9c2bbe9e4654277",
"f6c2e7b4ffd24a299f77966e6a01bfae",
"3722b793a3864a7ea37d17190502214f",
"4fafa1c7e2ba4226adcc96923366ad27",
"485bbf74d899477aa078deb9c041ed78",
"f99622285dd54f7ba2e53328281f4622",
"1cb793924d74484cab6ef3d25f03d6a1",
"085ef7cf00f34d17b0a408d30edf2fe4",
"783b47a891e54914b6c1c02282d60ab4",
"316319c51c8e4641bf6a54b0faa115da",
"cdfe2c8c9e85479d85e9f8807a8e0d80",
"eaaf65e04ae643e0b345e4c9fb66b395",
"a9655237479c4dda903e7cf2bce5ed47",
"6ed8f516577445a59b14d83cc70051b9",
"d0ad25f469704abc9f50593e7a700493",
"ab96ab19368e4e0a9b9db9651e82dcfb",
"d7e49001505f423a875b4a104f802791",
"e0430cc1c475449cb7d7e95996346386",
"310b3c09502d478cb4f6d5c368477d3d",
"da40ed22f73d45e38c18a0840447f9f4",
"3b691de532754ca896537b9545b98b3c",
"23d6300688ed48f4854d3b1b1d95f7b5",
"07e8d136197c42e4bf13e268fe4a7e18",
"adb7a266e6f544d69a431d639b7ec8ca",
"389621bc791d43009a1dc72f8b8c6255",
"d0d22e982bae4a22bc3096a20d5df450",
"5b15d268cb2f41f789fc64b69ecbfacd",
"e193101058a84d748d8d9d7de188206c",
"2260ad86c1fc4fd19a45c739a149a468",
"12eebf8fb4754a3fb437826534a59122",
"aed192be683a4457a203023a767d9657",
"07804e100ec94400be6112715766edd4",
"ca3c48a038ba41ad8cb8c63818efc56e",
"b7b6faa206d24c2aa170db8c1e4fdd5a",
"429081ab64d243658e23629aa5f5dd42",
"b48366eeb2764d538c7377dbd885e645",
"d7551d3c5c7b40feaf5e10980941be6a",
"fc7e4928ab3d41399a353a635458b839",
"92c726d5833442a2b2370e39f1f3beb6",
"b66f31b83a7a45c2802f82e71977a484",
"f23f5ac637634ce99d980ff914f5f2b5",
"7b8d20d4f0ea491eb0c572b35632b491",
"b0c146ab051e469c9a221f7a4aa2dcb0",
"7870aca389cd476f9f9fc3e18a3a7dc1",
"b0196615760e4f8090bcefeb6dc75ad0",
"2543d539ac1846bc9c8a60f5c9bf8b0f",
"0518c6d6fd6345a0bd4d8cd42624c9b0",
"a24d5de3191d4ab9b14042d1b74b2771",
"360e5501781641e28223c2ca04885712",
"61ba97ed908347bdb108414d00ae1da8",
"f532e2107a504f02b1c8aaf808fab285",
"bbbe88e4bcb8419eae1287f873030c32",
"6fe021d4c3354f928864812d36c6f505",
"b858cf901d114e0fb8c5d4b5e60a37fd",
"5f9cd04cd1ad45e797b5a19b59d12a53",
"ed88bb06e121466990f46709fe4cb04a",
"bb1918b54e5a49c7955615803d61d9fb",
"f2c7997c24034b339b32bc6c6e29caf1",
"6251884fe0ed43649766f7564e6ad115",
"c14ae62edc51484d963fa9160806d171",
"611c4d1847b34f17ae0e00dcd94154ce",
"29fd88b5e6e6476cbcce4f612905a8e6",
"90f11cadc2ca48f58e2d63351593710c",
"3d35a1d2cfc94dcebf9e972fb156d966",
"d33d91cab584458ca5c739c4bc90c303",
"f4e5d0f719a04dc0b67589274369a421",
"81202f4eeb1c41378a67ef9b5841c1c0",
"b6acd772cff24e5d96d0624ddf4ff44e",
"b1c612692bb1479b8dec5f467ad58832",
"7a96b13409884197a28ec4b78a86282d",
"71f6ffd6635b40608b37066f9b67e176",
"70af227d52094d18a829639dd84955b1",
"17317293304644c5a891955e3b387964",
"c74008e0e8e2429b89f1192f8bb0d4b7",
"e1cb3eb282a14850b48ab5a380a2acb5",
"7319aebddf3445b2b7e9a0306e0447e7",
"5d19b88244904aa88059b6c9e27a8599",
"fffd1f3b84b443828a2f8b49d41be906",
"5f5f4d08736d4f2e96c439820fc64713",
"ced7c15e2caa4e73a6feb289b66a3dfe",
"319c9d6c19af43a1bfdeea654417384f",
"b57386b666e74e1e98b2dd63ddee47f7",
"5a1215e03da746c1afdd509b76ab1004",
"8ccec9a9c0f74eb8be6955cff6e53705",
"ab8123ece1a240dfa5ca5745e6ab8292",
"edd08d1dc4ba4372a0d922d6d3082aed",
"638363a050bb40e3bd7c4486bae71ed5",
"4fbbae328003461a9ff3504ad3ea6ce1",
"4261373691d54f818090945cc72bae02",
"1a8126495ffe4bbc9a25ee1815db4cbb",
"9b3cad44fe2548c48336193e7c4643c9"
]
},
"id": "3HLEx9rKCxdn",
"outputId": "cf3da8e1-e2b6-4153-8d0e-384210001ba0"
},
"outputs": [],
"source": [
"model = SentenceTransformer(model_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "oj8kcRVZDDYs",
"outputId": "814cd1b0-cacd-44c3-b3d1-65df0b6534cd"
},
"outputs": [],
"source": [
"documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yaI-PRMwDGxf",
"outputId": "00c9abfc-2371-4006-d76a-681e7b9c619b"
},
"outputs": [],
"source": [
"len(documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PYxjbDxdC0T_"
},
"outputs": [],
"source": [
"document_embeddings = model.encode(documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "UUENS13LDJ5y",
"outputId": "08c30846-cab5-41ce-a7de-fc7daaabc4a0"
},
"outputs": [],
"source": [
"len(document_embeddings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "25ZYBnhcDMcj",
"outputId": "b9665b05-aea9-4d7b-fc4a-70e578f8eb79"
},
"outputs": [],
"source": [
"len(document_embeddings[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IcQ1Q9PtC9o-",
"outputId": "15b836c5-7519-4ce7-dde0-6f0a8ff833ea"
},
"outputs": [],
"source": [
"for i, embedding in enumerate(document_embeddings):\n",
" print(f\"Document {i+1} embedding: {embedding}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "z29WzyItDX8x",
"outputId": "7da471bd-609b-493b-d371-c23acc609bed"
},
"outputs": [],
"source": [
"documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1nQF36rhDA9_"
},
"outputs": [],
"source": [
"query = \"Natural language processing techniques enhance keyword extraction efficiency.\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "bJatNM_4Da5y"
},
"outputs": [],
"source": [
"query_embedding = model.encode(query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ZxQf2v2TDc3I",
"outputId": "1269d99d-8489-44ab-9782-b8b4cbcf9f02"
},
"outputs": [],
"source": [
"print(\"Query embedding:\", query_embedding)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "161pAXWNE6Ch",
"outputId": "6e81b715-5a8c-4aed-daf1-8f366e6ac0c8"
},
"outputs": [],
"source": [
"len(query_embedding)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "4dZjQsGDDj5m"
},
"outputs": [],
"source": [
"import numpy as np\n",
"from sklearn.metrics.pairwise import cosine_similarity"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Rcf5V3I7Dp-B"
},
"outputs": [],
"source": [
"similarities = cosine_similarity(np.array([query_embedding]), document_embeddings)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "sfk1qPUeDt_l",
"outputId": "9795635a-1773-4f29-96dd-eab6c337cf5e"
},
"outputs": [],
"source": [
"similarities"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "L_vQO9WoDvLF"
},
"outputs": [],
"source": [
"most_similar_index = np.argmax(similarities)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "kstZiJpZFKLe",
"outputId": "63f259e1-4fa0-432b-eecb-b7575f064f77"
},
"outputs": [],
"source": [
"most_similar_index"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NQiqJlgNFLHn"
},
"outputs": [],
"source": [
"most_similar_document = documents[most_similar_index]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "ghiLFqGEFPxn",
"outputId": "13def6c9-0535-4b37-a51a-c58199e973ed"
},
"outputs": [],
"source": [
"most_similar_document"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "J0WDD1hvFQ62",
"outputId": "6decb463-f85d-4720-f866-2eabd198767d"
},
"outputs": [],
"source": [
"query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "l0FK7NeAFR2V"
},
"outputs": [],
"source": [
"similarity_score = similarities[0][most_similar_index]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "vVYM05pnFa6I",
"outputId": "bb851a47-2724-43c2-b079-2d7f822435ea"
},
"outputs": [],
"source": [
"similarity_score"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0MDWnmaWFbwm"
},
"outputs": [],
"source": [
"sorted_indices = np.argsort(similarities[0])[::-1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2ANc52lQFizG",
"outputId": "d98ed7fc-8c1c-4cef-8d73-64a3203a46b7"
},
"outputs": [],
"source": [
"sorted_indices"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "K3ydRBXWFj0N"
},
"outputs": [],
"source": [
"ranked_documents = [(documents[i], similarities[0][i]) for i in sorted_indices]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "SC6o7OhYFrC_",
"outputId": "204cc6d6-aa63-4296-cc92-003002dc80be"
},
"outputs": [],
"source": [
"ranked_documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "XKIQNhJ2FsCd",
"outputId": "51c38f25-a8c8-4c06-cdf0-dc1cf7513634"
},
"outputs": [],
"source": [
"query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "fyEQPRNaFwDn",
"outputId": "58ac9e6f-8ab1-4c61-baa9-84fab33a3d36"
},
"outputs": [],
"source": [
"print(\"Ranked Documents:\")\n",
"for rank, (document, similarity) in enumerate(ranked_documents, start=1):\n",
" print(f\"Rank {rank}: Document - '{document}', Similarity Score - {similarity}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "nIRnIQbbF6u4",
"outputId": "2cbfeb6e-ded3-4187-eaf7-3e20d4a6a870"
},
"outputs": [],
"source": [
"print(\"Top 4 Documents:\")\n",
"for rank, (document, similarity) in enumerate(ranked_documents[:4], start=1):\n",
" print(f\"Rank {rank}: Document - '{document}', Similarity Score - {similarity}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "_OrCV0bDGWeN",
"outputId": "7850bfa2-1446-4f82-fe3f-7f9b66fcac73"
},
"outputs": [],
"source": [
"query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "RdUhRaPuGBdG",
"outputId": "07310c7d-e6f7-4448-f486-fc74e977c0b8"
},
"outputs": [],
"source": [
"!pip install rank_bm25"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "V2xHQLECGOPh"
},
"outputs": [],
"source": [
"from rank_bm25 import BM25Okapi"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "IOWKXh97GTt9"
},
"outputs": [],
"source": [
"top_4_documents = [doc[0] for doc in ranked_documents[:4]]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HL6C8FBkGkvR",
"outputId": "b6008275-ca9e-4f57-af0d-8022926eb976"
},
"outputs": [],
"source": [
"top_4_documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JRxkOMP8GlmO"
},
"outputs": [],
"source": [
"tokenized_top_4_documents = [doc.split() for doc in top_4_documents]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JXfRdUURGqak",
"outputId": "284d7b52-f5c8-4b54-e65b-93b51ccac61d"
},
"outputs": [],
"source": [
"tokenized_top_4_documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qI6FBUxVGrPG"
},
"outputs": [],
"source": [
"tokenized_query = query.split()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pmLnmTKwHV3Q",
"outputId": "4e581b53-63c5-4cd7-bd5f-beba513e71a6"
},
"outputs": [],
"source": [
"tokenized_query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tFnVRMXgHXGf"
},
"outputs": [],
"source": [
"bm25 = BM25Okapi(tokenized_top_4_documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "M1JsrrgQHft_",
"outputId": "9ce9731f-1a5f-4a6d-9a5a-c2de87d74441"
},
"outputs": [],
"source": [
"bm25"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_qSArmibHhIm"
},
"outputs": [],
"source": [
"bm25_scores = bm25.get_scores(tokenized_query)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "uCwf4pd1HsKe",
"outputId": "701d823b-66ad-47ee-bc36-721482a5a30d"
},
"outputs": [],
"source": [
"bm25_scores"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "NZ9O9jCqHuEV"
},
"outputs": [],
"source": [
"sorted_indices2 = np.argsort(bm25_scores)[::-1]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "r4E-x3nyIBoH",
"outputId": "54cf1b6d-0b7c-4530-9d6a-c064af50b876"
},
"outputs": [],
"source": [
"sorted_indices2"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "b7CaBP5QICd2",
"outputId": "46e955c8-59ed-4754-9670-703431fb2939"
},
"outputs": [],
"source": [
"top_4_documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "uRatS013IM0m",
"outputId": "121e1576-5234-41d0-f283-5e3ebc013549"
},
"outputs": [],
"source": [
"query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "IDrlrwEZQwgz"
},
"outputs": [],
"source": [
"reranked_documents = [(top_4_documents[i], bm25_scores[i]) for i in sorted_indices2]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "mdlTwxcSQ7UD",
"outputId": "65dc55c0-7add-48c4-8edc-13df8fd6e683"
},
"outputs": [],
"source": [
"reranked_documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Tp0KrhpHQYIC",
"outputId": "d8a30b20-6d03-434e-cda1-cd194d652be0"
},
"outputs": [],
"source": [
"print(\"Rerank of top 4 Documents:\")\n",
"for rank, (document, similarity) in enumerate(reranked_documents, start=1):\n",
" print(f\"Rank {rank}: Document - '{document}', Similarity Score - {similarity}\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EStAQfpCRS42",
"outputId": "95fb587a-c7d1-4306-8aec-1c1c5117ed59"
},
"outputs": [],
"source": [
"ranked_documents[:4]"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "G4J4__8URiHJ"
},
"source": [
"# Cross-Encoder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1-5TWTMuLLP3"
},
"outputs": [],
"source": [
"from sentence_transformers import CrossEncoder"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 232,
"referenced_widgets": [
"333ba957c0164cc7b1367fb4e77c165f",
"60dffc29c59743719ccbe57607b306be",
"1c1b5ab6594246699420d15f40dcbf9a",
"9d17552a55fc44be878dd307775ac321",
"9e0dd06ae62747ffa2afd3603eece633",
"cbedaac8d74e4abca2d169ac86c1a637",
"255b200fe7844af5ae00e2db870baeca",
"8dd323012ba14f6eb099213e75490155",
"9234b774c856496a93449f43840cdf22",
"9945335709b245ca8ec2442cc91e3f17",
"f1ffd96a7aab475283cc4eeec7f51e79",
"35a8f13d044b46839c6929f4b3c051b5",
"47f9be3cc02742e8a4a0631e32ebef7c",
"334ee20108d542f0adc8f90a03639ce2",
"e90378ec97604ff7a9d210d9ab813770",
"51a10f9e1520468abe3aa8ab11b96fe4",
"8ab3c4817e1646e8997d3a7dbe2d8de0",
"c4d4e21690a8406db75c681fe5a982c3",
"d8a6c3f4dd5843808e9f892568528b2f",
"46cd6012a4764a61bc4d4afb901adfbf",
"e35dfa10dc3343498f7d99b99f8f9bbe",
"06f393835dcd46bd826335e62c32de9d",
"8e3c909019364932843a339d20f5b361",
"18dac171c82043688a6d2f181ff675db",
"49768866bcd04cd1a1d9a87bd52d8ece",
"01db3797f9bf4f538502c97de09c87d0",
"d31a76180c5049768d150c88cdb56a6d",
"876314d86ebf4879a3f831a982d6c9a0",
"5907a798063a4e5cb2c943a23eb82d70",
"acce923405544520ac3173e6d98e1c1c",
"754f2e87a56e410d961c8bb803258d22",
"f9c9e5dd77294f47b227fceea135c663",
"1145cfda26014c00a372cad81fd7292f",
"5e0aa6c336094cdcbff419ace9866327",
"bf81c8f43ac8436eaf7e4011c51a05be",
"38cef872a51f4ef499e0fd885144c593",
"b95e39b0d7304f6a8e8f3c1f539b5cfe",
"7a87c8bf911f403d92563ac4b0c8e708",
"0002e0e41c2246c8828d54ff07773cd7",
"ff569f3b6df641d48f6657e699c32c5a",
"e4f33c9295cf402e803f9f0ffa676894",
"1b167f862fa44b199524107af690eaea",
"be453dbcf0ab4bda8eb5b4586ea260df",
"f993ac5a08e948cebb08ce7b03cb1553",
"41feb9480c9d4766870aae759b13eb83",
"b331725039514bdd855459d96399b6b4",
"186c4b647c444c3a8883ad7356057d82",
"2eee9553552c424c9c2af6a203122659",
"00f97ae5e93b4b9c8a2875ad7261d920",
"bb61044a9b8d4a0b8046323b1362d5c0",
"f82665fb752e424fa74862157349de6f",
"893ed7f5802543a1a91ad502cb4604c4",
"7ed3ad5926fa416689ab21d83d3c4130",
"1d89db0e3d1e44acbf306d20b8bf38fb",
"09987e3201424e1f9958604ea336e601"
]
},
"id": "i0mlZrepRlY5",
"outputId": "1e56d95b-710a-4b33-ffee-4e4e3326e790"
},
"outputs": [],
"source": [
"cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3qe0M86ERoHJ",
"outputId": "8d042ab6-4236-4adb-f0a8-4f8d3a7c65df"
},
"outputs": [],
"source": [
"top_4_documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "aEpEpJGHRrQD",
"outputId": "4466a61f-3662-4d03-9068-40d5b7e8f58f"
},
"outputs": [],
"source": [
"query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "pl0C674yRtFQ"
},
"outputs": [],
"source": [
"pairs = []\n",
"for doc in top_4_documents:\n",
" pairs.append([query, doc])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XTFyfLz5XXO8",
"outputId": "b47d87a7-b7b7-47f5-b535-34ee99a37f10"
},
"outputs": [],
"source": [
"pairs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "9Mo5_OHVRu0x",
"outputId": "1156480a-fecf-4491-ab6f-2ab02d7fdef6"
},
"outputs": [],
"source": [
"scores = cross_encoder.predict(pairs)\n",
"scores"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3RstDfogRwZi"
},
"outputs": [],
"source": [
"scored_docs = list(zip(scores, top_4_documents))  # materialize so it can be displayed and sorted"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "M2feALlgXqnq",
"outputId": "cf780c2c-41a9-419a-e922-aab20209a2b7"
},
"outputs": [],
"source": [
"scored_docs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mPmPy-XwRy6n"
},
"outputs": [],
"source": [
"reranked_document_cross_encoder = sorted(scored_docs, reverse=True)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "eXb2dOvwR0YR",
"outputId": "dc214e72-abaf-4f51-f9e7-e4ba0aad906a"
},
"outputs": [],
"source": [
"reranked_document_cross_encoder"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZCb89yk6X600"
},
"source": [
"# BM25 Results vs. Cohere Rerank"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "jkxrfvbSR2cz",
"outputId": "598a94dc-9cae-4d70-fac1-4553ccb64e66"
},
"outputs": [],
"source": [
"reranked_documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "L9nMXBERSroy",
"outputId": "a7620b5f-c385-43a3-f6de-0efb5457dbf3"
},
"outputs": [],
"source": [
"!pip install cohere"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_a4J2-TfS2vC"
},
"outputs": [],
"source": [
"import cohere"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "SdLeyOkES5OP"
},
"outputs": [],
"source": [
"co = cohere.Client(\"\")  # Replace with your Cohere API key"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "oG-b7zwjTJu6",
"outputId": "383724b7-6087-4623-a061-9590f515975f"
},
"outputs": [],
"source": [
"top_4_documents"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "yb8ykLpRTMBk",
"outputId": "7b160b1a-9256-406f-ddee-6e4db54de349"
},
"outputs": [],
"source": [
"query"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "FYPqN4zZS6wC"
},
"outputs": [],
"source": [
"response = co.rerank(\n",
" model=\"rerank-english-v3.0\",\n",
" query=\"Natural language processing techniques enhance keyword extraction efficiency.\",\n",
" documents=top_4_documents,\n",
" return_documents=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "i_PU-k1HTXbR",
"outputId": "3657e386-83be-4fa0-a2ac-e7800c11aa31"
},
"outputs": [],
"source": [
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 35
},
"id": "j6rK9-qJTaLZ",
"outputId": "698a9024-284e-4314-c00b-13c7c5a8bfb2"
},
"outputs": [],
"source": [
"response.results[0].document.text"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "JhWpXlwsTcAr",
"outputId": "56c8afcf-a423-480c-917d-5b1056335b1c"
},
"outputs": [],
"source": [
"response.results[0].relevance_score"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "XK91v711TdaV",
"outputId": "26a032c6-07c9-47a9-d0b1-221e5edf0c2a"
},
"outputs": [],
"source": [
"for i in range(4):\n",
" print(f'text: {response.results[i].document.text} score: {response.results[i].relevance_score}')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "vHkJZ_ODTe5a"
},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"authorship_tag": "ABX9TyMiYSfyl0P/2phVKD60MU27",
"gpuType": "T4",
"include_colab_link": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
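The ranking step in the reranking notebook above (embed the query and documents, score with cosine similarity, sort descending) can be sketched without any model by using toy vectors. Everything here is illustrative: the 2-d "embeddings" stand in for the 768-d sentence-transformer vectors, and `rank_documents` mirrors the notebook's `cosine_similarity` plus `np.argsort(...)[::-1]` pattern.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return (doc_index, score) pairs sorted best-first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

query_vec = [1.0, 0.0]                           # toy query embedding
doc_vecs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]  # toy document embeddings
ranked = rank_documents(query_vec, doc_vecs)
```

Document 0 points almost exactly along the query vector so it ranks first; document 1 is orthogonal to the query (similarity 0.0) and ranks last, just as dissimilar documents fall to the bottom of `ranked_documents` in the notebook.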
================================================
FILE: Advance RAG with Hybrid Search and Reranker/Hybrid_Search_and_reranking_in_RAG.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/sunnysavita10/Indepth-GENAI/blob/main/Hybrid_Search_and_reranking_in_RAG.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZHlE17nUjXnp"
},
"source": [
"https://s4ds.org/\n",
"\n",
"https://www.icdmai.org/\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "qmp_SaX69q18",
"outputId": "63596de4-d1d9-4d78-cf94-7586f314ec44"
},
"outputs": [],
"source": [
"!pip install weaviate-client"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "qQLSw3iJ_0RX",
"outputId": "d628e74a-a8de-42d2-ed1a-522acb9c3f51"
},
"outputs": [],
"source": [
"!pip install langchain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4lpn398P__vR",
"outputId": "2f217e89-f2ad-4b53-9968-dfa0d3c857ef"
},
"outputs": [],
"source": [
"!pip install -U langchain-community"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RSik_tYq-JRN"
},
"outputs": [],
"source": [
"import weaviate"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "M5rKS1Co-22r"
},
"outputs": [],
"source": [
"WEAVIATE_CLUSTER=\"https://hybridsearch-ewd5zpr1.weaviate.network\"\n",
"WEAVIATE_API_KEY=\"\" # Replace with your Weaviate API key"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ovLN44VY-6tU"
},
"outputs": [],
"source": [
"WEAVIATE_URL = WEAVIATE_CLUSTER"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Z93YcxMF_iCN"
},
"outputs": [],
"source": [
"HF_TOKEN=\"\" # Replace with your Hugging Face API token"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "JUDJ74Ut_N-M"
},
"outputs": [],
"source": [
"import os"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YFrBhzvM--rd"
},
"outputs": [],
"source": [
"client = weaviate.Client(\n",
" url=WEAVIATE_URL, auth_client_secret=weaviate.AuthApiKey(WEAVIATE_API_KEY),\n",
" additional_headers={\n",
" \"X-HuggingFace-Api-Key\": HF_TOKEN\n",
" },\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "LQJQDj68Cy4J",
"outputId": "ccf1aad1-8ca1-4079-b284-2f60397d0cd1"
},
"outputs": [],
"source": [
"client.is_ready()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6ouOrLG2B9wj",
"outputId": "3038912b-d5cb-4714-9803-6706392ca7cf"
},
"outputs": [],
"source": [
"client.schema.get()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9zR5jAGHC3bS"
},
"outputs": [],
"source": [
"client.schema.delete_all()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7l8nTgbRDCWt"
},
"outputs": [],
"source": [
"schema = {\n",
" \"classes\": [\n",
" {\n",
" \"class\": \"RAG\",\n",
" \"description\": \"Documents for RAG\",\n",
" \"vectorizer\": \"text2vec-huggingface\",\n",
" \"moduleConfig\": {\"text2vec-huggingface\": {\"model\": \"sentence-transformers/all-MiniLM-L6-v2\", \"type\": \"text\"}},\n",
" \"properties\": [\n",
" {\n",
" \"dataType\": [\"text\"],\n",
" \"description\": \"The content of the paragraph\",\n",
" \"moduleConfig\": {\n",
" \"text2vec-huggingface\": {\n",
" \"skip\": False,\n",
" \"vectorizePropertyName\": False,\n",
" }\n",
" },\n",
" \"name\": \"content\",\n",
" },\n",
" ],\n",
" },\n",
" ]\n",
"}\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "XxlykBOsD4oW"
},
"outputs": [],
"source": [
"client.schema.create(schema)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "boKhfW7xD8je",
"outputId": "6dec38eb-ab67-428a-c5fa-79849de612f5"
},
"outputs": [],
"source": [
"client.schema.get()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9fYFxszF_lTL"
},
"outputs": [],
"source": [
"from langchain.retrievers.weaviate_hybrid_search import WeaviateHybridSearchRetriever"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xDD_FAKZ_sZK"
},
"outputs": [],
"source": [
"retriever = WeaviateHybridSearchRetriever(\n",
"    alpha = 0.5,               # 0.5 gives equal weight to keyword (BM25) and vector search\n",
"    client = client,           # the Weaviate client instance created above\n",
"    index_name = \"RAG\",      # the Weaviate class to query\n",
"    text_key = \"content\",    # the property that stores the document text\n",
"    attributes = [],           # extra properties to return with the results\n",
" create_schema_if_missing=True,\n",
")"
]
},
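The `alpha` parameter above controls how keyword (BM25) and vector scores are blended. The following is a minimal pure-Python sketch of that idea, not Weaviate's actual fusion internals; `fuse_scores` and the toy score dictionaries are illustrative only:

```python
def fuse_scores(keyword_scores, vector_scores, alpha=0.5):
    """Blend min-max-normalized keyword and vector scores.

    alpha=1.0 -> pure vector search, alpha=0.0 -> pure keyword search,
    alpha=0.5 -> equal weighting (the retriever's default above).
    """
    def normalize(scores):
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    kw, vec = normalize(keyword_scores), normalize(vector_scores)
    docs = kw.keys() | vec.keys()
    return {d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}

# Toy example: "a" wins on keywords, "b" on vectors, "c" is strong on both.
fused = fuse_scores(
    {"a": 3.0, "b": 1.0, "c": 2.0},
    {"a": 0.1, "b": 0.9, "c": 0.8},
    alpha=0.5,
)
best = max(fused, key=fused.get)
```

With equal weighting, the balanced document `c` comes out on top even though it is the single best in neither ranking.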
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RJLYAGHbE1Z5"
},
"outputs": [],
"source": [
"model_name = \"HuggingFaceH4/zephyr-7b-beta\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1w6ml1DsEv-q",
"outputId": "235602d0-14da-4ebb-fd21-fe314ed872c5"
},
"outputs": [],
"source": [
"!pip install bitsandbytes"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "LtJsxhOmEzWX",
"outputId": "ceb6003d-d09d-4e19-90fe-beb540912dc7"
},
"outputs": [],
"source": [
"!pip install accelerate"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7YcsnAveEiFy"
},
"outputs": [],
"source": [
"import torch\n",
"from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline\n",
"from langchain import HuggingFacePipeline"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Yflg19-qEiJs"
},
"outputs": [],
"source": [
"# function for loading 4-bit quantized model\n",
"def load_quantized_model(model_name: str):\n",
" \"\"\"\n",
" model_name: Name or path of the model to be loaded.\n",
" return: Loaded quantized model.\n",
" \"\"\"\n",
"    bnb_config = BitsAndBytesConfig(\n",
"        load_in_4bit=True,\n",
"        bnb_4bit_use_double_quant=True,\n",
"        bnb_4bit_quant_type=\"nf4\",\n",
"        bnb_4bit_compute_dtype=torch.bfloat16,\n",
"    )\n",
"\n",
"    model = AutoModelForCausalLM.from_pretrained(\n",
"        model_name,\n",
"        torch_dtype=torch.bfloat16,\n",
"        quantization_config=bnb_config,\n",
"        low_cpu_mem_usage=True,  # a from_pretrained option, not a BitsAndBytesConfig field\n",
"    )\n",
"    return model"
]
},
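A back-of-the-envelope look at why the 4-bit config matters: nf4 weights take roughly a quarter of the bf16 footprint. The 7B parameter count is approximate for Zephyr-7B, and quantization-constant overhead is ignored in this sketch:

```python
params = 7_000_000_000            # ~7B weights in zephyr-7b-beta (approximate)
bf16_gb = params * 2 / 1024**3    # bfloat16: 2 bytes per weight
nf4_gb = params * 0.5 / 1024**3   # nf4: 4 bits = 0.5 byte per weight

print(f"bf16: {bf16_gb:.1f} GiB, nf4: {nf4_gb:.1f} GiB")
```

Around 13 GiB shrinks to roughly 3.3 GiB, which is what makes a 7B model fit on a free Colab T4.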
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Pfdzn1ukEiMd"
},
"outputs": [],
"source": [
"# initializing tokenizer\n",
"def initialize_tokenizer(model_name: str):\n",
" \"\"\"\n",
" model_name: Name or path of the model for tokenizer initialization.\n",
" return: Initialized tokenizer.\n",
" \"\"\"\n",
" tokenizer = AutoTokenizer.from_pretrained(model_name, return_token_type_ids=False)\n",
" tokenizer.bos_token_id = 1 # Set beginning of sentence token id\n",
" return tokenizer"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "a8UgT93sEiQK"
},
"outputs": [],
"source": [
"tokenizer = initialize_tokenizer(model_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 104,
"referenced_widgets": [
"1a3922c925d243fe825c2fdffc1ac440",
"848a9e20a5ff46329ac18f0f168a5d52",
"3c0f9911a51648cb8be3aaf49a806575",
"3f634bcca28549c8b922c73c7b475d91",
"25ddfbae30f74ca6b5baf5cc1d94bcb1",
"de412a1000a94bea8707e1cdc8d805b7",
"65283aca47324d5b917ba33f61e2f240",
"7a27e7b7ea6045a7a855237fd2a009e8",
"e85f3538253c482eb76e42e6341abb83",
"791e2040d86848d6be8fbc486e8ab8b5",
"201266a8824041118a32f623036eb633"
]
},
"id": "Csv9lG6cErbb",
"outputId": "1984deee-8c48-49af-bd66-9ee1d3018221"
},
"outputs": [],
"source": [
"model = load_quantized_model(model_name)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 446
},
"id": "IplrZgxvEreX",
"outputId": "82543b1a-a0bf-4693-e975-98fce166013b"
},
"outputs": [],
"source": [
"# Note: this rebinds the name `pipeline`, shadowing the imported transformers.pipeline;\n",
"# re-import it before re-running this cell.\n",
"pipeline = pipeline(\n",
" \"text-generation\",\n",
" model=model,\n",
" tokenizer=tokenizer,\n",
" use_cache=True,\n",
" device_map=\"auto\",\n",
" #max_length=2048,\n",
" do_sample=True,\n",
" top_k=5,\n",
" max_new_tokens=100,\n",
" num_return_sequences=1,\n",
" eos_token_id=tokenizer.eos_token_id,\n",
" pad_token_id=tokenizer.pad_token_id,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Uo348jKvErhO"
},
"outputs": [],
"source": [
"llm = HuggingFacePipeline(pipeline=pipeline)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "uva-5Nkqpr8w"
},
"outputs": [],
"source": [
"doc_path=\"/content/Retrieval-Augmented-Generation-for-NLP.pdf\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "BTNRvdSNp9jC",
"outputId": "68d151fe-ac47-4e64-9d56-7cabd3fb2c50"
},
"outputs": [],
"source": [
"!pip install pypdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ev8_SeQIp_4A",
"outputId": "f4dc1edd-7f8d-4d60-da77-96284597c657"
},
"outputs": [],
"source": [
"!pip install langchain_community"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3n-7-QWyp_8x"
},
"outputs": [],
"source": [
"from langchain_community.document_loaders import PyPDFLoader"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nhBRpl8dsHw6"
},
"outputs": [],
"source": [
"loader = PyPDFLoader(doc_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "xHegGUGssHzV"
},
"outputs": [],
"source": [
"docs = loader.load()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "gpshkBhjvLlC",
"outputId": "6fa66ef9-f60c-4e6a-ad16-0d1464f27246"
},
"outputs": [],
"source": [
"docs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6DNTiAvgvNYC",
"outputId": "3bc37592-d65b-473e-ae39-ec0dd2b79c40"
},
"outputs": [],
"source": [
"docs[6]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Ux831sq2pq3C",
"outputId": "3ee30aa3-f465-4d02-e4ea-4a2e07b6bc69"
},
"outputs": [],
"source": [
"retriever.add_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "jRoDhLHjsy5f",
"outputId": "9e1b9921-2fe7-4549-dfa0-b64fef8da144"
},
"outputs": [],
"source": [
"print(retriever.invoke(\"what is RAG token?\")[0].page_content)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "WHdda33buBrS",
"outputId": "843cffb4-caad-4033-97ce-e43c4035e3b3"
},
"outputs": [],
"source": [
"retriever.invoke(\n",
" \"what is RAG token?\",\n",
" score=True\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Vt5vaVuLEdY9"
},
"outputs": [],
"source": [
"from langchain.chains import RetrievalQA"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "heu-l-l176Pp"
},
"outputs": [],
"source": [
"from langchain_core.prompts import ChatPromptTemplate"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "RrEl6Nm87_Vi"
},
"outputs": [],
"source": [
"system_prompt = (\n",
" \"Use the given context to answer the question. \"\n",
" \"If you don't know the answer, say you don't know. \"\n",
"    \"Use three sentences maximum and keep the answer concise. \"\n",
" \"Context: {context}\"\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Gg0TRf_Q72P6"
},
"outputs": [],
"source": [
"prompt = ChatPromptTemplate.from_messages(\n",
" [\n",
" (\"system\", system_prompt),\n",
" (\"human\", \"{query}\"),\n",
" ]\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GNPZSFun-4Ka"
},
"outputs": [],
"source": [
"from langchain.prompts import PromptTemplate\n",
"template = \"\"\"\n",
"Use the following pieces of context to answer the question at the end.\n",
"If you don't know the answer, just say that you do not have the relevant information needed to provide a verified answer, don't try to make up an answer.\n",
"When providing an answer, aim for clarity and precision. Position yourself as a knowledgeable authority on the topic, but also be mindful to explain the information in a manner that is accessible and comprehensible to those without a technical background.\n",
"Always say \"Do you have any more questions pertaining to this instrument?\" at the end of the answer.\n",
"{context}\n",
"Question: {question}\n",
"Helpful Answer:\"\"\"\n",
"\n",
"prompt = PromptTemplate.from_template(template)"
]
},
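`PromptTemplate.from_template` just captures the `{context}` and `{question}` placeholders; at chain time they are filled in much as plain `str.format` would do. A tiny illustration with made-up values (the resulting string is what actually reaches the LLM):

```python
# Simplified stand-in for the template above, filled with toy values.
template = "Context: {context}\nQuestion: {question}\nHelpful Answer:"
filled = template.format(
    context="RAG combines retrieval with generation.",
    question="What is RAG?",
)
print(filled)
```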
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Q3lt9jMW8hxK"
},
"outputs": [],
"source": [
"from langchain.chains.combine_documents import create_stuff_documents_chain"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "ppRiYOIa8b6y"
},
"outputs": [],
"source": [
"question_answer_chain = create_stuff_documents_chain(llm, prompt)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3t7fVtBaAOfq"
},
"outputs": [],
"source": [
"hybrid_chain = RetrievalQA.from_chain_type(llm=llm, chain_type=\"stuff\", retriever=retriever,)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "I0DfMLiJ6lbr",
"outputId": "29e93eae-37ce-48b4-8c49-dd04a7195edc"
},
"outputs": [],
"source": [
"result1 = hybrid_chain.invoke(\"what is natural language processing?\")\n",
"print(result1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Flsjn21WMypT",
"outputId": "3e4d6073-bfd3-4c31-b08c-d5fe399e8935"
},
"outputs": [],
"source": [
"print(result1['result'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "QhG3Krz99APy"
},
"outputs": [],
"source": [
"query=\"What is Abstractive Question Answering?\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 304
},
"id": "hmmRp1O_ArC9",
"outputId": "e56e99d7-4ddf-460a-e5a7-330b968d5cf6"
},
"outputs": [],
"source": [
"response = hybrid_chain.invoke({\"query\":query})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "LZ-Id5sW-LLR"
},
"outputs": [],
"source": [
"from langchain_core.runnables import RunnableParallel, RunnablePassthrough"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "b1DvxugA-DIC"
},
"outputs": [],
"source": [
"# Set up the RAG chain\n",
"rag_chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()} |\n",
" prompt |\n",
" llm\n",
")"
]
},
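The `|` composition above wires three stages together: fetch documents for the question, fill the prompt, then call the model. The following is a dependency-free sketch of that data flow; the stub `retrieve` and `llm` functions are placeholders, not the real retriever or model:

```python
def retrieve(question):
    # stub retriever: would return documents matching the question
    return ["RAG-Token conditions each generated token on retrieved passages."]

def fill_prompt(inputs):
    # plays the role of the prompt template
    return f"Context: {' '.join(inputs['context'])}\nQuestion: {inputs['question']}"

def llm(prompt_text):
    # stub model: reports the prompt size instead of generating text
    return f"(answer based on {len(prompt_text)} prompt chars)"

def rag_chain(question):
    # mirrors {"context": retriever, "question": RunnablePassthrough()} | prompt | llm
    inputs = {"context": retrieve(question), "question": question}
    return llm(fill_prompt(inputs))

answer = rag_chain("what is RAG token?")
```

The dict literal in the real chain is doing exactly what `inputs` does here: fanning the question out to the retriever while passing it through unchanged.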
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OTU5Wycg-l9y"
},
"outputs": [],
"source": [
"query=\"what is RAG token?\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ykAekNO_-bkZ",
"outputId": "140e6b43-ffac-43f3-c2bd-17959f6dea91"
},
"outputs": [],
"source": [
"response=rag_chain.invoke(\"what is RAG token?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "iqKKvHQ-_05x",
"outputId": "a07f9be8-c605-4902-e957-7a005e296185"
},
"outputs": [],
"source": [
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "KWe11B_3H6Yc",
"outputId": "08ac50fc-d407-4647-d7ac-ce0b786e5dd1"
},
"outputs": [],
"source": [
"response"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "_Y6DcD3Z5lZp",
"outputId": "f792675c-237c-4e08-9f09-ed5229d4dad5"
},
"outputs": [],
"source": [
"# rag_chain returns the raw LLM string (not a dict), so print it directly\n",
"print(response)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0A3hrUdwJ3pC"
},
"outputs": [],
"source": [
"from langchain.retrievers import ContextualCompressionRetriever\n",
"from langchain.retrievers.document_compressors import CohereRerank"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1VewE8gRKCla",
"outputId": "48c1abc0-eb81-4bec-ada4-00c316b18120"
},
"outputs": [],
"source": [
"!pip install cohere"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "OE0vUax4J-Ij"
},
"outputs": [],
"source": [
"compressor = CohereRerank(cohere_api_key=\"\") # Replace with your Cohere API key"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "b3Kmr4CIKG7n"
},
"outputs": [],
"source": [
"compression_retriever = ContextualCompressionRetriever(\n",
" base_compressor=compressor, base_retriever=retriever\n",
" )"
]
},
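Conceptually, the compression retriever re-scores the hybrid retriever's candidates with a cross-encoder and keeps only the best. A minimal from-scratch sketch with a stand-in scoring function; `score_pair` here is a trivial word-overlap proxy, not Cohere's rerank model:

```python
def score_pair(query, doc):
    # stand-in relevance score: fraction of query words present in the doc
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words)

def rerank(query, docs, top_n=2):
    # sort candidates by relevance and keep the top_n, as a reranker would
    return sorted(docs, key=lambda d: score_pair(query, d), reverse=True)[:top_n]

candidates = [
    "The weather report for tomorrow.",
    "Abstractive question answering generates free-form answers.",
    "Question answering systems retrieve and read documents.",
]
top = rerank("abstractive question answering", candidates, top_n=2)
```

The off-topic candidate drops out and the most relevant passage moves to the front, which is all the `base_compressor` is adding on top of the hybrid retriever.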
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "f7m22qlCiUAb"
},
"outputs": [],
"source": [
"compressed_docs = compression_retriever.invoke(query)  # `query` was defined above\n",
"# Print the documents reranked by Cohere on top of the hybrid retriever results\n",
"print(compressed_docs)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "0dKqM3XbKkE4"
},
"outputs": [],
"source": [
"hybrid_chain = RetrievalQA.from_chain_type(\n",
" llm=llm, chain_type=\"stuff\", retriever=compression_retriever\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "2N2k_RCmKAIL",
"outputId": "466dc508-4180-48d4-f167-fd267628dd92"
},
"outputs": [],
"source": [
"response = hybrid_chain.invoke(\"What is Abstractive Question Answering?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "DVJxJg-bK2pg",
"outputId": "9ee8590f-6350-4821-cb51-e497e4a020c0"
},
"outputs": [],
"source": [
"print(response.get(\"result\"))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tcdaBC5gMCzh"
},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"include_colab_link": true,
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
================================================
FILE: Chat with Multiple Doc using Astradb and Langchain/Chat_With_Multiple_Doc(pdfs,_docs,_txt,_pptx)_using_AstraDB_and_Langchain.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9RDOffvrZ3F4"
},
"outputs": [],
"source": [
"!pip install langchain\n",
"!pip install unstructured\n",
"!pip install openai\n",
"!pip install Cython\n",
"!pip install tiktoken"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "i929xxKLnRgr",
"outputId": "a3e71b8a-85a9-4dc0-c259-c19cb5039baf"
},
"outputs": [],
"source": [
"!pip install --upgrade langchain-astradb"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "IWdY3uvRnZKn",
"outputId": "4fadc829-460e-410d-fc7d-f4013ee62966"
},
"outputs": [],
"source": [
"!pip install langchain langchain-openai datasets pypdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "B6oJrqqRauvY"
},
"outputs": [],
"source": [
"!pip install pdf2image"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Ox_1QUszavjV"
},
"outputs": [],
"source": [
"!pip install pdfminer.six"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fvp_dAEWayjg"
},
"outputs": [],
"source": [
"!pip install unstructured[pdf]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gPuH-fXlnaiX"
},
"outputs": [],
"source": [
"import os\n",
"from getpass import getpass\n",
"\n",
"from datasets import (\n",
" load_dataset,\n",
")\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"from langchain_core.documents import Document\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"from langchain_text_splitters import RecursiveCharacterTextSplitter\n",
"from langchain.document_loaders import UnstructuredPDFLoader\n",
"from langchain.indexes import VectorstoreIndexCreator"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Bost4y11ngS2"
},
"outputs": [],
"source": [
"import os\n",
"from google.colab import userdata\n",
"OPENAI_API_KEY=userdata.get('OPENAI_API_KEY')\n",
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY\n",
"\n",
"embedding = OpenAIEmbeddings()"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BhXC2nsaaao4"
},
"source": [
"# Using Unstructured to Load Multiple PDFs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "obMEfgOUaYoI"
},
"outputs": [],
"source": [
"root_dir=\"/content/\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fHwmBphmaMrJ"
},
"outputs": [],
"source": [
"pdf_folder_path = f'{root_dir}/docs/'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "EXg7WYjmaMx6"
},
"outputs": [],
"source": [
"os.listdir(pdf_folder_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gdyyz5uDbF65"
},
"outputs": [],
"source": [
"# location of the pdf file/files.\n",
"loaders = [UnstructuredPDFLoader(os.path.join(pdf_folder_path, fn)) for fn in os.listdir(pdf_folder_path)]"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cIOOjInebHHR"
},
"outputs": [],
"source": [
"loaders"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "C6sNGjHsaM05"
},
"outputs": [],
"source": [
"index = VectorstoreIndexCreator(embedding=embedding).from_loaders(loaders)  # newer LangChain versions require an explicit embedding"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "TyONx7bRaM6q"
},
"outputs": [],
"source": [
"index.query('What is the tokenization in RAG?')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "1kCaJmvhaM9o"
},
"outputs": [],
"source": [
"index.query_with_sources('What is the tokenization in RAG?')"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2X3IRcpxbSKZ"
},
"source": [
"# PyPDFLoader with Multiple PDFs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "btAgdVVknvyd"
},
"outputs": [],
"source": [
"from langchain_astradb import AstraDBVectorStore"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DgLFd0Kd2nIO"
},
"outputs": [],
"source": [
"from langchain_astradb import AstraDBVectorStore\n",
"ASTRA_DB_API_ENDPOINT=\"https://d2357619-8f04-4cfd-bc3a-16e410893ba3-us-east-2.apps.astra.datastax.com\"\n",
"ASTRA_DB_APPLICATION_TOKEN=\"\"  # Replace with your Astra DB application token\n",
"ASTRA_DB_KEYSPACE=\"default_keyspace\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "fLh8RfMwaNLM"
},
"outputs": [],
"source": [
"root_dir=\"/content/\"\n",
"pdf_folder_path = f'{root_dir}/data/'\n",
"pdfs=os.listdir(pdf_folder_path)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Quw8romYBEpV",
"outputId": "d3a645f6-8cce-4d4a-b0c5-3d35f2ae51ae"
},
"outputs": [],
"source": [
"pdfs"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "iGWaBKx7BSiP"
},
"outputs": [],
"source": [
"data=PyPDFLoader(\"/content/data/MachineTranslationwithAttention.pdf\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Z4Y6bmItBhbB",
"outputId": "0ba46137-2b3e-42b7-90ae-b9afefcad5b4"
},
"outputs": [],
"source": [
"data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "u280YuAtCCzX"
},
"outputs": [],
"source": [
"splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)"
]
},
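`chunk_size=512` with `chunk_overlap=64` means consecutive chunks share a 64-character tail, so a sentence straddling a boundary survives in at least one chunk. The following is a simplified fixed-offset sketch; the real `RecursiveCharacterTextSplitter` additionally prefers to cut at paragraph and sentence boundaries:

```python
def chunk_text(text, chunk_size=512, chunk_overlap=64):
    # step forward by chunk_size - chunk_overlap so adjacent windows overlap
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 1000, chunk_size=512, chunk_overlap=64)
```

A 1000-character input yields three chunks, and the last 64 characters of one chunk reappear at the start of the next.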
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NRH8dsh5B-n9",
"outputId": "c092b97a-84e9-4605-bf8b-010ee09482c8"
},
"outputs": [],
"source": [
"data.load_and_split(text_splitter=splitter)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "hK6CgClrbbS5"
},
"outputs": [],
"source": [
"docs=[]\n",
"for pdf in pdfs:\n",
" data=PyPDFLoader(f\"/content/data/{pdf}\")\n",
" docs.append(data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 158
},
"id": "agk6IZLabd3p",
"outputId": "ffdbaa2b-58a4-406f-c781-a9a9fa2b20c7"
},
"outputs": [],
"source": [
"docs_from_pdf = []\n",
"for loader in docs:  # `docs` holds one PyPDFLoader per PDF\n",
"    docs_from_pdf.extend(loader.load_and_split(text_splitter=splitter))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "oNllVkvIbgKM"
},
"outputs": [],
"source": [
"# Note: create `vstore` (the AstraDBVectorStore cell below) before running this step\n",
"print(f\"Documents from PDF: {len(docs_from_pdf)}.\")\n",
"inserted_ids_from_pdf = vstore.add_documents(docs_from_pdf)\n",
"print(f\"Inserted {len(inserted_ids_from_pdf)} documents.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "n4G743wn3i9F"
},
"outputs": [],
"source": [
"vstore = AstraDBVectorStore(\n",
" embedding=embedding,\n",
" collection_name=\"astra_vector_demo\",\n",
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
" namespace=ASTRA_DB_KEYSPACE,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "cfzD7a8naIEK"
},
"outputs": [],
"source": [
"retriever = vstore.as_retriever(search_kwargs={\"k\": 3})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "9Y2EFU9_aINQ"
},
"outputs": [],
"source": [
"prompt_template = \"\"\"\n",
"You are a philosopher that draws inspiration from great thinkers of the past\n",
"to craft well-thought answers to user questions. Use the provided context as the basis\n",
"for your answers and do not make up new reasoning paths - just mix-and-match what you are given.\n",
"Your answers must be concise and to the point, and refrain from answering about other topics than philosophy.\n",
"\n",
"CONTEXT:\n",
"{context}\n",
"\n",
"QUESTION: {question}\n",
"\n",
"YOUR ANSWER:\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Nx0rM706aIPo"
},
"outputs": [],
"source": [
"prompt_template = ChatPromptTemplate.from_template(prompt_template)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tRg2VFehaISq"
},
"outputs": [],
"source": [
"llm = ChatOpenAI()\n",
"\n",
"chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
"    | prompt_template\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "D9pg2syhbyHI"
},
"outputs": [],
"source": [
"chain.invoke(\"How does Russel elaborate on Peirce's idea of the security blanket?\")"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "v2b452jhb6mh"
},
"source": [
"# DirectoryLoader (Chat with Multiple Documents)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "tZS1rEQB7YOP"
},
"outputs": [],
"source": [
"!rm -rf \"/content/docs/.ipynb_checkpoints\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1yqDtZ1M3z8U",
"outputId": "1602047a-f75d-4544-e49d-d1ea5405e3f6"
},
"outputs": [],
"source": [
"%pip install langchain_community"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "1UuNkzrU5Q5q",
"outputId": "3f4e6178-064f-406e-cf4a-909229fb3da6"
},
"outputs": [],
"source": [
"!pip install unstructured"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
},
"id": "ksK7gi4p5d1l",
"outputId": "dc8ba9fb-b8fd-46cc-b38c-ebbafca693c7"
},
"outputs": [],
"source": [
"!pip install \"unstructured[pdf]\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3uk7ezbu7OQp",
"outputId": "1bbf4d20-f90d-4247-b38f-bbab19599190"
},
"outputs": [],
"source": [
"!sudo apt-get update"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "nkesIO_m7P9P",
"outputId": "0c321129-5b04-4a63-fded-445eab6bb4a2"
},
"outputs": [],
"source": [
"!sudo apt-get install poppler-utils"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "o9OycnSq7Tt9",
"outputId": "689a781d-dfb9-4c9d-d386-9f17804a3006"
},
"outputs": [],
"source": [
"!sudo apt-get install libleptonica-dev tesseract-ocr libtesseract-dev python3-pil tesseract-ocr-eng tesseract-ocr-script-latn\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "TMP99Q_y7XWl",
"outputId": "38690471-e581-4be4-d99c-9e5f0d07f120"
},
"outputs": [],
"source": [
"# the tesseract binary itself comes from the apt-get install above\n",
"!pip install unstructured-pytesseract"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "RruMFEmtMhVw",
"outputId": "12220b5c-ef1f-451d-997d-9283aa4cbb84"
},
"outputs": [],
"source": [
"!pip install \"unstructured[pptx]\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "DR8YmEFX_bXo",
"outputId": "6fbcfd3f-9d50-44b7-c210-7b2bc74abb06"
},
"outputs": [],
"source": [
"!pip install langchain_astradb"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4rN3g_sxPLjN",
"outputId": "93096ee0-bcca-4233-a170-ee4a68ad727e"
},
"outputs": [],
"source": [
"!pip install langchain langchain-openai datasets pypdf"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "AjAFSJYlDpkA"
},
"outputs": [],
"source": [
"from langchain_text_splitters import RecursiveCharacterTextSplitter"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "nBVPhAdPDNE3"
},
"outputs": [],
"source": [
"from langchain_community.document_loaders import DirectoryLoader\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "GYA9S1oaU1g3"
},
"outputs": [],
"source": [
"loader = DirectoryLoader('/content/docs')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "icOls_EgDQy_"
},
"outputs": [],
"source": [
"splitter = RecursiveCharacterTextSplitter(chunk_size=512, chunk_overlap=64)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 304,
"referenced_widgets": [
"8e035199e06b40eabbe34b3852c53034",
"75777df2725d4509adaddc144ee52baa",
"61c802dc2e7a486e91ed26b43a579a65",
"9123e7f5130a4f498130e117726e8430",
"2ed0f1ed939949518905dbcd850f9ee8",
"73d7a2d3325a469c89c275c0d6912551",
"7cf8bdbebd52448bb8444099cfb70886",
"0fb6312ab0b94e92a8989059794038ce",
"bcc6d619f2ce49b6bb2ad45099d229a2",
"80b344425b2e46868c6988d0b6bf0a60",
"f12e15d72d634c8ba643a468f4d76735",
"4e840cdb44a44a99b851cbe1673db6b5",
"4f02de36c68c4a12978129ce6856a104",
"c2f04856ddcb4fd1bbc1ead4274ce0aa",
"bc28ccf6311547daaa88e14861fc653d",
"8e12ba7b025e42b09bff0115dd840e49",
"b4207375ca9442d9b88cbaa5810f5041",
"51fb8e49361e48479028d3112a4bcd90",
"6216f9fd90b44c97bc251ad1b554047d",
"64e9f9d5e7504dbea334234d4788089d",
"0b912a2a673f4daea2da687bb94547c6",
"e4d2f8a121c54c49b681a767ac1fc3b1",
"8adde71b0962495c80261b2dd1d4abf3",
"9596bb6c2fa149b4945ab2d10e207e84",
"bcd1278264fa44518f09164105271b22",
"93edc57d3b134be88f4b5d0fdf12ebbc",
"88c95ef5ee33412c8141ffc7c11c702e",
"af1fea75b14f4b0a936513c4f3074fbc",
"3f9760917bcf4e249f16f34a2361b73c",
"037f7836ab6f465caf2b87dc5b7aef63",
"829923a46f24479ea648945c677d9e3a",
"db4ea8e3882c493cb980e9dfd8151a84",
"a67ed49aae1544e3b5a9b141d1c5dd3e",
"9d9f277060934802932d690307fc9685",
"1a3c537f212645fda454ffa103aac256",
"1646ce29bbb5425a9262a009f7fa2a13",
"3b8846ae905f4b7683e4f5e422e21f75",
"5b9853b590fe415fb559ae396a7bc3c7",
"f63494b47dbe412cb82f29a350cbbbc2",
"12d2ab6a477e4b94a83dae2651c6fb4b",
"d7e400593ed24394a24fb07c069b83c9",
"bc2a5e16203d4c83a976cd85e9622467",
"5b38a4d2ed5b4078b13b8397d6439ae8",
"c38f90a27964497db1c6f500510b4c03"
]
},
"id": "gXfYNkYx5Lx7",
"outputId": "7097f252-1c7e-45e2-bf13-044263056b27"
},
"outputs": [],
"source": [
"docs = loader.load_and_split(text_splitter=splitter)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "uaBLlukoN1in",
"outputId": "91fbb728-e2b2-4eb1-e6e1-25531fcb53a9"
},
"outputs": [],
"source": [
"len(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "irbe3D7R_J_n"
},
"outputs": [],
"source": [
"import os\n",
"from langchain_core.documents import Document\n",
"from langchain_community.document_loaders import PyPDFLoader\n",
"\n",
"from langchain_core.output_parsers import StrOutputParser\n",
"from langchain_core.prompts import ChatPromptTemplate\n",
"from langchain_core.runnables import RunnablePassthrough\n",
"from langchain_openai import ChatOpenAI, OpenAIEmbeddings\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "YoyE7fpl_pDB"
},
"outputs": [],
"source": [
"import os\n",
"from google.colab import userdata\n",
"OPENAI_API_KEY=userdata.get('OPENAI_API_KEY')\n",
"os.environ[\"OPENAI_API_KEY\"] = OPENAI_API_KEY"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mVnI4Sc5_pxr"
},
"outputs": [],
"source": [
"embedding = OpenAIEmbeddings()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "8TCV0FA2YwxY"
},
"outputs": [],
"source": [
"from langchain_astradb import AstraDBVectorStore\n",
"from langchain.indexes import VectorstoreIndexCreator"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "KLWvXEYS_WGA"
},
"outputs": [],
"source": [
"ASTRA_DB_API_ENDPOINT=\"https://79b63042-b3d1-4163-b10a-75c9979ebf59-us-east-2.apps.astra.datastax.com\"\n",
"ASTRA_DB_APPLICATION_TOKEN=\"<YOUR_ASTRA_DB_APPLICATION_TOKEN>\"  # placeholder: load from a secret store (e.g. Colab userdata), never hardcode a real token\n",
"ASTRA_DB_KEYSPACE=\"default_keyspace\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "qwf9jP-mFsho"
},
"outputs": [],
"source": [
"vstore = AstraDBVectorStore(\n",
" embedding=embedding,\n",
" collection_name=\"multidoc_vector\",\n",
" api_endpoint=ASTRA_DB_API_ENDPOINT,\n",
" token=ASTRA_DB_APPLICATION_TOKEN,\n",
" namespace=ASTRA_DB_KEYSPACE,\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PB0OTyiPZYtj"
},
"outputs": [],
"source": [
"inserted_ids = vstore.add_documents(docs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MCZF7rhmOEBQ",
"outputId": "3f41cd26-3df0-4d85-859e-a1815abaf89e"
},
"outputs": [],
"source": [
"print(f\"\\nInserted {len(inserted_ids)} documents.\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "U8IkQRVzF9pP"
},
"outputs": [],
"source": [
"prompt_template = \"\"\"\n",
"You are an AI philosopher drawing insights from the roadmap of \"rag,\" \"llama3,\" and \"genai.\"\n",
"Craft thoughtful answers based on this roadmap, mixing and matching existing paths.\n",
"Your responses should be concise and strictly related to the provided context.\n",
"\n",
"ROADMAP CONTEXT:\n",
"{context}\n",
"\n",
"QUESTION: {question}\n",
"\n",
"YOUR ANSWER:\"\"\""
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "DQp4n2tCG-F_"
},
"outputs": [],
"source": [
"prompt_template = ChatPromptTemplate.from_template(prompt_template)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "HLTlpaHDGg6n"
},
"outputs": [],
"source": [
"retriever = vstore.as_retriever(search_kwargs={\"k\": 3})"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "QdvsgC2UG2F4",
"outputId": "82af6575-982b-4b13-efe5-c35b6e23d109"
},
"outputs": [],
"source": [
"retriever"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "jp8EyMrWGxUx"
},
"outputs": [],
"source": [
"llm = ChatOpenAI()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "7HITJ2t3GtNf"
},
"outputs": [],
"source": [
"chain = (\n",
" {\"context\": retriever, \"question\": RunnablePassthrough()}\n",
" | prompt_template\n",
" | llm\n",
" | StrOutputParser()\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 87
},
"id": "uYnIVzpTcauK",
"outputId": "fdf2ab30-f628-4b8b-d02e-5c8140c8d701"
},
"outputs": [],
"source": [
"chain.invoke(\"Can you tell me the roadmap of generative AI?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 87
},
"id": "jVag171QaHi2",
"outputId": "b99cd96e-c72d-4d74-9f71-c1801cbd76ba"
},
"outputs": [],
"source": [
"chain.invoke(\"What is Llama? Can you tell me some important points about it?\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "M0NfhTCIaRMF"
},
"outputs": [],
"source": []
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
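The notebook above wires a retriever, a prompt template, an LLM, and an output parser into an LCEL pipeline: `{"context": retriever, "question": RunnablePassthrough()} | prompt | llm | parser`. The real chain needs OpenAI and Astra DB credentials, so the data flow can be illustrated with a dependency-free sketch in which the retriever, prompt, llm, and parser below are hypothetical stand-ins, not the real LangChain APIs:

```python
# Dependency-free sketch of the LCEL data flow used in the notebook:
# {"context": retriever, "question": passthrough} | prompt | llm | parser.
# Every component here is an illustrative stand-in.

def retriever(question):
    # Stand-in for vstore.as_retriever(search_kwargs={"k": 3}):
    # toy substring match instead of vector similarity.
    corpus = ["genai roadmap: python -> ml -> llms -> rag",
              "llama3 is an open-weights llm from meta",
              "rag combines retrieval with generation"]
    return [d for d in corpus if any(w in d for w in question.lower().split())][:3]

def prompt(inputs):
    # Stand-in for ChatPromptTemplate: fill {context} and {question}.
    return (f"ROADMAP CONTEXT:\n{inputs['context']}\n\n"
            f"QUESTION: {inputs['question']}\n\nYOUR ANSWER:")

def llm(text):
    # Stand-in for ChatOpenAI: echo the context section back.
    return "ANSWER based on -> " + text.split("QUESTION:")[0].strip()

def parser(message):
    # Stand-in for StrOutputParser.
    return str(message)

def chain(question):
    # The dict step fans the input out: the retriever fills "context",
    # while a passthrough copies the raw question into "question".
    step = {"context": retriever(question), "question": question}
    return parser(llm(prompt(step)))

print(chain("what is the rag roadmap?"))
```

The point of the dict step is that a single input (the question) is routed two ways at once, which is exactly what `RunnablePassthrough` enables in the real chain.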
================================================
FILE: Child_to_Parent_Retrieval.ipynb
================================================
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "view-in-github"
},
"source": [
"<a href=\"https://colab.research.google.com/github/sunnysavita10/Generative-AI-Indepth-Basic-to-Advance/blob/main/Child_to_Parent_Retrieval.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "o7u2h6FLqlhE"
},
"source": [
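The Child_to_Parent_Retrieval.ipynb file is truncated in this preview, but its topic is the standard parent-document pattern: small child chunks are embedded for precise matching, while the larger parent chunk they came from is returned for context (LangChain ships this as `ParentDocumentRetriever`). As a minimal dependency-free sketch of the idea, with illustrative helper names and a toy word-overlap scorer standing in for vector similarity:

```python
# Child-to-parent retrieval sketch: index small child chunks for matching,
# but return the parent chunk they belong to. All names are illustrative.

def split(text, size):
    # Fixed-size word-window splitter (stand-in for a text splitter).
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def build_index(document):
    parents = split(document, 8)          # large parent chunks
    child_to_parent = {}
    for pid, parent in enumerate(parents):
        for child in split(parent, 3):    # small child chunks
            child_to_parent[child] = pid  # remember each child's parent
    return parents, child_to_parent

def retrieve(query, parents, child_to_parent):
    # Score children by word overlap (stand-in for embedding similarity),
    # then return the *parent* of the best-matching child.
    q = set(query.lower().split())
    best = max(child_to_parent, key=lambda c: len(q & set(c.lower().split())))
    return parents[child_to_parent[best]]

doc = ("retrieval augmented generation grounds llm answers in documents "
       "child chunks give precise matches parent chunks give full context "
       "astra db stores the child embeddings for similarity search")
parents, mapping = build_index(doc)
print(retrieve("child chunks precise", parents, mapping))
```

The design choice mirrors the real retriever: matching happens at child granularity, but the unit handed to the LLM is the parent, so the answer gets more surrounding context than the matched span alone.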