# lang_funcs.py: helper functions imported by the main script below.
import textwrap

from langchain.document_loaders import PyMuPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA
def load_pdf_data(file_path):
    # Load the PDF; PyMuPDFLoader returns one Document per page.
    loader = PyMuPDFLoader(file_path=file_path)
    docs = loader.load()
    return docs
def split_docs(documents, chunk_size=1000, chunk_overlap=20):
    # Split the page-level documents into overlapping chunks so each
    # chunk fits comfortably in the LLM's context window.
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap
    )
    chunks = text_splitter.split_documents(documents=documents)
    return chunks
def load_embedding_model(model_path, normalize_embedding=True):
    # Load a sentence-transformers embedding model from Hugging Face.
    # The original snippet returned a hard-coded OllamaEmbeddings instance
    # and ignored both arguments; HuggingFaceEmbeddings honors them and
    # matches the "all-MiniLM-L6-v2" model path passed in below.
    return HuggingFaceEmbeddings(
        model_name=model_path,
        model_kwargs={'device': 'cpu'},
        encode_kwargs={'normalize_embeddings': normalize_embedding}
    )
def create_embeddings(chunks, embedding_model, storing_path="vectorstore"):
    # Embed the chunks into a FAISS index and persist it to disk.
    # (Chroma, used in the original snippet, has no save_local method;
    # FAISS does, so it matches the persistence call here.)
    vectorstore = FAISS.from_documents(chunks, embedding_model)
    vectorstore.save_local(storing_path)
    return vectorstore
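Because the index is persisted, a later session can reload it instead of re-embedding the whole PDF. A minimal sketch, assuming the FAISS store above and the same embedding model (load_vectorstore is a hypothetical helper, not part of the original code):

def load_vectorstore(storing_path, embedding_model):
    # Hypothetical helper: reload a saved FAISS index. The embedding model
    # must be the same one used when the index was built.
    return FAISS.load_local(storing_path, embedding_model)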
prompt = """ ### System: You are an AI Assistant that follows instructions extreamly well. \ Help as much as you can. ### User: {prompt} ### Response: """
template = """ ### System: You are an respectful and honest assistant. You have to answer the user's \ questions using only the context provided to you. If you don't know the answer, \ just say you don't know. Don't try to make up an answer. ### Context: {context} ### User: {question} ### Response: """
def load_qa_chain(retriever, llm, prompt):
    # "stuff" concatenates all retrieved chunks into a single prompt;
    # return_source_documents=True keeps the retrieved chunks in the response.
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=retriever,
        chain_type="stuff",
        return_source_documents=True,
        chain_type_kwargs={'prompt': prompt}
    )
def get_response(query, chain):
    # Run the chain and wrap the answer at 100 characters per line for display.
    response = chain({'query': query})
    wrapped_text = textwrap.fill(response['result'], width=100)
    print(wrapped_text)
# Main script: wire the helpers together with an Ollama-served LLM.
from lang_funcs import *
from langchain.llms import Ollama
from langchain import PromptTemplate
llm = Ollama(model="orca-mini", temperature=0)
embed = load_embedding_model(model_path="all-MiniLM-L6-v2")
docs = load_pdf_data(file_path="data/ml_book.pdf")
documents = split_docs(documents=docs)
vectorstore = create_embeddings(documents, embed)
retriever = vectorstore.as_retriever()
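By default as_retriever() returns the top four most similar chunks per query. If you want to tune that, pass search_kwargs explicitly (shown here for illustration; k=4 just restates the default):

retriever = vectorstore.as_retriever(search_kwargs={'k': 4})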
prompt = PromptTemplate.from_template(template)
chain = load_qa_chain(retriever, llm, prompt)
get_response("What is random forest?", chain)
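Since the chain was built with return_source_documents=True, the raw response also carries the chunks the answer was drawn from. A small sketch for inspecting them (the 'source' and 'page' metadata keys come from PyMuPDFLoader):

response = chain({'query': "What is random forest?"})
for doc in response['source_documents']:
    print(doc.metadata.get('source'), "page", doc.metadata.get('page'))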