Chat with PDF docs using AI (quoting sources)

Extract information from PDF documents stored in Google Drive and get AI-generated answers that cite the passages they came from. This n8n workflow connects Google Drive, Pinecone, and OpenAI and runs in two parts.

Setup (manual trigger): when you click "Execute Workflow", the Set file URL in Google Drive node supplies the document's link (the Bitcoin whitepaper by default), the Download file node fetches the PDF, and the Add in metadata node records the file's name, extension, and URL. The Default Data Loader reads the PDF, the Recursive Character Text Splitter cuts it into chunks of up to 3,000 characters with a 200-character overlap, and the Embeddings OpenAI and Add to Pinecone vector store nodes embed each chunk and store it together with that metadata. Running this part more than once inserts duplicate copies of the document into the index.

Chat (Chat Trigger): each question sent through the chat window goes to the Get top chunks matching query node, which retrieves the most similar chunks from Pinecone (four by default, set in the Set max chunks to send to model node). The Prepare chunks node numbers them, and the Answer the query based on chunks node sends them to the OpenAI Chat Model with instructions to answer from that context, to say it does not know rather than guess, and to report which chunk numbers it used. The Structured Output Parser returns the answer and the cited chunk numbers as separate fields, and the Compose citations and Generate response nodes turn those numbers into file-name and line-range references appended to the answer.

This workflow suits researchers, legal professionals, and anyone who needs to find specific information in large PDF archives. Because every answer points back to the chunks it was drawn from, checking it against the source document takes far less time than reviewing the document manually.
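The citation mechanism rests on one idea: number the retrieved chunks, have the model return the numbers it used, then map each number back to that chunk's metadata. Below is a minimal JavaScript sketch of that logic outside n8n, assuming each chunk is a `{ pageContent, metadata }` object like `item.json.document` in the workflow and that the model's reply has already been parsed into `{ answer, citations }` as the Structured Output Parser schema describes. The function names are hypothetical; this illustrates the approach rather than reproducing the template's code.

```javascript
// Sketch of the "quoting sources" steps, runnable outside n8n.
// Assumed chunk shape: { pageContent, metadata }, as read from
// item.json.document in the workflow's Code node.

// Like "Prepare chunks": number each retrieved chunk so the model can cite it.
function buildContext(chunks) {
  return chunks
    .map((chunk, i) => `--- CHUNK ${i} ---\n${chunk.pageContent}\n\n`)
    .join("");
}

// Like "Compose citations": turn each cited index into "[file, lines a-b]".
// The range check is an addition of this sketch: it drops indexes that
// were never retrieved.
function composeCitations(citations, chunks) {
  return citations
    .filter((i) => Number.isInteger(i) && i >= 0 && i < chunks.length)
    .map((i) => {
      const meta = chunks[i].metadata;
      return `[${meta.file_name}, lines ${meta["loc.lines.from"]}-${meta["loc.lines.to"]}]`;
    });
}

// Like "Generate response": append the citations on a new line, if any.
function generateResponse({ answer, citations = [] }, chunks) {
  const refs = composeCitations(citations, chunks);
  return refs.length > 0 ? `${answer}\n${refs.join("")}` : answer;
}

// Usage with made-up chunks (not real search results):
const exampleChunks = [
  { pageContent: "Text of the first chunk.", metadata: { file_name: "example.pdf", "loc.lines.from": 1, "loc.lines.to": 12 } },
  { pageContent: "Text of the second chunk.", metadata: { file_name: "example.pdf", "loc.lines.from": 10, "loc.lines.to": 25 } },
];
console.log(buildContext(exampleChunks));
console.log(generateResponse({ answer: "An example answer.", citations: [1] }, exampleChunks));
```

Each cited index is looked up individually, so an answer that cites several chunks yields one reference per chunk.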

22 nodes | manual trigger | 232 views | 0 copies | Data

Google Drive, Pinecone, OpenAI

Workflow JSON

{"meta": {"instanceId": "cb484ba7b742928a2048bf8829668bed5b5ad9787579adea888f05980292a4a7", "templateId": "1960"}, "nodes": [{"id": "296a935f-bd02-44bc-9e1e-3e4d6a307e38", "name": "When clicking \"Execute Workflow\"", "type": "n8n-nodes-base.manualTrigger", "position": [260, 240], "parameters": {}, "typeVersion": 1}, {"id": "61a38c00-f196-4b01-9274-c5e0f4c511bc", "name": "Embeddings OpenAI", "type": "@n8n/n8n-nodes-langchain.embeddingsOpenAi", "position": [1060, 460], "parameters": {"options": {}}, "credentials": {"openAiApi": {"id": "", "name": "[Your openAiApi]"}}, "typeVersion": 1}, {"id": "816066bd-02e8-4de2-bcee-ab81d890435a", "name": "Sticky Note", "type": "n8n-nodes-base.stickyNote", "position": [426.9261940355327, 60.389291053299075], "parameters": {"color": 7, "width": 1086.039382705461, "height": 728.4168721167887, "content": "## 1. Setup: Fetch file from Google Drive, split it into chunks and insert into a vector database\nNote that running this part multiple times will insert multiple copies into your DB"}, "typeVersion": 1}, {"id": "30cd81ad-d658-4c33-9a38-68e33b74cdae", "name": "Default Data Loader", "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader", "position": [1240, 460], "parameters": {"options": {"metadata": {"metadataValues": [{"name": "file_url", "value": "={{ $json.file_url }}"}, {"name": "file_name", "value": "={{ $('Add in metadata').item.json.file_name }}"}]}}, "dataType": "binary"}, "typeVersion": 1}, {"id": "718f09e0-67be-41a6-a90d-f58e64ffee4d", "name": "Set file URL in Google Drive", "type": "n8n-nodes-base.set", "position": [480, 240], "parameters": {"options": {}, "assignments": {"assignments": [{"id": "50025ff5-1b53-475f-b150-2aafef1c4c21", "name": "file_url", "type": "string", "value": " https://drive.google.com/file/d/11Koq9q53nkk0F5Y8eZgaWJUVR03I4-MM/view"}]}}, "typeVersion": 3.3}, {"id": "8f536a96-a6b1-4291-9cac-765759c396a8", "name": "Sticky Note2", "type": "n8n-nodes-base.stickyNote", "position": [-40, 140], 
"parameters": {"height": 350.7942096493649, "content": "# Try me out\n1. In Pinecone, create an index with 1536 dimensions and select it in the two vector store nodes\n2. Populate Pinecone by clicking the 'test workflow' button below\n3. Click the 'chat' button below and enter the following:\n\n_Which email provider does the creator of Bitcoin use?_"}, "typeVersion": 1}, {"id": "ec7c9407-93c3-47a6-90f2-6e6056f5af84", "name": "Add in metadata", "type": "n8n-nodes-base.code", "position": [900, 240], "parameters": {"mode": "runOnceForEachItem", "jsCode": "// Copy the downloaded file's name, extension and source URL into the item's JSON so later nodes can reference them\n$input.item.json.file_name = $input.item.binary.data.fileName;\n$input.item.json.file_ext = $input.item.binary.data.fileExtension;\n$input.item.json.file_url = $('Set file URL in Google Drive').item.json.file_url;\n\nreturn $input.item;"}, "typeVersion": 2}, {"id": "ab3131d5-4b04-48b4-b5d5-787e3ed18917", "name": "Download file", "type": "n8n-nodes-base.googleDrive", "position": [680, 240], "parameters": {"fileId": {"__rl": true, "mode": "url", "value": "={{ $json.file_url }}"}, "options": {}, "operation": "download"}, "credentials": {"googleDriveOAuth2Api": {"id": "", "name": "[Your googleDriveOAuth2Api]"}}, "typeVersion": 3}, {"id": "764a865c-7efe-4eec-a34c-cc87c5f085b1", "name": "Chat Trigger", "type": "@n8n/n8n-nodes-langchain.chatTrigger", "position": [260, 960], "webhookId": "1727c687-aed0-49cf-96af-e7796819fbb3", "parameters": {}, "typeVersion": 1}, {"id": "36cd9a8d-7d89-49b3-8a81-baa278201a21", "name": "Prepare chunks", "type": "n8n-nodes-base.code", "position": [1080, 960], "parameters": {"jsCode": "let out = \"\"\nfor (const i in $input.all()) {\n let itemText = \"--- CHUNK \" + i + \" ---\\n\"\n itemText += $input.all()[i].json.document.pageContent + \"\\n\"\n itemText += \"\\n\"\n out += itemText\n}\n\nreturn {\n 'context': out\n};"}, "typeVersion": 2}, {"id": "6356bce2-9aae-43ed-97ce-a27cbfb80df9", "name": "Embeddings OpenAI2", "type": 
"@n8n/n8n-nodes-langchain.embeddingsOpenAi", "position": [700, 1180], "parameters": {"options": {}}, "credentials": {"openAiApi": {"id": "", "name": "[Your openAiApi]"}}, "typeVersion": 1}, {"id": "8fb697ea-f2e5-4105-b6c8-ab869c2e5ab2", "name": "OpenAI Chat Model", "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi", "position": [1320, 1180], "parameters": {"options": {}}, "credentials": {"openAiApi": {"id": "", "name": "[Your openAiApi]"}}, "typeVersion": 1}, {"id": "9a2b0152-d008-42cb-bc10-495135d5ef45", "name": "Set max chunks to send to model", "type": "n8n-nodes-base.set", "position": [480, 960], "parameters": {"options": {}, "assignments": {"assignments": [{"id": "236047ff-75a2-47fd-b338-1e9763c4015e", "name": "chunks", "type": "number", "value": 4}]}, "includeOtherFields": true}, "typeVersion": 3.3}, {"id": "f2ab813f-0f0c-4d3a-a1de-7896ad736698", "name": "Structured Output Parser", "type": "@n8n/n8n-nodes-langchain.outputParserStructured", "position": [1500, 1180], "parameters": {"jsonSchema": "{\n \"type\": \"object\",\n \"properties\": {\n \"answer\": {\n \"type\": \"string\"\n },\n \"citations\": {\n \"type\": \"array\",\n \"items\": {\n \"type\": \"number\"\n }\n }\n }\n}"}, "typeVersion": 1}, {"id": "ada2a38b-0f6e-4115-97c0-000e97a5e62e", "name": "Compose citations", "type": "n8n-nodes-base.set", "position": [1680, 960], "parameters": {"options": {}, "assignments": {"assignments": [{"id": "67ecefcf-a30c-4cc4-89ca-b9b23edd6585", "name": "citations", "type": "array", "value": "={{ $json.citations.map(i => '[' + $('Get top chunks matching query').all()[i].json.document.metadata.file_name + ', lines ' + $('Get top chunks matching query').all()[i].json.document.metadata['loc.lines.from'] + '-' + $('Get top chunks matching query').all()[i].json.document.metadata['loc.lines.to'] + ']') }}"}]}, "includeOtherFields": true}, "typeVersion": 3.3}, {"id": "8e115308-532e-4afd-b766-78e54c861f33", "name": "Generate response", 
"type": "n8n-nodes-base.set", "position": [1900, 960], "parameters": {"options": {}, "assignments": {"assignments": [{"id": "d77956c4-0ff4-4c64-80c2-9da9d4c8ad34", "name": "text", "type": "string", "value": "={{ $json.answer }} {{ $if(!$json.citations.isEmpty(), \"\\n\" + $json.citations.join(\"\"), '') }}"}]}}, "typeVersion": 3.3}, {"id": "40c5f9d8-38da-41ac-ab99-98f6010ba8bf", "name": "Sticky Note1", "type": "n8n-nodes-base.stickyNote", "position": [428.71587064297796, 840], "parameters": {"color": 7, "width": 1693.989843925635, "height": 548.5086735412393, "content": "## 2. Chat with file, getting citations in response"}, "typeVersion": 1}, {"id": "ef357a2b-bc8d-43f7-982f-73c3a85a60be", "name": "Answer the query based on chunks", "type": "@n8n/n8n-nodes-langchain.chainLlm", "position": [1300, 960], "parameters": {"text": "=Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Important: In your response, also include the indexes of the chunks you used to generate the answer.\n\n{{ $json.context }}\n\nQuestion: {{ $(\"Chat Trigger\").first().json.chatInput }}\nHelpful Answer:", "promptType": "define", "hasOutputParser": true}, "typeVersion": 1.4}, {"id": "cbb1b60c-b396-4f0e-8dc6-dfa41dbb178e", "name": "Sticky Note4", "type": "n8n-nodes-base.stickyNote", "position": [442.5682587140436, 150.50554725042372], "parameters": {"color": 7, "width": 179.58883583572606, "height": 257.75985739596473, "content": "Will fetch the Bitcoin whitepaper, but you can change this"}, "typeVersion": 1}, {"id": "1a5511b9-5a24-40d5-a5b1-830376226e4e", "name": "Get top chunks matching query", "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone", "position": [700, 960], "parameters": {"mode": "load", "topK": "={{ $json.chunks }}", "prompt": "={{ $json.chatInput }}", "options": {}, "pineconeIndex": {"__rl": true, "mode": "list", "value": "test-index", "cachedResultName": 
"test-index"}}, "credentials": {"pineconeApi": {"id": "", "name": "[Your pineconeApi]"}}, "typeVersion": 1}, {"id": "d8d210cf-f12e-4e82-9b28-f531d2ff14a6", "name": "Add to Pinecone vector store", "type": "@n8n/n8n-nodes-langchain.vectorStorePinecone", "position": [1120, 240], "parameters": {"mode": "insert", "options": {}, "pineconeIndex": {"__rl": true, "mode": "list", "value": "test-index", "cachedResultName": "test-index"}}, "credentials": {"pineconeApi": {"id": "", "name": "[Your pineconeApi]"}}, "typeVersion": 1}, {"id": "c501568b-fb49-487d-bced-757e3d7ed13c", "name": "Recursive Character Text Splitter", "type": "@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter", "position": [1240, 620], "parameters": {"chunkSize": 3000, "chunkOverlap": 200}, "typeVersion": 1}], "pinData": {}, "connections": {"Chat Trigger": {"main": [[{"node": "Set max chunks to send to model", "type": "main", "index": 0}]]}, "Download file": {"main": [[{"node": "Add in metadata", "type": "main", "index": 0}]]}, "Prepare chunks": {"main": [[{"node": "Answer the query based on chunks", "type": "main", "index": 0}]]}, "Add in metadata": {"main": [[{"node": "Add to Pinecone vector store", "type": "main", "index": 0}]]}, "Compose citations": {"main": [[{"node": "Generate response", "type": "main", "index": 0}]]}, "Embeddings OpenAI": {"ai_embedding": [[{"node": "Add to Pinecone vector store", "type": "ai_embedding", "index": 0}]]}, "OpenAI Chat Model": {"ai_languageModel": [[{"node": "Answer the query based on chunks", "type": "ai_languageModel", "index": 0}]]}, "Embeddings OpenAI2": {"ai_embedding": [[{"node": "Get top chunks matching query", "type": "ai_embedding", "index": 0}]]}, "Default Data Loader": {"ai_document": [[{"node": "Add to Pinecone vector store", "type": "ai_document", "index": 0}]]}, "Structured Output Parser": {"ai_outputParser": [[{"node": "Answer the query based on chunks", "type": "ai_outputParser", "index": 0}]]}, "Set file URL in Google Drive": {"main": 
[[{"node": "Download file", "type": "main", "index": 0}]]}, "Get top chunks matching query": {"main": [[{"node": "Prepare chunks", "type": "main", "index": 0}]]}, "Set max chunks to send to model": {"main": [[{"node": "Get top chunks matching query", "type": "main", "index": 0}]]}, "Answer the query based on chunks": {"main": [[{"node": "Compose citations", "type": "main", "index": 0}]]}, "When clicking \"Execute Workflow\"": {"main": [[{"node": "Set file URL in Google Drive", "type": "main", "index": 0}]]}, "Recursive Character Text Splitter": {"ai_textSplitter": [[{"node": "Default Data Loader", "type": "ai_textSplitter", "index": 0}]]}}}
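The Recursive Character Text Splitter in the JSON above is configured with `chunkSize: 3000` and `chunkOverlap: 200`. To show what those two numbers mean, here is a simplified JavaScript sketch that cuts fixed-size character windows, each starting 2,800 characters after the previous one, so neighbouring chunks share 200 characters. It is a stand-in, not the real splitter: the LangChain splitter first tries to break on separators such as blank lines, newlines, and spaces, so its boundaries will differ. The `from`/`to` line numbers only approximate the `loc.lines.from`/`loc.lines.to` metadata that the citations rely on, and `splitWithOverlap` is a hypothetical name.

```javascript
// Simplified stand-in for the splitter settings (chunkSize 3000, chunkOverlap 200).
// Cuts fixed-size character windows; the real recursive splitter prefers to
// break on separators such as "\n\n", "\n" and " " first.
function splitWithOverlap(text, chunkSize = 3000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be smaller than chunkSize");
  }
  // 1-based line number of the character at `offset`
  const lineAt = (offset) => text.slice(0, offset).split("\n").length;
  const step = chunkSize - chunkOverlap; // each window starts this far after the previous one
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({
      pageContent: text.slice(start, end),
      // similar in spirit to the loc.lines.from / loc.lines.to metadata
      from: lineAt(start),
      to: lineAt(end - 1),
    });
    if (end === text.length) break; // the last window reached the end of the text
  }
  return chunks;
}

// Usage: split a synthetic 500-line text into small overlapping chunks
const sample = Array.from({ length: 500 }, (_, n) => `line ${n + 1}`).join("\n");
for (const chunk of splitWithOverlap(sample, 1000, 100)) {
  console.log(chunk.from, chunk.to, chunk.pageContent.length);
}
```

The overlap means a sentence cut at a window boundary still appears whole in one of the two neighbouring chunks, at the cost of storing those 200 characters twice.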

How to Import This Workflow

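1. Copy the workflow JSON above.
2. Open your n8n instance and go to Workflows.
3. Click Import from JSON and paste the copied workflow.
4. Replace the credential placeholders ([Your googleDriveOAuth2Api], [Your openAiApi], [Your pineconeApi]) with your own credentials, then create a Pinecone index with 1536 dimensions and select it in both Pinecone vector store nodes.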
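The template ships with placeholder credentials: each credential entry has an empty `id` and a name such as `[Your openAiApi]`. The hypothetical helper below, a sketch rather than part of n8n, parses the workflow JSON and lists the node and credential type of every placeholder still present, which can serve as a checklist before the first run.

```javascript
// Hypothetical helper: list nodes whose credentials are still the template's
// placeholders (an empty "id" or a name like "[Your openAiApi]").
function findPlaceholderCredentials(workflow) {
  const pending = [];
  for (const node of workflow.nodes ?? []) {
    for (const [credentialType, cred] of Object.entries(node.credentials ?? {})) {
      const isPlaceholder = !cred.id || /^\[Your .+\]$/.test(cred.name ?? "");
      if (isPlaceholder) pending.push(`${node.name}: ${credentialType}`);
    }
  }
  return pending;
}

// Usage: replace this shortened example with the full template JSON.
const templateJson = `{"nodes": [{"name": "Embeddings OpenAI", "credentials": {"openAiApi": {"id": "", "name": "[Your openAiApi]"}}}, {"name": "Chat Trigger"}]}`;
console.log(findPlaceholderCredentials(JSON.parse(templateJson)));
```

Nodes without a `credentials` object, such as the triggers and sticky notes, are simply skipped.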


Related Templates

Ask questions about a PDF using AI

Turn your Google Drive PDFs into an interactive knowledge base with this AI workflow. This n8n automation connects your Google Drive files, processes them with OpenAI embeddings, and stores them in a Pinecone vector database, so you can ask questions and receive answers drawn directly from your document content. When a new PDF is uploaded to Google Drive, the workflow automatically extracts its text, splits it into manageable chunks using the Recursive Character Text Splitter, generates embeddings via OpenAI, and inserts the result into Pinecone for efficient retrieval. Later, by clicking the 'Chat' button, you can hold a natural language conversation with your document, powered by the OpenAI Chat Model and the Question and Answer Chain, which retrieves relevant information from Pinecone. This is useful for researchers who need to extract insights from large reports, legal professionals analyzing contracts, or businesses building searchable knowledge bases from their documentation, and it cuts down the time spent on manual review and searching.

16 nodes

Supabase Insertion & Upsertion & Retrieval

Manage and query your documents with the Supabase Insertion & Upsertion & Retrieval workflow, which combines document ingestion with vector search. This 21-node workflow, triggered manually, connects Google Drive, Supabase, and OpenAI to automate the ingestion, updating, and retrieval of information. It lets you upload documents from Google Drive, which are then processed by a Recursive Character Text Splitter and embedded using OpenAI Embeddings for insertion or upsertion into your Supabase vector store via the Insert Documents and Update Documents nodes. When a chat message is received, the workflow uses OpenAI's Chat Model and a Question and Answer Chain to retrieve relevant information from Supabase through the Retrieve by Query node and answers based on your stored documents. This workflow suits businesses and individuals who need to maintain an up-to-date knowledge base, power AI-driven chatbots with proprietary information, or keep document content synchronized with a searchable database, reducing manual data entry and making information easier to find.

21 nodes

Chat with Postgresql Database

Let your users query your PostgreSQL database in natural language. This workflow connects a chat interface, triggered by a new message, to an AI Agent that uses an OpenAI language model to interpret user requests. The AI Agent calls a set of PostgreSQL tools, including "Get Table Definition," "Execute SQL Query," and "Get DB Schema and Tables List," to fetch the database schema, generate SQL queries, and execute them against your database. Chat history is kept in an AI memory buffer, allowing for contextual conversations. This solution suits support teams needing quick data lookups, business analysts exploring data without writing SQL, or developers building interactive data dashboards. It reduces the need to write SQL by hand, speeds up data access, and lowers the training burden for non-technical users.

11 nodes
