Transcribing Bank Statements To Markdown Using Gemini Vision AI
Automate the transcription of bank statements into structured Markdown with this n8n workflow. It uses Google Gemini (gemini-1.5-pro-latest) as a vision model to transcribe PDF bank statements stored in Google Drive, including scanned PDFs where extracting the PDF text content will not work. On manual trigger, the workflow downloads the specified statement from Google Drive and sends it to a Stirling PDF service, which returns each page as a 300 dpi JPG image inside a zip file. The images are unzipped, sorted by file name, and resized to 75% of their original dimensions to speed up processing, then passed to Gemini one page at a time for transcription to Markdown. The per-page transcripts are then combined, and an Information Extractor node, also backed by Gemini, pulls the deposit table rows out as structured data with a date, description, and amount for each row. This workflow suits financial analysts, small business owners, or anyone who needs to digitize and analyze scanned bank records with less manual data entry. Bank statements are sensitive, so protect the data before running the workflow and prefer a self-hosted PDF conversion service.
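To make the page-splitting step more concrete, here is a minimal sketch of what the workflow's "Images To List" and "Sort Pages" nodes do, written so it runs outside n8n. The item shape mirrors n8n's `{ json, binary }` convention, but the file names are hypothetical, since how the PDF service names its output files is not stated here. The numeric-aware comparison via `Intl.Collator` is a swapped-in technique for illustration, not necessarily how the n8n Sort node orders its input.

```javascript
// Flatten the binary entries of the unzipped item into one item per image,
// following the same idea as the workflow's "Images To List" Code node.
function imagesToList(items) {
  const results = [];
  for (const item of items) {
    for (const key of Object.keys(item.binary ?? {})) {
      results.push({
        json: { fileName: item.binary[key].fileName },
        binary: { data: item.binary[key] },
      });
    }
  }
  return results;
}

// Order pages by file name. `numeric: true` compares digit runs as numbers,
// so a name ending in 2 sorts before one ending in 10.
function sortPages(pages) {
  const collator = new Intl.Collator("en", { numeric: true });
  return [...pages].sort((a, b) =>
    collator.compare(a.json.fileName, b.json.fileName)
  );
}

// Hypothetical output of the unzip step: one item holding several images.
const unzipped = [
  {
    json: {},
    binary: {
      file_0: { fileName: "statement_10.jpg" },
      file_1: { fileName: "statement_2.jpg" },
      file_2: { fileName: "statement_1.jpg" },
    },
  },
];

const ordered = sortPages(imagesToList(unzipped)).map((p) => p.json.fileName);
console.log(ordered);
```

Page order matters because the transcripts are later combined in sequence. If the page file names are zero-padded, a plain string sort gives the same order and the numeric comparison is unnecessary.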
Workflow JSON
{"meta": {"instanceId": "408f9fb9940c3cb18ffdef0e0150fe342d6e655c3a9fac21f0f644e8bedabcd9"}, "nodes": [{"id": "490493d1-e9ac-458a-ac9e-a86048ce6169", "name": "When clicking \u2018Test workflow\u2019", "type": "n8n-nodes-base.manualTrigger", "position": [-700, 260], "parameters": {}, "typeVersion": 1}, {"id": "116f1137-632f-4021-ad0f-cf59ed1776fd", "name": "Google Gemini Chat Model", "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini", "position": [980, 440], "parameters": {"options": {}, "modelName": "models/gemini-1.5-pro-latest"}, "credentials": {"googlePalmApi": {"id": "", "name": "[Your googlePalmApi]"}}, "typeVersion": 1}, {"id": "44695b4f-702c-4230-9ec3-e37447fed38e", "name": "Sort Pages", "type": "n8n-nodes-base.sort", "position": [400, 320], "parameters": {"options": {}, "sortFieldsUi": {"sortField": [{"fieldName": "fileName"}]}}, "typeVersion": 1}, {"id": "f2575b2c-0808-464e-b982-1eed8e0d9df7", "name": "Sticky Note", "type": "n8n-nodes-base.stickyNote", "position": [-1280, 0], "parameters": {"width": 437.0502325581392, "height": 430.522325581395, "content": "## Try Me Out!\n\n### This workflow converts a bank statement to markdown, faithfully capturing the details using the power of Vision Language Models (\"VLMs\"). The resulting markdown can then be parsed again by your standard LLM to extract data such as identifying all deposit table rows in the document.\n\nThis workflow is able to handle both downloaded PDFs as well as scanned PDFs. 
Be sure to protect sensitive data before running this workflow.\n\n### Need Help?\nJoin the [Discord](https://discord.com/invite/XPKeKXeB7d) or ask in the [Forum](https://community.n8n.io/)!"}, "typeVersion": 1}, {"id": "d62d7b0e-29eb-48a9-a471-4279e663c521", "name": "Get Bank Statement", "type": "n8n-nodes-base.googleDrive", "position": [-500, 260], "parameters": {"fileId": {"__rl": true, "mode": "id", "value": "1wS9U7MQDthj57CvEcqG_Llkr-ek6RqGA"}, "options": {}, "operation": "download"}, "credentials": {"googleDriveOAuth2Api": {"id": "", "name": "[Your googleDriveOAuth2Api]"}}, "typeVersion": 3}, {"id": "1329973b-a4e0-4272-9e24-3674bb9d4923", "name": "Split PDF into Images", "type": "n8n-nodes-base.httpRequest", "position": [-140, 320], "parameters": {"url": "http://stirling-pdf:8080/api/v1/convert/pdf/img", "method": "POST", "options": {}, "sendBody": true, "contentType": "multipart-form-data", "bodyParameters": {"parameters": [{"name": "fileInput", "parameterType": "formBinaryData", "inputDataFieldName": "data"}, {"name": "imageFormat", "value": "jpg"}, {"name": "singleOrMultiple", "value": "multiple"}, {"name": "dpi", "value": "300"}]}}, "typeVersion": 4.2}, {"id": "4e263346-9f55-4316-a505-4a54061ccfbb", "name": "Extract Zip File", "type": "n8n-nodes-base.compression", "position": [40, 320], "parameters": {}, "typeVersion": 1.1}, {"id": "5e97072f-a7c5-45aa-99d1-3231a9230b53", "name": "Images To List", "type": "n8n-nodes-base.code", "position": [220, 320], "parameters": {"jsCode": "let results = [];\n\nfor (item of items) {\n for (key of Object.keys(item.binary)) {\n results.push({\n json: {\n fileName: item.binary[key].fileName\n },\n binary: {\n data: item.binary[key],\n }\n });\n }\n}\n\nreturn results;"}, "typeVersion": 2}, {"id": "62836c73-4cf7-4225-a45d-0cd62b7e227d", "name": "Resize Images For AI", "type": "n8n-nodes-base.editImage", "position": [800, 280], "parameters": {"width": 75, "height": 75, "options": {}, "operation": "resize", "resizeOption": 
"percent"}, "typeVersion": 1}, {"id": "59fc6716-9826-4463-be33-923a8f6f33f1", "name": "Sticky Note1", "type": "n8n-nodes-base.stickyNote", "position": [-820, 0], "parameters": {"color": 7, "width": 546.4534883720931, "height": 478.89348837209275, "content": "## 1. Download Bank Statement PDF\n[Read more about Google Drive node](https://docs.n8n.io/integrations/builtin/app-nodes/n8n-nodes-base.googledrive)\n\nFor this demonstration, we'll pull an example bank statement off Google Drive however, you can also swap this out for other triggers such as webhook.\n\nYou can use the example bank statement created specifically for this workflow here: https://drive.google.com/file/d/1wS9U7MQDthj57CvEcqG_Llkr-ek6RqGA/view?usp=sharing"}, "typeVersion": 1}, {"id": "8e68a295-ff35-4d28-86bb-c8ea5664b3c6", "name": "Sticky Note2", "type": "n8n-nodes-base.stickyNote", "position": [-240, 3.173953488372149], "parameters": {"color": 7, "width": 848.0232558139535, "height": 533.5469767441862, "content": "## 2. Split PDF Pages into Seperate Images\n\nCurrently, the vision model we'll be using can't accept raw PDFs so we'll have to convert our PDF to a image in order to use it. To achieve this, we'll use the free [Stirling PDF webservice](https://stirlingpdf.io/) for convenience but if we need data privacy (recommended!), we could self-host our own [Stirling PDF instance](https://github.com/Stirling-Tools/Stirling-PDF/) instead. Alternatively, feel free to swap this service out for one of your own as long as it can convert PDFs into images!\n\nWe will ask the PDF service to return each page of our statement as separate images, which it does so as a zip file. Next steps is to just unzip the file and convert the output as a list of images."}, "typeVersion": 1}, {"id": "5286aa35-9687-4d5b-987c-79322a1ddc84", "name": "Sticky Note3", "type": "n8n-nodes-base.stickyNote", "position": [640, -40], "parameters": {"color": 7, "width": 775.3441860465115, "height": 636.0809302325588, "content": "## 3. 
Convert PDF Pages to Markdown Using Vision Model\n[Learn more about using the Basic LLM node](https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.chainllm)\n\nUnlike traditional OCR, vision models (\"VLMs\") \"transcribe\" what they see so while we shouldn't expect an exact replication of a document, they may perform better making sense of complex document layouts ie. such as with horizontally stacked tables.\n \nIn this demonstration, we can transcribe our bank statement scans to markdown text for the purpose of further processing. With markdown, we can retain tables or columnar data found in the document. We'll employ two optimisations however as a workaround for token and timeout limits (1) we'll only transcribe one page at a time and (2) we'll shrink the pages just a little just enough to speed up processing but not enough to reduce our required resolution."}, "typeVersion": 1}, {"id": "49deef00-4617-4b19-a56f-08fd195dfb82", "name": "Google Gemini Chat Model1", "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini", "position": [1760, 480], "parameters": {"options": {"safetySettings": {"values": [{"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}]}}, "modelName": "models/gemini-1.5-pro-latest"}, "credentials": {"googlePalmApi": {"id": "", "name": "[Your googlePalmApi]"}}, "typeVersion": 1}, {"id": "8e9c5d1d-d610-4bad-8feb-7ff0d5e1e64f", "name": "Sticky Note4", "type": "n8n-nodes-base.stickyNote", "position": [1440, 80], "parameters": {"color": 7, "width": 719.7534883720941, "height": 574.3134883720929, "content": "## 4. Extract Key Data Confidently From Statement\n[Read more about the Information Extractor](https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.information-extractor)\n\nWith our newly generated transcript, let's pull just the deposit line items from our statement. 
Processing all pages together as images may have been compute-extensive but as text, this is usually no problem at all for our LLM.\n\nFor our example bank statement PDF, the resulting extraction should be 8 table rows where a value exists in the \"deposits\" column."}, "typeVersion": 1}, {"id": "f849ad3c-69ec-443c-b7cd-ab24e210af73", "name": "Sticky Note6", "type": "n8n-nodes-base.stickyNote", "position": [-640, 500], "parameters": {"color": 5, "width": 366.00558139534894, "height": 125.41023255813957, "content": "### \ud83d\udca1 About the Example PDF\nScanned PDFs (ie. where each page is a scanned image) are a use-case where extracting PDF text content will not work. Vision models are a great solution as this workflow aims to demonstrate!"}, "typeVersion": 1}, {"id": "be6f529b-8220-4879-bd99-4333b4d764b6", "name": "Combine All Pages", "type": "n8n-nodes-base.aggregate", "position": [1580, 320], "parameters": {"options": {}, "fieldsToAggregate": {"fieldToAggregate": [{"renameField": true, "outputFieldName": "pages", "fieldToAggregate": "text"}]}}, "typeVersion": 1}, {"id": "2b35755c-7bae-4896-b9f9-1e9110209526", "name": "Sticky Note5", "type": "n8n-nodes-base.stickyNote", "position": [-190.1172093023256, 280], "parameters": {"width": 199.23348837209306, "height": 374.95069767441856, "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n### Privacy Warning!\nThis example uses a public third party service. If your data is senstive, please swap this out for the self-hosted version!"}, "typeVersion": 1}, {"id": "f638ba05-9ae2-447f-82af-eb22d8b9d6f1", "name": "Extract All Deposit Table Rows", "type": "@n8n/n8n-nodes-langchain.informationExtractor", "position": [1760, 320], "parameters": {"text": "= {{ $json.pages.join('---') }}", "options": {"systemPromptTemplate": "This statement contains tables with rows showing deposit and withdrawal made to the user's account. Deposits and withdrawals are identified by have the amount in their respective columns. 
What are the deposits to the account found in this statement?"}, "schemaType": "manual", "inputSchema": "{\n \"type\": \"array\",\n \"items\": {\n\t\"type\": \"object\",\n\t\"properties\": {\n \"date\": { \"type\": \"string\" },\n \"description\": { \"type\": \"string\" },\n \"amount\": { \"type\": \"number\" }\n\t}\n }\n}"}, "typeVersion": 1}, {"id": "cf1e8d85-5c92-469d-98af-7bdd5f469167", "name": "Sticky Note7", "type": "n8n-nodes-base.stickyNote", "position": [913.9944186046506, 620], "parameters": {"color": 5, "width": 498.18790697674433, "height": 130.35162790697677, "content": "### \ud83d\udca1 Don't use Google?\nFeel free to swap the model out for any state-of-the-art multimodal model which supports image inputs such as GPT4o(-mini) or Claude Sonnet/Opus. Note, I've found Gemini to produce the most accurate and consistent for this example use-case so no guarantees if you switch!"}, "typeVersion": 1}, {"id": "20f33372-a6b6-4f4d-987d-a94c85313fa8", "name": "Transcribe to Markdown", "type": "@n8n/n8n-nodes-langchain.chainLlm", "position": [980, 280], "parameters": {"text": "transcribe the image to markdown.", "messages": {"messageValues": [{"message": "=You help transcribe documents to markdown, keeping faithful to all text printed and visible to the best of your ability. Ensure you capture all headings, subheadings, titles as well as small print.\nFor any tables found with the document, convert them to markdown tables. If table row descriptions overflow into more than 1 row, concatanate and fit them into a single row. If two or more tables are adjacent horizontally, stack the tables vertically instead. There should be a newline after every markdown table.\nFor any graphics, use replace with a description of the image. 
Images of scanned checks should be converted to the phrase \"<scanned image of check>\"."}, {"type": "HumanMessagePromptTemplate", "messageType": "imageBinary"}]}, "promptType": "define"}, "typeVersion": 1.4}], "pinData": {}, "connections": {"Sort Pages": {"main": [[{"node": "Resize Images For AI", "type": "main", "index": 0}]]}, "Images To List": {"main": [[{"node": "Sort Pages", "type": "main", "index": 0}]]}, "Extract Zip File": {"main": [[{"node": "Images To List", "type": "main", "index": 0}]]}, "Combine All Pages": {"main": [[{"node": "Extract All Deposit Table Rows", "type": "main", "index": 0}]]}, "Get Bank Statement": {"main": [[{"node": "Split PDF into Images", "type": "main", "index": 0}]]}, "Resize Images For AI": {"main": [[{"node": "Transcribe to Markdown", "type": "main", "index": 0}]]}, "Split PDF into Images": {"main": [[{"node": "Extract Zip File", "type": "main", "index": 0}]]}, "Transcribe to Markdown": {"main": [[{"node": "Combine All Pages", "type": "main", "index": 0}]]}, "Google Gemini Chat Model": {"ai_languageModel": [[{"node": "Transcribe to Markdown", "type": "ai_languageModel", "index": 0}]]}, "Google Gemini Chat Model1": {"ai_languageModel": [[{"node": "Extract All Deposit Table Rows", "type": "ai_languageModel", "index": 0}]]}, "When clicking \u2018Test workflow\u2019": {"main": [[{"node": "Get Bank Statement", "type": "main", "index": 0}]]}}}
How to Import This Workflow
1. Copy the workflow JSON above.
2. Open your n8n instance and go to Workflows.
3. Click Import from JSON and paste the copied workflow.
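After importing, a few things in the export will probably need attention: the credentials are exported with empty ids and placeholder names, and the PDF conversion node points at a hard-coded URL. The sketch below shows one way to list such items from a workflow export. The `findSetupTasks` helper is hypothetical and not part of n8n, and the `workflow` object is a trimmed, hand-written fragment shaped like the export above rather than the full file.

```javascript
// Collect a checklist of things to review in an n8n workflow export.
// Hypothetical helper for illustration; it only inspects the JSON structure.
function findSetupTasks(workflow) {
  const tasks = [];
  const nodeNames = new Set(workflow.nodes.map((n) => n.name));

  for (const node of workflow.nodes) {
    // Credentials exported with an empty id likely need to be selected again.
    for (const [type, cred] of Object.entries(node.credentials ?? {})) {
      if (!cred.id) tasks.push(`${node.name}: set credential "${type}"`);
    }
    // Hard-coded service URLs may not resolve from your own instance.
    const url = node.parameters?.url;
    if (typeof url === "string") tasks.push(`${node.name}: check URL ${url}`);
  }

  // Every connection target should name a node that exists in the export.
  for (const [source, outputs] of Object.entries(workflow.connections ?? {})) {
    for (const branches of Object.values(outputs)) {
      for (const target of branches.flat()) {
        if (!nodeNames.has(target.node)) {
          tasks.push(`${source}: connection to missing node "${target.node}"`);
        }
      }
    }
  }
  return tasks;
}

// Trimmed fragment in the same shape as the export above.
const workflow = {
  nodes: [
    {
      name: "Get Bank Statement",
      parameters: { operation: "download" },
      credentials: {
        googleDriveOAuth2Api: { id: "", name: "[Your googleDriveOAuth2Api]" },
      },
    },
    {
      name: "Split PDF into Images",
      parameters: { url: "http://stirling-pdf:8080/api/v1/convert/pdf/img" },
    },
  ],
  connections: {
    "Get Bank Statement": {
      main: [[{ node: "Split PDF into Images", type: "main", index: 0 }]],
    },
  },
};

const tasks = findSetupTasks(workflow);
console.log(tasks.join("\n"));
```

For this fragment the checklist contains two entries: one for the Google Drive credential and one for the PDF conversion URL. Run against the full export, it would also flag the two Gemini model nodes, whose credentials are exported the same way.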