Transcribing Bank Statements To Markdown Using Gemini Vision AI
Workflow JSON
{"meta": {"instanceId": "408f9fb9940c3cb18ffdef0e0150fe342d6e655c3a9fac21f0f644e8bedabcd9"}, "nodes": [{"id": "490493d1-e9ac-458a-ac9e-a86048ce6169", "name": "When clicking \u2018Test workflow\u2019", "type": "n8n-nodes-base.manualTrigger", "position": [-700, 260], "parameters": {}, "typeVersion": 1}, {"id": "116f1137-632f-4021-ad0f-cf59ed1776fd", "name": "Google Gemini Chat Model", "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini", "position": [980, 440], "parameters": {"options": {}, "modelName": "models/gemini-1.5-pro-latest"}, "credentials": {"googlePalmApi": {"id": "", "name": "[Your googlePalmApi]"}}, "typeVersion": 1}, {"id": "44695b4f-702c-4230-9ec3-e37447fed38e", "name": "Sort Pages", "type": "n8n-nodes-base.sort", "position": [400, 320], "parameters": {"options": {}, "sortFieldsUi": {"sortField": [{"fieldName": "fileName"}]}}, "typeVersion": 1}, {"id": "f2575b2c-0808-464e-b982-1eed8e0d9df7", "name": "Sticky Note", "type": "n8n-nodes-base.stickyNote", "position": [-1280, 0], "parameters": {"width": 437.0502325581392, "height": 430.522325581395, "content": "## Try Me Out!\n\n### This workflow converts a bank statement to markdown, faithfully capturing the details using the power of Vision Language Models (\"VLMs\"). The resulting markdown can then be parsed again by your standard LLM to extract data such as identifying all deposit table rows in the document.\n\nThis workflow is able to handle both downloaded PDFs as well as scanned PDFs. Be sure to protect sensitive data before running this workflow.\n\n### Need Help?\nJoin the [Discord](https://discord.com/invite/XPKeKXeB7d) or ask in the [Forum](https://community.n8n.io/)!"}, "typeVersion": 1}, {"id": "d62d7b0e-29eb-48a9-a471-4279e663c521", "name": "Get Bank Statement", "type": "n8n-nodes-base.googleDrive", "position": [-500, 260], "parameters": {"fileId": {"__rl": true, "mode": "id", "value": "1wS9U7MQDthj57CvEcqG_Llkr-ek6RqGA"}, "options": {}, "operation": "download"}, "credentials": {"googleDriveOAuth2Api": {"id": "", "name": "[Your googleDriveOAuth2Api]"}}, "typeVersion": 3}, {"id": "1329973b-a4e0-4272-9e24-3674bb9d4923", "name": "Split PDF into Images", "type": "n8n-nodes-base.httpRequest", "position": [-140, 320], "parameters": {"url": "http://stirling-pdf:8080/api/v1/convert/pdf/img", "method": "POST", "options": {}, "sendBody": true, "contentType": "multipart-form-data", "bodyParameters": {"parameters": [{"name": "fileInput", "parameterType": "formBinaryData", "inputDataFieldName": "data"}, {"name": "imageFormat", "value": "jpg"}, {"name": "singleOrMultiple", "value": "multiple"}, {"name": "dpi", "value": "300"}]}}, "typeVersion": 4.2}, {"id": "4e263346-9f55-4316-a505-4a54061ccfbb", "name": "Extract Zip File", "type": "n8n-nodes-base.compression", "position": [40, 320], "parameters": {}, "typeVersion": 1.1}, {"id": "5e97072f-a7c5-45aa-99d1-3231a9230b53", "name": "Images To List", "type": "n8n-nodes-base.code", "position": [220, 320], "parameters": {"jsCode": "let results = [];\n\nfor (item of items) {\n for (key of Object.keys(item.binary)) {\n results.push({\n json: {\n fileName: item.binary[key].fileName\n },\n binary: {\n data: item.binary[key],\n }\n });\n }\n}\n\nreturn results;"}, "typeVersion": 2}, {"id": "62836c73-4cf7-4225-a45d-0cd62b7e227d", "name": "Resize Images For AI", "type": "n8n-nodes-base.editImage", "position": [800, 280], "parameters": {"width": 75, "height": 75, "options": {}, "operation": "resize", "resizeOption": "percent"}, "typeVersion": 1}, {"id": "59fc6716-9826-4463-be33-923a8f6f33f1", "name": "Sticky Note1", "type": "n8n-nodes-base.stickyNote", "position": [-820, 0], "parameters": {"color": 7, "width": 546.4534883720931, "height": 478.89348837209275, "content": "## 1. Download Bank Statement PDF\n[Read more about Google Drive node](https://docs.n8n.io/integrations/builtin/app-nodes/n8n-nodes-base.googledrive)\n\nFor this demonstration, we'll pull an example bank statement off Google Drive however, you can also swap this out for other triggers such as webhook.\n\nYou can use the example bank statement created specifically for this workflow here: https://drive.google.com/file/d/1wS9U7MQDthj57CvEcqG_Llkr-ek6RqGA/view?usp=sharing"}, "typeVersion": 1}, {"id": "8e68a295-ff35-4d28-86bb-c8ea5664b3c6", "name": "Sticky Note2", "type": "n8n-nodes-base.stickyNote", "position": [-240, 3.173953488372149], "parameters": {"color": 7, "width": 848.0232558139535, "height": 533.5469767441862, "content": "## 2. Split PDF Pages into Seperate Images\n\nCurrently, the vision model we'll be using can't accept raw PDFs so we'll have to convert our PDF to a image in order to use it. To achieve this, we'll use the free [Stirling PDF webservice](https://stirlingpdf.io/) for convenience but if we need data privacy (recommended!), we could self-host our own [Stirling PDF instance](https://github.com/Stirling-Tools/Stirling-PDF/) instead. Alternatively, feel free to swap this service out for one of your own as long as it can convert PDFs into images!\n\nWe will ask the PDF service to return each page of our statement as separate images, which it does so as a zip file. Next steps is to just unzip the file and convert the output as a list of images."}, "typeVersion": 1}, {"id": "5286aa35-9687-4d5b-987c-79322a1ddc84", "name": "Sticky Note3", "type": "n8n-nodes-base.stickyNote", "position": [640, -40], "parameters": {"color": 7, "width": 775.3441860465115, "height": 636.0809302325588, "content": "## 3. Convert PDF Pages to Markdown Using Vision Model\n[Learn more about using the Basic LLM node](https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.chainllm)\n\nUnlike traditional OCR, vision models (\"VLMs\") \"transcribe\" what they see so while we shouldn't expect an exact replication of a document, they may perform better making sense of complex document layouts ie. such as with horizontally stacked tables.\n \nIn this demonstration, we can transcribe our bank statement scans to markdown text for the purpose of further processing. With markdown, we can retain tables or columnar data found in the document. We'll employ two optimisations however as a workaround for token and timeout limits (1) we'll only transcribe one page at a time and (2) we'll shrink the pages just a little just enough to speed up processing but not enough to reduce our required resolution."}, "typeVersion": 1}, {"id": "49deef00-4617-4b19-a56f-08fd195dfb82", "name": "Google Gemini Chat Model1", "type": "@n8n/n8n-nodes-langchain.lmChatGoogleGemini", "position": [1760, 480], "parameters": {"options": {"safetySettings": {"values": [{"category": "HARM_CATEGORY_DANGEROUS_CONTENT", "threshold": "BLOCK_NONE"}]}}, "modelName": "models/gemini-1.5-pro-latest"}, "credentials": {"googlePalmApi": {"id": "", "name": "[Your googlePalmApi]"}}, "typeVersion": 1}, {"id": "8e9c5d1d-d610-4bad-8feb-7ff0d5e1e64f", "name": "Sticky Note4", "type": "n8n-nodes-base.stickyNote", "position": [1440, 80], "parameters": {"color": 7, "width": 719.7534883720941, "height": 574.3134883720929, "content": "## 4. Extract Key Data Confidently From Statement\n[Read more about the Information Extractor](https://docs.n8n.io/integrations/builtin/cluster-nodes/root-nodes/n8n-nodes-langchain.information-extractor)\n\nWith our newly generated transcript, let's pull just the deposit line items from our statement. Processing all pages together as images may have been compute-extensive but as text, this is usually no problem at all for our LLM.\n\nFor our example bank statement PDF, the resulting extraction should be 8 table rows where a value exists in the \"deposits\" column."}, "typeVersion": 1}, {"id": "f849ad3c-69ec-443c-b7cd-ab24e210af73", "name": "Sticky Note6", "type": "n8n-nodes-base.stickyNote", "position": [-640, 500], "parameters": {"color": 5, "width": 366.00558139534894, "height": 125.41023255813957, "content": "### \ud83d\udca1 About the Example PDF\nScanned PDFs (ie. where each page is a scanned image) are a use-case where extracting PDF text content will not work. Vision models are a great solution as this workflow aims to demonstrate!"}, "typeVersion": 1}, {"id": "be6f529b-8220-4879-bd99-4333b4d764b6", "name": "Combine All Pages", "type": "n8n-nodes-base.aggregate", "position": [1580, 320], "parameters": {"options": {}, "fieldsToAggregate": {"fieldToAggregate": [{"renameField": true, "outputFieldName": "pages", "fieldToAggregate": "text"}]}}, "typeVersion": 1}, {"id": "2b35755c-7bae-4896-b9f9-1e9110209526", "name": "Sticky Note5", "type": "n8n-nodes-base.stickyNote", "position": [-190.1172093023256, 280], "parameters": {"width": 199.23348837209306, "height": 374.95069767441856, "content": "\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n### Privacy Warning!\nThis example uses a public third party service. If your data is senstive, please swap this out for the self-hosted version!"}, "typeVersion": 1}, {"id": "f638ba05-9ae2-447f-82af-eb22d8b9d6f1", "name": "Extract All Deposit Table Rows", "type": "@n8n/n8n-nodes-langchain.informationExtractor", "position": [1760, 320], "parameters": {"text": "= {{ $json.pages.join('---') }}", "options": {"systemPromptTemplate": "This statement contains tables with rows showing deposit and withdrawal made to the user's account. Deposits and withdrawals are identified by have the amount in their respective columns. What are the deposits to the account found in this statement?"}, "schemaType": "manual", "inputSchema": "{\n \"type\": \"array\",\n \"items\": {\n\t\"type\": \"object\",\n\t\"properties\": {\n \"date\": { \"type\": \"string\" },\n \"description\": { \"type\": \"string\" },\n \"amount\": { \"type\": \"number\" }\n\t}\n }\n}"}, "typeVersion": 1}, {"id": "cf1e8d85-5c92-469d-98af-7bdd5f469167", "name": "Sticky Note7", "type": "n8n-nodes-base.stickyNote", "position": [913.9944186046506, 620], "parameters": {"color": 5, "width": 498.18790697674433, "height": 130.35162790697677, "content": "### \ud83d\udca1 Don't use Google?\nFeel free to swap the model out for any state-of-the-art multimodal model which supports image inputs such as GPT4o(-mini) or Claude Sonnet/Opus. Note, I've found Gemini to produce the most accurate and consistent for this example use-case so no guarantees if you switch!"}, "typeVersion": 1}, {"id": "20f33372-a6b6-4f4d-987d-a94c85313fa8", "name": "Transcribe to Markdown", "type": "@n8n/n8n-nodes-langchain.chainLlm", "position": [980, 280], "parameters": {"text": "transcribe the image to markdown.", "messages": {"messageValues": [{"message": "=You help transcribe documents to markdown, keeping faithful to all text printed and visible to the best of your ability. Ensure you capture all headings, subheadings, titles as well as small print.\nFor any tables found with the document, convert them to markdown tables. If table row descriptions overflow into more than 1 row, concatanate and fit them into a single row. If two or more tables are adjacent horizontally, stack the tables vertically instead. There should be a newline after every markdown table.\nFor any graphics, use replace with a description of the image. Images of scanned checks should be converted to the phrase \"<scanned image of check>\"."}, {"type": "HumanMessagePromptTemplate", "messageType": "imageBinary"}]}, "promptType": "define"}, "typeVersion": 1.4}], "pinData": {}, "connections": {"Sort Pages": {"main": [[{"node": "Resize Images For AI", "type": "main", "index": 0}]]}, "Images To List": {"main": [[{"node": "Sort Pages", "type": "main", "index": 0}]]}, "Extract Zip File": {"main": [[{"node": "Images To List", "type": "main", "index": 0}]]}, "Combine All Pages": {"main": [[{"node": "Extract All Deposit Table Rows", "type": "main", "index": 0}]]}, "Get Bank Statement": {"main": [[{"node": "Split PDF into Images", "type": "main", "index": 0}]]}, "Resize Images For AI": {"main": [[{"node": "Transcribe to Markdown", "type": "main", "index": 0}]]}, "Split PDF into Images": {"main": [[{"node": "Extract Zip File", "type": "main", "index": 0}]]}, "Transcribe to Markdown": {"main": [[{"node": "Combine All Pages", "type": "main", "index": 0}]]}, "Google Gemini Chat Model": {"ai_languageModel": [[{"node": "Transcribe to Markdown", "type": "ai_languageModel", "index": 0}]]}, "Google Gemini Chat Model1": {"ai_languageModel": [[{"node": "Extract All Deposit Table Rows", "type": "ai_languageModel", "index": 0}]]}, "When clicking \u2018Test workflow\u2019": {"main": [[{"node": "Get Bank Statement", "type": "main", "index": 0}]]}}}How to Import This Workflow
- 1Copy the workflow JSON above using the Copy Workflow JSON button.
- 2Open your n8n instance and go to Workflows.
- 3Click Import from JSON and paste the copied workflow.
Don't have an n8n instance? Start your free trial at n8nautomation.cloud
Related Templates
ETL pipeline
Automate your data extraction, transformation, and loading with this robust ETL pipeline, designed to efficiently process and analyze information from various sources. This workflow begins on a schedule, fetching tweets from Twitter/X, then storing them in MongoDB for initial processing. The MongoDB data is then sent to Google Cloud Natural Language for sentiment analysis or entity extraction, with the results subsequently prepared and stored in PostgreSQL. A conditional check on the PostgreSQL data determines whether to send an alert to Slack, ensuring timely notifications for critical insights or anomalies. This powerful automation is ideal for marketing teams monitoring brand sentiment, researchers analyzing public opinion, or businesses tracking competitor activity, providing actionable intelligence without manual data handling. By automating data ingestion, enrichment, and storage, this workflow significantly reduces the time and effort spent on data preparation, allowing teams to focus on analysis and strategic decision-making while ensuring data consistency and accessibility.
SQL agent with memory
Empower your data analysis with the SQL agent with memory workflow, automating the process of querying databases using natural language. This powerful workflow connects OpenAI's advanced language models with your local SQL databases, allowing you to interact with your data through a conversational interface. Initially, the workflow downloads a chinook.zip example database, extracts it, and saves the chinook.db file locally, making it immediately available for querying. The AI Agent, powered by OpenAI Chat Model and supported by a Window Buffer Memory, interprets your natural language questions, translates them into SQL queries, executes them against your local chinook.db, and provides the results back to you. This is incredibly useful for data analysts, business intelligence professionals, or anyone needing quick insights from their databases without writing complex SQL queries, significantly reducing the time and specialized knowledge required for data exploration. By leveraging the Chat Trigger, users can easily initiate conversations and receive immediate, intelligent responses, streamlining data access and accelerating decision-making.
Prepare CSV files with GPT-4
Transform raw, unstructured text into perfectly formatted CSV files using the power of GPT-4 with this n8n workflow. This automation connects OpenAI's advanced language model to process your input, then meticulously structures the output into a usable CSV format. Ideal for data analysts, marketers, or researchers, this workflow helps you extract specific information from large text datasets, such as customer reviews, survey responses, or article summaries, and prepare it for analysis in spreadsheets or databases. By automating the extraction and formatting of data, you significantly reduce manual data entry errors and save countless hours of tedious work, allowing you to focus on insights rather than data preparation. The workflow manually triggers, sending your text to OpenAI, then splits the responses into manageable batches, parses the JSON output, converts it into a structured table, and finally saves a clean, UTF-8 encoded CSV file to disk, ensuring compatibility across various systems.