Scrape and summarize webpages with AI
This n8n workflow scrapes a list of webpages and summarizes each one with an OpenAI model. It starts manually when you click "Execute Workflow". An HTTP Request node fetches the essay index at http://www.paulgraham.com/articles.html, and an HTML node (named "Extract essay names") extracts the href attribute of every link matching the CSS selector "table table a", so what it returns are relative essay links rather than essay names. A Split Out node turns that array into one item per link, and a Limit node keeps the first three. For each remaining item, a second HTTP Request node fetches the essay page, and its output feeds two branches: an HTML node that extracts the page title, and a Summarization Chain node (chainSummarization) that produces the summary. The chain runs in document loader mode, with a Default Data Loader (documentDefaultDataLoader) and an attached Recursive Character Text Splitter (textSplitterRecursiveCharacterTextSplitter) preparing the content, and an OpenAI Chat Model node (lmChatOpenAi) set to gpt-4o-mini generating the text. A Merge node then combines the two branches by position, and a final Set node ("Clean up") keeps three fields per essay: title, summary, and url. The workflow suits researchers, content creators, or anyone who needs a quick digest of several web sources without reading each one in full.
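The data flow described above can be sketched outside n8n. The following Python is only an illustration under stated assumptions, not how n8n implements its nodes: the names NestedTableLinkParser, first_essay_urls, and merge_by_position are hypothetical, and the parser approximates the "table table a" selector by counting open table tags, which assumes the page closes its tables properly. No network or OpenAI call is made.

```python
from html.parser import HTMLParser

# Base URL taken from the workflow's "Fetch essay texts" node.
BASE_URL = "http://www.paulgraham.com/"


class NestedTableLinkParser(HTMLParser):
    """Collect href values of <a> tags inside a table nested in another table.

    Hypothetical helper: approximates the CSS selector "table table a".
    """

    def __init__(self):
        super().__init__()
        self.table_depth = 0
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth >= 2:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def first_essay_urls(listing_html, limit=3):
    """Mirror extract -> split out -> limit, then build absolute URLs."""
    parser = NestedTableLinkParser()
    parser.feed(listing_html)
    return [BASE_URL + href for href in parser.hrefs[:limit]]


def merge_by_position(titles, summaries, urls):
    """Mirror Merge (combine by position) followed by the "Clean up" fields."""
    return [
        {"title": title, "summary": summary, "url": url}
        for title, summary, url in zip(titles, summaries, urls)
    ]
```

Combining by position only pairs the right title with the right summary when both branches emit items in the same order, which holds here because both branches start from the same three fetched pages.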
Workflow JSON
{"meta": {"instanceId": "408f9fb9940c3cb18ffdef0e0150fe342d6e655c3a9fac21f0f644e8bedabcd9"}, "nodes": [{"id": "67850bd7-f9f4-4d5b-8c9e-bd1451247ba6", "name": "When clicking \"Execute Workflow\"", "type": "n8n-nodes-base.manualTrigger", "position": [-740, 1000], "parameters": {}, "typeVersion": 1}, {"id": "0d9133f9-b6d3-4101-95c6-3cd24cdb70c3", "name": "Fetch essay list", "type": "n8n-nodes-base.httpRequest", "position": [-520, 1000], "parameters": {"url": "http://www.paulgraham.com/articles.html", "options": {}}, "typeVersion": 4.1}, {"id": "ee634297-a456-4f70-a995-55b02950571e", "name": "Extract essay names", "type": "n8n-nodes-base.html", "position": [-300, 1000], "parameters": {"options": {}, "operation": "extractHtmlContent", "dataPropertyName": "=data", "extractionValues": {"values": [{"key": "essay", "attribute": "href", "cssSelector": "table table a", "returnArray": true, "returnValue": "attribute"}]}}, "typeVersion": 1}, {"id": "83d75693-dbb8-44c4-8533-da06f611c59c", "name": "Fetch essay texts", "type": "n8n-nodes-base.httpRequest", "position": [360, 1000], "parameters": {"url": "=http://www.paulgraham.com/{{ $json.essay }}", "options": {}}, "typeVersion": 4.1}, {"id": "151022b5-8570-4176-bf3f-137f27ac7036", "name": "Extract title", "type": "n8n-nodes-base.html", "position": [700, 700], "parameters": {"options": {}, "operation": "extractHtmlContent", "extractionValues": {"values": [{"key": "title", "cssSelector": "title"}]}}, "typeVersion": 1}, {"id": "07bcf095-3c4d-4a72-9bcb-341411750ff5", "name": "Clean up", "type": "n8n-nodes-base.set", "position": [1360, 980], "parameters": {"fields": {"values": [{"name": "title", "stringValue": "={{ $json.title }}"}, {"name": "summary", "stringValue": "={{ $json.response.text }}"}, {"name": "url", "stringValue": "=http://www.paulgraham.com/{{ $('Limit to first 3').item.json.essay }}"}]}, "include": "none", "options": {}}, "typeVersion": 3}, {"id": "11285de0-3c5d-4296-a322-9b7585af9acc", "name": "Sticky Note", "type": 
"n8n-nodes-base.stickyNote", "position": [-580, 920], "parameters": {"width": 1071.752021563343, "height": 285.66037735849045, "content": "## Scrape latest Paul Graham essays"}, "typeVersion": 1}, {"id": "c32f905d-dd7a-4b68-bbe0-dd8115ee0944", "name": "Sticky Note1", "type": "n8n-nodes-base.stickyNote", "position": [620, 920], "parameters": {"width": 465.3908355795153, "height": 606.7924528301882, "content": "## Summarize them with GPT"}, "typeVersion": 1}, {"id": "29d264f4-df6d-4a41-ab38-58e1b1becc9a", "name": "Split out into items", "type": "n8n-nodes-base.splitOut", "position": [-80, 1000], "parameters": {"options": {}, "fieldToSplitOut": "essay"}, "typeVersion": 1}, {"id": "ccfa3a1d-f170-44b4-a1f2-3573c88cae98", "name": "Limit to first 3", "type": "n8n-nodes-base.limit", "position": [140, 1000], "parameters": {"maxItems": 3}, "typeVersion": 1}, {"id": "c3d05068-9d1a-4ef5-8249-e7384dc617ee", "name": "Default Data Loader", "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader", "position": [820, 1200], "parameters": {"options": {}}, "typeVersion": 1}, {"id": "db75adad-cb16-4e72-b16e-34684a733b05", "name": "Recursive Character Text Splitter", "type": "@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter", "position": [820, 1340], "parameters": {"options": {}}, "typeVersion": 1}, {"id": "022cc091-9b4c-45c2-bc8e-4037ec2d0d60", "name": "OpenAI Chat Model1", "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi", "position": [680, 1200], "parameters": {"model": "gpt-4o-mini", "options": {}}, "credentials": {"openAiApi": {"id": "", "name": "[Your openAiApi]"}}, "typeVersion": 1}, {"id": "cda47bb7-36c5-4d15-a1ef-0c66b1194825", "name": "Merge", "type": "n8n-nodes-base.merge", "position": [1160, 980], "parameters": {"mode": "combine", "options": {}, "combineBy": "combineByPosition"}, "typeVersion": 3}, {"id": "28144e4c-e425-428d-b3d1-f563bfd4e5b3", "name": "Summarization Chain", "type": "@n8n/n8n-nodes-langchain.chainSummarization", "position": [720, 1000], 
"parameters": {"options": {}, "operationMode": "documentLoader"}, "typeVersion": 2}], "pinData": {}, "connections": {"Merge": {"main": [[{"node": "Clean up", "type": "main", "index": 0}]]}, "Extract title": {"main": [[{"node": "Merge", "type": "main", "index": 0}]]}, "Fetch essay list": {"main": [[{"node": "Extract essay names", "type": "main", "index": 0}]]}, "Limit to first 3": {"main": [[{"node": "Fetch essay texts", "type": "main", "index": 0}]]}, "Fetch essay texts": {"main": [[{"node": "Extract title", "type": "main", "index": 0}, {"node": "Summarization Chain", "type": "main", "index": 0}]]}, "OpenAI Chat Model1": {"ai_languageModel": [[{"node": "Summarization Chain", "type": "ai_languageModel", "index": 0}]]}, "Default Data Loader": {"ai_document": [[{"node": "Summarization Chain", "type": "ai_document", "index": 0}]]}, "Extract essay names": {"main": [[{"node": "Split out into items", "type": "main", "index": 0}]]}, "Summarization Chain": {"main": [[{"node": "Merge", "type": "main", "index": 1}]]}, "Split out into items": {"main": [[{"node": "Limit to first 3", "type": "main", "index": 0}]]}, "When clicking \"Execute Workflow\"": {"main": [[{"node": "Fetch essay list", "type": "main", "index": 0}]]}, "Recursive Character Text Splitter": {"ai_textSplitter": [[{"node": "Default Data Loader", "type": "ai_textSplitter", "index": 0}]]}}}How to Import This Workflow
1. Copy the workflow JSON above using the Copy Workflow JSON button.
2. Open your n8n instance and go to Workflows.
3. Click Import from JSON and paste the copied workflow.
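Before pasting, it can help to confirm that the copied text is intact JSON and that every connection points at a node that exists. The check below is a minimal sketch and not part of n8n; check_connections is a hypothetical helper, and it assumes the export keeps the "nodes" and "connections" layout shown in the JSON above.

```python
import json


def check_connections(workflow):
    """Return (source, target) pairs that name a node missing from "nodes"."""
    names = {node["name"] for node in workflow.get("nodes", [])}
    problems = []
    for source, outputs in workflow.get("connections", {}).items():
        # Outputs are keyed by connection type, e.g. "main" or "ai_document".
        for branches in outputs.values():
            # One list of links per output index; tolerate empty entries.
            for branch in branches:
                for link in branch or []:
                    if source not in names or link["node"] not in names:
                        problems.append((source, link["node"]))
    return problems


def load_workflow(text):
    """Parse copied workflow text; json.loads raises ValueError if it is truncated."""
    return json.loads(text)
```

An empty list from check_connections(load_workflow(text)) suggests the copy was complete; any pair it returns identifies a connection whose source or target node is missing.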