Scrape and summarize webpages with AI

This n8n workflow scrapes a list of essays and summarizes the first three with an OpenAI model. It starts manually when you click "Execute Workflow". An HTTP Request node fetches the essay index at http://www.paulgraham.com/articles.html, an HTML node ("Extract essay names") collects the href of every link matching the CSS selector "table table a", a Split Out node turns that array into individual items, and a Limit node keeps the first three. For each remaining item, a second HTTP Request node fetches the essay page and passes it to two branches: an HTML node extracts the page title, and a Summarization Chain node generates the summary. The chain reads its input through a Default Data Loader with a Recursive Character Text Splitter attached and uses an OpenAI Chat Model node (set to gpt-4o-mini) as its language model. A Merge node then pairs each title with its summary by position, and a final Set node ("Clean up") outputs the title, summary, and URL. The workflow suits researchers, content creators, or anyone who needs to digest several web pages quickly without reading and summarizing each one by hand.
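
The link-extraction stage described above can be sketched outside n8n. This is a minimal Python approximation, not what n8n runs internally: it treats the selector "table table a" as "any link inside at least two nested tables", keeps the first three hrefs, and prefixes them with the base URL used by the workflow. The sample HTML and its file names (one.html and so on) are made up for illustration.

```python
from html.parser import HTMLParser

# Base URL taken from the workflow's "Fetch essay texts" node.
BASE_URL = "http://www.paulgraham.com/"


class NestedTableLinkParser(HTMLParser):
    """Collect href values of <a> tags that sit inside two or more nested tables."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth >= 2:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def first_essay_urls(html, limit=3):
    """Mimic Extract essay names -> Split out into items -> Limit to first 3."""
    parser = NestedTableLinkParser()
    parser.feed(html)
    return [BASE_URL + href for href in parser.hrefs[:limit]]


# Hypothetical stand-in for the essay index page; the real markup may differ.
SAMPLE_HTML = """
<table><tr><td>
  <a href="index.html">Home</a>
  <table>
    <tr><td><a href="one.html">One</a></td></tr>
    <tr><td><a href="two.html">Two</a></td></tr>
    <tr><td><a href="three.html">Three</a></td></tr>
    <tr><td><a href="four.html">Four</a></td></tr>
  </table>
</td></tr></table>
"""

if __name__ == "__main__":
    for url in first_essay_urls(SAMPLE_HTML):
        print(url)
```

The link to index.html is skipped because it sits inside only one table, and four.html is dropped by the limit of three.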

15 nodes | manual trigger | 236 views | 0 copies | AI

OpenAI

Workflow JSON

{"meta": {"instanceId": "408f9fb9940c3cb18ffdef0e0150fe342d6e655c3a9fac21f0f644e8bedabcd9"}, "nodes": [{"id": "67850bd7-f9f4-4d5b-8c9e-bd1451247ba6", "name": "When clicking \"Execute Workflow\"", "type": "n8n-nodes-base.manualTrigger", "position": [-740, 1000], "parameters": {}, "typeVersion": 1}, {"id": "0d9133f9-b6d3-4101-95c6-3cd24cdb70c3", "name": "Fetch essay list", "type": "n8n-nodes-base.httpRequest", "position": [-520, 1000], "parameters": {"url": "http://www.paulgraham.com/articles.html", "options": {}}, "typeVersion": 4.1}, {"id": "ee634297-a456-4f70-a995-55b02950571e", "name": "Extract essay names", "type": "n8n-nodes-base.html", "position": [-300, 1000], "parameters": {"options": {}, "operation": "extractHtmlContent", "dataPropertyName": "=data", "extractionValues": {"values": [{"key": "essay", "attribute": "href", "cssSelector": "table table a", "returnArray": true, "returnValue": "attribute"}]}}, "typeVersion": 1}, {"id": "83d75693-dbb8-44c4-8533-da06f611c59c", "name": "Fetch essay texts", "type": "n8n-nodes-base.httpRequest", "position": [360, 1000], "parameters": {"url": "=http://www.paulgraham.com/{{ $json.essay }}", "options": {}}, "typeVersion": 4.1}, {"id": "151022b5-8570-4176-bf3f-137f27ac7036", "name": "Extract title", "type": "n8n-nodes-base.html", "position": [700, 700], "parameters": {"options": {}, "operation": "extractHtmlContent", "extractionValues": {"values": [{"key": "title", "cssSelector": "title"}]}}, "typeVersion": 1}, {"id": "07bcf095-3c4d-4a72-9bcb-341411750ff5", "name": "Clean up", "type": "n8n-nodes-base.set", "position": [1360, 980], "parameters": {"fields": {"values": [{"name": "title", "stringValue": "={{ $json.title }}"}, {"name": "summary", "stringValue": "={{ $json.response.text }}"}, {"name": "url", "stringValue": "=http://www.paulgraham.com/{{ $('Limit to first 3').item.json.essay }}"}]}, "include": "none", "options": {}}, "typeVersion": 3}, {"id": "11285de0-3c5d-4296-a322-9b7585af9acc", "name": "Sticky Note", "type": 
"n8n-nodes-base.stickyNote", "position": [-580, 920], "parameters": {"width": 1071.752021563343, "height": 285.66037735849045, "content": "## Scrape latest Paul Graham essays"}, "typeVersion": 1}, {"id": "c32f905d-dd7a-4b68-bbe0-dd8115ee0944", "name": "Sticky Note1", "type": "n8n-nodes-base.stickyNote", "position": [620, 920], "parameters": {"width": 465.3908355795153, "height": 606.7924528301882, "content": "## Summarize them with GPT"}, "typeVersion": 1}, {"id": "29d264f4-df6d-4a41-ab38-58e1b1becc9a", "name": "Split out into items", "type": "n8n-nodes-base.splitOut", "position": [-80, 1000], "parameters": {"options": {}, "fieldToSplitOut": "essay"}, "typeVersion": 1}, {"id": "ccfa3a1d-f170-44b4-a1f2-3573c88cae98", "name": "Limit to first 3", "type": "n8n-nodes-base.limit", "position": [140, 1000], "parameters": {"maxItems": 3}, "typeVersion": 1}, {"id": "c3d05068-9d1a-4ef5-8249-e7384dc617ee", "name": "Default Data Loader", "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader", "position": [820, 1200], "parameters": {"options": {}}, "typeVersion": 1}, {"id": "db75adad-cb16-4e72-b16e-34684a733b05", "name": "Recursive Character Text Splitter", "type": "@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter", "position": [820, 1340], "parameters": {"options": {}}, "typeVersion": 1}, {"id": "022cc091-9b4c-45c2-bc8e-4037ec2d0d60", "name": "OpenAI Chat Model1", "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi", "position": [680, 1200], "parameters": {"model": "gpt-4o-mini", "options": {}}, "credentials": {"openAiApi": {"id": "", "name": "[Your openAiApi]"}}, "typeVersion": 1}, {"id": "cda47bb7-36c5-4d15-a1ef-0c66b1194825", "name": "Merge", "type": "n8n-nodes-base.merge", "position": [1160, 980], "parameters": {"mode": "combine", "options": {}, "combineBy": "combineByPosition"}, "typeVersion": 3}, {"id": "28144e4c-e425-428d-b3d1-f563bfd4e5b3", "name": "Summarization Chain", "type": "@n8n/n8n-nodes-langchain.chainSummarization", "position": [720, 1000], 
"parameters": {"options": {}, "operationMode": "documentLoader"}, "typeVersion": 2}], "pinData": {}, "connections": {"Merge": {"main": [[{"node": "Clean up", "type": "main", "index": 0}]]}, "Extract title": {"main": [[{"node": "Merge", "type": "main", "index": 0}]]}, "Fetch essay list": {"main": [[{"node": "Extract essay names", "type": "main", "index": 0}]]}, "Limit to first 3": {"main": [[{"node": "Fetch essay texts", "type": "main", "index": 0}]]}, "Fetch essay texts": {"main": [[{"node": "Extract title", "type": "main", "index": 0}, {"node": "Summarization Chain", "type": "main", "index": 0}]]}, "OpenAI Chat Model1": {"ai_languageModel": [[{"node": "Summarization Chain", "type": "ai_languageModel", "index": 0}]]}, "Default Data Loader": {"ai_document": [[{"node": "Summarization Chain", "type": "ai_document", "index": 0}]]}, "Extract essay names": {"main": [[{"node": "Split out into items", "type": "main", "index": 0}]]}, "Summarization Chain": {"main": [[{"node": "Merge", "type": "main", "index": 1}]]}, "Split out into items": {"main": [[{"node": "Limit to first 3", "type": "main", "index": 0}]]}, "When clicking \"Execute Workflow\"": {"main": [[{"node": "Fetch essay list", "type": "main", "index": 0}]]}, "Recursive Character Text Splitter": {"ai_textSplitter": [[{"node": "Default Data Loader", "type": "ai_textSplitter", "index": 0}]]}}}

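To see how the nodes are wired, the "connections" object in the JSON above can be flattened into a list of edges. The sketch below is a small Python helper written for this page, not part of n8n; the helper names list_edges and inputs_of are hypothetical. It uses an excerpt of the connections copied from the workflow (main links only) and shows that "Fetch essay texts" fans out to two branches that meet again at the Merge node.

```python
import json

# Excerpt of the "connections" object from the workflow JSON (main links only).
WORKFLOW_EXCERPT = json.loads("""
{
  "connections": {
    "Fetch essay texts": {"main": [[
      {"node": "Extract title", "type": "main", "index": 0},
      {"node": "Summarization Chain", "type": "main", "index": 0}
    ]]},
    "Extract title": {"main": [[{"node": "Merge", "type": "main", "index": 0}]]},
    "Summarization Chain": {"main": [[{"node": "Merge", "type": "main", "index": 1}]]},
    "Merge": {"main": [[{"node": "Clean up", "type": "main", "index": 0}]]}
  }
}
""")


def list_edges(workflow):
    """Return (source, destination, connection_type, input_index) for every link."""
    edges = []
    for source, outputs in workflow["connections"].items():
        for conn_type, output_slots in outputs.items():
            for slot in output_slots:
                for link in slot:
                    edges.append((source, link["node"], conn_type, link["index"]))
    return edges


def inputs_of(workflow, target):
    """Return (input_index, source_name) pairs feeding `target`, ordered by input index."""
    return sorted(
        (index, source)
        for source, dest, _conn_type, index in list_edges(workflow)
        if dest == target
    )


if __name__ == "__main__":
    for edge in list_edges(WORKFLOW_EXCERPT):
        print(edge)
    print(inputs_of(WORKFLOW_EXCERPT, "Merge"))
```

In this excerpt the Merge node receives titles on input 0 and summaries on input 1, which is why the workflow combines them by position.
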
How to Import This Workflow

1. Copy the workflow JSON above using the Copy Workflow JSON button.
2. Open your n8n instance and go to Workflows.
3. Click Import from JSON and paste the copied workflow.
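
The exported JSON carries the OpenAI credential with an empty id and the placeholder name "[Your openAiApi]", so a credential will most likely need to be selected in the editor after import. As a rough pre-flight check, the Python sketch below lists nodes whose credential id is empty. The function name nodes_missing_credentials is hypothetical, and the two nodes are copied from the workflow above with position and typeVersion omitted for brevity.

```python
# Two nodes copied from the workflow JSON (position and typeVersion omitted).
WORKFLOW_NODES_EXCERPT = {
    "nodes": [
        {
            "name": "Fetch essay list",
            "type": "n8n-nodes-base.httpRequest",
            "parameters": {"url": "http://www.paulgraham.com/articles.html", "options": {}},
        },
        {
            "name": "OpenAI Chat Model1",
            "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi",
            "parameters": {"model": "gpt-4o-mini", "options": {}},
            "credentials": {"openAiApi": {"id": "", "name": "[Your openAiApi]"}},
        },
    ]
}


def nodes_missing_credentials(workflow):
    """Return (node_name, credential_type) for every credential with an empty id."""
    missing = []
    for node in workflow.get("nodes", []):
        for cred_type, cred in node.get("credentials", {}).items():
            if not cred.get("id"):
                missing.append((node["name"], cred_type))
    return missing


if __name__ == "__main__":
    for name, cred_type in nodes_missing_credentials(WORKFLOW_NODES_EXCERPT):
        print(f"{name}: set a '{cred_type}' credential before running")
```

To check the full workflow, the same function could be applied to the result of json.loads on the copied JSON.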

Related Templates

Text to Speech (OpenAI)

Converts text into natural-sounding speech using OpenAI's Text-to-Speech API. It sends your input text to OpenAI and receives an audio file in return. This is useful for creating audio versions of articles, generating voiceovers for videos, or providing accessibility features for web content. Quickly transform written content into engaging audio.

8 nodes

Automate Customer Support Issue Resolution using AI Text Classifier

Automate the resolution of customer support issues by classifying their state and applying AI-driven actions. This workflow connects Jira for issue management, OpenAI for AI classification and response generation, and Slack for notifications. Support teams can use this to automatically close resolved tickets, remind customers about open issues, or escalate complex cases.

36 nodes

AI-Powered Candidate Shortlisting Automation for ERPNext

Automate AI-powered candidate shortlisting for ERPNext job applications. This workflow connects ERPNext, Google Gemini, WhatsApp, and Outlook to process resumes, evaluate candidates, and communicate outcomes. Recruiters and HR departments can use this to efficiently screen applicants, automatically reject unqualified candidates, and send acceptance notifications. It significantly reduces manual review time and streamlines the hiring process.

39 nodes
