Scrape and summarize webpages with AI

Efficiently scrape and summarize multiple webpages with AI using this n8n workflow. This automation begins manually when you click "Execute Workflow," fetching a list of essay URLs via an HTTP Request node. It then extracts essay names using an HTML node, splitting them into individual items before limiting processing to the first three. For each of these selected essays, the workflow performs another HTTP Request to fetch the full text, extracts the title with an HTML node, and then leverages OpenAI's powerful language models for summarization. The AI:documentDefaultDataLoader and AI:textSplitterRecursiveCharacterTextSplitter prepare the content, which is then fed into an AI:lmChatOpenAi node and processed by an AI:chainSummarization node to generate concise summaries. This workflow is ideal for researchers, content creators, or anyone needing to quickly digest information from multiple web sources, saving significant time and effort by automating the tedious process of manual reading and summarization.

15 nodesmanual trigger188 views0 copiesAI
OpenAI

Workflow JSON

{"meta": {"instanceId": "408f9fb9940c3cb18ffdef0e0150fe342d6e655c3a9fac21f0f644e8bedabcd9"}, "nodes": [{"id": "67850bd7-f9f4-4d5b-8c9e-bd1451247ba6", "name": "When clicking \"Execute Workflow\"", "type": "n8n-nodes-base.manualTrigger", "position": [-740, 1000], "parameters": {}, "typeVersion": 1}, {"id": "0d9133f9-b6d3-4101-95c6-3cd24cdb70c3", "name": "Fetch essay list", "type": "n8n-nodes-base.httpRequest", "position": [-520, 1000], "parameters": {"url": "http://www.paulgraham.com/articles.html", "options": {}}, "typeVersion": 4.1}, {"id": "ee634297-a456-4f70-a995-55b02950571e", "name": "Extract essay names", "type": "n8n-nodes-base.html", "position": [-300, 1000], "parameters": {"options": {}, "operation": "extractHtmlContent", "dataPropertyName": "=data", "extractionValues": {"values": [{"key": "essay", "attribute": "href", "cssSelector": "table table a", "returnArray": true, "returnValue": "attribute"}]}}, "typeVersion": 1}, {"id": "83d75693-dbb8-44c4-8533-da06f611c59c", "name": "Fetch essay texts", "type": "n8n-nodes-base.httpRequest", "position": [360, 1000], "parameters": {"url": "=http://www.paulgraham.com/{{ $json.essay }}", "options": {}}, "typeVersion": 4.1}, {"id": "151022b5-8570-4176-bf3f-137f27ac7036", "name": "Extract title", "type": "n8n-nodes-base.html", "position": [700, 700], "parameters": {"options": {}, "operation": "extractHtmlContent", "extractionValues": {"values": [{"key": "title", "cssSelector": "title"}]}}, "typeVersion": 1}, {"id": "07bcf095-3c4d-4a72-9bcb-341411750ff5", "name": "Clean up", "type": "n8n-nodes-base.set", "position": [1360, 980], "parameters": {"fields": {"values": [{"name": "title", "stringValue": "={{ $json.title }}"}, {"name": "summary", "stringValue": "={{ $json.response.text }}"}, {"name": "url", "stringValue": "=http://www.paulgraham.com/{{ $('Limit to first 3').item.json.essay }}"}]}, "include": "none", "options": {}}, "typeVersion": 3}, {"id": "11285de0-3c5d-4296-a322-9b7585af9acc", "name": "Sticky Note", "type": "n8n-nodes-base.stickyNote", "position": [-580, 920], "parameters": {"width": 1071.752021563343, "height": 285.66037735849045, "content": "## Scrape latest Paul Graham essays"}, "typeVersion": 1}, {"id": "c32f905d-dd7a-4b68-bbe0-dd8115ee0944", "name": "Sticky Note1", "type": "n8n-nodes-base.stickyNote", "position": [620, 920], "parameters": {"width": 465.3908355795153, "height": 606.7924528301882, "content": "## Summarize them with GPT"}, "typeVersion": 1}, {"id": "29d264f4-df6d-4a41-ab38-58e1b1becc9a", "name": "Split out into items", "type": "n8n-nodes-base.splitOut", "position": [-80, 1000], "parameters": {"options": {}, "fieldToSplitOut": "essay"}, "typeVersion": 1}, {"id": "ccfa3a1d-f170-44b4-a1f2-3573c88cae98", "name": "Limit to first 3", "type": "n8n-nodes-base.limit", "position": [140, 1000], "parameters": {"maxItems": 3}, "typeVersion": 1}, {"id": "c3d05068-9d1a-4ef5-8249-e7384dc617ee", "name": "Default Data Loader", "type": "@n8n/n8n-nodes-langchain.documentDefaultDataLoader", "position": [820, 1200], "parameters": {"options": {}}, "typeVersion": 1}, {"id": "db75adad-cb16-4e72-b16e-34684a733b05", "name": "Recursive Character Text Splitter", "type": "@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter", "position": [820, 1340], "parameters": {"options": {}}, "typeVersion": 1}, {"id": "022cc091-9b4c-45c2-bc8e-4037ec2d0d60", "name": "OpenAI Chat Model1", "type": "@n8n/n8n-nodes-langchain.lmChatOpenAi", "position": [680, 1200], "parameters": {"model": "gpt-4o-mini", "options": {}}, "credentials": {"openAiApi": {"id": "", "name": "[Your openAiApi]"}}, "typeVersion": 1}, {"id": "cda47bb7-36c5-4d15-a1ef-0c66b1194825", "name": "Merge", "type": "n8n-nodes-base.merge", "position": [1160, 980], "parameters": {"mode": "combine", "options": {}, "combineBy": "combineByPosition"}, "typeVersion": 3}, {"id": "28144e4c-e425-428d-b3d1-f563bfd4e5b3", "name": "Summarization Chain", "type": "@n8n/n8n-nodes-langchain.chainSummarization", "position": [720, 1000], "parameters": {"options": {}, "operationMode": "documentLoader"}, "typeVersion": 2}], "pinData": {}, "connections": {"Merge": {"main": [[{"node": "Clean up", "type": "main", "index": 0}]]}, "Extract title": {"main": [[{"node": "Merge", "type": "main", "index": 0}]]}, "Fetch essay list": {"main": [[{"node": "Extract essay names", "type": "main", "index": 0}]]}, "Limit to first 3": {"main": [[{"node": "Fetch essay texts", "type": "main", "index": 0}]]}, "Fetch essay texts": {"main": [[{"node": "Extract title", "type": "main", "index": 0}, {"node": "Summarization Chain", "type": "main", "index": 0}]]}, "OpenAI Chat Model1": {"ai_languageModel": [[{"node": "Summarization Chain", "type": "ai_languageModel", "index": 0}]]}, "Default Data Loader": {"ai_document": [[{"node": "Summarization Chain", "type": "ai_document", "index": 0}]]}, "Extract essay names": {"main": [[{"node": "Split out into items", "type": "main", "index": 0}]]}, "Summarization Chain": {"main": [[{"node": "Merge", "type": "main", "index": 1}]]}, "Split out into items": {"main": [[{"node": "Limit to first 3", "type": "main", "index": 0}]]}, "When clicking \"Execute Workflow\"": {"main": [[{"node": "Fetch essay list", "type": "main", "index": 0}]]}, "Recursive Character Text Splitter": {"ai_textSplitter": [[{"node": "Default Data Loader", "type": "ai_textSplitter", "index": 0}]]}}}

How to Import This Workflow

  1. 1Copy the workflow JSON above using the Copy Workflow JSON button.
  2. 2Open your n8n instance and go to Workflows.
  3. 3Click Import from JSON and paste the copied workflow.

Don't have an n8n instance? Start your free trial at n8nautomation.cloud

Ready to automate with n8n?

Get affordable managed n8n hosting with 24/7 support.