Convert URL HTML to Markdown Format and Get Page Links
Workflow JSON
{"meta": {"instanceId": "6b6a2db47bdf8371d21090c511052883cc9a3f6af5d0d9d567c702d74a18820e"}, "nodes": [{"id": "f4570aad-db25-4dcd-8589-b1c8335935de", "name": "When clicking \u2018Test workflow\u2019", "type": "n8n-nodes-base.manualTrigger", "position": [-180, 3800], "parameters": {}, "typeVersion": 1}, {"id": "bd481559-85f2-4865-8d85-e50e72369f26", "name": "Wait", "type": "n8n-nodes-base.wait", "position": [940, 3620], "webhookId": "f10708f0-38c6-4c75-b635-37222d5b183a", "parameters": {"amount": 45}, "typeVersion": 1.1}, {"id": "cc9e9947-19e4-47c5-95b0-a631d688a8b6", "name": "Sticky Note36", "type": "n8n-nodes-base.stickyNote", "position": [549.7858793743054, 3709.534654112671], "parameters": {"color": 7, "width": 327.8244990224782, "height": 268.48353140372035, "content": "**40 at a time seems to be the memory limit on my server - run until complete with batches of 40 or increase based on your server memory**\n"}, "typeVersion": 1}, {"id": "9ebbd993-9194-40b1-a98e-352eb3a3f9eb", "name": "Sticky Note28", "type": "n8n-nodes-base.stickyNote", "position": [-50.797941767307435, 3729.028866440868], "parameters": {"color": 7, "width": 574.7594700148138, "height": 248.90718753310907, "content": "**Firecrawl.dev retrieves markdown inc. title, description, links & content. First define the URLs you'd like to scrape**\n"}, "typeVersion": 1}, {"id": "71c0f975-c0f9-47ae-a245-f852387ad461", "name": "Connect to your own data source", "type": "n8n-nodes-base.noOp", "position": [1380, 3820], "parameters": {}, "typeVersion": 1}, {"id": "fba918e7-2c88-4de3-a789-cadbf4f2584e", "name": "Get urls from own data source", "type": "n8n-nodes-base.noOp", "position": [0, 3800], "parameters": {}, "typeVersion": 1}, {"id": "221a75eb-0bc8-4747-9ec1-1879c46d9163", "name": "Example fields from data source", "type": "n8n-nodes-base.set", "notes": "Define URLs in array", "position": [200, 3800], "parameters": {"options": {}, "assignments": {"assignments": [{"id": "cc2c6af0-68d3-49eb-85fe-3288d2ed0f6b", "name": "Page", "type": "array", "value": "[\"https://www.automake.io/\", \"https://www.n8n.io/\"]"}]}, "includeOtherFields": true}, "notesInFlow": true, "typeVersion": 3.4}, {"id": "5a914964-e8ef-4064-8ecb-f1866de0d8c6", "name": "Sticky Note33", "type": "n8n-nodes-base.stickyNote", "position": [-40, 4000], "parameters": {"color": 3, "width": 510.3561134140244, "height": 94.13486342358942, "content": "**REQUIRED**\nConnect to your database of urls to input. Name the column `Page` like in the `Example fields from data source` node and make sure it has one link per row like `split out page urls`"}, "typeVersion": 1}, {"id": "5c004d5c-afeb-47c9-b30b-eb88880f87b9", "name": "Sticky Note34", "type": "n8n-nodes-base.stickyNote", "position": [900, 4000], "parameters": {"color": 3, "width": 284.87764467541297, "height": 168.68864948728321, "content": "**REQUIRED**\nUpdate the Auth parameter to your own [Firecrawl](https://firecrawl.dev) dev token\n\n**Header Auth parameter**\nname - Authorization\nvalue - your-own-api-key"}, "typeVersion": 1}, {"id": "53d97054-a5e4-4819-bdd9-f8632c33eba2", "name": "Sticky Note35", "type": "n8n-nodes-base.stickyNote", "position": [1360, 4000], "parameters": {"color": 3, "width": 284.87764467541297, "height": 91.91340067739628, "content": "**REQUIRED** \nOutput the data to your own data source e.g. Airtable"}, "typeVersion": 1}, {"id": "357a463f-7581-43ba-8930-af27e4762905", "name": "Sticky Note37", "type": "n8n-nodes-base.stickyNote", "position": [900, 3570.2075673933587], "parameters": {"color": 7, "width": 181.96744211154697, "height": 189.23753199986137, "content": "**Respect API limits (10 requests per min)**\n"}, "typeVersion": 1}, {"id": "77311c67-f50f-427a-87fd-b29b1f542bbc", "name": "40 items at a time", "type": "n8n-nodes-base.limit", "position": [580, 3800], "parameters": {"maxItems": 40}, "typeVersion": 1}, {"id": "43557ab1-4e52-4598-83a9-e39d5afc6de7", "name": "10 at a time", "type": "n8n-nodes-base.splitInBatches", "position": [740, 3800], "parameters": {"options": {}, "batchSize": 10}, "typeVersion": 3}, {"id": "555d52e7-010b-462b-9382-26804493de1c", "name": "Markdown data and Links", "type": "n8n-nodes-base.set", "position": [1160, 3820], "parameters": {"options": {}, "assignments": {"assignments": [{"id": "3a959c64-4c3c-4072-8427-67f6f6ecba1b", "name": "title", "type": "string", "value": "={{ $json.data.metadata.title }}"}, {"id": "d2da0859-a7a0-4c39-913a-150ecb95d075", "name": "description", "type": "string", "value": "={{ $json.data.metadata.description }}"}, {"id": "62bd2d76-b78d-4501-a59b-a25ed7b345b0", "name": "content", "type": "string", "value": "={{ $json.data.markdown }}"}, {"id": "d4c712fa-b52a-498f-8abc-26dc72be61f7", "name": "links", "type": "string", "value": "={{ $json.data.links }} "}]}}, "notesInFlow": true, "typeVersion": 3.4}, {"id": "aac948e6-ac86-4cea-be84-f27919d6d936", "name": "Split out page URLs", "type": "n8n-nodes-base.splitOut", "position": [380, 3800], "parameters": {"options": {}, "fieldToSplitOut": "Page"}, "typeVersion": 1}, {"id": "71c5a0d4-540e-4766-ae99-bdc427019dac", "name": "Retrieve Page Markdown and Links", "type": "n8n-nodes-base.httpRequest", "notes": "curl -X POST https://api.firecrawl.dev/v1/scrape \\\n -H 'Content-Type: application/json' \\\n -H 'Authorization: Bearer YOUR_API_KEY' \\\n -d '{\n \"url\": \"https://docs.firecrawl.dev\",\n \"formats\" : [\"markdown\", \"html\"]\n }'\n", "position": [960, 3820], "parameters": {"url": "https://api.firecrawl.dev/v1/scrape", "method": "POST", "options": {}, "jsonBody": "={\n \"url\": \"{{ $json.Page }}\",\n \"formats\" : [\"markdown\", \"links\"]\n} ", "sendBody": true, "sendHeaders": true, "specifyBody": "json", "authentication": "genericCredentialType", "genericAuthType": "httpHeaderAuth", "headerParameters": {"parameters": [{"name": "Content-Type", "value": "application/json"}]}}, "credentials": {"httpHeaderAuth": {"id": "", "name": "[Your httpHeaderAuth]"}}, "retryOnFail": true, "typeVersion": 4.2, "waitBetweenTries": 5000}, {"id": "a2f12929-262e-4354-baa3-f9e3c05ec2eb", "name": "Sticky Note38", "type": "n8n-nodes-base.stickyNote", "position": [-840, 3340], "parameters": {"color": 4, "width": 581.9949654101088, "height": 818.5240734585421, "content": "## Convert URL HTML to Markdown and Get Page Links\n\n## Use Case\nTransform web pages into AI-friendly markdown format:\n- You need to process webpage content for LLM analysis\n- You want to extract both content and links from web pages\n- You need clean, formatted text without HTML markup\n- You want to respect API rate limits while crawling pages\n\n## What this Workflow Does\nThe workflow uses Firecrawl.dev API to process webpages:\n- Converts HTML content to markdown format\n- Extracts all links from each webpage\n- Handles API rate limiting automatically\n- Processes URLs in batches from your database\n\n## Setup\n1. Create a [Firecrawl.dev](https://www.firecrawl.dev/) account and get your API key\n2. Add your Firecrawl API key to the HTTP Request node's Authorization header\n3. Connect your URL database to the input node (column name must be \"Page\") or edit the array in `Example fields from data source`\n4. Configure your preferred output database connection\n\n## How to Adjust it to Your Needs\n- Modify input source to pull URLs from different databases\n- Adjust rate limiting parameters if needed\n- Customize output format for your specific use case\n\n\nMade by Simon @ [automake.io](https://automake.io)\n"}, "typeVersion": 1}], "pinData": {}, "connections": {"Wait": {"main": [[{"node": "10 at a time", "type": "main", "index": 0}]]}, "10 at a time": {"main": [null, [{"node": "Retrieve Page Markdown and Links", "type": "main", "index": 0}]]}, "40 items at a time": {"main": [[{"node": "10 at a time", "type": "main", "index": 0}]]}, "Split out page URLs": {"main": [[{"node": "40 items at a time", "type": "main", "index": 0}]]}, "Markdown data and Links": {"main": [[{"node": "Connect to your own data source", "type": "main", "index": 0}]]}, "Get urls from own data source": {"main": [[{"node": "Example fields from data source", "type": "main", "index": 0}]]}, "Connect to your own data source": {"main": [[{"node": "Wait", "type": "main", "index": 0}]]}, "Example fields from data source": {"main": [[{"node": "Split out page URLs", "type": "main", "index": 0}]]}, "Retrieve Page Markdown and Links": {"main": [[{"node": "Markdown data and Links", "type": "main", "index": 0}]]}, "When clicking \u2018Test workflow\u2019": {"main": [[{"node": "Get urls from own data source", "type": "main", "index": 0}]]}}}How to Import This Workflow
- 1Copy the workflow JSON above using the Copy Workflow JSON button.
- 2Open your n8n instance and go to Workflows.
- 3Click Import from JSON and paste the copied workflow.
Don't have an n8n instance? Start your free trial at n8nautomation.cloud
Related Templates
ETL pipeline
Automate your data extraction, transformation, and loading with this robust ETL pipeline, designed to efficiently process and analyze information from various sources. This workflow begins on a schedule, fetching tweets from Twitter/X, then storing them in MongoDB for initial processing. The MongoDB data is then sent to Google Cloud Natural Language for sentiment analysis or entity extraction, with the results subsequently prepared and stored in PostgreSQL. A conditional check on the PostgreSQL data determines whether to send an alert to Slack, ensuring timely notifications for critical insights or anomalies. This powerful automation is ideal for marketing teams monitoring brand sentiment, researchers analyzing public opinion, or businesses tracking competitor activity, providing actionable intelligence without manual data handling. By automating data ingestion, enrichment, and storage, this workflow significantly reduces the time and effort spent on data preparation, allowing teams to focus on analysis and strategic decision-making while ensuring data consistency and accessibility.
SQL agent with memory
Empower your data analysis with the SQL agent with memory workflow, automating the process of querying databases using natural language. This powerful workflow connects OpenAI's advanced language models with your local SQL databases, allowing you to interact with your data through a conversational interface. Initially, the workflow downloads a chinook.zip example database, extracts it, and saves the chinook.db file locally, making it immediately available for querying. The AI Agent, powered by OpenAI Chat Model and supported by a Window Buffer Memory, interprets your natural language questions, translates them into SQL queries, executes them against your local chinook.db, and provides the results back to you. This is incredibly useful for data analysts, business intelligence professionals, or anyone needing quick insights from their databases without writing complex SQL queries, significantly reducing the time and specialized knowledge required for data exploration. By leveraging the Chat Trigger, users can easily initiate conversations and receive immediate, intelligent responses, streamlining data access and accelerating decision-making.
Prepare CSV files with GPT-4
Transform raw, unstructured text into perfectly formatted CSV files using the power of GPT-4 with this n8n workflow. This automation connects OpenAI's advanced language model to process your input, then meticulously structures the output into a usable CSV format. Ideal for data analysts, marketers, or researchers, this workflow helps you extract specific information from large text datasets, such as customer reviews, survey responses, or article summaries, and prepare it for analysis in spreadsheets or databases. By automating the extraction and formatting of data, you significantly reduce manual data entry errors and save countless hours of tedious work, allowing you to focus on insights rather than data preparation. The workflow manually triggers, sending your text to OpenAI, then splits the responses into manageable batches, parses the JSON output, converts it into a structured table, and finally saves a clean, UTF-8 encoded CSV file to disk, ensuring compatibility across various systems.