Remove Personally Identifiable Information (PII) from CSV Files with OpenAI

Safeguard sensitive information by automatically removing Personally Identifiable Information (PII) from CSV files stored in Google Drive using the power of OpenAI. This workflow starts with a Google Drive Trigger, which initiates the process when new CSV files are added to a specified folder. The Google Drive node then retrieves the file, and the Extract from File node parses its contents. The extracted data is sent to the OpenAI node, which intelligently identifies and redacts PII, ensuring data privacy and compliance. The cleaned data is then uploaded back to Google Drive via the Upload to Drive node, while the Get filename and Get result nodes help manage the file naming and processing outcomes. This automation is ideal for businesses handling customer data, research institutions, or any organization needing to anonymize datasets before sharing or further processing, saving significant manual effort and reducing the risk of data breaches. It streamlines data sanitization, allowing teams to focus on analysis rather than tedious PII removal.

10 nodesmanual trigger196 views0 copiesData
Google DriveOpenAI

Workflow JSON

{"meta": {"instanceId": "2f9460831fcdb0e9a4494f0630367cfe2968282072e2d27c6ee6ab0a4c165a36"}, "nodes": [{"id": "ff4e8706-09a0-4bf1-86c1-dfb65f55ccb3", "name": "Google Drive Trigger", "type": "n8n-nodes-base.googleDriveTrigger", "position": [20, -140], "parameters": {"event": "fileCreated", "options": {}, "pollTimes": {"item": [{"mode": "everyMinute"}]}, "triggerOn": "specificFolder", "folderToWatch": {"__rl": true, "mode": "list", "value": "1-hRMnBRYgY6iVJ_youKMyPz83k9GAVYu", "cachedResultUrl": "https://drive.google.com/drive/folders/1-hRMnBRYgY6iVJ_youKMyPz83k9GAVYu", "cachedResultName": "nnnnnnnnnnn8n"}}, "credentials": {"googleDriveOAuth2Api": {"id": "", "name": "[Your googleDriveOAuth2Api]"}}, "typeVersion": 1}, {"id": "340fb03b-3b8a-4eb4-ad4c-b0ba12b72b19", "name": "Google Drive", "type": "n8n-nodes-base.googleDrive", "position": [260, -140], "parameters": {"fileId": {"__rl": true, "mode": "id", "value": "={{ $json.id }}"}, "options": {"binaryPropertyName": "data"}, "operation": "download"}, "credentials": {"googleDriveOAuth2Api": {"id": "", "name": "[Your googleDriveOAuth2Api]"}}, "typeVersion": 3}, {"id": "4a5d037f-0103-4645-87d0-785dfdfb80d1", "name": "Extract from File", "type": "n8n-nodes-base.extractFromFile", "position": [260, 60], "parameters": {"options": {}}, "typeVersion": 1, "alwaysOutputData": false}, {"id": "36c7e83d-f22f-4a71-b5a2-64ed3e4ce24b", "name": "OpenAI", "type": "@n8n/n8n-nodes-langchain.openAi", "position": [-120, 260], "parameters": {"modelId": {"__rl": true, "mode": "list", "value": "gpt-4o-mini", "cachedResultName": "GPT-4O-MINI"}, "options": {}, "messages": {"values": [{"role": "system", "content": "Analyze the provided tabular data and identify the columns that contain personally identifiable information (PII). Return only the column names that contain PII, separated by commas. Key name: 'content'. Do not include any additional text or explanation."}, {"content": "=Here is some tabular data with column headers and two example rows.\n\nHeaders: {{Object.keys($json)}}\n\nExample Row 1: {{Object.values($json)}}\n\n"}]}, "jsonOutput": true}, "credentials": {"openAiApi": {"id": "", "name": "[Your openAiApi]"}}, "executeOnce": true, "typeVersion": 1.7}, {"id": "771c6535-47d4-4c70-b487-bd5ac602e29c", "name": "Merge", "type": "n8n-nodes-base.merge", "position": [440, 260], "parameters": {"numberInputs": 3}, "typeVersion": 3}, {"id": "1fc467fd-379d-4841-978b-89c1453b61d8", "name": "Upload to Drive", "type": "n8n-nodes-base.googleDrive", "position": [740, 260], "parameters": {"name": "={{ $json.fileName }}", "content": "={{ $json.content }}", "driveId": {"__rl": true, "mode": "list", "value": "My Drive"}, "options": {}, "folderId": {"__rl": true, "mode": "list", "value": "1F30Qu3csrmMhtcu_prMipeiGm-64VEdd", "cachedResultUrl": "https://drive.google.com/drive/folders/1F30Qu3csrmMhtcu_prMipeiGm-64VEdd", "cachedResultName": "processed"}, "operation": "createFromText"}, "credentials": {"googleDriveOAuth2Api": {"id": "", "name": "[Your googleDriveOAuth2Api]"}}, "typeVersion": 3}, {"id": "92715586-e630-4584-83a3-1af42d7cb50e", "name": "Get filename", "type": "n8n-nodes-base.splitOut", "position": [20, 60], "parameters": {"options": {"destinationFieldName": "originalFilename"}, "fieldToSplitOut": "name"}, "executeOnce": true, "typeVersion": 1}, {"id": "2c4b3242-34db-4948-b835-cd2340ad7b19", "name": "Get result", "type": "n8n-nodes-base.splitOut", "position": [200, 260], "parameters": {"options": {"destinationFieldName": "data"}, "fieldToSplitOut": "message.content.content"}, "typeVersion": 1}, {"id": "4207dc71-5b0e-4780-9f23-00f5a7fc3862", "name": "Remove PII columns", "type": "n8n-nodes-base.code", "position": [580, 260], "parameters": {"jsCode": "// Input: All items from the previous node\nconst input = $input.all();\n\n// Step 1: Extract the PII column names from the first item\nconst firstItem = input[0];\nif (!firstItem.json.data || !firstItem.json.data) {\n throw new Error(\"PII column names are missing in the input data.\");\n}\nconst piiColumns = firstItem.json.data.split(',').map(col => col.trim());\n//console.log(\"PII Columns to Remove:\", piiColumns);\n\n// Step 2: Remove the first two items and process the remaining rows\nlet rows = input.slice(2).map(item => item.json); // Exclude the first item\n//console.log(\"Rows to convert (before skipping last):\", rows);\n\n\n// Ensure there are rows to process\nif (rows.length === 0) {\n throw new Error(\"No rows to convert to CSV.\");\n}\n\n// Step 3: Remove PII columns from each row\nconst sanitizedRows = rows.map(row => {\n const sanitizedRow = { ...row }; // Copy the row\n piiColumns.forEach(column => delete sanitizedRow[column]); // Remove PII columns\n return sanitizedRow;\n});\n//console.log(\"Sanitized Rows:\", sanitizedRows);\n\n// Step 4: Extract headers from sanitized rows\nconst headers = Object.keys(sanitizedRows[0]); // Extract updated headers\n//console.log(\"CSV Headers:\", headers);\n\n// Step 5: Convert rows to CSV format\nconst csvRows = [\n headers.join(','), // Add header row\n ...sanitizedRows.map(row => \n headers.map(header => String(row[header] || '').replace(/,/g, '')).join(',') // Match headers with rows\n )\n];\n\n// Join all rows with a newline character\nconst csvContent = csvRows.join('\\n');\n//console.log(\"CSV Content:\", csvContent);\n\nconst originalFileName = input[1].json.originalFilename;\n\n// Step 7: Generate a new filename\nconst fileExtension = originalFileName.split('.').pop();\nconst baseName = originalFileName.replace(`.${fileExtension}`, '');\nconst newFileName = `${baseName}_PII_removed.${fileExtension}`;\n//console.log(\"New Filename:\", newFileName);\n\n// Step 8: Return the CSV content and filename as JSON\nreturn [\n {\n json: {\n fileName: newFileName, // New file name\n content: csvContent // CSV content as plain text\n }\n }\n];\n"}, "typeVersion": 2}, {"id": "e9f25ee7-cd00-4496-9062-5d57cab5788d", "name": "Sticky Note", "type": "n8n-nodes-base.stickyNote", "position": [-300, -220], "parameters": {"height": 260, "content": "## Remove PII from CSV Files\nThis workflow monitors a Google Drive folder for new CSV files, identifies and removes PII columns using OpenAI, and uploads the sanitized file back to the drive. It requires Google Drive and OpenAI integrations with API access enabled."}, "typeVersion": 1}], "pinData": {}, "connections": {"Merge": {"main": [[{"node": "Remove PII columns", "type": "main", "index": 0}]]}, "OpenAI": {"main": [[{"node": "Get result", "type": "main", "index": 0}]]}, "Get result": {"main": [[{"node": "Merge", "type": "main", "index": 0}]]}, "Get filename": {"main": [[{"node": "Merge", "type": "main", "index": 1}]]}, "Google Drive": {"main": [[{"node": "Extract from File", "type": "main", "index": 0}]]}, "Upload to Drive": {"main": [[]]}, "Extract from File": {"main": [[{"node": "OpenAI", "type": "main", "index": 0}, {"node": "Merge", "type": "main", "index": 2}]]}, "Remove PII columns": {"main": [[{"node": "Upload to Drive", "type": "main", "index": 0}]]}, "Google Drive Trigger": {"main": [[{"node": "Get filename", "type": "main", "index": 0}, {"node": "Google Drive", "type": "main", "index": 0}]]}}}

How to Import This Workflow

  1. 1Copy the workflow JSON above using the Copy Workflow JSON button.
  2. 2Open your n8n instance and go to Workflows.
  3. 3Click Import from JSON and paste the copied workflow.

Don't have an n8n instance? Start your free trial at n8nautomation.cloud

Related Templates

Ask questions about a PDF using AI

Effortlessly transform your Google Drive PDFs into an interactive knowledge base with this powerful AI workflow. This n8n automation connects your Google Drive files, processes them with OpenAI embeddings, and stores them in a Pinecone vector database, allowing you to ask questions and receive intelligent answers directly from your document content. When a new PDF is uploaded to Google Drive, the workflow automatically extracts its text, splits it into manageable chunks using the Recursive Character Text Splitter, generates embeddings via OpenAI, and then inserts this structured data into Pinecone for efficient retrieval. Later, by clicking the 'Chat' button, you can engage in a natural language conversation with your document, powered by the OpenAI Chat Model and the Question and Answer Chain, which retrieves relevant information from Pinecone. This is ideal for researchers needing to quickly extract insights from large reports, legal professionals analyzing contracts, or businesses creating searchable knowledge bases from their documentation, saving countless hours of manual review and information searching.

16 nodes

Supabase Insertion & Upsertion & Retrieval

Efficiently manage and query your data with the Supabase Insertion & Upsertion & Retrieval workflow, a powerful solution for integrating document management with intelligent data processing. This 21-node workflow, triggered manually, connects Google Drive, Supabase, and OpenAI to automate the ingestion, updating, and retrieval of information. It allows you to upload documents from Google Drive, which are then processed by a Recursive Character Text Splitter and embedded using OpenAI Embeddings for insertion or upsertion into your Supabase vector store via the Insert Documents and Update Documents nodes. When a chat message is received, the workflow leverages OpenAI's Chat Model and a Question and Answer Chain to retrieve relevant information from Supabase using the Retrieve by Query node, providing intelligent responses based on your stored documents. This workflow is ideal for businesses and individuals who need to maintain an up-to-date knowledge base, power AI-driven chatbots with proprietary information, or automate the synchronization of document content with a searchable database, significantly reducing manual data entry and improving information accessibility.

21 nodes

Chat with Postgresql Database

Empower your users to interact with your PostgreSQL database using natural language by automating the process of querying and retrieving information. This workflow connects a chat interface, triggered by a new message, to an AI Agent that leverages OpenAI's powerful language model to understand user requests. The AI Agent intelligently utilizes a suite of PostgreSQL tools, including "Get Table Definition," "Execute SQL Query," and "Get DB Schema and Tables List," to dynamically fetch database schema, generate appropriate SQL queries, and execute them against your database. Chat history is maintained using an AI memory buffer, allowing for contextual conversations. This solution is ideal for support teams needing quick data lookups, business analysts exploring data without writing SQL, or developers building interactive data dashboards. It eliminates the need for manual SQL query writing, speeds up data access, and reduces the training burden for non-technical users, saving significant time and resources while improving data accessibility.

11 nodes

Ready to automate with n8n?

Get affordable managed n8n hosting with 24/7 support.