1. Load web URLs and Extract Text
Given a column of web URLs, the Load web URLs operation scrapes the content from each URL and outputs it either in binary format or as human-readable text, depending on the operation type selected. The figure below shows the Load web URL and Extract Text operation.

1a. Configure web scrape

If the Load web URL and Extract Text operation is selected, the scraped content is converted from binary to human-readable text. Otherwise, the content is kept in binary format. When would you want to use the binary format? Binary web scraping is useful for downloading non-text content such as images or archived documents.
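To make the binary-versus-text distinction concrete, here is a minimal Python sketch that uses only the standard library. It is not the operation's actual implementation: `load_url` and `extract_text` are hypothetical names, and the sketch assumes the page is HTML encoded as UTF-8 and that "human-readable text" means the visible text with tags, scripts, and styles stripped.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


def load_url(url: str, timeout: float = 10.0) -> bytes:
    """Fetch the raw (binary) content of a web page. Hypothetical helper."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


class _TextCollector(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(content: bytes, encoding: str = "utf-8") -> str:
    """Convert binary HTML content into human-readable text. Hypothetical helper."""
    html = content.decode(encoding, errors="replace")
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Calling `load_url(...)` corresponds to the binary output; passing those bytes to `extract_text` corresponds to the extracted-text output. Images or archives should be kept as bytes, since decoding them as HTML text would not produce anything meaningful.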
1b. Input
| Parameter | Description | Required |
|---|---|---|
| Column name (string with urls) | string - the input column that contains the web URLs | True |
1c. Output
| Parameter | Description |
|---|---|
| Result content: Load url (web scrape) | binary - the contents of each web page |
| Result content: Load url (web scrape) and extract text | string - the contents of each web page, converted from binary to human-readable text |
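The Input and Output tables describe a column-in, column-out mapping: one URL per row goes in, one result value per row comes out. The sketch below illustrates that mapping in plain Python, with rows represented as dictionaries. The function names, the default output column name `result_content`, and the choice to store `None` when a URL fails to load are all assumptions for illustration, not documented behavior of the operation.

```python
from typing import Callable
from urllib.request import urlopen


def fetch_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw bytes of one URL (binary web scrape). Hypothetical helper."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def load_url_column(
    rows: list[dict],
    url_column: str,
    result_column: str = "result_content",  # assumed output column name
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> list[dict]:
    """Return copies of `rows` with the scraped content of each URL added.

    Assumption: a URL that fails to load gets None instead of aborting the
    whole column. URLError/HTTPError are OSError subclasses; malformed URLs
    raise ValueError.
    """
    output = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not mutated
        try:
            new_row[result_column] = fetch(row[url_column])
        except (OSError, ValueError):
            new_row[result_column] = None
        output.append(new_row)
    return output
```

The `fetch` parameter lets you swap in a different downloader, or a stub when testing without network access.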
1d. Generated Code
2. Split text data into equal chunks
Sometimes you’d like to send text data to a foundation model or store it in a vector database, but the text is too long. In this case, split the text into fixed-size “chunks” of characters.
2a. Configure text splitting
Given a text input, the Split data operation separates each entry in the input column into chunks of the specified size.
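A minimal Python sketch of fixed-size splitting follows. It assumes chunks are consecutive, non-overlapping slices of exactly `Size` characters, with a shorter final chunk for any remainder; how the Split data operation itself treats word boundaries or overlap is not documented here, and `split_into_chunks` is a hypothetical name.

```python
def split_into_chunks(text: str, size: int) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of `size` characters.

    Every chunk has exactly `size` characters except possibly the last,
    which holds whatever remains.
    """
    if size <= 0:
        raise ValueError("size must be a positive integer")
    # Step through the string in strides of `size` and slice out each piece.
    return [text[i:i + size] for i in range(0, len(text), size)]


if __name__ == "__main__":
    print(split_into_chunks("abcdefghij", 4))  # ['abcd', 'efgh', 'ij']
```

Because the chunks do not overlap, joining them back together recovers the original text exactly.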

2b. Input
| Parameter | Description | Required |
|---|---|---|
| Column name | string - the text content which should be split into equal chunks | True |
| Size | integer - the size of each chunk, in number of characters. Example: 1000 | True |
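As a worked example, assuming the text is cut every `Size` characters with no overlap, a text of $n$ characters split with `Size` $s$ produces $k$ chunks, and only the last one can be shorter than $s$:

```latex
k = \left\lceil \frac{n}{s} \right\rceil, \qquad
\ell_{\text{last}} = n - (k - 1)\,s
```

For instance, a 2,500-character text with `Size` 1000 gives $k = \lceil 2.5 \rceil = 3$ chunks of 1000, 1000, and 500 characters.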
2c. Output
| Parameter | Description |
|---|---|
| result_chunks | array(string) - an array of text strings, each string representing one chunk of the larger text content |

