Create an S3 source connector
This connector connects to a data source consisting of files stored on Amazon S3 or S3-compatible storage (e.g., MinIO, FPT Object Storage).
It automatically scans, reads, and ingests data from files (CSV, TSV, Avro, XML, ...) in a bucket into a streaming system or data pipeline.
Use case: Create a connector with Type: source, Database: S3
Pre-condition: CDC service status is Healthy
To create a connector, follow these steps:
Step 1: From the menu bar, select Data Platform > Workspace Management > Workspace name
Step 2: Under My services, select CDC service
Step 3: On the CDC service detail screen, select the Connectors tab, then click Create a connector
Step 4: Enter the connector information:
- Name (required): connector name
  Note: The connector name may contain only lowercase letters a-z, digits 0-9, and "-". Spaces are not allowed; use "-" instead of a space.
- Type (required): select source
- Database (required): select S3
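
The naming rule in the note can be checked before the name is typed into the form. The following is a minimal sketch, not the platform's own validation: the function names are hypothetical, and the pattern assumes that "-" is only used between alphanumeric groups, which the note does not state explicitly.

```python
import re

# Assumed rule: lowercase letters and digits, with single "-" separators.
NAME_PATTERN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_valid_connector_name(name: str) -> bool:
    """Return True if the whole name matches the assumed pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


def suggest_connector_name(raw: str) -> str:
    """Lowercase the input and replace runs of other characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")
```

For example, `suggest_connector_name("S3 Orders Source")` yields `s3-orders-source`, which `is_valid_connector_name` accepts.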
Step 5: Click Next to proceed to the Properties screen
Enter the Properties information:
- URL (required): access address (endpoint) of the S3 or S3-compatible service
- Bucket name (required): bucket name
- Access key (required): access key
- Secret (required): access secret
- Path (required): directory containing the source files
After entering all of the S3 information, click Test connection to verify that the connector can reach the specified S3 storage.
- Topic prefix (required): prefix for the Kafka topics to which the connector produces change events
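
If Test connection fails, it can help to check the same URL, bucket, keys, and path from outside the platform. The sketch below is one way to do that, assuming the third-party boto3 library is installed. The function names are hypothetical, and this is not what the Test connection button itself runs.

```python
def s3_client_kwargs(url, access_key, secret):
    """Map the URL, Access key, and Secret fields onto boto3 client arguments."""
    return {
        "endpoint_url": url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret,
    }


def can_list_path(url, bucket, access_key, secret, path):
    """Return True if at least one object is visible under bucket/path."""
    import boto3  # third-party; imported here so the helper above works without it

    client = boto3.client("s3", **s3_client_kwargs(url, access_key, secret))
    # Assumption: Path is used as a key prefix; S3 keys do not start with "/".
    response = client.list_objects_v2(
        Bucket=bucket, Prefix=path.lstrip("/"), MaxKeys=1
    )
    return bool(response.get("Contents"))
```

A wrong key or unreachable endpoint surfaces as an exception from boto3, while a valid connection to an empty path returns False.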
Step 6: Click Next to proceed to the Additional properties screen
Enter the Additional properties information:
- Type (required): Select the file format the connector will read. Common options: ROW (CSV, TSV), XML, Avro
- File filter regex pattern (required): Enter a regular expression to filter files by name when scanning the source (e.g., .*\.csv$ accepts only files ending with .csv).
- Mode (required): Select the error tolerance mode when processing data.
  - None: Do not skip errors; stop on error.
  - All: Skip all errors and record them in the log.
- Header definition (required): Select how to determine column names for the input data.
  * **From file:** Column names are taken from the first line of the file.
  * **Autogenerated:** Column names are auto-generated (typically column1, column2, ...).
  * **User provided:** You manually enter the list of column names in the "Column name" section below.
- Delimiter (required): The character separating columns. The default is typically a comma ",", but you can change it to another character (e.g., tab or semicolon).
- Trim value (required): Select Yes to trim leading and trailing whitespace from each column value, or No to keep values unchanged.
- Column name (required):
  - Only displayed when Header definition = User provided.
  - Enter the list of data column names (separate names with a comma or newline; names can also be added one at a time using the "+" or "Tag" button).
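
To see how the file filter, header definition, delimiter, trim, and mode settings interact, the sketch below applies them to file names and file content in plain Python. It is only an illustration under stated assumptions: the function and parameter names are hypothetical, a row with the wrong number of columns is assumed to count as a processing error, and the connector's real parser may behave differently.

```python
import csv
import io
import logging
import re

log = logging.getLogger("s3_source_sketch")


def select_files(names, pattern):
    """Keep the file names that match the filter regex."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


def parse_rows(text, delimiter=",", header="from_file", columns=None,
               trim=True, mode="none"):
    """Parse delimited text into a list of dicts keyed by column name."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]  # drop blank lines
    if not rows:
        return []
    if header == "from_file":
        names, rows = rows[0], rows[1:]
    elif header == "autogenerated":
        names = [f"column{i}" for i in range(1, len(rows[0]) + 1)]
    elif header == "user_provided":
        names = list(columns or [])
    else:
        raise ValueError(f"unknown header definition: {header!r}")

    records = []
    for row in rows:
        if len(row) != len(names):
            # Assumption: a column-count mismatch is a processing error.
            if mode == "none":
                raise ValueError(f"expected {len(names)} columns, got {len(row)}")
            log.warning("skipping malformed row: %s", row)
            continue
        if trim:
            row = [value.strip() for value in row]
        records.append(dict(zip(names, row)))
    return records
```

With the pattern `.*\.csv$`, `select_files` keeps `a.csv` and rejects `acsv`; with an unescaped dot (`.*.csv$`) the name `acsv` would also be accepted, because `.` matches any character. Calling `parse_rows("id;name\n1; Ann \n", delimiter=";")` returns one record with the value `Ann` trimmed.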
Step 7: Click Next to proceed to the Review screen
Step 8: Review the information and click Create to complete the connector creation.