Create an S3 source connector

Connect to a data source consisting of files stored on Amazon S3 or S3-compatible storage (e.g., MinIO or FPT Object Storage).

The connector automatically scans a bucket, reads the files in it (CSV, TSV, Avro, XML, ...), and ingests their data into a streaming system or data pipeline.

Use case: Create a connector with Type: source, Database: S3

Pre-condition: CDC service status is Healthy

To create a connector, follow these steps:

Step 1: From the menu bar, select Data Platform > Workspace Management > Workspace name

Step 2: Under My services, select CDC service

Step 3: On the CDC service detail screen, select the Connectors tab, then click Create a connector

Step 4: Enter the connector information:

Name (required): connector name

Note: The connector name may contain only lowercase letters (a-z), digits (0-9), and hyphens. Spaces are not allowed; use "-" in place of a space.
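
As an illustration, the naming rule in the note above can be checked before you type a name into the form. This is a minimal Python sketch and not part of the product: the functions `is_valid_connector_name` and `suggest_connector_name` are hypothetical, and the pattern assumes that lowercase letters, digits, and "-" are the only characters allowed.

```python
import re

# Assumed rule, derived from the note above: only a-z, 0-9 and "-".
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_connector_name(name: str) -> bool:
    """Return True if the whole name uses only a-z, 0-9 and "-"."""
    return NAME_PATTERN.fullmatch(name) is not None


def suggest_connector_name(raw: str) -> str:
    """Lowercase the input and turn every run of other characters into "-"."""
    return re.sub(r"[^a-z0-9]+", "-", raw.lower()).strip("-")


if __name__ == "__main__":
    for candidate in ["s3-orders-01", "S3 Orders"]:
        print(candidate, is_valid_connector_name(candidate), suggest_connector_name(candidate))
```

The platform may enforce further limits (for example on length) that this sketch does not model.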

Type (required): select source
Database (required): select S3

Step 5: Click Next to proceed to the Properties screen

Enter the Properties information:

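URL (required): endpoint address of the S3 or S3-compatible service
Bucket name (required): name of the bucket that holds the source files
Access key (required): access key used to authenticate
Secret (required): secret key paired with the access key
Path (required): directory in the bucket containing the source files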

After entering all of the S3 information, click Test connection to verify that the connector can reach the specified S3 storage.
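
If the test fails, it can help to check the same five values from your own machine. The sketch below is only an illustration and does not describe what Test connection does internally. It uses the third-party boto3 library, the helper names (`build_client_kwargs`, `normalize_prefix`, `can_list_path`) are hypothetical, and it assumes that Path is a key prefix inside the bucket.

```python
def build_client_kwargs(url: str, access_key: str, secret: str) -> dict:
    """Map the URL, Access key and Secret fields to boto3 client arguments."""
    return {
        "endpoint_url": url,
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret,
    }


def normalize_prefix(path: str) -> str:
    """Turn the Path field into a key prefix such as "incoming/csv/"."""
    cleaned = path.strip("/")
    return f"{cleaned}/" if cleaned else ""


def can_list_path(url: str, bucket: str, access_key: str, secret: str, path: str) -> bool:
    """Return True if one listing call against bucket/path succeeds."""
    # boto3 is imported lazily so the two helpers above work without it.
    import boto3
    from botocore.exceptions import BotoCoreError, ClientError

    client = boto3.client("s3", **build_client_kwargs(url, access_key, secret))
    try:
        # MaxKeys=1 keeps the check cheap: we only need to know the call works.
        client.list_objects_v2(Bucket=bucket, Prefix=normalize_prefix(path), MaxKeys=1)
    except (BotoCoreError, ClientError):
        return False
    return True
```

A call such as `can_list_path("https://s3.example.com", "my-bucket", "AK", "SK", "incoming/csv")` uses placeholder values; substitute the ones you entered on the Properties screen.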

Topic prefix (required): prefix for the names of the Kafka topics to which change events are produced when data changes

Step 6: Click Next to proceed to the Additional properties screen

Enter the Additional properties information:

Type (required): Select the file format the connector will read. Common options: ROW (CSV, TSV), XML, Avro
File filter regex pattern (required): Enter a regular expression that filters files by name when the source is scanned (e.g., .*\.csv$ accepts only files ending in .csv).
Mode (required): Select the error tolerance mode when processing data.
- None: Do not skip errors; stop on error.
- All: Skip all errors and record them in the log.
Header definition (required): Select how column names are determined for the input data.

* **From file:** Column names are taken from the first line of the file.

* **Autogenerated:** Column names are generated automatically (typically column1, column2, ...).

* **User provided:** You enter the list of column names manually in the "Column name" field below.
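
The three header options above can be pictured with a small Python sketch. It is an illustration only, not the connector's implementation: the function `resolve_columns` and the option strings `"from_file"`, `"autogenerated"`, and `"user_provided"` are hypothetical, and it assumes that the first line is treated as data whenever the column names do not come from the file.

```python
def resolve_columns(first_row, header_definition, user_columns=None):
    """Return (column_names, first_row_is_data) for one input file."""
    if header_definition == "from_file":
        # The first line supplies the names, so it is not a data row.
        return list(first_row), False
    if header_definition == "autogenerated":
        # One generated name per value found in the first line.
        names = [f"column{i}" for i in range(1, len(first_row) + 1)]
        return names, True
    if header_definition == "user_provided":
        if not user_columns:
            raise ValueError("user_provided needs a non-empty column list")
        return list(user_columns), True
    raise ValueError(f"unknown header definition: {header_definition!r}")


if __name__ == "__main__":
    first = ["id", "name", "city"]
    print(resolve_columns(first, "from_file"))
    print(resolve_columns(first, "autogenerated"))
    print(resolve_columns(first, "user_provided", ["a", "b", "c"]))
```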

Delimiter (required): The character separating columns. The default is typically a comma ",", but you can change it to another character (e.g., tab, semicolon, ...).
Trim value (required): Select Yes/No to trim leading/trailing whitespace from each column value.
Column name (required):
- Only displayed when Header definition = User provided
- Enter/create a list of data column names (each name separated by a comma or newline; names can also be added one at a time using the "+" or "Tag" button).
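
To see how the file filter, Delimiter, Trim value, and Mode settings interact, here is a minimal Python sketch. It is a simplified model and not the connector's code: `select_files` and `parse_rows` are hypothetical names, the filter is assumed to match the whole file name, and the only error modeled is a row whose number of values differs from the number of columns. Note that the dot before `csv` is escaped so that it matches a literal dot.

```python
import csv
import io
import re


def select_files(names, pattern):
    """Keep the file names that the regex matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


def parse_rows(text, columns, delimiter=",", trim=True, mode="none"):
    """Parse delimited text into dicts keyed by column name.

    mode="none": raise on the first malformed row.
    mode="all":  skip malformed rows and report their line numbers.
    """
    records, skipped = [], []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for line_no, row in enumerate(reader, start=1):
        if trim:
            # Remove leading and trailing whitespace from every value.
            row = [value.strip() for value in row]
        if len(row) != len(columns):
            if mode == "none":
                raise ValueError(
                    f"line {line_no}: expected {len(columns)} values, got {len(row)}"
                )
            skipped.append(line_no)
            continue
        records.append(dict(zip(columns, row)))
    return records, skipped


if __name__ == "__main__":
    print(select_files(["orders.csv", "orders.csv.bak", "users.tsv"], r".*\.csv$"))
    sample = "1; Alice \n2;Bob;extra\n3;Carol\n"
    print(parse_rows(sample, ["id", "name"], delimiter=";", mode="all"))
```

With Mode set to None in this model, the same sample raises an error at line 2 instead of skipping it.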

Step 7: Click Next to proceed to the Review screen

Step 8: Review the information and click Create to complete the connector creation.