Create an S3 source connector

Connect to a data source consisting of files stored on Amazon S3 or S3-compatible storage (e.g., MinIO, FPT Object Storage, etc.).

The connector automatically scans the bucket, reads the files it finds (CSV, TSV, AVRO, XML, ...), and ingests their data into a streaming system or data pipeline.

Use case: Create a connector with Type: source, Database: S3

Pre-condition: CDC service status is Healthy

To create a connector, follow these steps:

Step 1: From the menu bar, select Data Platform > Workspace Management > Workspace name

Step 2: Under My services, select CDC service

Step 3: On the CDC service detail screen, select the Connectors tab, then click Create a connector

Step 4: Enter the connector information:

  • Name (required): connector name

Note: The connector name may contain only lowercase letters (a-z), digits (0-9), and hyphens. Spaces are not allowed; use "-" in place of a space.

  • Type (required): select source

  • Database (required): select S3
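The naming rule in the note above can be checked before the form is submitted. The following is a minimal sketch, assuming the rule amounts to "one or more lowercase letters, digits, or hyphens"; the function names are hypothetical and are not part of the CDC service.

```python
import re

# Assumption: a valid name is one or more of a-z, 0-9, or "-".
# The service itself may apply stricter rules (for example, on length).
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def is_valid_connector_name(name: str) -> bool:
    """Return True if the whole name uses only the allowed characters."""
    return NAME_PATTERN.fullmatch(name) is not None


def suggest_connector_name(raw: str) -> str:
    """Turn free text into a candidate name: lowercase, spaces to hyphens."""
    lowered = raw.strip().lower()
    hyphenated = re.sub(r"\s+", "-", lowered)
    # Drop any remaining character that is not allowed.
    return re.sub(r"[^a-z0-9-]", "", hyphenated)


if __name__ == "__main__":
    print(is_valid_connector_name("s3-orders-source"))
    print(is_valid_connector_name("S3 Orders"))
    print(suggest_connector_name("S3 Orders Source"))
```

For example, "S3 Orders Source" is rejected as written, while the suggested form "s3-orders-source" passes the check.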

Step 5: Click Next to proceed to the Properties screen

Enter the Properties information:

  • URL (required): access address

  • Bucket name (required): bucket name

  • Access key (required): access key

  • Secret (required): access secret

  • Path (required): directory containing the source files

After entering all of the S3 information, click Test connection to verify that the connector can reach the specified S3 storage.

  • Topic prefix (required): prefix of the Kafka topics to which the connector produces change events when data changes
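Before clicking Test connection, it can help to confirm that every required property is filled in. The sketch below is a local sanity check only: the dictionary keys, the example values, and the assumption that the URL is an http(s) address are all hypothetical, and it does not contact S3 or replace the Test connection button.

```python
from urllib.parse import urlparse

# Hypothetical keys mirroring the required fields on the Properties screen.
REQUIRED_FIELDS = (
    "url", "bucket_name", "access_key", "secret", "path", "topic_prefix",
)


def check_properties(props: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = props.get(name) or ""
        if not str(value).strip():
            problems.append(f"missing required field: {name}")

    url = str(props.get("url") or "").strip()
    if url:
        parsed = urlparse(url)
        # Assumption: the access address is an http(s) endpoint with a host.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url must be an http(s) address with a host")
    return problems


if __name__ == "__main__":
    example = {
        "url": "https://s3.example.com",   # placeholder endpoint
        "bucket_name": "raw-data",
        "access_key": "EXAMPLEKEY",
        "secret": "EXAMPLESECRET",
        "path": "incoming/",
        "topic_prefix": "s3-orders",
    }
    print(check_properties(example))
```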

Step 6: Click Next to proceed to the Additional properties screen

Enter the Additional properties information:

  • Type (required): Select the file format the connector will read. Common options: ROW (CSV, TSV), XML, Avro

  • File filter regex pattern (required): Enter a regular expression to filter files by name when scanning the source (e.g., .*\.csv$ accepts only files whose names end with .csv).

  • Mode (required): Select the error tolerance mode when processing data.

    • None: Do not skip errors; stop on error.

    • All: Skip all errors and record them in the log.

  • Header definition (required): Select how column names are determined for the input data.

    • From file: Column names are taken from the first line of the file.

    • Autogenerated: Column names are generated automatically (typically column1, column2, ...).

    • User provided: You enter the list of column names manually in the "Column name" field below.

  • Delimiter (required): The character that separates columns. The default is typically a comma (","), but you can change it to another character (e.g., tab or semicolon).

  • Trim value (required): Select Yes/No to trim leading/trailing whitespace from each column value.

  • Column name (required):

    • Only displayed when Header definition = User provided

    • Enter/create a list of data column names (each name separated by a comma or newline; names can also be added one at a time using the "+" or "Tag" button).
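To see how these settings interact, the following sketch imitates them locally with the Python standard library. It is an illustration under stated assumptions, not the connector's implementation: the function names, the mode strings, the choice to match the filter regex against the whole file name, and the choice to treat a row with the wrong number of columns as an "error" are all hypothetical. Note that a filter for CSV files should escape the dot (`.*\.csv$`); an unescaped `.*.csv$` also matches a name such as `acsv`.

```python
import csv
import io
import logging
import re

log = logging.getLogger("s3-source-sketch")


def select_files(keys, pattern):
    """Keep only the file names that match the filter regex in full."""
    compiled = re.compile(pattern)
    return [key for key in keys if compiled.fullmatch(key)]


def parse_rows(text, header="from_file", delimiter=",", trim=True,
               column_names=None, tolerance="none"):
    """Parse delimited text into records, following the settings above."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return []

    if header == "from_file":          # first line holds the column names
        names, data = rows[0], rows[1:]
    elif header == "autogenerated":    # column1, column2, ...
        names = [f"column{i}" for i in range(1, len(rows[0]) + 1)]
        data = rows
    elif header == "user_provided":    # names come from "Column name"
        if not column_names:
            raise ValueError("column_names is required for user_provided")
        names, data = list(column_names), rows
    else:
        raise ValueError(f"unknown header mode: {header}")

    records = []
    for line_no, row in enumerate(data, start=1):
        if len(row) != len(names):
            if tolerance == "all":     # Mode = All: log the error and skip
                log.warning("skipping data row %d: %r", line_no, row)
                continue
            raise ValueError(f"bad data row {line_no}: {row!r}")  # Mode = None
        if trim:                       # Trim value = Yes
            row = [value.strip() for value in row]
        records.append(dict(zip(names, row)))
    return records


if __name__ == "__main__":
    print(select_files(["in/a.csv", "in/a.csv.bak", "in/acsv"], r".*\.csv$"))
    print(parse_rows("id;name\n1; Alice \n2;Bob\n", delimiter=";"))
```

With Header definition set to "From file", a semicolon delimiter, and trimming enabled, the sample text yields two records keyed by `id` and `name`, with the padding around "Alice" removed.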

Step 7: Click Next to proceed to the Review screen

Step 8: Review the information and click Create to complete the connector creation.