Teradata Connector For Hadoop

Teradata Connector for Hadoop (TDCH) can be used in DSS as an additional execution engine that enables scalable, parallel data transfers between Teradata and Hadoop.

Installation and configuration

The Teradata Hadoop appliance already embeds TDCH. On the Hadoop side, many enterprise Hadoop vendors also embed a TDCH library in their product. Otherwise, you can install it by:

  • downloading the Teradata Connector for Hadoop installation archive (you need a Teradata account)
  • unzipping it somewhere on the machine that runs DSS.

Once you have downloaded TDCH (or already know its location), you can enable its support in DSS by adding the following properties to the configuration file DATADIR/config/dip.properties and restarting DSS. You may have to adjust the file version numbers according to your distribution:

tdch.enabled = true
tdch.jar = /PATH/TO/TDCH/LIB/teradata-connector-1.5.1.jar
tdch.includes = /PATH/TO/TDCH/LIB/tdgssconfig.jar,/PATH/TO/TDCH/LIB/terajdbc4.jar
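Because a wrong jar path is an easy mistake to make here, it can help to check the three properties before restarting DSS. The following is a minimal sketch, not a DSS tool: the function names parse_properties and check_tdch are hypothetical, and the parser assumes simple "key = value" lines rather than the full Java properties syntax.

```python
from pathlib import Path


def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # or ! comments.

    Assumption: no line continuations or escaped characters are used.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_tdch(props):
    """Return a list of human-readable problems found in the TDCH properties."""
    problems = []
    if props.get("tdch.enabled", "").lower() != "true":
        problems.append("tdch.enabled is not set to true")
    jar = props.get("tdch.jar")
    if not jar:
        problems.append("tdch.jar is not set")
    # tdch.includes is a comma-separated list of additional jars.
    includes = [p.strip() for p in props.get("tdch.includes", "").split(",")]
    for path in ([jar] if jar else []) + [p for p in includes if p]:
        if not Path(path).is_file():
            problems.append(f"file not found: {path}")
    return problems


# Example run on the placeholder values from this page; with real paths,
# read DATADIR/config/dip.properties instead of this sample text.
sample = """
tdch.enabled = true
tdch.jar = /PATH/TO/TDCH/LIB/teradata-connector-1.5.1.jar
tdch.includes = /PATH/TO/TDCH/LIB/tdgssconfig.jar,/PATH/TO/TDCH/LIB/terajdbc4.jar
"""
for problem in check_tdch(parse_properties(sample)):
    print(problem)
```

An empty result from check_tdch only means that the files exist. It does not confirm that the jar versions match your distribution.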

Usage and Guidelines

For any Sync recipe between Hadoop and Teradata, the TDCH engine will be available. Refer to the Teradata documentation to tune this engine according to your Teradata characteristics and YARN capabilities. These determine the target level of scalability and the best split method for your data transfer.

The following distribution methods are available with TDCH:

For import:

  • split.by.hash
  • split.by.value
  • split.by.partition
  • split.by.amp

For export:

  • internal.fastload


Limitations

  • TDCH doesn’t support Parquet (this is not supported by Teradata)
  • Multi User Security is not supported