Teradata Connector for Hadoop
Teradata Connector for Hadoop (TDCH) can be used in DSS as an additional execution engine that enables scalable, parallel data transfers between Teradata and Hadoop.
Installation and configuration
The Teradata Hadoop appliance already embeds TDCH, and many enterprise Hadoop vendors also bundle a TDCH library with their distribution. Otherwise, you can install it by:
- downloading the Teradata Connector for Hadoop installation archive (you need a Teradata account)
- unzipping it somewhere on the machine that runs DSS.
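As an illustration of the unzip step, the following minimal sketch searches an unzipped TDCH directory for the jars named in the example `dip.properties` configuration of this page (`teradata-connector-*.jar`, `tdgssconfig.jar`, `terajdbc4.jar`). The function name `find_tdch_jars` is hypothetical, and the file names are assumptions taken from that example; your distribution may ship different names or versions.

```python
import glob
import os


def find_tdch_jars(tdch_root):
    """Locate TDCH jars inside an unzipped archive (hypothetical helper).

    Returns {"jar": <connector jar>, "includes": [<extra jars>]}.
    File names are assumed from the example configuration.
    """
    pattern = os.path.join(tdch_root, "**", "teradata-connector-*.jar")
    connectors = sorted(glob.glob(pattern, recursive=True))
    if not connectors:
        raise FileNotFoundError("no teradata-connector-*.jar under " + tdch_root)

    includes = []
    for name in ("tdgssconfig.jar", "terajdbc4.jar"):
        matches = sorted(glob.glob(os.path.join(tdch_root, "**", name), recursive=True))
        if not matches:
            raise FileNotFoundError(name + " not found under " + tdch_root)
        includes.append(matches[0])

    # Several connector versions may be present: take the lexically last one,
    # which is not necessarily the newest version. Check the result manually.
    return {"jar": connectors[-1], "includes": includes}
```

The returned paths can then be used for the `tdch.jar` and `tdch.includes` properties.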
Once you have downloaded TDCH (or already know where it is installed), enable it in DSS by adding the following properties to DATADIR/config/dip.properties and restarting DSS. You may have to adjust the file version numbers to match your distribution:
    tdch.enabled = true
    tdch.jar = /PATH/TO/TDCH/LIB/teradata-connector-1.5.1.jar
    tdch.includes = /PATH/TO/TDCH/LIB/tdgssconfig.jar,/PATH/TO/TDCH/LIB/terajdbc4.jar
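If you prefer to script this change, the following is a minimal sketch that writes the three `tdch.*` properties into `DATADIR/config/dip.properties`, replacing any existing `tdch.*` lines and keeping every other line. The function name `enable_tdch` is hypothetical, and it assumes simple `key = value` lines (it does not handle the `key: value` syntax or multi-line values). DSS still has to be restarted afterwards.

```python
import os

TDCH_KEYS = ("tdch.enabled", "tdch.jar", "tdch.includes")


def enable_tdch(datadir, connector_jar, include_jars):
    """Write TDCH settings into DATADIR/config/dip.properties (hypothetical helper).

    Existing tdch.* lines are dropped and re-added; all other lines,
    including comments, are preserved in their original order.
    """
    path = os.path.join(datadir, "config", "dip.properties")
    lines = []
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            for line in f:
                key = line.split("=", 1)[0].strip()
                if key not in TDCH_KEYS:
                    lines.append(line.rstrip("\n"))

    lines.append("tdch.enabled = true")
    lines.append("tdch.jar = " + connector_jar)
    # Additional jars are given as one comma-separated list.
    lines.append("tdch.includes = " + ",".join(include_jars))

    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    return path
```

Rewriting the `tdch.*` keys rather than appending blindly keeps the file free of duplicate, conflicting entries when the script is run more than once.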
Usage and guidelines
The TDCH engine is available for any Sync recipe between Hadoop and Teradata. Refer to the Teradata documentation to tune this engine for your Teradata system and YARN capacity: this tuning determines the achievable level of scalability and the best split method for your data transfer.
The following limitations apply to TDCH:
- TDCH does not support Parquet (this format is not supported by Teradata)
- Multi-User Security is not supported
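The two restrictions listed above can be expressed as a pre-flight check before choosing the TDCH engine. This is a minimal sketch only: `tdch_blockers` is a hypothetical helper, and the format strings (such as `"parquet"` or `"csv"`) are illustrative labels, not DSS identifiers.

```python
def tdch_blockers(dataset_formats, multi_user_security):
    """Return the reasons TDCH cannot be used (empty list if none).

    dataset_formats: storage formats of the Hadoop-side datasets,
    e.g. ["csv"] or ["parquet"] (illustrative names).
    multi_user_security: True if Multi-User Security is enabled.
    """
    reasons = []
    if any(fmt.lower() == "parquet" for fmt in dataset_formats):
        reasons.append("Parquet is not supported by TDCH")
    if multi_user_security:
        reasons.append("Multi-User Security is not supported by TDCH")
    return reasons
```

For example, `tdch_blockers(["csv"], False)` returns an empty list, whereas `tdch_blockers(["parquet"], True)` returns both reasons.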