Hive ORCFile
The Optimized Row Columnar (ORC) file format provides a highly efficient way to store Hive data. It was added in Hive 0.11 to overcome limitations of the other Hive file formats.
For more information, see the official ORCFile documentation.
Compatibility
Data Science Studio can read and write ORCFiles. Most Hive data types are supported, including the complex types (object, map, and array).
The following Hive types are not supported:
DATE
UNION
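As a rough illustration, a column list can be screened for the two unsupported types before a dataset is stored as ORCFile. The sketch below is not part of Data Science Studio: the helper names and the sample schema are hypothetical, it assumes the schema is available as a mapping from column name to Hive type string, and it assumes Hive's `uniontype<...>` spelling should be treated the same as UNION. Only the top-level type of each column is checked, so a DATE nested inside an array or map is not detected.

```python
import re
from typing import Dict, List

# Hive types listed above as unsupported. "UNIONTYPE" is Hive's own spelling
# of the union type; treating it as equivalent to UNION is an assumption.
UNSUPPORTED_BASE_TYPES = {"DATE", "UNION", "UNIONTYPE"}


def base_type(hive_type: str) -> str:
    """Return the upper-cased top-level type name of a Hive type string."""
    match = re.match(r"\s*([A-Za-z_]+)", hive_type)
    return match.group(1).upper() if match else ""


def unsupported_columns(schema: Dict[str, str]) -> List[str]:
    """List the columns whose top-level Hive type is unsupported."""
    return [
        name
        for name, hive_type in schema.items()
        if base_type(hive_type) in UNSUPPORTED_BASE_TYPES
    ]


# Hypothetical schema, for illustration only.
example_schema = {
    "id": "bigint",
    "signup_day": "date",
    "tags": "array<string>",
    "payload": "uniontype<int,string>",
    "price": "decimal(10,2)",
}
print(unsupported_columns(example_schema))  # ['signup_day', 'payload']
```

Columns flagged this way would need to be converted (for example, a DATE stored as a string or timestamp) or dropped before writing; which conversion is appropriate depends on the dataset.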
Limitations
The ORCFile format can only be used on Hadoop filesystems. If the data is on S3 or Azure Blob Storage, access must be set up through Hadoop, using HDFS connections.
Impala doesn’t support ORCFile. Consequently, you won’t be able to use the “Live processing” chart engine on ORCFile datasets.