Executes a Hive query and returns the results as a data.frame
dkuHiveQueryToData(database, query, preQueries = NULL, postQueries = NULL, connection = NULL, findConnectionFromDataset = TRUE)
Argument | Description
---|---
database | Name of the Hive database to use. This can also be a dataset name, in which case findConnectionFromDataset must be set to TRUE
query | The Hive query to execute; it may or may not return results
preQueries | A list of Hive queries to execute before the main query
postQueries | A list of Hive queries to execute after the main query
connection | Name of an HDFS connection whose Hive database should be used. If this is set, database must be NULL
findConnectionFromDataset | Set to TRUE if the value passed as database is a dataset name rather than a database name
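The argument rules above can be checked before a query is sent. The sketch below is an illustration only: `buildHiveQueryArgs` is a hypothetical helper, not part of the Dataiku R API. It enforces the documented rule that database must be NULL when connection is set. It also assumes that preQueries and postQueries hold strings, given either as a character vector or as a list. It returns a named list that can be handed to `dkuHiveQueryToData` through `do.call()`. The connection name `my-hdfs-connection` is a placeholder.

```r
# Hypothetical helper (NOT part of the Dataiku R API): validates the documented
# argument rules and returns a named list suitable for do.call(dkuHiveQueryToData, ...).
buildHiveQueryArgs <- function(database, query, preQueries = NULL,
                               postQueries = NULL, connection = NULL,
                               findConnectionFromDataset = TRUE) {
  # Documented rule: when `connection` is given, `database` must be NULL.
  if (!is.null(connection) && !is.null(database)) {
    stop("`database` must be NULL when `connection` is set")
  }
  # Assumption: pre/post queries are strings (character vector or list of strings).
  for (q in list(preQueries, postQueries)) {
    if (!is.null(q) && !all(vapply(q, is.character, logical(1)))) {
      stop("`preQueries` and `postQueries` must contain only strings")
    }
  }
  # list() keeps NULL-valued elements, so NULLs are passed through explicitly.
  list(database = database, query = query,
       preQueries = preQueries, postQueries = postQueries,
       connection = connection,
       findConnectionFromDataset = findConnectionFromDataset)
}

# Usage: target a connection instead of a database, with a session setting
# applied before the main query ("my-hdfs-connection" is a placeholder name).
args <- buildHiveQueryArgs(
  database = NULL,
  query = "SELECT COUNT(*) FROM mytable",
  preQueries = c("SET hive.exec.dynamic.partition=true"),
  connection = "my-hdfs-connection",
  findConnectionFromDataset = FALSE
)
# Inside DSS, with the dataiku package loaded:
# result <- do.call(dkuHiveQueryToData, args)
```

The helper fails fast in R, which avoids a round trip to Hive for arguments the documentation already rules out.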
A data.frame containing the query results, or an empty data.frame if the query returns none.
# NOT RUN {
# Identify a database directly
dkuHiveQueryToData('my-database', 'SELECT COUNT(*) FROM mytable', findConnectionFromDataset = FALSE)
# Identify a connection by a dataset name
dkuHiveQueryToData('my-dataset', 'SELECT COUNT(*) FROM mytable', findConnectionFromDataset = TRUE)
# Insert data and commit
dkuHiveQueryToData('my-database', "INSERT INTO mytable VALUES (42, 'stuff')", postQueries = c("COMMIT"), findConnectionFromDataset = FALSE)
# }