Executes an Impala query and returns the results as a data.frame.
dkuImpalaQueryToData(database, query, preQueries = NULL, postQueries = NULL, connection = NULL, findConnectionFromDataset = TRUE)
| Argument | Description |
|---|---|
| database | Name of the Hive database to use. This can also be a dataset name. In that case, findConnectionFromDataset must be set to TRUE |
| query | An Impala query that may or may not return results |
| preQueries | A list of Impala queries to execute before the main query |
| postQueries | A list of Impala queries to execute after the main query |
| connection | Name of an HDFS connection whose Hive database to use. If this parameter is defined, then database has to be NULL |
| findConnectionFromDataset | Set this to TRUE if the "database" you passed was actually a dataset name |
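
The constraints on `database`, `connection`, and `findConnectionFromDataset` described in the table can be checked before a query is sent. The following is only a minimal sketch: `checkImpalaTarget` is a hypothetical helper and not part of the dataiku package. The rule that at least one of `database` or `connection` must be given is an assumption, not something this page states.

```r
# Hypothetical helper (not part of the dataiku package): checks the
# target arguments against the rules described in the argument table
# and reports how the target would be resolved.
checkImpalaTarget <- function(database = NULL, connection = NULL,
                              findConnectionFromDataset = FALSE) {
  # Stated in the table: if 'connection' is defined, 'database' has to be NULL.
  if (!is.null(connection) && !is.null(database)) {
    stop("If 'connection' is defined, 'database' has to be NULL")
  }
  # Assumption (not stated in the table): one of the two must be provided.
  if (is.null(connection) && is.null(database)) {
    stop("Provide either 'database' or 'connection'")
  }
  if (!is.null(connection)) {
    return("connection")
  }
  # A dataset name is passed in 'database' with findConnectionFromDataset = TRUE.
  if (isTRUE(findConnectionFromDataset)) "dataset" else "database"
}

checkImpalaTarget(database = "my-dataset", findConnectionFromDataset = TRUE)
```

Calling the helper with both `database` and `connection` set raises an error, mirroring the rule in the `connection` row.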
A data.frame with the query results, if any; otherwise, an empty data.frame.
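
Because a query that returns nothing yields an empty data.frame rather than `NULL`, callers may want to test for rows before using the result. The sketch below uses a stand-in value in place of a real call; `hasRows` is a hypothetical helper name, and `res` is assumed to hold what `dkuImpalaQueryToData` returned.

```r
# 'res' stands in for the value returned by dkuImpalaQueryToData();
# here it is an empty data.frame, as for a query that returns no rows.
res <- data.frame()

# Hypothetical helper: TRUE only for a data.frame with at least one row.
hasRows <- function(df) {
  is.data.frame(df) && nrow(df) > 0
}

if (hasRows(res)) {
  print(head(res))
} else {
  message("Query returned no rows")
}
```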
# NOT RUN {
# Identify a database directly
dkuImpalaQueryToData('my-database', 'SELECT COUNT(*) FROM mytable')
# Identify a connection by a dataset name
dkuImpalaQueryToData('my-dataset', 'SELECT COUNT(*) FROM mytable', findConnectionFromDataset=TRUE)
# Insert data and commit
dkuImpalaQueryToData('my-database', "INSERT INTO mytable VALUES (42, 'stuff')", postQueries=c("COMMIT"))
# }