Executes an Impala query and returns the results as a data.frame

dkuImpalaQueryToData(database, query, preQueries = NULL, postQueries = NULL,
  connection = NULL, findConnectionFromDataset = TRUE)

Arguments

database

Name of the Hive database to use. This can also be a dataset name, in which case findConnectionFromDataset must be set to TRUE.

query

An Impala query that may or may not return results

preQueries

A list of Impala queries to execute before the main query

postQueries

A list of Impala queries to execute after the main query

connection

Name of an HDFS connection whose Hive database to use. If this parameter is defined, then database must be NULL.

findConnectionFromDataset

Set this to TRUE if the value passed as "database" is actually a dataset name

Value

A data.frame with the query results, if any; otherwise, an empty data.frame

Examples

# NOT RUN {
# Identify a database directly
dkuImpalaQueryToData('my-database', 'SELECT COUNT(*) FROM mytable')

# Identify a connection by a dataset name
dkuImpalaQueryToData('my-dataset', 'SELECT COUNT(*) FROM mytable', findConnectionFromDataset=TRUE)

# Insert data and commit
dkuImpalaQueryToData('my-database', "INSERT INTO mytable VALUES (42, 'stuff')", postQueries=c("COMMIT"))
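
# Sketch: identify an HDFS connection instead of a database, and run a
# statement before the main query. 'my-hdfs-connection' is a hypothetical
# connection name. database is NULL because connection is defined (see
# Arguments). Setting findConnectionFromDataset=FALSE is an assumption here,
# made because no dataset name is passed.
dkuImpalaQueryToData(NULL, 'SELECT COUNT(*) FROM mytable',
  preQueries=c('REFRESH mytable'),
  connection='my-hdfs-connection', findConnectionFromDataset=FALSE)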
# }