Executes an Impala query and returns the results as a data.frame
dkuImpalaQueryToData(database, query, preQueries = NULL, postQueries = NULL, connection = NULL, findConnectionFromDataset = TRUE)
Argument | Description |
---|---|
database | Name of the Hive database to use. This can also be a dataset name. In that case, findConnectionFromDataset must be set to TRUE |
query | An Impala query that may or may not return results |
preQueries | A list of Impala queries to execute before the main query |
postQueries | A list of Impala queries to execute after the main query |
connection | Name of an HDFS connection whose Hive database to use. If this parameter is defined, then database has to be NULL |
findConnectionFromDataset | Set this to TRUE if the "database" you passed was actually a dataset name |
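The rule that database must be NULL whenever connection is given can be checked before a query is sent. The following is only a sketch: checkTarget is a hypothetical helper, not part of the dataiku package, and 'my-hdfs-connection' is a placeholder name.

```r
# Hypothetical helper (not part of the dataiku package): enforces the
# documented rule that `database` must be NULL when `connection` is given.
checkTarget <- function(database = NULL, connection = NULL) {
  if (!is.null(connection) && !is.null(database)) {
    stop("When 'connection' is defined, 'database' has to be NULL")
  }
  invisible(TRUE)
}

# Passes: only a connection is given ('my-hdfs-connection' is a placeholder)
checkTarget(database = NULL, connection = "my-hdfs-connection")

# Inside DSS, the matching call would look like this (left commented out
# because it needs a running DSS environment):
# dkuImpalaQueryToData(NULL, 'SELECT COUNT(*) FROM mytable',
#                      connection = 'my-hdfs-connection')
```

Running the check first gives a clear error in R instead of relying on whatever the backend reports for an invalid combination.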
A data.frame with the query results, if any. Otherwise, an empty data.frame
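Because a query without results yields an empty data.frame rather than NULL, callers may want to test for rows before using the result. A minimal sketch, assuming a hypothetical helper hasRows (not part of the dataiku package) and a stand-in result so that it runs outside DSS:

```r
# Hypothetical helper (not part of the dataiku package): TRUE when the
# returned data.frame holds at least one row.
hasRows <- function(result) {
  is.data.frame(result) && nrow(result) > 0
}

# Inside DSS, `result` would come from a call such as:
# result <- dkuImpalaQueryToData('my-database', 'SELECT COUNT(*) FROM mytable')
# A stand-in empty data.frame is used here so the sketch runs anywhere.
result <- data.frame()

if (hasRows(result)) {
  print(head(result))
} else {
  message("Query returned no rows")
}
```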
# NOT RUN {
# Identify a database directly
dkuImpalaQueryToData('my-database', 'SELECT COUNT(*) FROM mytable')

# Identify a connection by a dataset name
dkuImpalaQueryToData('my-dataset', 'SELECT COUNT(*) FROM mytable', findConnectionFromDataset=TRUE)

# Insert data and commit
# (the query is wrapped in double quotes so the inner 'stuff' literal parses)
dkuImpalaQueryToData('my-database', "INSERT INTO mytable VALUES (42, 'stuff')", postQueries=c("COMMIT"))
# }