In such situations, the --boundary-query argument can be used. By default, Sqoop runs the query SELECT MIN(<split-by>), MAX(<split-by>) FROM <table> to determine the boundary values for creating splits. If this query is not optimal, --boundary-query lets you supply any arbitrary query that returns two numeric columns, which Sqoop then uses as the lower and upper boundaries.
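To make the role of the two boundary values concrete, here is a minimal sketch of how a numeric range might be divided among mappers. It is a simplified illustration, not Sqoop's actual splitter; the function name `integer_splits` and the sample values are assumptions made for this example.

```python
def integer_splits(lo, hi, num_splits):
    """Divide the closed range [lo, hi] into contiguous closed sub-ranges.

    lo and hi stand for the two values returned by the boundary query;
    num_splits stands for the number of mappers. Hypothetical helper,
    not part of Sqoop.
    """
    if hi < lo:
        raise ValueError("upper boundary must not be below lower boundary")
    total = hi - lo + 1
    # Never create more splits than there are distinct values.
    num_splits = max(1, min(num_splits, total))
    base, extra = divmod(total, num_splits)
    ranges = []
    start = lo
    for i in range(num_splits):
        # The first `extra` splits take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Boundaries 1 and 100 with 4 mappers give four ranges of 25 values.
    print(integer_splits(1, 100, 4))
```

Each mapper would then import only the rows whose split column falls inside its range, which is why a cheap and accurate boundary query matters.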
Sqoop metastore is a shared metadata repository that lets remote users define and execute saved jobs created with the sqoop job command. To connect to the Sqoop metastore, clients should be configured in sqoop-site.xml.
The --split-by clause specifies the column of a table that Sqoop uses to generate splits when importing data into the Hadoop cluster. Choosing a column whose values are evenly distributed lets Sqoop create balanced splits, which improves import performance through increased parallelism.
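The effect of data distribution on split balance can be illustrated with a small sketch. It assumes the split column's range is cut into equal-width intervals, which is a simplification rather than Sqoop's exact behavior; `rows_per_split` and the sample data are hypothetical.

```python
def rows_per_split(values, num_splits):
    """Count how many rows land in each equal-width split of the column range.

    values stands for the split column's values; hypothetical helper,
    not part of Sqoop.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo + 1) / num_splits
    counts = [0] * num_splits
    for v in values:
        # Clamp so the maximum value falls into the last split.
        idx = min(int((v - lo) / width), num_splits - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    even = list(range(1, 101))                          # evenly distributed ids
    skewed = list(range(1, 98)) + [1000, 5000, 10000]   # a few large outliers
    print(rows_per_split(even, 4))
    print(rows_per_split(skewed, 4))
```

With the evenly distributed column every split receives the same number of rows, while the skewed column leaves one split doing nearly all the work and the others almost idle.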