The --split-by clause specifies the table column that Sqoop uses to generate splits when importing data into the Hadoop cluster. Each split is processed by its own map task, so the clause improves import performance through increased parallelism. The splits are only balanced if the chosen column has an even distribution of values, so pick a column that meets this condition; a skewed column leaves some mappers with most of the rows and others with almost none.
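
To make the effect of the column's distribution concrete, here is a minimal Python sketch of range-based splitting. It is an illustration under stated assumptions, not Sqoop's actual splitter: the function names `split_ranges` and `rows_per_split` are hypothetical, the split column is assumed to be an integer, and the real implementation may place boundaries slightly differently.

```python
def split_ranges(min_val, max_val, num_mappers):
    """Divide the inclusive range [min_val, max_val] into contiguous sub-ranges.

    Hypothetical helper: approximates how a range-based splitter could carve
    up the MIN..MAX span of an integer split column, one range per mapper.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if max_val < min_val:
        raise ValueError("max_val must not be smaller than min_val")

    total = max_val - min_val + 1
    n = min(num_mappers, total)      # never create empty ranges
    base, extra = divmod(total, n)   # the first `extra` ranges get one more value

    ranges = []
    lo = min_val
    for i in range(n):
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


def rows_per_split(values, ranges):
    """Count how many column values fall into each (lo, hi) range."""
    return [sum(1 for v in values if lo <= v <= hi) for lo, hi in ranges]


# Evenly distributed column: every mapper gets the same number of rows.
even_ids = list(range(1, 101))
even_ranges = split_ranges(min(even_ids), max(even_ids), 4)
print(even_ranges, rows_per_split(even_ids, even_ranges))

# Skewed column: one outlier stretches the range, so one mapper does most of the work.
skewed_ids = [1, 2, 3, 4, 5, 1000]
skewed_ranges = split_ranges(min(skewed_ids), max(skewed_ids), 4)
print(skewed_ranges, rows_per_split(skewed_ids, skewed_ranges))
```

In the skewed case the ranges are equally wide, but nearly all rows land in the first one, which is why an evenly distributed column (typically a sequential primary key) is the usual choice for --split-by.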