Apache Hadoop HDFS Commands & Operations

HDFS Commands:

  1. hadoop fs -ls <path> list files in the path of the file system
  2. hadoop fs -chmod <arg> <file-or-dir> alters the permissions of a file or directory, where <arg> is the octal permission mode, e.g. 777
  3. hadoop fs -chown <owner>:<group> <file-or-dir> change the owner and group of a file or directory
  4. hadoop fs -mkdir <path> make a directory on the file system
  5. hadoop fs -put <local-origin> <destination> copy a file from local storage to the file system
  6. hadoop fs -get <origin> <local-destination> copy a file from the file system to local storage
  7. hadoop fs -copyFromLocal <local-origin> <destination> similar to the put command but the source is restricted to a local file reference
  8. hadoop fs -copyToLocal <origin> <local-destination> similar to the get command but the destination is restricted to a local file reference
  9. hadoop fs -touchz <file> create an empty file on the file system
  10. hadoop fs -cat <file> copy files to stdout
  11. yarn node -list list nodes in the yarn cluster
  12. yarn node -status <node id> status of a node (memory used, free, number of containers, etc) for <node id> (first column from command above)
  13. yarn application -list list of Yarn applications and their state
  14. yarn logs -applicationId <appid> dump the logs for a particular application
  15. hdfs getconf return various configuration settings in effect
  16. hdfs getconf -namenodes namenodes in the cluster
  17. hdfs getconf -confKey <key> return the value of a particular configuration setting (e.g. dfs.replication)
  18. hdfs dfsadmin -safemode get find out if you’re in safemode
  19. hdfs dfsadmin -report find out how much disk space is used and free, how many blocks are under-replicated, etc. (an example session using these commands follows this list)
  20. kodoop sql <cluster> run an SQL session against the running server; connections default to the sys user (a kodoop example also follows this list)
  21. kodoop server <cluster> start start the server, incorporating any new config file changes. Memory images will persist. If the server is currently running, this command restarts it.
  22. kodoop server <cluster> stop stop the server. Memory images will persist so long as the cluster remains active.
  23. kodoop server <cluster> status show the status of the server.
  24. kodoop cluster <cluster> initialize initialize the server. Erase existing data/metadata.
  25. kodoop cluster <cluster> stop stop the cluster’s YARN application. This will shut down everything except the edge nodes. Memory images will be lost but internal data will persist in HDFS.
  26. kodoop cluster <cluster> restart stop and then start again.
  27. kodoop mgr <cluster> shell run a sub-shell configured to allow users to directly run the management commands from the WX2 software.
  28. kodoop help find out about Kognitio on Hadoop commands
  29. kodoop testenv check Kognitio on Hadoop environment is configured correctly
  30. kodoop list_clusters show the currently configured Kognitio on Hadoop clusters
  31. kodoop server <cluster> diagnose check for problems with a server
  32. kodoop server <cluster> [auto|manual] turn automatic management on or off (defaults to on)
  33. kodoop server <cluster> viconf change server config settings
  34. kodoop incidents <cluster> list list of incidents (container failures, etc) the cluster has recovered from
  35. kodoop gateway <cluster> restart restart a hung gateway (this was an issue with older versions)
  36. kodoop sql <cluster> quick SQL connection to the cluster as the sys user
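
A typical session combining several of the HDFS and YARN commands above might look like the sketch below. It is illustrative only: the paths, file names, node ID, and application ID are placeholders, not values from a real cluster.

    # Stage a local file into HDFS and inspect it
    hadoop fs -mkdir -p /user/analyst/demo             # -p also creates missing parent directories
    hadoop fs -put report.csv /user/analyst/demo/      # upload from local storage
    hadoop fs -ls /user/analyst/demo                   # confirm the file arrived
    hadoop fs -chmod 640 /user/analyst/demo/report.csv # octal permission mode
    hadoop fs -cat /user/analyst/demo/report.csv | head

    # Check cluster state and configuration
    hdfs dfsadmin -safemode get                        # e.g. "Safe mode is OFF"
    hdfs dfsadmin -report                              # capacity, usage, DataNode status
    hdfs getconf -namenodes                            # NameNodes in the cluster
    hdfs getconf -confKey dfs.replication              # default replication factor

    # Inspect YARN nodes and applications (IDs below are placeholders)
    yarn node -list
    yarn node -status worker01.example.com:45454
    yarn application -list
    yarn logs -applicationId application_1234567890123_0001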

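A comparable Kognitio on Hadoop lifecycle, using only the kodoop commands listed above, might look like the following sketch; the cluster name mycluster is a placeholder for your own cluster.

    kodoop testenv                      # check the Kognitio on Hadoop environment
    kodoop list_clusters                # show configured clusters
    kodoop server mycluster status      # is the server running?
    kodoop server mycluster start       # start (or restart) the server
    kodoop server mycluster diagnose    # check for problems
    kodoop sql mycluster                # open an SQL session as the sys user
    kodoop incidents mycluster list     # incidents the cluster has recovered from
    kodoop server mycluster stop        # stop the server; memory images persist
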
HDFS Operation:

  • The client makes a write request to the NameNode.
  • The NameNode responds with information about the available DataNodes and where the data is to be written.
  • The client writes the data to the addressed DataNode.
  • Replicas of each block are created automatically by the data pipeline between DataNodes.
  • If a write fails, the DataNode notifies the client, which obtains a new location to write to.
  • If the write completes successfully, an acknowledgement is given to the client.
  • Hadoop uses non-posted writes: the client waits for acknowledgements rather than assuming a write has succeeded.
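
As a rough, command-line illustration of the write path above: the first command below uploads a file (the replication factor and path are example values), and the second asks the NameNode where each block and its replicas ended up.

    # Write a file, requesting 3 replicas per block; the data pipeline creates the replicas
    hadoop fs -D dfs.replication=3 -put big_input.dat /user/analyst/big_input.dat

    # Report the blocks, their locations, and their replication state
    hdfs fsck /user/analyst/big_input.dat -files -blocks -locations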

HDFS: File Write

[Diagram: HDFS file write flow]

HDFS: File Read

[Diagram: HDFS file read flow]
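
On the read side, the client asks the NameNode for the block locations of a file and then streams the blocks from the DataNodes; the shell commands hide this behind simple reads. A minimal sketch, with the path carried over from the placeholder example above:

    hadoop fs -cat /user/analyst/big_input.dat | head            # stream the file to stdout
    hadoop fs -get /user/analyst/big_input.dat ./big_input.dat   # copy it back to local storage
    hadoop fs -checksum /user/analyst/big_input.dat              # checksum derived from block checksums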