How to mount HDFS using FUSE

Last updated on June 3, 2012 by Dan Nanni

Hadoop Distributed File System (HDFS) is a distributed, scalable filesystem developed as the back-end storage for data-intensive Hadoop applications. As such, HDFS is designed to handle very large files with a "write-once-read-many" access model. Because HDFS is not a full-fledged POSIX-compliant filesystem, it cannot be mounted directly by the operating system, and file access to HDFS is normally done via HDFS shell commands.

However, one can leverage FUSE to write a userland application that exposes HDFS via a traditional filesystem interface. fuse-dfs is one such FUSE-based application, which allows you to mount HDFS as if it were a traditional Linux filesystem.

If you would like to mount HDFS on Linux, you can install fuse-dfs along with FUSE as follows.

I assume that you have an HDFS cluster already up and running, and that you know the HDFS NameNode to connect to. I also assume that you would like to mount HDFS on a separate Linux host.

On the host where you would like to mount HDFS, do the following.

First, install the Java JDK, which Hadoop and fuse-dfs require.
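For example, you can install OpenJDK 6 from the standard repositories (CDH3 officially targets the Sun/Oracle JDK 6, but the OpenJDK package shown here is one readily available option).

On Debian or Ubuntu:

$ sudo apt-get install openjdk-6-jdk

On CentOS or RHEL:

$ sudo yum install java-1.6.0-openjdk-devel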

Next, install fuse-dfs and all necessary dependencies as follows.

To install fuse-dfs on CentOS or RHEL 6:

$ wget http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm
$ sudo yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm
$ sudo rpm --import http://archive.cloudera.com/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
$ sudo yum install hadoop-0.20-fuse

To install fuse-dfs on Debian or Ubuntu 10.10 and earlier:

$ wget http://archive.cloudera.com/one-click-install/$(lsb_release -cs)/cdh3-repository_1.0_all.deb
$ sudo dpkg -i cdh3-repository_1.0_all.deb
$ sudo apt-get update
$ sudo apt-get install hadoop-0.20-fuse

To install fuse-dfs on Ubuntu 12.04 and higher (CDH3 does not provide a repository for these releases, so the Maverick repository is used instead):

$ wget http://archive.cloudera.com/one-click-install/maverick/cdh3-repository_1.0_all.deb
$ sudo dpkg -i cdh3-repository_1.0_all.deb
$ sudo apt-get update
$ sudo apt-get install hadoop-0.20-fuse

Once fuse-dfs is installed, go ahead and mount HDFS using FUSE as follows.

$ sudo hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>
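For example, assuming your NameNode runs on a host named namenode with the default NameNode IPC port 8020, and you want to mount HDFS at /mnt/hdfs (the hostname, port, and mount point here are placeholders; substitute your own values):

$ sudo mkdir -p /mnt/hdfs
$ sudo hadoop-fuse-dfs dfs://namenode:8020 /mnt/hdfs

To unmount it later, run sudo umount /mnt/hdfs.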

Once HDFS has been mounted at <mount_point>, you can use most of the traditional filesystem operations (e.g., cp, rm, cat, mv, mkdir, rmdir, more, scp). However, random writes (such as those performed by rsync) and permission-related operations (such as chmod and chown) are not supported in FUSE-mounted HDFS.
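If you want HDFS mounted automatically at boot time, you can also add an entry to /etc/fstab using the fstab syntax documented for hadoop-fuse-dfs (again, the hostname, port, and mount point below are placeholder values):

hadoop-fuse-dfs#dfs://namenode:8020 /mnt/hdfs fuse allow_other,usetrash,rw 2 0

After adding the entry, you can verify it with sudo mount /mnt/hdfs.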
