How to mount HDFS using FUSE

Hadoop Distributed File System (HDFS) is a distributed, scalable filesystem developed as the back-end storage for data-intensive Hadoop applications. As such, HDFS is designed to handle very large files with a "write-once-read-many" access model. Since HDFS is not a full-fledged POSIX-compliant filesystem, it cannot be mounted directly by the operating system, and file access in HDFS is typically done via HDFS shell commands.

However, one can leverage FUSE to write a userland application that exposes HDFS via a traditional filesystem interface. fuse-dfs is one such FUSE-based application, which allows you to mount HDFS as if it were a traditional Linux filesystem. If you would like to mount HDFS on Linux, you can install fuse-dfs along with FUSE as follows.

I assume that you have an HDFS cluster already up and running, and that you know the hostname and port of the HDFS NameNode to connect to. I also assume that you would like to mount HDFS on a separate Linux host; the instructions below cover both RPM-based (CentOS/RHEL) and Debian-based (Debian/Ubuntu) distributions.

On the host where you would like to mount HDFS, do the following.

First, install the Java JDK. Note that CDH does not support OpenJDK, so use the Sun/Oracle JDK.
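
For example, on Ubuntu releases of that era the Sun JDK 6 was available from the Canonical partner repository. The repository line, package name, and install path below are assumptions that vary by release, so adjust them to your system:

$ sudo add-apt-repository "deb http://archive.canonical.com/ubuntu $(lsb_release -cs) partner"
$ sudo apt-get update
$ sudo apt-get install sun-java6-jdk
$ export JAVA_HOME=/usr/lib/jvm/java-6-sun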

Next, install fuse-dfs and all necessary dependencies as follows.

To install fuse-dfs on CentOS or RHEL 6:

$ wget http://archive.cloudera.com/redhat/6/x86_64/cdh/cdh3-repository-1.0-1.noarch.rpm
$ sudo yum --nogpgcheck localinstall cdh3-repository-1.0-1.noarch.rpm
$ sudo rpm --import http://archive.cloudera.com/redhat/6/x86_64/cdh/RPM-GPG-KEY-cloudera
$ sudo yum install hadoop-0.20-fuse
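
To verify the installation, you can list the files shipped by the package and confirm that the fuse_dfs binary is present:

$ rpm -ql hadoop-0.20-fuse | grep fuse_dfs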

To install fuse-dfs on Debian or Ubuntu 10.10 and earlier:

$ wget http://archive.cloudera.com/one-click-install/$(lsb_release -cs)/cdh3-repository_1.0_all.deb
$ sudo dpkg -i cdh3-repository_1.0_all.deb
$ sudo apt-get update
$ sudo apt-get install hadoop-0.20-fuse

To install fuse-dfs on Ubuntu 12.04 and higher (the CDH3 repository does not provide packages for releases newer than Maverick, so the Maverick package is reused):

$ wget http://archive.cloudera.com/one-click-install/maverick/cdh3-repository_1.0_all.deb
$ sudo dpkg -i cdh3-repository_1.0_all.deb
$ sudo apt-get update
$ sudo apt-get install hadoop-0.20-fuse

Once fuse-dfs is installed, create a mount point directory, and then mount HDFS using FUSE as follows.

$ sudo hadoop-fuse-dfs dfs://<name_node_hostname>:<namenode_port> <mount_point>
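
For example, assuming that the NameNode is reachable at namenode.example.com on port 8020 (the default NameNode RPC port in CDH; both values here are placeholders for your own setup):

$ sudo mkdir -p /mnt/hdfs
$ sudo hadoop-fuse-dfs dfs://namenode.example.com:8020 /mnt/hdfs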

Once HDFS has been mounted at <mount_point>, you can use most of the traditional filesystem operations (e.g., cp, rm, cat, mv, mkdir, rmdir, more, scp). However, random write operations (such as those performed by rsync), as well as permission-related operations such as chmod and chown, are not supported in FUSE-mounted HDFS.
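
To unmount HDFS, use umount as root (or fusermount -u as a regular user):

$ sudo umount /mnt/hdfs

If you want the mount to persist across reboots, Cloudera's packaging can also be driven from /etc/fstab. The entry below is a sketch based on CDH's documented fstab syntax, reusing the placeholder hostname and port from above:

hadoop-fuse-dfs#dfs://namenode.example.com:8020 /mnt/hdfs fuse allow_other,usetrash,rw 2 0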



7 thoughts on "How to mount HDFS using FUSE"

  1. Hello,

    Thanks for the tutorial.

    Is there any way to mount a subdirectory of the Hadoop filesystem?

    Something like this:

    hadoop-fuse-dfs dfs://hadoop2-m1:8020/backups/ /fuse/

    I am getting the "/" directory, not /backups.

  2. I want to mount HDFS as a local filesystem using FUSE, but I don't know how to install fuse-dfs.
    I am using Ubuntu 12.04.
    When I run: sudo apt-get install hadoop-0.20-fuse
    I get the following error:

    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    E: Unable to locate package hadoop-0.20-fuse
    E: Couldn't find any package by regex 'hadoop-0.20-fuse'

  3. Hello, I am using Ubuntu 12.04.
    My fuse-dfs is installed, but when I mount HDFS using FUSE as follows:

    sudo hadoop-fuse-dfs dfs://master:9000 /home/hl/a

    I get this:

    /usr/lib/hadoop-0.20/bin/fuse_dfs: /usr/lib/jvm/default-java/jre/lib/i386/jamvm/libjvm.so: no version information available (required by /usr/lib/libhdfs.so.0)
    INFO fuse_options.c:165 Adding FUSE arg /home/hl/a

    And when I open /home/hl/a, I get:
    Error: Error when getting information for file '/home/hl/a': Transport endpoint is not connected
    Please select another viewer and try again.

    And I cannot unmount it. I get:
    umount: /home/hl/a is not in the fstab (and you are not root)

    What should I do?

    • CDH doesn't support OpenJDK, so make sure to install the Oracle JDK. Also, set LD_LIBRARY_PATH properly so that fuse_dfs can find the correct version of libjvm.so.
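
      A minimal sketch, assuming the Oracle JDK is installed under /usr/lib/jvm/java-6-oracle (the path is an assumption; adjust it to your install):

      $ export JAVA_HOME=/usr/lib/jvm/java-6-oracle
      $ export LD_LIBRARY_PATH=$JAVA_HOME/jre/lib/amd64/server:$LD_LIBRARY_PATH
      $ sudo -E hadoop-fuse-dfs dfs://master:9000 /home/hl/a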

  4. I did everything according to the tutorial.
    When I ran hadoop-fuse-dfs dfs://node01:9000 /root/hadoop,
    I got the message:
    INFO fuse_options.c:165 Adding FUSE arg /root/hadoop

    And when I try to 'ls' /root, I get the error:
    ls: cannot access hadoop: Input/output error

    What should I do?

    • I came across this error too (": Input/output error"); it seems to be a problem with the permissions of the mount point. This post is too simple and outdated to make it work.
