Last updated on October 20, 2020 by Dan Nanni
A typical data file often has associated metadata which is descriptive information about the file, represented in the form of a set of name-value pairs. Common metadata include creator's name, tools used to generate the file, file creation/update date, location of creation, editing history, etc. EXIF (images), RDF (web resources), DOI (digital documents) are some of popular metadata standards.
While metadata has its own merits for data management purposes, it can actually affect your privacy adversely. For example, EXIF metadata in photo images can reveal personally identifiable information such as your camera model, GPS coordinate of shooting, your favorite photo editor software, etc. Metadata in documents and spreadsheets contain author/affiliation information and other editing history. Not to be paranoid, but metadata gathering tools such as metagoofil are often exploited during information gathering stage as part of penetration testing.
For those of you who want to strip any personalizing metadata from any shared data, there are ways to remove metadata from data files. You can use existing document or image editor software which typically have built-in metadata editing capability. In this tutorial, let me introduce a nice standalone metadata cleaner tool which is developed for a single goal: anonymize all metadata for your privacy.
MAT (Metadata Anonymisation Toolkit) is a dedicated metadata cleaner written in Python. It was developed under the umbrella of the Tor project, and comes standard on Tails, privacy-enhanced live OS. Compared to other tools such asexiftool
which can write to only a limited number of file types, MAT can eliminate metadata from all kinds of files: images (png, jpg), documents (odt, docx, pptx, xlsx, pdf), archives (tar, tar.bz2), audio (mp3, ogg, flac), etc.
On Debian-based systems (Ubuntu or Linux Mint), MAT comes packaged, so installation is straightforward:
$ sudo apt-get install mat
On Fedora, MAT does not come as a pre-built package, so you need to build it from the source. Here is how I built MAT on Fedora (with some limited success; see the bottom of the tutorial):
$ sudo yum install python-devel intltool python-pdfrw perl-Image-ExifTool python-mutagen $ sudo pip install hachoir-core hachoir-parser $ wget https://mat.boum.org/files/mat-0.5.tar.xz $ tar xf mat-0.5.tar.xz $ cd mat-0.5 $ python setup.py install
Once installed, MAT can be accessible via GUI as well as from the command line. To launch MAT's GUI, simply type:
$ mat-gui
Let's clean up a sample document file (e.g., private.odt
) which has the following metadata embedded.
To add the file to MAT for cleanup, click on Add
icon. Once the file is loaded, click on Check
icon to scan for any hidden metadata information.
Once any metadata is detected by MAT, State
will be marked as Dirty
. You can double click the file to see detected metadata.
To clean up metadata from the file, click on Clean
icon. MAT will automatically empty all private metadata fields from the file.
The cleaned up state is without any personally identifiable traces:
As mentioned before, another way to invoke MAT is from the command line, and for that, use mat
command.
To check for any sensitive metadata, first go to the directory where your files are located, and then run:
$ mat -c .
It will scan all files in the current directory and its sub directories, and report their state (clean or unclean).
You can check actual metadata detected by using -d
option:
$ mat -d <input_file>
If you don't supply any option with mat
command, the default action is to remove metadata from files. If you want to keep a backup of original files during cleanup, use -b
option. The following command cleans up all files, and stores original files as *.bak
files.
$ mat -b .
To see a list of all supported file types, run:
$ mat -l
Currently I have the following issue with a compiled version of MAT on Fedora. When I attempt to clean up archive/document files (e.g., *.gz, *.odt, *.docx) on Fedora, MAT fails with the following error. If you know how to fix this problem, let me know in the comment.
File "/usr/lib64/python2.7/zipfile.py", line 305, in __init__ raise ValueError('ZIP does not support timestamps before 1980') ValueError: ZIP does not support timestamps before 1980
MAT is a simple, yet extremely useful tool to prevent any inadvertent privacy leaks from metadata. Note that it is still your responsibility to anonymize file content, if necessary. All MAT does is to eliminate metadata associated with your files, but does nothing with the files themselves. In short, MAT can be a life saver as it can handle most common metadata removal, but you shouldn't rely solely on it to guarantee your privacy.
This website is made possible by minimal ads and your gracious donation via PayPal or credit card
Please note that this article is published by Xmodulo.com under a Creative Commons Attribution-ShareAlike 3.0 Unported License. If you would like to use the whole or any part of this article, you need to cite this web page at Xmodulo.com as the original source.
Xmodulo © 2021 ‒ About ‒ Write for Us ‒ Feed ‒ Powered by DigitalOcean