A typical data file carries "metadata": descriptive information about the file, represented as a set of name-value pairs. Common metadata includes the creator's name, the tools used to generate the file, creation and modification dates, the location of creation, and editing history. EXIF (images), RDF (web resources), and DOI (digital documents) are some of the popular metadata standards.
While metadata is useful for data management, it can also harm your privacy. For example, EXIF metadata in photos can reveal personally identifiable information such as your camera model, the GPS coordinates where the photo was taken, and the photo editing software you use. Metadata in documents and spreadsheets contains author and affiliation information as well as editing history. Not to be paranoid, but metadata gathering tools such as metagoofil are often used during the information gathering stage of penetration testing.
If you want to strip personally identifying metadata from data you share, there are several ways to do it. You can use existing document or image editors, which typically have built-in metadata editing capability. In this tutorial, let me introduce MAT (Metadata Anonymisation Toolkit), a nice standalone metadata cleaner developed for a single goal: anonymizing all metadata for your privacy.
Compared to tools such as exiftool, which can write to only a limited number of file types, MAT can remove metadata from a wide range of files: images (png, jpg), documents (odt, docx, pptx, xlsx, pdf), archives (tar, tar.bz2), audio (mp3, ogg, flac), and more.
Install MAT on Linux
On Debian-based systems (such as Ubuntu or Linux Mint), MAT is available as a package, so installation is straightforward:
$ sudo apt-get install mat
On Fedora, MAT is not available as a pre-built package, so you need to build it from source. Here is how I built MAT on Fedora (with limited success; see the end of the tutorial):
$ sudo pip install hachoir-core hachoir-parser
$ wget https://mat.boum.org/files/mat-0.5.tar.xz
$ tar xf mat-0.5.tar.xz
$ cd mat-0.5
$ python setup.py install
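After the build finishes, it can be worth confirming that the install step actually put the launchers on your PATH. The following is a minimal sketch, assuming the two commands are installed under the names mat and mat-gui; check_mat_installed is a hypothetical helper name, not part of MAT.

```shell
# Hypothetical helper: report whether the MAT launchers are on PATH.
# Assumption: the install step provides commands named "mat" and "mat-gui".
check_mat_installed() {
    rc=0
    for cmd in mat mat-gui; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: found"
        else
            echo "$cmd: missing"
            rc=1
        fi
    done
    return "$rc"
}
```

If either line says "missing", the install location may simply not be on your PATH, so check where setup.py placed the scripts before rebuilding.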
Anonymize Metadata with MAT-GUI
Once installed, MAT is accessible through a GUI as well as from the command line. To launch MAT's GUI, simply type:
$ mat-gui
Let's clean up a sample document file (e.g., private.odt) that has metadata embedded in it.
To add the file to MAT for cleanup, click on the "Add" icon. Once the file is loaded, click on the "Check" icon to scan for any hidden metadata.
If MAT detects any metadata, the file's "State" is marked as "Dirty". You can double-click the file to see the detected metadata.
To clean up metadata from the file, click on the "Clean" icon. MAT will automatically empty all private metadata fields in the file.
Once cleaned, the file no longer carries any personally identifiable traces.
Anonymize Metadata from the Command Line
As mentioned earlier, MAT can also be invoked from the command line, using the mat command.
To check for any sensitive metadata, first go to the directory where your files are located, and then run:
$ mat -c .
This scans all files in the current directory and its subdirectories, and reports the state of each (clean or unclean).
To see the metadata actually detected in a file, use the '-d' option:
$ mat -d private.odt
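To apply '-d' to many files at once, a small loop can help. The following is a minimal sketch, assuming only that mat with '-d' displays metadata for one file without changing it; show_metadata is a hypothetical helper name, and file names containing newlines are not handled.

```shell
# Hypothetical helper: show detected metadata for every file under a
# directory whose name matches a glob pattern.
# Assumption: "mat -d FILE" only displays metadata and modifies nothing.
show_metadata() {
    dir=$1
    pattern=$2
    find "$dir" -type f -name "$pattern" | while IFS= read -r file; do
        echo "== $file"
        mat -d "$file"
    done
}
```

For example, show_metadata . '*.odt' would walk the current directory and print a header line followed by MAT's report for each ODT file found.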
If you run the mat command without any option, the default action is to remove metadata from the files. If you want to keep a backup of the original files during cleanup, use the '-b' option. The following command cleans up all files and keeps the originals as '*.bak' files:
$ mat -b .
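If you would rather not let MAT touch your originals at all, an alternative to '-b' is to clean a copy and share that instead. The following is a minimal sketch that relies only on the default behaviour described above (mat with no options removes metadata); clean_copy is a hypothetical helper name, and passing a directory path to mat is assumed to work the same way as passing the current directory.

```shell
# Hypothetical helper: copy a directory and clean only the copy,
# leaving the original files untouched.
# Assumption: "mat PATH" with no options removes metadata under PATH.
clean_copy() {
    src=${1%/}                 # drop a trailing slash, if any
    dest="${src}.clean"
    if [ -e "$dest" ]; then
        echo "refusing to overwrite existing $dest" >&2
        return 1
    fi
    cp -R "$src" "$dest" || return 1
    mat "$dest" || return 1
    echo "$dest"               # print where the cleaned copy lives
}
```

Call it with a named directory, for example clean_copy ~/docs, rather than with '.', since a directory cannot be copied into itself.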
To see a list of all supported file types, run:
$ mat -l
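For a quick pre-check in a script, you can also guess from the file name whether a file is of a type MAT is likely to handle. The following is a minimal sketch; looks_supported is a hypothetical helper name, and the extension list is simply the one given earlier in this post, not MAT's own list, so the list MAT prints remains the real reference.

```shell
# Hypothetical helper: succeed if the file name ends in one of the
# extensions mentioned in this post (case-insensitive).
# Assumption: this list is illustrative and may not match your MAT version.
looks_supported() {
    case $(printf '%s' "$1" | tr '[:upper:]' '[:lower:]') in
        *.png|*.jpg|*.odt|*.docx|*.pptx|*.xlsx|*.pdf) return 0 ;;
        *.tar|*.tar.bz2|*.mp3|*.ogg|*.flac) return 0 ;;
        *) return 1 ;;
    esac
}
```

A loop such as for f in *; do looks_supported "$f" || echo "not in the list: $f"; done would then point out files that deserve a manual look.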
I currently have the following issue with the version of MAT that I built from source on Fedora. When I try to clean up archive or document files (e.g., *.gz, *.odt, *.docx), MAT fails with the error below. If you know how to fix this problem, let me know in the comments.
File "/usr/lib64/python2.7/zipfile.py", line 305, in __init__
    raise ValueError('ZIP does not support timestamps before 1980')
ValueError: ZIP does not support timestamps before 1980
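The message itself comes from Python's zipfile module, which refuses to create an entry dated before 1980 because the ZIP format cannot represent earlier dates. One situation that produces it is adding a file whose modification time predates 1980; whether that is what happens inside MAT here is an assumption, not something confirmed in this post. The following is a minimal sketch for spotting such files, assuming GNU find (for the -newermt test); find_pre_1980 is a hypothetical helper name.

```shell
# Hypothetical helper: list regular files whose modification time is
# earlier than 1980-01-01 (local time).
# Assumption: GNU find, since -newermt is not part of POSIX find.
find_pre_1980() {
    find "${1:-.}" -type f ! -newermt '1980-01-01 00:00:00'
}
```

If this prints anything, updating those files' modification time with touch before cleaning would avoid this particular trigger, at the cost of changing the timestamp.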
MAT is a simple yet extremely useful tool for preventing inadvertent privacy leaks through metadata. Note that it is still your responsibility to anonymize the file content itself, if necessary: MAT only removes the metadata associated with your files and does nothing to their content. In short, MAT can be a lifesaver because it handles the most common metadata removal tasks, but you shouldn't rely on it alone to guarantee your privacy.
Author: Dan Nanni