In a previous tutorial, we have seen the method of creating a gateway using iptables. This tutorial will focus on turning the gateway into a transparent proxy server. A proxy is called "transparent" when clients are not aware that their requests are processed through the proxy.
There are several benefits of using a transparent proxy. First of all, for end users, a transparent proxy can enhance their web browsing performance by caching frequently accessed web content, while introducing minimal configuration overhead for them. For administrators, it can be used to enforce various administrative policies such as content/URL/IP filtering, rate limiting, etc.
A proxy server acts as an intermediary between a client and a destination server. The client sends requests to the proxy server which then evaluates the requests and takes necessary actions. In this tutorial, we will be setting up a web proxy server using Squid, which is a robust, customizable and stable proxy server. Personally, I had administered a Squid server with 400+ client workstations for about a year. Although I had to restart the service about once a month in average, CPU and storage utilization, throughput and client response time were all great.
We will be configuring Squid for the following topology. The CentOS/RHEL box has one NIC (eth0) connected to the private LAN, and the other one (eth1) connected to the Internet.
To set up a transparent proxy with Squid, we start by adding necessary iptables rules. These rules should help you get started, but please make sure that they do not conflict with any of the existing configuration.
# iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j REDIRECT --to-port 3128
The first rule will cause all outbound packets from eth1 (WAN interface) to have the source IP address of eth1 (i.e., enable NAT). The second rule will redirect all incoming HTTP packets (destined to TCP 80) from eth0 (LAN interface) to Squid listening ort (TCP 3128), instead of forwarding it out to WAN interface right away.
We start Squid installation by using yum.
Now we will modify Squid configuration to turn it into a transparent proxy. We define our LAN subnet (e.g., 10.10.10.0/24) as a valid client network. Any traffic not originating from the LAN subnet will be denied access.
visible_hostname proxy.example.tst http_port 3128 transparent ## Define our network ## acl our_network src 10.10.10.0/24 ## make sure that our network is allowed ## http_access allow our_network ## finally deny everything else ## http_access deny all
Now we start Squid service and make sure it is added to startup.
# chkconfig squid on
Now that Squid is up and running, we can test its functionality by monitoring Squid log. Visit any URL from a computer connected to the LAN, and you should see something like the following in the log.
1402987348.816 1048 10.10.10.10 TCP_MISS/302 752 GET http://www.google.com/ - DIRECT/18.104.22.168 text/html 1402987349.416 445 10.10.10.10 TCP_MISS/302 762 GET http://www.google.com.bd/? - DIRECT/22.214.171.124 text/html
According to the log file, the machine with IP 10.10.10.10 tried to access google.com and Squid processed the request.
The most basic form of Squid proxy server is now ready. In the rest of the tutorial, we will be tuning some parameters of Squid to control outbound traffic. Note that this is for demonstration only. Actual policies should be customized to meet your requirements.
Before starting the configuration, we clarify a few points.
Squid Configuration Parsing
While reading a configuration file, Squid parses the file in a top-down fashion. Rules are parsed top-down until a match is found. Whenever a match is found, that rule is executed, and any other rule below it is ignored. So, the best practice for adding filtering rules is to specify rules in the following sequence.
explicit allow explicit deny allow entire LAN deny all
Squid restart vs. Squid reconfigure
Whenever Squid configuration is modified, Squid service needs to be restarted. Depending on the number of active connections, restarting the service may take a a while, sometimes several minutes. LAN users will not be able to access the Internet during this time. To avoid such service interruption, we can use the following command instead of "service squid restart".
This command will allow Squid to run with updated parameters without restarting itself.
Filtering LAN Hosts by IP Address
In this demonstration, we want to set up Squid such that hosts with IP address 10.10.10.24 and 10.10.10.25 are prevented from accessing the Internet. We create a text file 'denied-ip-file' containing the IP addresses of all denied hosts, and add that file in Squid configuration.
## first we create an ACL to isolate the denied IPs ## acl denied-ip-list src "/etc/squid/denied-ip-file" ## then we apply the ACL ## http_access deny denied-ip-list ## explicit deny ## http_access allow our_network ## allow LAN ## http_access deny all ## deny all ##
Now we need to restart Squid service. Squid will no longer honor requests from these IP addresses. If we check the squid log, we will find 'TCP_DENIED' for requests originated from these hosts.
Filtering Websites in a Blacklist
This method will work for HTTP only. Assuming that we want to block badsite.com and denysite.com, we add the addresses to a file and add the reference to squid.conf.
## ACL definition ## acl badsite-list url_regex "/etc/squid/badsite-file" ## ACL application ## http_access deny badsite-list http_access deny denied-ip-list ## previously set, no effect here ## http_access allow our_network http_access deny all
Please note that we have used the ACL type 'url_regex', which will match the words 'badsite' and 'denysite' in requested URLs. That is, any request whose URL contains 'badsite' or 'denysite' (e.g., badsite.org, newdenysite.com, otherbadsite.net) will be blocked.
Combining Multiple ACLs
We will create an access list to block clients with IP addresses 10.10.10.200 and 10.10.10.201 from accessing custom-block-site.com. Any other clients would be able to access the site. To do this, we will create an access list to isolate the IP addresses first, and then create another access list to isolate the required website. Finally, we will use both access lists simultaneously to meet the requirement.
acl custom-denied-list src "/etc/squid/custom-denied-list-file" acl custom-block-site url_regex "/etc/squid/custom-block-website-file" ## ACL application ## http_access deny custom-denied-list custom-block-site http_access deny badsite-list ## previously set, no effect here ## http_access deny denied-ip-list ## previously set, no effect here ## http_access allow our_network http_access deny all
The blocked hosts should not be able to access the mentioned site now. The log file /var/log/squid/access.log should contain 'TCP_DENIED' for corresponding requests.
Setting Maximum Download Size
Squid can be used to control the maximum downloadable file size. We want to restrict maximum download size to 50 MB for hosts 10.10.10.200 and 10.10.10.201. We have already created the ACL 'custom-denied-list' previously to isolate the traffic from these sources. Now we will use the same access list to restrict download size.
reply_body_max_size 50 MB custom-denied-list
Setting up Squid Caching Hierarchy
Squid supports caching by storing frequently accessed files in the local storage. Imagine 100 users within your LAN are accessing google.com. Without caching, the logo or doodle for the page needs to be fetched individually for each request. Squid can store the logo or doodle in its cache to serve them from its cache. This results in improved user perceived performance as well as reduced bandwidth usage. A win-win if you will.
To enable caching, we modify the configuration file squid.conf.
cache_dir ufs /var/spool/squid 100 16 256
The numbers 100, 16 and 256 have the following meaning.
- 100 MB storage is allocated for Squid cache. You may increase the allocated space if you want.
- 16 directories, each containing 256 subdirectories will be used to store cache files. This parameter should not be modified.
We can verify whether Squid cache is enabled from the log file /var/log/squid/access.log. For successful cache hits, we should see entries with 'TCP_HIT'.
To sum up, Squid is a powerful, industry standard web proxy server that is used widely by system admins worldwide. Squid provides easy access control that can be used to administer traffic originating from the LAN. It can be deployed in small companies as well as large enterprise networks. This tutorial covered only a subset of all Squid features. Refer to the official documentation for a complete feature list.
Hope this helps.
Subscribe to Xmodulo
Do you want to receive Linux FAQs, detailed tutorials and tips published at Xmodulo? Enter your email address below, and we will deliver our Linux posts straight to your email box, for free. Delivery powered by Google Feedburner.