Why would you want to customize a 404 page?
Well, it’s all about misleading information. A hacker gets only a very limited chance to get acquainted with your web server. On the one hand, he needs to find out as much as possible: the more he knows, the more likely his attack will be successful. On the other hand, he has to let sleeping dogs lie. In other words: he must not alarm you.
One of the most important things to know is: What kind of web server do I have to deal with?
The first source to look into is an HTTP response header called Server. The information here may be very verbose. I don’t know why this header is part of the HTTP standard, but it is.
The Server response-header field contains information about the software used by the origin server to handle the request. The field can contain multiple product tokens (section 3.8) and comments identifying the server and any significant subproducts. The product tokens are listed in order of their significance for identifying the application. (RFC 2616)
This is an example server header:
Apache/1.3.28 (Unix) mod_ssl/2.8.15 OpenSSL/0.9.7c mod_perl/1.27 PHP/4.3
In this case, it’s a very outdated Apache, using an outdated SSL module, outdated Perl and outdated PHP. It’s easy to change this information using Citrix NetScaler rewrite policies (DELETE_HTTP_HEADER and INSERT_HTTP_HEADER).
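To illustrate how much a verbose Server header gives away, here is a minimal Python sketch that splits the example value into product tokens and comments. The function name parse_server_header and the regular expression are assumptions made for this illustration; it is not a complete parser for the grammar in RFC 2616.

```python
import re

# One match is either a parenthesised comment such as "(Unix)" or a product
# token such as "Apache/1.3.28" (the "/version" part is optional).
TOKEN_RE = re.compile(r"\(([^)]*)\)|([^\s/()]+)(?:/(\S+))?")


def parse_server_header(value):
    """Split a Server header value into (product, version) pairs and comments."""
    products, comments = [], []
    for comment, name, version in TOKEN_RE.findall(value):
        if name:
            products.append((name, version or None))
        else:
            comments.append(comment)
    return products, comments


header = "Apache/1.3.28 (Unix) mod_ssl/2.8.15 OpenSSL/0.9.7c mod_perl/1.27 PHP/4.3"
products, comments = parse_server_header(header)

for name, version in products:
    print(f"{name:10} {version}")
print("comments:", comments)
```

Every pair printed here is one more product whose known vulnerabilities can be looked up, which is why removing or replacing the header is worthwhile.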
But hackers are not that stupid. They will probably verify this information. My personal next try would be to request a page that does not exist, so that the server answers with a 404 Not Found. Being careful, I would take an existing URL and introduce a minor typo, for example https://192.168.200.109/default.html instead of https://192.168.200.109/default.htm. You would probably not be alarmed if you saw a request like that in your logs.
The next thing the hacker sees is the 404 Not Found page. Unless you change it, it is specific to your server software. A 404 page that looks like the one from IIS 6 surely comes from an IIS 6, no matter what the Server header says.
More reasons to change the 404 page
Of course, there are more reasons to change the 404 page: customized 404 pages can be funny, they may help people find the content they were looking for, and so on.
Why not change your web server?
This would be possible. However, you would need to change all your load-balanced web servers. There is another reason: responder policies. I never return a “401 Unauthorized” or a “403 Forbidden”; I would rather return a “404 Not Found”. If I were a hacker, I would be very excited to see a 401 or 403!
I would think: here it is, but someone protects it from being accessed. But how could I find out what’s going on, if a Citrix NetScaler uses exactly the same 404 page as the original web server? I would probably think the file is not there.
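The idea of never revealing a 401 or 403 can be sketched in a few lines of Python. This only models the decision logic, not NetScaler responder syntax; the function name visible_status and the protected path prefixes are hypothetical and exist only for this example.

```python
# Hypothetical paths that should look as if they did not exist at all.
PROTECTED_PREFIXES = ("/admin/", "/internal/")

# Status codes that tell an attacker "something is here, but it is protected".
REVEALING_STATUSES = {401, 403}


def visible_status(path, origin_status):
    """Return the status code the client is allowed to see."""
    if path.startswith(PROTECTED_PREFIXES):
        return 404
    if origin_status in REVEALING_STATUSES:
        return 404
    return origin_status


print(visible_status("/admin/login.htm", 200))
print(visible_status("/reports/q3.htm", 403))
print(visible_status("/default.htm", 200))
```

The masking only convinces anyone if the 404 page sent along with it looks exactly like the one the web server itself would send, which is the point of the rest of this article.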
My solution
My first attempt was a simple rewrite policy that replaced the response with something like “HTTP/1.1 404 Not Found\r\n\r\n<html><head><title>404 File not found</title></head><body><h1><font color=\"#802020\">404 File not found!</font></h1><p><font color=\"#802020\">The file you requested is not on this server.</font></p></body></html>”.
The length of the text is limited, so this is not a good solution. Besides, I would rather place the file “somewhere” on my web server, so it’s easy to change.
I spent some time thinking about what to do and made up my mind to use the HTTP callout feature. It was my first ever attempt to use HTTP callout, and I’ll describe how it works.
NetScaler’s HTTP callout feature
HTTP callout is intended to be used in policies to check something, e.g. an IP address, against a web-based service. For example, I could send the client’s IP address (CLIENT.IP.SRC) to a web server holding an IP blacklist. This web server would then respond with something indicating good or bad.
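The blacklist use case can be modelled in Python to show the division of labour between the policy and the external service. This is a conceptual sketch only: blacklist_service stands in for the remote web service, should_block for the policy that evaluates its answer, and the addresses are documentation examples, not real data.

```python
def blacklist_service(ip, blacklist):
    """Stand-in for the remote web service: answers "bad" or "good"."""
    return "bad" if ip in blacklist else "good"


def should_block(client_ip, callout):
    """Policy-side check: block the request when the callout answers "bad"."""
    return callout(client_ip) == "bad"


# Example addresses taken from the documentation range 192.0.2.0/24.
BLACKLIST = {"192.0.2.13", "192.0.2.66"}


def callout(ip):
    # In a real setup this would be an HTTP request to the service.
    return blacklist_service(ip, BLACKLIST)


print(should_block("192.0.2.13", callout))
print(should_block("192.0.2.20", callout))
```

The policy itself knows nothing about the list; it only interprets the short answer that comes back, which is what makes the mechanism reusable for other purposes.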
I do something completely different: I will retrieve the content of the 404 page from a web server. To do so I have to navigate to App Expert -> HTTP Callouts.
Like any policy, it needs a name. I do my callout to a vServer, so I have to specify the vServer here. My request will be attribute-based, which means I can send a regular HTTP request; mine is an HTTP GET. My web server uses several hostnames for various virtual pages, so I have to specify a proper host expression. This makes sure we retrieve the file from the right source. The URL Stem Expression is the URL we want to retrieve.
We scroll down to the bottom, select the return type TEXT, and set the expression to HTTP.RES.BODY(65538). The number is the maximum number of bytes to retrieve.
So my callout will connect to a NetScaler vServer called cs_vsrv_norz.at to retrieve a file called /notfound.htm, setting the Host header to norz.at (i.e. https://192.168.200.109/notfound.htm). It will then return the entire body of this file, containing links to style definitions, pictures and so on.
Command-line version:
add policy httpCallout callout_retrieve_404 -vServer cs_vsrv_norz.at -returnType TEXT -hostExpr "\"norz.at\"" -urlStemExpr "\"/notfound.htm\"" -scheme http -resultExpr "HTTP.RES.BODY(65538)"
set policy httpCallout callout_retrieve_404 -vServer cs_vsrv_norz.at -returnType TEXT -hostExpr "\"norz.at\"" -urlStemExpr "\"/notfound.htm\"" -scheme http -resultExpr "HTTP.RES.BODY(65538)"
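To make the attributes more tangible, the following Python sketch assembles roughly the kind of request such a callout describes and trims a response body the way the result expression does. It is an approximation under assumptions: the request NetScaler actually sends may carry additional headers, and the helper names build_callout_request and extract_result are invented for this example.

```python
def build_callout_request(host, url_stem):
    """Assemble a minimal HTTP/1.1 GET request as raw text."""
    return f"GET {url_stem} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def extract_result(response_body, max_bytes=65538):
    """Mimic HTTP.RES.BODY(n): keep at most the first n bytes of the body."""
    return response_body[:max_bytes]


# Values taken from the host expression and URL stem expression above.
request = build_callout_request("norz.at", "/notfound.htm")
print(request)

# A body larger than the limit is cut off; a small one passes unchanged.
print(len(extract_result(b"x" * 70000)))
print(extract_result(b"<html>404</html>"))
```

The Host header is what selects the right site on a web server that answers to several hostnames, which is why the host expression matters even though the callout addresses a vServer.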
The rewrite policy
The rewrite policy should be a very simple thing:
The NetScaler rewrite action using an HTTP callout
add rewrite action rw_act_404 replace "HTTP.RES.BODY(65536)" "SYS.HTTP_CALLOUT(callout_retrieve_404)"
It’s a replace action. The expression that chooses the target location covers the entire response body, so it is HTTP.RES.BODY(65536). To be more precise, it’s the first 65536 bytes of the body (a 404 page is typically far smaller). The Expression is the text that replaces the former body. It is the HTTP callout request, in my case SYS.HTTP_CALLOUT(callout_retrieve_404).
The NetScaler rewrite policy
add rewrite policy rw_pol_404 "HTTP.RES.STATUS.EQ(404)" rw_act_404
This policy is applied whenever the HTTP response status is 404 (HTTP.RES.STATUS.EQ(404)). I then bound this policy to my web server. That’s it. It was pretty easy.
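Putting the pieces together, the behaviour of the policy and its action can be modelled in Python as follows. This is a sketch of the logic only, assuming a simplified Response type and a fetch_custom_page function that plays the role of the HTTP callout; neither exists on the NetScaler under these names.

```python
from dataclasses import dataclass

# Size of the replace target, matching HTTP.RES.BODY(65536).
TARGET_BYTES = 65536


@dataclass
class Response:
    status: int
    body: bytes


def apply_404_rewrite(response, fetch_custom_page):
    """Replace the first TARGET_BYTES of a 404 body with the callout result."""
    if response.status != 404:
        # The policy rule HTTP.RES.STATUS.EQ(404) does not match.
        return response
    new_body = fetch_custom_page() + response.body[TARGET_BYTES:]
    return Response(status=404, body=new_body)


def fetch_custom_page():
    # Stand-in for SYS.HTTP_CALLOUT(callout_retrieve_404).
    return b"<html><body><h1>404 File not found!</h1></body></html>"


original = Response(404, b"<html>Server-specific error page</html>")
print(apply_404_rewrite(original, fetch_custom_page).body)

untouched = Response(200, b"<html>Welcome</html>")
print(apply_404_rewrite(untouched, fetch_custom_page).body)
```

Only the body changes; the status code stays 404, so clients and crawlers still see a correct "not found" answer.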