How to start a Citrix ADC / NetScaler WAF Project, Part 4: Start URLs

23 January 2020

This is the fourth part of this blog series.

Click here to see how to start your WAF project

StartURLs are a powerful tool to protect a web server. Probably, creating StartURLs will be the first thing you need to do. There are two ways to deal with it: Learning or doing.

Learning

Learning does not mean that you learn; instead, Citrix ADC / NetScaler learns about the application. There is nothing to do but enable learning for StartURLs in the profile, and over time the Citrix ADC / NetScaler WAF learns all the URLs the application has. Absolutely great! If you trust Citrix Sales, creating a good and safe WAF is the easiest thing in the world: the learning feature does the job for you. So let’s take a closer look at this marvel.

How learning works

Basically, learning is based on logging. You set up a WAF, leave all settings untouched, turn on learning, turn off blocking, and the WAF logs all violations into a special database. You may then allow all of these as rules. It’s easy, it’s reliable, and it will lead to thousands of rules if your application is a bit more than the norz.at website (it currently consists of just 56 objects in 4 folders, and most of the files are images).

It’s crucial that only trustworthy users use the application while learning is enabled. I strongly advise using “trusted learning clients” if your website is already in production and receives potentially malicious traffic. You never know who is using your application right now!

Open Learning within the profile and click “Trusted Learning Clients”

The “learning clients” are defined by IPs or IP ranges. So it may be a single IP address like 192.168.10.13 or a range of IPs (e.g. your internal network) like 192.168.10.0/23. Don’t forget to enable the IP. Citrix ADC / NetScaler WAF will fall back to allowing all IPs if no trusted learning client IPs are enabled.

Attention!

The learning database is limited to 20 MB in size, which is reached after approximately 2,000 learned rules or relaxations are generated per security check for which learning is enabled.

So you can’t rely 100% on learning. Always make sure learning is enabled only for the subsystems you’re currently working on! Learning will, out of the blue, stop doing its job once the database has reached its limit. Go to the WAF settings and clear the logs to continue learning.

So let’s start learning

First of all, I turn on learning (and at the same time turn off blocking, as we want to be able to access all items)

Next, someone has to surf through ALL URLs on your website. This is easy for my own website, but it turns out to be almost impossible on a professional website. StartURL will only allow access to well-known and trusted URLs.

URLs you did not click won’t get whitelisted. This will result in false positives later on. The same happens if you learn from live traffic: some applications have “end of quarter” routines, and these will very likely not get clicked while learning is enabled. A nice surprise out of the blue for your help desk.

I did this for my own website and got a bunch of URLs. The small numbers next to the URLs are the numbers of hits. It’s not that handy to view, so I can use the visualizer instead:

You can see, there are 3 domain names: https://192.168.200.109, https://192.168.200.109 and https://elisabeth.norz.at. In addition a dimwit tried to find an admin interface, using one of my IPs (there is nothing like an admin interface on my server, this website does not use any framework, the owner knows how to use vi).

The visualizer shows nodes, and branches from these nodes. I love the learning feature in Citrix ADC / NetScaler for this view, as it gives me a good overview of the structure of this website.

The next task would be to deploy rules. The easiest way: select the whole list and click deploy. But the visualizer can do more for us. I select a node, let’s say https://192.168.200.109/ip/images. It contains the 2 images for the IP address calculator I created for my daughter: pc.jpg and router.jpg.

This will collapse the list and show a regular expression (regex) in the status bar.

https:\/\/www\.norz\.at\/ip\/images\/[\.cegjoprtu]{6,10}

It means: the string begins with https://192.168.200.109/ip/images. After that come 6 to 10 characters, each taken from the following set: .cegjoprtu

To be honest, I don’t like this regex very much. It would allow many possible file names. I am not very good at Scrabble, but here are some examples:

curt.jpg
geo.jpeg
port.tcp
ceo
j.e.p.g

and many more.
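The list can also be checked mechanically. The following is only a minimal sketch, not the ADC's own matching: it assumes GNU grep, transcribes the learned expression to an extended regular expression (slashes left unescaped, which is equivalent) and adds ^ and $ so that the whole URL has to match. The helper name allowed_by_learned and the extra candidate index.php are made up for illustration.

```shell
# Learned pattern as an ERE. The character set is the same as in the rule;
# the anchors are an assumption about how the rule is applied.
learned='^https://www\.norz\.at/ip/images/[cegjoprtu.]{6,10}$'

# Return 0 if the file name $1, placed in /ip/images/, is accepted.
allowed_by_learned() {
  printf '%s\n' "https://www.norz.at/ip/images/$1" | grep -Eq -- "$learned"
}

for name in pc.jpg router.jpg curt.jpg geo.jpeg port.tcp j.e.p.g ceo index.php; do
  if allowed_by_learned "$name"; then
    echo "allowed: $name"
  else
    echo "blocked: $name"
  fi
done
```

Under these assumptions every name of 6 to 10 characters built from the learned set is allowed, including the two real images. The candidate ceo is rejected, because it is shorter than the six-character minimum, and index.php is rejected, because it contains letters that are not in the set.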

It’s not specific enough and, at the same time, it’s highly complex. If there were also an h in the list, it would even match php. So someone could execute php files in this subdirectory; this might be a security risk. In addition, this rule could cause false positives in case our testers forgot to click some of the links, or the logic of the program didn’t use all the images available.

So I will rather not click deploy, but rewrite this regex, create a better one. My solution would be:

^https?://(www\.)?norz\.at/ip/images/\w+\.jpg$, or, even more specific, ^https?://(www\.)?norz\.at/ip/images/\w{2,6}\.jpg$

^	beginning of the string
()	a block. My website uses 2 hostnames, norz.at and www.norz.at, so www. is optional
?	the preceding character (or, in this case, block) is optional
\.	as a period actually means “any character”, I escape the period, so it just means a period
\w	any word character (a-z, A-Z, 0-9 and _)
+	one or more of the preceding character or block; * would mean 0 or more
{}	the length, in this case 2 to 6 repetitions of word characters
$	end of the string
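To see what the stricter expression lets through, it can be tried against a few URLs. This is a sketch under assumptions: GNU grep's extended regular expressions stand in for the ADC's regex engine, the pattern is written with a colon after the scheme and an escaped dot in norz\.at, and the helper name allowed_by_strict and the test URLs are made up for illustration.

```shell
# Stricter pattern: http or https, optional "www.", 2 to 6 word characters, ".jpg".
strict='^https?://(www\.)?norz\.at/ip/images/\w{2,6}\.jpg$'

# Return 0 if the full URL $1 matches the stricter pattern.
allowed_by_strict() {
  printf '%s\n' "$1" | grep -Eq -- "$strict"
}

while IFS= read -r url; do
  if allowed_by_strict "$url"; then
    echo "allowed: $url"
  else
    echo "blocked: $url"
  fi
done <<'EOF'
https://www.norz.at/ip/images/pc.jpg
http://norz.at/ip/images/router.jpg
https://www.norz.at/ip/images/port.tcp
https://www.norz.at/ip/images/j.e.p.g
https://www.norz.at/ip/images/pc.jpg.php
https://www.norzXat/ip/images/pc.jpg
EOF
```

Only the first two URLs are allowed here. The last line shows why the period in the host name is escaped: with an unescaped period, norzXat would match as well.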

So instead of clicking deploy, I would rather click edit and deploy and create my own regex.

You may ask me: why is Citrix using https:\/\/www\.norz\.at instead of https://www\.norz\.at? Shall I be 100% honest? I have no fucking clue. Contrary to what Citrix says, Citrix ADCs / NetScalers don’t use Perl-compatible regex, but PHP-compatible regex. There is no need to escape these slashes. On the other hand, it’s not wrong, so escape them if you like.

So: learning does a good job for simple applications, but it needs some tweaking.

The manual method

Well, “manual method” is not quite correct: it’s a semi-manual method. I use bash to get a list of URLs. As with learning, I start with logging enabled and blocking disabled.

As you see, I grep /var/log/ns.log for APPFW_STARTURL (these are the log entries for Start URL only) and pipe the output into another grep to get only the output for the profile APPFWP_WWW, the profile I’m currently working on. There are tons of log entries, and we can’t deal with them one by one. So I need a better solution.
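The two-step filter described above can be sketched as a small shell function. The function name starturl_entries is made up, and the sample log lines are simplified, hypothetical entries, not real ns.log output; only the keywords APPFW_STARTURL and APPFWP_WWW are taken from the text.

```shell
# Print only the Start URL log entries of one profile.
# $1 = log file, $2 = profile name
starturl_entries() {
  grep APPFW_STARTURL "$1" | grep -- "$2"
}

# Hypothetical, simplified sample log to try the filter on.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Jan 23 10:00:01 ns APPFW APPFW_STARTURL APPFWP_WWW Disallow Illegal URL: https://www.norz.at/ip/images/pc.jpg <not blocked>
Jan 23 10:00:02 ns APPFW APPFW_STARTURL APPFWP_OTHER Disallow Illegal URL: https://192.168.200.109/admin <not blocked>
Jan 23 10:00:03 ns APPFW APPFW_XSS APPFWP_WWW Cross-site script check failed <not blocked>
EOF

starturl_entries "$sample" APPFWP_WWW
rm -f "$sample"
```

On the appliance the call would be starturl_entries /var/log/ns.log APPFWP_WWW. Of the three sample lines, only the first one passes both filters.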

grep APPFW_STARTURL ns.log | grep APPFWP_WWW | awk -F '//' '{print $2}' | cut -d " " -f1 | sort -u

This results in a list of all URLs in alphabetical order. There is absolutely no limit on the number of URLs, as you can also use the historic logs (ns.log.0.gz, …):

zcat ns.log.*.gz | grep APPFW_STARTURL | grep APPFWP_WWW | awk -F '//' '{print $2}' | cut -d " " -f1 | sort -u

I have written about statistics on logs previously. This article also explains my bash script. There are some drawbacks you need to be aware of:

There is no counter, so you don’t see the number of hits.
There is a crazy vendor out there. I won’t name it, but it’s in Redmond/WA. They tend to use spaces (?!!!) and other characters that should never be in a URL. This is especially true for their file sharing and mail solutions. My script cuts the line after the first space, so URLs may be shortened. You need to know whether you are dealing with this crazy kind of software or with a serious one!
I cut off everything in front of the first //, so you don’t see if it’s http or https.
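Two of these drawbacks can be worked around with a slightly different pipeline. The following is a sketch, not the script from the article: it assumes GNU or BSD grep (for -o and -h), assumes the URL is the only http(s) string in each log line, and uses the made-up function name starturl_hits. URLs containing spaces are still cut at the first space.

```shell
# Count hits per URL for one profile and keep the scheme (http or https).
# $1 = profile name; remaining arguments = plain-text log files.
# Without file arguments the function reads standard input.
starturl_hits() {
  profile=$1
  shift
  grep -h APPFW_STARTURL "$@" \
    | grep -- "$profile" \
    | grep -oE 'https?://[^ ]+' \
    | sort | uniq -c | sort -rn
}

# Usage on the current log:
#   starturl_hits APPFWP_WWW ns.log
# Usage on compressed historic logs:
#   zcat ns.log.*.gz | starturl_hits APPFWP_WWW
```

The output has the hit count in the first column and the URL in the second, with the most frequently logged URLs on top.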

Why do I need this list? It gives me a good overview of similar URLs, so I can create one relaxation for several URLs. This saves CPU and memory, and makes profiles (and ns.conf) smaller.

I give you an example:

elisabeth.norz.at/images/Anschrift.png
elisabeth.norz.at/images/Elisabeth.Norz.jpg
elisabeth.norz.at/images/english.png
norz.at/images/NorzLogo2.gif
norz.at/images/footer1.png
norz.at/images/footer2.png
norz.at/images/footer3.png
norz.at/ip/images
norz.at/ip/images/PC.jpg
norz.at/ip/images/Router.jpg
norz.at/ip/images/pc.jpg
norz.at/ip/images/router.jpg
www.norz.at/images/NorzLogo2.gif
www.norz.at/images/footer1.png
www.norz.at/images/footer2.png
www.norz.at/images/footer3.png
www.norz.at/ip/images.
www.norz.at/ip/images/PC.jpg
www.norz.at/ip/images/Router.jpg

I did another grep and filtered on images. It’s easy to see that there are several files in the /images subdirectories. These subdirectories contain gif, png and jpg files, so I can make a rule for each subdirectory. Two entries are wrong, so they should not be covered. I coloured items to make things a bit clearer.

https://elisabeth\.norz\.at/images/\w+\.(png|jpg)
https://norz\.at/images/\w+\.(gif|png|jpg)
https://www\.norz\.at/images/\w+\.(gif|png)
https://www\.norz\.at/ip/images/\w+\.jpg

You see, just 4 rules left. To be honest, www.norz.at and norz.at are exactly the same, so I can make just one green rule:

https://elisabeth\.norz\.at/images/\w+\.(png|jpg)
https://(www\.)?norz\.at/images/\w+\.(gif|jpg|png)
https://www\.norz\.at/ip/images/\w+\.jpg

Just 3 left. But I could also include elisabeth.norz.at:

https://(www\.|elisabeth\.)?norz\.at/images/\w+\.(gif|png|jpg)
https://www\.norz\.at/ip/images/\w+\.jpg

2 rules left. Let’s merge them into one; we already know how to make parts optional:

https://(www\.|elisabeth\.)?norz\.at/(ip/)?images/\w+\.(gif|png|jpg)

Great!

Every regex needs a risk analysis. What can go wrong, and which weaknesses may arise? First of all, we didn’t see png and gif in /ip/images, but I allowed them. I don’t think this is risky. At the same time, I allow file names of unlimited length (see the buffer overflow check). My OS has to be able to deal with this (my Linux does). If not, I would need to use something like \w{1,64} to limit file names to, let’s say, 64 characters (plus extension).
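The effect of such a length limit can be illustrated with a short sketch. It assumes GNU grep as a stand-in for the ADC's regex engine, takes the merged expression with \w+ swapped for \w{1,64}, and adds ^ and $ so that the whole URL has to match. The helper name name_ok and the generated file names are made up.

```shell
# Merged pattern with the file name limited to 64 word characters.
limited='^https://(www\.|elisabeth\.)?norz\.at/(ip/)?images/\w{1,64}\.(gif|png|jpg)$'

# Return 0 if a png with base name $1 in /images/ is accepted.
name_ok() {
  printf 'https://www.norz.at/images/%s.png\n' "$1" | grep -Eq -- "$limited"
}

short=$(printf '%064d' 0)   # 64 digits, all of them word characters
long=$(printf '%065d' 0)    # 65 digits, one more than the limit

if name_ok "$short"; then echo "64 characters: allowed"; else echo "64 characters: blocked"; fi
if name_ok "$long"; then echo "65 characters: allowed"; else echo "65 characters: blocked"; fi
```

The 64-character name passes and the 65-character name is blocked. The limit only works when the expression is applied to the whole URL, which is why the anchors matter in this sketch.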

I just wanted to give you an idea on how I do startURLs.

A summary based on the logs of a day or more gives me a good overview. I spend a lot of time combining regexes. I use regex101.com to test. Don’t just test whether all desired URLs get through; also think of potentially malicious URLs and check that you block them.
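Such a test can also be scripted, so it can be repeated whenever a regex changes. This is a sketch under assumptions: GNU grep stands in for the ADC's regex engine, the merged expression is anchored with ^ and $, the scheme https:// is prepended to entries from the list above, and the helper name matches_merged and the two unwanted example URLs (shell.php, evil.example) are invented.

```shell
merged='^https://(www\.|elisabeth\.)?norz\.at/(ip/)?images/\w+\.(gif|png|jpg)$'

# Return 0 if the full URL $1 matches the merged pattern.
matches_merged() {
  printf '%s\n' "$1" | grep -Eq -- "$merged"
}

# Desired URLs: report every one that would be blocked.
while IFS= read -r url; do
  matches_merged "$url" || echo "desired URL blocked: $url"
done <<'EOF'
https://elisabeth.norz.at/images/Anschrift.png
https://elisabeth.norz.at/images/Elisabeth.Norz.jpg
https://norz.at/images/NorzLogo2.gif
https://www.norz.at/images/footer1.png
https://www.norz.at/ip/images/PC.jpg
EOF

# Unwanted URLs: report every one that would get through.
while IFS= read -r url; do
  if matches_merged "$url"; then echo "unwanted URL allowed: $url"; fi
done <<'EOF'
https://norz.at/ip/images
https://www.norz.at/ip/images.
https://www.norz.at/images/shell.php
https://evil.example/images/logo.png
EOF
```

In this sketch none of the unwanted URLs gets through, but Elisabeth.Norz.jpg is reported as blocked: \w+ does not cover the extra period in the file name. Whether to relax the rule for that file or rename it is a judgement call, and it is exactly the kind of finding this test is meant to surface.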

What next?

Nothing is perfect. I will cause false positives (see my first chapter) for several reasons. So I turn on learning. There should not be too much left to learn.

Johannes Norz

Johannes Norz is a Citrix Certified Citrix Technology Advocate (CTA), Citrix Certified Instructor (CCI) and Citrix Certified Expert on Application Delivery and Security (CCE-AppDS).

He frequently works for Citrix international Consulting Services and several education centres all around the globe.

Johannes lives in Austria. He was born in Innsbruck, a small city (150,000 inhabitants) in the middle of the most beautiful Austrian mountains (https://www.youtube.com/watch?v=UvdF145Lf2I)


Programming and Interactive Multimedia Research Group Telkom University says:

8 July 2020 at 12:35 pm

Do you also provide videos in the full version?

- Johannes Norz says:
  
  9 September 2020 at 9:20 am
  
  Sorry, no videos about WAF.
  
Johannes Norz says:

9 September 2020 at 9:04 am

There will be a part 3. One day. I started, but I am very busy, so I could not finish all I wanted :'(

Fianda Briliyandi says:

15 December 2020 at 7:17 pm

Good article, thanks for sharing, please visit
ittelkom-sby.ac.id

- Johannes Norz says:
  
  15 December 2020 at 8:13 pm
  
  Thank you, Fianda!
  
