

URLs

URL matching is a complicated task, or rather, it can be complicated depending on how flexible the matching needs to be. At a minimum, a URL pattern should match the protocol (typically http or https), a hostname, an optional port, and a path.

http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben:password@www.forta.com/
http://localhost/index.php?ab=1&c=2
http://localhost:8500/

https?://[-\w.]+(:\d+)?(/([\w/_.]*)?)?

http://www.forta.com/blog
https://www.forta.com:80/blog/index.cfm
http://www.forta.com
http://ben:password@www.forta.com/
http://localhost/index.php?ab=1&c=2
http://localhost:8500/

https?:// matches http:// or https:// (the ? makes the s optional). [-\w.]+ matches the hostname. (:\d+)? matches an optional port (as seen in the second and sixth lines of the example). (/([\w/_.]*)?)? matches the path: the outer subexpression matches the leading / if one exists, and the inner subexpression matches the path itself. As you can see, this pattern cannot handle query strings (in the fifth line the match stops at the ?), and it misreads embedded username:password pairs (in the fourth line it takes ben as the hostname and stops there). However, for most URLs it will work adequately (matching hostnames, ports, and paths).
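To see these limits concretely, the pattern can be run against the six sample URLs. The following is only a minimal sketch in Python, which the text itself does not use. The names URL_PATTERN, SAMPLES, and matched_prefix are hypothetical, chosen for this illustration.

```python
import re

# The pattern discussed above, compiled unchanged.
URL_PATTERN = re.compile(r"https?://[-\w.]+(:\d+)?(/([\w/_.]*)?)?")

# The six sample URLs from the example text.
SAMPLES = [
    "http://www.forta.com/blog",
    "https://www.forta.com:80/blog/index.cfm",
    "http://www.forta.com",
    "http://ben:password@www.forta.com/",
    "http://localhost/index.php?ab=1&c=2",
    "http://localhost:8500/",
]


def matched_prefix(url: str) -> str:
    """Return the leading part of url that the pattern matches, or ''."""
    m = URL_PATTERN.match(url)
    return m.group(0) if m else ""


for url in SAMPLES:
    matched = matched_prefix(url)
    status = "full" if matched == url else "partial"
    print(f"{status:8}{url} -> {matched}")
```

Four of the six URLs match in full. The two partial matches are the ones called out above: the query-string URL is matched only up to index.php, and the URL with embedded credentials is matched only as far as http://ben, because the port subexpression needs digits after the colon.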
