Update: see my most recent comment. WordPress has a pretty good regex for matching URLs.


A bold claim, but I think I’ve got one:

|([A-Za-z]{3,9})://([-;:&=\+\$,\w]+@{1})?([-A-Za-z0-9\.]+)+:?(\d+)?((/[-\+~%/\.\w]+)?\??([-\+=&;%@\.\w]+)?#?([\w]+)?)?|

An online events booking system I developed doesn’t allow HTML in the event description field, primarily to protect against scripting attacks. But what if you want to provide a link in the description? I needed a way to detect plain-text URLs stored in the database and turn them into hyperlinks when they are displayed in the browser. The regular expression above lets me do that quite easily in PHP:

$pattern = '|([A-Za-z]{3,9})://([-;:&=\+\$,\w]+@{1})?([-A-Za-z0-9\.]+)+:?(\d+)?((/[-\+~%/\.\w]+)?\??([-\+=&;%@\.\w]+)?#?([\w]+)?)?|';
$html = preg_replace($pattern, '<a href="$0">$0</a>', $text);

The regular expression also has several submatches. They provide a means to break the URL down into its constituent parts, including protocol, user info, server name, port, REQUEST_URI, query string and anchor.
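
To make the submatches a bit more concrete, here is a minimal sketch that runs preg_match() on a single URL and labels each captured group. The URL and the array keys in $parts are made up for this illustration; only the pattern comes from above.

```php
<?php
// Same pattern as above; single quotes keep the backslashes intact.
$pattern = '|([A-Za-z]{3,9})://([-;:&=\+\$,\w]+@{1})?([-A-Za-z0-9\.]+)+:?(\d+)?((/[-\+~%/\.\w]+)?\??([-\+=&;%@\.\w]+)?#?([\w]+)?)?|';

// Hypothetical URL, chosen so that every group has something to capture.
$url = 'http://user@www.example.com:8080/path/page.php?id=1&x=2#top';

$parts = [];
if (preg_match($pattern, $url, $m)) {
    // Trailing groups that did not match are left out of $m, hence the ?? ''.
    $parts = [
        'protocol'  => $m[1],        // scheme before ://
        'user_info' => $m[2] ?? '',  // includes the trailing @
        'server'    => $m[3] ?? '',
        'port'      => $m[4] ?? '',
        'path'      => $m[6] ?? '',
        'query'     => $m[7] ?? '',
        'anchor'    => $m[8] ?? '',
    ];
}

print_r($parts);
```

Group 5 is skipped here because it simply wraps the path, query string and anchor together.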

Here’s a PHP class I wrote that uses this regular expression to analyse a string, detect URLs, populate an array with the constituent parts of each URL, and replace the URLs with hyperlinks. Here’s an example of usage:

$text  = 'Please visit http://www.example.com/cgi-bin/';
$text .= 'script.cgi?variable=value&variable2=some';
$text .= '+url+encoded+text#section-1 to find out more';

$urlf = new URLFinder();
$html = $urlf->make_links($text);
echo $html;

If you print_r($urlf), you can see how the URL is broken down.
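
The class itself isn’t reproduced in this post, so what follows is only a rough sketch of how something like it could be put together with preg_replace_callback(). The class name URLFinderSketch, the $urls property and the array keys are all hypothetical and shouldn’t be read as the actual URLFinder code.

```php
<?php
// Hypothetical sketch: NOT the original URLFinder, just one way it could work.
class URLFinderSketch
{
    // The pattern from above, in single quotes so the backslashes survive.
    const PATTERN = '|([A-Za-z]{3,9})://([-;:&=\+\$,\w]+@{1})?([-A-Za-z0-9\.]+)+:?(\d+)?((/[-\+~%/\.\w]+)?\??([-\+=&;%@\.\w]+)?#?([\w]+)?)?|';

    // One entry per URL found, filled in by make_links().
    public $urls = [];

    public function make_links($text)
    {
        $this->urls = [];
        return preg_replace_callback(self::PATTERN, function ($m) {
            // Record the constituent parts; missing trailing groups become ''.
            $this->urls[] = [
                'url'       => $m[0],
                'protocol'  => $m[1],
                'user_info' => $m[2] ?? '',
                'server'    => $m[3] ?? '',
                'port'      => $m[4] ?? '',
                'path'      => $m[6] ?? '',
                'query'     => $m[7] ?? '',
                'anchor'    => $m[8] ?? '',
            ];
            // Same replacement as the preg_replace() call earlier.
            return '<a href="' . $m[0] . '">' . $m[0] . '</a>';
        }, $text);
    }
}

$finder = new URLFinderSketch();
$html = $finder->make_links('See http://www.example.com/docs/index.html?page=2#intro for details');
echo $html, "\n";
print_r($finder->urls);
```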

I haven’t managed to find any exceptions to the expression, but if you do, please post an example.