A rehash of code modified by "laulibrius at hotmail dot com".
This also parses urls for hosts that don't have a domain name and just use an IP as the hostname.
The old code would assume that the IP octets were a subdomain.
So the url "http://255.255.255.255/" would return 255.255 as a subdomain of 255.255.
<?php
function parseUrl($url)
{
    $r  = "^(?:(?P<scheme>\w+)://)?";
    $r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
    $ip = "(?:[0-9]{1,3}+\.){3}+[0-9]{1,3}"; // IP check
    $s  = "(?:(?P<subdomain>[-\w\.]+)\.)?";  // subdomain (optional)
    $d  = "(?P<domain>[-\w]+\.)";            // domain
    $e  = "(?P<extension>\w+)";              // extension
    // Conditional: if the host looks like an IP, capture it as <ip>;
    // otherwise split it into subdomain, domain and extension.
    $r .= "(?P<host>(?(?=" . $ip . ")(?P<ip>" . $ip . ")|(?:" . $s . $d . $e . ")))";
    $r .= "(?::(?P<port>\d+))?";
    $r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
    $r .= "(?:\?(?P<arg>[\w=&]+))?";
    $r .= "(?:#(?P<anchor>\w+))?";
    $r = "!$r!"; // Delimiters
    preg_match($r, $url, $out);
    return $out; // named and numbered matches
}
?>
If you need to validate the host IP, this is easier than using a regex:
<?php
$parsed = parseUrl($url);
if (!empty($parsed['ip'])) { // the 'ip' key is only filled when the host is an IP
    if (long2ip(ip2long($parsed['ip'])) == $parsed['ip']) { // validates IP
        echo $parsed['ip']." is a valid host";
    }
    else {
        echo $parsed['ip']." is not a valid host";
    }
}
?>
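The same round-trip check can also be applied to the host returned by the built-in parse_url(). This is only a minimal sketch: the helper name host_is_ipv4 is hypothetical, and it assumes PHP 5 or later, where ip2long() returns false for strings it cannot convert.

```php
<?php
// Hypothetical helper (not part of the original note): returns true when
// the URL's host is a canonical dotted-quad IPv4 address.
function host_is_ipv4($url)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return false;
    }
    $long = ip2long($parts['host']);
    // Assumption: PHP 5+, where ip2long() returns false on failure.
    if ($long === false) {
        return false;
    }
    // Round trip: a canonical address converts back to the same string.
    return long2ip($long) === $parts['host'];
}

var_dump(host_is_ipv4('http://192.168.0.1/index.php'));
var_dump(host_is_ipv4('http://www.example.com/'));
```

The first call should report true and the second false, since a host name is not convertible by ip2long().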
parse_url
(PHP 4, PHP 5)
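parse_url — Parse a URL and return its components
Description
This function parses a URL and returns an associative array containing whichever of the URL's components can be identified.
This function is not meant to validate the given URL; it only breaks it up into the parts listed under Return Values. Partial URLs are also accepted, and parse_url() tries its best to parse them correctly.
Parameters
- url: The URL to parse.
- component: Specify one of PHP_URL_SCHEME, PHP_URL_HOST, PHP_URL_PORT, PHP_URL_USER, PHP_URL_PASS, PHP_URL_PATH, PHP_URL_QUERY or PHP_URL_FRAGMENT to retrieve just a specific URL component as a string.
Return Values
On seriously malformed URLs, parse_url() returns FALSE and emits an E_WARNING. Otherwise, an associative array is returned that contains at least one of the following elements:
- scheme - e.g. http
- host
- port
- user
- pass
- path
- query - after the question mark ?
- fragment - after the hashmark #
If the component parameter is specified, a string is returned instead of an array.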
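As a small illustration of the two calling styles described above, the following sketch parses the same URL once without and once with the component parameter. It assumes PHP 5.1.2 or later, since that is when component was added.

```php
<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';

// Without a component, an associative array of the parts found is returned.
$parts = parse_url($url);
echo $parts['host'], "\n";

// With a component, only that single part is returned.
$host  = parse_url($url, PHP_URL_HOST);
$query = parse_url($url, PHP_URL_QUERY);
echo $host, ' ', $query, "\n";
```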
Changelog
| Version | Description |
|---|---|
| 5.1.2 | Added the component parameter. |
Examples
Example #1 A parse_url() example
<?php
$url = 'http://username:password@hostname/path?arg=value#anchor';
print_r(parse_url($url));
?>
The above example will output:
Array
(
[scheme] => http
[host] => hostname
[user] => username
[pass] => password
[path] => /path
[query] => arg=value
[fragment] => anchor
)
Notes
Note: This function doesn't work with relative URLs.
Note: parse_url() is intended for parsing URLs, not URIs. However, to comply with PHP's backwards compatibility requirements, it makes an exception for the file:// scheme, where triple slashes (file:///...) are allowed. For any other scheme this is invalid.
parse_url
05-Sep-2008 04:12
23-Aug-2008 05:47
My function takes the URL typed into the browser by the user and does the same job as parse_url, but better, I think. I don't like parse_url because it says nothing about elements that it doesn't find in the URL; my function returns an empty string for them instead.
<?php
function get_url()
{
$arr = array();
$uri = $_SERVER['REQUEST_URI'];
// query
$x = array_pad( explode( '?', $uri ), 2, false );
$arr['query'] = ( $x[1] )? $x[1] : '' ;
// resource
$x = array_pad( explode( '/', $x[0] ), 2, false );
$x_last = array_pop( $x );
if( strpos( $x_last, '.' ) === false )
{
$arr['resource'] = '';
$x[] = $x_last;
}
else
{
$arr['resource'] = $x_last;
}
// path
$arr['path'] = implode( '/', $x );
if( substr( $arr['path'], -1 ) !== '/' ) $arr['path'] .= '/';
// domain
$arr['domain'] = $_SERVER['SERVER_NAME'];
// scheme
$server_prt = explode( '/', $_SERVER['SERVER_PROTOCOL'] );
$arr['scheme'] = strtolower( $server_prt[0] );
// url
$arr['url'] = $arr['scheme'].'://'.$arr['domain'].$uri;
return $arr;
}
?>
PS: I found that working with explode is faster than using preg_match (I tried it with a getmicrotime function and 'for' loops).
PPS: I used array_pad to prevent any notices.
29-Jun-2008 09:28
Here's the easiest way to get the URL to the path that your script is in (so not the actual script name itself, just the complete URL to the folder it's in)
echo "http://".$_SERVER['HTTP_HOST'].dirname($_SERVER['PHP_SELF']);
23-Jun-2008 09:35
Based on the "laulibrius at hotmail dot com" function, this one works for relative URLs only:
<?php
function parseUrl($url) {
$r = "^(?:(?P<path>[\.\w/]*/)?";
$r .= "(?P<file>\w+(?:\.\w+)?)?)\.(?P<extension>\w+)?";
$r .= "(?:\?(?P<arg>[\w=&]+))?";
$r .= "(?:#(?P<anchor>\w+))?";
$r = "!$r!";
preg_match ( $r, $url, $out );
return $out;
}
print_r(parseUrl("../test/faq.php?p=1&v=blabla#X1"));
?>
returns:
Array
(
[0] => ../test/faq.php?p=1&v=blabla#X1
[path] => ../test/
[1] => ../test/
[file] => faq
[2] => faq
[extension] => php
[3] => php
[arg] => p=1&v=blabla
[4] => p=1&v=blabla
[anchor] => X1
[5] => X1
)
17-Jun-2008 01:31
One thing was missing from the function posted by "to1ne at hotmail dot com" when I tried it: the domain and subdomain couldn't contain a dash "-". So I added it to the regexp, and the function now looks like this:
<?php
function parseUrl($url) {
$r = "^(?:(?P<scheme>\w+)://)?";
$r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
$r .= "(?P<host>(?:(?P<subdomain>[-\w\.]+)\.)?" . "(?P<domain>[-\w]+\.(?P<extension>\w+)))";
$r .= "(?::(?P<port>\d+))?";
$r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
$r .= "(?:\?(?P<arg>[\w=&]+))?";
$r .= "(?:#(?P<anchor>\w+))?";
$r = "!$r!"; // Delimiters
preg_match ( $r, $url, $out );
return $out;
}
?>
Btw, thanks for the function, it helps me a lot.
14-Jun-2008 03:01
Based on the idea of "jbr at ya-right dot com", I have been working on a new function to parse the URL:
<?php
function parseUrl($url) {
$r = "^(?:(?P<scheme>\w+)://)?";
$r .= "(?:(?P<login>\w+):(?P<pass>\w+)@)?";
$r .= "(?P<host>(?:(?P<subdomain>[\w\.]+)\.)?" . "(?P<domain>\w+\.(?P<extension>\w+)))";
$r .= "(?::(?P<port>\d+))?";
$r .= "(?P<path>[\w/]*/(?P<file>\w+(?:\.\w+)?)?)?";
$r .= "(?:\?(?P<arg>[\w=&]+))?";
$r .= "(?:#(?P<anchor>\w+))?";
$r = "!$r!"; // Delimiters
preg_match ( $r, $url, $out );
return $out;
}
print_r ( parseUrl ( 'me:you@sub.site.org:29000/pear/validate.html?happy=me&sad=you#url' ) );
?>
This returns:
Array
(
[0] => me:you@sub.site.org:29000/pear/validate.html?happy=me&sad=you#url
[scheme] =>
[1] =>
[login] => me
[2] => me
[pass] => you
[3] => you
[host] => sub.site.org
[4] => sub.site.org
[subdomain] => sub
[5] => sub
[domain] => site.org
[6] => site.org
[extension] => org
[7] => org
[port] => 29000
[8] => 29000
[path] => /pear/validate.html
[9] => /pear/validate.html
[file] => validate.html
[10] => validate.html
[arg] => happy=me&sad=you
[11] => happy=me&sad=you
[anchor] => url
[12] => url
)
So both named and numbered array keys are possible.
It's quite advanced, but I think it works in any case... Let me know if it doesn't...
03-May-2008 12:24
This function never works the way you think it should...
Example....
<?php
print_r ( parse_url ( 'me:you@sub.site.org/pear/validate.html?happy=me&sad=you#url' ) );
?>
Returns...
Array
(
[scheme] => me
[path] => you@sub.site.org/pear/validate.html
[query] => happy=me&sad=you
[fragment] => url
)
Here's my way of doing parse_url:
<?php
function parseUrl ( $url )
{
$r = '!(?:(\w+)://)?(?:(\w+)\:(\w+)@)?([^/:]+)?';
$r .= '(?:\:(\d*))?([^#?]+)?(?:\?([^#]+))?(?:#(.+$))?!i';
preg_match ( $r, $url, $out );
return $out;
}
print_r ( parseUrl ( 'me:you@sub.site.org/pear/validate.html?happy=me&sad=you#url' ) );
?>
Returns...
Array
(
[0] => me:you@sub.site.org/pear/validate.html?happy=me&sad=you#url
[1] =>
[2] => me
[3] => you
[4] => sub.site.org
[5] =>
[6] => /pear/validate.html
[7] => happy=me&sad=you
[8] => url
)
Where:
out[0] = full url
out[1] = scheme or '' if no scheme was found
out[2] = username or '' if no auth username was found
out[3] = password or '' if no auth password was found
out[4] = domain name or '' if no domain name was found
out[5] = port number or '' if no port number was found
out[6] = path or '' if no path was found
out[7] = query or '' if no query was found
out[8] = fragment or '' if no fragment was found
15-Mar-2008 12:05
Please note that parse_url does not always seem to produce the same results when passed non-standard URLs.
E.g., I had been using this code since 2005 (under both PHP 4.3.10 and PHP 5.2.3):
<?php
$p = parse_url ( 'http://domain.tld/tcp://domain2.tld/dir/file' ) ;
$d2 = parse_url ( $p['path'] ) ;
echo $d2['path'] ; // prints '/dir/file'
?>
Of course my example is very specific, as URL is not really correct. But using parse_url was a great trick to split URL easily (without using regular expressions).
Unfortunately under PHP 5.2.0-8 (+etch10), parse_url will fail as it does not accept the slash (/) at the beginning of URL.
Here is a possible patch :
<?php
$p = parse_url ( 'http://domain.tld/tcp://domain2.tld/dir/file' ) ;
$d2 = parse_url ( substr ( $p['path'] , 1 ) ) ;
echo $d2['path'] ; // prints '/dir/file'
?>
However this last code is not optimized at all, and should be replaced by a regular expression to split URL (so that parse_url would be no longer used).
So you should use parse_url very carefully, and verify that you pass only standard URLs...
05-Sep-2007 06:32
Note that older versions of PHP (e.g., 4.1) returned a blank string as the path for URLs without any path, such as http://www.php.net
However, more recent versions of PHP (e.g., 4.4.7) don't set the path element in the array at all, so old code will trigger a PHP notice about an undefined index.
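One way to stay compatible with both behaviours is to test for the key before reading it. A minimal sketch, using the URL from the note:

```php
<?php
$parts = parse_url('http://www.php.net');

// The 'path' key may be absent, so test for it before reading it
// and fall back to an empty string.
$path = isset($parts['path']) ? $parts['path'] : '';

var_dump($path);
```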
28-Aug-2007 12:51
Another update to the glue_url function: applied the "isset" treatment to $parsed['pass'].
<?php
function glue_url($parsed)
{
if (!is_array($parsed)) return false;
$uri = isset($parsed['scheme']) ? $parsed['scheme'].':'.((strtolower($parsed['scheme']) == 'mailto') ? '' : '//') : '';
$uri .= isset($parsed['user']) ? $parsed['user'].(isset($parsed['pass']) ? ':'.$parsed['pass'] : '').'@' : '';
$uri .= isset($parsed['host']) ? $parsed['host'] : '';
$uri .= isset($parsed['port']) ? ':'.$parsed['port'] : '';
if(isset($parsed['path']))
{
$uri .= (substr($parsed['path'], 0, 1) == '/') ? $parsed['path'] : ('/'.$parsed['path']);
}
$uri .= isset($parsed['query']) ? '?'.$parsed['query'] : '';
$uri .= isset($parsed['fragment']) ? '#'.$parsed['fragment'] : '';
return $uri;
}
?>
13-Aug-2007 07:08
An update to the glue_url function:
you can now pass a host together with a path that has no leading slash.
<?php
function glue_url($parsed)
{
if (! is_array($parsed)) return false;
$uri = isset($parsed['scheme']) ? $parsed['scheme'].':'.((strtolower($parsed['scheme']) == 'mailto') ? '':'//'): '';
$uri .= isset($parsed['user']) ? $parsed['user'].($parsed['pass']? ':'.$parsed['pass']:'').'@':'';
$uri .= isset($parsed['host']) ? $parsed['host'] : '';
$uri .= isset($parsed['port']) ? ':'.$parsed['port'] : '';
if(isset($parsed['path']))
{
$uri .= (substr($parsed['path'],0,1) == '/')?$parsed['path']:'/'.$parsed['path'];
}
$uri .= isset($parsed['query']) ? '?'.$parsed['query'] : '';
$uri .= isset($parsed['fragment']) ? '#'.$parsed['fragment'] : '';
return $uri;
}
?>
09-Aug-2007 04:05
In reply to adrian,
Thank you very much for your function. There is a small issue with your relative protocol function. You need to remove the // when making the url the path. Here is the new function.
function resolve_url($base, $url) {
if (!strlen($base)) return $url;
// Step 2
if (!strlen($url)) return $base;
// Step 3
if (preg_match('!^[a-z]+:!i', $url)) return $url;
$base = parse_url($base);
if ($url[0] == "#") {
// Step 2 (fragment)
$base['fragment'] = substr($url, 1);
return unparse_url($base);
}
unset($base['fragment']);
unset($base['query']);
if (substr($url, 0, 2) == "//") {
// Step 4
return unparse_url(array(
'scheme'=>$base['scheme'],
'path'=>substr($url,2),
));
} else if ($url[0] == "/") {
// Step 5
$base['path'] = $url;
} else {
// Step 6
$path = explode('/', $base['path']);
$url_path = explode('/', $url);
// Step 6a: drop file from base
array_pop($path);
// Step 6b, 6c, 6e: append url while removing "." and ".." from
// the directory portion
$end = array_pop($url_path);
foreach ($url_path as $segment) {
if ($segment == '.') {
// skip
} else if ($segment == '..' && $path && $path[sizeof($path)-1] != '..') {
array_pop($path);
} else {
$path[] = $segment;
}
}
// Step 6d, 6f: remove "." and ".." from file portion
if ($end == '.') {
$path[] = '';
} else if ($end == '..' && $path && $path[sizeof($path)-1] != '..') {
$path[sizeof($path)-1] = '';
} else {
$path[] = $end;
}
// Step 6h
$base['path'] = join('/', $path);
}
// Step 7
return unparse_url($base);
}
04-Aug-2007 04:57
I searched for an implementation of RFC 3986, which is a newer version of RFC 2396. I found one here: <http://www.chrsen.dk/fundanemt/files/scripter/php/misc/rfc3986.php> - read the RFC at <http://rfc.net/rfc3986.html>
26-Jul-2007 06:58
Here's a function which implements resolving a relative URL according to RFC 2396 section 5.2. No doubt there are more efficient implementations, but this one tries to remain close to the standard for clarity. It relies on a function called "unparse_url" to implement step 7, left as an exercise for the reader (or you can substitute the "glue_url" function posted earlier).
<?php
/**
* Resolve a URL relative to a base path. This happens to work with POSIX
* filenames as well. This is based on RFC 2396 section 5.2.
*/
function resolve_url($base, $url) {
if (!strlen($base)) return $url;
// Step 2
if (!strlen($url)) return $base;
// Step 3
if (preg_match('!^[a-z]+:!i', $url)) return $url;
$base = parse_url($base);
if ($url[0] == "#") {
// Step 2 (fragment)
$base['fragment'] = substr($url, 1);
return unparse_url($base);
}
unset($base['fragment']);
unset($base['query']);
if (substr($url, 0, 2) == "//") {
// Step 4
return unparse_url(array(
'scheme'=>$base['scheme'],
'path'=>$url,
));
} else if ($url[0] == "/") {
// Step 5
$base['path'] = $url;
} else {
// Step 6
$path = explode('/', $base['path']);
$url_path = explode('/', $url);
// Step 6a: drop file from base
array_pop($path);
// Step 6b, 6c, 6e: append url while removing "." and ".." from
// the directory portion
$end = array_pop($url_path);
foreach ($url_path as $segment) {
if ($segment == '.') {
// skip
} else if ($segment == '..' && $path && $path[sizeof($path)-1] != '..') {
array_pop($path);
} else {
$path[] = $segment;
}
}
// Step 6d, 6f: remove "." and ".." from file portion
if ($end == '.') {
$path[] = '';
} else if ($end == '..' && $path && $path[sizeof($path)-1] != '..') {
$path[sizeof($path)-1] = '';
} else {
$path[] = $end;
}
// Step 6h
$base['path'] = join('/', $path);
}
// Step 7
return unparse_url($base);
}
?>
17-Jul-2007 05:42
Actually the behaviour noticed by the previous poster is quite correct. When the URI scheme is not present, it is plain wrong to assume that something starting with www. is a domain name, and that the scheme is HTTP. Internet Explorer does it that way, sure, but it does not make it any more correct. The documentation says that the function tries to decode the URL as well as it can, and the only sensible and standards-compliant way to decode such URL is to expect it to be a relative URI.
04-Jun-2007 07:59
Note that if you pass this function a url without a scheme (www.php.net, as opposed to http://www.php.net), the function will incorrectly parse the results. In my test case it returned the domain under the ['path'] element and nothing in the ['host'] element.
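The behaviour described in this note can be reproduced, and worked around, with a short sketch. The helper name parse_url_assume_http is hypothetical, and treating a scheme-less string as an HTTP address is only an assumption that suits input such as user-typed addresses; it is not what the standard prescribes.

```php
<?php
// Hypothetical helper: assume http:// when the string has no "scheme://" prefix.
function parse_url_assume_http($url)
{
    if (!preg_match('!^[a-z][a-z0-9+.-]*://!i', $url)) {
        $url = 'http://' . $url;
    }
    return parse_url($url);
}

// Without a scheme, the whole string ends up in 'path'.
$plain = parse_url('www.php.net');

// With the assumed scheme, the host is recognised.
$fixed = parse_url_assume_http('www.php.net');

print_r($plain);
print_r($fixed);
```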
15-Mar-2007 01:10
Do not look for the fragment in $_SERVER['QUERY_STRING'], you will not find it. You should read the fragment in JavaScript for example.
24-Oct-2006 11:21
Here's a simple function to add the $component option for PHP 4. I haven't done exhaustive testing, but it should work OK.
<?php
## Defines only available in PHP 5, created for PHP4
if(!defined('PHP_URL_SCHEME')) define('PHP_URL_SCHEME', 1);
if(!defined('PHP_URL_HOST')) define('PHP_URL_HOST', 2);
if(!defined('PHP_URL_PORT')) define('PHP_URL_PORT', 3);
if(!defined('PHP_URL_USER')) define('PHP_URL_USER', 4);
if(!defined('PHP_URL_PASS')) define('PHP_URL_PASS', 5);
if(!defined('PHP_URL_PATH')) define('PHP_URL_PATH', 6);
if(!defined('PHP_URL_QUERY')) define('PHP_URL_QUERY', 7);
if(!defined('PHP_URL_FRAGMENT')) define('PHP_URL_FRAGMENT', 8);
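function parse_url_compat($url, $component = NULL)
{
    // Compare against NULL explicitly: the built-in PHP_URL_SCHEME is 0,
    // which a loose !$component test would treat as "not given".
    if ($component === NULL) return parse_url($url);

    ## PHP 5.1.2 and later support the component parameter natively
    if (version_compare(PHP_VERSION, '5.1.2', '>='))
        return parse_url($url, $component);

    ## PHP 4
    $bits = parse_url($url);
    switch ($component) {
        case PHP_URL_SCHEME:   $key = 'scheme';   break;
        case PHP_URL_HOST:     $key = 'host';     break;
        case PHP_URL_PORT:     $key = 'port';     break;
        case PHP_URL_USER:     $key = 'user';     break;
        case PHP_URL_PASS:     $key = 'pass';     break;
        case PHP_URL_PATH:     $key = 'path';     break;
        case PHP_URL_QUERY:    $key = 'query';    break;
        case PHP_URL_FRAGMENT: $key = 'fragment'; break;
        default: return NULL;
    }
    // Return NULL rather than raising a notice when the part is absent.
    return isset($bits[$key]) ? $bits[$key] : NULL;
}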
?>
05-Oct-2006 01:48
With few modifications
/**
* source: http://us2.php.net/manual/en/function.parse-url.php#60237
* Edit the Query portion of an url
*
* @param string $action either a "+" or a "-" depending on what action you want to perform
* @param mixed $var array (+) or string (-)
* @param string $uri the URL to use. if this is left out, it uses $_SERVER['PHP_SELF']
* @version 1.0.0
*/
function change_query($action, $var = NULL, $uri = NULL) {
if (($action == "+" && !is_array($var)) || ($action == "-" && $var == "") || $var == NULL) {
return FALSE;
}
if (is_null($uri)) { //Piece together uri string
$beginning = $_SERVER['PHP_SELF'];
$ending = (isset ($_SERVER['QUERY_STRING'])) ? $_SERVER['QUERY_STRING'] : '';
} else {
$qstart = strpos($uri, '?');
if ($qstart === false) {
$beginning = $uri; //$ending is '' anyway
$ending = "";
} else {
$beginning = substr($uri, 0, $qstart);
$ending = substr($uri, $qstart);
}
}
$vals = array ();
$ending = str_replace('?', '', $ending);
parse_str($ending, $vals);
switch ($action) {
case '+' :
$vals[$var[0]] = $var[1];
break;
case '-' :
if (isset ($vals[$var])) {
unset ($vals[$var]);
}
break;
default :
break;
}
$params = array();
foreach ($vals as $k => $value) {
$params[] = $k."=".urlencode($value);
}
$result = $beginning . (count($params) ? '?' . implode("&", $params) : '');
return $result;
}
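For comparison, a shorter variant of the same idea can be sketched with parse_str() and http_build_query(). This is not the function above: the helper name set_query_param is hypothetical, and the sketch assumes PHP 5 or later (for http_build_query) and a URI without a #fragment.

```php
<?php
// Hypothetical helper: set one query parameter, or remove it when
// $value is null. Assumes the URI carries no #fragment.
function set_query_param($uri, $name, $value = null)
{
    $qstart = strpos($uri, '?');
    $base   = ($qstart === false) ? $uri : substr($uri, 0, $qstart);
    $query  = ($qstart === false) ? ''   : substr($uri, $qstart + 1);

    // Split the existing query string into an array.
    $vals = array();
    parse_str($query, $vals);

    if ($value === null) {
        unset($vals[$name]);
    } else {
        $vals[$name] = $value;
    }

    // http_build_query() URL-encodes keys and values for us.
    $query = http_build_query($vals, '', '&');
    return $base . ($query !== '' ? '?' . $query : '');
}

echo set_query_param('/page.php?a=1&b=2', 'b'), "\n";
echo set_query_param('/page.php?a=1', 'c', 'x y'), "\n";
```

Passing '&' explicitly as the separator avoids depending on the arg_separator.output ini setting.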
27-Sep-2006 08:21
Here is a simple extended version of ParseURL().
I needed to make a link that will be saved off site but point to a different file than the one creating the link.
So I needed to get the path without the file name so I could change the file name.
Here it is:
<?php
function ParseURLplus($url){
$URLpcs = (parse_url($url));
$PathPcs = explode("/",$URLpcs['path']);
$URLpcs['file'] = end($PathPcs);
unset($PathPcs[key($PathPcs)]);
$URLpcs['dir'] = implode("/",$PathPcs);
return ($URLpcs);
}
$url = 'http://username:password@hostname/path/directory/file.php?arg=value#anchor';
$URLpcs = ParseURLplus($url);
print_r($URLpcs);
?>
Now I can change $URLpcs['file'] and then glue it back together to make a new URL.
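The same split can also be sketched with the built-in basename() and dirname() functions applied to the path component. This is an alternative to the function above, not a copy of it, and it assumes PHP 5.1.2 or later for the PHP_URL_PATH argument; the replacement file name other.php is only an example.

```php
<?php
$url  = 'http://username:password@hostname/path/directory/file.php?arg=value#anchor';
$path = parse_url($url, PHP_URL_PATH);

$file = basename($path); // last path segment
$dir  = dirname($path);  // everything before it

// Point at a different file in the same directory.
$newPath = $dir . '/other.php';

echo $file, "\n", $dir, "\n", $newPath, "\n";
```

Note that for a path ending in a slash, basename() returns the last directory name rather than an empty file name, so this sketch only suits paths that end in a file.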
14-Jul-2006 03:59
I hope this is helpful! Cheers!
-eo
<?php
# Author: Eric O
# Date: July 13, 2006
# Go Zizou!! :O)
# Creating Automatic Self-Redirect To Secure Version
# of Website as Seen on Paypal and other secure sites
# Changes HTTP to HTTPS
# gets the URI of the script
$url = $_SERVER['SCRIPT_URI'];
# chops URI into bits BORK BORK BORK
$chopped = parse_url($url);
# HOST and PATH portions of your final destination
$destination = $chopped['host'].$chopped['path'];
# if you are not HTTPS, then do something about it
if($chopped['scheme'] != "https"){
# forwards to HTTPS version of URI with secure certificate
header("Location: https://$destination");
exit();
}
?>
09-May-2006 08:18
Modified version of glue_url to avoid error messages if error_reporting is set high.
function glue_url($parsed)
{
if (! is_array($parsed)) return false;
$uri = isset($parsed['scheme']) ? $parsed['scheme'].':'.((strtolower($parsed['scheme']) == 'mailto') ? '':'//'): '';
$uri .= isset($parsed['user']) ? $parsed['user'].($parsed['pass']? ':'.$parsed['pass']:'').'@':'';
$uri .= isset($parsed['host']) ? $parsed['host'] : '';
$uri .= isset($parsed['port']) ? ':'.$parsed['port'] : '';
$uri .= isset($parsed['path']) ? $parsed['path'] : '';
$uri .= isset($parsed['query']) ? '?'.$parsed['query'] : '';
$uri .= isset($parsed['fragment']) ? '#'.$parsed['fragment'] : '';
return $uri;
}
31-Dec-2004 05:36
You may want to check out the PEAR Net_URL class. It provides an easy means of manipulating URL strings.
http://pear.php.net/package/Net_URL
10-May-2004 05:36
Modified version of glue_url(),
based on Cox's and Anonymous's function:
<?php
function glue_url($parsed) {
if (! is_array($parsed)) return false;
$uri = $parsed['scheme'] ? $parsed['scheme'].':'.((strtolower($parsed['scheme']) == 'mailto') ? '':'//'): '';
$uri .= $parsed['user'] ? $parsed['user'].($parsed['pass']? ':'.$parsed['pass']:'').'@':'';
$uri .= $parsed['host'] ? $parsed['host'] : '';
$uri .= $parsed['port'] ? ':'.$parsed['port'] : '';
$uri .= $parsed['path'] ? $parsed['path'] : '';
$uri .= $parsed['query'] ? '?'.$parsed['query'] : '';
$uri .= $parsed['fragment'] ? '#'.$parsed['fragment'] : '';
return $uri;
}
?>
