Last updated: Fri, 11 May 2012

mb_decode_numericentity

(PHP 4 >= 4.0.6, PHP 5)

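mb_decode_numericentity: Decode HTML numeric entities into characters

Description

string mb_decode_numericentity ( string $str , array $convmap , string $encoding )

Converts the numeric entities in the string str that fall within the specified character range, and returns the converted string.

Parameters

str

The string to decode.

convmap

convmap is an array that specifies the code area to convert.

encoding

The encoding parameter specifies the character encoding. If omitted, the internal character encoding is used.

Return Values

Returns the converted string.

Example 1: convmap example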

<?php
$convmap = array (
   int start_code1, int end_code1, int offset1, int mask1,
   int start_code2, int end_code2, int offset2, int mask2,
   ........
   int start_codeN, int end_codeN, int offsetN, int maskN );
// Specify Unicode values for start_codeN and end_codeN.
// offsetN is added to the value and a bitwise AND is taken with maskN
// to convert the value to a numeric entity.
?>
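As a concrete illustration of the layout above, the following minimal sketch decodes a short string with a single four-element range. The map values and the sample strings are assumptions chosen for illustration, not part of the manual text; the second call uses a narrower range, so an entity outside it should be left as it is.

```php
<?php
// Illustrative map: start code, end code, offset, mask.
// It covers every code point from U+0000 to U+10FFFF with no offset.
$convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);

$encoded = 'caf&#233; &#8217;quoted&#8217;';
$decoded = mb_decode_numericentity($encoded, $convmap, 'UTF-8');
echo $decoded, "\n";

// A narrower, hypothetical range: only entities for U+0080..U+FFFF are decoded,
// so the ASCII entity &#65; falls outside the map.
$narrow  = array(0x80, 0xFFFF, 0, 0xFFFF);
$partial = mb_decode_numericentity('&#65;&#233;', $narrow, 'UTF-8');
echo $partial, "\n";
```

Each additional range is another group of four integers appended to the same array.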

See Also

mb_encode_numericentity()

User Contributed Notes
tom 13-Apr-2011 02:46
When I used this function, I found a bug.
Example)
------------------------
INPUT STRING1 : abc&
INPUT STRING2 : abc&#
------------------------
OUTPUT STRING : abc
------------------------
<?php
$input = 'abc&';
$convmap = array(0x0, 0xffff, 0, 0xffff);
$output = mb_decode_numericentity($input, $convmap, 'UTF-8');
echo $output;
?>
result : abc

If the input string ends with characters that form the beginning of a numeric character reference (NCR), such as & or &#, this function removes those characters.

So I use a trick:

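<?php
function decode_numericentity($string) {
    // Same map as in the example above
    $convmap = array(0x0, 0xffff, 0, 0xffff);
    // Append a space so the input never ends with a partial entity
    $string = $string . chr(32);
    $string = mb_decode_numericentity($string, $convmap, 'UTF-8');
    // Strip the appended space again
    $pos = strlen($string) - 1;
    //if (ord($string[$pos]) == 32) {
    $string = substr($string, 0, $pos);
    //}
    return $string;
}
?>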
Navi 01-Apr-2009 01:00
Manual entity => utf8 conversion:
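<?php
// parse entities
$raw = preg_replace_callback(
    "/&#(\\d+);/u",
    "_pcreEntityToUtf",
    $raw
);

function _pcreEntityToUtf($matches)
{
    $char = intval(is_array($matches) ? $matches[1] : $matches);

    if ($char < 0x80)
    {
        // to prevent insertion of control characters
        if ($char >= 0x20) return htmlspecialchars(chr($char));
        else return "&#$char;";
    }
    else if ($char < 0x800)
    {
        // two-byte UTF-8 sequence (U+0080..U+07FF)
        return chr(0xc0 | (0x1f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
    else if ($char < 0x10000)
    {
        // three-byte UTF-8 sequence (U+0800..U+FFFF)
        return chr(0xe0 | (0x0f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
    else
    {
        // four-byte UTF-8 sequence (U+10000 and above)
        return chr(0xf0 | (0x07 & ($char >> 18))) . chr(0x80 | (0x3f & ($char >> 12))) . chr(0x80 | (0x3f & ($char >> 6))) . chr(0x80 | (0x3f & $char));
    }
}
?>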
donovan at conduit it 19-Apr-2006 09:05
Note that, at this time, mb_decode_numericentity() seems to work only with decimal entities and not with hexadecimal entities. Knowing this would have saved me a good hour of debugging.

If you need to convert hex entities, first convert them all to decimal entities with a combination of the preg_replace() and hexdec() functions.
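The hex-to-decimal step suggested in this note could look roughly like the sketch below. It uses preg_replace_callback() rather than preg_replace(), and the helper name hex_entities_to_decimal is made up for illustration.

```php
<?php
// Hypothetical helper: rewrite hexadecimal entities (&#x...;) as decimal ones (&#...;).
// Decimal entities and all other text pass through unchanged.
function hex_entities_to_decimal($text)
{
    return preg_replace_callback(
        '/&#x([0-9a-fA-F]+);/',
        function ($m) {
            // $m[1] holds the hex digits without the "&#x" prefix and ";" suffix.
            return '&#' . hexdec($m[1]) . ';';
        },
        $text
    );
}

echo hex_entities_to_decimal('&#x41;&#x3042; &#65;'), "\n"; // prints "&#65;&#12354; &#65;"
```

The result can then be passed to mb_decode_numericentity() with a suitable convmap.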
dirk at camindo de 30-Jan-2005 09:51
If you use utf8_decode(), you will have a problem with all extended characters outside the ISO-8859-1 charset. You can solve this by calling mb_encode_numericentity() first:

  // convert $text from UTF-8 to ISO-8859-1
  $convmap = array(0x100, 0x10FFFF, 0, 0x1FFFFF);
  $text = mb_encode_numericentity($text, $convmap, "UTF-8");
  $text = utf8_decode($text);

The mb_encode_numericentity() call turns all extended characters above 0xFF into numeric entities; utf8_decode() then converts the rest (0x80 - 0xFF).
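utf8_decode() is deprecated as of PHP 8.2, so the same idea can be sketched with mb_convert_encoding() instead. This is only an illustration under that substitution; the sample string and the variable names are assumptions.

```php
<?php
// UTF-8 bytes for "naïve €5": the ï fits in ISO-8859-1, the euro sign does not.
$text = "na\xC3\xAFve \xE2\x82\xAC5";

// Encode everything above U+00FF as a numeric entity.
$convmap  = array(0x100, 0x10FFFF, 0, 0x1FFFFF);
$entities = mb_encode_numericentity($text, $convmap, 'UTF-8');

// The remaining characters (up to U+00FF) now convert to ISO-8859-1 without loss.
$latin1 = mb_convert_encoding($entities, 'ISO-8859-1', 'UTF-8');

echo bin2hex($latin1), "\n";
```

Here the euro sign survives as the entity &#8364; while ï becomes the single byte 0xEF.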
Andrew Simpson 10-Dec-2004 05:29
Many web browsers tend to upload high-order characters as numeric HTML entities.

Here is some simple code to convert those entities within a block of text into proper UTF-8 characters:

<?php
// decode decimal HTML entities added by web browser
$body = preg_replace('/&#\d{2,5};/ue', "utf8_entity_decode('$0')", $body);

// decode hex HTML entities added by web browser
$body = preg_replace('/&#x([a-fA-F0-9]{2,8});/ue', "utf8_entity_decode('&#'.hexdec('$1').';')", $body);

// callback function for the regex
function utf8_entity_decode($entity) {
    $convmap = array(0x0, 0x10000, 0, 0xfffff);
    return mb_decode_numericentity($entity, $convmap, 'UTF-8');
}
?>
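The /e modifier used in this note was deprecated in PHP 5.5 and removed in PHP 7.0. A rough equivalent with preg_replace_callback() might look like the sketch below; the function name decode_entities_utf8, the single combined pattern and the wider convmap are assumptions, not the note author's code.

```php
<?php
// Hypothetical helper: decode decimal and hexadecimal numeric entities to UTF-8.
function decode_entities_utf8($body)
{
    // Cover the whole Unicode range with no offset.
    $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);

    return preg_replace_callback(
        '/&#(?:x([0-9a-fA-F]{1,6})|(\d{1,7}));/',
        function ($m) use ($convmap) {
            // Group 1 is set for hex entities, group 2 for decimal ones.
            $code = $m[1] !== '' ? hexdec($m[1]) : (int) $m[2];
            // Hand a decimal entity to mbstring for the actual decoding.
            return mb_decode_numericentity('&#' . $code . ';', $convmap, 'UTF-8');
        },
        $body
    );
}

echo decode_entities_utf8('&#233; and &#xE9;'), "\n";
```

Passing the match to a callback also avoids evaluating matched text as PHP code, which is what /e did.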
php at cNhOiSpPpAlMe dot org 31-Mar-2004 12:55
Here are functions to convert hankaku (half-width) characters to zenkaku (full-width) characters, and vice versa, in Japanese text.

<?php

// Supported characters:
//    (space)
//     !#$%&()*+,./0123456789:;<=>?@
//    ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
//    abcdefghijklmnopqrstuvwxyz{|}
// (Katakana isn't supported.)

function f_han2zen ($string,$encoding = null) {
  if (is_null($encoding)) $encoding = mb_internal_encoding();
  $convmap = array(
    0x20,0x20,0x3000-0x20,0xffff,   // Space
    0x21,0x7e,0xff01-0x21,0xffff);
  $temp = mb_encode_numericentity($string,$convmap,$encoding);
  $convmap = array(0,0xffff,0,0xffff);
  return mb_decode_numericentity($temp,$convmap,$encoding);
}

function f_zen2han ($string,$encoding = null) {
  if (is_null($encoding)) $encoding = mb_internal_encoding();
  $convmap = array(
    0x3000,0x3000,-(0x3000-0x20),0xffff,   // Space
    0xff01,0xff5e,-(0xff01-0x21),0xffff);
  $temp = mb_encode_numericentity($string,$convmap,$encoding);
  $convmap = array(0,0xffff,0,0xffff);
  return mb_decode_numericentity($temp,$convmap,$encoding);
}

// Sample usage:
f_han2zen("test","shift_jis");
f_han2zen("test","utf-8");

?>
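The functions above rely on the offset element of convmap to shift code points. A minimal worked example of that arithmetic for a single character follows; the variable names are assumptions and UTF-8 is used throughout.

```php
<?php
// One range: ASCII '!'..'~' shifted up to the full-width forms U+FF01..U+FF5E.
$offset    = 0xff01 - 0x21;                      // 0xfee0
$toWide    = array(0x21, 0x7e, $offset, 0xffff);
$decodeAll = array(0x0, 0xffff, 0, 0xffff);

// 'A' is U+0041, and 0x41 + 0xfee0 = 0xff21 = 65313.
$entity = mb_encode_numericentity('A', $toWide, 'UTF-8');
echo $entity, "\n"; // prints "&#65313;"

// Decoding the shifted entity with a zero-offset map yields U+FF21 (full-width A).
$wide = mb_decode_numericentity($entity, $decodeAll, 'UTF-8');
echo bin2hex($wide), "\n"; // prints "efbca1"
```

The reverse direction uses the negated offset in the same way.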
dev at glossword info 19-Nov-2003 07:43
Just two great functions for daily use:

/* Converts numeric HTML entities into characters */
function my_numeric2character($t)
{
    $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
    return mb_decode_numericentity($t, $convmap, 'UTF-8');
}
/* Converts any characters into numeric HTML entities */
function my_character2numeric($t)
{
    $convmap = array(0x0, 0x10FFFF, 0, 0x1FFFFF);
    return mb_encode_numericentity($t, $convmap, 'UTF-8');
}
print my_numeric2character('&#8217; &#7936; &#226;');
print my_character2numeric(' ');

 