PHP str_split and UTF-8

mb_split

(PHP 4 >= 4.2.0, PHP 5, PHP 7, PHP 8)

mb_split: Split a multibyte string using a regular expression

Description

mb_split(string $pattern, string $string, int $limit = -1): array|false

Parameters

pattern: The regular expression pattern.

string: The string being split.

limit: If the optional limit argument is given, the string is split into at most limit parts.

Return Values

The result of the split as an array, or false on error.

Notes

The character encoding set by mb_regex_encoding() is used as the character encoding for this function by default.
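The effect of the limit parameter can be illustrated with a short sketch. The sample string and variable names are illustrative assumptions, and the mbstring extension with regex support is assumed to be available:

```php
<?php
// Sketch: split a UTF-8 string on a multibyte delimiter, with and without a limit.
mb_regex_encoding('UTF-8'); // encoding mb_split() uses for the pattern and the subject

$text = '日、に、本、ほん、語、ご'; // illustrative sample data

$all      = mb_split('、', $text);    // no limit: every part is returned
$firstTwo = mb_split('、', $text, 2); // at most 2 parts; the remainder stays in the last one

print_r($all);
print_r($firstTwo);
```

With a limit of 2, everything after the first delimiter is left unsplit in the second element.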

User Contributed Notes 8 notes

a (simpler) way to extract all characters from a UTF-8 string to array with a single call to a built-in function:
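The code for this note is not shown here. One single-call approach that fits the description, offered as a sketch rather than the note author's exact code, is preg_split() with an empty pattern and the u (UTF-8) modifier:

```php
<?php
// Sketch: an empty pattern with the /u modifier matches between UTF-8 characters.
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end of the string.
$string = 'Weiße Rosen sind nicht grün!'; // illustrative sample, must be valid UTF-8

$chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($chars);
```

Without the u modifier the same pattern splits between bytes, so "ß" and "ü" would each be cut into two pieces.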

I figure most people will want a simple way to break-up a multibyte string into its individual characters. Here’s a function I’m using to do that. Change UTF-8 to your chosen encoding method.
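The function itself is missing from this note. A minimal sketch of the idea, assuming the mbstring extension and using a hypothetical function name, walks the string with mb_strlen() and mb_substr():

```php
<?php
// Hypothetical helper (not the note author's original code): returns one array
// element per character, for whatever encoding is passed in.
function mb_chars_sketch(string $string, string $encoding = 'UTF-8'): array
{
    $chars  = [];
    $length = mb_strlen($string, $encoding); // length in characters, not bytes

    for ($i = 0; $i < $length; $i++) {
        $chars[] = mb_substr($string, $i, 1, $encoding); // one character at offset $i
    }

    return $chars;
}

print_r(mb_chars_sketch('әӘөүҗңһ'));
```

Passing the encoding explicitly avoids depending on the mb_internal_encoding() setting.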

In addition to Sezer Yalcin’s tip.

This function splits a multibyte string into an array of characters. Comparable to str_split().
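On PHP 7.4 and later, the built-in mb_str_split() provides the same str_split()-like behaviour, including an optional chunk length. The sample text below is an illustrative assumption:

```php
<?php
// Requires PHP 7.4 or later, where mb_str_split() was added.
$text = '日本語ほんご'; // illustrative sample data

$chars = mb_str_split($text, 1, 'UTF-8'); // one character per element
$pairs = mb_str_split($text, 2, 'UTF-8'); // chunks of two characters

print_r($chars);
print_r($pairs);
```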

To split a string like this: "日、に、本、ほん、語、ご" using the "、" delimiter, I used:

The solution was to set this before:

mb_regex_encoding('UTF-8');
mb_internal_encoding("UTF-8");
$v = mb_split('、', "日、に、本、ほん、語、ご");

and now it’s working:

this is my solution for it:

Another way to str_split a multibyte string:
$s = 'әӘөүҗңһ';

We are talking about multibyte (e.g. UTF-8) strings here, so preg_split will fail for the following string:

‘Weiße Rosen sind nicht grün!’

And because I didn’t find a regex to simulate a str_split I optimized the first solution from adjwilli a bit:

<?php
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
?>

Let me know [by personal email], if someone found a regex to simulate a str_split with mb_split.


str_split

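str_split: Convert a string to an array

Description

str_split(string $string, int $length = 1): array

Converts a string to an array.

Parameters

string: The input string.

length: Maximum length of the chunk.

Return Values

If the optional length parameter is specified, the returned array is broken down into chunks of length bytes each, except the final chunk, which may be shorter if the string does not divide evenly. The default length is 1, meaning every chunk is one byte in size. If length exceeds the length of string, the entire string is returned as the first (and only) array element.

Notes

str_split() splits on bytes rather than characters when used with strings in a multibyte encoding.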
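A short sketch of both behaviours: plain chunking of an ASCII string, and the byte-wise split of a UTF-8 string. The sample strings are illustrative, and the source file is assumed to be saved as UTF-8:

```php
<?php
// Basic use: fixed-size chunks of an ASCII string.
$chunks = str_split('Hello Friend', 3); // 'Hel', 'lo ', 'Fri', 'end'

// Multibyte caveat: "ü" occupies two bytes in UTF-8, so it is cut in half.
$word  = 'grün';            // 4 characters, 5 bytes
$bytes = str_split($word);  // 5 one-byte strings

print_r($chunks);
echo count($bytes), "\n";                               // 5
echo bin2hex($bytes[2]), ' ', bin2hex($bytes[3]), "\n"; // c3 bc
```

The two halves of "ü" (bytes C3 and BC) are not valid UTF-8 on their own, which is why they usually display as replacement characters.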

User Contributed Notes 40 notes

A proper unicode string split;

print_r(str_split($s, 3));
print_r(str_split_unicode($s, 3));
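The definition of str_split_unicode() is missing from this note. The sketch below is one possible implementation under a different, hypothetical name; it assumes UTF-8 input and is not the note author's original code:

```php
<?php
// Hypothetical stand-in for the missing str_split_unicode(): splits a UTF-8
// string into chunks of $length characters (not bytes).
function str_split_unicode_sketch(string $str, int $length = 1): array
{
    if ($length < 1) {
        throw new InvalidArgumentException('Chunk length must be at least 1');
    }

    // One element per UTF-8 character; false means the input was not valid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Regroup the characters into chunks and glue each chunk back together.
    return array_map(
        static function (array $chunk): string {
            return implode('', $chunk);
        },
        array_chunk($chars, $length)
    );
}

$s = 'ÅÄÖ Tåän'; // illustrative sample data
print_r(str_split_unicode_sketch($s, 3));
```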

A new version of the previous "str_split_unicode".

Here’s my version for PHP 4 and below:

The manual doesn’t say what is returned when you pass a different type of variable.

This is the example:

<?php
$str1 = "Long"; // More than 1 char
$str2 = "x"; // Only 1 char
$str3 = ""; // Empty String
$str4 = 34 ; // Integer
$str5 = 3.4 ; // Float
$str6 = true ; // Bool
$str7 = null ; // Null
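The rest of this example is missing. As a sketch of what happens with non-string scalars, assuming the default coercive typing mode (no declare(strict_types=1)), the value is converted to a string first and then split. The empty string and null cases are left out here because their result depends on the PHP version:

```php
<?php
// Assumes coercive typing mode: scalars are converted to strings before splitting.
$fromInt   = str_split(34);   // "34"  -> "3", "4"
$fromFloat = str_split(3.4);  // "3.4" -> "3", ".", "4"
$fromBool  = str_split(true); // "1"   -> "1"

var_dump($fromInt, $fromFloat, $fromBool);
```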

I noticed in the post below me that his function would return an array with an empty key at the end.

So here is just a little fix for it.

I needed a function that could split a string from the end with any left over chunk being at the beginning of the array (the beginning of the string).
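The function for this note is not shown. One way to get that behaviour, sketched under a hypothetical name and assuming single-byte characters, is to reverse the string, split it, and then undo both reversals:

```php
<?php
// Hypothetical helper: chunks are measured from the end of the string, so any
// short leftover chunk ends up first. Byte-based, so single-byte text only.
function str_split_from_end_sketch(string $string, int $length = 1): array
{
    if ($length < 1) {
        throw new InvalidArgumentException('Chunk length must be at least 1');
    }

    // Splitting the reversed string puts the leftover chunk last...
    $chunks = str_split(strrev($string), $length);

    // ...so restore each chunk's character order, then the order of the chunks.
    return array_reverse(array_map('strrev', $chunks));
}

print_r(str_split_from_end_sketch('1234567', 3));
```

This is handy for grouping digits in threes from the right, for example.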

The documentation fails to mention what happens when the string length does not divide evenly by the chunk size. I am not sure whether the behavior is the same in all versions of PHP, so I offer the following code to determine this for your installation. On mine [version 5.2.17], the last chunk is a string containing the remaining characters.
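The test code is missing from this note. A minimal check along the same lines, using an illustrative eight-character string and a chunk size of three:

```php
<?php
// 8 characters split into chunks of 3: the last chunk holds the 2 leftover characters.
$chunks = str_split('abcdefgh', 3);

var_dump($chunks);
```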

The very handy str_split() was introduced in PHP 5, but a lot of us are still forced to use PHP 4 at our host servers. And I am sure a lot of beginners have looked or are looking for a function to accomplish what str_split() does.

Taking advantage of the fact that strings are ‘arrays’ I wrote this tiny but useful e-mail cloaker in PHP, which guarantees functionality even if JavaScript is disabled in the client’s browser. Watch how I make up for the lack of str_split() in PHP 4.3.10.

// The result is an email address in HTML entities which, I hope most email address harvesters can’t read.

<?php
print cloakEmail('someone@nokikon.com');
?>
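The body of cloakEmail() is missing here. The sketch below shows the general idea under a hypothetical name: index the string like an array and emit each byte as a numeric HTML entity. It assumes an ASCII address and is not the note author's original code:

```php
<?php
// Hypothetical stand-in for the missing cloakEmail(): every byte becomes a
// decimal HTML entity such as &#64; for "@". Assumes an ASCII-only address.
function cloak_email_sketch(string $email): string
{
    $encoded = '';
    $length  = strlen($email);

    for ($i = 0; $i < $length; $i++) {
        // Strings can be indexed like arrays; ord() returns the byte value.
        $encoded .= '&#' . ord($email[$i]) . ';';
    }

    return $encoded;
}

print cloak_email_sketch('someone@nokikon.com');
```

A browser renders the entities as the original address, while the page source contains only numeric codes.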


It’s mentioned in the Return Values section above ("If the split_length length exceeds the length of string, the entire string is returned as the first (and only) array element"), but note that an input of an empty string will return array(1) { [0]=> string(0) "" }. Interestingly, an input of NULL will also return array(1) { [0]=> string(0) "" }.

revised function from tatsudoshi

The previous suggestion is almost correct (and will only work for strlen=1). The working PHP4 function is:

Even shorter version:

//place each character (or group of characters) of the string into an array

The fastest way (that fits my needs) to replace str_split() in PHP 4 that I found is this:


Using str_split on a UTF-8 encoded string

I’m currently working on a project, and instead of using regular MySQL queries I thought I’d go ahead and learn how to use PDO.

I have a table called contestants; the database, the table, and all of the columns are in UTF-8. I have ten entries in the contestants table, and their "name" column contains characters such as åäö.

Now, when I fetch an entry from the database, and var_dump the name, I get a good result, a string with all the special characters intact. But what I need to do is to split the string by characters, to get them in an array that I then shuffle.

For instance, I have this string: Test ÅÄÖ Tåän

And when I run str_split I get each character in its own key in an array. The only issue is that all the special characters display as this: �, meaning the array will be like this:

As you can see, it not only messes up the characters, but it also duplicates them in str_split process. I’ve tried several ways to split the string, but they all have the same issue. When I output the string before the split, it shows the special characters just fine.

This is my dbConn.php code:

// Require config file:
require_once('config.inc.php');

And this is the code that I use to fetch from the database and loop:

I’m connecting with utf-8, my php file is utf-8 without BOM and no other special characters on this page share this issue. What could be wrong, or what am I doing wrong?
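The symptoms described are consistent with str_split() cutting each two-byte UTF-8 character into two separate bytes, which would explain both the � output and the apparent duplication. A sketch of the difference, using the string from the question and assuming it is valid UTF-8:

```php
<?php
$name = 'Test ÅÄÖ Tåän'; // 13 characters, 18 bytes in UTF-8

// Byte-wise split: each of Å, Ä, Ö, å, ä becomes two invalid one-byte strings.
$bytes = str_split($name);

// Character-wise split: the /u modifier makes the empty pattern match between characters.
$chars = preg_split('//u', $name, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), ' pieces vs ', count($chars), " characters\n";

shuffle($chars); // safe to shuffle: every element is a complete character
echo implode('', $chars), "\n";
```

On PHP 7.4 and later, mb_str_split($name) could be used in place of the preg_split() call.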


Multibyte String Functions

Multibyte character encoding schemes and their implementations are fairly complex, and describing them is beyond the scope of this documentation. More comprehensive information about encodings and how they work can be found in the sources listed below.

    Unicode materials

    Information on Japanese/Korean/Chinese character encodings

    User Contributed Notes 35 notes

    Please note that all the discussion about mb_str_replace in the comments is pretty pointless. str_replace works just fine with multibyte strings:

    <?php
    $string = '漢字はユニコード';
    $needle = 'は';
    $replace = 'Foo';

    echo str_replace($needle, $replace, $string);
    ?>

    The usual problem is that the string is evaluated as binary string, meaning PHP is not aware of encodings at all. Problems arise if you are getting a value «from outside» somewhere (database, POST request) and the encoding of the needle and the haystack is not the same. That typically means the source code is not saved in the same encoding as you are receiving «from outside». Therefore the binary representations don’t match and nothing happens.

    PHP can input and output Unicode, but in a slightly different way from what Microsoft means: when Microsoft says "Unicode", it implicitly means little-endian UTF-16 with a BOM (FF FE = chr(255).chr(254)), whereas PHP’s "UTF-16" means big-endian with a BOM. For this reason, PHP does not seem to be able to output a Unicode CSV file for Microsoft Excel. Solving this problem is quite simple: just put a BOM in front of a UTF-16LE string.
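The fix described in this note can be sketched as follows. The sample data and variable names are illustrative assumptions, and the mbstring extension is assumed to be available:

```php
<?php
// Sketch: convert UTF-8 text to UTF-16LE and prepend the FF FE byte order mark.
$csvUtf8 = "name\tcity\nÅsa\tMalmö\n"; // illustrative tab-separated data

$bom     = "\xFF\xFE"; // little-endian BOM, the same bytes as chr(255) . chr(254)
$utf16le = $bom . mb_convert_encoding($csvUtf8, 'UTF-16LE', 'UTF-8');

echo strlen($utf16le), " bytes\n"; // 2 BOM bytes plus 2 bytes per character
```

The resulting string can then be written to a file or sent as a download; mb_convert_encoding() does not add the BOM itself when the target is UTF-16LE, which is why it is prepended by hand.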

    SOME multibyte encodings can safely be used in str_replace() and the like, others cannot. It’s not enough to ensure that all the strings involved use the same encoding: obviously they have to, but it’s not enough. It has to be the right sort of encoding.

    UTF-8 is one of the safe ones, because it was designed to be unambiguous about where each encoded character begins and ends in the string of bytes that makes up the encoded text. Some encodings are not safe: the last bytes of one character in a text followed by the first bytes of the next character may together make a valid character. str_replace() knows nothing about "characters", "character encodings" or "encoded text". It only knows about the string of bytes. To str_replace(), two adjacent characters with two-byte encodings just look like a sequence of four bytes, and it’s not going to know it shouldn’t try to match the middle two bytes.

    While real-world examples can be found of str_replace() mangling text, it can be illustrated by using the HTML-ENTITIES encoding. It’s not one of the safe ones. All of the strings being passed to str_replace() are valid HTML-ENTITIES-encoded text so the «all inputs use the same encoding» rule is satisfied.

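    The text is "x<y".

    <?php
    $text = 'x&lt;y';
    mb_internal_encoding('HTML-ENTITIES');
    ?>

    Even though neither ‘l’ nor ‘;’ appears in the text "x<y", str_replace() still found and replaced them. In one case it produced the valid encoding of a different text, and in the other it broke the encoding completely.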

    One more reason to use UTF-8 if you can, I guess.
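The effect can be reproduced with plain str_replace() calls. The replacement characters 'g' and ':' below are assumptions chosen for illustration, and the sketch does not call mb_internal_encoding():

```php
<?php
// The entity-encoded form of the text "x<y".
$text = 'x&lt;y';

// Neither "l" nor ";" occurs in the decoded text, but both occur in the encoded bytes.
$swapped = str_replace('l', 'g', $text); // 'x&gt;y', a valid encoding of a different text
$broken  = str_replace(';', ':', $text); // 'x&lt:y', no longer a complete entity

echo html_entity_decode($text), "\n";    // x<y
echo html_entity_decode($swapped), "\n"; // x>y
echo $broken, "\n";
```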

    Yet another single-line mb_trim() function

    PHP5 has no mb_trim(), so here’s one I made. It works just like trim(), but with the added bonus of PCRE character classes (including, of course, all the useful Unicode ones such as \pZ).
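The function body is missing from this note. A minimal sketch of the idea follows, under a hypothetical name and with an assumed default character class; it is not the note author's original code:

```php
<?php
// Hypothetical sketch: trims characters matching a PCRE character class from
// both ends of a UTF-8 string. $class is inserted into the pattern unescaped,
// so it must not contain "]" or "/".
function mb_trim_sketch(string $string, string $class = '\s\pZ'): string
{
    $pattern = '/\A[' . $class . ']+|[' . $class . ']+\z/u';
    $result  = preg_replace($pattern, '', $string);

    // preg_replace() returns null on error (for example invalid UTF-8 input);
    // in that case the input is returned unchanged.
    return $result ?? $string;
}

// U+3000 (ideographic space) and U+00A0 (no-break space) are both in \pZ.
var_dump(mb_trim_sketch("\u{3000} 日本語\u{00A0}\n"));
```

Plain trim() would leave the ideographic and no-break spaces in place, since it only strips a fixed set of single-byte characters.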

    If you use PHP 4 and don’t need the split_length parameter, here’s the shortest replacement:
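The replacement itself is not shown here. One short candidate, given as a sketch that may differ from what the note's author posted, is preg_split() with an empty pattern:

```php
<?php
// Sketch: splits a string into single bytes without str_split().
// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end.
$chars = preg_split('//', 'abc', -1, PREG_SPLIT_NO_EMPTY);

print_r($chars);
```

Like str_split(), this works on bytes, so it has the same limitation with multibyte strings.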
