php substr replace utf 8
Multibyte String Functions
Multibyte character encoding schemes and their implementations are complex, and describing them is beyond the scope of this documentation. More comprehensive information about encodings and how they work can be found in the sources below.
Unicode materials
Japanese/Korean/Chinese character information
User Contributed Notes 35 notes
Please note that all the discussion about mb_str_replace in the comments is pretty pointless. str_replace works just fine with multibyte strings:
<?php
$string = '漢字はユニコード';
$needle = 'は';
$replace = 'Foo';
echo str_replace($needle, $replace, $string);
// outputs: 漢字Fooユニコード
?>
The usual problem is that the string is evaluated as binary string, meaning PHP is not aware of encodings at all. Problems arise if you are getting a value «from outside» somewhere (database, POST request) and the encoding of the needle and the haystack is not the same. That typically means the source code is not saved in the same encoding as you are receiving «from outside». Therefore the binary representations don’t match and nothing happens.
PHP can input and output Unicode, but in a slightly different sense from what Microsoft means: when Microsoft says "Unicode", it implicitly means little-endian UTF-16 with a BOM (FF FE = chr(255).chr(254)), whereas PHP's "UTF-16" means big-endian with a BOM. For this reason, PHP does not seem to be able to output a Unicode CSV file for Microsoft Excel. Solving this problem is quite simple: just put a BOM in front of a UTF-16LE string.
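The BOM trick described above might look roughly like this. It is a minimal sketch that assumes the mbstring extension is available; the sample data, the variable names and the commented-out output path are made up for illustration.

```php
<?php
// Hypothetical sample data; any UTF-8 text would do.
$csvUtf8 = "名前\t値\r\nテスト\t1\r\n";

// Convert to little-endian UTF-16 and prepend the BOM bytes FF FE by hand.
$bom      = "\xFF\xFE";
$csvUtf16 = $bom . mb_convert_encoding($csvUtf8, 'UTF-16LE', 'UTF-8');

// file_put_contents('export.csv', $csvUtf16); // hypothetical output path
```

Whether a given Excel version then opens the file as expected is not something this sketch can guarantee.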
SOME multibyte encodings can safely be used in str_replace() and the like, others cannot. It’s not enough to ensure that all the strings involved use the same encoding: obviously they have to, but it’s not enough. It has to be the right sort of encoding.
UTF-8 is one of the safe ones, because it was designed to be unambiguous about where each encoded character begins and ends in the string of bytes that makes up the encoded text. Some encodings are not safe: the last bytes of one character in a text followed by the first bytes of the next character may together make a valid character. str_replace() knows nothing about "characters", "character encodings" or "encoded text". It only knows about the string of bytes. To str_replace(), two adjacent characters with two-byte encodings just look like a sequence of four bytes, and it is not going to know that it shouldn't try to match the middle two bytes.
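A closely related failure is easy to reproduce with Shift_JIS, where the second byte of some characters is the same as an ASCII byte. The sketch below hard-codes the byte sequences, so it does not depend on the encoding of the source file. The choice of the character 表 is an illustration and is not taken from the note above.

```php
<?php
// Shift_JIS bytes for the character 表: the second byte, 0x5C, is also the ASCII backslash.
$sjis = "\x95\x5C";
// The same character in UTF-8: every byte has the high bit set.
$utf8 = "\xE8\xA1\xA8";

$sjisResult = str_replace('\\', '/', $sjis);
$utf8Result = str_replace('\\', '/', $utf8);

var_dump($sjisResult === $sjis); // bool(false): half of a character was rewritten
var_dump($utf8Result === $utf8); // bool(true): no byte of a multibyte UTF-8 character is ASCII
```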
While real-world examples can be found of str_replace() mangling text, it can be illustrated by using the HTML-ENTITIES encoding. It’s not one of the safe ones. All of the strings being passed to str_replace() are valid HTML-ENTITIES-encoded text so the «all inputs use the same encoding» rule is satisfied.
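The text is "x < y", which is 'x &lt; y' in HTML-ENTITIES:
<?php
$string = 'x &lt; y';
mb_internal_encoding('HTML-ENTITIES');
?>
Even though neither 'l' nor ';' appears in the text "x < y", str_replace() still matched and replaced them in the encoded bytes: in one case the text became "x > y", and in the other it broke the encoding completely.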
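The effect can be reproduced without touching the mbstring settings at all. This is a small sketch with made-up variable names that uses html_entity_decode() only to show what the entity-encoded text means after each replacement.

```php
<?php
$encoded = 'x &lt; y'; // entity-encoded form of "x < y"

$swapped = str_replace('l', 'g', $encoded); // the visible text contains no "l"
$broken  = str_replace(';', '', $encoded);  // the visible text contains no ";"

echo html_entity_decode($swapped), "\n"; // x > y
echo $broken, "\n";                      // x &lt y (the entity has lost its terminator)
```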
One more reason to use UTF-8 if you can, I guess.
Yet another single-line mb_trim() function
PHP5 has no mb_trim(), so here's one I made. It works just like trim(), but with the added bonus of PCRE character classes (including, of course, all the useful Unicode ones such as \pZ).
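The note's own function is not preserved here, so the following is only a sketch of the same idea. The name utf8_trim and its default character class are assumptions, and the helper is deliberately not called mb_trim so that it cannot clash with any built-in.

```php
<?php
// Hypothetical helper: trims whatever matches the given PCRE character-class body.
function utf8_trim(string $string, string $trimChars = '\s\pZ'): string
{
    // $trimChars is spliced into a character class, so it may hold classes such as \pZ.
    $pattern = '/^[' . $trimChars . ']+|[' . $trimChars . ']+$/u';
    // preg_replace() returns null on invalid UTF-8; fall back to the input in that case.
    return preg_replace($pattern, '', $string) ?? $string;
}

echo utf8_trim("\u{3000} 漢字 \u{00A0}"); // 漢字
```

Because the second argument goes straight into the pattern, it should only ever come from trusted code, never from user input.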
substr_replace
(PHP 4, PHP 5, PHP 7, PHP 8)
substr_replace - Replace text within a portion of a string
Description
Parameters
Return Values
The result string is returned. If string is an array, then an array is returned.
Changelog
Version | Description |
---|---|
8.0.0 | length is now nullable. |
Examples
Example #1 Simple substr_replace() examples
Example #2 Using substr_replace() to replace multiple strings at once
The above example will output:
Notes
Note: This function is binary-safe.
See Also
User Contributed Notes 35 notes
Forget all of the mb_substr_replace() implementations mentioned in this page, they’re all buggy.
Here is a version that mimics the behavior of substr_replace() exactly:
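The version this note refers to is not preserved here. As a rough illustration of what such a function has to do, the sketch below handles plain strings only (no array arguments) and counts offsets in characters. The function name and the default encoding are assumptions, and it makes no claim to match substr_replace() in every edge case.

```php
<?php
// Hypothetical helper, not the note's original code.
function mb_substr_replace_sketch(
    string $string,
    string $replacement,
    int $start,
    ?int $length = null,
    string $encoding = 'UTF-8'
): string {
    $total = mb_strlen($string, $encoding);

    // Clamp $start into 0..$total; a negative value counts from the end.
    $start = $start < 0 ? max(0, $total + $start) : min($start, $total);

    if ($length === null) {
        $length = $total - $start;                   // replace up to the end
    } elseif ($length < 0) {
        $length = max(0, $total + $length - $start); // stop that many characters before the end
    } else {
        $length = min($length, $total - $start);
    }

    return mb_substr($string, 0, $start, $encoding)
        . $replacement
        . mb_substr($string, $start + $length, null, $encoding);
}

echo mb_substr_replace_sketch('漢字はユニコード', 'Foo', 2, 1); // 漢字Fooユニコード
```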
PHP version of Java’s removeCharAt() function:
Using substr_replace() can be avoided by using substr() instead:
This can be useful when you need to replace parts of multibyte strings, such as strings encoded in UTF-8. There isn't a multibyte variant of substr_replace(), but for substr() there is mb_substr(). For more information on multibyte strings see http://nl3.php.net/manual/en/ref.mbstring.php
I’ve just taken a look at the post by ntoniazzi and I have a very small correction to make.
In the second if statement, it should be a triple equals, so:
I wrote a function that you can use for example in combination with a search script to cut off the articles that are too long.
I recently ran across a situation where I need to strip a heavily nested html list such that only the top level was preserved. I started with a regular expression solution, but found that I kept matching the wrong closing ul with an outer opening ul.
This was my alternative solution, and it seems to work well:
Hope this helps someone.
This will truncate a longer string to a smaller string of specified length while replacing the middle portion with a separator exactly in the middle.
// prints "abcdefghij/.../56789z.jpg"
I have a little function that works like substr_replace() and that I use for a few purposes. Maybe someone needs it.
This is a small powerful function that performs its job flawlessly.
I suggest changing the function suggested by Guru Evi slightly. I found that it doesn’t work as written here.
If your string is not long enough to meet what you specify in start and length then the replacement string is added towards the end of the string.
I use strip_tags to strip out the HTML; otherwise you might end up with broken HTML (when a tag opens in the string but its closing tag is lost because of the cut-off).
THE DOT DOT DOT ISSUE
PROBLEM:
You want to abbreviate a string.
E.g. you want "BritneySpears" to show as "BritneySpe...", being only the first ten characters followed by "..."
This will result in BritneySpe...
The older function would end up looking like «blah blah. » or «blah blah. » which doesn’t look so nice to me.
$punctuation = ". ;,-"; // punctuation you want removed
Here is a simple function to shorten a string and add an ellipsis
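The function itself is not preserved here, so this is a minimal sketch of the idea. The name shorten, its parameters and the use of mb_substr() (so that multibyte text is not cut mid-character) are assumptions.

```php
<?php
// Hypothetical helper: keep at most $limit characters and append "..." only when text was cut.
function shorten(string $text, int $limit, string $ellipsis = '...'): string
{
    if (mb_strlen($text, 'UTF-8') <= $limit) {
        return $text;
    }
    return mb_substr($text, 0, $limit, 'UTF-8') . $ellipsis;
}

echo shorten('BritneySpears', 10); // BritneySpe...
```

Strings that already fit are returned unchanged, so nothing short ever gains a stray ellipsis.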
This may be obvious to others, but I just spent hours and my feeble brain only caught up to it after a long break.
If you are looping through a string that has multiple substrings to replace, you have to add an offset factor to each original offset, because every replacement you have already made shifts the positions of everything after it. Here is a real-world example:
From draft.js we get paragraphs with multiple links designated only by offset, anchor text length, url, and target. So each anchor text must be wrapped in an anchor tag (<a href="url" target="target">anchortext</a>) to save proper content in the database.
Here is the implementation of offset factor:
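The original implementation is not preserved here. The sketch below shows the technique on made-up data: the array shape, the URLs and the variable names are assumptions, the links are assumed to be sorted by ascending offset, and the text is ASCII so that byte offsets and character offsets coincide.

```php
<?php
// Hypothetical input: plain paragraph text plus link ranges, sorted by ascending offset.
$text  = 'Visit PHP and MDN today';
$links = [
    ['offset' => 6,  'length' => 3, 'url' => 'https://www.php.net/'],
    ['offset' => 14, 'length' => 3, 'url' => 'https://developer.mozilla.org/'],
];

$shift = 0; // how far earlier replacements have moved the remaining text
foreach ($links as $link) {
    $position   = $link['offset'] + $shift;
    $anchorText = substr($text, $position, $link['length']);
    $html       = '<a href="' . $link['url'] . '">' . $anchorText . '</a>';
    $text       = substr_replace($text, $html, $position, $link['length']);
    $shift     += strlen($html) - $link['length'];
}

echo $text;
```

An alternative that needs no offset factor is to apply the replacements from the last link to the first, since changes near the end of the string never move the earlier offsets.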
I hope this helps a noobie 🙂 If there is another easier way, I would love to hear about it.
The first example can be simplified:
$input = array('A: XXX', 'B: XXX', 'C: XXX');
print_r(substr_replace($input, 'YYY', 3, 3));
output: Array ( [0] => A: YYY [1] => B: YYY [2] => C: YYY )
I recently needed a routine that would remove the characters in one string from another, like the regex
I don’t know if this function is multibyte safe but I’ve written a function that will do the same in multibyte mode.
Just to add to the examples: if replacement is longer than length, only length characters are removed from string and all of replacement is put in their place, so strlen($string) increases.
$var = 'ABCDEFGH:/MNRPQR/';
/* Should return ABCDEFGH:/testingRPQR/ */
echo substr_replace($var, 'testing', 10, 2);
If you would like to remove characters from the start or end of a string, try the substr() function.
The comment by geniusdex is a good one. Short, simple functions are the best. But if the string is not longer than the limit set, NOTHING is returned. Here is the function re-done to always return a string:
Regarding "...", even the short functions are too long and complicated, and there's no need to use substr_replace. substr() works better and is way faster prior to 4.3.5, as the poster below stated.
This is my version of making dotted strings:
To abbreviate links into '...' if they exceed a certain amount of space, use the preg_replace function instead.
For instance, you grabbed the headlines of a news site for use on your own page and the lines are too long:
PHP substr_replace() Function
Example
Replace «Hello» with «world»:
Definition and Usage
The substr_replace() function replaces a part of a string with another string.
Note: If the start parameter is a negative number and length is less than or equal to start, length becomes 0.
Note: This function is binary-safe.
Syntax
Parameter Values
Technical Details
Return Value: | Returns the replaced string. If the string is an array then the array is returned |
---|---|
PHP Version: | 4+ |
Changelog: | As of PHP 4.3.3, all parameters now accept arrays |
More Examples
Example
Start replacing at the 6th position in the string (replace «world» with «earth»):
Example
Start replacing at the 5th position from the end of the string (replace «world» with «earth»):
Example
Insert «Hello» at the beginning of «world»:
Example
Replace multiple strings at once. Replace «AAA» in each string with «BBB»:
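The code for the examples listed above is not preserved here. The calls below are one plausible way to write them; the input strings for the array case are assumptions.

```php
<?php
echo substr_replace('Hello', 'world', 0), "\n";        // world
echo substr_replace('Hello world', 'earth', 6), "\n";  // Hello earth
echo substr_replace('Hello world', 'earth', -5), "\n"; // Hello earth
echo substr_replace('world', 'Hello ', 0, 0), "\n";    // Hello world

// With an array subject, each element is processed and an array is returned.
$replaced = substr_replace(['1: AAA', '2: AAA', '3: AAA'], 'BBB', 3, 3);
echo implode('; ', $replaced), "\n";                   // 1: BBB; 2: BBB; 3: BBB
```

A length of 0, as in the fourth call, removes nothing and therefore turns the replacement into an insertion.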
php substr() function with utf-8 leaves � marks at the end
Here is simple code
and it outputs something like this:
Бензин Офиси А.С. также производит все типы жира и смазок и их побочных продук�.
I tried mb_substr() with no luck. How to do this the right way?
7 Answers
The comments above are correct so long as you have mbstring enabled on your server.
Here’s the php docs:
A proper (logical) alternative for unicode strings;
PHP5 does not understand UTF-8 natively. It is proposed for PHP6, if it ever comes out.
Use the multibyte string functions to manipulate UTF-8 strings safely.
For instance, mb_substr() in your case.
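A small sketch of the difference, assuming the mbstring extension is enabled and the source file is saved as UTF-8. The sample text and variable names are made up.

```php
<?php
$text = 'побочных продуктов'; // Cyrillic letters take 2 bytes each in UTF-8

$byBytes = substr($text, 0, 5);             // 5 bytes: cuts the third letter in half
$byChars = mb_substr($text, 0, 5, 'UTF-8'); // 5 characters

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false)
var_dump($byChars === 'побоч');                  // bool(true)
```

Passing the encoding explicitly avoids depending on the configured internal encoding, which is one possible reason for mb_substr() appearing not to help.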
If your strings may contain Unicode (multi-byte) characters and you don’t want to break these, replace substr with one of the following two, depending on what you want:
Limit to 142 characters:
Limit to 142 bytes:
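The two snippets are not preserved here. One way to express both limits, assuming mbstring is available, is mb_substr() for characters and mb_strcut() for bytes. The input string below is made up so that the byte limit falls in the middle of a character.

```php
<?php
$word = 'a' . str_repeat('ж', 100); // hypothetical input: 101 characters, 201 bytes in UTF-8

$limitChars = mb_substr($word, 0, 142, 'UTF-8'); // at most 142 characters
$limitBytes = mb_strcut($word, 0, 142, 'UTF-8'); // at most 142 bytes, never splitting a character

echo mb_strlen($limitChars, 'UTF-8'), "\n"; // 101 (the input is shorter than the limit)
echo strlen($limitBytes), "\n";             // 141 (the cut backs off to a character boundary)
```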
$foo = mb_substr($word, 0, mb_strlen($word)-1);
Never use a constant length in substr() for a UTF-8 string:
There is a 50% chance you will get half of a character at the end of the string.
substr
(PHP 4, PHP 5, PHP 7, PHP 8)
substr - Return part of a string
Description
Parameters
If string is shorter than offset characters, an empty string will be returned.
Example #1 Using a negative offset
If length is given and is positive, the string returned will contain at most length characters beginning from offset (depending on the length of string).
If length is omitted, the substring starting from offset until the end of the string will be returned.
Example #2 Using a negative length
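The code for these two examples is not preserved here. The calls below illustrate the behaviour described for negative offset and length values, using a made-up input string.

```php
<?php
// Negative offset: start counting from the end of the string.
echo substr('abcdef', -1), "\n";    // f
echo substr('abcdef', -3, 1), "\n"; // d

// Negative length: stop that many characters before the end.
echo substr('abcdef', 0, -1), "\n"; // abcde
echo substr('abcdef', 2, -1), "\n"; // cde
```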
Return Values
Returns the extracted part of string, or an empty string.
Changelog
Examples
Example #3 Basic substr() usage
Example #4 substr() casting behaviour
class apple {
public function __toString() {
return "green";
}
}
The above example will output:
Example #5 Invalid character range
Output of the above example in PHP 8:
Output of the above example in PHP 7:
See Also
User Contributed Notes 36 notes
For getting a substring of UTF-8 characters, I highly recommend mb_substr
The following functions may make it easier to extract the needed parts from a string:
Coming to PHP from classic ASP, I am used to the Left() and Right() functions built into ASP, so I wrote a quick PHP version. I hope these help someone else making the switch.
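The note's functions are not preserved here. A minimal sketch of such helpers, with assumed names and signatures and assuming a non-negative length, could be:

```php
<?php
// Hypothetical re-creations of ASP's Left() and Right().
function left(string $str, int $length): string
{
    return substr($str, 0, $length);
}

function right(string $str, int $length): string
{
    // substr($str, -0) would return the whole string, so handle 0 explicitly.
    return $length <= 0 ? '' : substr($str, -$length);
}

echo left('Hello world', 5), "\n";  // Hello
echo right('Hello world', 5), "\n"; // world
```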
Shortens a file name while keeping its extension visible.
### SUB STRING BY WORD USING substr() and strpos() #####
### THIS SCRIPT WILL RETURN PART OF STRING WITHOUT WORD BREAK ###
Drop extensions of a file (even from a file location string)
$filename = "c:/some dir/abc defg. hi.jklmn";
?>
output: c:/some dir/abc defg. hi
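The function that produced this output is not preserved here. One short way to get the same result is to cut at the last dot. The variable name is an assumption, and the sketch does not handle paths whose last dot sits in a directory name.

```php
<?php
$filename = 'c:/some dir/abc defg. hi.jklmn'; // sample path from the note; the variable name is assumed

$dot      = strrpos($filename, '.');
$stripped = $dot === false ? $filename : substr($filename, 0, $dot);

echo $stripped; // c:/some dir/abc defg. hi
```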
Hope it may help somebody like me.. (^_^)
PS:I’m sorry my english is too poor. 🙁
If you want to have a string BETWEEN two strings, just use this function:
$string = "123456789";
$a = "12";
$b = "9";
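The function itself is not preserved here. A sketch of one possible version follows, in which the name between and the choice to return an empty string when a delimiter is missing are assumptions.

```php
<?php
// Hypothetical helper: returns the text between the first $a and the next $b, or '' if either is missing.
function between(string $string, string $a, string $b): string
{
    $start = strpos($string, $a);
    if ($start === false) {
        return '';
    }
    $start += strlen($a);

    $end = strpos($string, $b, $start);
    if ($end === false) {
        return '';
    }
    return substr($string, $start, $end - $start);
}

echo between('123456789', '12', '9'); // 345678
```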
If you need to parse utf-8 strings char by char, try this one:
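The note's code is not preserved here. One common approach, shown as a sketch with a made-up sample string, is to split on the empty pattern with the /u modifier so that PCRE works on code points instead of bytes.

```php
<?php
$text = 'aж漢😀'; // 1-, 2-, 3- and 4-byte UTF-8 characters

// The /u modifier makes PCRE split between code points rather than between bytes.
$chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

foreach ($chars as $char) {
    echo $char, ' (', strlen($char), " bytes)\n";
}
```

This yields code points, so a base letter followed by a combining mark still arrives as two separate items.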
Be aware of a slight inconsistency between substr and mb_substr
mb_substr("", 4); returns empty string
substr("", 4); returns boolean false
tested in PHP 7.1.11 (Fedora 26) and PHP 5.4.16 (CentOS 7.4)
I wanted to work out the fastest way to get the first few characters from a string, so I ran the following experiment to compare substr, direct string access and strstr:
(substr) 3.24
(direct access) 11.49
(strstr) 4.96
(With standard deviations 0.01, 0.02 and 0.04)
THEREFORE substr is the fastest of the three methods for getting the first few letters of a string.
Here is a great function that simply converts an IP address to a number using substr() with a negative offset.
You may need it if you want to compare IP addresses converted to numbers.
For example when using ip2country, or eliminating a range of IP addresses from your website 😀
$min = ip2no("10.11.1.0");
$max = ip2no("111.11.1.0");
$visitor = ip2no("105.1.20.200");
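The body of ip2no() is not preserved here. A plausible reconstruction, offered only as a sketch, zero-pads each octet to three digits by taking the last three characters of "000" plus the octet.

```php
<?php
// Hypothetical reconstruction of ip2no(): each octet becomes exactly three digits.
function ip2no(string $ip): string
{
    $number = '';
    foreach (explode('.', $ip) as $octet) {
        $number .= substr('000' . $octet, -3); // negative offset keeps the last three characters
    }
    return $number;
}

$min     = ip2no('10.11.1.0');    // 010011001000
$max     = ip2no('111.11.1.0');   // 111011001000
$visitor = ip2no('105.1.20.200'); // 105001020200

// Fixed-width digit strings can be compared safely as strings.
var_dump(strcmp($min, $visitor) <= 0 && strcmp($visitor, $max) <= 0); // bool(true)
```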
I created some functions for entity-safe splitting+lengthcounting:
I needed a function like lpad from oracle, or right from SQL
then I use this code :
Just a little function to cut a string by the wanted amount. Works in both directions.
Anyone coming from the Python world will be accustomed to making substrings by using a «slice index» on a string. The following function emulates basic Python string slice behavior. (A more elaborate version could be made to support array input as well as string, and the optional third «step» argument.)
The output from the examples:
c
cd
cdefg
abcd
abcd
efg
I have developed a function with a similar outcome to jay’s
It checks whether the last character is a space (and does it the normal way if it is).
It explodes the string into an array of separate words; the effect is that it chops off everything after and including the last space.
I needed to cut a string after x chars at a html converted utf-8 text (for example Japanese text like 嬰謰弰脰欰罏).
The problem was the differing byte lengths of the characters, so I wrote the following function to handle that.
Perhaps it helps.
Using a 0 as the last parameter for substr().
[English]
I created Python-like slicing of lists and strings using PHP's substr() and strrev() functions.
About the pattern structure:
[start:stop:step]
Using this is similar to simple substr.