php nbsp to space
How to replace a decoded non-breaking space (nbsp)
How can I replace it (using preg_replace) without first encoding it to entities?
What are the possibilities beyond regexp?
My question is not exactly "how" to do this (since I could encode entities, remove the entities I don’t need, and decode the entities again), but how to remove them with just str_replace or preg_replace.
2 Answers
Problem Explanation
The reason why it’s not working is that you are specifying the non-breaking space incorrectly.
A Bit of Theory
Legacy character encodings used a constant number of bits to encode every character in their set. For example, the original ASCII encoding used 7 bits per character, and extended ASCII used 8 bits.
UTF-8 is a so-called variable-width character encoding, which means that the number of bits used to represent individual characters varies: in UTF-8, a character code consists of one to four 8-bit bytes (octets). Loosely similar to Huffman coding, the characters with the lowest code points (ASCII, which dominates much text) have the shortest codes, while rarer characters have longer codes. That helps reduce the data size of the average text.
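A small sketch of the variable width described above. The sample characters are my own choice for illustration; strlen() counts bytes, so it shows how many octets each character takes in UTF-8.

```php
<?php
// strlen() counts bytes, not characters, so it exposes the encoded width.
$samples = [
    'A (U+0041)'              => "A",
    'no-break space (U+00A0)' => "\u{00A0}",
    'euro sign (U+20AC)'      => "\u{20AC}",
    'emoji (U+1F600)'         => "\u{1F600}",
];

// array_map() keeps the string keys when given a single array.
$widths = array_map('strlen', $samples);

// The two bytes that make up the UTF-8 non-breaking space, as hex.
$nbspHex = bin2hex("\u{00A0}");
```

The non-breaking space is therefore a two-byte sequence in UTF-8, which is why searching for a single byte does not find it.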
Solution
You can replace all occurrences of the UTF-8 non-breaking space in text using a simple (and fast) str_replace or using a more flexible regular expression, depending on your needs:
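The answer’s original code did not survive extraction, so the following is only a sketch of the two approaches it names. The function names are hypothetical; the byte sequence 0xC2 0xA0 is the UTF-8 encoding of U+00A0.

```php
<?php
// Plain byte replacement. Double quotes are required so that PHP itself
// turns the \x escapes into the two bytes of the UTF-8 non-breaking space.
function nbsp_to_space_bytes(string $text): string
{
    return str_replace("\xC2\xA0", ' ', $text);
}

// Regex replacement. With the /u modifier PCRE treats pattern and subject
// as UTF-8, so the code point can be written as \x{00A0} in single quotes.
function nbsp_to_space_regex(string $text): string
{
    // preg_replace() returns null on error (e.g. invalid UTF-8 input);
    // fall back to the unchanged text in that case.
    return preg_replace('/\x{00A0}/u', ' ', $text) ?? $text;
}
```

The str_replace variant is the simpler choice when only this one character matters; the regex variant is easier to extend to other code points.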
Notes
In contrast, the preg_replace function itself understands the textual representation of character codes, so you don’t need PHP to convert them into actual characters, and you can use apostrophes (single quotes, ') to enclose the search string in this case.
Does html_entity_decode replace &nbsp; too? If not, how do I replace it?
I have a situation where I am passing a string to a function. I want to convert &nbsp; to " " (a blank space) before passing it to the function. Does html_entity_decode do that?
If not, how can I do it?
I am aware of str_replace, but is there any other way?
4 Answers
You might wonder why trim(html_entity_decode('&nbsp;')); doesn’t reduce the string to an empty string, that’s because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but ASCII code 160 (0xa0) in the default ISO 8859-1 characterset.
You can use str_replace() to replace character #160 with a space:
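The code for this answer was lost, so here is a minimal sketch of the idea. It passes ISO-8859-1 explicitly, because only in a single-byte encoding is the non-breaking space the lone byte 160. With UTF-8, which current PHP versions use by default, the decoded character is the two bytes "\xC2\xA0", and replacing chr(160) alone would leave a stray 0xC2 byte behind.

```php
<?php
// Decode explicitly as ISO-8859-1 so &nbsp; becomes the single byte 0xA0.
$decoded = html_entity_decode('a&nbsp;b&nbsp;', ENT_QUOTES, 'ISO-8859-1');

// chr(160) is that byte; swap it for an ordinary space, then trim.
$clean = trim(str_replace(chr(160), ' ', $decoded));

// The same input decoded as UTF-8 needs the two-byte search string instead.
$decodedUtf8 = html_entity_decode('a&nbsp;b&nbsp;', ENT_QUOTES, 'UTF-8');
$cleanUtf8   = trim(str_replace("\xC2\xA0", ' ', $decodedUtf8));
```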
html_entity_decode does convert &nbsp; to a space, just not a "simple" one (ASCII 32), but a non-breaking space (character 160, U+00A0), as this is the definition of &nbsp;.
Carefully read the Notes, maybe that’s the issue you are facing:
You might wonder why trim(html_entity_decode('&nbsp;')); doesn’t reduce the string to an empty string, that’s because the '&nbsp;' entity is not ASCII code 32 (which is stripped by trim()) but ASCII code 160 (0xa0) in the default ISO 8859-1 characterset.
Not sure if it is a viable solution for most cases, but I used trim(strip_tags(html_entity_decode(htmlspecialchars_decode($html), ENT_QUOTES, 'UTF-8'))); in my most recent application. Adding htmlspecialchars_decode() was initially the only thing that would actually strip them.
How to add extra whitespace in PHP?
I was wondering how I can add extra whitespace in PHP. Is it something like \s? Please help, thanks.
Is there a tutorial that lists these kinds of things? Thanks.
14 Answers
To render more than one whitespace character in most web browsers, use &nbsp; instead of normal white spaces.
For showing data in raw format (with the exact number of spaces and line breaks), use the HTML <pre> tag; text inside it renders exactly as written, so 8 consecutive white spaces stay 8 white spaces.
Or you can use some CSS to style the current block so that it does not break text or strip spaces (I’m not certain, but something like the white-space property).
Either way, the PHP output will be the same; it is the browser itself that strips repeated white spaces and renders them as one.
PHP (typically) generates HTML output for a web-site.
When displaying HTML, the browser (typically) collapses all whitespace in text into a single space. Sometimes, between tags, it even collapses whitespace to nothing.
In order to persuade the browser to display whitespace, you need to include special markup like &nbsp; or <br/> in your HTML to add non-breaking whitespace or new lines, respectively.
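As a rough sketch of that advice, the helper below (a hypothetical name, not from the answer) turns plain text into HTML that keeps its spaces and line breaks. It assumes UTF-8 input, and it replaces every space with &nbsp;, which also prevents the browser from wrapping lines at those spaces.

```php
<?php
function preserve_whitespace_html(string $text): string
{
    // Escape first, so any markup in the text is shown rather than interpreted.
    $escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Replace spaces before calling nl2br(), because the <br /> tag that
    // nl2br() inserts contains a space of its own.
    $spaced = str_replace(' ', '&nbsp;', $escaped);

    // nl2br() inserts <br /> before each newline and keeps the newline.
    return nl2br($spaced);
}
```

Wrapping the text in a <pre> element achieves a similar result without rewriting the string.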
For adding space characters you can use
Use this one. It will provide 60 spaces; that number is the second parameter.
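The call this answer refers to was lost. One built-in function whose second parameter is a count is str_repeat, so the sketch below assumes that is what was meant; treat it as a guess, not as the answer’s original code.

```php
<?php
// str_repeat(string, times): the second parameter is the repeat count.
// &nbsp; is used because browsers collapse runs of ordinary spaces.
$indent = str_repeat('&nbsp;', 60);

// For plain-text output (or inside <pre>), ordinary spaces work as well.
$plainIndent = str_repeat(' ', 60);
```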
Use str_pad function. It is very easy to use. One example is below:
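The answer’s example is missing, so here is a short sketch of str_pad. The second parameter is the total target length in bytes, not the number of characters to add; the sample strings are my own.

```php
<?php
// Pad on the right with spaces (the default) up to 10 bytes in total.
$right = str_pad('PHP', 10);

// Pad on the left instead.
$left = str_pad('PHP', 10, ' ', STR_PAD_LEFT);

// Pad on both sides with a custom character.
$both = str_pad('PHP', 9, '.', STR_PAD_BOTH);
```

As with any run of ordinary spaces, a browser will collapse the padding unless it sits inside <pre> or similar.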
pre is your friend.
Or you can "View Source" in your browser (Ctrl+U in most browsers).
source
output
Is this for display purposes? If so, you really should consider separating your display from your logic and using style sheets for formatting. Being server-side, PHP should really only provide and accept data. While you could surely use PHP to do what you are asking, I am a very firm believer in keeping display and logic as separate as possible. With styles you can do all of your typesetting.
Give the output class wrappers and style them accordingly.
You can also use this
Although, you should define the styles globally, and not inline as I have done in this example.
When you are outputting strings from PHP you can use "\n" for a new line and "\t" for a tab.
However, escape sequences like \n or \t only work in double quotes ("), not single quotes (').
They make your generated markup look better when viewing source.
When you view source, the output will have nice line breaks thanks to \n.
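A minimal sketch of the quoting difference described in this answer; the sample strings are my own.

```php
<?php
// In double quotes, \n and \t are converted to a newline and a tab.
$double = "Line one\n\tLine two";

// In single quotes, the backslash sequences stay as literal characters.
$single = 'Line one\n\tLine two';

// Newlines in generated HTML only tidy up "View Source";
// the browser still renders them as a single space.
$html = "<ul>\n\t<li>First</li>\n\t<li>Second</li>\n</ul>\n";
```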
site design / logo © 2021 Stack Exchange Inc; user contributions licensed under cc by-sa. rev 2021.9.17.40238
htmlspecialchars
(PHP 4, PHP 5, PHP 7, PHP 8)
htmlspecialchars: Convert special characters to HTML entities
Description
Parameters
The string being converted ( string ).
An optional argument defining the encoding used when converting characters.
If omitted, encoding defaults to the value of the default_charset configuration option.
Although this argument is technically optional, you are highly encouraged to specify the correct value for your code, because the default_charset configuration option may be set incorrectly for the given input.
The following character sets are supported:
Charset | Aliases | Description |
---|---|---|
ISO-8859-1 | ISO8859-1 | Western European, Latin-1. |
ISO-8859-5 | ISO8859-5 | Little-used Cyrillic charset (Latin/Cyrillic). |
ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1). |
UTF-8 | | ASCII-compatible multi-byte 8-bit Unicode. |
cp866 | ibm866, 866 | DOS-specific Cyrillic charset. |
cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. |
cp1252 | Windows-1252, 1252 | Windows-specific charset for Western European. |
KOI8-R | koi8-ru, koi8r | Russian. |
BIG5 | 950 | Traditional Chinese, mainly used in Taiwan. |
GB2312 | 936 | Simplified Chinese, national standard character set. |
BIG5-HKSCS | | Big5 with Hong Kong extensions. |
Shift_JIS | SJIS, SJIS-win, cp932, 932 | Japanese. |
EUC-JP | EUCJP, eucJP-win | Japanese. |
MacRoman | | Charset that was used by Mac OS. |
'' | | An empty string activates detection from the script encoding (Zend multibyte), default_charset and the current locale (see nl_langinfo() and setlocale() ), in this order. Not recommended. |
Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.
When double_encode is turned off, PHP will not encode existing HTML entities. By default everything is converted.
Return Values
The converted string ( string ).
Examples
Example #1 htmlspecialchars() example
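The manual’s example code is missing from this copy. The following is a sketch written for this document rather than the manual’s own listing; the input strings are my own. It passes flags and encoding explicitly, as the parameter notes recommend.

```php
<?php
// ENT_QUOTES converts both double and single quotes.
$escaped = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES, 'UTF-8');

// By default, an entity that is already present gets encoded again.
$twice = htmlspecialchars('Tom &amp; Jerry', ENT_QUOTES, 'UTF-8');

// With double_encode set to false, existing entities are left alone,
// while a bare ampersand is still converted.
$once = htmlspecialchars('Tom &amp; Jerry & co', ENT_QUOTES, 'UTF-8', false);
```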
How can I strip whitespace from a PHP variable?
I know this comment on PHP.net. I would like to have a tool similar to tr for PHP, so that I can simply run
I unsuccessfully ran the function php_strip_whitespace with
I also ran the regex function unsuccessfully
15 Answers
To strip any whitespace, you can use a regular expression
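The answer’s code was lost, so this is a minimal sketch of the regular-expression approach. The function name is hypothetical.

```php
<?php
// Remove every run of whitespace that \s matches without the /u modifier
// (space, tab, newline, carriage return and similar ASCII whitespace).
function strip_ascii_whitespace(string $s): string
{
    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace('/\s+/', '', $s) ?? $s;
}
```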
See also this answer for something which can handle whitespace in UTF-8 strings.
A regular expression in PHP does not account for UTF-8 characters by default. Without further options, the \s meta-character only covers the whitespace of the original Latin (ASCII) set. Therefore, the following command only removes tabs, spaces, carriage returns and new lines
With UTF-8 becoming mainstream, this expression will more frequently fall short when it reaches the newer UTF-8 characters, leaving behind white space that \s cannot account for.
To deal with the new types of white space introduced in Unicode/UTF-8, a more extensive pattern is required to match and remove modern white space.
Because regular expressions by default do not recognize multi-byte characters, each one has to be matched as a complete, delimited byte sequence; otherwise byte segments shared with other UTF-8 characters would be altered (for example, a bare \x80 from the quad set would also replace the \x80 bytes inside smart quotes).
This accounts for and removes tabs, newlines, vertical tabs, form feeds, carriage returns and spaces, and additionally:
next line, non-breaking space, Mongolian vowel separator, [en quad, em quad, en space, em space, three-per-em space, four-per-em space, six-per-em space, figure space, punctuation space, thin space, hair space, zero width space, zero width non-joiner, zero width joiner], line separator, paragraph separator, narrow no-break space, medium mathematical space, word joiner, ideographic space, and the zero width non-breaking space.
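The answer’s pattern did not survive extraction. The sketch below covers the same list of characters but takes a different route from the byte-sequence approach described above: it uses the /u modifier so that code points can be written directly. It assumes valid UTF-8 input, and the function name is hypothetical.

```php
<?php
// Strip ASCII whitespace plus the Unicode characters listed above.
function strip_unicode_whitespace(string $s): string
{
    $pattern = '/[\s'
        . '\x{0085}'           // next line
        . '\x{00A0}'           // non-breaking space
        . '\x{180E}'           // Mongolian vowel separator
        . '\x{2000}-\x{200D}'  // en quad .. zero width joiner
        . '\x{2028}\x{2029}'   // line and paragraph separators
        . '\x{202F}'           // narrow no-break space
        . '\x{205F}'           // medium mathematical space
        . '\x{2060}'           // word joiner
        . '\x{3000}'           // ideographic space
        . '\x{FEFF}'           // zero width non-breaking space (BOM)
        . ']+/u';

    // preg_replace() returns null for invalid UTF-8; keep the input then.
    return preg_replace($pattern, '', $s) ?? $s;
}
```

Note that zero width joiners and non-joiners carry meaning in some scripts and emoji sequences, so removing them is only appropriate when the goal really is to strip every invisible character.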
Many of these wreak havoc in XML files exported from automated tools or sites: they foul up text searches and recognition. They can also be pasted invisibly into PHP source code, where paragraph and line separators cause the parser to jump to the next command, so lines of code are skipped. The result is intermittent, unexplained errors that we have begun referring to as "textually transmitted diseases".
[It’s not safe to copy and paste from the web anymore. Use a character scanner to protect your code. lol]