php ansi to utf 8
PHP: Converting UTF-8 string to Ansi?
Now I offer this string for download, like this:
When I open this in Notepad++, for example, it says "ANSI as UTF-8". How can I change that to ANSI only?
That did not change anything.
3 Answers
But you will eventually lose data because ANSI can only encode a small subset of UTF-8. If you don’t have a very strong reason against it, serve your files UTF-8 encoded.
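If you really do need "ANSI" output, here is a minimal sketch, assuming that "ANSI" means Windows-1252 on your users' machines and that the mbstring extension is available. The helper name and the file name in the comments are made up for illustration.

```php
<?php
// Hypothetical helper: convert a UTF-8 string to Windows-1252 ("ANSI" on Western Windows).
function utf8ToWindows1252(string $utf8): string
{
    // Characters with no Windows-1252 equivalent are replaced by mbstring's
    // substitute character (a question mark by default), so data can be lost.
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

$payload = utf8ToWindows1252("Gr\u{00F6}\u{00DF}e: 5 \u{20AC}");

// When serving the download, declare the charset that matches the bytes, e.g.:
// header('Content-Type: text/plain; charset=windows-1252');
// header('Content-Disposition: attachment; filename="export.txt"');
// echo $payload;
```

Notepad++ only guesses the encoding from the bytes it sees, so a file that contains nothing but ASCII characters can still be reported either way.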
Since there is a misunderstanding about ISO-8859-1, Windows-1252 and ANSI in your question, an important thing to note here is that:
The so-called Windows character set (WinLatin1, or Windows code page 1252, to be exact) uses some of those positions for printable characters. Thus, the Windows character set is NOT identical with ISO 8859-1. The Windows character set is often called «ANSI character set», but this is SERIOUSLY MISLEADING. It has NOT been approved by ANSI.
Historical background: Microsoft based the design of the set on a draft for an ANSI standard. A glossary by Microsoft explicitly admits this.
Some more resources: here and here.
So just FYI for other people that end up in this question.
Here’s MS’s exact explanation on this:
The term “ANSI” as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community. The source of this comes from the fact that the Windows code page 1252 was originally based on an ANSI draft—which became International Organization for Standardization (ISO) Standard 8859-1. “ANSI applications” are usually a reference to non-Unicode or code page–based applications.
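To make the difference concrete, here is a small sketch (assuming the mbstring extension) that decodes the same byte, 0x80, under both encodings. Windows-1252 puts a printable character there, while ISO-8859-1 has a control character.

```php
<?php
// The same byte decoded under the two encodings that are often both called "ANSI".
$byte = "\x80";

$fromCp1252 = mb_convert_encoding($byte, 'UTF-8', 'Windows-1252'); // euro sign, U+20AC
$fromLatin1 = mb_convert_encoding($byte, 'UTF-8', 'ISO-8859-1');   // C1 control character, U+0080

echo bin2hex($fromCp1252), "\n"; // e282ac
echo bin2hex($fromLatin1), "\n"; // c280
```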
mb_convert_encoding
(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP 8)
mb_convert_encoding — Convert a string from one character encoding to another
Description
Parameters
The string or array to be converted.
The source encoding of the string. It may be an array or a comma-separated list of encodings, in which case the correct one is guessed. If from_encoding is omitted, the mbstring.internal_encoding setting is used if set, otherwise the default_charset setting.
Return values
The converted string or array, or false on failure.
Errors/Exceptions
Changelog
Examples
Example #1 mb_convert_encoding() example
See also
User Contributed Notes 32 notes
For my last project I needed to convert several CSV files from Windows-1250 to UTF-8. After several days of searching around I found a function that partially solved my problem, but it still did not convert all the characters. So I made this:
I was trying to find the charset of a Norwegian txt file (with a lot of ø, æ, å) written on a Mac, and I found it this way:
$text = "A strange string to pass, maybe with some ø, æ, å characters.";
The line that looks good gives you the encoding it was written in.
Hope this helps someone.
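The rest of that note's code is not preserved here, so the following is only a sketch of the idea under my own assumptions: decode the bytes under every encoding mbstring lists and eyeball the results. The function name is hypothetical.

```php
<?php
// Hypothetical helper: decode $bytes under every encoding mbstring knows about
// and return the UTF-8 results keyed by encoding name.
function decodeWithAllEncodings(string $bytes): array
{
    $results = [];
    foreach (mb_list_encodings() as $encoding) {
        try {
            // @ silences warnings and deprecation notices from individual encodings.
            $decoded = @mb_convert_encoding($bytes, 'UTF-8', $encoding);
        } catch (Throwable $e) {
            continue; // skip encodings this PHP build refuses to use as a source
        }
        if (is_string($decoded)) {
            $results[$encoding] = $decoded;
        }
    }
    return $results;
}

// "\xF8" is "ø" in ISO-8859-1; print every candidate and pick the line that reads correctly.
foreach (decodeWithAllEncodings("bl\xF8t") as $encoding => $text) {
    echo $encoding, ' : ', $text, "\n";
}
```

Note that mbstring does not cover every legacy charset, so a file written on an old Mac may need the same loop done with iconv() instead.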
in your php.ini. Be sure to include the quotes around none. Or at run-time with
Hey guys. For everybody who's looking for a function that converts an ISO string to UTF-8 or a UTF-8 string to ISO, here's your solution:
public function encodeToUtf8($string) {
    return mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
public function encodeToIso($string) {
    return mb_convert_encoding($string, "ISO-8859-1", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}
For me these functions are working fine. Give it a try
My solution below was slightly incorrect, so here is the correct version (I posted at the end of a long day, never a good idea!)
Again, this is a quick and dirty solution to stop mb_convert_encoding from filling your string with question marks whenever it encounters an illegal character for the target encoding.
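The poster's code is not preserved here. As a stand-in, this is one way to get that effect with mbstring's own setting; it is a sketch, not the original solution, and the helper name is made up.

```php
<?php
// Hypothetical helper: convert and silently drop characters the target cannot represent.
function convertDroppingUnmappable(string $text, string $to, string $from): string
{
    $previous = mb_substitute_character();   // remember the current setting
    mb_substitute_character('none');         // drop unmappable characters instead of writing "?"
    $converted = mb_convert_encoding($text, $to, $from);
    mb_substitute_character($previous);      // restore the previous setting
    return $converted;
}

// U+03A9 (Greek capital omega) does not exist in ISO-8859-1, so it is dropped.
echo convertDroppingUnmappable("a\u{03A9}b", 'ISO-8859-1', 'UTF-8'), "\n"; // prints "ab"
```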
Another sample of recoding without MultiByte enabling.
(Russian koi->win, if input in win-encoding already, function recode() returns unchanged string)
Read ansi file and convert to UTF-8 string
Is there any way to do that with PHP?
The data to be inserted looks fine when I print it out.
But when I insert it in the database the field becomes empty.
3 Answers
The strange thing is that you end up with an empty string in your DB. I could understand ending up with some garbage in your DB, but nothing at all (an empty string) is strange.
I just typed this in my console:
These are possible values for YOUR CURRENT CHARSET. As pointed out before, when your input string contains only characters that are valid in UTF-8, you don't need to convert anything.
Change UTF-8 to UTF-8//TRANSLIT when you don't want to omit characters but replace them with a look-alike (when they are not in the UTF-8 set).
«ANSI» is not really a charset. It’s a short way of saying «whatever charset is the default in the computer that creates the data». So you have a double task:
For #2, I’m normally happy with iconv() but utf8_encode() can also do the job if source data happens to use ISO-8859-1.
Update
It looks like you don’t know what charset your data is using. In some cases, you can figure it out if you know the country and language of the user (e.g., Spain/Spanish) through the default encoding used by Microsoft Windows in such territory.
Be careful, using iconv() can return false if the conversion fails.
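A minimal sketch of checking that return value, assuming the source data is Windows-1252 (which is only a guess here) and using a hypothetical wrapper name:

```php
<?php
// Hypothetical helper: convert with iconv() and fail loudly instead of passing false along.
function iconvOrFail(string $text, string $from, string $to = 'UTF-8'): string
{
    // @ suppresses the notice iconv() raises on invalid input; the return value is checked instead.
    $converted = @iconv($from, $to, $text);
    if ($converted === false) {
        throw new RuntimeException("Could not convert from $from to $to");
    }
    return $converted;
}

// Windows-1252 is only an assumed source charset here.
echo iconvOrFail("caf\xE9", 'Windows-1252'), "\n"; // prints "café"
```

Inserting an unchecked false into a database is one plausible way to end up with an empty field.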
I am also having a somewhat similar problem: some Chinese characters are mistaken for \n if the file is encoded in UNICODE, but not if it is UTF-8.
To get back to your problem, make sure the encoding of your file is the same as that of your database. Also, using utf8_encode() on text that is already UTF-8 can have unpleasant results. Try mb_detect_encoding() to see the encoding of the file, although unfortunately this doesn't always work. There is no easy fix for character encoding from what I can see 🙁
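To illustrate the "unpleasant results", here is a small sketch that guards against converting text twice. The fallback source encoding is an assumption and the function name is hypothetical.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not already valid UTF-8.
// The fallback source encoding is an assumption; adjust it to your data.
function ensureUtf8(string $text, string $assumedSource = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8, converting again would corrupt it
    }
    return mb_convert_encoding($text, 'UTF-8', $assumedSource);
}

$utf8 = "caf\u{00E9}";

// Treating UTF-8 bytes as ISO-8859-1 and converting again turns "é" into "Ã©":
$doubleEncoded = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $doubleEncoded, "\n";        // cafÃ©

echo ensureUtf8($utf8), "\n";     // café (left alone)
echo ensureUtf8("caf\xE9"), "\n"; // café (converted once)
```

The check is a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8, so it can be fooled by short strings.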
PHP: convert encoding to UTF-8
To add to the Flash conversion comment below, here’s how I convert back from what I’ve stored in a database after converting from Flash HTML text field output, in order to load it back into a Flash HTML text field:
function htmltoflash($htmlstr)
{
    [...] ",
        mb_convert_encoding(html_entity_decode($htmlstr),
        "UTF-8", "ISO-8859-1"))));
}
Why did you use the PHP HTML encode functions? mbstring has its own encoding for this, which is (as far as I tested it) much more useful:
$text = mb_convert_encoding($text, 'HTML-ENTITIES', "UTF-8");
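One caveat for current PHP versions: as of PHP 8.2 the 'HTML-ENTITIES' pseudo-encoding in mbstring is deprecated. A possible replacement, sketched here with a hypothetical helper name, is mb_encode_numericentity():

```php
<?php
// Hypothetical helper: replace every non-ASCII character with a numeric HTML entity.
function toNumericEntities(string $utf8): string
{
    // Map format: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
    return mb_encode_numericentity($utf8, $map, 'UTF-8');
}

echo toNumericEntities("bl\u{00F8}t"), "\n"; // prints "bl&#248;t"
```

Unlike the old pseudo-encoding this always produces numeric entities rather than named ones, which browsers treat the same way.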
instead of ini_set(), you can try this
(‘macintosh’ is the IANA name for the MacRoman character set.)
But the first one didn't show extended chars correctly, and the second one didn't separate fields correctly.
Clean a string for use as a filename by simply replacing all unwanted characters with an underscore (ASCII converts to 7-bit). It removes slightly more chars than necessary. Hope it's useful.
$dos = mb_convert_encoding($utf8_text, "CP850", mb_detect_encoding($utf8_text, "UTF-8, CP850, ISO-8859-15", true));
Is it possible to convert a file to UTF-8 on my end?
If I have access to the file after it is submitted with
Note: the user can upload a CSV file with any type of encoding; I usually run into unknown 8-bit encodings.
But the problem is that this code strips special characters such as single quotes.
I've included this for additional information. Thanks to anyone who can help!
Solution
Try this.
The example I used was something I did in a test environment, so you may have to change the code a little.
I had a text file with the following data in it:
Then I had a form that took the file as input and ran the following code:
When the data is posted, I store the file in a variable. Obviously, if you use the multiple attribute, your code will not look quite like this.
$handle is a read-only handle to the text file; hence the "r" argument.
$enc uses the mb_detect_encoding function to detect the encoding (duh).
At first I had trouble getting the right encoding. Setting encoding_list to use only UTF-8 and setting strict to true solved that.
If the encoding is UTF-8, I simply print the line; if not, I convert it to UTF-8 using the iconv function.
Other solutions
Warning: untested code (I'm suddenly in a hurry), but it could look something like this:
You can convert the text of the file to binary data using the following
After converting the data to binary, you simply convert the text with the PHP function mb_convert_encoding($fileText, "UTF-8");
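The answer's code is not preserved here, so this is only a sketch of the steps described above. The function name is hypothetical, and the fallback source charset (Windows-1252) is an assumption, since an unknown 8-bit encoding cannot be detected reliably.

```php
<?php
// Hypothetical helper: read a text file line by line and return its lines as UTF-8.
// $assumedSource is an assumption; the thread does not say which charset the uploads use.
function readLinesAsUtf8(string $path, string $assumedSource = 'Windows-1252'): array
{
    $handle = fopen($path, 'r'); // read-only
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $lines = [];
    while (($line = fgets($handle)) !== false) {
        // Strict detection against UTF-8 only: returns 'UTF-8' or false.
        $enc = mb_detect_encoding($line, 'UTF-8', true);
        if ($enc !== 'UTF-8') {
            $converted = @iconv($assumedSource, 'UTF-8', $line);
            // Keep the raw bytes if the conversion fails.
            $line = $converted === false ? $line : $converted;
        }
        $lines[] = rtrim($line, "\r\n");
    }
    fclose($handle);
    return $lines;
}
```

With an upload form you would typically pass the temporary path of the uploaded file, for example $_FILES['file']['tmp_name'] (the field name 'file' is made up here).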
(PHP 4 >= 4.0.5, PHP 5, PHP 7)
iconv — Convert a string to the requested character encoding
Description
Parameters
The input charset.
The output charset.
How //TRANSLIT works, and whether it works at all, depends on the system's iconv() implementation (see ICONV_IMPL). Some implementations are known to simply ignore //TRANSLIT, so the conversion is likely to fail for characters that are illegal in out_charset.
The string to be converted.
Return values
Returns the converted string, or FALSE on failure.
Changelog
Version | Description |
---|---|
5.4.0 | As of this version, the function returns FALSE on illegal characters unless //IGNORE is specified in the output charset. Before that, it returned a partial string. |
Examples
Example #1 iconv() example
The above example will output something similar to:
User Contributed Notes 39 notes
The «//ignore» option doesn’t work with recent versions of the iconv library. So if you’re having trouble with that option, you aren’t alone.
That means you can’t currently use this function to filter invalid characters. Instead it silently fails and returns an empty string (or you’ll get a notice but only if you have E_NOTICE enabled).
ini_set('mbstring.substitute_character', "none");
$text = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
That will strip invalid characters from UTF-8 strings (so that you can insert it into a database, etc.). Instead of «none» you can also use the value 32 if you want it to insert spaces in place of the invalid characters.
Interestingly, setting different target locales results in different, yet appropriate, transliterations. For example:
// some German
$utf8_sentence = 'Weiß, Goldmann, Göbel, Weiss, Göthe, Goethe und Götz';
To test different combinations of conversions between charsets (when we know neither the source charset nor a suitable destination charset), here is an example:
If you are getting question-marks in your iconv output when transliterating, be sure to ‘setlocale’ to something your system supports.
Some PHP CMS’s will default setlocale to ‘C’, this can be a problem.
Use the "locale -a" command to list the locales your system supports.
Like many other people, I have encountered massive problems when using iconv() to convert between encodings (from UTF-8 to ISO-8859-15 in my case), especially on large strings.
The main problem here is that when your string contains illegal UTF-8 characters, there is no really straight forward way to handle those. iconv() simply (and silently!) terminates the string when encountering the problematic characters (also if using //IGNORE), returning a clipped string. The
workaround suggested here and elsewhere will also break when encountering illegal characters, at least dropping a useful note («htmlentities(): Invalid multibyte sequence in argument in. «)
I have found a lot of hints, suggestions and alternative methods (it’s scary and in my opinion no good sign how many ways PHP natively provides to convert the encoding of strings), but none of them really worked, except for this one:
I use this function, which doesn't need any extension:
I have not tested it extensively, hope it may help.
For those who have trouble displaying UCS-2 data in a browser, here's a simple function that converts UCS-2 to HTML Unicode entities:
When I asked my Linux system for its locale (with the locale command), it returned "cs_CZ.UTF-8", so there may be a correlation.
iconv (GNU libc) 2.6.1
glibc 2.3.6
Here is how to convert UCS-2 numbers to UTF-8 numbers in hex:
echo strtoupper(ucs2toutf8("06450631062D0020"));
Input:
06450631062D
Output:
D985D8B1D8AD
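The ucs2toutf8() function itself is not preserved here. A possible stand-in, assuming the mbstring extension and big-endian UCS-2 input, could look like this (the function name is my own):

```php
<?php
// Hypothetical helper: hex string of big-endian UCS-2 code units -> hex string of UTF-8 bytes.
function ucs2HexToUtf8Hex(string $hex): string
{
    $bytes = @hex2bin($hex);
    if ($bytes === false) {
        throw new InvalidArgumentException('Input is not valid hexadecimal');
    }
    return bin2hex(mb_convert_encoding($bytes, 'UTF-8', 'UCS-2BE'));
}

echo strtoupper(ucs2HexToUtf8Hex('06450631062D')), "\n"; // D985D8B1D8AD
```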
I have used iconv to convert from cp1251 into UTF-8. I spent a day to investigate why a string with Russian capital ‘Р’ (sounds similar to ‘r’) at the end cannot be inserted into a database.
The problem is not in iconv. 'Р' in cp1251 is chr(208), and 'Р' in UTF-8 is chr(208).chr(160). chr(160) is one of the space characters that can match '\s' in a regex, so it can be consumed by a greedy '+' or '*' operator. In that case, you lose 'Р' in your string.
For example, take 'ГР ' (Russian, UTF-8) and preg_match with the regex '(.+?)[\s]*'. Then '(.+?)' matches 'Г'.chr(208) and '[\s]*' matches chr(160).' '.
Although this is not an iconv bug, it looks very much like one. That's why I put this comment here.
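A short sketch of the usual way around this, which is my suggestion rather than part of the original note: add the /u modifier so that PCRE treats the subject as UTF-8 and never splits a character into bytes. The anchored pattern below is my own variation of the one above.

```php
<?php
$text = "\u{0413}\u{0420} "; // "ГР " in UTF-8: D0 93, D0 A0, then a space

echo bin2hex("\u{0420}"), "\n"; // d0a0, i.e. chr(208).chr(160)

// With /u, PCRE works on code points, so the 0xA0 byte inside "Р"
// can never be matched on its own by \s.
preg_match('/^(.+?)\s*$/u', $text, $m);
echo $m[1], "\n"; // ГР
```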
Here is how to convert UTF-8 numbers to UCS-2 numbers in hex:
echo strtoupper(utf8toucs2("D985D8B1D8AD")) . "\n";
echo strtoupper(utf8toucs2("456725")) . "\n";
I just found out today that the Windows and *NIX versions of PHP use different iconv libraries and are not very consistent with each other.
Here is a repost of my earlier code that now works on more systems. It converts as much as possible and replaces the rest with question marks:
I don't know whether it's a feature or not, but it works for me (PHP 5.0.4).
I tested it converting from windows-1251 (stored in the DB) to UTF-8 (which I use for web pages).
BTW, I convert each array I fetch from the DB with array_walk_recursive.
Here is an example how to convert windows-1251 (windows) or cp1251(Linux/Unix) encoded string to UTF-8 encoding.
Note an important difference between iconv() and mb_convert_encoding() — if you’re working with strings, as opposed to files, you most likely want mb_convert_encoding() and not iconv(), because iconv() will add a byte-order marker to the beginning of (for example) a UTF-32 string when converting from e.g. ISO-8859-1, which can throw off all your subsequent calculations and operations on the resulting string.
In other words, iconv() appears to be intended for use when converting the contents of files — whereas mb_convert_encoding() is intended for use when juggling strings internally, e.g. strings that aren’t being read/written to/from files, but exchanged with some other media.
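If you do want iconv() for in-memory strings, one way to sidestep the byte-order mark is to name the byte order explicitly. A small sketch, assuming both the iconv and mbstring extensions are loaded:

```php
<?php
// Naming the byte order explicitly ("UTF-32BE" or "UTF-32LE") asks the converter for
// exactly that layout, so no byte-order mark is prepended.
$viaIconv = iconv('ISO-8859-1', 'UTF-32BE', 'a');
$viaMb    = mb_convert_encoding('a', 'UTF-32BE', 'ISO-8859-1');

echo bin2hex($viaIconv), "\n"; // 00000061
echo bin2hex($viaMb), "\n";    // 00000061

// With the generic name "UTF-32", iconv() may prepend a BOM (this depends on the
// underlying iconv library), so the result can be longer than four bytes per character.
echo strlen(iconv('ISO-8859-1', 'UTF-32', 'a')), "\n";
```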
How do I convert an ANSI encoded file to UTF-8 with Notepad++? [closed]
I have a website, and I can send my Turkish characters with jQuery in Firefox, but Internet Explorer doesn’t send my Turkish characters. I looked at my source file in notepad, and this file’s code page is ANSI.
When I convert it to UTF-8 without BOM and close the file, the file is again ANSI when I reopen.
How can I convert my file from ANSI to UTF-8?
3 Answers
Regarding this part:
When I convert it to UTF-8 without bom and close file, the file is again ANSI when I reopen.
The easiest solution is to avoid the problem entirely by properly configuring Notepad++.
That way all the opened ANSI files will be treated as UTF-8 without BOM.
For explanation what’s going on, read the comments below this answer.
To fully learn about Unicode and UTF-8, read this excellent article from Joel Spolsky.
Maybe this is not the answer you needed, but I encountered similar problem, so I decided to put it here.
I needed to convert 500 xml files to UTF8 via Notepad++. Why Notepad++? When I used the option «Encode in UTF8» (many other converters use the same logic) it messed up all special characters, so I had to use «Convert to UTF8» explicitly.
Here are some simple steps to convert multiple files via Notepad++ without messing up special characters (e.g. diacritical marks).