PHP base64 decode UTF-8
base64_decode
(PHP 4, PHP 5, PHP 7, PHP 8)
base64_decode - Decodes data encoded with MIME base64
Description
Parameters
Return Values
Returns the decoded data or false on failure. The returned data may be binary.
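A minimal sketch of how these return values might be handled. It assumes strict mode (second argument set to true) so that malformed input yields false; the sample strings are illustrative only.

```php
<?php
// Decoded data can be falsy ("0") or binary, so compare against false with ===.
$zero = base64_decode('MA==', true);   // the single character "0"
$byte = base64_decode('/w==', true);   // one raw byte, 0xFF, not printable text
$bad  = base64_decode('@@@@', true);   // "@" is outside the base64 alphabet

var_dump($zero === false);   // bool(false): decoding succeeded
var_dump(bin2hex($byte));    // string(2) "ff"
var_dump($bad === false);    // bool(true): strict mode rejected the input
```

A loose check such as `if (!$decoded)` would wrongly treat the first case as a failure.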
Examples
Example #1 base64_decode() example
The above example will output:
See Also
User Contributed Notes 17 notes
If you want to save data that comes from JavaScript’s canvas.toDataURL() function, you have to convert blanks into pluses. If you do not do that, the decoded data is corrupted:
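A sketch of that repair, assuming the data URL reached the server with its "+" characters already turned into spaces. The function name and the sample payload are hypothetical.

```php
<?php
// Hypothetical helper: extract and decode the payload of a data URL whose
// "+" characters were turned into spaces on the way to the server.
function decode_canvas_data_url(string $dataUrl)
{
    // Everything after the first comma is the base64 payload.
    $comma = strpos($dataUrl, ',');
    if ($comma === false) {
        return false;
    }
    $payload = substr($dataUrl, $comma + 1);

    // Convert blanks back into pluses before decoding.
    $payload = str_replace(' ', '+', $payload);

    return base64_decode($payload, true);
}

// "+/8=" encodes the two bytes 0xFB 0xFF; here the "+" has become a space.
$damaged = 'data:image/png;base64, /8=';
var_dump(bin2hex(decode_canvas_data_url($damaged)));   // string(4) "fbff"
```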
Base64 for URL parameters and filenames that adheres to RFC 4648.
Defaults to dropping the padding on encode since it’s not required for decoding, and keeps the URL free of % encodings.
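The code for this note did not survive, so the following is only a sketch of what such a pair of helpers might look like. The function names and the $pad parameter are assumptions made for illustration.

```php
<?php
// Hypothetical helpers using the "base64url" alphabet of RFC 4648, Section 5.
function base64url_encode(string $data, bool $pad = false): string
{
    $encoded = strtr(base64_encode($data), '+/', '-_');
    // Padding is dropped by default so the result needs no % escaping in a URL.
    return $pad ? $encoded : rtrim($encoded, '=');
}

function base64url_decode(string $data)
{
    // Map the URL-safe characters back to the standard alphabet.
    // base64_decode() tolerates the missing padding in its default mode.
    return base64_decode(strtr($data, '-_', '+/'));
}

$token = base64url_encode("\xFB\xFF");
var_dump($token);                              // string(3) "-_8"
var_dump(bin2hex(base64url_decode($token)));   // string(4) "fbff"
```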
The base64-decoding function is a homomorphism between modulo 4 and modulo 3-length segmented strings. That motivates a divide and conquer approach: Split the encoded string into substrings counting modulo 4 chars, then decode each substring and concatenate all of them.
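The divide-and-conquer idea can be sketched as below. It assumes the encoded string contains no line breaks or other characters outside the alphabet, since those would shift the 4-character groups; the function name is hypothetical.

```php
<?php
// Decode in independent pieces: every 4 encoded characters map to 3 bytes,
// so any chunk whose length is a multiple of 4 can be decoded on its own.
function base64_decode_chunked(string $encoded, int $chunkLength = 1024): string
{
    if ($chunkLength < 4 || $chunkLength % 4 !== 0) {
        throw new InvalidArgumentException('Chunk length must be a positive multiple of 4.');
    }
    $decoded = '';
    foreach (str_split($encoded, $chunkLength) as $chunk) {
        $decoded .= base64_decode($chunk);
    }
    return $decoded;
}

$original = random_bytes(1000);
$encoded  = base64_encode($original);
var_dump(base64_decode_chunked($encoded, 8) === $original);   // bool(true)
```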
This function supports "base64url" as described in Section 5 of RFC 4648, "Base 64 Encoding with URL and Filename Safe Alphabet".
To follow up on Starson’s post, PHP was changed to no longer treat a space as if it were a plus sign in CVS revision 1.43.2.1, which corresponds to PHP 5.1.0. You can see what happened with a diff to branch point 1.43 at:
The CVS log indicates that this change was made to fix bug #34214 (base64_decode() does not properly ignore whitespace).
It would seem from the comment preceding the code which was removed that the treatment of the space as if it were the plus sign was actually intentional at one time:
When Base64 gets POSTed, all pluses are interpreted as spaces.
This line changes them back. It’s not exactly the Base64 spec,
but it is completely compatible with it (the spec says that spaces
are invalid). This will also save many people considerable
headache.
However, RFC 3548 states that characters not in the Base64 alphabet should either be ignored or cause the implementation to reject the encoding and RFC 2045 says they should be ignored. So the original code was unfortunately not fully compatible with the spec or other implementations. It may have also masked problems with code not properly escaping POST variables.
base64_encode
(PHP 4, PHP 5, PHP 7, PHP 8)
base64_encode - Encodes data with MIME base64
Description
Encodes the given string with base64.
This encoding is designed to let binary data survive transport over protocols that are not 8-bit clean, such as mail bodies.
Base64-encoded data takes about 33% more space than the original data.
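The 33% figure follows from the fact that every 3 input bytes become 4 output characters. A small sketch that compares a predicted length with the actual one; the helper name is hypothetical.

```php
<?php
// Every 3 input bytes become 4 output characters; the last group is padded
// with "=" so the encoded length is always a multiple of 4.
function base64_encoded_length(int $bytes): int
{
    return 4 * intdiv($bytes + 2, 3);
}

foreach ([1, 3, 57, 1000] as $n) {
    $actual = strlen(base64_encode(str_repeat('x', $n)));
    printf("%d bytes -> %d characters (predicted %d)\n", $n, $actual, base64_encoded_length($n));
}
```

For short inputs the relative overhead is larger than 33% because of the padding.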
Parameters
The data to encode.
Return Values
The encoded data, as a string.
Examples
Example #1 base64_encode() example
The above example will output:
See Also
User Contributed Notes 35 notes
For anyone interested in the ‘base64url’ variant encoding, you can use this pair of functions:
gutzmer at usa dot net’s ( http://php.net/manual/en/function.base64-encode.php#103849 ) base64url_decode() function doesn’t pad longer strings with ‘=’s. Here is a corrected version:
function base64_encode_url($string) {
    return str_replace(['+', '/', '='], ['-', '_', ''], base64_encode($string));
}
Checked here with random_bytes() and random lengths:
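The decoding half and the check itself are missing from this note. The following is a sketch of what they might look like, not the original author's code; base64_decode_url() is a hypothetical counterpart to the encoder shown in the note.

```php
<?php
// Encoder as given in the note.
function base64_encode_url(string $string): string
{
    return str_replace(['+', '/', '='], ['-', '_', ''], base64_encode($string));
}

// Hypothetical counterpart: restore the alphabet and the "=" padding.
function base64_decode_url(string $string)
{
    $data    = str_replace(['-', '_'], ['+', '/'], $string);
    $missing = (4 - strlen($data) % 4) % 4;   // 0, 1 or 2 for valid input
    return base64_decode($data . str_repeat('=', $missing), true);
}

// Round-trip check with random data of random lengths.
for ($i = 0; $i < 1000; $i++) {
    $bytes = random_bytes(random_int(1, 64));
    if (base64_decode_url(base64_encode_url($bytes)) !== $bytes) {
        throw new RuntimeException('Round trip failed for ' . bin2hex($bytes));
    }
}
echo "All round trips succeeded\n";
```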
Unfortunately my "function" for encoding base64 on-the-fly from 2007 [which has been removed from the manual in favor of this post] had 2 errors!
The first led to an endless loop because of a missing "$feof" check; the second caused the rare errors mentioned when encoding failed for some reason in larger files, especially when setting fgets($fh, 2), for example. But values lower than 1024 are bad overall because they slow down the whole process, so 4096 will be fine for all purposes, I guess.
The error was caused by the use of "empty()".
Here comes the corrected version, which I have tested with all kinds of files and lengths (up to 4.5 GB!) without any error:
$cache = '';
$eof = false;
Base64 encoding of large files.
So if you read from the input file in chunks of 8151 (=57*143) bytes you will get (up to) 8151 eight-bit symbols, which encode as exactly 10868 six-bit symbols, which then wrap to exactly 143 MIME-formatted lines. There is no need to retain left-over symbols (either six- or eight-bit) from one chunk to the next. Just read a chunk, encode it, write it out, and go on to the next chunk. Obviously the last chunk will probably be shorter, but encoding it is still independent of the rest.
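The read, encode, write loop described above might be sketched as follows. The function name and the use of stream_get_contents() to fill each chunk are choices made for this sketch, not part of the original note.

```php
<?php
// Stream-encode in chunks of 8151 (= 57 * 143) bytes. Each full chunk becomes
// exactly 143 lines of 76 characters, so no state carries over between chunks.
function base64_encode_stream($in, $out, int $chunkSize = 8151): void
{
    if ($chunkSize < 57 || $chunkSize % 57 !== 0) {
        throw new InvalidArgumentException('Chunk size must be a positive multiple of 57.');
    }
    // stream_get_contents() keeps reading until the chunk is full or EOF is hit.
    while (($chunk = stream_get_contents($in, $chunkSize)) !== false && $chunk !== '') {
        // chunk_split() wraps at 76 characters and appends CRLF to every line.
        fwrite($out, chunk_split(base64_encode($chunk), 76, "\r\n"));
    }
}

// Demo on in-memory streams; real code would open files with fopen().
$data = random_bytes(20000);
$in   = fopen('php://memory', 'w+b');
fwrite($in, $data);
rewind($in);
$out = fopen('php://memory', 'w+b');
base64_encode_stream($in, $out);
rewind($out);
var_dump(stream_get_contents($out) === chunk_split(base64_encode($data), 76, "\r\n"));   // bool(true)
```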
Conversely, each 76-character MIME-formatted line (not counting the trailing CRLF) contains exactly enough data for 57 bytes of output without needing to retain leftover bits that need prepending to the next line. What that means is that each line can be decoded independently of the others, and the decoded chunks can then be concatenated together or written out sequentially. However, this does make the assumption that the encoded data really is MIME-formatted; without that assurance it is necessary to accept that the base64 data won’t be so conveniently arranged.
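A sketch of the line-by-line decoding described above. It assumes the input really is MIME-formatted, so that every line holds a whole number of 4-character groups; the function name is hypothetical.

```php
<?php
// Decode MIME-formatted base64 one line at a time. Each line is assumed to
// hold complete 4-character groups, so it decodes independently of the rest.
function base64_decode_mime_stream($in, $out): void
{
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        if ($line === '') {
            continue;   // skip blank lines
        }
        fwrite($out, base64_decode($line));
    }
}

// Demo on in-memory streams.
$original = random_bytes(500);
$in = fopen('php://memory', 'w+b');
fwrite($in, chunk_split(base64_encode($original), 76, "\r\n"));
rewind($in);
$out = fopen('php://memory', 'w+b');
base64_decode_mime_stream($in, $out);
rewind($out);
var_dump(stream_get_contents($out) === $original);   // bool(true)
```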
A function I’m using to return local images as base64-encoded data, i.e. embedding the image source directly into the HTML response.
This can reduce your page load time, because the browser only needs to send one request for the entire page rather than separate requests for the HTML and each image. Requests have to be uploaded, and many users are limited in their upload speed to the server.
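The function itself is missing from this note. A minimal sketch of such a helper, assuming the caller already knows the MIME type; the function name and parameters are hypothetical.

```php
<?php
// Hypothetical helper: build a data URI for a local image so it can be
// placed directly in an <img src="..."> attribute.
function image_data_uri(string $path, string $mimeType): string
{
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Demo with a throwaway file; a real call would pass an actual image path.
$tmp = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($tmp, 'GIF89a');
echo '<img src="' . image_data_uri($tmp, 'image/gif') . '" alt="">', "\n";
unlink($tmp);
```

Keep in mind that the embedded data is about a third larger than the original file, so this trades request count for page size.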
Using Javascript’s atob to decode base64 doesn’t properly decode utf-8 strings
I’m using the Javascript window.atob() function to decode a base64-encoded string (specifically the base64-encoded content from the GitHub API). Problem is I’m getting ASCII-encoded characters back (like ⢠instead of ™ ). How can I properly handle the incoming base64-encoded stream so that it’s decoded as utf-8?
The Unicode Problem
Though JavaScript (ECMAScript) has matured, the fragility of Base64, ASCII, and Unicode encoding has caused a lot of headache (much of it is in this question’s history).
Consider the following example:
Why do we encounter this?
Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data.
The "Unicode Problem": since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of an 8-bit byte (0x00 to 0xFF).
Solution with binary interoperability
(Keep scrolling for the ASCII base64 solution)
The solution recommended by MDN is to actually encode to and from a binary string representation:
Encoding UTF8 ⇢ binary
Decoding binary ⇢ UTF-8
Solution with ASCII base64 interoperability
The entire history of this question shows just how many different ways we’ve had to work around broken encoding systems over the years. Though the original MDN article no longer exists, this solution is still arguably a better one, and does a great job of solving "The Unicode Problem" while maintaining plain text base64 strings that you can decode on, say, base64decode.org.
There are two possible methods to solve this problem:
If you’re trying to save yourself some time, you could also consider using a library:
Encoding UTF8 ⇢ base64
Decoding base64 ⇢ UTF8
TypeScript support
Here’s the same solution with some additional TypeScript compatibility (via @MA-Maddin):
The first solution (deprecated)
This used escape and unescape (which are now deprecated, though this still works in all modern browsers):
And one last thing: I first encountered this problem when calling the GitHub API. To get this to work on (Mobile) Safari properly, I actually had to strip all white space from the base64 source before I could even decode the source. Whether or not this is still relevant in 2021, I don’t know:
Things change. The escape/unescape methods have been deprecated.
You can URI encode the string before you Base64-encode it. Note that this doesn’t produce Base64-encoded UTF-8, but rather Base64-encoded URL-encoded data. Both sides must agree on the same encoding.
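If the receiving side is PHP, that agreement might look like the following sketch. It assumes the sender percent-encoded the UTF-8 text (as JavaScript's encodeURIComponent() does) before Base64-encoding it; the function name is hypothetical.

```php
<?php
// Hypothetical receiver for "Base64-encoded URL-encoded data":
// undo Base64 first, then undo the percent-encoding.
function decode_uri_encoded_base64(string $payload)
{
    $percentEncoded = base64_decode($payload, true);
    if ($percentEncoded === false) {
        return false;
    }
    return rawurldecode($percentEncoded);
}

// Simulate the sender: percent-encode, then Base64-encode.
$text    = "Trademark \u{2122}";
$payload = base64_encode(rawurlencode($text));
var_dump(decode_uri_encoded_base64($payload) === $text);   // bool(true)
```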
For OP’s problem a third party library such as js-base64 should solve the problem.
If treating strings as bytes is more your thing, you can use the following functions
The part where we encode from Unicode/UTF-8 is
This is one of the most used methods nowadays.
Decoding base64 to UTF8 String
Below is the current most-voted answer, by @brandonscript
The above code works, but it’s very slow. If your input is a very large base64 string, for example 30,000 characters for a base64-encoded HTML document, it will need a lot of computation.
Here is my answer: use the built-in TextDecoder, which is nearly 10x faster than the above code for large input.
Here is the updated 2018 solution, as described in the Mozilla developer resources
TO ENCODE FROM UNICODE TO B64
TO DECODE FROM B64 TO UNICODE
I would assume that one might want a solution that produces a widely useable base64 URI. Please visit data:text/plain;charset=utf-8;base64,4pi44pi54pi64pi74pi84pi+4pi/ to see a demonstration (copy the data uri, open a new tab, paste the data URI into the address bar, then press enter to go to the page). Despite the fact that this URI is base64-encoded, the browser is still able to recognize the high code points and decode them properly. The minified encoder+decoder is 1058 bytes (+Gzip→589 bytes)
Below is the source code used to generate it.
Then, to decode the base64 data, either HTTP get the data as a data URI or use the function below.
The advantage of being more standard is that this encoder and this decoder are more widely applicable because they can be used as a valid URL that displays correctly. Observe.
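The payload of the data URI quoted in this answer can also be checked outside the browser. A minimal PHP sketch, assuming only that the payload is standard base64: base64_decode() returns raw bytes, and they are UTF-8 only because the encoder produced UTF-8.

```php
<?php
// Payload copied from the data URI quoted in the answer.
$payload = '4pi44pi54pi64pi74pi84pi+4pi/';

$bytes = base64_decode($payload, true);

// The empty pattern with the /u flag only matches if the subject is valid UTF-8.
$isUtf8 = preg_match('//u', $bytes) === 1;

var_dump($isUtf8);          // bool(true)
var_dump(strlen($bytes));   // int(21): seven characters of three bytes each
echo $bytes, "\n";
```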
base64_decode
base64_decode - Decodes data encoded with MIME base64
Description
Decodes a string encoded with base64.
Parameters
The data to decode.
Return Values
Returns the decoded data or FALSE on failure. The returned data may be binary.
Changelog
Version | Description |
---|---|
5.2.0 | Added the strict parameter |
Examples
Example #1 base64_decode() example
The above example will output:
See Also
Comments
I used to do uudecode as a C module, but I’ve discovered a really fast way to do it in PHP. Here it is:
this script can correct the bug
I was wondering how to decode attached images within mails. Basically they are mostly JPEG files, so the obvious step was to write a function that decodes JPEG images.
I guess the plainest way to do so was the following:
To expand on Jes’ post:
The change took place between 5.0.5 and 5.1.0. Exactly where I don’t know or care.
Here is a drop-in replacement for base64_decode(), based on a faster version of morgangalpin’s code:
@morgangalpin att gmail dotty com
A better implementation would be the following regular expression:
Which will also detect the usage of = or == at the end of the string (and only end).
If this regex isn’t following proper RFC guidelines, please comment on it.
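The regular expression itself is missing from this note. The pattern below is a hypothetical stand-in with the property described (base64 alphabet characters, then at most two "=" signs at the very end); it is not the original author's regex and it does not check that the length is a multiple of 4.

```php
<?php
// Hypothetical pattern: alphabet characters followed by at most two "="
// signs, anchored so that "=" can only appear at the end of the string.
function looks_like_base64(string $s): bool
{
    return preg_match('/^[A-Za-z0-9+\/]*={0,2}\z/', $s) === 1;
}

var_dump(looks_like_base64('aGVsbG8='));          // bool(true)
var_dump(looks_like_base64('asiudfh9w=8uihf'));   // bool(false): "=" in the middle
var_dump(looks_like_base64('#iu3498r'));          // bool(false): "#" is not in the alphabet
```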
A function geared specifically toward this:
is_base64_encoded("iash21iawhdj98UH3"); // true
is_base64_encoded("#iu3498r"); // false
is_base64_encoded("asiudfh9w=8uihf"); // false
is_base64_encoded("a398UIhnj43f/1!+sadfh3w84hduihhjw=="); // true
You can do partial decoding (e.g. from buffered input streams) if you choose a chunk length that is a multiple of 4:
4 encoded chars represent 3 original chars. The "=" character is used as padding.
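For a buffered input stream, the partial decoding described in this note might be sketched as follows. It assumes the stream carries base64 without line breaks; the function name and the default buffer size are arbitrary choices for this sketch.

```php
<?php
// Decode a stream piece by piece. Because 4 encoded characters represent
// 3 bytes, a buffer size that is a multiple of 4 never splits a group.
function base64_decode_stream($in, $out, int $bufferSize = 4096): void
{
    if ($bufferSize < 4 || $bufferSize % 4 !== 0) {
        throw new InvalidArgumentException('Buffer size must be a positive multiple of 4.');
    }
    // stream_get_contents() fills the buffer completely unless EOF is reached.
    while (($chunk = stream_get_contents($in, $bufferSize)) !== false && $chunk !== '') {
        fwrite($out, base64_decode($chunk));
    }
}

// Demo on in-memory streams with a deliberately small buffer.
$original = random_bytes(10000);
$in = fopen('php://memory', 'w+b');
fwrite($in, base64_encode($original));
rewind($in);
$out = fopen('php://memory', 'w+b');
base64_decode_stream($in, $out, 64);
rewind($out);
var_dump(stream_get_contents($out) === $original);   // bool(true)
```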
I had a problem testing whether an imap message body was base64 encoded on a pre 5.2.* server. I had been using this function on a post 5.2 server.
I found that the function imap_base64() returns FALSE on failing to decode a string, and that I could use that to check instead.
base64_decode seems to fail when decoding big files/strings. I had an issue decoding a 7MB image file. Here is a solution that worked for me:
The docs don’t make this explicitly clear, but if you omit `$strict` or set it to `false` then invalid characters in the encoded input will be silently ignored.
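A short illustration of the difference. The input string is made up for this example.

```php
<?php
$input = 'aGVs*bG8=';   // "aGVsbG8=" ("hello") with a stray "*" inserted

// Default (non-strict) mode silently skips the invalid character.
var_dump(base64_decode($input));         // string(5) "hello"

// Strict mode rejects the same input.
var_dump(base64_decode($input, true));   // bool(false)
```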
function is_base64($str) {
    if ($str === base64_encode(base64_decode($str))) {
        return true;
    }
    return false;
}

if (is_base64($str)) {
    print base64_decode($str);
}