php yaml parse file
The Yaml Component
What is It?¶
The Symfony Yaml component parses YAML strings to convert them to PHP arrays. It is also able to convert PHP arrays to YAML strings.
YAML, YAML Ain’t Markup Language, is a human friendly data serialization standard for all programming languages. YAML is a great format for your configuration files. YAML files are as expressive as XML files and as readable as INI files.
The Symfony Yaml Component implements a selected subset of features defined in the YAML 1.2 version specification.
Learn more about the Yaml component in The YAML Format article.
Installation¶
If you install this component outside of a Symfony application, you must require the vendor/autoload.php file in your code to enable the class autoloading mechanism provided by Composer. Read this article for more details.
One of the goals of Symfony Yaml is to find the right balance between speed and features. It supports just the features needed to handle configuration files. Notable missing features are: document directives, multi-line quoted messages, compact block collections and multi-document files.
Real Parser¶
It sports a real parser and is able to parse a large subset of the YAML specification, for all your configuration needs. It also means that the parser is pretty robust, easy to understand, and simple enough to extend.
Clear Error Messages¶
Whenever you have a syntax problem with your YAML files, the library outputs a helpful message with the filename and the line number where the problem occurred. This makes debugging much easier.
Dump Support¶
It is also able to dump PHP arrays to YAML with object support, and inline level configuration for pretty outputs.
Types Support¶
It supports most of the YAML built-in types like dates, integers, octal numbers, booleans, and much more…
Full Merge Key Support¶
Full support for references, aliases, and full merge key. Don’t repeat yourself by referencing common configuration bits.
Using the Symfony YAML Component¶
The Symfony Yaml component consists of two main classes: one parses YAML strings ( Symfony\Component\Yaml\Parser ), and the other dumps a PHP array to a YAML string ( Symfony\Component\Yaml\Dumper ).
On top of these two classes, the Symfony\Component\Yaml\Yaml class acts as a thin wrapper that simplifies common uses.
Reading YAML Contents¶
The parse() method parses a YAML string and converts it to a PHP array:
If an error occurs during parsing, the parser throws a Symfony\Component\Yaml\Exception\ParseException exception indicating the error type and the line in the original YAML string where the error occurred:
Reading YAML Files¶
The parseFile() method parses the YAML contents of the given file path and converts them to a PHP value:
If an error occurs during parsing, the parser throws a ParseException exception.
Writing YAML Files¶
The dump() method dumps any PHP array to its YAML representation:
If an error occurs during the dump, the dumper throws a Symfony\Component\Yaml\Exception\DumpException exception.
Expanded and Inlined Arrays¶
The YAML format supports two kinds of representation for arrays: the expanded one and the inline one. By default, the dumper uses the expanded representation:
The second argument of the dump() method customizes the level at which the output switches from the expanded representation to the inline one:
yaml_parse_file
yaml_parse_file — Parse a YAML stream from a file
Description
Convert all or part of a YAML document stream read from a file to a PHP variable.
Parameters
ndocs: If ndocs is provided, it is filled with the number of documents found in the stream.
callbacks: Content handlers for YAML nodes, given as an associative array of YAML tag => callable mappings. See parse callbacks for more details.
Return Values
Returns the value encoded in input in the appropriate PHP type, or false on failure. If pos is -1, an array is returned with one entry for each document found in the stream.
Notes
Processing untrusted user input with yaml_parse_file() is dangerous if the use of unserialize() is enabled for nodes using the !php/object tag. This behavior can be disabled by using the yaml.decode_php ini setting.
Comments
As Jesse Donat mentioned, the type will be inferred automatically. To enforce a type, you can use the callback facility like this:
Array
(
    [event1] => Array
        (
            [name] => My Event
            [date] => DateTime Object
                (
                    [date] => 2001-05-25 00:00:00
                    [timezone_type] => 3
                    [timezone] => Europe/Berlin
                )
BTW, if you want to have large numbers, you are probably using BC Math. In that case, simply enclose the number in quotes:
largenumber: '14695760472279668267313200104308'
When trying to read an empty file, yaml_parse_file() emits a warning:
PHP Warning: yaml_parse_file(): end of stream reached without finding document 0
Be aware that when parsing YAML, an unquoted Y value becomes a boolean true.
This may or may not be the behavior you want, depending on context:
- chr_name: X // becomes string X
- chr_name: Y // becomes boolean true
You definitely don’t want chromosome Y becoming chromosome 1 (true), as happened to me, so heads up!
How can I parse a YAML file from a Linux shell script?
I wish to provide a structured configuration file which is as easy as possible for a non-technical user to edit (unfortunately it has to be a file) and so I wanted to use YAML. I can’t find any way of parsing this from a Unix shell script however.
Here is a bash-only parser that leverages sed and awk to parse simple yaml files:
It understands files such as:
Which, when parsed using:
It also understands YAML files generated by Ruby, which may include Ruby symbols, like:
and will output the same as in the previous example.
Typical use within a script is:
parse_yaml accepts a prefix argument so that imported settings all have a common prefix (which will reduce the risk of namespace collisions).
Note that previous settings in a file can be referred to by later settings:
Another nice usage is to first parse a defaults file and then the user settings, which works because the later settings override the earlier ones:
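The parser's source is not reproduced in this answer. As a rough sketch of how a sed-and-awk flattener of this kind can work (not necessarily the answer's exact code), the function below assumes two-space indentation and plain "key: value" mappings only; the file names, keys and the conf_ prefix are made up for the demonstration.

```shell
# Sketch only: flattens two-space-indented "key: value" YAML into shell
# assignments. Lists, multi-line strings and anchors are not handled.
parse_yaml() {
    local file=$1 prefix=$2
    local s='[[:space:]]*' w='[A-Za-z0-9_]*' fs
    fs=$(printf '\034')  # ASCII File Separator, unlikely to appear in the data
    # First expression handles quoted values, second one unquoted values.
    sed -n \
        -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
        -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$file" |
    awk -F"$fs" -v prefix="$prefix" '{
        indent = length($1) / 2          # two spaces per level (assumed)
        name[indent] = $2
        for (i in name) if (i + 0 > indent) delete name[i]
        if (length($3) > 0) {
            path = ""
            for (i = 0; i < indent; i++) path = path name[i] "_"
            printf("%s%s%s=\"%s\"\n", prefix, path, $2, $3)
        }
    }'
}

# Demo data (hypothetical): a defaults file and a user file.
tmpdir=$(mktemp -d)
printf 'db:\n  host: localhost\n  port: 3306\n' > "$tmpdir/defaults.yml"
printf 'db:\n  host: "db.example.org"\n' > "$tmpdir/user.yml"

# Later assignments win, so user settings override the defaults.
eval "$(parse_yaml "$tmpdir/defaults.yml" conf_)"
eval "$(parse_yaml "$tmpdir/user.yml" conf_)"
echo "${conf_db_host}:${conf_db_port}"   # prints db.example.org:3306
rm -rf "$tmpdir"
```

Because the values end up inside double quotes that are passed to eval, a later setting can refer to an earlier variable, but for the same reason this should only be used on files you trust.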
I’ve written shyaml in Python for YAML query needs from the shell command line.
Example YAML file (with complex features):
More complex looping query on complex values:
More samples and documentation are available on the shyaml GitHub page or the shyaml PyPI page.
My use case may or may not be quite the same as what this original post was asking, but it’s definitely similar.
I need to pull in some YAML as bash variables. The YAML will never be more than one level deep.
YAML looks like so:
I achieved the output with this line:
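One way a flat file like this could be turned into variables is sketched below with sed. The input keys and values are invented for the example, and the sketch assumes values contain no double quotes.

```shell
# Hypothetical flat YAML input: top-level "key: value" pairs only.
yaml='KEY: value
ANOTHER_KEY: another value
PORT: 8080'

# Rewrite each "key: value" line as key="value"; other lines are skipped.
assignments=$(printf '%s\n' "$yaml" |
    sed -n 's/^\([A-Za-z_][A-Za-z0-9_]*\):[[:space:]]*\(.*\)$/\1="\2"/p')

printf '%s\n' "$assignments"
eval "$assignments"     # only safe for trusted input
echo "$ANOTHER_KEY"     # prints: another value
```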
yq is a lightweight and portable command-line YAML processor.
The aim of the project is to be the jq or sed of yaml files.
As an example (stolen straight from the documentation), given a sample.yaml file of:
Given that Python3 and PyYAML are quite easy dependencies to meet nowadays, the following may help:
It’s possible to pass a small script to some interpreters, like Python. An easy way to do so using Ruby and its YAML library is the following:
Here, data is a hash (or array) with the values from the YAML file.
As a bonus, it’ll parse Jekyll’s front matter just fine.
Here is an extended version of Stefan Farestam’s answer:
produces this output:
Edit: I have created a GitHub repository for this.
I just wrote a parser that I called Yay! (Yaml ain’t Yamlesque!) which parses Yamlesque, a small subset of YAML. So, if you’re looking for a 100% compliant YAML parser for Bash then this isn’t it. However, to quote the OP, if you want a structured configuration file which is as easy as possible for a non-technical user to edit that is YAML-like, this may be of interest.
It’s inspired by the earlier answer but writes associative arrays (yes, it requires Bash 4.x) instead of basic variables. It does so in a way that allows the data to be parsed without prior knowledge of the keys so that data-driven code can be written.
As well as the key/value array elements, each array has a keys array containing a list of key names, a children array containing names of child arrays and a parent key that refers to its parent.
This is an example of Yamlesque:
Here is an example showing how to use it:
And here is the parser:
There is some documentation in the linked source file and below is a short explanation of what the code does.
It writes valid bash commands to its standard output that, if executed, define arrays representing the contents of the input data file. The first of these defines the top-level array:
The two expressions are similar; they differ only because the first one picks out quoted values whereas the second one picks out unquoted ones.
The File Separator character (decimal 28, hex 1C, octal 034) is used because, as a non-printable character, it is unlikely to be in the input data.
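As a small illustration of the separator choice, the snippet below builds the character from its octal escape, checks its byte value with od, and uses it to split a record. The record contents are made up.

```shell
fs=$(printf '\034')                                    # octal escape for the FS byte
hex=$(printf '%s' "$fs" | od -An -tx1 | tr -d ' \n')   # byte value in hexadecimal
dec=$(printf '%s' "$fs" | od -An -tu1 | tr -d ' \n')   # byte value in decimal
echo "hex=$hex dec=$dec"                               # prints hex=1c dec=28

# The separator keeps a value containing spaces and colons in one field.
record="  ${fs}date${fs}2001-05-25 00:00:00"
value=$(printf '%s\n' "$record" | awk -F"$fs" '{ print $3 }')
echo "$value"
```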
The result is piped into awk which processes its input one line at a time. It uses the FS character to assign each field to a variable:
All lines have an indent (possibly zero) and a key, but they don’t all have a value. The script computes an indent level for the line by dividing the length of the first field, which contains the leading whitespace, by two. Top-level items without any indent are at indent level zero.
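The field assignment and indent computation described here can be sketched as follows. The three input records are invented and already split by the separator, and two spaces per nesting level is an assumption of the sketch.

```shell
fs=$(printf '\034')

# Three pre-split records: indent, key, value (the value may be empty).
levels=$(printf '%s\n' \
    "${fs}event1${fs}" \
    "  ${fs}name${fs}My Event" \
    "    ${fs}zone${fs}Europe/Berlin" |
  awk -F"$fs" '{
      indent = length($1) / 2   # two spaces per nesting level (assumed)
      key    = $2
      value  = $3
      printf "%d %s [%s]\n", indent, key, value
  }')
printf '%s\n' "$levels"
```

This prints one line per record with the computed level first: 0 for event1, 1 for name and 2 for zone.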
Next, it works out what prefix to use for the current item. This is what gets added to a key name to make an array name. There’s a root_prefix for the top-level array which is defined as the data set name and an underscore:
For the top level (indent level zero) the data set prefix is used as the parent key so it has no prefix (it’s set to "" ). All other arrays are prefixed with the root prefix.
Next, the current key is inserted into an (awk-internal) array containing the keys. This array persists throughout the whole awk session and therefore contains keys inserted by prior lines. The key is inserted into the array using its indent as the array index.
Because this array contains keys from previous lines, any keys with an indent level greater than the current line’s indent level are removed:
This leaves the keys array containing the key-chain from the root at indent level 0 to the current line. It removes stale keys that remain when the prior line was indented deeper than the current line.
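A minimal sketch of this bookkeeping, using made-up "indent key" pairs as input instead of real YAML, prints the key-chain for each line together with the number of entries left in the array after stale keys are dropped.

```shell
chains=$(printf '%s\n' '0 server' '1 tls' '2 cert' '1 port' '0 client' |
  awk '{
      indent = $1 + 0
      keys[indent] = $2
      # Drop keys left over from earlier, more deeply indented lines.
      for (i in keys) if (i + 0 > indent) delete keys[i]
      chain = keys[0]
      for (i = 1; i <= indent; i++) chain = chain "." keys[i]
      n = 0
      for (i in keys) n++
      print chain, n
  }')
printf '%s\n' "$chains"
```

The comparison uses i + 0 so that the array index is compared as a number rather than as a string, which matters once indent levels reach 10 or more.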
The final section outputs the bash commands: an input line without a value starts a new indent level (a collection in YAML parlance) and an input line with a value adds a key to the current collection.
When a key has a value, a key with that value is assigned to the current collection like this:
The first statement outputs the command to assign the value to an associative array element named after the key and the second one outputs the command to add the key to the collection’s space-delimited keys list:
When a key doesn’t have a value, a new collection is started like this:
The first statement outputs the command to add the new collection to the current collection’s space-delimited children list and the second one outputs the command to declare a new associative array for the new collection:
All of the output from yay_parse can be parsed as bash commands by the bash eval or source built-in commands.
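To make the last steps concrete, here is a simplified sketch in the spirit of the description above. It is not the yay_parse source: the data set name cfg, the input records and the reduced set of emitted commands are assumptions, and it needs Bash 4 or later for associative arrays.

```shell
# Requires Bash 4+ (associative arrays).
fs=$(printf '\034')
dataset=cfg   # hypothetical data set name

# Input records are already split into indent, key and value.
cmds=$(printf '%s\n' \
    "${fs}title${fs}Demo" \
    "${fs}server${fs}" \
    "  ${fs}host${fs}localhost" \
    "  ${fs}port${fs}8080" |
  awk -F"$fs" -v root="$dataset" '
    BEGIN { printf "declare -A %s\n", root }
    {
      indent = length($1) / 2; key = $2; value = $3
      root_prefix = root "_"
      if (indent == 0) { prefix = "";          parent = root }
      else             { prefix = root_prefix; parent = keys[indent - 1] }
      keys[indent] = key
      for (i in keys) if (i + 0 > indent) delete keys[i]
      if (length(value) > 0) {
        # Key with a value: store it and record the key name.
        printf "%s%s[%s]=\"%s\"\n", prefix, parent, key, value
        printf "%s%s[keys]+=\" %s\"\n", prefix, parent, key
      } else {
        # Key without a value: register and declare a new collection.
        printf "%s%s[children]+=\" %s%s\"\n", prefix, parent, root_prefix, key
        printf "declare -A %s%s\n", root_prefix, key
      }
    }')

printf '%s\n' "$cmds"   # the generated bash commands
eval "$cmds"            # define the arrays in the current shell
echo "${cfg[title]} ${cfg_server[host]}:${cfg_server[port]}"
```

The final echo prints "Demo localhost:8080". When the eval runs inside a function, declare -g -A would be needed instead of declare -A to keep the arrays global.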
Moving my answer from How to convert a json response into yaml in bash, since this seems to be the authoritative post on dealing with YAML text parsing from command line.
Both are available for installation via standard installation package managers on almost all major distributions
Each version has pros and cons relative to the other; a few points worth highlighting (adapted from their repo instructions):
kislyuk/yq
mikefarah/yq
My take on the following YAML (referenced in another answer as well) with both versions:
Various actions to be performed with both the implementations (some frequently used operations)
Using kislyuk/yq
Using mikefarah/yq
Another option is to convert the YAML to JSON, then use jq to interact with the JSON representation either to extract information from it or edit it.
Content of sample.yaml:
A quick way to do the thing now (previous ones haven’t worked for me):
Example asd.yaml:
parsing root:
parsing key3:
then in your bash script
Also, if you are using jq, you can do something like this:
This works because js-yaml converts a YAML file to a JSON string literal, which you can then process with any JSON parser on your Unix system.
If you have Python 2 and PyYAML, you can use this parser I wrote called parse_yaml.py. Among the neater things it does: it lets you choose a prefix (in case you have more than one file with similar variables) and pick a single value from a YAML file.
For example if you have these yaml files:
You can load both without conflict.
And even cherry pick the values you want.
You could use an equivalent of yq that is written in golang:
Whenever you need a solution for "How to work with YAML/JSON/compatible data from a shell script" which works on just about every OS with Python (*nix, OSX, Windows), consider yamlpath, which provides several command-line tools for reading, writing, searching, and merging YAML, EYAML, JSON, and compatible files. Since just about every OS either comes with Python pre-installed or it is trivial to install, this makes yamlpath highly portable. Even more interesting: this project defines an intuitive path language with very powerful, command-line-friendly syntax that enables accessing one or more nodes.
To your specific question and after installing yamlpath using Python’s native package manager or your OS’s package manager (yamlpath is available via RPM to some OSes):
You didn’t specify that the data was a simple Scalar value though, so let’s up the ante. What if the result you want is an Array? Even more challenging, what if it’s an Array-of-Hashes and you only want one property of each result? Suppose further that your data is actually spread out across multiple YAML files and you need all the results in a single query. That’s a much more interesting question to demonstrate with. So, suppose you have these two YAML files:
File: data1.yaml
File: data2.yaml
How would you report only the sku of every item in inventory after applying the changes from data2.yaml to data1.yaml, all from a shell script? Try this:
You get exactly what you need from only a few lines of code:
As you can see, yamlpath turns very complex problems into trivial solutions. Note that the entire query was handled as a stream; no YAML files were changed by the query and there were no temp files.
I realize this is "yet another tool to solve the same question" but after reading the other answers here, yamlpath appears more portable and robust than most alternatives. It also fully understands YAML/JSON/compatible files and it does not need to convert YAML to JSON to perform requested operations. As such, comments within the original YAML file are preserved whenever you need to change data in the source YAML file. Like some alternatives, yamlpath is also portable across OSes. More importantly, yamlpath defines a query language that is extremely powerful, enabling very specialized/filtered data queries. It can even operate against results from disparate parts of the file in a single query.