
Version 1 (modified by xi, 8 years ago)

Move the page  http://trac.xitology.org/pysyck/wiki/PyYAML3000

PyYAML 3000

PyYAML 3000 is the next-generation YAML parser for Python.

If you have any questions, please post them to the YAML-core mailing list.

If you find any bugs, you may post them to the list or open a ticket in the BTS.

Downloading and installing

Check it out from the SVN repository http://svn.pyyaml.org/pyyaml/trunk.

Install it by running

$ python setup.py install

Features

  • PyYAML3000 is a complete YAML 1.1 parser. In particular, PyYAML3000 can parse all examples from the specification. The parsing algorithm is simple enough to be a reference for implementing YAML parsers in C or other languages.
  • PyYAML3000 supports Unicode, including UTF-8/UTF-16 input and \u escape sequences.
  • PyYAML3000 provides both a low-level event-based interface to the parser (think SAX) and a high-level API for generating native Python objects (think DOM).
  • PyYAML3000 supports almost all types from the YAML types repository and has a simple API to extend it with application-specific tags.
  • PyYAML3000 produces relatively meaningful error messages.

Note that PyYAML3000 is still young and may have some bugs. In particular, there are two major drawbacks:

  • There is no YAML emitter yet.
  • PyYAML3000 is written in Python and is slow compared to C-based parsers.

High-level API

Warning: The API is not stable and may change in the future.

Basic examples

Start with importing the package:

>>> import yaml

Define the input data:

>>> data = """
... - YAML
... - is
... - fun!
... """

The parser accepts string objects, unicode objects, open file objects, and unicode file objects.

Now convert it to a native Python object:

>>> yaml.load_document(data)
['YAML', 'is', 'fun!']

PyYAML 3000 supports many of the types defined in the YAML tags repository:

>>> data = """
... - ~
... - true
... - 3_141_592.653e-6
... - 3000
... - PyYAML3000 birthday: 2006-02-11
... - primes (sort of): !!set { 2, 3, 5, 7, 11, 13 }
... - pairs: !!pairs [1: 2, 3: 4, 5: 6]
... """
>>> for x in yaml.load_document(data): print x
None
True
3.141592653
3000
{'PyYAML3000 birthday': datetime.datetime(2006, 2, 11, 0, 0)}
{'primes (sort of)': set([2, 3, 5, 7, 11, 13])}
{'pairs': [(1, 2), (3, 4), (5, 6)]}

The following tags are supported: !!map, !!omap, !!pairs, !!set, !!seq, !!binary, !!bool, !!float, !!int, !!merge, !!null, !!str, !!timestamp, !!value.
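
The implicit typing shown in the example above can be approximated in a few lines. The following is a minimal sketch, not PyYAML 3000's actual resolver: the function name `resolve_scalar` and its regular expressions are assumptions made for illustration, and it covers only the null, boolean (`true`/`false` spellings only), decimal integer, float, and date-only timestamp cases.

```python
import datetime
import re

# Simplified patterns, assumed for illustration; the YAML 1.1 type
# repository defines more forms (octal, hex, sexagesimal, yes/no, ...).
INT_RE = re.compile(r'^[-+]?(0|[1-9][0-9_]*)$')
FLOAT_RE = re.compile(r'^[-+]?[0-9][0-9_]*\.[0-9_]*([eE][-+]?[0-9]+)?$')
DATE_RE = re.compile(r'^([0-9]{4})-([0-9]{2})-([0-9]{2})$')


def resolve_scalar(text):
    """Guess a Python value for a plain (unquoted, untagged) scalar."""
    if text in ('~', 'null', 'Null', 'NULL', ''):
        return None
    if text in ('true', 'True', 'TRUE'):
        return True
    if text in ('false', 'False', 'FALSE'):
        return False
    if INT_RE.match(text):
        # Underscores are digit separators and carry no value.
        return int(text.replace('_', ''))
    if FLOAT_RE.match(text):
        return float(text.replace('_', ''))
    match = DATE_RE.match(text)
    if match:
        year, month, day = (int(part) for part in match.groups())
        return datetime.datetime(year, month, day)
    # Anything else stays a string.
    return text


values = [resolve_scalar(s) for s in
          ['~', 'true', '3_141_592.653e-6', '3000', '2006-02-11', 'fun!']]
```

The order of the checks matters: the integer pattern is tried before the float pattern, and the string fallback comes last so that unrecognized scalars are returned unchanged.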

Defining custom tags

You may define constructors for your own application-specific tags. You may use either the function yaml.Constructor.add_constructor or subclass from yaml.YAMLObject.

If you use yaml.YAMLObject, you need to define the class attribute yaml_tag and the class method from_yaml:

>>> class Person(yaml.YAMLObject):
...     yaml_tag = '!Person'
...     @classmethod
...     def from_yaml(cls, constructor, node):
...         # Convert the node to a dictionary
...         attributes = constructor.construct_mapping(node)
...         # Convert spaces in keys into underscores. Iterate over a copy
...         # of the keys because the dictionary is modified in the loop.
...         for key in list(attributes):
...             if ' ' in key:
...                 value = attributes.pop(key)
...                 attributes[key.replace(' ', '_')] = value
...         # Create an object
...         return cls(**attributes)
...     def __init__(self, first_name, last_name, email=None, birthday=None):
...         self.first_name = first_name
...         self.last_name = last_name
...         self.email = email
...         self.birthday = birthday

If you don't want to use metaclass magic, you may define the constructor as a function and register it:

>>> def construct_person(constructor, node):
...     # ...
>>> yaml.Constructor.add_constructor('!Person', construct_person)

After that, PyYAML 3000 will understand the !Person tag and convert it into the Person object:

>>> data = """
... --- !Person
... first name: Kirill
... last name: Simonov
... email: xi(at)resolvent.net
... """
>>> p = yaml.load_document(data)
>>> p
<__main__.Person object at 0xb7de408c>
>>> p.first_name, p.last_name, p.email, p.birthday
('Kirill', 'Simonov', 'xi(at)resolvent.net', None)
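
For comparison, here is a hedged sketch of the function-based registration written against a later PyYAML release rather than the pre-release API described on this page. It assumes the `yaml` package is installed and provides `yaml.SafeLoader`, its `add_constructor` class method, and `yaml.load(..., Loader=...)`; the `PersonLoader` subclass is a hypothetical name, used so that the tag is not registered globally.

```python
import yaml  # assumption: a released PyYAML, not the PyYAML 3000 snapshot


class Person(object):
    def __init__(self, first_name, last_name, email=None, birthday=None):
        self.first_name = first_name
        self.last_name = last_name
        self.email = email
        self.birthday = birthday


def construct_person(loader, node):
    # Convert the mapping node to a dictionary.
    attributes = loader.construct_mapping(node)
    # Build a new dictionary with spaces in keys replaced by underscores.
    normalized = dict((key.replace(' ', '_'), value)
                      for key, value in attributes.items())
    return Person(**normalized)


class PersonLoader(yaml.SafeLoader):
    """Private loader class (hypothetical) that knows the !Person tag."""


PersonLoader.add_constructor('!Person', construct_person)

data = """
--- !Person
first name: Kirill
last name: Simonov
email: xi(at)resolvent.net
"""
p = yaml.load(data, Loader=PersonLoader)
```

Building a new dictionary instead of renaming keys in place avoids modifying the mapping while iterating over it.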

Loading all documents

If an input stream contains several documents, you may load all of them using the yaml.load function.

>>> data = """
... This is the first document
... --- # This is an empty document
... ---
... - this
... - is: the
...   last: document
... """
>>> for document in yaml.load(data): print document
This is the first document
None
['this', {'is': 'the', 'last': 'document'}]
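
The function names on this page come from the pre-release API. As a hedged point of comparison, the sketch below performs the same multi-document load with `yaml.safe_load_all`, assuming a later PyYAML release is installed as the `yaml` package; it is not the `yaml.load` function described here.

```python
import yaml  # assumption: a released PyYAML, not the PyYAML 3000 snapshot

data = """
This is the first document
--- # This is an empty document
---
- this
- is: the
  last: document
"""

# safe_load_all returns a lazy generator, so collect it into a list.
documents = list(yaml.safe_load_all(data))
```

Because the result is a generator, documents are parsed only as they are consumed; a syntax error in a later document surfaces when iteration reaches it.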

There are more features; check the source to find out.

Low-level API

PyYAML 3000 provides a low-level, event-based, easy-to-use parser API.

Example:

>>> data = """
... --- !tag
... scalar
... ---
... - &anchor item
... - another item
... - *anchor
... ---
... key: value
... ? - complex
...   - key
... : - complex
...   - value
... """
>>> for event in yaml.parse(data): print event

ScalarEvent(anchor=None, tag=u'!tag', value=u'scalar')

SequenceEvent(anchor=None, tag=u'!')
ScalarEvent(anchor=u'anchor', tag=None, value=u'item')
ScalarEvent(anchor=None, tag=None, value=u'another item')
AliasEvent(anchor=u'anchor')
CollectionEndEvent()

MappingEvent(anchor=None, tag=u'!')
ScalarEvent(anchor=None, tag=None, value=u'key')
ScalarEvent(anchor=None, tag=None, value=u'value')
SequenceEvent(anchor=None, tag=u'!')
ScalarEvent(anchor=None, tag=None, value=u'complex')
ScalarEvent(anchor=None, tag=None, value=u'key')
CollectionEndEvent()
SequenceEvent(anchor=None, tag=u'!')
ScalarEvent(anchor=None, tag=None, value=u'complex')
ScalarEvent(anchor=None, tag=None, value=u'value')
CollectionEndEvent()
CollectionEndEvent()

StreamEndEvent()
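
To illustrate how an event stream like the one above maps onto native objects, here is a minimal sketch of a recursive builder. The namedtuple classes are hypothetical stand-ins that mimic the printed event names; they are not imported from the yaml package, and `build_node` is not part of its API.

```python
from collections import namedtuple

# Hypothetical stand-ins for the event classes printed above.
ScalarEvent = namedtuple('ScalarEvent', ['anchor', 'tag', 'value'])
SequenceEvent = namedtuple('SequenceEvent', ['anchor', 'tag'])
MappingEvent = namedtuple('MappingEvent', ['anchor', 'tag'])
AliasEvent = namedtuple('AliasEvent', ['anchor'])
CollectionEndEvent = namedtuple('CollectionEndEvent', [])

_END = object()  # sentinel returned when a collection closes


def build_node(events, anchors):
    """Consume the events of one node and return a Python object."""
    event = next(events)
    if isinstance(event, CollectionEndEvent):
        return _END
    if isinstance(event, AliasEvent):
        # An alias refers to a node that was anchored earlier.
        return anchors[event.anchor]
    if isinstance(event, ScalarEvent):
        node = event.value
    elif isinstance(event, SequenceEvent):
        node = []
        while True:
            item = build_node(events, anchors)
            if item is _END:
                break
            node.append(item)
    elif isinstance(event, MappingEvent):
        node = {}
        while True:
            key = build_node(events, anchors)
            if key is _END:
                break
            node[key] = build_node(events, anchors)
    else:
        raise ValueError('unexpected event: %r' % (event,))
    if event.anchor is not None:
        anchors[event.anchor] = node
    return node


events = iter([
    SequenceEvent(anchor=None, tag='!'),
    ScalarEvent(anchor='anchor', tag=None, value='item'),
    ScalarEvent(anchor=None, tag=None, value='another item'),
    AliasEvent(anchor='anchor'),
    CollectionEndEvent(),
])
result = build_node(events, {})
```

The sketch ignores tags and stores mapping keys directly, so a complex key such as the sequence in the third document would first have to be converted to a hashable value (for example a tuple).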

To Do

For the initial release we need a website and documentation.

Long-term goals:

  • fix tabs, indentation for flow collections, indentation for scalars (min=1?), 'y' is !!bool
  • emitter
  • libyaml3000

Deviations from the specification

  • Rules for tabs in YAML are confusing. We are close, but not there yet. Perhaps both the spec and the parser should be fixed. Anyway, the best rule for tabs in YAML is to not use them at all.
  • Byte order mark. The initial BOM is stripped, but BOMs inside the stream are considered as parts of the content. It can be fixed, but it's not really important now.
  • Empty plain scalars are not allowed if an alias or tag is specified. This is done to prevent anomalies like [ !tag, value], which can be interpreted both as [ !<!tag,> value ] and [ !<!tag> "", "value" ]. The spec should be fixed.
  • Indentation of flow collections. The spec requires them to be indented more than their block parent node. Unfortunately, this rule makes many intuitively correct constructs invalid, for instance,
    block: {
    } # this is an indentation violation according to the spec.
    
  • ':' is not allowed for plain scalars in the flow mode. {1:2} is interpreted as { 1 : 2 }.
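
The byte order mark behaviour described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the parser's actual reader code: the function name `decode_stream` is hypothetical, and input without a BOM is assumed to be UTF-8.

```python
import codecs


def decode_stream(raw):
    """Decode bytes to text, stripping only an initial byte order mark."""
    for bom, encoding in ((codecs.BOM_UTF16_LE, 'utf-16-le'),
                          (codecs.BOM_UTF16_BE, 'utf-16-be'),
                          (codecs.BOM_UTF8, 'utf-8')):
        if raw.startswith(bom):
            # Drop the leading BOM; later U+FEFF characters are kept.
            return raw[len(bom):].decode(encoding)
    # Assumption: no BOM means UTF-8.
    return raw.decode('utf-8')


# The second BOM sits inside the stream, so it stays in the content.
text = decode_stream(codecs.BOM_UTF8 + b'a' + codecs.BOM_UTF8 + b'b')
```

Only the first BOM is removed here, which mirrors the deviation noted above: a BOM that appears later in the stream remains part of the decoded content as U+FEFF.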