Ticket #29 (closed enhancement: wontfix)

Opened 2 years ago

Last modified 10 months ago

Keeping mapping keys ordered

Reported by: edemaine@mit.edu Assigned to: xi
Priority: normal Component: pyyaml
Severity: normal Keywords:
Cc: xi@resolvent.net edemaine@mit.edu

Description

Would you be interested in adding the following kind of functionality to the public distribution of PyYAML?

>>> import yaml
>>> d = yaml.load('z: 1\ny: 2\nx: 3\n', Loader=yaml.order.OrderedLoader)
>>> d
yaml.order.odict([('z', 1), ('y', 2), ('x', 3)])
>>> for key, value in d.iteritems(): print key, value
z 1
y 2
x 3
>>> print yaml.dump(d, Dumper=yaml.order.OrderedDumper, default_flow_style=False),
z: 1
y: 2
x: 3
>>> s = yaml.dump(d, default_flow_style=False)
>>> print s,
!!omap
z: 1
y: 2
x: 3
>>> yaml.load(s)
yaml.order.odict([('z', 1), ('y', 2), ('x', 3)])

There are two things going on here:

1. Add real !!omap functionality. When loading an !!omap object, create an odict object (defined by a new class that maintains a dictionary along with a key order), instead of the current behavior of creating a regular Python dictionary. Conversely, when dumping such an object, preserve the key order (don't sort) and output an !!omap tag. Both of these features seem quite desirable from the point of view of the YAML standard.

Perhaps, more generally, dumping could check for a special 'keys_in_order' attribute, in which case it follows the order of keys(), instead of sorting the keys as in the recent patch.

2. Add a special yaml.order.OrderedLoader, which loads regular !!map values as if they were !!omap values, and a yaml.order.OrderedDumper, which dumps odict objects as regular !!map values (to avoid the ugly !!omap tag).
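A rough sketch of point 2 using only hooks that stock PyYAML already provides (`add_constructor`, `add_representer`, `construct_pairs`, `represent_mapping`). It assumes Python 3, uses `collections.OrderedDict` as a stand-in for the proposed `odict` class, and defines `OrderedLoader`/`OrderedDumper` as hypothetical local subclasses rather than a real `yaml.order` module:

```python
import collections

import yaml


class OrderedLoader(yaml.SafeLoader):
    """Hypothetical loader: plain !!map nodes become OrderedDicts."""


def construct_ordered_mapping(loader, node):
    # Resolve YAML merge keys ("<<") first, as SafeConstructor.construct_mapping does.
    loader.flatten_mapping(node)
    # construct_pairs returns (key, value) tuples in document order.
    return collections.OrderedDict(loader.construct_pairs(node))


# Registering on the subclass leaves yaml.SafeLoader itself unchanged.
OrderedLoader.add_constructor(
    yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, construct_ordered_mapping)


class OrderedDumper(yaml.SafeDumper):
    """Hypothetical dumper: OrderedDicts are written as plain !!map nodes."""


def represent_ordered_mapping(dumper, data):
    # Passing data.items() instead of the mapping itself keeps
    # represent_mapping from sorting the keys.
    return dumper.represent_mapping(
        yaml.resolver.BaseResolver.DEFAULT_MAPPING_TAG, data.items())


OrderedDumper.add_representer(collections.OrderedDict, represent_ordered_mapping)


d = yaml.load('z: 1\ny: 2\nx: 3\n', Loader=OrderedLoader)
print(list(d.items()))  # [('z', 1), ('y', 2), ('x', 3)]
print(yaml.dump(d, Dumper=OrderedDumper, default_flow_style=False), end='')
```

Subclassing keeps the ordered behavior opt-in, which matches the "optional functionality" framing of the proposal. On Python 3.7+ a plain `dict` would also keep the order; `OrderedDict` is used here only to mirror the proposed `odict` type.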

Personally, I would find this functionality very useful in many projects. It would enable a computer program to edit a human-written YAML file without scrambling the key order, so that the program's output looks much like what the human wrote just before. I understand that YAML does not guarantee preservation of key order in a map type or, more precisely, that it gives the order no significance in the absence of an !!omap or !!pairs type specification. But this is a practically useful feature in some cases, so it seems natural to provide it as optional functionality in yaml.order.
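For readers on current versions, the edit-and-rewrite use case above can be sketched without any custom type. This assumes Python 3.7+ (plain dicts keep insertion order) and PyYAML 5.1 or later (which added the `sort_keys` option to `dump`); the config text is invented for illustration:

```python
import yaml

# A small, hand-written config (invented for this example).
original = 'name: model\nsteps: 10\nalpha: 0.5\n'

data = yaml.safe_load(original)   # plain dict, keys in document order
data['steps'] = 20                # the program edits one value

# sort_keys=False keeps the document order; the default (True) sorts keys.
kept = yaml.safe_dump(data, sort_keys=False, default_flow_style=False)
resorted = yaml.safe_dump(data, default_flow_style=False)

print(kept, end='')       # name, steps, alpha
print(resorted, end='')   # alpha, name, steps
```

This does not change what the spec says about key order; it only relies on PyYAML building mappings in document order and on Python's dict preserving it.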

I'd be happy to write the code for all of this, because I need it myself. My question is whether you'd consider including it in the PyYAML distribution.

Change History

01/06/07 07:00:09 changed by m.mueller at ibgw-leipzig dot de

  • cc set to xi@resolvent.net edemaine@mit.edu.

I find YAML a very interesting format for input files of simulation models, which typically need lots of data. I am currently writing an input program in Python that reads YAML files. It was easy to implement a simple !File tag that reads data from other files, either in YAML or in other formats. There are two reasons for this:

1. Since the files can be rather large, it is very nice to be able to distribute input over as many files as desired. This can also be nested, i.e. a YAML file included this way can include another file, and so on. Only YAML files can include other files; files in other formats cannot, so they are dead ends.

2. For data that fit naturally into a tabular representation, other formats such as delimited text files (space- or semicolon-separated), spreadsheet formats (Excel etc.), dBase files or other databases are often a good choice. They are imported automatically into the Python data structures just like YAML itself.
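The `!File` include mechanism described above might look roughly like the following custom constructor. This is a sketch, not the commenter's code: the class name `IncludeLoader`, resolving paths relative to the including file, and returning the raw text of non-YAML files (where a real program would call a CSV or spreadsheet reader) are all assumptions:

```python
import os
import tempfile

import yaml


class IncludeLoader(yaml.SafeLoader):
    """Hypothetical loader that understands a local !File tag."""

    def __init__(self, stream):
        # Resolve included paths relative to the including file, if it has a name.
        name = getattr(stream, 'name', None)
        self._include_dir = os.path.dirname(os.path.abspath(name)) if name else os.getcwd()
        super().__init__(stream)


def construct_file(loader, node):
    path = os.path.join(loader._include_dir, loader.construct_scalar(node))
    with open(path) as f:
        if path.endswith(('.yaml', '.yml')):
            # Included YAML may itself contain !File tags, so nesting works.
            return yaml.load(f, Loader=IncludeLoader)
        # Other formats are dead ends; a real program would parse them here.
        return f.read()


IncludeLoader.add_constructor('!File', construct_file)


# Usage: main.yaml includes sub.yaml, which includes a delimited text file.
with tempfile.TemporaryDirectory() as tmp:
    files = {
        'main.yaml': 'name: model\nparams: !File sub.yaml\n',
        'sub.yaml': 'table: !File table.txt\n',
        'table.txt': '1;2;3\n',
    }
    for fname, text in files.items():
        with open(os.path.join(tmp, fname), 'w') as out:
            out.write(text)
    with open(os.path.join(tmp, 'main.yaml')) as f:
        data = yaml.load(f, Loader=IncludeLoader)

print(data)
```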

I want the user to edit files as well as to read and write those files with my programs in between edits. Therefore, I would like mappings to be ordered. Otherwise the file would possibly be totally rearranged after processing, which is clearly something the user who edits the file would really dislike.

Furthermore, I plan to write a data-driven GUI for input files. It should find (nearly) all the information about its appearance in the YAML file. One natural representation would be a tree view. Since a tree is ordered, I would also need an order for dictionaries.

To cut a long story short, I am really interested in an ordered mapping for YAML. I think both ways should be supported: (1) doing it explicitly in YAML with !!omap and (2) doing it in source code with yaml.order.OrderedLoader.

I'm willing to contribute to the implementation but would certainly need some help to know where to look from the beginning and to do it properly, i.e. in agreement with the overall design of PyYAML.

06/07/07 00:08:46 changed by cems at lanl dot gov

An ordered dict may be the wrong tool for the problem. The basic problem is not just a dict issue; it is keeping all the YAML entries, at dump time, in the same order they had in the original file. I can see how an ordered dict could address this, but it may be overkill.

An ordered dict is often implemented in a slower and more costly manner than a conventional dict. In most cases, I believe, the desired behavior is simply for the YAML object to remember the order of the items in the original file.

To make the distinction explicit: one can use a conventional dict to store and access the data quickly in Python. Most of the time we will not care about the order in which keys are iterated, how they are stored, or the volatility of the key order when the hash table is resized. The only time we actually care, I believe, is when rewriting the YAML file; then one wants either to recreate the original order or to specify a particular order.

Thus one can simply use a conventional dict plus some auxiliary storage that specifies the ordering of the keys. This can be consulted when needed and ignored for speed when not. Moreover, the user can even edit the order, rather than have it determined solely by the input order.
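One way to read this suggestion, sketched under assumptions: keep the loaded data as plain dicts, store each mapping's key order separately (keyed by the mapping's path), and consult that side table only when writing the file back. The helpers `record_order` and `apply_order` are hypothetical; the sketch assumes Python 3.7+ (so a freshly loaded dict still iterates in file order) and PyYAML 5.1+ for `sort_keys=False`:

```python
import yaml


def record_order(data, path=()):
    """Return {path: [keys...]} for every mapping inside `data`.

    Assumes Python 3.7+, so a freshly loaded dict still iterates in file order.
    """
    orders = {}
    if isinstance(data, dict):
        orders[path] = list(data)
        for key, value in data.items():
            orders.update(record_order(value, path + (key,)))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            orders.update(record_order(item, path + (index,)))
    return orders


def apply_order(data, orders, path=()):
    """Rebuild `data` so each mapping follows the stored (or user-edited) order.

    Keys the stored order does not mention, e.g. ones added by the program,
    are appended after the known keys.
    """
    if isinstance(data, dict):
        known = [key for key in orders.get(path, []) if key in data]
        extra = [key for key in data if key not in known]
        return {key: apply_order(data[key], orders, path + (key,))
                for key in known + extra}
    if isinstance(data, list):
        return [apply_order(item, orders, path + (index,))
                for index, item in enumerate(data)]
    return data


data = yaml.safe_load('z: 1\ny: 2\nx: 3\n')
orders = record_order(data)        # auxiliary storage, kept apart from the data

data.pop('y')                      # the program edits the plain dict freely...
data['y'] = 20                     # ...which moves 'y' to the end of the dict
data['w'] = 4                      # and adds a new key
restored = yaml.safe_dump(apply_order(data, orders),
                          sort_keys=False, default_flow_style=False)

orders[()] = ['x', 'y', 'z']       # the user can also edit the order itself
reordered = yaml.safe_dump(apply_order(data, orders),
                           sort_keys=False, default_flow_style=False)

print(restored, end='')
print(reordered, end='')
```

The data itself stays an ordinary dict throughout; the order table is only touched on the way out, which is the trade-off the comment argues for.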

10/29/07 12:56:24 changed by mundt@easydesign.de

Just wanted to show interest in this topic, especially in reading YAML files without messing up the key order. I use YAML to configure a validation process, and it would be really nice to have a fixed order. At the moment I keep an extra array just for the order, and that's duplicated code... who wants that!? :)

However, since nothing has changed for almost five months, I just want to check whether this topic is still in progress and whether there is any news.

Thanks for the module anyway.

11/17/07 20:16:12 changed by xi

  • status changed from new to closed.
  • resolution set to wontfix.

I won't add this kind of functionality to the PyYAML core for two reasons:

  • It breaks the YAML specs. The spec clearly indicates that the key order is a representation detail and should not be used for constructing native objects.
  • I don't like the idea of PyYAML defining custom types for generated objects. I'd prefer PyYAML to generate only objects of the types defined in the standard library.

The implementation is nearly trivial, though, so anyone who wants this feature regardless of what the spec says can implement it themselves:

>>> import yaml
>>> def omap_constructor(loader, node):
...     return loader.construct_pairs(node)
... 
>>> yaml.add_constructor(u'!omap', omap_constructor)
>>> yaml.load('!omap { C: 1, B: 2, A: 1 }', Loader=yaml.Loader)
[('C', 1), ('B', 2), ('A', 1)]
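Current PyYAML releases require an explicit `Loader` argument to `yaml.load`. A sketch of the same constructor for Python 3.7+, registered on a hypothetical `OmapLoader` subclass instead of globally and wrapping the pairs in a plain `dict` (which keeps insertion order on those Python versions). For comparison, PyYAML's safe loader already understands the standard `!!omap` form, a sequence of one-pair mappings, and returns a list of tuples:

```python
import yaml


class OmapLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass; yaml.SafeLoader itself stays unchanged."""


def omap_constructor(loader, node):
    # construct_pairs yields (key, value) tuples in document order;
    # dict() keeps that order on Python 3.7+.
    return dict(loader.construct_pairs(node))


OmapLoader.add_constructor('!omap', omap_constructor)

custom = yaml.load('!omap { C: 1, B: 2, A: 1 }', Loader=OmapLoader)
# The standard tag:yaml.org,2002:omap form: a sequence of one-pair mappings.
standard = yaml.safe_load('!!omap [ C: 1, B: 2, A: 1 ]')

print(custom)     # {'C': 1, 'B': 2, 'A': 1}
print(standard)   # [('C', 1), ('B', 2), ('A', 1)]
```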

