Ticket #16 (reopened defect)

Opened 4 years ago

Last modified 3 years ago

duplicate keys in dictionary cause unpredictable behavior

Reported by: nickesk@cs.bu.edu Assigned to: xi
Priority: normal Component: pysyck
Severity: major Keywords:
Cc:

Description (Last modified by xi)

When there are duplicate keys in the same dictionary, PySyck returns unpredictable results, including the wrong data type.

I'm not sure if this is a defect in PySyck, or the underlying syck library. However, older versions of PySyck (<0.55) didn't suffer from this and merely overwrote the duplicate key with the new information.

>>> syck.load('foo: bar\nfoo2: bar')
{'foo': 'bar', 'foo2': 'bar'}
>>> syck.load('foo: bar\nfoo2: bar\nfoo: bar')
[('foo', 'bar')]

Attachments

Change History

05/23/06 15:27:59 changed by xi

  • status changed from new to closed.
  • resolution set to fixed.
  • description changed.

Fixed in [176].

When a mapping node cannot be correctly represented as a dictionary, PySyck is supposed to convert it to a list of pairs. So in this case, the expected value is

>>> syck.load('foo: bar\nfoo2: bar\nfoo: bar')
[('foo2', 'bar'), ('foo', 'bar'), ('foo', 'bar')]

I fixed the bug and now PySyck works as above.

But now I doubt if it's really a correct fix. The spec claims that a duplicate key in a mapping is an error. It's hard to check in all cases, but PyYAML tries to enforce it:

>>> yaml.load('foo: bar\nfoo2: bar\nfoo: bar')
Traceback (most recent call last):
  [...]
yaml.constructor.ConstructorError: while constructing a mapping
  in "<string>", line 1, column 1:
    foo: bar
    ^
found duplicate key
  in "<string>", line 3, column 1:
    foo: bar
    ^

So you shouldn't really use duplicate keys as they may cause an error in the next versions of PySyck.

05/23/06 15:32:02 changed by xi

BTW, feel free to post your concerns here or reopen the ticket if you think this issue is not completely resolved.

05/30/06 18:22:29 changed by nickesk@cs.bu.edu

  • status changed from closed to reopened.
  • resolution deleted.

Thanks for the quick fix! However, I think it would be more correct for it to either overwrite the duplicate key with the new information or to throw an exception. Returning a different data structure due to a syntax error in the source document is probably the wrong way to go.

Honestly, I would prefer the "overwrite" option, as this is the old behavior and also how Ruby's yaml library deals with this. Maybe the best way would be to have an option for "strict" parsing that throws an exception when a duplicate key is found, and otherwise overwrites the value.

03/27/07 07:58:27 changed by tsuna@lrde.epita.fr

Hello we use PySyck in one of our projects and this change broke the backward compatibility of PySyck. I know it might not be a good idea to have duplicate keys but sometimes it might be hard to avoid (think of generated code). So yes it would be better to revert to the legacy behavior or throw an exception but offer a configuration option that would enable the user to use the legacy behavior.

Thanks


Add/Change #16 (duplicate keys in dictionary cause unpredictable behavior)




Change Properties
Action