Ticket #16 (reopened defect)

Opened 8 years ago

Last modified 7 years ago

duplicate keys in dictionary cause unpredictable behavior

Reported by: nickesk@… Owned by: xi
Priority: normal Component: pysyck
Severity: major Keywords:

Description (last modified by xi) (diff)

When there are duplicate keys in the same dictionary, PySyck returns unpredictable results, including the wrong data type.

I'm not sure if this is a defect in PySyck, or the underlying syck library. However, older versions of PySyck (<0.55) didn't suffer from this and merely overwrote the duplicate key with the new information.

>>> syck.load('foo: bar\nfoo2: bar')
{'foo': 'bar', 'foo2': 'bar'}
>>> syck.load('foo: bar\nfoo2: bar\nfoo: bar')
[('foo', 'bar')]


Change History

comment:1 Changed 8 years ago by xi

  • Status changed from new to closed
  • Resolution set to fixed
  • Description modified (diff)

Fixed in [176].

When a mapping node cannot be correctly represented as a dictionary, PySyck is supposed to convert it to a list of pairs. So in this case, the expected value is

>>> syck.load('foo: bar\nfoo2: bar\nfoo: bar')
[('foo2', 'bar'), ('foo', 'bar'), ('foo', 'bar')]

I fixed the bug and now PySyck works as above.

But now I doubt if it's really a correct fix. The spec claims that a duplicate key in a mapping is an error. It's hard to check in all cases, but PyYAML tries to enforce it:

>>> yaml.load('foo: bar\nfoo2: bar\nfoo: bar')
Traceback (most recent call last):
yaml.constructor.ConstructorError: while constructing a mapping
  in "<string>", line 1, column 1:
    foo: bar
found duplicate key
  in "<string>", line 3, column 1:
    foo: bar

So you shouldn't really use duplicate keys as they may cause an error in the next versions of PySyck.

comment:2 Changed 8 years ago by xi

BTW, feel free to post your concerns here or reopen the ticket if you think this issue is not completely resolved.

comment:3 Changed 8 years ago by nickesk@…

  • Status changed from closed to reopened
  • Resolution fixed deleted

Thanks for the quick fix! However, I think it would be more correct for it to either overwrite the duplicate key with the new information or to throw an exception. Returning a different data structure due to a syntax error in the source document is probably the wrong way to go.

Honestly, I would prefer the "overwrite" option, as this is the old behavior and also how Ruby's yaml library deals with this. Maybe the best way would be to have an option for "strict" parsing that throws an exception when a duplicate key is found, and otherwise overwrites the value.

comment:4 Changed 7 years ago by tsuna@…

Hello we use PySyck in one of our projects and this change broke the backward compatibility of PySyck. I know it might not be a good idea to have duplicate keys but sometimes it might be hard to avoid (think of generated code). So yes it would be better to revert to the legacy behavior or throw an exception but offer a configuration option that would enable the user to use the legacy behavior.



Add a comment

Modify Ticket

Change Properties
<Author field>
as reopened
as The resolution will be set. Next status will be 'closed'
to The owner will be changed from xi. Next status will be 'new'

E-mail address and user name can be saved in the Preferences.

Note: See TracTickets for help on using tickets.