Custom Query (121 matches)
Results (70 - 72 of 121)
| Ticket | Resolution | Summary | Owner | Reporter |
|---|---|---|---|---|
| #11 | fixed | Unicode support | xi | edemaine@… |
| Description |
I would like to bring up two issues with Unicode support in PyYAML's emitter.

First, it emits a type annotation of !!python/unicode whenever emitting a unicode string that can be encoded in ASCII:

    >>> print yaml.dump(u'Fran\xe7ais')
    "Fran\xE7ais"
    >>> print yaml.dump(u'hello')
    !!python/unicode 'hello'

I assume this is to force the value to be a unicode string when read back in. However, it makes for rather ugly files. In my case, and I imagine many others, I really don't care whether a string is stored as a 'str' or as a 'unicode' object in Python. And in YAML, the native string type is Unicode anyway. So it seems strange to have this distinction at the level of the YAML file. On the other hand, I understand the desire to have yaml.load(yaml.dump(x)) == x. Perhaps this should be another configuration option? (Of course, I could just convert my ASCII-encodable unicode objects to str objects...)

The second issue is that the emitter escapes non-ASCII characters even when all characters are printable (according to 'c-printable' in the YAML spec) when using an encoding (UTF8) that supports such characters. I don't find this as elegant as could be. Instead of the "Fran\xE7ais" output above, I would have hoped for the UTF8-encoded byte string Fran\xc3\xa7ais\n. I guess this is as stylistic an issue as the previous one. It makes me wonder again whether there should be a Style object that can specify various emitting options, instead of many keyword arguments... |
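The two cases the ticket distinguishes can be illustrated without PyYAML itself. What follows is only a minimal sketch in Python 3 (where every `str` is already Unicode, unlike the Python 2 session above). The helper names `is_ascii_encodable` and `utf8_line` are hypothetical and are not part of PyYAML.

```python
def is_ascii_encodable(text):
    """Return True if every character of text fits in ASCII."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def utf8_line(text):
    """Encode text plus a trailing newline as UTF-8 bytes.

    This is the unescaped output form the ticket asks for,
    not what the emitter described in the ticket produces.
    """
    return (text + "\n").encode("utf-8")


# 'hello' is the case that received the !!python/unicode tag.
print(is_ascii_encodable("hello"))
# 'Fran\xe7ais' is the case that was emitted with a \xE7 escape.
print(is_ascii_encodable("Fran\xe7ais"))
print(utf8_line("Fran\xe7ais"))
```

The first check corresponds to the strings the reporter says could simply be converted to `str`. The last call shows the byte string the reporter hoped to see in place of the escaped form.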
|||
| #12 | fixed | PyYAML is slow | xi | edemaine@… |
| Description |
Here are two simple wall-clock timings comparing PyYAML to PySyck on a Pentium 4 2.8GHz with 1MB cache and 1GB RAM:

    $ wc file1.yaml
    2036 8767 59154 file1
    $ test.py file1.yaml
    0:00:00.001419 to read the YAML via Syck
    0:00:04.029627 to read the YAML via PyYAML
    $ wc file2.yaml
    8949 35105 317342 file2
    $ test.py file2.yaml
    0:00:00.001564 to read the YAML via Syck
    0:00:19.288912 to read the YAML via PyYAML

I do not expect PyYAML to be terribly competitive with Syck: the language barrier is big, and PyYAML is written with a higher level of abstraction. But I was surprised to see a factor of 12,000 difference. I wonder if a bit of profiling and tuning might reduce this gap to just a couple of orders of magnitude (100x) instead of four?

Personally, 19 seconds to read a 0.3 meg file is too slow for my application, so I'll have to switch back to Syck for now, unfortunately. Just food for thought... |
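The slowdown factors can be recomputed from the reported timings. The sketch below assumes the elapsed times are in the `H:MM:SS.ffffff` form that `str(datetime.timedelta)` produces, which matches the output shown. The names `parse_elapsed`, `slowdown` and `REPORTED` are hypothetical, and the ticket's `test.py` is not reproduced here.

```python
import math
from datetime import timedelta


def parse_elapsed(text):
    """Parse an 'H:MM:SS.ffffff' string into a timedelta."""
    hours, minutes, seconds = text.split(":")
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=float(seconds))


# (Syck time, PyYAML time) as reported in the ticket.
REPORTED = {
    "file1.yaml": ("0:00:00.001419", "0:00:04.029627"),
    "file2.yaml": ("0:00:00.001564", "0:00:19.288912"),
}


def slowdown(syck_text, pyyaml_text):
    """Return how many times slower the PyYAML read was."""
    return parse_elapsed(pyyaml_text) / parse_elapsed(syck_text)


ratios = {name: slowdown(*pair) for name, pair in REPORTED.items()}
for name, ratio in ratios.items():
    magnitude = math.log10(ratio)
    print(f"{name}: {ratio:,.0f}x slower (about 10^{magnitude:.1f})")
```

On these numbers the larger file gives a factor of roughly 12,300, which is the "factor of 12,000" quoted in the ticket. The smaller file gives a factor of roughly 2,800.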
|||
| #30 | fixed | Timestamp support has floating-point roundoff | xi | edemaine@… |
| Description |
Example:

    >>> import yaml, datetime
    >>> yaml.dump(datetime.datetime(2005, 7, 8, 17, 35, 4, 517600))
    '2005-07-08 17:35:04.517600\n'
    >>> yaml.load(_)
    datetime.datetime(2005, 7, 8, 17, 35, 4, 517599)

This breaks the desired rule that yaml.load(yaml.dump(x)) == x in a case where there should be no roundoff. (datetime.datetime uses integers everywhere to avoid any error.)

The offending code seems to be line 321 in yaml/constructor.py:

    fraction = int(float(values['fraction'])*1000000)

This seems to be an "easy" way to convert the trailing '.517600' into an integer, but it can go beyond floating-point precision. Wouldn't the following work?

    fraction = int(values['fraction'][:6].ljust(6, '0')) |
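The roundoff can be reproduced outside PyYAML. The sketch below compares the float-based conversion quoted in the ticket with a string-based one along the lines the reporter suggests. The function names are hypothetical. It is also an assumption here that the captured fraction may or may not include the leading dot, so the string-based helper strips one if present. The ticket's one-liner as written would need the dot to be absent.

```python
def microseconds_via_float(fraction):
    """Float-based conversion, as in the line quoted from the ticket."""
    return int(float(fraction) * 1000000)


def microseconds_via_string(fraction):
    """String-based conversion that never goes through a float.

    Strips an optional leading '.', keeps at most six digits
    (extra digits are truncated, not rounded), and right-pads
    with zeros to exactly six digits.
    """
    digits = fraction.lstrip(".")
    return int(digits[:6].ljust(6, "0"))


# 0.5176 has no exact binary representation, so the product lands
# just below 517600 and int() truncates it.
print(microseconds_via_float(".517600"))
print(microseconds_via_string(".517600"))
print(microseconds_via_string("5"))
```

Truncating to six digits matches the microsecond resolution of `datetime.datetime`. Whether longer fractions should be rounded instead is a design choice the ticket does not address.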
|||
