Ticket #59 (closed defect: worksforme)
Load yaml data as utf-8 strings into a dictionary
| Reported by: | dukebody@… | Owned by: | xi |
|---|---|---|---|
| Priority: | normal | Component: | pyyaml |
| Severity: | normal | Keywords: | |
| Cc: |
Description (last modified by xi)
Hello, I'm looking for support, but I don't know if this is the right place to ask.
I'm using PyYAML to load some nested data from a UTF-8 encoded file in Python with:
import yaml
# open() is the idiomatic spelling of the file() builtin
stream = open('data.yaml', 'r')
data = yaml.load(stream)
The data variable becomes a dictionary with unicode values where needed. What I want is UTF-8 encoded byte strings instead of unicode values, so that I can use this data with another library: Cheetah. If I use the unicode dictionary that PyYAML generates by default, I get a UnicodeDecodeError, because the Cheetah strings are in ISO-8859-15 and Python tries to decode them to Unicode using the ASCII codec, so it obviously fails.
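The failure described above can be reproduced without Cheetah. This is only a sketch: `template_bytes` is a made-up stand-in for a Cheetah string, and the explicit `decode('ascii')` call imitates the implicit conversion Python 2 performs when a byte string is combined with a unicode value.

```python
# Hypothetical stand-in for an ISO-8859-15 encoded Cheetah string.
# The euro sign (U+20AC) is the single byte 0xA4 in ISO-8859-15.
template_bytes = u'\u20ac 10'.encode('iso-8859-15')


def decode_as_ascii(raw):
    """Imitate the implicit ASCII decode applied when bytes meet unicode."""
    try:
        return raw.decode('ascii')
    except UnicodeDecodeError as exc:
        # Byte 0xA4 is outside the ASCII range, so the decode fails.
        return 'failed: %s' % exc.reason


ascii_result = decode_as_ascii(template_bytes)

# Decoding with the encoding the bytes were actually written in works.
explicit = template_bytes.decode('iso-8859-15')
```

Decoding the byte strings explicitly with their real encoding is one way out; the other, which is what this ticket asks for, is to keep everything as byte strings in a single encoding.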
Is there any way to get a UTF-8 encoded dictionary?
Change History
comment:2 Changed 6 years ago by xi
- Status changed from assigned to closed
- Resolution set to worksforme
You need to specify an alternative constructor for the tag !!str. You can do it as follows:
import yaml

def custom_str_constructor(loader, node):
    # Build the scalar as unicode, then encode it to a UTF-8 byte string.
    return loader.construct_scalar(node).encode('utf-8')

yaml.add_constructor(u'tag:yaml.org,2002:str', custom_str_constructor)
Example:
>>> import yaml
>>> yaml.load("Кирилл")
u'\u041a\u0438\u0440\u0438\u043b\u043b'
>>> def custom_str_constructor(loader, node):
...     return loader.construct_scalar(node).encode('utf-8')
...
>>> yaml.add_constructor(u'tag:yaml.org,2002:str', custom_str_constructor)
>>> yaml.load("Кирилл")
'\xd0\x9a\xd0\xb8\xd1\x80\xd0\xb8\xd0\xbb\xd0\xbb'
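Note that yaml.add_constructor, called as above, registers the constructor on PyYAML's default loader classes, so every later yaml.load call in the process is affected. If that is too broad, the same constructor can be attached to a loader subclass instead. The following is a minimal sketch, not part of the original answer: the names Utf8Loader and construct_utf8_str are made up for illustration, and it assumes a PyYAML version that accepts the Loader argument to yaml.load.

```python
import yaml


class Utf8Loader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that returns UTF-8 byte strings."""


def construct_utf8_str(loader, node):
    # construct_scalar returns the text of the node; encode it to UTF-8 bytes.
    return loader.construct_scalar(node).encode('utf-8')


# Registering on the subclass leaves the stock loaders untouched.
Utf8Loader.add_constructor(u'tag:yaml.org,2002:str', construct_utf8_str)

data = yaml.load(u'name: \u041a\u0438\u0440\u0438\u043b\u043b', Loader=Utf8Loader)
```

Mapping keys are !!str scalars too, so they are also encoded. Code that looks values up by key has to use byte-string keys with this loader.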
