Ticket #36 (closed defect: invalid)

Opened 8 years ago

Last modified 8 years ago

PyYaml and WebWare Don't Play Nicely...

Reported by: stef@…
Owned by: xi
Priority: high
Component: pyyaml
Severity: blocker
Keywords: pyyaml webware threading
Cc:

Description (last modified by xi)

Hello,

Sorry to ask a stupid question, but is PyYaml thread-safe? I am trying to run PyYaml under WebWare and I get a session identifier. Due to the nature of WebWare being servlet based, I find that it sometimes calls 'get session' more than once during the lifetime of processing the servlet. The code that gets run more than once is:

    def __getitem__(self, key):
        if debug:
            print '>> get (%s)' % key
        filename = self.filenameForKey(key)
        self._lock.acquire()
        try:
            try:
                file = open(filename)
            except IOError:
                raise KeyError, key
            try:
                try:
                    item = yaml.load(file)
                finally:
                    file.close()
            except:                 # session can't be unpickled
                os.remove(filename) # remove session file
                print "Error loading session from disk:", key
                self.application().handleException()
                raise KeyError, key
        finally:
            self._lock.release()
        return item

When I put wrapper print statements around the yaml.load, I notice that the first time it works without fail; the next time I get this:

>> get (20061016171933-d0e40d4c65b62dffbbe4c3d2b21922a5)
 RIGHT BEFORE YAML.load for <open file '/usr/local/web_work/Sessions/20061016171933-d0e40d4c65b62dffbbe4c3d2b21922a5.ses', mode 'r' at 0xb71798d8> (/usr/local/web_work/Sessions/20061016171933-d0e40d4c65b62dffbbe4c3d2b21922a5.ses)
[Mon Oct 16 17:20:28 2006] [error] WebKit: Error while executing script /usr/local/web_work/Compass/RebookHotelPage.py
Traceback (most recent call last):
  File "/usr/local/webware-cvs/WebKit/Application.py", line 436, in dispatchRawRequest
    self.runTransaction(trans)
  File "/usr/local/webware-cvs/WebKit/Application.py", line 487, in runTransaction
    self.runTransactionViaServlet(servlet, trans)
  File "/usr/local/webware-cvs/WebKit/Application.py", line 512, in runTransactionViaServlet
    servlet.runTransaction(trans)
  File "WebKit/Servlet.py", line 41, in runTransaction
  File "/usr/local/webware-cvs/WebKit/Transaction.py", line 108, in awake
    self._servlet.awake(self)
  File "/usr/local/web_work/Compass/RebookHotelPage.py", line 74, in awake
    print " %s - %s " % ( self, self.session() )
  File "WebKit/Page.py", line 151, in session
  File "/usr/local/webware-cvs/WebKit/Transaction.py", line 69, in session
    self._session = self._application.createSessionForTransaction(self)
  File "/usr/local/webware-cvs/WebKit/Application.py", line 312, in createSessionForTransaction
    session = self.session(sessId)
  File "/usr/local/webware-cvs/WebKit/Application.py", line 279, in session
    return self._sessions[sessionId]
  File "/usr/local/webware-cvs/WebKit/SessionFileStore.py", line 59, in __getitem__
    myItem = yaml.load(file)
  File "build/bdist.linux-i686/egg/yaml/__init__.py", line 66, in load
  File "build/bdist.linux-i686/egg/yaml/constructor.py", line 38, in get_data
  File "build/bdist.linux-i686/egg/yaml/constructor.py", line 50, in construct_document
  File "build/bdist.linux-i686/egg/yaml/constructor.py", line 393, in construct_yaml_seq
  File "build/bdist.linux-i686/egg/yaml/constructor.py", line 120, in construct_sequence
  File "build/bdist.linux-i686/egg/yaml/constructor.py", line 96, in construct_object
  File "build/bdist.linux-i686/egg/yaml/constructor.py", line 572, in construct_python_object
  File "build/bdist.linux-i686/egg/yaml/constructor.py", line 551, in make_python_instance
TypeError: __new__() takes exactly 2 arguments (1 given)

So, from my total layman's perspective, it appears that __init__.py is expected to be called 'once', yet due to WebWare's threading, I think it gets called more than once. Either way, I could be totally and wholeheartedly wrong. Any ideas?

Priority is 'high' as I am on a strict timeline to understand what's going on, and severity is definitely 'blocker', as without a resolution WebWare can't use PyYaml. Which is kinda weird, but so it is, at least in my experience :)

Regards and thanks,
Stef

Change History

comment:1 Changed 8 years ago by xi

  • Description modified

comment:2 Changed 8 years ago by xi

  • Status changed from new to assigned

Hmm... It's an interesting question. I believe that yaml.load() and yaml.dump() are thread-safe, but yaml.add_constructor() and yaml.add_representer() are not.

Anyway, this error doesn't look related to threads. Are you sure that the file is not modified between the first and the second calls of yaml.load()? It rather looks like some Python object cannot be constructed from a YAML node.

You may use the following "patch" to detect what node causes the exception:

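import yaml, yaml.constructor

# Keep a reference to the original method so the wrapper can delegate to it.
old_make_python_instance = yaml.constructor.Constructor.make_python_instance

def my_make_python_instance(self, suffix, node,
                    args=None, kwds=None, newobj=False):
    try:
        # Return the constructed instance; without the return, every
        # object built through this wrapper would load as None.
        return old_make_python_instance(self, suffix, node, args, kwds, newobj)
    except TypeError:
        # Show which node failed, then re-raise the original error.
        print suffix, node, args, kwds, newobj
        raise

yaml.constructor.Constructor.make_python_instance = my_make_python_instance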

comment:3 Changed 8 years ago by stef@…

Hello again,

Well, it definitely is curiouser and curiouser (said Alice). It transpires that YAML is also serialising other objects stored in the session that are instances of my own classes, which is what I would expect. However, the __new__ in my own classes (Service, for example) only takes args; it doesn't take (*args, **kwds). This is why instantiating them is 'barfing out' with '__new__() takes exactly 2 arguments (1 given)'.

I have changed my code to store only the IDs, as these are unique to my class, and even after unmarshalling the objects I am still instantiating them anyway (hence the __new__ call above ;). By storing only the IDs, I can YAML away to my heart's content :)

I am even unsure as to how to fix this for other people in the future. It would require doing some introspection on the class itself to see what parameters __new__ expects. My Perl OO and Ruby are strong, but less so in Python. Is this even achievable?
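On the introspection question, here is a minimal sketch of what such a check could look like. It assumes Python 3.3 or later, where inspect.signature is available, and it assumes __new__ is written in Python rather than inherited from a built-in type. The Service class and the required_new_args helper are hypothetical names used only for illustration; nothing here is part of PyYAML.

```python
import inspect


class Service(object):
    """Hypothetical stand-in for a class stored in the session."""

    def __new__(cls, service_id):
        self = object.__new__(cls)
        self.service_id = service_id
        return self


def required_new_args(cls):
    """List the arguments cls.__new__ requires, not counting cls itself.

    Assumes __new__ is defined in Python on cls or one of its bases.
    """
    # Drop the first parameter, which is the class being instantiated.
    params = list(inspect.signature(cls.__new__).parameters.values())[1:]
    named_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return [
        p.name
        for p in params
        if p.kind in named_kinds and p.default is inspect.Parameter.empty
    ]


needed = required_new_args(Service)
```

Knowing the required names does not tell a loader which values to pass, so a check like this can only report the problem early; the class still has to say how it should be rebuilt.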

Regardless, thanks for your help and for pointing me in the right direction. I leave the bug's status up to you, but I can see this tripping up other people as well. Regards, Stef

comment:4 Changed 8 years ago by xi

  • Status changed from assigned to closed
  • Resolution set to invalid

When serializing/deserializing Python objects, PyYAML follows the pickle protocol v2 (see http://www.python.org/dev/peps/pep-0307/). You must ensure that your objects support:

>>> out = pickle.dumps(obj, protocol=2)
>>> obj = pickle.loads(out)

Most likely, you need to provide a custom __reduce__ method to make your objects work correctly.
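As an illustration of the point above, the following sketch uses only the standard pickle module. Service is a hypothetical class whose __new__ requires an argument, mirroring the situation described in comment:3; PicklableService and round_trip are likewise made-up names. The plain class fails the protocol 2 round trip with a TypeError on load, while the variant with __reduce__ survives it.

```python
import pickle


class Service(object):
    """Hypothetical class whose __new__ requires an argument."""

    def __new__(cls, service_id):
        self = object.__new__(cls)
        self.service_id = service_id
        return self


class PicklableService(Service):
    """Same class, plus __reduce__ so the argument is replayed on load."""

    def __reduce__(self):
        # (callable, args): loading calls PicklableService(service_id).
        return (self.__class__, (self.service_id,))


def round_trip(obj):
    """Dump and reload obj with pickle protocol 2."""
    return pickle.loads(pickle.dumps(obj, protocol=2))


try:
    round_trip(Service(42))
    plain_ok = True
except TypeError:
    # Loading calls __new__ without service_id, as in the traceback above.
    plain_ok = False

restored = round_trip(PicklableService(42))
```

Defining __getnewargs__ to return (self.service_id,) is another option under protocol 2; __reduce__ is shown here because it is the one suggested in this comment.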

I'm closing the ticket, but feel free to post any questions here or reopen it if you find any object that is (de)serializable with pickle but cannot be loaded/dumped using PyYAML.
