[zeromq-dev] ZeroMQ contribution policy - style guide

john skaller skaller at users.sourceforge.net
Sat Feb 4 20:22:32 CET 2012


On 04/02/2012, at 9:30 PM, Dirkjan Ochtman wrote:

> On Sat, Feb 4, 2012 at 02:30, john skaller
> <skaller at users.sourceforge.net> wrote:
>> I have just added a new section to the style guide "banning" the calling
>> of public functions by public functions. Public functions are reserved
>> for the public.
> 
> I don't really see the point in that... Care to elaborate on why this
> is important?

Sure. [it's partly explained in an edit to the Wiki, but perhaps a better
explanation can be written]

The contract programming paradigm is generally accepted as a good thing.

The public interface to something maintains an invariant. It may be
that the invariant is "True" at some time, but that may change.

Code internal to a function does not need to maintain the same
invariant. It may have a different invariant.

If the internal code calls the public function, the stronger invariant
must be established.  People usually forget. Failure to do so
may cause the function to fail, corrupt something, or just waste
resources.

This actually occurred in libzmq C API. The invariant maintained
by the socket related function is that the first argument

	void *s_

is a zmq::socket. Here is the actual code:

    if (!s_ || !((zmq::socket_base_t*) s_)->check_tag ()) {
        errno = ENOTSOCK;
        return -1;
    }

It is repeated in most of the functions. It got left out of one,
for some reason I don't understand (I added it back in).
[Grrr ... a good type system could have enforced this much better]

Now, if zmq_recv calls zmq_recvmsg which it used to, the
test above is repeated twice.

Checks like this often occur on the public interface boundary
because the client is not trusted to maintain the invariant.
It's usually to maintain another invariant "No core dumps".

In C the more common check is for NULL pointers where they're
not supposed to be. Once the invariant is established the check
doesn't have to be repeated inside the function because the library
code is trusted and has established it.

Now, I will continue with a real example. In adding the optional thread-safe
socket code, I did this kind of thing:

int zmq_bind (void *s_, const char *addr_)
{
    if (!s_ || !((zmq::socket_base_t*) s_)->check_tag ()) {
        errno = ENOTSOCK;
        return -1;
    }
    zmq::socket_base_t *s = (zmq::socket_base_t *) s_;
    if(s->thread_safe()) s->lock();
    int result = s->bind (addr_);
    if(s->thread_safe()) s->unlock();
    return result;
}

Now, what would happen if the implementation code to be protected
called back to the public API?

You see, it's no longer a matter of efficiency: the code would deadlock.
This corrected piece of code actually DID do just that:

int zmq_recvmsg (void *s_, zmq_msg_t *msg_, int flags_)
{
    if (!s_ || !((zmq::socket_base_t*) s_)->check_tag ()) {
        errno = ENOTSOCK;
        return -1;
    }
    zmq::socket_base_t *s = (zmq::socket_base_t *) s_;
    if(s->thread_safe()) s->lock();
    int result = inner_recvmsg(s, msg_, flags_);
    if(s->thread_safe()) s->unlock();
    return result;
}



int zmq_recv (void *s_, void *buf_, size_t len_, int flags_)
{
    if (!s_ || !((zmq::socket_base_t*) s_)->check_tag ()) {
        errno = ENOTSOCK;
        return -1;
    }
    zmq_msg_t msg;
    int rc = zmq_msg_init (&msg);
    errno_assert (rc == 0);

    zmq::socket_base_t *s = (zmq::socket_base_t *) s_;
    if(s->thread_safe()) s->lock();
    int nbytes = inner_recvmsg (s, &msg, flags_);
    if(s->thread_safe()) s->unlock();
    if (unlikely (nbytes < 0)) {
        int err = errno;
        rc = zmq_msg_close (&msg);
        errno_assert (rc == 0);
        errno = err;
        return -1;
    }

    //  At the moment an oversized message is silently truncated.
    //  TODO: Build in a notification mechanism to report the overflows.
    size_t to_copy = size_t (nbytes) < len_ ? size_t (nbytes) : len_;
    memcpy (buf_, zmq_msg_data (&msg), to_copy);

    rc = zmq_msg_close (&msg);
    errno_assert (rc == 0);

    return nbytes;    
}

Previously, zmq_recv called zmq_recvmsg. With my addition of locks,
this would have deadlocked.

So what has happened is that the optional invariant added, namely
"the C socket API is thread safe" could not be correctly implemented without
refactoring. If it had been correctly constructed in the first place, the
change of invariant would have proceeded mechanically.

You will note that the C API for sockets is responsible for establishing
the invariant, and this is necessarily contrary to allowing the code
inside the functions from calling the public API, unless the invariant
establishment and de-establishment is idempotent, in which case
we just waste resources.

There are two other example of import I can think of **even when
the invariant is apparently just "true"**:

1) Profiling will be screwed up by extra calls to the API than are
actually made by the public

2) Tracing will be similarly screwed up. 

because here, an additional invariant is invasively imposed by
a profiler or debugging tool.

Roughly speaking, quality code strictly separates interface layers
and documents invariants. The situation in C++ code is much
more complex and almost everyone gets it wrong. Here we
have several layers, and protection provided to implement
some enforcement: public, protected, private, data/virtual,
friends .. 

But we also have a situation where simple code would require
multiple layers of wrappers, and so many times programmers
prefer to refactor than bother doing it right. This is probably
a design fault in C++.

There are of course many principles of contract programming,
a nice one I heard recently is "single point of responsibility".

This suggests that instead of repeating the check the void *
argument is a socket in every API call, it should be 
factored out into an inline function. Doing that is not a matter
of "factoring out common code" which is often incorrect,
because such factoring can stand in the way of modification
and thus make the code fragile. Rather, we would do this
because the principle dictates it. Then, if the invariant changes
only one place needs modification.

I didn't add that rule to the style guide, because it's more a matter
of judgement. The public interface not calling itself, however,
is more clear cut.

Though, not always! In the C API socket code, there are calls to the 
C API for messages (zmq_msg_t things). That may not be wrong,
because the message management subsystem might be considered
a distinct API, and the socket API one of its clients: in that case
the socket API "is" the public. One can make the judgement by thinking
about use cases .. eg .. profiling, tracing, locking messages to make
them thread safe etc help -- even if one is NOT doing those things
by design, considering "what if" helps get the structure right.

I hope this waffly explanation helps. It's all about invariants
and contract programming principles and how they relate to
correctness, efficiency, and fragility/robustness in the face
of change.

--
john skaller
skaller at users.sourceforge.net







More information about the zeromq-dev mailing list