MIDI probably isn't a good example. Or, maybe it's a very good example, but a counter one. As I keep droning on about, there's syntax and there's semantics. A common protocol at its base deals with syntax. It defines the words and sentences used by devices to communicate. But, for it to really make a difference beyond where we already are, it has to deal with semantics, i.e. what these messages mean, and at a higher level, what the devices that are talking are, how they work, and what they can do.
MIDI is a specialized protocol, for music, so it does provide some basic semantics. And that allows for basic interconnection of those specialized devices and some basic understanding of what they are trying to say to each other. The core part of the protocol, the sending of note on/off events with velocity, has inherent semantics, so other devices can understand what is desired of them.
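A minimal sketch (in Python) of what "inherent semantics" means here: the MIDI spec fixes what each byte of a note message means, so any receiver can interpret a note event with zero prior configuration.

```python
def parse_note_message(data: bytes):
    """Decode a 3-byte MIDI channel voice message (note on/off only)."""
    status, note, velocity = data
    kind = status & 0xF0      # high nibble: message type
    channel = status & 0x0F   # low nibble: channel 0-15
    if kind == 0x90:          # note on (velocity 0 conventionally means note off)
        name = "note_off" if velocity == 0 else "note_on"
    elif kind == 0x80:        # note off
        name = "note_off"
    else:
        raise ValueError("not a note message")
    return name, channel, note, velocity

# 0x90 = note on, channel 0; note 60 = middle C; velocity 100
print(parse_note_message(bytes([0x90, 60, 100])))  # ('note_on', 0, 60, 100)
```

Any device on the receiving end knows this is "play middle C, fairly hard" without anyone telling it anything about the sender.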
But once you move beyond that and try to apply MIDI to other types of devices, it can be done, but those inherent semantics are no longer really valid. And certainly anyone who has gone beyond that with MIDI and had to set up some complicated MIDI CC control stuff between devices understands what I'm whining about. At that point, you are beyond the basic, inherent semantics of the protocol and it once again can get quite complicated to set up, because now you, the human, have to know the semantic intent of the messages and make sure that they are set up to do the right thing. That doesn't happen auto-magically.
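To make that concrete, here's a sketch (with made-up device mappings) of the CC problem: a Control Change message is syntactically just a controller number and a value, and what it *does* depends entirely on how a human configured each receiving device.

```python
def parse_cc(data: bytes):
    """Decode a 3-byte MIDI Control Change message."""
    status, controller, value = data
    if status & 0xF0 != 0xB0:  # 0xB0-0xBF is the Control Change range
        raise ValueError("not a CC message")
    return status & 0x0F, controller, value

# The same CC message sent to two hypothetical devices...
msg = bytes([0xB0, 74, 64])  # channel 0, controller 74, value 64

# ...means whatever each device's human-configured CC map says it means:
synth_map = {74: "filter_cutoff"}
lighting_map = {74: "dimmer_level"}

channel, cc, value = parse_cc(msg)
print(synth_map[cc], value)     # filter_cutoff 64
print(lighting_map[cc], value)  # dimmer_level 64
```

The protocol carries the message fine, but the semantics live in those per-device maps, which somebody had to set up by hand.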
In the automation world, an equivalent scenario would be a dedicated common protocol for, say, thermostats. It would provide inherent semantics because any automation system using that protocol would know it's dealing with thermostats, and the protocol would define what a thermostat is, what information it can provide, what states it can have, and what can be asked of it in any given state. That would make it pretty easy to set up control over thermostats, though of course it would also mean that any feature that lies outside of that semantic framework would not be supportable. And, though you could probably hack it to support some other types of devices, the semantics it defines would no longer be valid and it would be no better (and probably worse) than what we already have.
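A hypothetical sketch of what such a semantic framework might look like. Everything here is invented for illustration; the point is that the protocol itself, not per-installation configuration, defines the device's modes, its commands, and which commands are legal in which state.

```python
from enum import Enum

class Mode(Enum):
    OFF = "off"
    HEAT = "heat"
    COOL = "cool"

class Thermostat:
    # The (invented) protocol defines which commands are legal in each mode,
    # so a controller can know up front what it may ask of the device.
    LEGAL = {
        Mode.OFF:  {"set_mode"},
        Mode.HEAT: {"set_mode", "set_setpoint"},
        Mode.COOL: {"set_mode", "set_setpoint"},
    }

    def __init__(self):
        self.mode = Mode.OFF
        self.setpoint = None

    def command(self, name: str, value=None):
        if name not in self.LEGAL[self.mode]:
            raise ValueError(f"{name} not valid in mode {self.mode.value}")
        if name == "set_mode":
            self.mode = Mode(value)
        elif name == "set_setpoint":
            self.setpoint = value

t = Thermostat()
t.command("set_mode", "heat")
t.command("set_setpoint", 21.5)
print(t.mode.value, t.setpoint)  # heat 21.5
```

An automation system speaking this protocol needs no setup to control any compliant thermostat, but a device with features outside this framework has nowhere to express them.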
A truly ubiquitous, useful common protocol would have to provide that sort of semantic framework for every single possible type of device, which would be a massive undertaking. And, it would, to be useful for the sort of folks who hang out around here, have to set a fairly high semantic bar, which would inevitably leave out a lot of hardware because it just wouldn't be able to meet the requirements (technically or financially.) If you go the other way, and set the bar low and make lots of functionality optional, then it's worse than where we are now really, since it becomes very limiting and you can't really make any assumptions about anything up front (which is the great benefit of standardization in terms of allowing you to create reusable systems or reusable bits of systems that can be assembled together.)
In reality, most protocols targeting the 'internet of whatevers' are likely to be of the low-endy variety, allowing for simplistic interaction between devices without a central controller, but being of limited use for folks who are looking to create serious automation solutions. That sort of scenario would end up being much like Z-Wave is today, allowing very simple interaction between devices without any real setup requirements, and basically the simple remote control on a phone type interface without any real integration.
Any truly ubiquitous protocol that would serve the needs of the types of folks who hang out here would be an epic undertaking, both technically to define, and politically and economically to get it accepted at a wide enough level to ever get a foothold and become successful. So it's extremely unlikely to happen.
Any truly ubiquitous protocol that was of the other sort (the low-endy variety), if it did become truly widespread, would just push the automation world away from what folks around here would be interested in anyway. Why would a device manufacturer bother to do the work to expose the functionality necessary for extensive integration when it could just provide the basic support required by this widely used protocol, and get 95% of the financial benefits, i.e. the market that's happy with a very simple remote control on an iPad type of interface with no real 'integration' of functionality? What would be the financial incentive?
Anyway, I need to eat some food because my CPU is under-voltage right now and I'm losing the thread of my point. And, for that matter it's the same point I've made before anyway, so probably it was a waste of time to even say it to begin with.