This is all very much a rough idea. There is no spec here; this is the fuzziest of straw-men. As you read this, please keep that in mind.
Next, I want to publish a spec and reference implementation for both a client and a server, just to give people a taste of what’s possible.
This document is not quite enough to do that, but should be enough to make reading that implementation more sensible.
The web browser, far from its humble beginning as a weird sort of information retrieval system at CERN with shades of Nelson’s Project Xanadu and Bush’s Memex, has expanded from a simple hyperlinked document browser into an all-consuming platform for viewing locked-down videos, talking directly with devices over USB, supporting virtual and augmented reality, enabling fascinating timing attacks, back-porting support for multi-threading bugs, and creating an utterly baffling and garbage way of making noises.
This has been great–sites are more interactive than ever, content is more accessible than ever, and overall a whole lot of people have been served in ways that previously they arguably would not have. No longer are we limited to a web of documents.
And yet.
And yet.
Modern HTML5 and HTTP have made it almost impossible for any single developer to reliably reproduce a large fraction of the functionality required to claim to support those standards. The only organizations capable of fielding the legions of engineers required to keep up with these byzantine specs have also worked hard on pushing them ever-outward, breaking things and undermining the open web. Many of those same organizations derive their revenue wholly or in part from spying on users.
We are never going to be able to compete with Google or Apple or Google’s subsidiary in terms of sheer engineer man-hours if we stick to the same ground that they occupy. We can never hope for casual developers to implement this spec for web markup, or this spec for data requests, or this spec for a scripting language (though a more limited implementation of an earlier spec may be possible by small teams or individuals). Those specs are the product of large efforts by dedicated engineers not optimizing for that use case.
If we want users to be able to have freedom, they need to be able to pick their choice of tools.
If we want them to have a meaningful choice of tools, there have to be multiple options to choose from.
If we want multiple options, it can’t be so burdensome to create a tool that only two or three large organizations–with their own profit motives and business models–do it.
We need a simpler platform for the web. We need a platform where motivated individuals can make Khyber Pass Browsers.
I suggest the term Khyber Pass Browsers because I think we should look towards the evolved niche of the Khyber Pass Copy. Vice–whatever you might think about them–had a kind of neat little documentary about the homegrown (artisanal?) firearms industry of the Khyber Pass region. The important thing to note is that the weapons seem to be mostly hand-made, using whatever bits are on hand–contrast this with, say, manufacturing of 1911s in the US.
When KP gunsmiths lack access to good quality steel, they substitute something simpler. Similarly, if our developers lack access to, say, large IDL auto-generation frameworks, we could skip them. The KP weapons are not meant to be used in large standardized batches for equipping large armies (and indeed, can’t be used for that purpose). Similarly, we aren’t going to match the quality control/ubiquity of Google or Facebook or Apple, so maybe we should settle on simpler designs that can be custom-tailored to the individual doing the browsing.
What exactly do we mean by a Khyber Pass Browser (KPB)? What bullet list would we point to and say “Why yes, that is a collection of the salient design decisions and constraints for making a KPB”?
(Note that KPB is meant to include the server as well, not merely the client browser. I realized this error after writing this section.)
The purpose of a KPB is not to submit lots of back-and-forth requests, to function as a chat application, or whatever else. It may embed some object or application that does that (more on that later), but the framework of a KPB should include as little affordance for application development as can be managed.
Rationale: Applications are ephemeral, documents are persistent (or at least, should be). Caching is simpler with documents, management is simpler with documents, history and traversal are simpler with documents. Further, modular reading and interpretation of documents is simpler than with applications. Upgrading a document reader is usually less fraught than upgrading an application run-time.
Further, once you focus on serving applications instead of documents, you tend to need to specify a particular API for accessing functionality. Inevitably, you’ll get this wrong in the eyes of some person or group. They’ll be motivated to supplant you, to update the API, and, if they work for an organization with money in the game, probably to subvert the entire ecosystem. Even if you get it right at first, you’ll still need to evolve it over time to match the new capabilities of your hardware or the demands of application developers. Documents, by contrast, are mercifully static in their design requirements.
A KPB should be simple enough that somebody can bang one together in their favorite language. No knowledge of advanced data structures, light knowledge of networking and security, and only standard language knowledge should be required to make something Good Enough to function and talk to servers. It shouldn’t require dozens of pages of specs, arcane cross-referencing of documents, and so forth.
Ideally, somebody who has built a KPB stack should be able to talk somebody else through the process from memory. That’s a hell of a goal, but given that the JSON grammar fits on the back of a business card, it doesn’t strike me as unreasonable.
Rationale: If one person can make something in a year, it becomes something a lot of people have access to. Linus Torvalds made the first version of Linux in a year, Carmack and Romero made the Doom engine in about a year and change, and Unix itself was substantially started in a year. Even a simple system that takes too long to develop probably won’t catch on. A year isn’t a terribly long time to work on a project, and if an idea is tried and doesn’t pan out it’s more educational than disastrous.
The reason I suggest an undergrad as the yardstick is that undergraduate level seems to be about the average education of the folks listed above. Further, requiring more education in computer science or software architecture means excluding one of the most useful stages of developer: the relatively inexperienced, whose optimism and chutzpah outstrip their knowledge. This demographic has written more useful software than can be counted.
Further, systems that require a lot of theoretical knowledge often beget the formation of larger groups, and that in turn causes weird architectural and political pathologies. Can a small team implement a language like Rust, which requires clever techniques? Empirically not. Can one person implement a simple, bad language like C? Empirically so.
Users should be able to open up telnet, ncat, socat, or whatever their socket munger of choice is and be able to talk to other KPB clients and servers for most things. Human readability and hackability is of paramount importance to the success of KPB tech, so we can’t have everything squirreled away into binary tooling that is impossible for the under-tooled to engage with.
Rationale: What we can’t read we can’t trust, and even things we can read are highly suspect. For debugging, exploration, and reverse engineering, textual protocols are vastly more accessible than their binary counterparts.
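To make the "just open ncat" goal a bit more concrete, here is a minimal sketch of a one-line text request and a toy server that answers it. The `GET <path>` request line and the plain-text reply are purely hypothetical, invented for illustration; there is no KPB wire format yet. The server reads up to a bare `\n` (and strips any `\r`) so that a human typing into ncat, which sends plain newlines by default, gets the same treatment as a program.

```python
import socket

# Hypothetical request format (NOT a spec): one line, "GET <path>", newline-terminated.
# The reply is plain text, so a human with ncat can read and type the same thing.


def send_request(sock: socket.socket, path: str) -> str:
    """Send a one-line request and read the reply until the peer closes."""
    sock.sendall(f"GET {path}\r\n".encode("utf-8"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty read means the server closed the connection
            break
        chunks.append(data)
    return b"".join(chunks).decode("utf-8")


def handle_request(sock: socket.socket, documents: dict[str, str]) -> None:
    """Serve one request from an in-memory dict of path -> document text."""
    try:
        line = b""
        while not line.endswith(b"\n"):  # accept both "\n" and "\r\n" endings
            data = sock.recv(1)
            if not data:
                return  # client hung up before finishing the request line
            line += data
        verb, _, path = line.decode("utf-8").strip().partition(" ")
        if verb == "GET" and path in documents:
            sock.sendall(documents[path].encode("utf-8"))
        else:
            sock.sendall(b"Content-Type: text/plain\r\n\r\nnot found\r\n")
    finally:
        sock.close()  # closing signals end-of-document to the client
```

Closing the connection to mark the end of the reply is the crudest possible framing, but it is exactly the kind of thing you can see and reproduce by hand in a terminal.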
A good chunk of the problems solved by binary protocols go away in a document-centered universe. You don’t have to worry about head-of-line blocking if you aren’t constrained to a few active TCP connections at a time. You don’t have to worry about connection setup and tear-down if you leave transport-layer security to the operating environment.
Unfortunately, using TCP as a networking technology does imply certain costs, such as the time it takes to create connections. UDP would make this easier to fix, but at that point we push the complexity of managing a stream over UDP onto our implementors (and while this is something even high schoolers are capable of, it’s just one more obstacle in the way of adoption).
Documents themselves, being collections of objects with some metadata about them, are best given as grouped key-value pairs. This promotes easy inspection and authoring. After a brief literature review with my colleague Morgan, we gravitated towards a format based heavily on RFC1521. Each object is just a collection of key-value pairs denoting content type, source, signature, and the content itself.
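As a rough illustration of what "grouped key-value pairs" could look like on the wire, here is a sketch of a document written as an RFC 1521-style MIME multipart body and read back with Python’s standard `email` package. The `X-Source` and `X-Signature` header names and the `kpb://` scheme are hypothetical placeholders, not part of any spec; only `Content-Type` and the multipart framing come from MIME itself.

```python
from email import message_from_string
from email.message import Message

# Hypothetical KPB document in MIME (RFC 1521-style) form.
# "X-Source", "X-Signature", and the "kpb://" scheme are illustrative only.
DOCUMENT = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="kpb"

--kpb
Content-Type: text/plain; charset=utf-8
X-Source: kpb://example.org/notes/intro
X-Signature: (omitted in this sketch)

Hello from a Khyber Pass Browser document.
--kpb
Content-Type: text/plain; charset=utf-8
X-Source: kpb://example.org/notes/outro

Goodbye.
--kpb--
"""


def list_objects(text: str) -> list[dict[str, str]]:
    """Return each leaf object's headers plus its body as plain key-value pairs."""
    doc: Message = message_from_string(text)
    objects = []
    for part in doc.walk():
        if part.is_multipart():
            continue  # skip containers; keep only the actual objects
        entry = {key: value for key, value in part.items()}
        entry["body"] = part.get_payload()  # raw text; no transfer decoding here
        objects.append(entry)
    return objects
```

The point is less the library than the shape: every object is a handful of `Name: value` lines you could type by hand, followed by its content.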
Rationale: As for the choice of RFC1521 as a base, this problem was heavily explored and solved almost 30 years ago, when tooling was much simpler and computers much slower. We expect any implementor to have far faster computers and better languages available, but also not to have lots of tooling available. This matches the spirit of the spec at the time.
As given above, the preference for textual parsing is due to the desire to make authoring and inspection easy. This does come with a good chance of mistakes when doing interoperability (say, spurious line feeds or similar), but nothing that can’t be debugged with slightly careful inspection and close adherence to the final spec.
One of the nice features of RFC1521 is the multipart/alternative sub-type, which supports the specification of multiple equivalent objects of differing types. Originally this would be useful for, say, different image formats or something, but in a modern KPB this would be a great way of handling accessibility concerns–substituting a text fallback for audio if the user is deaf, for example.
Another nice feature is the message/external-body sub-type, which with minor modifications would neatly handle remote objects.
The document itself can be thought of as having a collection of objects with fallbacks if a given object content type is unsupported/unwanted by the user.
The most basic client would display a concatenated list of these objects to the screen. More advanced clients would expand text objects into a readable form, and provide links for viewing objects of things like images, videos, audio, and so forth. Still more advanced clients would spin up helper programs and do something like embedding their X window or sharing their OpenGL context or something gross. Even more advanced clients could represent the objects as weird skeuomorphic sub-documents in a VR workspace. Who knows.
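The "most basic client" really can be a handful of lines. Here is a sketch, assuming documents are MIME-style bodies as above and that text objects are sent unencoded (no base64 or quoted-printable); anything that isn’t text just gets a placeholder line. The `render_basic` name is hypothetical.

```python
from email import message_from_string

SAMPLE = """\
Content-Type: multipart/mixed; boundary="b"

--b
Content-Type: text/plain; charset=utf-8

Hello.
--b
Content-Type: image/png
Content-Transfer-Encoding: base64

iVBORw0KGgo=
--b--
"""


def render_basic(text: str) -> str:
    """Concatenate every leaf object in order; show text, stub everything else."""
    doc = message_from_string(text)
    blocks = []
    for part in doc.walk():
        if part.is_multipart():
            continue  # containers have no content of their own
        if part.get_content_maintype() == "text":
            blocks.append(part.get_payload().strip())  # assumes unencoded text
        else:
            blocks.append(f"[{part.get_content_type()} object, not displayed]")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(render_basic(SAMPLE))
```

Everything past this (opening images, playing audio) can be bolted on later without changing the document format.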
Rationale: Every user has their own preferred setup for different types of documents. Some prefer VLC for videos, others mpv. There’s no reason to force a user onto some stock rendering setup if they don’t want it.
From an implementation standpoint, this frees folks from having to know about the document stack, rendering video, playing audio, decoding PNGs, and so on and so forth. They can focus on keeping their KPB simple and obviously correct.
Additionally, if all object rendering is handled using external programs, it becomes a lot easier for users to update things as they become more performant or featureful and to better isolate them for security reasons.
Lastly, it lets us get back closer to the Unix philosophy in every major way. We move towards a composition of simpler programs, we move towards being able to rewrite and throw out bad ideas in the KPB, we remove lots of implementation complexity…we gain a lot by doing little.
As a weird side-effect of this, we regain the ability to run more than one type of web app, if we so choose–users can write scripts as object types to be evaluated in the context of their local run-times and permissions. No longer do they have to deal with the tyranny of JavaScript! One could even do something weird like bundling an entire VM image up and shipping that as an object for download and execution, whether a svelte Smalltalk environment or a chunky qemu machine state.
Switching to a KPB is not an unalloyed good, though. We’ll be losing decades of work in making it possible to do pretty page layout. We’ll be setting back, at least initially, a lot of accessibility work. We’ll also lose the authoring tools that everybody takes for granted these days. The security model beyond document and object signing is purposefully anemic, which is bound to bite us eventually.
We’ll make many of the same mistakes of the earlier web pioneers–but also, hopefully, many new ones.
Gopher was briefly the big competitor (and predecessor) to the world wide web as we know it. The user interface is very opinionated, and frankly I think that the desires people have for even document browsing have far outstripped what Gopher can support.
In the same way that the imperious British colonizers met defeat at the hands of home-armed citizens, perhaps we can defeat the surveillance capitalists with simple rugged technology that any disgruntled developer can reproduce and enhance on their own.
It can be made simpler, it can be made more flexible, and it can be made to suit each user in a way that a mass-produced ad platform cannot.