The programming language war started with microprocessor instruction sets. Different instruction sets allow you to construct the same programs, but not with the same effort.
A machine contains a state (memory) and a program. The program runs and modifies the memory, and the programmer decides how to modify it. If the memory is mapped to a device, the device will perform some action.
Things got worse with compilers and interpreters. Programs are described in a high-level language, and then the code is translated into machine language before performing any action.
Operating systems, with their processes and threads, increase program complexity, making it harder -if not impossible- to have a full understanding of what is going on. We have tools to reduce the complexity or gain better knowledge, like debuggers or IDEs. But again, the complexity is overwhelming.
As a programmer, I have to choose the right tool all the time. If I want to create a system script, the best choice is bash because it includes a Unix DSL for file manipulation. If I want to create a very robust program, I’d go for a functional language with strong type support. If I want something flexible, I’d go with my favourite language, Python.
But of course no choice is perfect. Functional programs are hard to read, and Haskell, for example, requires a full understanding of the types involved. Java is too verbose, and object-oriented languages are not very good at some tasks. Python’s lack of typing can be distracting sometimes. C/C++’s lack of automatic memory management is sometimes the only option for good performance in certain environments. Concurrent execution is very hard to implement if the language is not designed for that purpose (see Twisted or Node.js). No matter which metric is used, every programming language has to position itself and sacrifice some features to gain some advantages.
The main point I’m trying to make is that general-purpose programming languages cannot escape the effects of translating their content into an underlying machine language. No programming language can become the next silver bullet.
On the other hand, programming languages show some kind of DSL composition. The most obvious examples are regular expressions, basic arithmetic, and basic logic; they are first-class citizens in most programming languages. Another example is list handling: almost every programming language has some support for lists, including list definition and manipulation.
Batteries included programming
Data will change its shape through external services/libraries until the desired result is retrieved. For example, 1+1 is a very simple operation that involves integers and generates a simple response: 2. Getting the first element of a list is trivial too: head [1,2,3] or [1,2,3][0], and the response is always 1. But what about getting the latest email from an IMAP server?
Time to learn a new Python library: imaplib. It includes a few classes and a few required commands, like connecting to the server, selecting the folder, and grabbing the last message. This is the example included in the documentation:
```python
import getpass, imaplib

M = imaplib.IMAP4()
M.login(getpass.getuser(), getpass.getpass())
M.select()
typ, data = M.search(None, 'ALL')
for num in data[0].split():
    typ, data = M.fetch(num, '(RFC822)')
    print('Message %s\n%s\n' % (num, data))
M.close()
M.logout()
```
The verbs are very similar to the words I’ve used for describing the problem. However, if I want to do similar actions with ftplib, then I have to learn new verbs:
- select becomes cwd
- fetch becomes retrbinary or retrlines
- search disappears, now we have dir
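For illustration, here is a hedged sketch of the equivalent task written against ftplib’s vocabulary (the host, credentials, and folder name are placeholders):

```python
from ftplib import FTP

def list_folder(host, user, password, folder):
    """Same kind of task as the imaplib example, but with ftplib's verbs."""
    ftp = FTP(host)
    ftp.login(user, password)
    ftp.cwd(folder)            # 'select' becomes 'cwd'
    entries = []
    ftp.dir(entries.append)    # 'search' is gone; 'dir' lists the folder
    ftp.quit()
    return entries
```

Same intent, different vocabulary: nothing in either library hints that `cwd` and `select` play the same role.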
I think the main reason for this is the underlying protocols: the FTP protocol has its own verbs, and so does SMTP. However, I think this is a good example of the limits of Python’s abstractions. There is no standard way of defining the abstraction, which causes slight differences that increase the overall complexity.
If I want to create a tool that connects to an IMAP server or an SMTP server and performs the same actions, I’d have to create an abstraction on top of those two libraries, learning how to use both and creating a new set of verbs (functions) and nouns (classes) that fits both. I think that’s what Python development has become with the batteries-included philosophy. Again, the same happens with the JSON/XML and API mess growing on the web. Programs must craft requests, get a response, and convert it (JSON or XML) into something useful, like storing parts of it in SQL. In order to perform the conversion, I might need a library for handling dictionaries or lists.
An alternative path
I can see the same underlying problem behind these tasks. Programming languages are not particularly well suited for translation, and the implicit DSLs cannot be stored in a convenient way. Interfaces can do the job for the easier tasks, but not for the complex abstractions. What about defining a common interface for actions, in a DSL fashion?
- search (query) for searching
- get (location) for downloading
- put (location) for uploading
- list (location) for listing the location
Of course this DSL doesn’t include the concept of a directory, which might be required for FTP, or the syntax of the search query. I think that’s the responsibility of the underlying implementations, and they might offer different DSLs for defining what a query or a location is. The methodology I suggest is to define software projects as an aggregation of language processors. Each processor has access to some tools related to the grammars it accepts, like generating an AST or extracting a slice of the input. Data interchange between processors is always regulated by grammars. A grammar can describe an action too, and the language processor may execute the action if that is its purpose. Any data can be parsed by more than one grammar/language processor.
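A minimal sketch of such a common action interface in Python, assuming an abstract base class and a toy in-memory backend (every name here is illustrative, not an existing API):

```python
from abc import ABC, abstractmethod

class Store(ABC):
    """The common action interface: search/get/put/list."""
    @abstractmethod
    def search(self, query): ...
    @abstractmethod
    def get(self, location): ...
    @abstractmethod
    def put(self, location, data): ...
    @abstractmethod
    def list(self, location): ...

class MemoryStore(Store):
    """Toy backend; an IMAP or FTP backend would map these same verbs
    onto select/fetch or cwd/retrbinary internally."""
    def __init__(self):
        self.items = {}
    def search(self, query):
        return [loc for loc in self.items if query in loc]
    def get(self, location):
        return self.items[location]
    def put(self, location, data):
        self.items[location] = data
    def list(self, location):
        return [loc for loc in self.items if loc.startswith(location)]

store = MemoryStore()
store.put('inbox/1', 'hello')
store.search('inbox')  # ['inbox/1']
```

A caller written against `Store` never needs to learn the per-protocol verbs again; only the backend authors do.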
With this methodology, a program is a set of grammar definitions for inputs and outputs, plus language processors. Each language processor has its own set of language processors and definitions. Here are a few advantages of this model:
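As a rough sketch of the idea, with plain regular expressions standing in for grammar definitions (pydsl itself supports richer formats):

```python
import re

class LanguageProcessor:
    """Wraps a function between an input grammar and an output grammar,
    so every piece of data interchanged is grammar-checked."""
    def __init__(self, input_gd, output_gd, function):
        self.input_gd = re.compile(input_gd)
        self.output_gd = re.compile(output_gd)
        self.function = function

    def __call__(self, data):
        if not self.input_gd.fullmatch(data):
            raise ValueError('input rejected by grammar')
        result = self.function(data)
        if not self.output_gd.fullmatch(result):
            raise ValueError('output rejected by grammar')
        return result

# Two processors composed; the interchange format is a validated string.
increment = LanguageProcessor(r'\d+', r'\d+', lambda s: str(int(s) + 1))
double = LanguageProcessor(r'\d+', r'\d+', lambda s: str(int(s) * 2))
double(increment('20'))  # '42'
```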
Parallel and external processing
Each language processor only requires a valid definition for its inputs and outputs. That means they can run independently, within the same machine or across the network. It also makes it trivial to define external APIs as language processors.
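Because every message between processors is just a string validated against a grammar, running a processor concurrently is a matter of moving validated strings around. A minimal local sketch using only the standard library (the digit grammar and the `double` processor are illustrative):

```python
import re
from concurrent.futures import ThreadPoolExecutor

DIGITS = re.compile(r'\d+')

def double(data):
    """A standalone processor: input and output must match the DIGITS grammar."""
    if not DIGITS.fullmatch(data):
        raise ValueError('input rejected by grammar')
    return str(int(data) * 2)

with ThreadPoolExecutor() as pool:
    results = list(pool.map(double, ['1', '2', '3']))
# results == ['2', '4', '6']
```

The same shape works across machines: replace the executor with a socket or HTTP boundary and the grammar checks still guard both ends.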
Natural Language and meaning
It is technically feasible to connect any element of any grammar definition to a first-order logic processor, such as the common interface for actions described above. I also think it is a good step towards integrating NLP into programs.
Enumeration and proofs

Isolating language processors and having a formal definition per language allows input enumeration and eases proofs about the code.
Reusability

A well-defined language processor is easily reusable, as is the case with regular expressions.
Supported grammar definition formats
- regular expressions
- pydsl BNF format
- mongodb query dictionaries
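To make the idea concrete, here is a hedged sketch of two of these formats behind a single check() interface (the class names are illustrative, not pydsl’s actual API):

```python
import re

class RegexGrammar:
    """A grammar definition backed by a regular expression."""
    def __init__(self, pattern):
        self.pattern = re.compile(pattern)
    def check(self, word):
        return self.pattern.fullmatch(word) is not None

class QueryDictGrammar:
    """A grammar definition in the style of a mongodb query dictionary:
    it accepts records whose fields match the query."""
    def __init__(self, query):
        self.query = query
    def check(self, record):
        return all(record.get(k) == v for k, v in self.query.items())

RegexGrammar(r'[ab]+').check('abba')                          # True
QueryDictGrammar({'type': 'email'}).check({'type': 'email'})  # True
```

Whatever the formalism, a grammar definition reduces to the same question: is this piece of data a member of the language?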
Supported grammar definition operations

- enumerate(gd): yields a list of the accepted words
- first(gd): yields a list of the first accepted subwords/chars
- min(gd): length of the shortest accepted word
- max(gd): length of the longest accepted word
- getgroup(gd, input, tag): returns the parts of the input according to a tag
- extract(gd, input): extracts all the slices of the input that are accepted by the grammar definition
- distance(gd, input1, input2): returns the distance between two inputs according to the grammar definition
- translate(gd, input): generic translator
- ast(astdefinition, input): creates an abstract syntax tree according to astdefinition
- sdt(sdt, ast): performs an AST translation using a Syntax Directed Translator
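A few of these operations can be sketched over a toy grammar definition that accepts a finite set of words (an illustration of the semantics, not the library’s real implementation):

```python
class FiniteGrammar:
    """Toy grammar definition: accepts exactly the given words."""
    def __init__(self, words):
        self.words = set(words)
    def check(self, word):
        return word in self.words

def enumerate_gd(gd):
    """enumerate: all accepted words."""
    return sorted(gd.words)

def first(gd):
    """first: the possible first characters of accepted words."""
    return sorted({w[0] for w in gd.words if w})

def min_len(gd):
    """min: length of the shortest accepted word."""
    return min(len(w) for w in gd.words)

def extract(gd, data):
    """extract: every slice of the input accepted by the grammar."""
    return [data[i:j]
            for i in range(len(data))
            for j in range(i + 1, len(data) + 1)
            if gd.check(data[i:j])]

g = FiniteGrammar(['ab', 'abc', 'b'])
extract(g, 'xabcy')  # ['ab', 'abc', 'b']
```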
"Come, let us make bricks, burning them well"