MyDef, a general purpose preprocessor

What is a preprocessor

A preprocessor modifies a source code file before handing it over to the compiler. If you have programmed in C before, you know what I am talking about. For others, you can think of a preprocessor as something that provides "macros".

While very powerful when used prudently, C's particular implementation of a macro system is full of pitfalls. In fact, it is quite controversial. For example, the use of macros in C++ is generally discouraged, and many modern features in C++ were intentionally developed to replace certain macro usages. However, this controversy should not automatically extend to preprocessing in general. To help broaden the view of preprocessing, consider the following examples:

  • Syntactic sugar.

    Consider the following snippet in Python:

    new_list = []
    for i in old_list:
        if filter(i):
            new_list.append(expression(i))

    You can obtain the same thing using a so-called list comprehension:

    new_list = [expression(i) for i in old_list if filter(i)]

    The two pieces of code are equivalent. Although I am not certain this is how it is done, an easy way for a language to add syntactic sugar is to simply add a filter on top of the original parser. In other words, syntactic sugar is a form of preprocessing.

    Python, like many other languages, chooses to control syntactic sugar as part of the language design. On the other side there is LISP, which intrinsically lacks any syntactic sugar but gives users the ability to preprocess the code (form) in any way they see fit. LISP's macros are powerful, but they have to be developed in LISP and for LISP. A general purpose preprocessor can fill the void for languages in general -- including LISP.

  • Literate programming.

    In the making of TeX, Donald E. Knuth invented a style of programming called literate programming. The following is an excerpt from his TeX program:

    @ The program begins with a normal \PASCAL\ program heading, whose components will be filled in later, using the conventions of \.{WEB}.
    @p @t\4@>@<Compiler directives@>@/
    program TEX; {all file names are defined dynamically}
    label @<Labels in the outer block@>@/
    const @<Constants in the outer block@>@/
    mtype @<Types in the outer block@>@/
    var @<Global variables@>@/
    @d start_of_TEX=1 {go here when \TeX's variables are initialized}
    @d end_of_TEX=9998 {go here to close files and terminate gracefully}
    @d final_end=9999 {this label marks the ending of the program}
    @<Labels in the out...@>=
    start_of_TEX@t\hskip-2pt@>, end_of_TEX@t\hskip-2pt@>,@,final_end;

    We can see that he is definitely not into syntactic sugar.

    Due to Prof. Knuth's meticulous style, many of us got the impression that literate programming simply means mixing copious amounts of documentation with code. That is not entirely correct. The key concept of literate programming is to write code in an organization best suited for human comprehension. To achieve this literary goal, it is necessary to reorganize code according to semantic context rather than according to what the underlying programming language requires. For example, one can declare the label, const, type, and var items separately, where each is relevant and with sensible names, rather than where and how the Pascal compiler expects them; or break blocks of code and definitions apart and regroup them according to literary context.

    To achieve this, Prof. Knuth invented WEB, which is simply a preprocessor. Unfortunately, it is not quite general due to its ties to TeX and Pascal.
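
To make the "simply a preprocessor" claim concrete, here is a minimal Python sketch of the core of such a tool: named chunks are defined in whatever order reads best, and references of the form @<name@> are replaced by their definitions. The chunk table and the tangle function are hypothetical names for illustration; the real WEB also supports abbreviated chunk names, macros, and much more, none of which is modeled here.

```python
import re

# Hypothetical chunk table: names map to code fragments, defined in
# whatever order reads best for a human.
chunks = {
    "program": "program demo;\nvar @<Global variables@>\nbegin\n  @<Main body@>\nend.",
    "Main body": "counter := 0;",
    "Global variables": "counter: integer;",
}

# A reference looks like @<chunk name@>.
CHUNK_REF = re.compile(r"@<(.+?)@>")

def tangle(name, table):
    """Recursively replace every @<chunk name@> reference with its definition."""
    return CHUNK_REF.sub(lambda m: tangle(m.group(1), table), table[name])

print(tangle("program", chunks))
```

The sketch does no cycle detection, so a chunk that refers to itself would recurse without end.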

Syntax is important, and the idea of literate programming makes sense. Both practices have been limited primarily because their implementations are tied to narrow domains. This calls for general purpose preprocessors.

At this point, someone may call out that general purpose preprocessors do exist. What about m4?

m4 is essentially C's preprocessor made general. To understand why I find m4 less than satisfactory, let's review a few design choices C made that limit its general usage:

  • It is implemented at word/token level

    Lines and blocks are higher units of semantic meaning. By working only at the word level rather than at the line or block level, the macros cannot directly serve a literate purpose.

  • It is intended to blend in with base C syntax. A macro word looks like a C identifier, and a macro with parameters looks like a C function call.

    Macros have subtle differences from native C functions, and programmers need to be able to distinguish them easily. By concealing the appearance, the subtleties become pitfalls by design.

  • It is not scoped.

    At the level of human communication, context is very important. Within a context, we prefer short names, such as it, he, this, and that; or a, b, c and i, j, k. However, at the global level, we prefer long names, e.g. John Smith the 3rd from Connecticut, or global_counter_of_particular_type. Short names replace long names within a small context -- perfect for macros, but they need to expire automatically upon exit from the context. When they don't, semantic contexts get polluted.

    C's macros are always global and even extend beyond file scope, a true source of pain beyond reasonable tolerance.
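
The scoping point can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular preprocessor implements it: the $(name) notation is borrowed from the MyDef syntax shown later in this article, and expand, global_scope, and loop_scope are hypothetical names.

```python
import re
from collections import ChainMap

MACRO_REF = re.compile(r"\$\((\w+)\)")

def expand(line, scope):
    """Replace each $(name) with its value in the innermost scope that defines it."""
    return MACRO_REF.sub(lambda m: scope[m.group(1)], line)

# Long, descriptive names live at the global level.
global_scope = ChainMap({"counter": "global_counter_of_particular_type"})

# Entering a context: a child scope that can introduce short names.
loop_scope = global_scope.new_child({"i": "row_index"})
inside = expand("$(counter) += $(i)", loop_scope)

# Leaving the context: the child is dropped, and "i" expires with it.
leaked = "i" in global_scope
```

Here inside refers to both the long global name and the short local one, while leaked is False because the short name never reached the global table.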

There is no reason that a preprocessor implementation has to be bound to a particular language, and there is no reason that we can't have something that is easy to read and write and that works at units of comprehension rather than tokens. This introduces MyDef.

Installing MyDef

To dispel the impression that we are only talking about something hypothetical, let's make sure that we can install MyDef.

The designer of MyDef had a very different philosophy from that of typical libraries and frameworks. A typical design goal of a library or framework is to appear "magical". Users are not expected to understand or even see how the internals work. They are expected to follow instructions and find that it "just works". However, the goal of MyDef is to give users the ability to control their own syntax and to write code in the most comprehensible way from their own perspective. The designer of MyDef does have his own view of what is more desirable, but he does not insist on it. The goal is not to assert; rather, it is to give users the freedom to discover on their own. Therefore, it is important to show users how MyDef works and to give control to them.

MyDef is rather provided as an example of a particular use case. As a metaphor, MyDef is a set of good bicycle wheels as far as the author is concerned. If others find the bicycle wheels useful, they can use them directly. But he also believes they can be of use to, e.g., tractor users, as long as they understand how a wheel works in general. There is no need to reinvent the wheel as long as we are willing to learn the wheel.

Distinctively, MyDef is expected to be installed in one's home directory. Users are expected to see, read, add to, or modify any part of it to best adapt it to their particular scenarios.

Assuming a typical Linux environment -- where $HOME is defined, you are working in a projects folder under $HOME, and you have git, make, and bash installed -- MyDef can be installed easily with

# [overwrites .bashrc and .vimrc!]
$ git clone
$ cd my_configs
$ bash
$ source ~/.bashrc

It pulls a few additional git checkouts and installs them. It installs Perl code in $HOME/lib/perl5 and the MyDef library in $HOME/lib/MyDef. In addition, it installs a custom .bashrc and .vimrc to set up a workable environment and simple syntax highlighting for vim. Because it overwrites those files, try this within a vanilla Docker container.

But if you are comfortable setting up your own environment, you should not follow the above instructions. Instead,

$ git clone
$ cd MyDef
$ sh

Not too complicated. The caveats are that PATH and a few key environment variables may not have been set automatically; your editor may not have been configured to show syntax highlighting or to deal with block-level indentation; and shortcut keys may not have been set up to make the extra preprocessing step transparent. These are explained in the documents within the repository.

There is no configure script to guess your particular system and preferences. It should work for a typical command line environment. If it doesn't, you should know how to make it work better than I could ever guess.

Using MyDef

With MyDef freshly installed, let's play with it:

$ mkdir ~/projects/temp
$ cd ~/projects/temp
$ vim t.def
    page: t
        module: perl

        $print Hello World!

Save and, while still inside vim, press <F5>.

If it works, the terminal should tell you exactly what that <F5> did: it compiles t.def into a Perl script and then runs it with perl. The <F5> mapping is configured inside .vimrc. If you don't have it, you can enter the commands manually:

$ mydef_page -mperl t.def
$ perl

I encourage you to configure a single key-press mapping. It is important that the preprocessing step does not feel like extra work in your workflow.

Unlike C's preprocessor, MyDef does not hide its output. Its output should be identical or similar to what the user would normally write without MyDef. When users need to debug or diagnose their code, they often need to work at both the def level and the output source level. These are essentially two levels of abstraction: .def sits at the literary or abstract level, while its output sits directly at the level of the underlying language.

Put them side by side, in t.def:

$print Hello World!

and in the generated Perl:

print "Hello World!\n";
Do you see the difference?

Before you generalize, I should tell you that this is only a special piece of syntactic sugar for printing to the console. It reflects my personal taste, and there is more to it. Generally, MyDef does not attempt to guess where your strings are and add quotation marks for you. Within print, however, the narrower context allows this particular syntactic sugar to work, just like Python's list comprehension. Since printing is so prevalent, I find saving the trouble of typing quotes quite welcome.
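
To show how little machinery such sugar needs, here is a minimal Python sketch of a line-level translation from $print to a Perl print statement. It is not MyDef's actual implementation (which is written in Perl and does more); translate_print is a hypothetical name, and the escaping rule is an assumption made for this sketch.

```python
def translate_print(line):
    """Turn '$print some text' into a Perl print statement; pass other lines through."""
    stripped = line.lstrip()
    if not stripped.startswith("$print "):
        return line
    indent = line[: len(line) - len(stripped)]
    text = stripped[len("$print "):]
    # Escape the characters that would end or alter a double-quoted Perl
    # string. Backslash goes first so the added backslashes are not re-escaped.
    for ch in ("\\", '"'):
        text = text.replace(ch, "\\" + ch)
    return f'{indent}print "{text}\\n";'

print(translate_print("$print Hello World!"))
```

Because the sketch leaves $ and @ alone, Perl would still interpolate variables in the resulting string; whether that is desirable is a design choice.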

How MyDef works

Special syntax is available in language-specific output modules, such as the output_perl module; in this section, however, we will limit our discussion to base MyDef.

Just as when learning programming, before learning the language syntax one should understand how to navigate the command line, how to edit source code as text, how to compile, and how to run the program. Here is how MyDef works:

  • You should keep the MyDef source folder that you cloned from GitHub. Although you don't need it to run MyDef once installed, it is convenient to have it around so you can read and customize its workings. In fact, if you later want to download or develop plug-ins -- called output modules -- you will need this base repository. You should have it as an environment variable:
    export MYDEFSRC=$HOME/projects/MyDef
  • If you take a look into the repository folder, you may discover that it contains mostly .def files. The actual code is in Perl, and the .def files need to be preprocessed by MyDef in order to run as Perl. This brings up the problem of bootstrapping: how do we install MyDef if MyDef is not installed in the first place? To solve that, we simply place a set of precompiled Perl code inside the bootstrap folder, which is used to initialize the installation. To provide stability, what's inside bootstrap/ is typically many commits behind the actual code in .def; but as long as it can preprocess the main .def source, it should work.
  • MyDef installs several Perl scripts into $HOME/bin, which should be in your PATH (if not, you need to add it). The Perl scripts use a few Perl modules, which are installed in $HOME/lib/perl5/MyDef/. Finally, there are some "standard" def libraries installed in $HOME/lib/MyDef/, for convenience.
  • The following environment variables are set (in ~/.bashrc) to reflect your preference:
    export PERL5LIB=$HOME/lib/perl5
    export MYDEFLIB=$HOME/lib/MyDef
    export PATH=$HOME/bin:...
  • Inside $HOME/bin, the most important script, mydef_page, is the MyDef compiler:
    $ mydef_page -m[module] src.def
    -mperl uses the perl output module to do the translation. If the -m option is not specified, mydef_page reads the module option from your .def source or from a file in your source folder simply called config.
  • mydef_make surveys all the .def files in the directory and generates a Makefile for you. It is a convenience for when you have many source files or subdirectories and writing a Makefile by hand becomes a pain. Oh, it will also generate a default config file for you if you don't have one already.
  • mydef_run calls mydef_page directly and also tries to run the resulting code. For Perl, it runs perl. For Python, it runs python. For C, it compiles with gcc and then runs the executable.

    mydef_run covers the opposite end of the needs from mydef_make. When all you want is a single program from a single source, you don't need a Makefile; you just need to be able to run it as conveniently as you can.

    Did you know that you can use mydef_run to run .pl, .py, or even .c files directly? There is nothing magical, just convenience. You can customize its assumptions by modifying mydef_run.def inside the source repository.

    There are a bunch of test scripts inside the tests folder (in the source repository). They are not particularly organized yet, but that shouldn't prevent you from checking them out using mydef_run. Also, <F5> simply calls mydef_run.

  • There is also mydef_install. Perl has its official installation mechanism. However, for pure Perl scripts like MyDef, it is rather overkill. mydef_install is there to avoid unnecessary complications.
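
The "nothing magical, just convenience" behavior of running .pl, .py, or .c files directly can be sketched as a simple dispatch on the file extension. The following Python sketch only builds the command lines and does not execute them; run_plan is a hypothetical name, and the exact commands mydef_run issues (compiler flags, output names) are assumptions here.

```python
from pathlib import Path

def run_plan(source):
    """Return the commands a mydef_run-like wrapper might execute for a source file."""
    path = Path(source)
    suffix = path.suffix
    if suffix == ".pl":
        return [["perl", source]]
    if suffix == ".py":
        return [["python", source]]
    if suffix == ".c":
        exe = str(path.with_suffix(""))
        # Compile first, then run the resulting executable.
        return [["gcc", "-o", exe, source], ["./" + exe]]
    raise ValueError(f"no rule for {source!r}")

for command in run_plan("hello.c"):
    print(" ".join(command))
```

To actually run a plan, each command could be passed to subprocess.run(command, check=True) in order.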

That's it. Finally, let's learn some MyDef!

MyDef syntax

The MyDef manual is located in the repository. A systematic walk through MyDef's syntax may work better for some people, but it should take place somewhere more appropriate. Here we can learn quite a lot by looking at the following illustrative example:

include: your_custom_library.def
include: macros_dir/library_within_dir.def

macros:
    A: ...
    B: ...

# this is a comment
page: name
    module: general # which is the default
    a: page-level macros or options

    Line $(A)
    Line $(B)
    $call more_1
    Indent here
        $call more_2

    subcode: more_1
        line a
        line b

subcode: more_2
    line I $(a)
    line II $(b)

I hope you have inferred the following:
  • MyDef is indentation based. It works at the block and line level. Note that this does not mean it can't work with languages that are not indentation based. As a matter of fact, it is well accepted that indentation can and should be added on top of the language syntax to enhance readability. MyDef embraces this practice and layers its own syntax on the indentation, so your indentation is forced to be consistent.
  • At the top indentation level, you can have include:, page, macros, and subcode. Ordering is arbitrary, depending on your literary needs. Other text (that does not start with these keywords) will not raise errors. It is simply ignored and becomes automatic documentation, but this practice may not always be preferred.
  • page: marks the output unit. A page is a file in the output. You can have multiple pages in a single def file, but only do so if it makes sense, such as when they are short and share some semantics. Remember, you can always share macros and subcodes by including libraries.
  • You can break long output files into multiple def files and include: them together in the main def file. In fact, as long as your computer has sufficient memory, you can have the entire program in a single source file such as main.c. It significantly simplifies your code structure and build process.
  • There are both inline macros and block macros (subcode:).
  • The only significant syntax MyDef introduces is $(macroname...). It is used for both inline macro replacement and preprocessing directives. It does not clash with most programming languages unless you are writing shell scripts or Makefiles. Even in the latter, there are solutions.
  • As you may have guessed, macros and subcodes are scoped and can be nested.
  • Lastly, # introduces comments. (Why do we need to learn a different commenting syntax for each language?)
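
The block-level behavior listed above can be sketched in Python: a $call line is replaced by the named block, and the block inherits the indentation of the calling line. This is an illustration of the idea only, not MyDef's implementation; expand_calls is a hypothetical name, and the sketch ignores scoping, parameters, and inline macros.

```python
def expand_calls(lines, subcodes, indent=""):
    """Expand '$call name' lines into the named block, preserving indentation."""
    out = []
    for line in lines:
        stripped = line.lstrip()
        lead = indent + line[: len(line) - len(stripped)]
        if stripped.startswith("$call "):
            name = stripped[len("$call "):].strip()
            # The block inherits the indentation of the line that called it.
            out.extend(expand_calls(subcodes[name], subcodes, lead))
        else:
            out.append(lead + stripped)
    return out

# Hypothetical blocks, loosely modeled on the example above.
subcodes = {
    "more_1": ["line a", "line b"],
    "more_2": ["line I", "$call more_1"],
}
page = ["Line 1", "$call more_1", "Indent here", "    $call more_2"]
result = expand_calls(page, subcodes)
print("\n".join(result))
```

Working on whole lines and blocks, rather than tokens, is what lets the output keep a consistent indentation without any knowledge of the target language.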

Without trying to be complete, the following is a sample of some additional features you may enjoy:

page: test
    $(for:name in A-Z)
            Hey $(name)! Such an honor!
            Hey $(name)! Nice to follow O, right?
            Hello $(name)!

    $(for: x,y,z and 1-3)
        $1: $2

    # sets macros with perl one-liners
    Right now is [$(now)].
Since I am not trying to be complete, you get what you can infer.

A realistic example

Tutorials for a new language often choose artificial examples. While they are illustrative, they can also be misleading, as a simple example cannot capture the complexities of practical reality. Same here. As you can tell, all the above examples are rather fake. Before concluding, I'll show you a real usage example.

It just so happened that I needed to update my resume. Being tired of messing with Microsoft Word, I wanted to try LaTeX. I searched the internet for "latex resume templates", which led me to a template website. The "Freeman" template is pretty nice, so I downloaded it. The example looks like the following:

% comments
% ... endless comments
% ...

    {\sffamily\Huge Dr. Gordon Freeman}\\\medskip
    {\Huge\color{headings}\cvtextfont Curriculum Vitae}
Well ..., I am a bit annoyed. So much text that is supposed to be ignored, against so little actual text that the user is supposed to edit (Dr. Gordon Freeman). It is going to be a tough sell to convince an average Joe that this is supposed to be superior to Word.

So after a couple of hours of effort -- I know, but I had to brush up on my LaTeX -- I had the following:

include: freeman.def
    Name: Hui Zhou
    Address: 0000 Xxxx Ave. Xxxxxxxxxx XX 00000
    Phone: (000) 000 0000
    Github: hzhou

page: cv_2018
    type: tex
    run: xelatex cv_2018

    &call template_freeman
        &call section_work
            &call work, 2014 -- current, Xxxxxx Xxxxxxxxxx, Research contractor
                Develop metrology models for nanoscale critical dimensions.
                Develop and maintain optical simulation software.
                Develop and run atomistic simulations for quantum-scale devices.

            &call work, 2007 -- 2014, XXXX, Guest Researcher
                Develop algorithms for microscopic defect detection based on optical imaging and simulations.
                Develop optical metrology models.
                Develop simulation software based on FDTD and RCWA algorithm.

            &call work, 2004 -- 2006, XXXX, Research Associate
                Experimental study of Si(111) surfaces.
                Simulation study of Si etching process using Monte-Carlo algorithm.
                Develop and maintain data acquisition software, include PID control and data analysis.

            &call work, 1999 -- 2004, Univ. of XXXX, Research Assistant
                Programming for laboratory data acquisition and analysis.

        &call section_reference
            $call reference, Xxxxx Xxxxxx, Project Leader, XXXX, (000) 000 0000
            $call reference, Xxxxxxx Xxxxxr, Group Leader, XXXX, (000) 000 0000

        $call switch_column

        &call section_education
            $call education, 1999 -- 2004, Ph. D., Chemical Physics, Univ. of XXXX
            $call education, 1994 -- 1998, B. S., Chemical Physics, Univ. of XXXX

        &call tab_section, Programming Skills
            $call Entry, Proficient, Perl, C
            $call Entry, Excellent, C++, Java, Python, Javascript
            $call Entry, Familiar, LISP, Fortran, Pascal
            $call Entry, Expert, Linux administration

        &call section, Portfolio
            &call para, MyDef
                MyDef is a general-purpose preprocessor that works with multiple programming languages. It provides macros, syntax-customization, code restructure, and generic programming to languages with or without such facilities. As use of MyDef is independent of underlying languages, it provides a separation of syntactic layer from semantic layer to most programming languages.

            &call para, MyDef Modules
                Language specific output modules for Perl, C, C++, Java, etc. At plug-in level, syntactic customization and analysis can be easily implemented using Perl script.

            &call para, FDTD, RCWA, Aerial
                In-house code for solving Maxwell equations and simulating microscope imaging based on Fourier optics. Code available upon inquiry.

Voilà! Well separated contents. I can maintain this! A press of <F5> gives me the PDF.

The rest of the structure and template is in freeman.def.

Note: the original downloaded template implements many TeX macros. TeX macros are also designed at the word/token level and suffer from the same readability problem. I translated most of them into MyDef facilities.

subcode: template_freeman
    # Freeman Curriculum Vitae
    # Version 2.0 (19/3/2018)
    %!TEX program = xelatex

    $call @setup

    $call name_box

    # -------------------------
    subcode: switch_column
        $call contact_box

    subcode: section(subject)

        subcode: para(@subject)

        subcode: left
        subcode: right

    subcode: tab_section(subject)
        &call section, $(subject)

        subcode: entry(head, @text)
            \textsc{$(head)} & $(text) \\

        subcode: Entry(head, @text)
            \textsc{$(head)} & $(text) \\[6pt]

    subcode: section_work
        &call section, Work Experience

        subcode: work(duration, place, job_title)
            &call left
            &call right
            &call right

    subcode: section_reference
        &call tab_section, References

        subcode: reference(name, position, employer, phone)
            $call Entry, -, \textbf{$(name)}
            $call entry, Position, $(position)
            $call entry, Employer, $(employer)
            $call entry, Phone, $(phone)

    subcode: section_education
        &call tab_section, Education

        subcode: education(duration, degree, department, school)
            $call entry, \textsc{$(duration)}, \textbf{$(degree)}
            $call entry, -, $(department)
            $call Entry, -, $(school)

    subcode: name_box
            {\sffamily\Huge $(Name)}\\\medskip
            {\Huge\color{headings}\cvtextfont R\'esum\'e}

    subcode: contact_box
                    \raisebox{-1pt}{\faHome} & $(Address) \\
                    \raisebox{-1pt}{\faPhone} & $(Phone) \\
                    \raisebox{0pt}{\small\faEnvelope} & $(Email) \\
                        \raisebox{-1pt}{\small\faDesktop} & \href{$(Url)}{$(Url)} \\
                        \raisebox{-1pt}{\faGithub} & \href{$(Github)}{$(Github)} \\
                        \raisebox{-1pt}{\faLinkedinSquare} & \href{$(Linkedin)}{$(Linkedin)} \\

# --------------------------------------
subcode: setup

    \geometry{ hmargin=1.5cm, vmargin=1.75cm, letterpaper, }


    #	FONTS



    \usepackage[sf,scale=0.95]{libertine} # Load Libertine as a \sffamily font for sans serif titles

    #   COLORS
    \definecolor{text}{HTML}{2b2b2b} # Main document font colour, off-black
    \definecolor{headings}{HTML}{701112} # Dark red colour for headings
    \definecolor{shade}{HTML}{F5DD9D} # Peach colour for the contact information box
    \definecolor{linkcolor}{HTML}{641c1d} # 25% desaturated headings colour for links
    # Other colour options: shade=B9D7D9 and linkcolor=A40000; shade=D4D7FE and linkcolor=FF0080


    \hypersetup{colorlinks, breaklinks, urlcolor=linkcolor, linkcolor=linkcolor}

    \fancyhf{} # This suppresses all headers and footers by default, add headers and footers in the template file as per the example
    \renewcommand{\headrulewidth}{0pt} # Remove the default rule under the header

    # ----------------------------------------------------------------------------------------
    # ---------------------------------------------------------------------------------------


    \titlespacing{\section}{0pt}{0pt}{8pt} # Spacing around section titles, the order is: left, before and after

