screwlisp proposes kittens

WIP common lisp type theoretic org-file parser (is this a parser? What kind of parser?)

If you are specially interested in emacs org-mode

This Sunday morning in Europe, I.e. on September 7th 8am UTC Karl Voit (Mastodon) will be joining us for a Lispy Gopher Climate Peertube Live about emacs org as compares markdown. Watch the show Mastodon for details. Pre-question/comment thread (three boosts, zero questions…).

Back to the Introduction

I wanted to write a ā€œrealā€ parser for org files hosted by common lisp. Since I wanted to do it quickly, I thought I would write a recursive descent parser. However, it seemed like I could use types and typep. I’m not deep in parsers and parsing, so I wonder if someone can tell me if I’m barking up the wrong tree, or mad here.

I guess instead of having an operator precedence metric, I have used explicit type precedence.

Recalling also that ANSI CL has satisfies to use arbitrary single-argument predicates in deftypes.

types for orgmode lines

Since org-mode uses lines with distinguishing leading characters, I am sort of imagining later for

*** Foo bar

a type resembling (I am not sure if I can use list like these cons types)

(defun all-starsp (string)
  (equal (remove-duplicates string) "*"))
  
(deftype org-stars (level)
  `(and (simple-array character (,level))
	(satisfies all-starsp)))

(deftype org-heading (level)
  `(cons (org-stars ,level) string))

whence

CL-USER> (typep '("***" . "Foo bar") '(org-heading *))
T
CL-USER> (typep '("***" . "Foo bar") '(org-heading 3))
T
CL-USER> (typep '("***" . "Foo bar") '(org-heading 1))
NIL

with the caveat that common lisp does not directly provide a way for me to specify these types in a way like (org-heading 3+) to mean any-heading-level-3+. I would have to explicitly make that type using satisfies.

(defun car-length-2+ (x) (<= 2 (length (car x))))
(deftype org-heading-2+ (n) `(and (org-heading ,n)
				  (satisfies car-length-2+)))
(defun car-length-3+ (x) (<= 3 (length (car x))))
(deftype org-heading-3+ (n) `(and (org-heading ,n)
				  (satisfies car-length-3+)))

E.g. of what I made so far - types claiming types alist

EDIT: make-parser’s first arguement is the reader function used as used with funcall, and the second arguement is an association list with :equal (function typep) for assoc.

(defparameter *p*
  (make-parser 'read
	       '(((org-heading 1) string (org-heading-2+ *))
		 ((org-heading 2) string (org-heading-3+ *))
		 ((org-heading 3) string))))

Building a parse tree (?)

Admittedly, a bit preprocessed. The idea is that the reader argument will be customized.

(with-input-from-string
    (in
     (format nil "~@{~s~^ ~}"
	     "foo"
	     '("**" . "bar")
	     '("**" . "baz")
	     "buz"
	     "frob"
	     '("***" . "ulously")
	     "zaz"
	     '("**" . "featherless")
	     "biped"))

  (funcall *p* in '(("*" . "Heading number 1"))))

Resulting in the s-expression:

(("*" . "Heading number 1")
  "foo"
  ("**" . "bar")
  ( ("**" . "baz")
     "buz"
     "frob"
     ( ("***" . "ulously")
        "zaz"))
  ( ("**" . "featherless")
     "biped"))

of which, I think we can see that the parser nested terms according to common lisp’s types.

That parser maker

(defun make-parser
    (stream-reader rules &aux (done (gensym)))
  (lambda
      (stream results)
    (flet
	((belongs-to
	     (prev-line current-line)
	   (let
	       ((owned-types (cdr (assoc prev-line rules :test 'typep))))
	     (find current-line owned-types :test 'typep))))
      (labels
	  ((correctly-file (current-line list)
	     (cond
	       ((and (consp (car (last list)))
		     (correctly-file current-line (car (last list))))
		t)
	       ((belongs-to (car (last list)) current-line)
		(rplaca (last list) (list (car (last list)) current-line)))
	       ((belongs-to (car list) current-line)
		(nconc list (list current-line))))))
	(do ((line
	      (funcall stream-reader stream)
	      (handler-case
		  (funcall stream-reader stream)
		(end-of-file () (return results)))))
	    (nil)
	  (correctly-file line results))))))

Conclusions (actually questions for you)

So, I think this is a Pratt parser, except that instead of having a binding priority metric, I have an explicit association list of what common lisp types belong in what other common lisp types.

Admittedly, I specifically made use of cons types- I wasn’t sure how to write a sequence type with types at different positions other than just using satisfies some more.

If this is not a Pratt / recursive descent parser except with types instead of binding powers, please let me know sooner rather than later, or any other notes about the (lack of) style and strategy.

I am a bit reticient to finish the parser without some outside comments on what exactly seems to be happening.

One oddity is the way the parser proceeds. At each token read, the parser recurses as deep as possible into the last of the current output list and bubbles back up to find where the deepest place the current token read sticks (according to type relationships). This means that the result incrementally grows correctly with each token, and the parsing can be stopped and resumed, or even fixed or given a new input later.

About orgmode

I think that on inspection, orgmode is quite a lot like nested, tagged list s-expressions. So when I set out to show how different a system based on s-expressions would be, I found orgmode seemed a lot like that already.

As Karl Voit has written, orgmode is very amenable to outside processing compared to the somewhat unstructured Markdown, even just as a file format and without using orgmode’s actual functions in emacs lisp, which would be the first class option.

In my example, we did come across I guess a difference of orgmode to s-expressions- there’s no way to close the current heading and return to the level above, whereas this is standard to do in s-expression formats such as the cons tree I was nascently parsing orgfile into. I guess it’s the dangling else problem.

Cons Types in lisp

Jack Daniel was working on these for Embeddable Common Lisp recently.

Fin.

See you on the Mastodon to hear your parsing thoughts.

Also pre-questions and comments for Karl Voit. If you live in any of the Americas, you will probably be asleep for the live interview at 8am UTC Sunday.

screwlisp proposes kittens