Using EMACS from LISP
By Richard Weyhrauch

This note is called: [CALLING-EMACS]
The notation used in this note is described in [NOTATION].
Under development


Preface

This note documents how IBUKI uses elisp in EMACS and calls it from LISP. EMACS is perhaps the world's most efficient and featureful programmable editor. By creating and using this EMACS 'API' we avail ourselves of an extremely powerful tool for text processing, and we use it all the time. This note was written as documentation for IBUKI's internal use, so the referenced files live on IBUKI servers, but (modulo directory names) this code will work anywhere.

Setting up a call to EMACS

First choose a <name> for the EMACS 'function', then follow this template:

  • use the name of the EMACS routine in the ~/.emacs file
  • make a file called <name>.scr to call from LISP
  • make a file called <name>.fun to implement the operation in EMACS
  • make a file called <name>.emx to hold auxiliary elisp definitions

Use the name of the EMACS routine in the ~/.emacs file

Add the following lines to the file ~/.emacs

------------------------------------------------------------------------------
(print "Before <name>")
(if (equal (getenv "RWWOP") "<name>")
    (load-file "<name>.fun"))
------------------------------------------------------------------------------

This code checks whether the environment variable RWWOP (set by <name>.scr) names this routine and, if so, loads <name>.fun. Loading the file evaluates its forms, so this effectively calls the routine <name> and executes it.

Make a <name>.scr file to call from LISP

This is a shell script that can be called either from the UNIX shell or from LISP using the 'command' function.

------------------------------------------------------------------------------
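#!/usr/bin/csh
# <name>.scr: run the EMACS operation <name> on each file argument

# tells ~/.emacs which <name>.fun to load
setenv RWWOP "<name>"

# $argv (rather than $1) processes every file argument
foreach file ($argv)
  # tells <name>.fun which file to edit
  setenv RWWEDIT "$file"
  # -batch implies -q, so load ~/.emacs explicitly
  emacs -batch -l ~/.emacs
end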
------------------------------------------------------------------------------

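This script runs in the UNIX shell 'csh'. The details depend on the calling shell and on the version of EMACS you are using (current GNU Emacs, for example, does not read ~/.emacs in batch mode unless it is loaded explicitly with -l), but the essentials are the same.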
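
For shells other than csh, a POSIX sh version of the same script might look like the minimal sketch below. It is an assumption, not the script IBUKI uses: the function name run_emacs_op is hypothetical, the EMACS variable (defaulting to emacs) exists only so that a stub can stand in for EMACS when testing, and ~/.emacs is loaded explicitly with -l because GNU Emacs does not read init files in batch mode. Unlike the csh loop over $1, it processes every file argument and returns a nonzero status if any EMACS run fails.

```shell
#!/bin/sh
# Sketch of <name>.scr in POSIX sh (the note's version is csh).
# Usage: run_emacs_op NAME FILE...
run_emacs_op() {
  [ "$#" -ge 1 ] || { echo "usage: run_emacs_op NAME FILE..." >&2; return 2; }
  # RWWOP tells ~/.emacs which <name>.fun to load (csh: setenv RWWOP)
  RWWOP=$1
  export RWWOP
  shift
  status=0
  for file in "$@"; do
    # RWWEDIT tells <name>.fun which file to edit
    RWWEDIT=$file
    export RWWEDIT
    # --batch implies -q, so load ~/.emacs explicitly;
    # EMACS may name a stub for testing (an assumption, not in the original)
    "${EMACS:-emacs}" --batch -l "$HOME/.emacs" || status=1
  done
  # nonzero if any EMACS run failed; later files are still processed
  return "$status"
}
```

For example, 'run_emacs_op example example.txt' runs example.fun on example.txt through the ~/.emacs dispatch.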

Make a file '<name>.fun' to implement the operation in EMACS

This file contains elisp code and is called <name>.fun by convention. Its first line allows it to be executed as a standalone script.

------------------------------------------------------------------------------
#!/usr/bin/emacs --script

;; <name>.fun: these forms destructively edit a file using EMACS

;; load the auxiliary macro definitions
;; for convenience *example-dir* is defined in ~/.emacs
(load-file (concat *example-dir* "<name>.emx"))

;; visit the file to edit
;; RWWEDIT is set in <name>.scr
(find-file (getenv "RWWEDIT"))

;; execute a series of elisp commands

;; some spelling fixes (the macros are defined in <name>.emx)
(goto-char (point-min))
(execute-kbd-macro 'fix-sissor)
(goto-char (point-min))
(execute-kbd-macro 'fix-sizzer)

;; remove <script> elements; condition-case absorbs the error raised
;; when the search inside the macro fails (defined in <name>.emx)
(condition-case nil
    (execute-kbd-macro 'script-all)
  (error nil))

;; this is one way of programmatically saving the current buffer AND
;; exiting EMACS

;; save the edited buffer (visited above)
(basic-save-buffer)

;; exit EMACS (return to the <name>.scr file)
(kill-emacs)
------------------------------------------------------------------------------
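
Because of its first line, <name>.fun can also be run directly, but --script (like --batch) skips ~/.emacs, so *example-dir* would be unbound when the .fun loads its .emx file. One way around this, sketched below under the assumption that <name>.fun and <name>.emx sit in the same directory, is to define *example-dir* with --eval and then load the .fun with -l. The names run_fun and RWWDIR are hypothetical; the directory travels through the environment, as RWWEDIT does, so no shell quoting ends up inside the elisp.

```shell
#!/bin/sh
# Hypothetical helper: run NAME.fun on FILE without the ~/.emacs dispatch.
# Usage: run_fun NAME FILE [DIR]   (DIR holds NAME.fun and NAME.emx, default .)
run_fun() {
  [ "$#" -ge 2 ] || { echo "usage: run_fun NAME FILE [DIR]" >&2; return 2; }
  dir=${3:-.}
  # <name>.fun reads the file to edit from RWWEDIT
  RWWEDIT=$2
  # RWWDIR is an assumed name; passing DIR through the environment keeps
  # shell quoting out of the elisp
  RWWDIR=$dir
  export RWWEDIT RWWDIR
  # define *example-dir* (normally set in ~/.emacs), then load NAME.fun;
  # the Lisp reader skips its #! first line
  "${EMACS:-emacs}" --batch \
    --eval '(setq *example-dir* (file-name-as-directory (getenv "RWWDIR")))' \
    -l "$dir/$1.fun"
}
```

For example, 'run_fun example example.txt IBML/scripts' edits example.txt without consulting ~/.emacs.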

Make a file called '<name>.emx' to hold auxiliary elisp definitions

This file contains elisp code. It is optional, but IBUKI uses it because it makes sharing code between EMACS calls convenient.

------------------------------------------------------------------------------
;; 'safe' spelling correction
;; these examples, of course, just replace a misspelled word,
;;   but the idea should be clear.
;; Other elisp commands would do the same job, BUT
;;   this illustrates making a keyboard macro callable from LISP.
;;   IBUKI uses this to do complex format changes by making a keyboard macro,
;;   e.g. UTF-8 to printable ASCII; it's much simpler than writing the LISP code.
;;
; sissor ==> scissor
(fset 'fix-sissor
  "\C-[<\C-[xreplace-string\C-msissor\C-mscissor\C-m\C-[<")
; sizzer ==> scissor
(fset 'fix-sizzer
  "\C-[<\C-[xreplace-string\C-msizzer\C-mscissor\C-m\C-[<")

; remove one <script> element from an HTML file
(fset 'script
   "\C-s\C-b\C-f\C-w\C-[<")
; remove up to 1000 <script> elements from an HTML file
; (RET ends the M-x command name, as in the macros above)
(fset 'script-all
   "\C-[1000\M-xscript\C-m")

; unused example of executing code that doesn't die if search fails
;(defun vc-get-number-of-servings ()
;  (condition-case nil
;    (progn
;      (beginning-of-buffer)
;      (search-forward "interesting string" nil nil)
;        ... do something ...
;     )
;     (error nil)
;   )
;)
------------------------------------------------------------------------------
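
As the comments above note, plain elisp can do the same job as the keyboard macros. The sketch below is one hedged alternative, not IBUKI's code: a hypothetical shell function fix_word that runs a batch EMACS with search-forward and replace-match. Passing t as the NOERROR argument makes search-forward return nil when no match remains, so no condition-case is needed, and binding case-fold-search to nil makes the match exact. FIX_OLD and FIX_NEW are assumed variable names, passed through the environment like RWWEDIT.

```shell
#!/bin/sh
# Hypothetical helper: replace every OLD with NEW in FILE using batch EMACS.
# Usage: fix_word FILE OLD NEW
fix_word() {
  # an empty OLD would match forever without moving point
  if [ "$#" -ne 3 ] || [ -z "$2" ]; then
    echo "usage: fix_word FILE OLD NEW (OLD non-empty)" >&2
    return 2
  fi
  # the arguments travel through the environment, like RWWEDIT in the note
  RWWEDIT=$1 FIX_OLD=$2 FIX_NEW=$3 "${EMACS:-emacs}" --batch --eval '
(progn
  (find-file (getenv "RWWEDIT"))
  (goto-char (point-min))
  ;; exact, case-sensitive matching
  (let ((case-fold-search nil))
    ;; NOERROR = t: return nil after the last match instead of signalling
    (while (search-forward (getenv "FIX_OLD") nil t)
      ;; FIXEDCASE and LITERAL = t: insert NEW exactly as given
      (replace-match (getenv "FIX_NEW") t t)))
  (basic-save-buffer))'
}
```

For example, 'fix_word example.txt sissor scissor' performs the sissor ==> scissor correction described in the comments of <name>.emx.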

Testing an EMACS call

Four example files facilitate testing:

  • example.scr
  • example.fun
  • example.emx
  • example.txt

They are run either as a shell script
>ibutc
>cd IBML/scripts
>./example.scr

or from LISP
>(command (ibuki-test "IBML/scripts/example.scr"))

Both of these destructively edit 'example.txt'. Every time the script is run, it copies 'example.tst' to 'example.txt' and then fixes the spelling of 'sissora'.
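
Because the script refreshes example.txt from example.tst on every run, the test can be repeated and checked mechanically. A minimal sketch of such a check, assuming the example files sit in one directory (check_example is a hypothetical name), runs example.scr there and fails if 'sissor' survives in example.txt:

```shell
#!/bin/sh
# Hypothetical check for the example files: run example.scr in DIR
# (default .) and fail if the misspelling survives in example.txt.
check_example() {
  dir=${1:-.}
  # example.scr refreshes example.txt from example.tst, then edits it
  ( cd "$dir" && ./example.scr ) || return 1
  [ -f "$dir/example.txt" ] || { echo "FAIL: no example.txt" >&2; return 1; }
  if grep -q 'sissor' "$dir/example.txt"; then
    echo "FAIL: 'sissor' still in $dir/example.txt" >&2
    return 1
  fi
  echo ok
}
```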

Available Scripts


IBUKI uses EMACS for some complex text manipulations and for speed.
From LISP, the idea is to think of these files as an API to EMACS
that can be called as part of some processing, e.g. we might
   1) get an OCRed file from the Internet Archive,
   2) run some text preparations in EMACS,
   3) make an ARC file, putting the text in ARC format, and
   4) run some text cleanups in both LISP and EMACS.
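
Treating each <name>.scr as one API call, a processing chain like the one above could be driven by a small loop that applies scripts in order and stops at the first failure. This is a hedged sketch: run_steps is a hypothetical name, and each step is assumed to take the file to edit as its argument, as <name>.scr does.

```shell
#!/bin/sh
# Hypothetical pipeline driver: apply each script, in order, to FILE.
# Usage: run_steps FILE SCRIPT...
run_steps() {
  file=$1
  shift
  for step in "$@"; do
    # each step is a <name>.scr-style script that edits FILE in place;
    # stop at the first failure so later steps never see a bad file
    "$step" "$file" || { echo "run_steps: $step failed on $file" >&2; return 1; }
  done
}
```

For example, 'run_steps book.txt ./replace.scr' runs replace.scr on a placeholder file book.txt; further steps are appended as extra arguments.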
 
arcrtf.emx               ; arc chars to RTF
arcspl.emx               ; arc chars to ascii - for spelling
brit-amer.emx            ; British spelling to American - not functional
emacs.emx                ; one macro - clear-buffer
fix-squote.emx           ; tries to deal with single quote
fix.emx                  ; generic fixes - normalization of spaces
fix001.emx               ; various forms to ARC chars
fixcap.emx               ; normalization caps in ARC files
greek.emx                ; greek-html-entities-to-unicode
html.emx                 ; utilities for manipulating HTML
hyphan.emx               ; hyphen fixups (quicker than rules)
init.emx                 ; forces EMACS to output UNIX files
iso-8859-1.emx           ; functions for iso to HTML entities (to no-accent)
ligatures-out.emx        ; ARC symbol ligatures ==> ASCII
ligatures.dic            ;   ditto
notes                    ; this file
notes.chars.html         ; some notes on characters
ocr-begin.fun            ; " ; " to "; "
ocr-fix.emx              ; frequent ocr mistakes (classified by type)
pre-arc.emx              ; HTML ==> facilitate ARC eg " ==> [dq]
replace.emx              ; tidy up ARC files
replace.fun              ;   ditto
replace.scr              ;   run from a script
save.emx                 ; basically docs
spanfix.emx              ; span-simple-markup to ARC
splarc.emx               ; ASCII accent marks ==> ARC chars (for spelling)
spltest.emx              ; somewhat ad hoc spelling correction (ocr created?)
spltmp.emx               ;    ditto
spm.emx                  ; absolute spelling correction
spm.emxSPELLINGL         ; tidy up ARC files (compare with other and 'fix')
squote.emx               ; squote fixup 
ktest.emx                ; test for fancy macro
test.file                ;   test file
text-fixup               ; like hyphan above
text-fixup.emx           ;   ditto
to-arc.emx               ; mishmash of trans
to-ascii.emx             ;   more and different
to-char-entities.emx     ; chars and unicode to HTML entities
to-unicode.emx           ; utf-8 to unicode