m4 support in Opbasm

The m4 preprocessor adds powerful facilities for enhanced PicoBlaze assembly. m4 is typically already present on most Linux systems. A precompiled Windows binary is included with Opbasm. Opbasm will automatically run m4 if a source file has the extension “.psm4” or “.m4” or on any file if the --m4 option is used. The included macro package depends on some GNU extensions so GNU m4 must be used if the built-in macros are employed.

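Around 200 predefined macros are provided with Opbasm. They cover areas including type conversions, conditional code generation, stack operations, bitfield operations, shifts and rotates, conditional jumps, if-then-else and looping constructs, procedures and functions, delay generators, and string and table operations.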

None of the macros use the PicoBlaze-6 inactive bank registers. They are all portable to both targets although a few lose some functionality on PicoBlaze-3.

Using these macros you can write code in a higher-level style like the following Fibonacci generator.

With macros:

fibonacci:
  ; Generate the first 10 Fibonacci numbers
  vars(s0 is counter, s1 is two_prev := 0,
      s2 is prev := 1, s3 is next)
  for (counter := 0, counter < 10, counter := counter + 1) {
   if (counter < 2) {
     LOAD next, counter
   } else {
     ; Compute next number
     expr(next := two_prev + prev)
     LOAD two_prev, prev
     LOAD prev, next
   }
   push(next)
   CALL print_num   ; Output the next number
  }
  RETURN

After expansion:

      fibonacci:
                 ; Generate the first 10 Fibonacci numbers
                 LOAD s1, 00          ; Var two_prev := 0
                 LOAD s2, 01          ; Var prev := 1
                 ; Expression: s0 := 0
                 LOAD s0, 00

    FOR_f1_0001:
                 ; If s0 < 10
                 COMPARE s0, 0a
                 JUMP nc, GE_f1_0004

                 ; If s0 < 2
                 COMPARE s0, 02
                 JUMP nc, GE_f1_0006
                 LOAD s3, s0
                 JUMP ENDIF_f1_0007

     GE_f1_0006:
                 ; Compute next number
                 ; Expression: s3 := s1 + s2
                 LOAD s3, s1
                 ADD s3, s2
                 LOAD s1, s2
                 LOAD s2, s3

  ENDIF_f1_0007:
                 STORE s3, (sf)       ; Push
                 SUB sf, 01
                 CALL print_num       ; Output the next number

NEXTFOR_f1_0002:
                 ; Expression: s0 := s0 + 1
                 ADD s0, 01
                 JUMP FOR_f1_0001
     GE_f1_0004:
ENDLOOP_f1_0003:
                 RETURN

Installing m4 on Windows

Opbasm includes GnuWin32 Windows binaries for m4. It will be installed by the setup.py script.

In the Cygwin environment, the included binary will not be executed and you must install the Cygwin version of m4. In the Cygwin installer select the m4 package (under Interpreters) for installation. Opbasm will be able to use m4 immediately within a Cygwin shell.

Overview of m4

m4 is a general-purpose macro preprocessor. It performs text-based string manipulation through repeated expansion of previously defined macros, including support for recursion. The preprocessor has a number of built-in macros and provides a means to define your own. Macro names consist of letters, digits, and underscores. Any arguments are enclosed in parentheses and delimited by commas. Expansion is suppressed by enclosing text in quote characters which are `' by default. In m4, comments start with “#” by default, but this has been changed to “;” to match PicoBlaze syntax.

m4 uses different syntax from PicoBlaze assembly to represent different types of literals. It is important to know what context you are operating in to determine which type of literal to put in your source.

Type         PicoBlaze    m4
Decimal      10'd         10
Hexadecimal  0a           0x0a
Binary       00001010'b   0b1010
Char         "A"          A or `A'

In general you use m4 syntax for literals passed as arguments to macros within parentheses. The only exception is pbhex() which takes a list of hex values in PicoBlaze format. Be careful not to use PicoBlaze hex format in other m4 contexts as it will be misinterpreted as decimal if only digits 0-9 are used.
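
The hex pitfall can be illustrated outside of m4. The following Python sketch is only a model of the two interpretations; it is not Opbasm code and the helper names are hypothetical. It shows that the same digit string names different values depending on whether it is read as PicoBlaze hex or as an m4 decimal.

```python
# Hypothetical helpers that model how each tool reads a bare digit string.

def as_picoblaze_hex(text):
    """Value when the PicoBlaze assembler reads the digits as hexadecimal."""
    return int(text, 16)


def as_m4_decimal(text):
    """Value when m4 reads the same digits as a decimal integer."""
    return int(text, 10)


if __name__ == "__main__":
    for literal in ("10", "42", "99"):
        print(literal, as_picoblaze_hex(literal), as_m4_decimal(literal))
```

A literal such as 0a cannot be confused because it is not a valid decimal number, but 10 silently means sixteen to the assembler and ten to m4.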

m4 includes the ability to evaluate arbitrary integer expressions using the builtin eval() macro. Its default output is an m4 decimal integer so the similar evalh(), evald(), and evalb() are provided to evaluate expressions resulting in PicoBlaze hex, decimal, or binary format.

load s0, evald(4 * 5 + 1)     ; Expands to "load s0, 21'd"

The expression evaluator permits the natural use of negative decimal literals:

load s0, evalh(-20)           ; Expands to "load s0, ec"

The evala() macro works like evalh() but expands to a 12-bit PicoBlaze address.

define(DATA_ORG, 0x200)
address evala(DATA_ORG)       ; Expands to "address 200"
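
The results shown for evalh() and evala() are consistent with truncating the value to a fixed width and printing it as zero-padded hex. The Python sketch below models that behavior under this assumption; to_pb_hex() is a hypothetical helper and not part of Opbasm.

```python
def to_pb_hex(value, bits=8):
    """Format an integer as fixed-width hex, wrapping negative values
    into two's complement for the given bit width (assumed behavior)."""
    mask = (1 << bits) - 1
    digits = (bits + 3) // 4          # hex digits needed for this width
    return format(value & mask, "0{}x".format(digits))


if __name__ == "__main__":
    print(to_pb_hex(-20))             # 8-bit value, as with evalh(-20)
    print(to_pb_hex(4 * 5 + 1))       # 21 decimal as 8-bit hex
    print(to_pb_hex(0x200, bits=12))  # 12-bit address, as with evala()
```

With an 8-bit width, -20 wraps to 236, which is ec in hex and matches the evalh() example.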

m4 expressions support all of the C language operators as well as ** for exponentiation.

Note that evalh(), evald(), and evalb() cannot be nested within other macros because they expand with a comment reporting the original expression to make the listing file easier to read. If you need to evaluate an expression within another macro, use the builtin eval() macro instead. In particular, PicoBlaze constant directives are temporarily converted into an undocumented const() macro so that constants defined in PicoBlaze syntax are accessible to m4. As a consequence, you can’t use the custom eval macros that generate a comment to compute a constant value.

constant BAD_CONST,  evalh(1+1)       ; This will fail during m4 expansion
constant GOOD_CONST, eval(1+1, 16, 2) ; Generate zero-padded hex constant

An evalx() macro is available which works like the builtin eval() but also accepts strings that are not valid expressions.

load s0, evalx(9 + 2, 16, 2)  ; Expands to "load s0, 0b"
constant CNAME, 1f
load s0, evalx(CNAME)         ; Expands to "load s0, CNAME"

You can define aliases for registers without hiding the original register name, as the namereg directive does.

define(alt_name, s0)
load alt_name, 01             ; Expands to "load s0, 01"
add s0, 01                    ; s0 register is still visible

Special logic is implemented in a preprocessor stage so that PicoBlaze constants are visible to m4. They are automatically converted from PicoBlaze format into m4 format.

constant THE_ANSWER, 42'd
expr(s0 := s1 + THE_ANSWER)                            ; Same as expr(s0 := s1 + 42)
if(s0 > THE_ANSWER, `output s1, 00', `output s2, 00')  ; Right operand is treated like a constant

You can also use define() to establish constants that are visible to m4 and to create more complex macros. Michael Breen’s notes on m4 provide a good introductory overview of m4. The GNU m4 manual provides more detailed documentation.

Type conversions

Some basic macros are provided to perform type conversions. They are useful for constructing parameters to other macros that only expect decimal values.

The pbhex() macro is used to convert a list of values in PicoBlaze hex format into m4 decimals.

pbhex(0a, 0b, ff)         ; Expands to "10, 11, 255"

The asciiord() macro converts a string of one or more characters to a list of decimals representing their ASCII encoding. Quotes are not strictly necessary but guard against including trailing whitespace.

asciiord(0)               ; Expands to "48"
asciiord(`any str')       ; Expands to "97, 110, 121, 32, 115, 116, 114"

If you need a NUL terminated string, the cstr() macro works the same but appends a terminating 0:

cstr(`1234')     ; Expands to "49, 50, 51, 52, 0"

The words_le() and words_be() macros convert a list of 16-bit numbers into little-endian or big-endian bytes.

words_le(0xff01, 0xff02)  ; Expands to "1, 255, 2, 255"
words_be(0xff01, 0xff02)  ; Expands to "255, 1, 255, 2"
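
The conversions in this section are simple enough to model in a few lines of Python. The functions below are hypothetical stand-ins that share names with the macros for readability only. They reproduce the expansions shown above but say nothing about how the macros are implemented in m4.

```python
# Hypothetical Python stand-ins, not the Opbasm macros themselves.

def pbhex(*values):
    """PicoBlaze-format hex strings -> list of integers."""
    return [int(v, 16) for v in values]


def asciiord(text):
    """Characters -> list of ASCII codes."""
    return [ord(ch) for ch in text]


def words_le(*words):
    """16-bit values -> bytes, low byte first."""
    out = []
    for w in words:
        out += [w & 0xFF, (w >> 8) & 0xFF]
    return out


def words_be(*words):
    """16-bit values -> bytes, high byte first."""
    out = []
    for w in words:
        out += [(w >> 8) & 0xFF, w & 0xFF]
    return out


if __name__ == "__main__":
    print(pbhex("0a", "0b", "ff"))
    print(asciiord("any str"))
    print(asciiord("1234") + [0])      # NUL-terminated, as with cstr()
    print(words_le(0xFF01, 0xFF02))
    print(words_be(0xFF01, 0xFF02))
```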

Conditional code

You may want to conditionally generate portions of a program or pass build time parameters to macros for different results. This can be accomplished with the m4 ifdef() macro.

ifdef(`VARNAME', `
  <Defined conditional code here>
', `
  <Undefined conditional code here>
')


ifdef(`VARNAME', `load s0, 10')  ; Defined

ifdef(`VARNAME',, `load s0, 20') ; Not defined

load s1, MAXVAL

You can omit either block of the ifdef() macro if you want generation only for the defined or undefined conditions. To control the selected code block you pass defined variables with the -D option to Opbasm:

opbasm -DVARNAME -DMAXVAL=42 foo.psm4

This will define “VARNAME” as an empty string and “MAXVAL” with the string “42” which will be passed on unaltered to the assembler. These defined variables become macros which will be substituted with their value like any other macro.

General purpose macros

A few of the macros depend on modifying a temporary register. To simplify the macro calls, a preallocated temp register is used. It is set to sE by default. You can change it to another register by calling use_tempreg(). The temp register can be accessed in your own macros by using the _tempreg macro. The temp register is never preserved on the stack and you should not store data you want preserved across invocations of Opbasm macros.

use_tempreg(sA)    ; Switch to sA for the temp register

The following macros use the temp register:

expr2s load_out load_store setcy use_multiply8x8
use_multiply8x8s use_multiply8x8su use_divide8x8 use_divide8x8s use_divide16x8
use_divide16x8s use_divide8xk use_random8 use_memcopy use_memwrite
use_bcdwrite use_hexwrite use_int2bcd use_ascii2bcd use_bcd2int

The other expr() macros use the temp register indirectly when the mul and div operations are invoked.

You can guard against accidentally using the temp register for long term storage by renaming it with the namereg directive:

namereg sE, TEMPREG
use_tempreg(TEMPREG)

Now you can’t accidentally assign something to sE that will be overwritten by a macro using the _tempreg macro.

PicoBlaze programs commonly contain lists of constant declarations for IO port addresses. The iodefs() macro simplifies their declaration by allowing contiguous sequences of ports to be named in one statement. It can also be used to define scratchpad addresses.

; Usage: iodefs(<start port>, [port names]+)
iodefs(0, P_control, P_read, P_write)

; Expands to:
  constant P_control, 00
  constant P_read, 01
  constant P_write, 02

The vars() macro allows you to associate alias names with a register. Unlike the namereg directive, the original register name is still available. An optional initial value can be provided:

; Usage: vars([<reg> is <alias> [:= <init>]]+)
vars(`s0 is count := 0', `s1 is sum')

; Expands to:
  load s0, 00

Symbols “count” and “sum” can now be used in place of s0 and s1. You should quote each variable declaration to avoid macro expansion errors when redefining an existing variable. Use the popvars macro to remove all variables defined in the previous call to vars().

Stack operations

A set of macros are available to simulate a stack using the scratchpad RAM. You initialize the stack and establish the stack pointer register with a call to use_stack(). After that you can call push() and pop() to manage registers on the stack. You can push and pop any number of registers at once. Pops happen in reverse order to preserve register values when passed the same list as push(). The stack grows down so the initial address should be the highest the stack will occupy.

namereg sF, SP      ; Protect sF for use as the stack pointer
use_stack(SP, 0x3F) ; Start stack at end of 64-byte scratchpad
...

my_func:
  push(s0, s1)
  <Do something that alters s0 and s1>
  pop(s0, s1)
  return

The getstack(), getstackat(), and dropstack() macros can be used to retrieve and drop values from a stack frame. This provides a facility for passing function arguments on the stack and is particularly useful for writing functions that take a variable number of arguments. The argument to dropstack() can be a register to drop a variable number of arguments.

  load s0, BE
  push(s0)    ; First argument
  load s0, EF
  push(s0)    ; Second argument
  call my_func2

my_func2:
  getstack(s3, s4)     ; Retrieve first and second argument
  <Do your business>
  dropstack(2)         ; Remove arguments from the stack
  return

You can use the getstackat() macro to retrieve values from the stack one at a time in any order.

my_func3:
  getstackat(s4, 1)    ; Retrieve second argument (SP + 1)
  getstackat(s3, 2)    ; Retrieve first argument  (SP + 2)
  <Do your business>
  dropstack(2)         ; Remove arguments from the stack
  return
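
The stack-relative offsets are easier to follow with a small model. The Python class below is a hypothetical sketch of a descending stack in scratchpad RAM. It assumes that push stores and then decrements the pointer, that pop increments and then fetches, and that dropstack simply adds to the pointer. The method names mirror the macros for readability only.

```python
class ScratchpadStack:
    """Toy model of a descending stack (hypothetical, not Opbasm code)."""

    def __init__(self, top=0x3F, size=64):
        self.ram = [0] * size
        self.sp = top                 # points at the next free slot

    def push(self, *values):
        for v in values:
            self.ram[self.sp] = v     # store at SP
            self.sp -= 1              # then move SP down

    def pop(self):
        self.sp += 1                  # move SP up
        return self.ram[self.sp]      # then fetch

    def getstackat(self, offset):
        """Read the byte at SP + offset without moving SP."""
        return self.ram[self.sp + offset]

    def dropstack(self, count):
        """Discard count bytes by moving SP back up (assumed behavior)."""
        self.sp += count


if __name__ == "__main__":
    stack = ScratchpadStack()
    stack.push(0xBE)                  # first argument
    stack.push(0xEF)                  # second argument
    print(hex(stack.getstackat(1)))   # second argument (SP + 1)
    print(hex(stack.getstackat(2)))   # first argument  (SP + 2)
    stack.dropstack(2)
    print(hex(stack.sp))
```

Because the stack grows down, the most recently pushed value is always at SP + 1, which is why the second argument has the smaller offset.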

You may wish to allocate temporary space on the stack for local variables in a function. Use the addstack() macro to accomplish this. putstack() and putstackat() are used to store register values on the stack without altering the stack pointer.

my_func4:
  addstack(4)              ; Add 4 bytes to the stack to work with
  putstack(s0, s1, s2, s3)
  getstackat(s4, 2)
  dropstack(4)             ; Remove local frame
  return

Bitfield operations

A set of macros are available to manipulate bitfields without manually constructing hex masks.

load s0, f0
setbit(s0, 0)                ; s0 = f1
setbit(s0, 2)                ; s0 = f5
clearbit(s0, 7)              ; s0 = 75

setmask(s0, mask(0,1,2,3))   ; s0 = 7f
clearmask(s0, mask(4,5,6,7)) ; s0 = 0f

testbit(s0, 0)               ; Test if bit-0 is set or clear
jump nz, somewhere

The maskh() macro works like mask() but produces a result in PicoBlaze hex format so it can be used as a direct argument to any instruction that takes a constant.

load s0, maskh(0,1,2,6,7)  ; Expands to "load s0, c7"
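
The values in the comments above can be checked with ordinary bitwise arithmetic. This Python sketch models the bitfield operations on an 8-bit value. The function names are hypothetical and only mirror the macros for readability.

```python
def mask(*bits):
    """Build an 8-bit mask with the listed bit positions set."""
    value = 0
    for b in bits:
        value |= 1 << b
    return value & 0xFF


def run_sequence():
    """Replay the bitfield example and record s0 after each step."""
    steps = []
    s0 = 0xF0
    s0 |= mask(0)                     # setbit(s0, 0)
    steps.append(s0)
    s0 |= mask(2)                     # setbit(s0, 2)
    steps.append(s0)
    s0 &= ~mask(7) & 0xFF             # clearbit(s0, 7)
    steps.append(s0)
    s0 |= mask(0, 1, 2, 3)            # setmask(s0, mask(0,1,2,3))
    steps.append(s0)
    s0 &= ~mask(4, 5, 6, 7) & 0xFF    # clearmask(s0, mask(4,5,6,7))
    steps.append(s0)
    return steps


if __name__ == "__main__":
    print([format(v, "02x") for v in run_sequence()])
    print(format(mask(0, 1, 2, 6, 7), "02x"))
```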

Shift and rotate

Shifts and rotates are inconvenient in PicoBlaze assembly because they must be performed one bit at a time. Macros are provided that generate shifts and rotates by any number of bits more easily. The shift amount must be a constant integer. It cannot come from another register.

load s0, 01
sl0(s0, 4)  ; Shift left by 4 bits  s0 = 00010000'b
sr1(s0, 3)  ; Shift right by 3 bits with 1's inserted  s0 = 11100010'b

All 10 of the PicoBlaze shift and rotate instructions have macro equivalents. The original instructions can still be used as usual.

sl0() sl1() sla() slx() rl()
sr0() sr1() sra() srx() rr()
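
The results in the shift example can be reproduced by applying a single-bit operation repeatedly, which is how the hardware instructions work. The Python functions below are a hypothetical model of three of the operations on an 8-bit register. They are not the macro implementations.

```python
def sl0(value, count):
    """Shift left by count bits, inserting 0 at the right each step."""
    for _ in range(count):
        value = (value << 1) & 0xFF
    return value


def sr1(value, count):
    """Shift right by count bits, inserting 1 at the left each step."""
    for _ in range(count):
        value = (value >> 1) | 0x80
    return value


def rl(value, count):
    """Rotate left by count bits within 8 bits."""
    for _ in range(count):
        value = ((value << 1) | (value >> 7)) & 0xFF
    return value


if __name__ == "__main__":
    s0 = sl0(0x01, 4)
    print(format(s0, "08b"))
    s0 = sr1(s0, 3)
    print(format(s0, "08b"))
```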

Conditional jump, call, and return

PicoBlaze assembly depends on using the carry and zero flags directly to handle conditional jump and call instructions. It can be difficult to remember how the carry flag is interpreted so a set of macros are provided to perform more natural conditional instructions.

compare s0, s1
jne(not_equal)           ; Jump if s0 != s1
jeq(equal)               ; Jump if s0 == s1
jge(greater_or_equal)    ; Jump if s0 >= s1
jlt(less_than)           ; Jump if s0 < s1

callne(not_equal)        ; Call if s0 != s1
calleq(equal)            ; Call if s0 == s1
callge(greater_or_equal) ; Call if s0 >= s1
calllt(less_than)        ; Call if s0 < s1

retne                    ; Return if s0 != s1
reteq                    ; Return if s0 == s1
retge                    ; Return if s0 >= s1
retlt                    ; Return if s0 < s1
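
As background, the compare instruction sets the zero flag when the operands are equal and the carry flag when the first operand is less than the second in an unsigned comparison. The Python sketch below models how the four conditions map onto those flags. The helper names are hypothetical.

```python
def compare(a, b):
    """Return (zero, carry) for an 8-bit unsigned compare of a with b."""
    zero = (a == b)
    carry = (a < b)
    return zero, carry


def condition_met(condition, a, b):
    """Decide whether a conditional jump, call, or return would be taken."""
    zero, carry = compare(a, b)
    table = {
        "eq": zero,         # tested as Z
        "ne": not zero,     # tested as NZ
        "lt": carry,        # tested as C
        "ge": not carry,    # tested as NC
    }
    return table[condition]


if __name__ == "__main__":
    for a, b in ((3, 5), (5, 5), (7, 5)):
        taken = [c for c in ("eq", "ne", "lt", "ge") if condition_met(c, a, b)]
        print(a, b, taken)
```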

Conditional if-then-else

A high level if() macro is present that provides evaluation of infix Boolean expressions. It takes the form of if(<expr>,<true block>,[<expr>,<true block 2>...|<else block>]). The expression syntax uses conventional C operators ==, !=, <, >=, >, <=, &, and ~&. Additional expressions after the first true block produce else-if evaluation similar to m4’s ifelse() macro. It is important to guard code blocks with m4 quotes to avoid errors caused by m4 splitting strings with internal commas. The if() macro implements a compare instruction and generates the appropriate branch logic to test the flags. Unique generated labels are inserted into the code to manage the sequencing of the code blocks.

load s0, 05
if(s0 < 10,
  `load s1, "T"
  output s1, 00',
; else-if
s0 < 8,
  `load s1, "t"
  output s1, 01',
;else
  `load s1, "F"
  output s1, 02'
)

In addition, the & and ~& operators can be used to generate a test instruction instead of compare. For & the true block is executed if the test result is non-zero:

; Check if MSB is set
if(s0 & 0x80, `load s1, 00')

For ~& the true block is executed if the test result is zero:

; Check if MSB is clear
if(s0 ~& 0x80, `load s1, 00')

You can request a signed comparison, which is implemented with the compares() macro, by wrapping the expression in signed():

load s0, evalh(-10) ; -10 = 0xF6 which evaluates as > 5 in unsigned comparison
if(signed(s0 < 5),`load s1, 00') ; evaluate as < 5 using signed comparison
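
The difference between the two comparisons comes down to how the same byte is interpreted. This Python sketch models both readings of the register value from the example. as_signed() is a hypothetical helper.

```python
def as_signed(byte):
    """Interpret an 8-bit value as a two's complement signed integer."""
    return byte - 0x100 if byte & 0x80 else byte


if __name__ == "__main__":
    s0 = -10 & 0xFF                   # register holds 0xF6
    print(hex(s0))
    print(s0 < 5)                     # unsigned view: 246 < 5
    print(as_signed(s0) < 5)          # signed view:   -10 < 5
```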

Macros can be used within the code blocks including nested if() macros:

if(s0 < s1,
   `<something>',
; else
  `if(s2 >= s3,`<something else>')'
)

Note

The > and <= operators have to be simulated because the limited PicoBlaze ALU flags don’t permit them to be implemented directly. If both operands are registers they are swapped and the reverse comparison operation (< or >=) is performed. If the right operand is a constant it is adjusted by adding one to its value and the true and false conditional blocks are swapped. For instance “s0 > 0x20” is converted to “s0 < 0x21” with the false condition (originally true) executed when s0 is greater than 0x20.

This can lead to problems when doing comparisons with 0xFF because 0x100 can’t be used as an immediate value in an instruction. You may have to find alternate ways to express comparison logic when dealing with the 0xFF and 0x00 boundary values. Consider a loop counter that you want to terminate after passing 0xFF. Instead of testing for “sN > 0xFF” you should test for “sN != 0” and ensure that this won’t cause early termination at the start of the loop.
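
The constant adjustment can be checked exhaustively for 8-bit values. The Python sketch below models a greater-than test built from a less-than test against the constant plus one with the outcome inverted, and shows why 0xFF is the one constant that cannot be handled this way. Both function names are hypothetical.

```python
def fits_immediate(k):
    """True if k + 1 is still a valid 8-bit immediate value."""
    return k + 1 <= 0xFF


def gt_via_lt(s0, k):
    """Evaluate s0 > k using only a less-than test against k + 1.

    The caller is expected to check fits_immediate(k) first.
    """
    is_less = s0 < k + 1              # the test the carry flag can express
    return not is_less                # true and false blocks swapped


if __name__ == "__main__":
    ok = all(
        gt_via_lt(s0, k) == (s0 > k)
        for k in range(0xFF)          # every constant except 0xFF
        for s0 in range(0x100)
    )
    print(ok)
    print(fits_immediate(0xFF))
```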

C-style syntax

The m4 syntax for the if() macro is a little untidy but an alternate C-style syntax can be used. It is implemented using an initial preprocessing step where pattern matching converts C-style control flow statements into m4 syntax. Instead of m4 quotes, code blocks are surrounded by mandatory curly braces. Unlike m4 macros, whitespace is permitted between the if keyword and its comparison expression.

if (s0 < s1) {
  load s0, "T"
} else if (s2 == s3) {
  load s0, "t"
} else {
  load s0, "F"
}

A set of lower level if-then-else macros are provided to expose the internal workings of if(). The macros are ifeq(), ifne(), ifge(), and iflt(). Unlike if(), no compare or test instruction is generated from an expression. You have to prepare the flags on your own. The first argument is the code to execute for the true condition. An optional second argument is used for the else clause.

compare s0, s1
ifeq(
  `load s4, 20
   output s4, PORT',
; else
  `load s4, 30
   output s4, PORT2')

This expands to the following:

compare s0, s1
jump nz, NEQ_f1_0001
load s4, 20
   output s4, PORT
jump ENDIF_f1_0002
NEQ_f1_0001:
; else
  load s4, 30
   output s4, PORT2
ENDIF_f1_0002:

Looping

Similarly to if() there are a set of high level looping macros for(), while(), and dowhile(). They implement the corresponding looping constructs using the syntax for(<init>,<expr>,<update>,<loop body>) and [do]while(<expr>,<loop body>). Signed comparison is supported just as with if() using the signed() macro as a modifier. The for loop macro uses the expr() macro syntax for the init and update fields.

for(s0 := -10, signed(s0 < 10), s0 := s0 + 1,
  `output s1, P_FOO'
)

; Output s1 to port 00 10 times
load s0, 00
while(s0 < 10,
  `output s1, P_FOO
   add s0, 01'
)

C-style syntax

An alternate C-style syntax is also available for for(), while(), and dowhile(). Note that the for() macro continues to use commas to separate the sections.

; For loops
for (s0 := 0, s0 < s1, s0 := s0 + 1) {
  output s0, P_FOO
}

; While loops
while (s0 < s1) {
  add s0, 01
  output s0, P_FOO
}

; Do-while loops
do {
  add s0, 01
  output s0, P_FOO
} while (s0 < s1)

Two macros, break and continue, are available to exit the current loop and restart a loop respectively. In a for loop the continue macro will execute the update field expression to prepare the next iteration.

; "continue" resumes execution here
while (s0 < s1) {
  add s0, 01
  if (s3 == 4) { continue }
  if (s2 == 5) { break }
  output s0, 00
}
; "break" resumes execution here

Procedures and Functions

A set of macros are available that can streamline the creation of procedures, functions, and interrupt service routines. All of these macros have a C-style block syntax which is the preferred way to invoke them.

proc

The most basic is the proc() macro which is a convenience routine creating a labeled code block with an included vars() macro for variable definitions, a final return instruction, and automatic “;PRAGMA” comments identifying it as a function.

proc addinc(s0 is count, s1 is inc) {
  add count, inc
}
...

call addinc

; Expands to:

        ;PRAGMA function addinc [s0 is count, s1 is inc] begin
addinc:
        ADD s0, s1
        RETURN
        ;PRAGMA function addinc end

CALL addinc

The “argument” list to proc is passed on to the vars() macro. It can include local variables used by the procedure. You are responsible for loading arguments into registers and cleaning up temporary registers.

func

The func() macro provides a more elaborate function generator that takes care of handling arguments by passing them on the stack. A dynamically generated macro is created for calling each defined function. func() takes a list of registers to pass as arguments as well as an optional number of bytes for values returned on the stack. Those registers are placed on the stack and then popped into local registers that are saved and restored after the function completes. The argument list is in the same “Sn is Y” syntax used by the vars() macro but you can also just list register names without providing an alias.

; func <funcname>(<vars>) : <optional return bytes> {}

func addinc(s0 is count, s1 is inc): 1 {
  add count, inc
  retvalue(count, 1) ; Save the return value on the stack
}
...

; Call function with s3 and s4 as args
addinc(s3, s4)
pop(s5)       ; Get the return value

; Expands to:
              ;PRAGMA function addinc [stack(s0 is count, s1 is inc : 1)] begin
      addinc:
              <Save registers and retrieve arguments from stack frame>

              ADD s0, s1
              <Put the result on the stack>

LEAVE_addinc:
              <Restore saved registers and remove stack frame>
              RETURN
              ;PRAGMA function addinc end

              ; Call function with s3 and s4 as args
              ; Push arguments:
              STORE s3, (sf)                 ; Push
              SUB sf, 01
              STORE s4, (sf)                 ; Push
              SUB sf, 01
              CALL addinc
              ADD sf, 01
              FETCH s5, (sf)                 ; Pop

After the function call the registers will be in the same state they were before the function call and any return values will be on the stack. Unlike with proc() the parameter list is only used to define arguments. You are responsible for preserving any registers used internally for local variables. The retvalue() macro takes a register for its first argument and the index of the return byte from the top of the stack starting from 1.

You cannot use a return instruction inside the code body of a func() macro because the stack cleanup code will not be executed. Instead you must call the leave_func() macro whenever you want to exit early. It will ensure the cleanup code is executed.

isr

A variant of the func() macro is available for defining ISRs. The isr() macro is similar to func() but you specify an address for the interrupt vector instead of a name and in place of the return byte count you specify whether the ISR returns with interrupts enabled or disabled. Interrupts are enabled by default if the last parameter is omitted.

; isr <address>(<vars>) : [enable | disable] {}

isr 0x3FF(s0) : enable {
  output s0, FF
}

; Expands to:

       __ISR:
              ADDRESS 3ff                    ; 0x3FF
              JUMP __ISR
              ADDRESS __ISR
              ;PRAGMA function __ISR begin
              <Save registers on stack>
              OUTPUT s0, FF

 LEAVE___ISR:
              <Restore registers from stack>

              RETURNI enable
              ;PRAGMA function __ISR end

ISRs take no arguments and the variable list only serves to identify which registers are used in the ISR so that they can be saved on the stack. There can only be one isr() macro call in a program. You can use leave_func() or the equivalent leave_isr() macro to exit early from an ISR. Do not call returni directly within the ISR code block as that will leave saved registers on the stack without cleaning up.

Delay generators

A set of delay generator macros are available to implement software delays. The simplest is delay_cycles() which delays by a number of instruction cycles (each being two clock cycles). By default it is implemented with recursive loops and requires no registers to function.

delay_cycles(40)   ; Delay for 40 instructions (80 clock periods)

This expands to the following recursive code implemented in 13 instructions:

                   CALL DTREE_f1_0001_4           ; Delay for 33 cycles
                   JUMP DTREE_f1_0001_end
  DTREE_f1_0001_4: CALL DTREE_f1_0001_3
  DTREE_f1_0001_3: CALL DTREE_f1_0001_2
  DTREE_f1_0001_2: CALL DTREE_f1_0001_1
  DTREE_f1_0001_1: CALL DTREE_f1_0001_0
  DTREE_f1_0001_0: RETURN
DTREE_f1_0001_end:
                   CALL DTREE_f1_0002_1           ; Delay for 5 cycles
                   JUMP DTREE_f1_0002_end
  DTREE_f1_0002_1: CALL DTREE_f1_0002_0
  DTREE_f1_0002_0: RETURN
DTREE_f1_0002_end:
                   LOAD sf, sf                    ; NOP
                   LOAD sf, sf                    ; NOP

The delay can be from 0 to approximately 100e9 but a practical limit would be to keep the delay less than 200 cycles to restrict the amount of generated code. You must ensure that there is enough space on the call stack to perform the recursive calls. In the example above the 33-cycle delay block extends five calls deep.
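
The cycle counts in the listing follow from the way each label falls through into the next level once its call returns, so every extra level roughly doubles the delay. The Python sketch below models that recurrence. The function names are hypothetical, and the greedy split in plan_delay() is an assumption chosen because it reproduces the 40-cycle example. It is not a description of the algorithm Opbasm actually uses.

```python
def tree_cycles(depth):
    """Instructions executed from the label at this depth to its RETURN."""
    if depth == 0:
        return 1                      # just the RETURN
    # One CALL into the next level, which then runs a second time
    # when execution falls through after the call returns.
    return 1 + 2 * tree_cycles(depth - 1)


def block_cycles(depth):
    """A whole block: the initial CALL, the tree, and the JUMP past it."""
    return 1 + tree_cycles(depth) + 1


def block_size(depth):
    """Program memory used: CALL, JUMP, one CALL per level, RETURN."""
    return depth + 3


def plan_delay(cycles):
    """Greedy split into tree blocks plus leftover NOPs (assumed strategy)."""
    depths = []
    while cycles >= block_cycles(0):
        depth = 0
        while block_cycles(depth + 1) <= cycles:
            depth += 1
        depths.append(depth)
        cycles -= block_cycles(depth)
    return depths, cycles


if __name__ == "__main__":
    depths, nops = plan_delay(40)
    print(depths, nops)
    print([block_cycles(d) for d in depths])
    print(sum(block_size(d) for d in depths) + nops)
```

For a 40-cycle delay this model yields blocks of 33 and 5 cycles plus two NOPs in 13 instructions, the same shape as the expansion above.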

An alternate implementation of delay_cycles() can be invoked by first configuring it with the use_delay_reg() macro. You call it with a single register to use for a delay counter. This register must be different than the ones used for the long period delay macros described next. With a delay register configured, the delay_cycles() macro will be implemented as a small loop for delays of 511 cycles or less. Longer delays will fall back to using recursive delay trees.

use_delay_reg(s6)
delay_cycles(40)

; Expands to:


                LOAD s6, 13                    ; (40 - 1) / 2
 DLOOP_f1_0001:
                SUB s6, 01
                JUMP nz, DLOOP_f1_0001
                LOAD se, se                    ; NOP
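
The loop count in the expansion can be derived by hand: the load costs one instruction, each pass through the loop costs two, and any odd remainder is made up with a NOP. The Python sketch below models that arithmetic. loop_delay() is a hypothetical helper, and the lower bound of 3 cycles is an assumption of this model rather than a documented limit.

```python
def loop_delay(cycles):
    """Split a delay into an 8-bit loop count and trailing NOPs.

    Total time is 1 (load) + 2 * count (subtract and jump) + nops.
    """
    if cycles < 3 or cycles > 511:
        raise ValueError("delay is outside the range of a single 8-bit loop")
    count = (cycles - 1) // 2
    nops = cycles - (1 + 2 * count)
    return count, nops


if __name__ == "__main__":
    count, nops = loop_delay(40)
    print(format(count, "02x"), nops)
    print(loop_delay(511))
```

The largest 8-bit count of 255 gives 1 + 2 * 255 = 511 cycles, which agrees with the stated limit for the loop implementation.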

Time delays

Delays by microseconds and milliseconds are implemented with the delay_us() and delay_ms() macros. Before using these you must establish the system clock frequency with the use_clock() macro. These delays are cycle accurate if the requested delay is an integer multiple of the clock period. They have the ability to adjust the delay down by a certain number of instructions if needed to account for function call or loop overhead.

use_clock(100)                     ; 100 MHz system clock
use_delay_reg(s6)                  ; Use compact internal delay loop

; 10 ms delay subroutine
delay_10ms: delay_ms(10, s4,s5, 2) ; Adjust delay by 2 instructions for call and return
            return

...
call delay_10ms
; Exactly 10 ms have passed here

...
delay_ms(10, s4, s5)               ; Inline delay by 10 ms
; Exactly 10 ms have passed here

The delay_*() macros take a delay value, a pair of registers and an optional instruction adjustment as arguments. The delay value is the amount of delay in the associated units. The upper delay limit depends on the clock frequency. It has a complex relationship that is approximated by the equation max_delay = 22.05e6 * clock_freq ^ -1.0016 . You will get a macro error if a delay is too large for the currently selected frequency. The following table shows the maximum delays for representative clock frequencies:

Clock frequency  Maximum delay
50 MHz           429 ms
75 MHz           286 ms
100 MHz          214 ms
125 MHz          171 ms
150 MHz          143 ms

The registers are used for an internal 16-bit counter. The internal delay loop is automatically adjusted to ensure the count value fits within 16 bits. When implementing a delay as a subroutine, an adjustment can be added to account for the call and return instructions.

Variable delays

If you need to use multiple delays it may be desirable to have a common delay routine that supports variable delay counts. This is provided by the var_delay_us() and var_delay_ms() macros. They are similar to the fixed delays but are not cycle accurate and have no provision for adjustment.

use_clock(50)            ; 50 MHz system clock

define(MAX_DELAY, 200)   ; Maximum 200 us delay

var_delay: var_delay_us(MAX_DELAY, s4,s5)
           return
...

load16(s4,s5, var_count_us(20, MAX_DELAY))  ; 20 us delay
call var_delay
...

load16(s4,s5, var_count_us(150, MAX_DELAY)) ; 150 us delay
call var_delay

The first argument to the var_delay_*() macros is the maximum delay value to support. When a delay is needed you must load the count registers with a constant computed with the var_count_*() macros.

String and table operations

PicoBlaze-3 doesn’t have the ability to handle strings as efficiently as PB6 because it lacks the load&return instruction but it is still necessary to work with them at times. Suppose that you have a subroutine “write_char” that writes characters in s0 out to a peripheral. You can write entire strings with the following:

callstring(write_char, s0, `My string') ; Note use of m4 quotes `' to enclose the string

This expands to the following:

load s0, "M"
call write_char
load s0, "y"
call write_char
load s0, " "
call write_char
...
load s0, "n"
call write_char
load s0, "g"
call write_char

Similarly you can call with arbitrary bytes in a table. The pbhex() macro is useful here to express hex numbers with less clutter.

calltable(write_char, s0,  pbhex(DE, AD, BE, EF))

There are four targets for string and table macros: “call”, “output”, “store”, and “inst”. They work similarly to the “call” macros above but generate output, store, or inst instructions in place of call.

callstring outputstring storestring storestringat
calltable outputtable storetable storetableat insttable_le insttable_be

The storestringat() and storetableat() macros take a register as a pointer to the destination scratchpad address. The pointer register is incremented after storing each byte except for the last.

constant M_DATA, 10
load s0, M_DATA
storestringat(s0, sF, `Store this') ; sF is used as a temp register

The insttable_le() and insttable_be() macros generate packed inst directives for use as static data. The former generates little-endian instructions while the latter is big-endian.

insttable_le(pbhex(0a, 0b, 0c))
; Expands to:  inst 00b0a
;              inst 0000c

insttable_be(pbhex(0a, 0b, 0c))
; Expands to:  inst 00a0b
;              inst 00c00

The insttable macros only accept a list of decimal values directly but the asciiord() macro can be used to convert strings to numeric data.

insttable_le(asciiord(`Pack strings into ROM'))
; Expands to:
  inst 06150
  inst 06b63
  inst 07320
  ...
  inst 0206f
  inst 04f52
  inst 0004d
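
The packing shown in these expansions places two bytes in each instruction word and pads a final odd byte with zero. The Python sketch below models that layout and reproduces the values above. pack_inst() is a hypothetical helper, not the macro implementation.

```python
def pack_inst(data, little_endian=True):
    """Pack bytes two per word and format each as an inst directive."""
    lines = []
    for i in range(0, len(data), 2):
        first = data[i]
        second = data[i + 1] if i + 1 < len(data) else 0   # pad odd length
        if little_endian:
            word = (second << 8) | first    # first byte in the low half
        else:
            word = (first << 8) | second    # first byte in the high half
        lines.append("inst {:05x}".format(word))
    return lines


if __name__ == "__main__":
    print(pack_inst([0x0A, 0x0B, 0x0C]))
    print(pack_inst([0x0A, 0x0B, 0x0C], little_endian=False))
    text = [ord(ch) for ch in "Pack strings into ROM"]
    for line in pack_inst(text):
        print(line)
```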

This permits the compact storage of data bytes in the PicoBlaze ROM. If synthesized as a dual-ported block RAM, the data can be retrieved with external logic. The picoblaze_dp_rom component included with picoblaze_rom.vhdl provides a second read/write port for this purpose.

Escaped strings

The native PicoBlaze syntax does not permit the use of character escapes in strings. The macros estr() and cstr() provide a means for generating escaped strings without and with a NUL terminator respectively. They generate a list of integers representing each character in the string. The following C-style backslash escape codes are supported:

Escape  Meaning
\\      Literal “\”
\n      Newline / Line Feed
\r      Carriage Return
\b      Backspace
\a      Bell
\e      Esc
\s      Literal semicolon

On PicoBlaze-6 you can apply the output of these macros directly in a table directive as follows:

table hello#, [dec2pbhex(cstr(`Hello\r\n'))]
; This expands to: table hello#, [48, 65, 6c, 6c, 6f, 0d, 0a, 00]

table hello2#, [dec2pbhex(estr(`Hello\r\n'))]
; This expands to: table hello2#, [48, 65, 6c, 6c, 6f, 0d, 0a]

For PicoBlaze-3 you can pass the output of estr() and cstr() to the calltable(), storetable(), and outputtable() macros or use the portable string macros described next.

If you need to know the length of a string constant you can use strlenc() to generate that value. It takes a single string argument that can contain escaped characters. It is passed through estr() to remove escapes before characters are counted. strlenc() only works at compile time when passed a string literal or a named portable/packed string. It does not work at runtime on dynamic string buffers.

load s0, strlenc(`foobar\r\n') ; Expands to 8

You can also pass the label to a string defined with string() or packed_string() to retrieve their length.

packed_string(my_string, `This is a string')
load s0, strlenc(my_string) ; Expands to 16

Note

m4 has a builtin macro len() that also returns the length of strings. However, it does not account for escape characters and will include backslashes in its count.
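To see why the two lengths differ, here is a rough Python model of the escape handling. It is an illustration only: the function names mirror the macros but the code is not taken from Opbasm, the first table entry is assumed to be the backslash escape, and the treatment of unrecognized escapes (left unchanged here) is an assumption.

```python
# Escape letters mapped to their ASCII codes.
ESCAPES = {
    "\\": 0x5C,  # backslash (assumed reading of the first table row)
    "n": 0x0A,   # line feed
    "r": 0x0D,   # carriage return
    "b": 0x08,   # backspace
    "a": 0x07,   # bell
    "e": 0x1B,   # escape
    "s": 0x3B,   # semicolon
}


def estr(text):
    """Return the character codes of text with backslash escapes resolved."""
    codes = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] in ESCAPES:
            codes.append(ESCAPES[text[i + 1]])
            i += 2
        else:
            codes.append(ord(text[i]))  # ordinary character (assumed fallback)
            i += 1
    return codes


def cstr(text):
    """Same as estr() with a trailing NUL."""
    return estr(text) + [0]


def strlenc(text):
    """Length after escapes are resolved."""
    return len(estr(text))


raw = r"foobar\r\n"
print(len(raw), strlenc(raw))  # raw length counts the backslashes
```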

Portable strings

A simplified system for generating efficient, portable strings is provided by the macro package. With this you can create string handling code that will expand into the most efficient form for PicoBlaze-3 or PicoBlaze-6, allowing you to easily migrate between platforms. You must first set up the portable string system with the use_strings() macro. It configures the registers and a character handling routine used when processing a string.

use_strings() takes the following arguments:

  • Arg1: Register loaded with each character
  • Arg2, Arg3: MSB, LSB of string address (Only used on PB6. Use dummy registers for PB3)
  • Arg4: Label of a user provided function called to process each character
  • Arg5: Optional name of the macro to define new strings (default is “string”)

After configuring string handling with use_strings() you must define each string using the string() macro. It takes two arguments. The first is a label to identify the string and the second is the string. You can use any of the escapes supported by estr() and cstr() in a string. Strings are reproduced by calling them with the label used in their definition. Unlike names used with the native string directive, these labels should not end with a “$”.

jump main
use_strings(s0, s5,s6, write_char)

proc write_char(s0) {
  output s0, 00
}

string(hello, `Hello world\r\n') ; Define a string called "hello"

main:
...
call hello ; Call write_char on each character in the "hello" string

This expands to the following when targeting PB6:

                  JUMP main
                  ; PB6 common string handler routine
__string_handler: CALL@ (s5, s6)                 ; Read next char
                  COMPARE s0, 00                 ; Check if NUL
                  RETURN z
                  CALL write_char                ; Handle the char
                  ADD s6, 01                     ; 1
                  ADDCY s5, 00                   ; Increment address
                  JUMP __string_handler

                  ;PRAGMA function write_char [s0] begin
      write_char:
                  OUTPUT s0, 00
                  RETURN
                  ;PRAGMA function write_char end

                  ; "Hello world\r\n"
                  TABLE hello#, [48, 65, 6c, 6c, 6f, 20, 77, 6f, 72, 6c, 64, 0d, 0a, 00]
           hello: LOAD s5, _hello_STR'upper
                  LOAD s6, _hello_STR'lower
                  JUMP __string_handler
      _hello_STR: LOAD&RETURN s0, hello#         ; Define a string called `"hello"'

            main:
                  ...
                  CALL hello                     ; Call write_char on each character in the "hello" string

Note that a common string processing routine __string_handler is generated after the jump main instruction and the escaped string is implemented with load&return instructions.

When targeting PB3 the following expansion results:

            JUMP main

            ;PRAGMA function write_char [s0] begin
write_char:
            OUTPUT s0, 00
            RETURN
            ;PRAGMA function write_char end

            ; "Hello world\r\n"
     hello: LOAD s0, 48
            CALL write_char
            LOAD s0, 65
            CALL write_char
            LOAD s0, 6c
            CALL write_char
            LOAD s0, 6c
            CALL write_char
            ...
            LOAD s0, 0d
            CALL write_char
            LOAD s0, 0a
            CALL write_char
            RETURN                         ; Define a string called `"hello"'

      main:
            ...
            CALL hello                     ; Call write_char on each character in the "hello" string

The PB3 version does not generate a common handler routine but instead generates code to handle each string in place using the calltable() macro.

You are limited to a single user provided function for processing each character in a string. If you need to perform different operations on strings then you will have to use a register or scratchpad value to select the desired behavior before calling the string label and write a handler routine that checks what operation is needed for each character it receives.

Packed strings

A set of macros for handling packed strings is available for use. These work similarly to the portable string macros but rely on character data packed with inst directives. This is the most efficient way to store uncompressed strings in PicoBlaze memory. Access to the data must be implemented with external hardware that can read instruction memory through a second port. The picoblaze_dp_rom component defined in picoblaze_rom.vhdl shows a way to accomplish that. The same code is generated for both PB3 and PB6.

To configure packed strings you need to call the use_packed_strings() macro. It is similar to use_strings() but you also need to provide a function that retrieves character pairs from an address in memory. Its arguments are the following:

  • Arg1: Register to store even characters (0, 2, 4, …)
  • Arg2: Register to store odd characters (1, 3, 5, …)
  • Arg3, Arg4: Registers for MSB, LSB of address to string
  • Arg5: Label of user provided function called to process each character (Only needs to handle the even char register)
  • Arg6: Label of user provided function called to read pairs of characters from memory
  • Arg7: Optional name of the macro to define new strings (default is “packed_string”)

Character pairs are stored in big-endian order. The first character in a string is stored in the upper byte of an inst directive. The read routine takes a set of registers for the address of a packed character pair. It must retrieve the INST data at that location and load the upper byte into the even character register and lower byte in the odd character register.

A common handler routine __packed_string_handler is generated so you must ensure the execution path bypasses the generated code.

After configuration you define strings with the packed_string() macro just as with the string() macro.

jump main
mem16(P_ROM, 0x0b,0x0a)            ; Define 16-bit port addresses for dual-ported ROM
use_packed_strings(s0,s1, s5,s6, write_char, read_next_chars)

proc write_char(s0) {
  output s0, 00                    ; Using register for even chars
}

proc read_next_chars(s0,s1, s5,s6) {
            output16(s5,s6, P_ROM) ; Select next address from second port
            nop
            input16(s0,s1, P_ROM)  ; Read back upper and lower byte
}

packed_string(hello, `Hello world\r\n') ; Define a packed string called "hello"

main:
...
call hello ; Call write_char on each character in the "hello" string

This expands to the following on both target processors:

            <Handler routines>

            ; "Hello world\r\n"
     hello: LOAD s5, _hello_STR'upper
            LOAD s6, _hello_STR'lower
            JUMP __packed_string_handler
_hello_STR: INST 04865
            INST 06c6c
            INST 06f20
            INST 0776f
            INST 0726c
            INST 0640d
            INST 00a00

            ; Define a packed string called `"hello"'

      main:

            CALL hello

You can see that the 13 byte string is stored into 7 instruction words providing the densest string storage possible without resorting to compression.
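The packing arithmetic can be checked with a small Python model. This is a sketch rather than Opbasm code: pack_string and read_packed are hypothetical names, and both the zero padding of the final word and the stop-at-NUL behavior of the reader are assumptions based on the expansion above.

```python
def pack_string(text_bytes):
    """Pack bytes plus a NUL terminator into big-endian 16-bit words."""
    data = list(text_bytes) + [0]
    if len(data) % 2:
        data.append(0)  # pad the final word (assumed)
    return [(data[i] << 8) | data[i + 1] for i in range(0, len(data), 2)]


def read_packed(words):
    """Walk the words as a pair-reading handler might, stopping at NUL."""
    out = []
    for word in words:
        for ch in (word >> 8, word & 0xFF):  # even character first, then odd
            if ch == 0:
                return bytes(out)
            out.append(ch)
    return bytes(out)


words = pack_string(b"Hello world\r\n")
print(len(words), [f"{w:05x}" for w in words])
```

The upper byte of each word corresponds to the even character register and the lower byte to the odd character register described earlier.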

If you have existing code using the portable string macros, you can convert it to use packed strings by changing the macro name with the optional seventh argument:

use_packed_strings(s0,s1, s5,s6, write_char, read_next_chars, string)

Multi-function strings

Most of the previous string handling routines are hard-coded to use a single callback routine like write_char to process characters. This function does not need to be limited to just outputting data on a port. It also does not need to be limited to a single operation. You can use a register or scratchpad location to alter its behavior for different needs.

constant M_CHAR_MODE, 00
constant P_CONSOLE, FF

constant CHAR_OUT, 01
constant CHAR_COPY, 02


use_strings(s0, s5,s6, handle_char)

proc handle_char(`s0 is ch', `sA is ptr') {
  fetch _tempreg, M_CHAR_MODE
  if(_tempreg == CHAR_COPY) {
    ; Store in a scratchpad buffer
    store ch, (ptr)
    add ptr, 01
  } else { ; CHAR_OUT
    ; Write to console
    output ch, P_CONSOLE
  }
}

string(hello, `Hello again\n')

...

; Write string to a port
load_store(CHAR_OUT, M_CHAR_MODE)
call hello

; Copy string to a scratchpad buffer
load_store(CHAR_COPY, M_CHAR_MODE)
load sA, 10  ; Start address
call hello
load_store(NUL, sA) ; Write NUL to end of string buffer

Scratchpad memory operations

A set of routines is available for manipulating arrays in scratchpad memory. They are accessed by invoking a use_XXX() generator macro to create the functions with register allocations of your choice. All of these macros take an initial argument that is the name of the generated function. They all preserve their input and temporary registers on the stack unless reused for a return value.

memset

The use_memset() macro creates a function that can set an array to a fixed value.

;                 <dest> <len> <init value>
use_memset(memset, s0,     s1,     s2)
...

load s0, 20  ; Destination at 0x20 in scratchpad
load s1, 05  ; 5 bytes in the array
load s2, "A" ; Value to initialize with
call memset

After the call every byte of the array will be initialized to the contents of the value register.

memcopy

use_memcopy() creates a function to copy an array from one location to another in scratchpad.

;                  <source> <dest> <len>
use_memcopy(memcopy, s0,      s1,   s2)
...

load s0, 20 ; Source at 0x20
load s1, 10 ; Destination at 0x10
load s2, 05 ; Copy 5 bytes
call memcopy

After the call the bytes from 0x10 to 0x14 contain the data copied from 0x20 to 0x24.

memwrite

The use_memwrite() macro scans an array in scratchpad and writes the raw bytes to a fixed output port.

constant ConsolePort, FE
;                    <source> <len> <output port>
use_memwrite(memwrite, s0,      s1,   ConsolePort)

load s0, 20 ; Source array
load s1, 05 ; Writing 5 bytes
call memwrite

This performs an output to port 0xFE for each of the bytes from 0x20 to 0x24.

hexwrite

Similar to use_memwrite() is the use_hexwrite() macro. It writes an array of bytes converted to ASCII hex values. This macro destructively modifies the global _tempreg register.

;                    <source> <len> <output port>
use_hexwrite(hexwrite, s0,      s1,  ConsolePort)
...

load_store(0x5A, 0x20)
load_store(0x11, 0x21)
load_store(0x42, 0x22)

load s0, 20 ; Source array
load s1, 03 ; Writing 3 bytes
call hexwrite

This writes the string “5A1142” to the output port. Every byte expands into two hex digits.

bcdwrite

Another similar output routine is the use_bcdwrite() macro. It writes an array to an output port but treats the bytes as unpacked BCD digits. Each digit is converted to an ASCII digit before writing to the port. Any leading 0 digits are skipped. Invalid BCD digits are not detected.

;                    <source> <len> <output port>
use_bcdwrite(bcdwrite, s0,      s1 , ConsolePort)
...

load_store(0x00, 0x20)
load_store(0x01, 0x21)
load_store(0x05, 0x22)

load s0, 20 ; Source array
load s1, 03 ; Writing 3 bytes
call bcdwrite

This converts the array to ASCII characters and sends “15” to the output port. This is useful for printing the output from int2bcd described below.

BCD conversion

A pair of generator macros create functions for converting between unsigned integers and unpacked BCD. They are designed to work with arbitrary sized integers consisting of one or more bytes. The use_int2bcd() macro takes a list of integer bytes on the stack and writes the BCD representation into a fixed size buffer.

;             <fixed array len> <dest> <integer bytes> <temp regs>
use_int2bcd(int2bcd, 5,           s0,       s1,        s2,s3,s4,s5)
...

load s0, 20  ; Use buffer from 0x20 to 0x24
load s1, 02  ; Convert 16-bit integer (2 bytes)
load16(s4,s3, 30789)
push(s3, s4) ; Place integer on stack low byte first, high byte last (on top)
call int2bcd

After conversion the array at scratchpad 0x20 contains the hex values [03 00 07 08 09]. This result can then be processed by bcdwrite to write an integer value out to a port. The result is right justified in the array with leading 0’s for any unused digits. No error detection is performed if the result requires more digits than the generator macro was defined to use.

load16(s4,s3, 512)
push(s3, s4)
call int2bcd

The result is [00 00 05 01 02] at 0x20.
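The right-justified layout is easy to reproduce on a host machine. The Python sketch below models the result only, not the PicoBlaze algorithm; the helper names are hypothetical, and what the real routine leaves in the buffer on overflow is not modeled.

```python
def int2bcd(value, ndigits):
    """Return value as a right-justified list of unpacked BCD digits."""
    digits = [0] * ndigits
    for pos in range(ndigits - 1, -1, -1):
        value, digits[pos] = divmod(value, 10)
    # Digits that do not fit are silently dropped in this sketch.
    return digits


def bcd_text(digits):
    """Model of the bcdwrite output: ASCII digits with leading zeros skipped."""
    # An all-zero array is not covered by the text and yields "" here.
    return "".join(str(d) for d in digits).lstrip("0")


print(int2bcd(30789, 5), bcd_text(int2bcd(30789, 5)))
print(int2bcd(512, 5), bcd_text(int2bcd(512, 5)))
```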

For converting numeric string inputs to binary, a pair of generator macros can be used. First is use_ascii2bcd() which will convert a numeric ASCII string into BCD format.

;                    <Array addr> <len>
use_ascii2bcd(ascii2bcd, s0,        s1)

load_store("X", 0x20) ; Simulate text input
load_store("1", 0x21)
load_store("2", 0x22)
load_store("4", 0x23)
load_store("9", 0x24)

load s0, 20 ; Use array at 0x20
load s1, 05 ; Convert 5 characters from 0x20 to 0x24
call ascii2bcd

The resulting array contains BCD: [00 01 02 04 09]. Any non-digit characters in the string are converted to 0.

The use_bcd2int() macro is used to convert from BCD to an integer. This finishes the conversion of numeric string input into a usable integer value after first converting ASCII to BCD using ascii2bcd.

;                <Array addr> <len> <temp regs>
use_bcd2int(bcd2int, s0,       s1,   s2,s3,s4,s5,s6)

load s0, 20 ; Use array at 0x20
load s1, 05 ; Convert 5 digits from 0x20 to 0x24
call bcd2int

The converted integer value is overwritten into the array from left to right, destroying some of the BCD digits. The first byte in the array is the least significant. The total number of converted binary integer bytes is returned in the length register (s1 in this case). After conversion the array contains [E1 04 02 04 09]. 0x04E1 is 1249 from the original ASCII string. The largest BCD number that fits in an array of n digits (999…) always fits in n bytes, so the binary result can never overflow the array.
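The two-step conversion can be modeled in Python as follows. This is a sketch of the described results, not the generated routines; returning the minimal byte count from bcd2int is an assumption that happens to match the example.

```python
def ascii2bcd(chars):
    """Convert ASCII codes to BCD digits; non-digit characters become 0."""
    return [c - 0x30 if 0x30 <= c <= 0x39 else 0 for c in chars]


def bcd2int(array):
    """Overwrite the front of array with the little-endian binary value.

    Returns the number of bytes written.
    """
    value = 0
    for digit in array:
        value = value * 10 + digit
    nbytes = max(1, (value.bit_length() + 7) // 8)  # assumed: minimal count
    for i in range(nbytes):
        array[i] = (value >> (8 * i)) & 0xFF
    return nbytes


buf = ascii2bcd(b"X1249")
count = bcd2int(buf)
print(count, [f"{b:02X}" for b in buf])
```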

8-bit arithmetic

The not() and negate() macros are available to perform logical inversion and 2’s complement negation on 8-bit registers. The abs() macro produces the absolute value of signed registers.

You can perform signed comparison with the compares() macro. It takes the same arguments as the native compare instruction. The C flag is set in accordance with their signed relationship. However, the Z flag is not set correctly. Use the compare instruction to test for equality or inequality of signed values.

If you need to convert an 8-bit signed value to 16-bit, use the signex(MSB, LSB) macro to extend the sign bit onto the upper register. The 8-bit register to be extended is passed in as the LSB argument.
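For reference, the effect of sign extension and signed comparison on raw byte values can be modeled in Python. The sign-bit flip shown for the comparison is one common technique; whether compares() is implemented this way is an assumption, and the function names are hypothetical.

```python
def signex(lsb):
    """Return (msb, lsb) for an 8-bit two's complement value extended to 16 bits."""
    msb = 0xFF if lsb & 0x80 else 0x00
    return msb, lsb


def signed_less_than(a, b):
    """Signed a < b on raw 8-bit values."""
    # Flipping bit 7 maps -128..127 onto 0..255 in the same order, so an
    # unsigned comparison of the flipped values gives the signed answer.
    return (a ^ 0x80) < (b ^ 0x80)


print(signex(0x80), signex(0x7F))
print(signed_less_than(0xFF, 0x01))  # -1 < 1
```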

16-bit arithmetic

The need will frequently arise to handle values larger than the capacity of an 8-bit register. The following macros provide quick access to 16-bit operations.

You can define aliases for pairs of 8-bit registers with reg16() and then pass them into the 16-bit arithmetic macros:

reg16(rx, s4, s3)      ; Virtual 16-bit register rx is composed of (s4, s3)
reg16(ry, s6, s5)

load16(rx, 1000)
load16(ry, 3000 + 500) ; You can use arbitrary expressions for constants
add16(rx, ry)          ; rx = rx + ry
add16(rx, -100)        ; rx = rx + (-100)

This is far clearer than manually calculating 16-bit constants and repeatedly implementing the operations in pieces. The virtual name always expands into its original two registers and can be used on any macro that takes an MSB,LSB register pair as an argument.
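The work these macros save is the byte-wise carry handling. The Python model below mirrors the example with (MSB, LSB) tuples standing in for register pairs; it illustrates the arithmetic only and does not show the instructions Opbasm emits.

```python
def load16(value):
    """Split a constant into (msb, lsb) bytes, wrapping to 16 bits."""
    value &= 0xFFFF
    return value >> 8, value & 0xFF


def add16(a, b):
    """Add two (msb, lsb) pairs: low bytes first, then high bytes plus carry."""
    low = a[1] + b[1]
    carry = low >> 8
    high = (a[0] + b[0] + carry) & 0xFF
    return high, low & 0xFF


rx = load16(1000)
ry = load16(3000 + 500)
rx = add16(rx, ry)            # rx = 4500
rx = add16(rx, load16(-100))  # adding 0xFF9C subtracts 100
print(f"{rx[0]:02X}{rx[1]:02X}")
```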

You can retrieve the upper and lower byte registers indirectly from a virtual name with the regupper() and reglower() macros. This makes it easy to reallocate the registers if needed.

load s0, reglower(rx) ; s0 = s3
load s1, regupper(rx) ; s1 = s4

The mem16() macro defines 16-bit constants for scratchpad and port addresses. Like reg16() it creates a new m4 macro that lets you refer to the pair of port addresses together. In addition, two constants are created with the same name suffixed with “_H” and “_L” to identify the high and low ports respectively.

mem16(M_DATA, 0x05, 0x04)
load16(rx, 1000)
store16(rx, M_DATA)

The following 16-bit functions are available. All other than not16(), negate16(), and abs16() take a constant or a 16-bit register as their second argument.

load16() reg16() mem16() add16()
sub16() and16() or16() xor16()
test16() not16() negate16() abs16()

The test16() macro is implemented differently on PicoBlaze-3 due to the lack of the testcy instruction. The Z flag is set when the AND of both bytes with the test word is zero but the C flag does not represent the XOR of all 16 bits.

A full suite of 16-bit shifts and rotates is also available. They work the same as their 8-bit equivalents.

sl0_16() sl1_16() sla_16() slx_16()
sr0_16() sr1_16() sra_16() srx_16()
rl16() rr16()

sl0_16(rx, 4) ; Multiply by 2**4

16-bit IO

16-bit versions of the port and scratchpad I/O operations are available. You can use the mem16() macro to define pairs of memory and port addresses for simplification. The variants using a pointer register increment by two so that successive calls can be made to work on contiguous ranges of addresses.

fetch16() store16() input16() output16()

mem16(M_ACCUM, 0x1b, 0x1a)
reg16(rx, s4, s3)

fetch16(rx, M_ACCUM)  ; Fetch direct from address

load s0, M_ACCUM_L    ; Low byte constant defined by mem16()
fetch16(rx, s0)       ; Fetch from indirect pointer
fetch16(rx, s0)       ; Fetch next word

Similarly for port I/O.

mem16(P_ACCUM, 0x1b, 0x1a)

input16(rx, P_ACCUM)  ; Input direct from address

load s0, P_ACCUM_L
input16(rx, s0)       ; Input from indirect pointer
input16(rx, s0)       ; Input next word

Multiply and divide

The general purpose PicoBlaze 8x8 multiply and divide routines are made available with arbitrary register allocations to suit your needs. A set of constant multiply and divide routines can also be generated for faster results than the general purpose functions. The following macros are available:

use_multiply8x8()        8x8-bit unsigned
use_multiply8x8s()       8x8-bit signed
use_multiply8x8su()      8-bit signed x 8-bit unsigned
use_divide8x8()          8/8-bit unsigned
use_divide8x8s()         8/8-bit signed
use_divide16x8()         16/8-bit unsigned
use_divide16x8s()        16/8-bit signed
use_multiply8xk()        8-bit x constant
use_multiply8xk_small()  8-bit x constant (result less than 256)
use_divide8xk()          8-bit / constant

init:
  ...
  jump main ; Skip over our functions

  ; Configure multiply and divide functions (sE is a temp register)
  reg16(rx, s5, s4)
  use_multiply8x8(mul8, s0, s1, rx)     ; rx = s0 * s1

  use_divide8x8(div8, s0, s1, s6, s7)   ; s6 = s0 / s1  rem. s7

  use_multiply8xk(mul8k7, s0, 7, rx)        ; rx = s0 * 7 (Multiplier can be greater than 255)

  use_multiply8xk_small(mul8k7s, s0, 7, s1) ; s1 = s0 * 7 (Result must fit in one byte)

  use_divide8xk(div8k, s0, 7, s1)       ; s1 = s0 / 7 (No remainder)

main:

  load s0, 20'd
  load s1, 3'd
  call mul8    ; rx = 20 * 3

  call div8    ; s6 = 20 / 3

  call mul8k7  ; rx = 20 * 7

  call mul8k7s ; s1 = 20 * 7

  call div8k   ; s1 = 20 / 7
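As background, unsigned multiply and divide on a processor without hardware support are usually built from shifts, adds, and subtracts. The Python sketch below shows the classic shift-and-add and restoring-division loops; it is a generic illustration, and the routines generated by Opbasm may be organized differently.

```python
def multiply8x8(a, b):
    """Unsigned 8x8 multiply by shift and add; returns a 16-bit product."""
    product = 0
    for bit in range(8):
        if b & (1 << bit):
            product += a << bit  # add the shifted multiplicand
    return product & 0xFFFF


def divide8x8(dividend, divisor):
    """Unsigned 8/8 restoring division; returns (quotient, remainder)."""
    quotient = 0
    remainder = 0
    for bit in range(7, -1, -1):
        # Bring down the next dividend bit, most significant first.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(multiply8x8(20, 3), divide8x8(20, 3))
print(multiply8x8(20, 7), divide8x8(20, 7))
```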

Expressions

A family of expression evaluator macros is provided that can implement arithmetic and other operations using pseudo-infix notation. The basic principle is borrowed from the PL360 high level assembler. You can write an assignment expression of the form expr(<target register> := <val> op <val> [op <val>]*). Spaces are required between all symbols.

val is one of:

register
literal expression (with no internal spaces)
“sp[<addr>]” reverse assignment to scratchpad address
“spi[<reg>]” reverse assignment to indirect scratchpad address in register

op is one of:

+, -, *, / arithmetic: add, subtract, multiply, divide
&, |, ^ bitwise operations: and, or, xor
<<, >> shifts: left and right
=: reverse assignment

Operations are evaluated from left to right with no precedence. The target register is used as the left operand of all operations. It is updated with the result after each operation.

expr(s0 := s1 + s2 =: s3 >> 2)

Arithmetic is performed on s0 at each stage. The reverse assignment to s3 captures the intermediate result of s1 + s2 and then continues with the right shift applied to s0. This expands to:

; Expression: s0 := s1 + s2 =: s3 >> 2
LOAD s0, s1
ADD s0, s2
LOAD s3, s0
SR0 s0
SR0 s0

If you want to use the existing value of a register use it as the first operand after the assignment:

load s0, 03
expr(s0 := s0 + 100)
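The strict left-to-right rule is the part most likely to surprise. A small Python model of the unsigned 8-bit expr() semantics makes it concrete. This is an illustration only: registers are modeled as a dictionary, literals are limited to decimal or 0x-prefixed values, and scratchpad operands are left out.

```python
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a // b,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
    "<<": lambda a, b: a << b,
    ">>": lambda a, b: a >> b,
}


def expr(text, regs):
    """Evaluate 'target := val op val ...' left to right with 8-bit results."""
    tokens = text.split()  # spaces are required between all symbols
    target = tokens[0]
    assert tokens[1] == ":="

    def value(tok):
        return regs[tok] if tok in regs else int(tok, 0)

    # The target register holds the running result after every operation.
    regs[target] = value(tokens[2]) & 0xFF
    for op, operand in zip(tokens[3::2], tokens[4::2]):
        if op == "=:":
            regs[operand] = regs[target]  # capture the intermediate result
        else:
            regs[target] = OPS[op](regs[target], value(operand)) & 0xFF
    return regs


regs = {"s0": 0, "s1": 10, "s2": 7, "s3": 0}
expr("s0 := s1 + s2 =: s3 >> 2", regs)
print(regs)
```

With no precedence, an expression such as s0 := 2 + 3 * 4 evaluates as (2 + 3) * 4 in this model.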

Here are all of the expression macros available:

Macro      Target x Operand  Supported operators               Notes
expr()     8x8               +, -, *, /, &, |, ^, <<, >>, =:
exprs()    8x8               +, -, *, /, &, |, ^, <<, >>, =:   signed *, /, and >>
expr2()    16x8 [1]          +, -, *, /, <<, >>, =:
expr2s()   16x8 [1]          +, -, *, /, <<, >>, =:            signed for all except <<
expr16()   16x16             +, -, &, |, ^, <<, >>, =:
expr16s()  16x16             +, -, &, |, ^, <<, >>, =:         signed >>

[1] The expr2 macros support 16-bit literals as operands of + and -. The first register after the assignment can be 16 bits.

16-bit registers must be comma separated register pairs in MSB,LSB order or named 16-bit registers created with reg16().

For multiplication and division support you must initialize the internal functions with one of the following:

Macro   Multiply                              Divide
expr    use_expr_mul()                        use_expr_div()
exprs   use_expr_muls()                       use_expr_divs()
expr2   use_expr_mul()                        use_expr_div16()
expr2s  use_expr_muls() and use_expr_mulsu()  use_expr_div16s()

As an expedient you can invoke use_expr_all to include all of them and then eliminate any unused mul or div routines with the --remove-dead-code option to Opbasm.

These macros need to be called before any call to expr*() that uses multiplication or division. It is best to place them at the start of the program and jump over them to reach the startup code. The stack must be configured (use_stack()) before calling these macros because additional modified registers must be saved and restored.

By default these macros configure the mul and div functions to use either s8 and s9 or s7, s8, and s9 for input and output. You can modify the register allocation by passing arguments to the use_* macros. The registers sA, sB, and sometimes sC are temporarily altered and restored. The common temp register (default sE) is destructively modified. You can change the tempreg with the use_tempreg() macro. The MSB of multiplication is ignored by subsequent operations. Division by 0 is not detected.

An example of signed expressions applied to converting temperatures:

use_stack(sF, 0x3F)
jump start

use_expr_all ; Invoke all of the mul and div routines

; Setup register aliases
reg16(rx, s0,s1)
reg16(ry, s2,s3)
vars(s4 is celsius, s5 is fahrenheit)

; Convert temperature
c_to_f:
  load reglower(rx), celsius     ; Load 8-bit Celsius temperature
  signex(rx)                     ; Sign extend to 16-bits
  expr2s(rx := rx * 9 / 5 + 32)  ; Perform 16x8-bit signed arithmetic to get Fahrenheit
  return

c_to_f_fast: ; Saves approx. 130 instructions compared to c_to_f with multiply
  load reglower(ry), celsius     ; Load 8-bit Celsius temperature
  signex(ry)                     ; Sign extend to 16-bits
  expr16s(rx := ry << 3 + ry)    ; Multiply by 9 with shift and add
  expr2s(rx := rx / 5 + 32)      ; Perform 16x8-bit signed arithmetic to get Fahrenheit
  return

f_to_c:
  load reglower(rx), fahrenheit  ; Load 8-bit Fahrenheit temperature
  signex(rx)                     ; Sign extend to 16-bits
  expr2s(rx := rx - 32 * 5 / 9 ) ; Perform 16x8-bit signed arithmetic to get Celsius
  return

start:
  ...

Random numbers

A pair of simple pseudo-random number generators are included in the macro package. They are implemented using the xorshift algorithm with coefficients selected for minimal code on PicoBlaze. They generate a full cycle of every value in their range except 0. use_random8() generates 8-bit numbers and use_random16() generates 16-bit. You must set a non-zero seed value to initialize the PRNGs.

namereg sA, SEED
use_random8(random, SEED)
...
load SEED, 5A    ; You should use an entropy source to set the initial seed
call random
...
call random

The new random value is in the SEED register after each call to random.

The 16-bit PRNG is similar but you must provide two additional registers for temporary values. Their contents are not preserved across calls.

namereg sA, SEEDH
namereg sB, SEEDL
reg16(SEED, SEEDH,SEEDL)
use_random16(random, SEED, sC,sD)
...
load16(SEED, 0x1234)    ; You should use an entropy source to set the initial seed
call random
...
call random
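The full-cycle property can be explored on a host machine. The Python sketch below models a generic 8-bit left/right/left xorshift step and searches for shift amounts whose cycle covers all 255 non-zero values. The coefficients and shift order actually used by use_random8() are not stated here, so both are assumptions.

```python
def xorshift8(x, a, b, c):
    """One step of an 8-bit left/right/left xorshift generator."""
    x ^= (x << a) & 0xFF
    x ^= x >> b
    x ^= (x << c) & 0xFF
    return x


def cycle_length(seed, a, b, c):
    """Count steps until the generator returns to its seed."""
    # Each xorshift step is invertible, so the sequence always loops back.
    x = xorshift8(seed, a, b, c)
    steps = 1
    while x != seed:
        x = xorshift8(x, a, b, c)
        steps += 1
    return steps


# Shift triples that visit every non-zero 8-bit value starting from 1.
full_cycle = [(a, b, c)
              for a in range(1, 8)
              for b in range(1, 8)
              for c in range(1, 8)
              if cycle_length(1, a, b, c) == 255]

print(len(full_cycle), "full-cycle triples found")
print("zero seed cycle length:", cycle_length(0, 1, 1, 1))
```

A zero seed maps to zero on every step, which is why the seed must be non-zero.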

If you don’t want to dedicate a register to storing the seed you can create a wrapper that fetches from scratchpad:

constant M_SEED, 00  ; Address to store seed variable
use_random8(random_core, s0)

proc random(s0) {
  fetch s0, M_SEED
  call random_core
  store s0, M_SEED
}

load_store(0x5A, M_SEED)    ; You should use an entropy source to set the initial seed
...
call random

Miscellaneous

A few miscellaneous utility macros are included:

Macro         Description                   Example
nop           No-operation
clearcy()     Clear the carry flag
setcy()       Set the carry flag            setcy or setcy(<tmpreg>)
isnum()       Test if a string is a number
load_out()    Load and output value         load_out(0x01, P_uart)
load_store()  Load and store value          load_store(0x01, M_var)
reverse()     Reverse arguments             reverse(1,2,3)
swap()        Swap registers                swap(s0, s1)
randlabel()   Random label name             randlabel(PREFIX_)
uniqlabel()   Unique label name             uniqlabel(PREFIX_)

Manually running m4

Some users may be unable to use Opbasm due to formal release procedures requiring a “golden” assembler. The m4 macro package can still be used with other PicoBlaze assemblers by manually running code through m4:

> m4 picoblaze.m4 [input source] > expanded_macros.gen.psm

The picoblaze.m4 file is located in the opbasm_lib directory of the source distribution.