Package monq.jfa

Class Dfa<T>

java.lang.Object
monq.jfa.Dfa<T>

public class Dfa<T> extends Object
A deterministic finite automaton, with convenience methods to match text and/or lookup values stored for a matching text. A Dfa can be created with NfaBuilder.compileToDfa().

For matching the automaton at the start of an input character source use the high level methods matchesAtStart(monq.jfa.CharSource), getAtStart(monq.jfa.CharSource), matchLength(monq.jfa.CharSource), match(monq.jfa.CharSource, java.lang.StringBuilder) and matchWithPartials(monq.jfa.CharSource, monq.jfa.RecordingMatcher<T>). All matching methods remove the matching characters from the input.

The low level access to traversing the automaton one character one automaton state at a time is transition(monq.jfa.CharSource) together with getters providing information about the traversal. These are marked with (*).

Technical Note

In contrast to an Nfa, the compilation step makes sure that there are no epsilon transitions in the state graph starting at the start state — with one exception: every stop state (state with a value) has an epsilon transition to a single last state which is not relevant for matching, but allows to use the automaton again in an NfaBuilder by pushing it with NfaBuilder.pushCopy(...).

  • Method Details

    • withIsStopValue

      public Dfa<T> withIsStopValue(Predicate<T> isStop)
      The default behavior for traversing the automaton is to treat states with a non-null value as stop states. However when using matchWithPartials, states may have values only as a marker to be recorded by that method. In this case a new predicate to exclude markers as stop states can be configured here.
      Parameters:
      isStop - to decide whether a state's value signals a stop state. The predicate will never be called with null, as a state with a null is always treated as not being a stop state.
      Returns:
      a new automaton with the given predicate.
    • getIsStopValue

      public Predicate<T> getIsStopValue()
    • dotPrint

      public void dotPrint(String filename)
    • getCh

      public int getCh()
      (*)Get latest character or control code after a call to transition(monq.jfa.CharSource).
      Returns:
      -2 if we are positioned on the start state, -1 if we hit EOF on the source, otherwise the character read by transition().
    • getCurrentValue

      public T getCurrentValue()
      (*)Provides the value stored in the current state as maintained by transition(CharSource).
      Returns:
      the value or null if the current state has no value
    • atStop

      public boolean atStop()
      (*)Returns the result of the isStop predicate provided with withIsStopValue(java.util.function.Predicate<T>) applied to getCurrentValue().
    • getOverread

      public int getOverread()
      (*)How many characters of the input we over-read by transition(monq.jfa.CharSource). false.
      Returns:
      the excess number of true returned by transition(monq.jfa.CharSource) before noticing that we cannot reach a stop state any more. At this point, the input source is already re-wound by excess characters read.
    • toDot

      public void toDot(PrintStream out)
    • matchesEmpty

      public boolean matchesEmpty()
      Returns the result of applying isStop as provided with withIsStopValue(Predicate) to the start state.
      Returns:
      true if the start state is a stop state of the automaton
    • matchesAtStart

      public boolean matchesAtStart(CharSource in) throws IOException
      Returns true, if the this automaton matches at the start of the character source.
      Throws:
      IOException
    • matchesAtStart

      public boolean matchesAtStart(String in) throws IOException
      Calls matchesAtStart(CharSource) with the wrapped string.
      Throws:
      IOException
    • getAtStart

      public T getAtStart(CharSource in) throws IOException
      If this automaton matches at the start of the input, the value stored in the stop state reached by the match is returned, otherwise null.
      Throws:
      IOException
    • getAtStart

      public T getAtStart(String in) throws IOException
      Calls getAtStart(CharSource) with the wrapped string.
      Throws:
      IOException
    • matchLength

      public int matchLength(CharSource in) throws IOException
      If this automaton matches at the start of the input, the number of matching characters is returned, otherwise -1. Reminder: 0 is the result for matching the empty string.
      Throws:
      IOException
    • matchLength

      public int matchLength(String in) throws IOException
      Calls matchLength(CharSource) with the wrapped string.
      Throws:
      IOException
    • match

      public T match(CharSource in, StringBuilder out) throws IOException
      Copies the longest matching prefix of the input to the output.

      The length of the output may not change for two reasons:

      1. no match was found or
      2. a match for the empty string was found.
      The difference can be told from the return value, which is the value associated with a match or null.

      Returns:
      value associated with the match, or null, if no match can be found.
      Throws:
      IOException - only if in.read() throws.
    • match

      public T match(String in, StringBuilder out) throws IOException
      Calls match(CharSource, StringBuilder) with the wrapped string.
      Throws:
      IOException
    • matchWithPartials

      public T matchWithPartials(CharSource in, RecordingMatcher<T> rm) throws IOException
      Convenience method to call RecordingMatcher.match(Dfa, String).
      Throws:
      IOException
    • matchWithPartials

      public T matchWithPartials(String text, RecordingMatcher<T> rm) throws IOException
      Calls the other one with the input wrapped into a CharSource.
      Throws:
      IOException
    • transition

      public boolean transition(CharSource in) throws IOException

      Reads one character from the input source and performs a transition to the next state of the automaton if possible. In this case true is returned and the following conditions hold:

      If this methods returns false, the following conditions hold:

      • The current automaton state is set to the start state, such that the next call to this method will start over with traversal. Further, all characters read since we last left a stop state are pushed back into the input.
      • getCh() returns -1 if we hit EOF or -2 if no transition was possible with the character read.
      • getCurrentValue() returns the value of the start state (see above), may be null.
      • getOverread() provides the number N of true returns after having left the last seen stop state. This allows the caller to discard information recorded for the last N transitions, as they did not lead to a further stop state.

      Hint: This method allows to implement additional matching methods similar to match(monq.jfa.CharSource, java.lang.StringBuilder) or getAtStart(monq.jfa.CharSource) which call this method in a loop for the same character source until false is returned. Changing the character source from one call to the next does not make much sense, except false was returned or reset() was called in between.

      Parameters:
      in - to read characters from
      Returns:
      true if a character could be read and allowed a state transition.
      Throws:
      IOException - if the source throws it
    • reset

      public void reset()

      Resets state traversal as follows:

      This does not need to be called, except in one circumstance: to pre-empt a traversal with transition(monq.jfa.CharSource), i.e. before it returns false. This may leave the input source used with transition in a state where only a partial match was read.