Validating FIX Messages
2024/08/06
Patrick Delaney
A Bit About FIX
In finance, the Financial Information eXchange (FIX) protocol is used to facilitate communication between two market participants. Most users encounter FIX through an Order Management System (OMS), which typically acts as an interface to a FIX engine.
At its core, the FIX protocol is a set of structured messages that facilitate various types of transactions, such as orders, executions, and status updates about financial instruments.
// an example FIX message pulled from Wikipedia
8=FIX.4.2|9=65|35=A|49=SERVER|56=CLIENT|34=177|52=20090107-18:15:16|98=0|108=30|10=062|
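The message above uses `|` in place of the non-printable SOH byte (0x01) that separates fields on the wire. As a minimal sketch of how the framing fields relate to the rest of the message, the following recomputes BodyLength (tag 9) and CheckSum (tag 10) for it. The function names are illustrative and not part of any library, and the code assumes an ASCII-only message.

```clojure
(require '[clojure.string :as str])

(def soh "\u0001")

(def example
  "8=FIX.4.2|9=65|35=A|49=SERVER|56=CLIENT|34=177|52=20090107-18:15:16|98=0|108=30|10=062|")

(defn pipes->soh
  "Swap the printable stand-in delimiter for the real SOH byte."
  [s]
  (str/replace s "|" soh))

(defn parse-fields
  "Split a wire-format message into [tag value] pairs."
  [wire]
  (mapv #(str/split % #"=" 2) (str/split wire #"\u0001")))

(defn body-length
  "Count the bytes of every field between BodyLength (9) and CheckSum (10),
  including each field's trailing SOH. Assumes ASCII, so chars equal bytes."
  [wire]
  (->> (parse-fields wire)
       (remove (fn [[tag _]] (contains? #{"8" "9" "10"} tag)))
       (map (fn [[tag value]] (+ (count tag) 1 (count value) 1)))
       (reduce +)))

(defn checksum
  "Sum every byte up to and including the SOH before the CheckSum field,
  take it modulo 256, and zero-pad to three digits."
  [wire]
  (let [end (inc (str/last-index-of wire (str soh "10=")))]
    (format "%03d" (mod (reduce + (map int (subs wire 0 end))) 256))))

(body-length (pipes->soh example)) ;; => 65
(checksum (pipes->soh example))    ;; => "062"
```

Both results agree with the `9=65` and `10=062` fields carried in the message itself.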
The Object-Oriented Approach
The FIX protocol has some concept of modularity and encapsulation, and messages themselves are composed of fields that resemble objects with attributes. This makes the protocol well suited to implementation in an object-oriented language like C++ or Java.
There are only a handful of open-source FIX implementations available. In this post I'll be referring to QuickFIX/J, which is the Java flavor of QuickFIX.
Implementing the QuickFIX/J Application interface
QuickFIX applications are centered around the quickfix.Application interface. The interface defines callback methods for things like session management and the processing of incoming and outgoing FIX messages. Instances of the application are passed to an "acceptor" (server) or "initiator" (client).
// pulled from the quickfix/j user manual
package quickfix;

public interface Application {
    void onCreate(SessionID sessionId);
    void onLogon(SessionID sessionId);
    void onLogout(SessionID sessionId);
    void toAdmin(Message message, SessionID sessionId);
    void toApp(Message message, SessionID sessionId)
        throws DoNotSend;
    void fromAdmin(Message message, SessionID sessionId)
        throws FieldNotFound, IncorrectDataFormat, IncorrectTagValue, RejectLogon;
    void fromApp(Message message, SessionID sessionId)
        throws FieldNotFound, IncorrectDataFormat, IncorrectTagValue, UnsupportedMessageType;
}
Handling messages with QuickFIX/J's MessageCracker
The documentation recommends handling messages with the MessageCracker class, which peeks at the MsgType field of a generic incoming Message and casts it to its corresponding typed message. Then it's just a matter of specifying which types of messages the application cares about and what to do with them.
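// Example application using quickfix.MessageCracker
// (the remaining Application callbacks are omitted for brevity)
public class MyApplication extends MessageCracker implements quickfix.Application
{
    public void fromApp(Message message, SessionID sessionID)
            throws FieldNotFound, UnsupportedMessageType, IncorrectTagValue {
        // dispatch to the handler that matches the message's FIX version and type
        crack(message, sessionID);
    }

    // Using annotation
    @Handler
    public void myEmailHandler(quickfix.fix50.Email email, SessionID sessionID) {
        // handler implementation
    }

    // Using the onMessage naming convention (note the different FIX version)
    public void onMessage(quickfix.fix44.Email email, SessionID sessionID) {
        // handler implementation
    }
}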
Learning from QuickFIX
Message classes for the QuickFIX libraries are generated from a common
specification available in XML format. The C++ version of QuickFIX uses
XSLT to parse the specification and build header files containing FIX
message classes (e.g. MarketDataRequest.h).
The anatomy of FIX message specifications
The lowest level of a FIX message is the data type. FIX data types
resemble the primitives found in most programming languages, but come
with additional constraints that can make them more complex.
Fields are the next level. Each field has a particular FIX data type and
is either required or optional wherever it is used.
<field number='1' name='Account' type='STRING' />
<field number='2' name='AdvId' type='STRING' />
<field number='3' name='AdvRefID' type='STRING' />
Groups are distinct collections of ordered or unordered fields.
<group name='NoHops' required='N'>
<field name='HopCompID' required='N' />
<field name='HopSendingTime' required='N' />
<field name='HopRefID' required='N' />
</group>
Components, which are composed of fields and groups, essentially
represent properties common to many different FIX messages. Though
present in the specification files, components are not a concrete
concept within the FIX protocol in the same way that fields and groups
are.
<component name='PtysSubGrp'>
<group name='NoPartySubIDs' required='N'>
<field name='PartySubID' required='N' />
<field name='PartySubIDType' required='N' />
</group>
</component>
The Header and Trailer of a message are also composed of fields and
components. All of these things come together to form a Message.
Transforming the FIX XML specification with Meander
Meander is a Clojure library that
provides a number of macros that use logic variables (symbols that start
with a ?) to transform data in a plain fashion. Often, when
working with code that transforms data, the shapes of inputs and outputs
aren't readily apparent. Meander does a decent job representing
transformations as data, so that transformations are easy to understand
without walking through functions. For that reason, I wanted to use it
to render the FIX specification into an intermediate
representation.
Defining Primitive Data Types Manually
There are only a handful of data types, and the QuickFIX specification files don't provide a means for constructing them, so they must be defined manually. The FIXimate tool is a useful source of information that provides specifics for each primitive. In my case, each primitive is defined as a Clojure spec:
...
(s/def ::integer #(mu/is-int? %))
(s/def ::length #(mu/is-pos-int? %))
(s/def ::tag-number #(mu/is-pos-int? %))
(s/def ::sequence-number #(mu/is-pos-int? %))
(s/def ::number-in-group #(mu/is-pos-int? %))
...
They are then added to a lookup table that maps each type name, as found in the specification files, to its spec. This table will be referenced while generating fields.
...
(def primitives
{"AMT" ::amount
"BOOLEAN" ::boolean
"CHAR" ::character
"COUNTRY" ::country
...
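The `mu` helper predicates aren't shown in this post, so the following is only a sketch of what string-aware primitives could look like, assuming values are validated in the string form they have inside a tagvalue message. The names `fix-int?`, `fix-pos-int?`, and `example-primitives` are hypothetical.

```clojure
(require '[clojure.spec.alpha :as s])

(defn fix-int?
  "True for a string of digits with an optional leading minus sign."
  [x]
  (boolean (and (string? x) (re-matches #"-?\d+" x))))

(defn fix-pos-int?
  "True for a fix-int? string whose numeric value is greater than zero."
  [x]
  (and (fix-int? x) (pos? (.signum (BigInteger. ^String x)))))

(s/def ::int fix-int?)
(s/def ::seq-num fix-pos-int?)

;; Keys are type names as written in the specification XML.
(def example-primitives
  {"INT"    ::int
   "SEQNUM" ::seq-num})

(s/valid? (get example-primitives "INT") "-42")    ;; => true
(s/valid? (get example-primitives "SEQNUM") "177") ;; => true
(s/valid? (get example-primitives "SEQNUM") "0")   ;; => false
(s/valid? (get example-primitives "SEQNUM") "1.5") ;; => false
```

Keeping the table keyed by the XML type name means a field definition only needs a single `get` to find its spec.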
Parsing and Filtering XML
Specification files are read and then converted to EDN via
clojure.data.xml. They are then filtered in various ways to extract
data that will be handed to Meander.
(ns meriweather.parse.xml
  (:require
   [clojure.java.io :as io]
   [clojure.data.xml :as xml]
   [meander.epsilon :as m]
   [meriweather.util :as mu]
   [meriweather.data-types :refer [primitives]]))

;; clojure.data.xml reads lazily; `doall` forces the whole document to be read
;; before `with-open` closes the stream.
(defn read-xml [file]
  (with-open [in (io/input-stream (io/file file))]
    (-> in xml/parse xml-seq doall)))

(defn filter-version
  [data]
  (filter #(= (:tag %) :fix) data))

(defn filter-fields
  [data]
  (let [by-tag    #(= (:tag %) :field)
        by-number #(contains? (:attrs %) :number)]
    (filter (every-pred by-tag by-number) data)))

;; components are defined and then referenced within the same specification;
;; excluding elements that carry a :required attribute ensures we only receive definitions.
(defn filter-components
  [data]
  (let [by-tag      #(= (:tag %) :component)
        by-required #(not (contains? (:attrs %) :required))]
    (filter (every-pred by-tag by-required) data)))

(defn filter-messages [data]
  (filter #(= (:tag %) :message) data))
;; takes the initial conversion of xml to edn created by `read-xml` and makes it
;; less xml-y by flattening the data somewhat.
(defn xml->edn
  [elem]
  (m/rewrite elem
    {:tag ?tag
     :attrs (m/map-of !k !v)
     :content (m/seqable !content ..1)}
    {:tag ?tag
     :children [(m/cata !content) ...] & ([!k !v] ...)}

    {:tag ?tag
     :attrs (m/map-of !k !v)
     :content (m/pred empty?)}
    {:tag ?tag & ([!k !v] ...)}))
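To make the filtering and flattening steps concrete without any dependencies, here is a rough sketch that uses `clojure.xml` from the core distribution as a stand-in for `clojure.data.xml`, and a plain recursive function as an approximation of the Meander rewrite. The sample document is hand-written in the shape of a QuickFIX specification file, and `parse-string` and `flatten-element` are hypothetical names.

```clojure
(require '[clojure.xml :as core-xml])

;; A tiny hand-written document in the shape of a QuickFIX specification file.
(def sample-spec
  "<fix major='4' minor='4'>
     <messages>
       <message name='Heartbeat' msgtype='0' msgcat='admin'>
         <field name='TestReqID' required='N'/>
       </message>
     </messages>
     <fields>
       <field number='112' name='TestReqID' type='STRING'/>
     </fields>
   </fix>")

(defn parse-string
  "Parse an XML string and return every element as one flat sequence."
  [^String s]
  (-> (.getBytes s "UTF-8")
      (java.io.ByteArrayInputStream.)
      core-xml/parse
      xml-seq))

(defn flatten-element
  "Merge attributes into the element map and rename non-empty :content to :children."
  [{:keys [tag attrs content]}]
  (cond-> (merge {:tag tag} attrs)
    (seq content) (assoc :children (mapv flatten-element content))))

(def elements (parse-string sample-spec))

;; Only the definition carries a :number attribute; the reference inside the
;; message does not, so it is filtered out.
(def definitions
  (filter #(and (= :field (:tag %)) (contains? (:attrs %) :number)) elements))

(map flatten-element definitions)
;; => ({:tag :field, :number "112", :name "TestReqID", :type "STRING"})

(flatten-element (first (filter #(= :message (:tag %)) elements)))
;; => {:tag :message, :name "Heartbeat", :msgtype "0", :msgcat "admin",
;;     :children [{:tag :field, :name "TestReqID", :required "N"}]}
```

The same `<field>` tag shows up twice in the sequence, which is why the filters look at attributes as well as the tag name.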
Transforming the Data
Once the data has been ingested and flattened a little bit, each part of
the specification has its elements transformed with Meander's
rewrite macro. The rewrite macro allows easy handling of the various
forms an input could take, which makes working with nested or
self-referential data much easier.
(defn field
  [field]
  (m/rewrite field
    ;; a field definition that lists enumerated values
    {:number ?number
     :name ?name
     :type ?type
     :tag :field
     :children (m/seqable !children ..1)}
    {(m/app Integer/parseInt ?number)
     {:name ~(mu/keywordize ?name)
      :spec ~(get primitives ?type)
      :values {& [(m/cata !children) ...]}}}

    ;; a field definition without enumerated values
    {:number ?number
     :name ?name
     :type ?type
     :tag :field}
    {(m/app Integer/parseInt ?number)
     {:name ~(mu/keywordize ?name)
      :spec ~(get primitives ?type)}}

    ;; a single enumerated value
    {:enum ?enum
     :description ?desc
     :tag :value}
    {?enum ~(keyword ?desc)}))
(defn component
  [component]
  (m/rewrite component
    ;; the component definition itself
    {:name ?name
     :tag :component
     :children (m/seqable !children ..1)}
    {:name ~(mu/keywordize ?name)
     :tag :component
     :children [(m/cata !children) ...]}

    ;; a nested member that has children of its own (e.g. a group)
    {:name ?name
     :required ?required
     :tag ?tag
     :children (m/seqable !children ..1)}
    {:name ~(mu/keywordize ?name)
     :tag ?tag
     :required ~(mu/char->boolean ?required)
     :children [(m/cata !children) ...]}

    ;; a leaf member with no children
    {:name ?name
     :tag ?tag
     :required ?required
     :children (m/pred empty?)}
    {:name ~(mu/keywordize ?name)
     :tag ?tag
     :required ~(mu/char->boolean ?required)}))
(defn message [message]
  (m/rewrite message
    ;; the message definition itself
    {:name ?name
     :msgtype ?msgtype
     :msgcat ?msgcat
     :tag :message
     :children (m/seqable !children ..1)}
    {:name ?name
     :msgtype ?msgtype
     :msgcat ~(keyword ?msgcat)
     :tag :message
     :children [(m/cata !children) ...]}

    ;; a member of the message (field, group, or component reference)
    {:name ?name
     :required ?required
     :tag ?tag}
    {:name ?name
     :required ~(mu/char->boolean ?required)
     :tag ?tag}))
Putting it Together
(def data-file (read-xml "FIX44.xml"))

(def fields (->> data-file
                 filter-fields
                 (map xml->edn)
                 (map field)
                 (into (sorted-map))))
What we end up with looks a lot like the original specification with a few differences. Most notably, fields are numerically indexed and have their primitive spec associated.
...
{1 {:name :account, :spec :meriweather.data-types/string},
2 {:name :adv-id, :spec :meriweather.data-types/string},
3 {:name :adv-ref-id, :spec :meriweather.data-types/string},
...
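For readers who don't know Meander, the field transformation can be approximated in plain Clojure. This is a sketch rather than the code used in this post: `keywordize` is a hypothetical stand-in for `mu/keywordize`, `example-primitives` uses placeholder spec names, and the enumerated values for AdvSide are abbreviated.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for mu/keywordize: "AdvRefID" -> :adv-ref-id
(defn keywordize [s]
  (-> s
      (str/replace #"([a-z0-9])([A-Z])" "$1-$2")
      str/lower-case
      keyword))

;; Placeholder spec names standing in for the real primitives table.
(def example-primitives
  {"STRING" :example/string
   "CHAR"   :example/char})

(defn field-entry
  "Turn one flattened field definition into a [number definition] pair."
  [{:keys [number type children] field-name :name}]
  [(Integer/parseInt number)
   (cond-> {:name (keywordize field-name)
            :spec (get example-primitives type)}
     ;; enumerated values become a map of wire value -> keyword
     (seq children)
     (assoc :values (into {}
                          (map (fn [{:keys [enum description]}]
                                 [enum (keyword description)]))
                          children)))])

(def flattened-fields
  [{:tag :field :number "3" :name "AdvRefID" :type "STRING"}
   {:tag :field :number "1" :name "Account" :type "STRING"}
   {:tag :field :number "4" :name "AdvSide" :type "CHAR"
    :children [{:tag :value :enum "B" :description "BUY"}
               {:tag :value :enum "S" :description "SELL"}]}])

(into (sorted-map) (map field-entry) flattened-fields)
;; => {1 {:name :account, :spec :example/string},
;;     3 {:name :adv-ref-id, :spec :example/string},
;;     4 {:name :adv-side, :spec :example/char, :values {"B" :BUY, "S" :SELL}}}
```

Collecting into a sorted map is what gives the numerically ordered index, regardless of the order in which fields appear in the file.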
Now that we have our IR, we can use it to validate some FIX messages.
Validating A FIX Message
In this post, all of the FIX messages are tagvalue encoded. Tagvalue encoding is still used today, except where extremely low latency must be sustained at high throughput, as is the case with data flowing in and out of the NASDAQ.
Since tagvalue encoding is represented as a string, we can use a simple regex to parse a message and quickly validate that it's in the correct format. If it isn't, we don't want to bother trying to validate the field values.
(def field-pattern #"(?<tag>\d+)(?<delim>=)(?<value>[^\u0001]+)(?<SOH>\u0001)")
(defrecord Field [tag delim value soh])
(s/def ::tag (s/and string? #(re-matches #"[0-9]{0,4}" %)))
(s/def ::delim #(= "=" %))
(s/def ::value any?)
(s/def ::soh #(= (mu/hex->str %) "1"))
(s/def ::field (s/keys :req-un [::tag ::delim ::value ::soh]))
(defn str->fields [message]
  (->> message
       (re-seq field-pattern)
       (map #(apply ->Field (rest %)))
       (into [])))

(defn valid-message-format? [message]
  (every? #(s/valid? ::field %) message))
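One caveat worth sketching: `re-seq` skips over any text that doesn't match the pattern, and `every?` returns true for an empty collection, so a check built only on the matched fields can accept input that contains stray characters or no fields at all. A stricter variant, assuming the whole string should consist of nothing but SOH-terminated fields, could use `re-matches` instead. The names `message-pattern` and `well-formed?` are hypothetical.

```clojure
;; One or more tag=value fields, each terminated by SOH, with nothing left over.
(def message-pattern #"(?:\d+=[^\u0001]+\u0001)+")

(defn well-formed?
  "True only when the entire string is a run of SOH-terminated tag=value fields."
  [message]
  (boolean (re-matches message-pattern message)))

(well-formed? "35=D\u000155=IBM\u0001")     ;; => true
(well-formed? "35=D\u0001junk55=IBM\u0001") ;; => false
(well-formed? "not a fix message")          ;; => false
```

Running this before splitting the message into fields keeps the per-field specs focused on values rather than framing.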
Assuming that the message is in the proper format, we'll use our intermediate representation to validate that the value of each field matches the type, and any enumerated values, defined for that field.
(defn valid-field?
  [definitions {:keys [tag value]}]
  (let [tag-number (Integer/parseInt tag)
        {:keys [spec values]} (get definitions tag-number)]
    (and (some? spec)                       ; tags without a known spec are rejected
         (s/valid? spec value)              ; value conforms to the primitive type
         (or (nil? values)                  ; no enumeration to check, or
             (contains? values value)))))   ; value is one of the enumerated codes

(defn valid-message? [definitions message]
  (every? #(valid-field? definitions %) message))
Example Message
A common FIX message is the NewOrderSingle message. As one could imagine, this message type is used to place a single new order, perhaps from within a trading terminal. Here is an example of the NewOrderSingle message according to the FIX 4.2 specification:
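(def new-order-single-42 "8=FIX.4.2\u00019=251\u000135=D\u000149=\u000156=ABROKER\u000134=2\u000152=2003061501:14:49\u000111=12345\u00011=111111\u000163=0\u000164=20030621\u000121=3\u0001110=1000\u0001111=50000\u000155=IBM\u000148=459200101\u000122=1\u000154=1\u000160=2003061501:14:49\u000138=5000\u000140=1\u000144=15.75\u000115=USD\u000159=0\u000110=127\u0001")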
After loading the IR, validating that the message is properly formatted is relatively simple:
(def field-definitions (-> "FIX42.edn" io/resource slurp edn/read-string :field))
(def parsed-message (str->fields new-order-single-42))
(valid-message-format? parsed-message) ;; => true
(valid-message? field-definitions parsed-message) ;; => true
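To see the enumerated-value check reject something, here is a self-contained miniature of the same idea. It is only a sketch: the two definitions are written by hand in the shape of the generated field table, the specs are simplified placeholders, and unknown tags are treated as invalid, which is an assumption rather than a FIX rule.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::fix-string (s/and string? #(pos? (count %))))
(s/def ::fix-char (s/and string? #(= 1 (count %))))

;; Hand-written definitions in the same shape as the generated field table.
(def mini-definitions
  {54 {:name :side   :spec ::fix-char   :values {"1" :BUY "2" :SELL}}
   55 {:name :symbol :spec ::fix-string}})

(defn valid-field? [definitions {:keys [tag value]}]
  (let [{:keys [spec values]} (get definitions (Integer/parseInt tag))]
    (boolean
     (and spec                             ; unknown tags are rejected
          (s/valid? spec value)            ; value matches the primitive type
          (or (nil? values)                ; no enumeration to check, or
              (contains? values value)))))) ; value is one of the enumerated codes

(defn valid-message? [definitions fields]
  (every? #(valid-field? definitions %) fields))

(valid-message? mini-definitions [{:tag "55" :value "IBM"} {:tag "54" :value "1"}]) ;; => true
(valid-message? mini-definitions [{:tag "55" :value "IBM"} {:tag "54" :value "Z"}]) ;; => false
(valid-message? mini-definitions [{:tag "58" :value "hello"}])                      ;; => false
```

The second message fails even though `"Z"` is a valid single character, because it isn't one of the values enumerated for Side.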
What's Next?
The above code doesn't enforce the order of fields within groups, which would have to be accounted for before it could be used in any real scenario. After that, my next step is to produce valid messages using that same intermediate representation. Ultimately, my goal is to have a FIX implementation written purely in Clojure. At the time of writing, however, my code leans heavily on QuickFIX/J and leaves much to be desired.