I’ve just received the following email:
I’ve had a chance by now to examine a fair number of SEC XBRL documents using Gepsio with my [code]. I’m consistently seeing around 9-10% of the documents in a given processing session that fail to load …
In my project … I am processing a fixed number of specific data points.
I only need a few data points with their durations, start and end dates, along with basic entity information and fiscal period focus info. But I need this for a large number of documents.
I am wondering if the careful and thorough validation done on document.Load, which is really necessary for many users of Gepsio, might be overkill for what I am doing.
Could some of this validation be preventing the load of documents that might be flawed in ways that would leave them still usable for my project?
I don’t know much about XML or XBRL but I get the sense that some of the validation occurring is relevant mainly to the re-presentation of the data, not so much putting the data into a db.
Does it make any sense to think about an optional overloaded version of the document.load method with parameters that could cause it to skip some validation of things that might be peripheral to processes like mine?
This is a thought-provoking suggestion, and it echoes an idea that I have had for some time. Let me begin by explaining Gepsio’s current approach to reporting XBRL validation issues and follow that up with a possible design change.
Gepsio’s Current XBRL Validation Strategy
Today, Gepsio validates a document against the XBRL specification once it has been loaded. An exception is thrown as soon as the first XBRL specification violation is discovered. At a high level, the algorithm is as follows:
- load document as XML document
  - if XML is invalid throw XML exception
- read taxonomy schema references
- read contexts
- read units
- read facts
- read footnote links
- validate context references
  - if validation violation found throw XBRL exception
- validate unit references
  - if validation violation found throw XBRL exception
- validate context time spans against period types
  - if validation violation found throw XBRL exception
- validate footnote locations
  - if validation violation found throw XBRL exception
- validate footnote arcs
  - if validation violation found throw XBRL exception
- validate items
  - if validation violation found throw XBRL exception
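The validation half of that list can be sketched as a fail-fast pipeline. This is only an illustration of the control flow, not Gepsio's source: SketchXbrlException, FailFastSketch and the step names are hypothetical stand-ins, and each step simply reports a canned pass or fail result.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for Gepsio's XbrlException; not the library's actual type.
public class SketchXbrlException : Exception
{
    public SketchXbrlException(string message) : base(message) { }
}

public static class FailFastSketch
{
    // Runs each named validation step in order. A step returns true when it passes.
    public static void Validate(IList<KeyValuePair<string, Func<bool>>> steps)
    {
        foreach (var step in steps)
        {
            bool passed = step.Value();
            if (!passed)
            {
                // The first violation ends validation; later steps never run.
                throw new SketchXbrlException(step.Key + " failed");
            }
        }
    }

    // Builds three canned steps and records which ones actually executed.
    public static List<string> Demo()
    {
        var ran = new List<string>();
        Func<string, bool, KeyValuePair<string, Func<bool>>> step = (name, passes) =>
            new KeyValuePair<string, Func<bool>>(name, () => { ran.Add(name); return passes; });

        var steps = new List<KeyValuePair<string, Func<bool>>>
        {
            step("context references", true),
            step("unit references", false),    // first violation
            step("context time spans", false)  // second violation, never reached
        };

        try
        {
            Validate(steps);
        }
        catch (SketchXbrlException e)
        {
            ran.Add("exception: " + e.Message);
        }
        return ran;
    }

    public static void Main()
    {
        foreach (var entry in Demo())
        {
            Console.WriteLine(entry);
        }
    }
}
```

In this sketch the "context time spans" step also fails, but it is never executed, so the caller only ever learns about the unit reference problem.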
Today’s pattern asks callers to write code like this:
var myDoc = new XbrlDocument();
try
{
    myDoc.Load("MyXbrlDocument.xml");
}
catch (XbrlException e)
{
    Console.WriteLine("Validation Failed!");
    Console.WriteLine(e.Message);
}
With today’s pattern, any exception thrown by the validation code forfeits the rest of the validation process. If a violation is found during the validation of unit references, for example, the context time spans, footnotes and items are never validated at all. Once an XBRL document is deemed invalid, no further validation is attempted. The analogy would be a source code compiler that stops the compilation process after the first error.
Gepsio’s Possible XBRL Validation Future
As an alternative, Gepsio could stop throwing exceptions on validation errors and simply build a list of validation errors that could be examined later. In the alternative design that I have considered (a design which is shamelessly “lifted” from the CSLA.NET business objects framework) an XBRL document loaded by Gepsio would maintain a Boolean property indicating its validity as well as a collection of validation errors. The caller could examine both after Gepsio loads a document and use that information to decide whether or not to proceed with the planned operation against the loaded document. This design could make document loading look something like this:
var myDoc = new XbrlDocument();
myDoc.Load("MyXbrlDocument.xml");
if (!myDoc.IsValid)
{
    Console.WriteLine("Validation Failed!");
    foreach (var validationError in myDoc.ValidationErrors)
    {
        Console.WriteLine(validationError);
    }
}
This design takes advantage of two hypothetical additions to Gepsio’s XbrlDocument class:
- A read-only Boolean property called IsValid. Gepsio would set this value to true if all validation rules passed and false if at least one validation rule failed.
- A collection of validation error objects, with each object describing information about the error. In the simple design above, these are simple strings; however, in an actual design these may be full Gepsio-namespaced objects that describe a validation error with as much fidelity as possible to identify the root of the problem as well as the error category. Callers may want to distinguish between a unit reference error and a calculation summation error, and a simple string would not give that kind of fidelity.
In this design, no exceptions are thrown, allowing Gepsio to perform as much validation as possible.
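To make the idea of richer error objects a little more concrete, here is a minimal sketch of how categorized errors might be collected and then filtered by a caller. Everything in it is an assumption made for illustration: ValidationErrorCategory, ValidationError, SketchDocument and IsUsableIgnoring are hypothetical names, the error messages are invented, and none of this exists in Gepsio today.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical error categories; the names are illustrative only.
public enum ValidationErrorCategory
{
    ContextReference,
    UnitReference,
    PeriodType,
    Footnote,
    CalculationSummation
}

// Hypothetical error object carrying a category as well as a message.
public sealed class ValidationError
{
    public ValidationError(ValidationErrorCategory category, string message)
    {
        Category = category;
        Message = message;
    }

    public ValidationErrorCategory Category { get; }
    public string Message { get; }

    public override string ToString() { return Category + ": " + Message; }
}

// Stand-in for a loaded document; this is not Gepsio's XbrlDocument.
public sealed class SketchDocument
{
    private readonly List<ValidationError> errors = new List<ValidationError>();

    public IReadOnlyList<ValidationError> ValidationErrors { get { return errors; } }
    public bool IsValid { get { return errors.Count == 0; } }

    // Every rule runs; violations are recorded rather than thrown.
    public void Validate(IEnumerable<Func<IEnumerable<ValidationError>>> rules)
    {
        errors.Clear();
        foreach (var rule in rules)
        {
            errors.AddRange(rule());
        }
    }

    // True when every recorded error falls in a category the caller tolerates.
    public bool IsUsableIgnoring(params ValidationErrorCategory[] tolerated)
    {
        return errors.All(e => tolerated.Contains(e.Category));
    }
}

public static class CollectingSketch
{
    // Runs three canned rules: one passes, two report one invented error each.
    public static SketchDocument Demo()
    {
        var doc = new SketchDocument();
        doc.Validate(new Func<IEnumerable<ValidationError>>[]
        {
            () => new ValidationError[0],
            () => new[] { new ValidationError(ValidationErrorCategory.UnitReference, "fact refers to an undefined unit") },
            () => new[] { new ValidationError(ValidationErrorCategory.CalculationSummation, "children do not sum to parent") }
        });
        return doc;
    }

    public static void Main()
    {
        var doc = Demo();
        Console.WriteLine("IsValid: " + doc.IsValid);
        foreach (var error in doc.ValidationErrors)
        {
            Console.WriteLine(error);
        }
    }
}
```

A caller who only extracts a handful of facts, like the author of the email above, could then decide for themselves which categories matter, for instance by treating a document as usable when its only errors are calculation summation errors.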
Feedback Welcome
Your feedback is welcome. Do you like the current exception-based validation scheme? Do you favor the validity design shown above? Or do you have another idea? Add your feedback to a comment on this post or post a message to the Gepsio Facebook page at http://www.facebook.com/gepsio.
Hi Jeff,
Yep, I think this would be the way to go.
It follows the Robustness Principle: "Be conservative in what you do, be liberal in what you accept from others" (a principle that helped HTML browsers, and TCP in general, handle the imperfect world of the internet).
I too keep coming across so-called "XBRL documents", that fail various rules, but still contain useful information.
Tom