Sunday, September 9, 2012

Sep 2012 CTP Released

The Sep 2012 CTP release of Gepsio is now available! Download the latest binary from the project’s Download page.

Here’s a peek at the enhancements in this build of Gepsio:

Role Types

Role types found in schemas are now available from within Gepsio. The XbrlSchema class now contains a property called RoleTypes. This property is a collection of objects of a new class called RoleType. Objects of the RoleType class expose any role types defined in the schema.
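
For readers unfamiliar with role types: in an XBRL 2.1 schema they appear as roleType elements in the http://www.xbrl.org/2003/linkbase namespace, each carrying a roleURI attribute plus optional definition and usedOn children. The sketch below reads them with LINQ to XML purely to show what kind of data is involved. RoleTypeInfo and RoleTypeReader are hypothetical names made up for this illustration; they are not Gepsio's RoleType class, and this is not how Gepsio itself reads schemas.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical stand-in for illustration only; NOT Gepsio's RoleType class.
public sealed class RoleTypeInfo
{
    public string Id { get; set; }
    public string RoleUri { get; set; }
    public string Definition { get; set; }
    public List<string> UsedOn { get; set; }
}

public static class RoleTypeReader
{
    // Namespace of the XBRL 2.1 linkbase vocabulary, where roleType is defined.
    private static readonly XNamespace Link = "http://www.xbrl.org/2003/linkbase";

    // Collects every roleType element found anywhere in the schema document.
    public static List<RoleTypeInfo> Read(XDocument schema)
    {
        return schema.Descendants(Link + "roleType")
            .Select(e => new RoleTypeInfo
            {
                // Casting a missing attribute or element to string yields null.
                Id = (string)e.Attribute("id"),
                RoleUri = (string)e.Attribute("roleURI"),
                Definition = (string)e.Element(Link + "definition"),
                UsedOn = e.Elements(Link + "usedOn")
                          .Select(u => u.Value.Trim())
                          .ToList()
            })
            .ToList();
    }
}
```

Calling RoleTypeReader.Read(XDocument.Load(path)) on a taxonomy schema would return one entry per declared role.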

HTTP-Based Schema Linkbase References

Gepsio now supports HTTP-based schema linkbase references. Previous releases assumed that all schema linkbase references were based on local filesystem paths.

CalculationLink.Linkbase Property

The CalculationLink class contains a new property called Linkbase, which references the LinkbaseDocument object containing the calculation link.

LinkbaseDocument.Schema Property

The LinkbaseDocument class contains a new property called Schema, which references the XbrlSchema object that references the linkbase document.

SummationConcept.Link Property

The SummationConcept class contains a new property called Link, which references the CalculationLink object containing the summation concept.

Resolved Issues

Work Item 9401: Valid XBRL Doc Failing To Be Loaded

The latest Amazon quarterly XBRL filing passes other validation tests I used, however attempting to Load the doc in Gepsio throws an error. Since it is an Object Ref Not Set to Instance error, it is not entirely clear what caused it although the final line in the stack trace is JeffFerguson.Gepsio.QualifiedName.Equals(Object obj). I attach the filing for your ref. Any ideas on how to get around this?

A bug in the QualifiedName equality testing code failed to detect various null conditions. This has been fixed.
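
To illustrate the kind of null handling involved, here is a minimal sketch of a null-safe equality implementation. The QName class below is a hypothetical stand-in written for this post; it is not Gepsio's QualifiedName source, and its two-part shape (namespace URI plus local name) is an assumption.

```csharp
using System;

// Hypothetical stand-in for illustration only; NOT Gepsio's QualifiedName class.
public sealed class QName : IEquatable<QName>
{
    public string NamespaceUri { get; }
    public string LocalName { get; }

    public QName(string namespaceUri, string localName)
    {
        NamespaceUri = namespaceUri;
        LocalName = localName;
    }

    public bool Equals(QName other)
    {
        // A null argument is never equal to this instance.
        if (ReferenceEquals(other, null)) return false;

        // The static string.Equals tolerates null on either side,
        // so null namespaces or names do not cause a NullReferenceException.
        return string.Equals(NamespaceUri, other.NamespaceUri, StringComparison.Ordinal)
            && string.Equals(LocalName, other.LocalName, StringComparison.Ordinal);
    }

    public override bool Equals(object obj)
    {
        // "as" yields null for a null argument or an object of another type.
        return Equals(obj as QName);
    }

    public override int GetHashCode()
    {
        unchecked
        {
            int hash = 17;
            hash = hash * 31 + (NamespaceUri == null ? 0 : StringComparer.Ordinal.GetHashCode(NamespaceUri));
            hash = hash * 31 + (LocalName == null ? 0 : StringComparer.Ordinal.GetHashCode(LocalName));
            return hash;
        }
    }
}
```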

Work Item 7843: The given path's format is not supported

Fixed bug that allowed paths of the form "file:///C:/blah/blah/http://blah/blah.org" to be created in GetFullLinkbasePath when filings that reference remote documents are stored locally. This caused a NotSupportedException to be thrown. Supplied as a patch from Codeplex user matthewschrager.
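
The general idea behind this kind of fix can be sketched as follows: check whether the reference is already an absolute HTTP(S) URI before combining it with the location of the referencing document. LinkbasePathResolver and its Resolve method are hypothetical names used only for this illustration; this is not the code of Gepsio's GetFullLinkbasePath or of the submitted patch.

```csharp
using System;

// Hypothetical helper for illustration only; NOT Gepsio's GetFullLinkbasePath.
public static class LinkbasePathResolver
{
    // baseLocation: absolute URI of the document containing the reference.
    // href: the linkbase reference as written in that document.
    public static string Resolve(string baseLocation, string href)
    {
        Uri absolute;
        bool isRemote = Uri.TryCreate(href, UriKind.Absolute, out absolute)
            && (absolute.Scheme == Uri.UriSchemeHttp || absolute.Scheme == Uri.UriSchemeHttps);

        if (isRemote)
        {
            // Already a complete remote address: use it as-is rather than
            // gluing it onto the local base path.
            return absolute.AbsoluteUri;
        }

        // Otherwise resolve the reference against the referencing document.
        return new Uri(new Uri(baseLocation), href).AbsoluteUri;
    }
}
```

With a check like this, a locally stored filing that points at a remote linkbase keeps the remote address intact, while a bare file name is still resolved next to the filing.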

Work Item 9465: WebException Objects Thrown During XbrlSchema Creation Are Not Wrapped in XbrlException Objects

The XbrlSchema constructor uses an XmlTextReader to read an XBRL schema. If the URI for the XBRL schema to be read is an HTTP-based URI, then the XmlTextReader will use the .NET Web stack to read the schema using HTTP. If something fails during that process, the .NET Web stack will throw a WebException. Thrown WebException objects were not wrapped in an XbrlException object and were consequently thrown back to the client as a WebException object.

To be clear that the issue is an XBRL issue caused by an HTTP failure, the XBRL schema creation code now creates an XbrlException object, stores the caught WebException as an inner exception to the XbrlException, and throws the XbrlException object back up to the client.
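
The wrapping pattern described above looks roughly like the sketch below. SchemaLoadException is a hypothetical stand-in for XbrlException, and the readSchema delegate stands in for the XmlTextReader-based read; both are assumptions made so that the example is self-contained, not Gepsio's actual code.

```csharp
using System;
using System.Net;

// Hypothetical stand-in for XbrlException, for illustration only.
public class SchemaLoadException : Exception
{
    public SchemaLoadException(string message, Exception inner)
        : base(message, inner)
    {
    }
}

public static class SchemaLoader
{
    // readSchema is a hypothetical delegate representing whatever actually
    // fetches the schema (over HTTP or from disk).
    public static string Load(string schemaUri, Func<string, string> readSchema)
    {
        try
        {
            return readSchema(schemaUri);
        }
        catch (WebException e)
        {
            // Re-throw as a library-level exception, keeping the original
            // WebException available through InnerException.
            throw new SchemaLoadException(
                "Unable to read XBRL schema at " + schemaUri + ".", e);
        }
    }
}
```

Callers then need only one catch block for the library's exception type, and can still inspect InnerException when they care about the underlying HTTP failure.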

Work Item 9571: No Support for Taxonomy Role Types

Role types found in schemas are now available from within the object model. The XbrlSchema class now contains a property called RoleTypes. This property is a collection of objects of a new class called RoleType. Objects of the RoleType class expose any role types defined in the schema.

Thursday, September 6, 2012

Gepsio on Twitter

I’ve just opened a Twitter account for Gepsio announcements, so I can keep them separate from my main (unfocused) Twitter account. I’m planning on using Gepsio’s Twitter account in much the same way as I use Gepsio’s Facebook page (www.facebook.com/gepsio): announcing blog postings, release notes, and bug fixes, and answering your questions.
Follow Gepsio on Twitter at @GepsioXbrl … see you there!

Monday, September 3, 2012

Gepsio XBRL Validation Strategies: The Present and the Future

I’ve just received the following email:

I’ve had a chance by now to examine a fair number of SEC Xbrl documents using Gepsio with my [code]. I’m consistently seeing around 9-10% of the documents in a given processing session that fail to load …

In my project … I am processing a fixed number of specific data points.

I only need a few data points with their durations, start and end dates, along with basic entity information and fiscal period focus info. But I need this for a large number of documents.

I am wondering if the careful and thorough validation done on document.Load, which is really necessary for many users of Gepsio, might be overkill for what I am doing.

Could some of this validation be preventing the load of documents that might be flawed in ways that would leave them still usable for my project?

I don’t know much about xml or xbrl but I get the sense that some of the validation occurring is relevant mainly to the re-presentation of the data, not so much putting the data into a db.

Does it make any sense to think about an optional overloaded version of the document.load method with parameters that could cause it to skip some validation of things that might be peripheral to processes like mine?

This is a thought-provoking suggestion, and it echoes a thought that I have had for some time. Let me begin by explaining Gepsio’s current approach to reporting XBRL validation issues and then follow up with a possible design change.

Gepsio’s Current XBRL Validation Strategy

Today, Gepsio validates a loaded document against the XBRL specification after the document is loaded. An exception is thrown when the first XBRL specification violation is discovered. At a high level, the algorithm is as follows:

  1. load document as XML document
  2. if XML is invalid throw XML exception
  3. read taxonomy schema references
  4. read contexts
  5. read units
  6. read facts
  7. read footnote links
  8. validate context references
  9. if validation violation found throw XBRL exception
  10. validate unit references
  11. if validation violation found throw XBRL exception
  12. validate context time spans against period types
  13. if validation violation found throw XBRL exception
  14. validate footnote locations
  15. if validation violation found throw XBRL exception
  16. validate footnote arcs
  17. if validation violation found throw XBRL exception
  18. validate items
  19. if validation violation found throw XBRL exception
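
The validation half of that algorithm (steps 8 through 19) boils down to a fail-fast loop, sketched below. FailFastValidator, ValidationException (a stand-in for XbrlException), and the shape of a rule as a named function that returns null on success are all hypothetical choices made for this sketch; Gepsio's real validation code is not structured this way as far as this post states.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for XbrlException, for illustration only.
public class ValidationException : Exception
{
    public ValidationException(string message) : base(message)
    {
    }
}

public static class FailFastValidator
{
    // Each rule is a name paired with a function that returns null when the
    // rule passes, or a message describing the violation when it fails.
    public static void Validate(IEnumerable<KeyValuePair<string, Func<string>>> rules)
    {
        foreach (var rule in rules)
        {
            string violation = rule.Value();
            if (violation != null)
            {
                // The first violation ends validation; later rules never run.
                throw new ValidationException(rule.Key + ": " + violation);
            }
        }
    }
}
```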

Today’s pattern asks callers to write code like this:

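var myDoc = new XbrlDocument();
try
{
    // Load() parses the document and then validates it against the XBRL specification.
    myDoc.Load("MyXbrlDocument.xml");
}
catch (XbrlException e)
{
    // Only the first violation found is reported.
    Console.WriteLine("Validation Failed!");
    Console.WriteLine(e.Message);
}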



With today’s pattern, the first exception thrown by the validation code forfeits the rest of the validation process. If a violation is found during the validation of unit references, for example, the context time spans, footnotes, and items are never validated at all. The analogy would be a source code compiler that stops compiling after the first error.


Gepsio’s Possible XBRL Validation Future


As an alternative, Gepsio could stop throwing exceptions on validation errors and simply build a list of validation errors that could be examined later. In the alternative design that I have considered (a design which is shamelessly “lifted” from the CSLA.NET business objects framework), an XBRL document loaded by Gepsio would maintain a Boolean property indicating its validity as well as a collection of validation errors. After Gepsio loads a document, the caller could examine both and use this information to decide whether or not to proceed with the planned operation against the loaded document. This design could make document loading look something like this:

var myDoc = new XbrlDocument();
myDoc.Load("MyXbrlDocument.xml");
if (!myDoc.IsValid)
{
    Console.WriteLine("Validation Failed!");
    foreach (var validationError in myDoc.ValidationErrors)
    {
        Console.WriteLine(validationError);
    }
}



This design takes advantage of two hypothetical additions to Gepsio’s XbrlDocument class:



  1. A read-only Boolean property called IsValid. Gepsio would set this value to true if all validation rules passed and false if at least one validation rule failed.
  2. A collection of validation error objects, with each object describing information about the error. In the simple design above, these are simple strings; however, in an actual design these may be full Gepsio-namespaced objects that describe a validation error with as much fidelity as possible to identify the root of the problem as well as the error category. Callers may want to distinguish between a unit reference error and a calculation summation error, and a simple string would not give that kind of fidelity.

In this design, no exceptions are thrown, allowing Gepsio to perform as much validation as possible.
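
To make the idea of richer error objects a little more concrete, here is one possible shape for them. Everything in this sketch is hypothetical: ValidationCategory, ValidationError, ValidationResult, and CollectingValidator are names invented for this post, the category list is only a guess based on the validation steps listed earlier, and none of this exists in Gepsio today.

```csharp
using System;
using System.Collections.Generic;

// All types below are hypothetical; none of them exist in Gepsio.
public enum ValidationCategory
{
    ContextReference,
    UnitReference,
    PeriodType,
    Footnote,
    Item
}

public sealed class ValidationError
{
    public ValidationCategory Category { get; }
    public string Message { get; }

    public ValidationError(ValidationCategory category, string message)
    {
        Category = category;
        Message = message;
    }

    public override string ToString()
    {
        return Category + ": " + Message;
    }
}

public sealed class ValidationResult
{
    private readonly List<ValidationError> errors = new List<ValidationError>();

    public IReadOnlyList<ValidationError> Errors { get { return errors; } }

    // Valid only when no rule reported an error.
    public bool IsValid { get { return errors.Count == 0; } }

    internal void AddRange(IEnumerable<ValidationError> found)
    {
        errors.AddRange(found);
    }
}

public static class CollectingValidator
{
    // Runs every rule, even after earlier rules have reported errors,
    // and gathers everything into a single result.
    public static ValidationResult Validate(
        IEnumerable<Func<IEnumerable<ValidationError>>> rules)
    {
        var result = new ValidationResult();
        foreach (var rule in rules)
        {
            result.AddRange(rule());
        }
        return result;
    }
}
```

With categories attached to each error, a caller who only needs a handful of data points could choose to ignore, say, Footnote errors and proceed whenever no other category appears in the list.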


Feedback Welcome


Your feedback is welcome. Do you like the current exception-based validation scheme? Do you favor the validity design shown above? Or do you have another idea? Leave your feedback in a comment on this post, or post a message to the Gepsio Facebook page at http://www.facebook.com/gepsio.