Monday, January 31, 2011

Gepsio Feb 2011 CTP Released

I have just released the Feb 2011 CTP of Gepsio to Codeplex. You can get it here. I added release notes to the page, and I encourage you to check those out as well.

With this release, I have moved away from PDF documentation to a wiki-style format, which Codeplex supports nicely. You can find the table of contents here. As of this writing, the documentation is slightly out of date, but I will keep working on it. The wiki format lets you see changes as I make them, rather than having to download a separate document.

Enjoy this latest release!

Sunday, January 30, 2011

Schema Refactoring Check In

The XBRL schema support task that I mentioned in my last two posts is complete! I am very happy to report that, with this change, Gepsio will provide much better support for complex XBRL schemas, including industry-standard schemas such as the US GAAP schema. This support will make it into the Feb 2011 CTP, which will be released shortly.

If you have used a previous Gepsio CTP but ran into errors or thrown XbrlException objects with messages about missing types or other schema-related problems, I encourage you to try the Feb 2011 CTP and load your XBRL documents with it. I am hopeful that you will see a marked improvement in how accurately Gepsio parses your schemas. Some of the Gepsio unit tests parse XBRL documents that use the UK-GAAP-2008-01-15 taxonomy, and those documents load into Gepsio with greater parsing accuracy than ever before.

There is still schema work to be done. The classes that implement the now-obsolete homegrown type system are still in the code base (though unused), and linkbases and arcs are still parsed manually. These items will be addressed in a future release.

Tuesday, January 18, 2011

BUG: Facts From Non-Target Namespaces Not Available from Loaded XBRL Document Object Model in Jan 2011 CTP

XBRL documents that use facts originating from multiple namespaces can be loaded into Gepsio, but not all of the facts will appear in the object model in the Jan 2011 CTP.

Consider, for example, an XBRL document whose schema declares the following namespaces:

Sample XBRL Schema
  <xsd:schema
      xmlns:xsd="http://www.w3.org/2001/XMLSchema"
      xmlns:ci-com="http://xbrl.us/stm/ci/com/2008-03-31"
      xmlns:ci-scf-dir="http://xbrl.us/stm/ci/scf-dir/2008-03-31"
      xmlns:ci-scf-indir="http://xbrl.us/stm/ci/scf-indir/2008-03-31"
      xmlns:ci-sfp-cls="http://xbrl.us/stm/ci/sfp-cls/2008-03-31"
      xmlns:ci-sheci="http://xbrl.us/stm/ci/sheci/2008-03-31"
      xmlns:ci-soc="http://xbrl.us/stm/ci/soc/2008-03-31"
      xmlns:ci-soi="http://xbrl.us/stm/ci/soi/2008-03-31"
      xmlns:ci-spc="http://xbrl.us/stm/ci/spc/2008-03-31"
      xmlns:cmi="http://www.fujitsu.com/xbrl/taxeditor/default"
      xmlns:dei="http://xbrl.us/dei/2008-03-31"
      xmlns:dei-std="http://xbrl.us/dei-std/2008-03-31"
      xmlns:link="http://www.xbrl.org/2003/linkbase"
      xmlns:ref="http://www.xbrl.org/2006/ref"
      xmlns:stm-all-ci="http://xbrl.us/ci/stm-all/2008-03-31"
      xmlns:stm-ci="http://xbrl.us/ci/stm/2008-03-31"
      xmlns:us-gaap="http://xbrl.us/us-gaap/2008-03-31"
      xmlns:us-gaap-all="http://xbrl.us/us-gaap-all/2008-03-31"
      xmlns:us-gaap-std="http://xbrl.us/us-gaap-std/2008-03-31"
      xmlns:us-roles="http://xbrl.us/us-roles/2008-03-31"
      xmlns:us-types="http://xbrl.us/us-types/2008-03-31"
      xmlns:xbrldt="http://xbrl.org/2005/xbrldt"
      xmlns:xbrli="http://www.xbrl.org/2003/instance"
      xmlns:xl="http://www.xbrl.org/2003/XLink"
      xmlns:xlink="http://www.w3.org/1999/xlink"
      elementFormDefault="qualified"
      targetNamespace="http://www.fujitsu.com/xbrl/taxeditor/default">

The target namespace in this example is http://www.fujitsu.com/xbrl/taxeditor/default, and the prefix for this namespace is cmi. Using the Gepsio Jan 2011 CTP, only facts in the cmi namespace will be loaded and available in the Facts collection. Facts in other namespaces will not appear in the collection.
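As a rough illustration of how the target namespace and its prefix can be read from a schema like this one, here is a small Python sketch. It is not Gepsio code (Gepsio is a .NET library). The trimmed schema and the helper name target_namespace_and_prefixes are assumptions made up for the example.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed, self-closing version of the sample schema (illustration only).
SCHEMA = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:cmi="http://www.fujitsu.com/xbrl/taxeditor/default"
    xmlns:us-gaap="http://xbrl.us/us-gaap/2008-03-31"
    targetNamespace="http://www.fujitsu.com/xbrl/taxeditor/default"/>"""


def target_namespace_and_prefixes(schema_bytes):
    """Return (targetNamespace, [prefixes bound to it]) for a schema document."""
    prefixes_by_uri = {}
    target = None
    seen_root = False
    events = ET.iterparse(io.BytesIO(schema_bytes), events=("start-ns", "start"))
    for event, payload in events:
        if event == "start-ns":
            # payload is a (prefix, uri) pair for each xmlns declaration.
            prefix, uri = payload
            prefixes_by_uri.setdefault(uri, []).append(prefix)
        elif not seen_root:
            # The first "start" event is the root <xsd:schema> element.
            seen_root = True
            target = payload.get("targetNamespace")
    return target, prefixes_by_uri.get(target, [])


target, prefixes = target_namespace_and_prefixes(SCHEMA)
print(target, prefixes)
```

For the trimmed schema, this reports the Fujitsu namespace as the target and cmi as its only prefix.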

Why is this happening?

When looking for facts in an XBRL document, Gepsio must determine which XML elements are facts and which are elements defined in the XBRL specification (such as <context> and <period>). In the Jan 2011 CTP (and, indeed, in all of the CTPs to this point), this determination is made by examining the current XML element’s namespace. If the namespace of the element matches the target namespace of the document’s schema, then the element is assumed to be a fact. Elements that are not part of the target namespace are assumed to be something other than an XBRL fact, so they are not read in as facts and are not available in the Facts collection.

To continue the example using the sample XBRL schema shown above, only elements in the cmi namespace will be read in as XBRL facts. Elements in other namespaces, though they are facts, will not be read in as facts and will not appear in the Facts collection. Facts using the us-gaap namespace, for example, will not appear in the Facts collection, even though they are valid XBRL facts.
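The behavior described above can be sketched in a few lines of Python. This is only an illustration of the target-namespace rule, not Gepsio's actual (.NET) code. The instance document, the element names CustomItem and Assets, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Target namespace taken from the sample schema.
TARGET_NS = "http://www.fujitsu.com/xbrl/taxeditor/default"

# Hypothetical, heavily trimmed instance document (illustration only).
SAMPLE_INSTANCE = """\
<xbrli:xbrl xmlns:xbrli="http://www.xbrl.org/2003/instance"
            xmlns:cmi="http://www.fujitsu.com/xbrl/taxeditor/default"
            xmlns:us-gaap="http://xbrl.us/us-gaap/2008-03-31">
  <xbrli:context id="c1"/>
  <cmi:CustomItem contextRef="c1">1</cmi:CustomItem>
  <us-gaap:Assets contextRef="c1">2</us-gaap:Assets>
</xbrli:xbrl>
"""


def namespace_of(element):
    """Extract the namespace URI from ElementTree's '{uri}local' tag form."""
    tag = element.tag
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def local_name(element):
    """Return the tag name without its namespace part."""
    return element.tag.rsplit("}", 1)[-1]


def facts_by_target_namespace(root, target_ns):
    """Treat a child element as a fact only if its namespace equals target_ns."""
    return [local_name(child) for child in root if namespace_of(child) == target_ns]


root = ET.fromstring(SAMPLE_INSTANCE)
loaded = facts_by_target_namespace(root, TARGET_NS)
print(loaded)
```

Under this rule only CustomItem is picked up. The us-gaap Assets element is skipped along with the context element, which is the bug described in this post.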

How will this be fixed in the future?

Gepsio’s fact loader will change in the future. Rather than checking whether an XML element’s namespace matches the schema’s target namespace, the loader will examine the domain in the element’s namespace URI. If the namespace of the element uses the www.xbrl.org or www.w3.org domain, then the element will be assumed to be something other than an XBRL fact. Elements using any other namespace domain will be assumed to be XBRL facts. This rework should allow for a more complete population of an XbrlFragment object’s Facts collection.
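A minimal Python sketch of the planned domain-based rule follows. It takes the two host names literally from the text and is not Gepsio's implementation. The function name is_fact_namespace is an assumption for the example.

```python
from urllib.parse import urlparse

# Hosts named in the text; elements in these namespaces are treated as non-facts.
NON_FACT_HOSTS = {"www.xbrl.org", "www.w3.org"}


def is_fact_namespace(namespace_uri):
    """Return True when the namespace URI's host is not a spec-defined host."""
    host = urlparse(namespace_uri).hostname or ""
    return host not in NON_FACT_HOSTS


# Namespace URIs copied from the sample schema.
checks = {
    "http://www.fujitsu.com/xbrl/taxeditor/default": is_fact_namespace(
        "http://www.fujitsu.com/xbrl/taxeditor/default"),
    "http://xbrl.us/us-gaap/2008-03-31": is_fact_namespace(
        "http://xbrl.us/us-gaap/2008-03-31"),
    "http://www.xbrl.org/2003/instance": is_fact_namespace(
        "http://www.xbrl.org/2003/instance"),
    "http://www.w3.org/1999/xlink": is_fact_namespace(
        "http://www.w3.org/1999/xlink"),
}
for uri, is_fact in checks.items():
    print(uri, is_fact)
```

With this rule, both the cmi and us-gaap namespaces count as fact namespaces, while the xbrli and xlink namespaces do not. Read literally, the rule matches only the two listed hosts, so a namespace such as http://xbrl.org/2005/xbrldt (no www prefix) would still be classified as a fact namespace and may need separate handling.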

When will this be fixed?

I am working on this bug now and am hoping to have this issue resolved in the Feb 2011 CTP.

Tuesday, January 11, 2011

XBRL Schema Support Refactoring Success

In my last post, I mentioned that I was going to embark on a task to replace the homegrown, half-baked implementation of XBRL schema support in Gepsio with an implementation that heavily leverages the XmlSchema and XmlSchemaSet classes already in the .NET Framework. I have spent the last week working on that task, and I am happy to report that things are looking very positive.

The XBRL schema support in the current Gepsio CTP is written from the ground up with no support from the .NET Framework other than XmlDocument. All of the schema parsing and type system management in the current CTP uses a homegrown implementation, and, since Gepsio is in a CTP state, that implementation is incomplete. As a result, some complex schemas used in XBRL documents, and many data types, are not supported and cause Gepsio to throw an exception when they are encountered. Some folks using early versions of the CTP were blocked because the schemas used by the XBRL documents that they were trying to load wouldn’t parse. I’m fairly certain, for example, that the US-GAAP schema can’t be parsed by the current Gepsio CTP.

The exciting aspect of leveraging the XmlSchema and XmlSchemaSet classes available in the .NET Framework for XBRL schema parsing and type management is that the next Gepsio CTP will have complete support for the parsing and type management aspects of complex XBRL schemas. This will give you an even better chance to use Gepsio with XBRL documents that use complex XBRL schemas and taxonomies. Better yet, the new implementation is hidden inside Gepsio’s already-existing XbrlSchema class, so this won’t be a breaking change for anyone. All of this work is being done under the hood, hidden away from you, so you can concentrate on getting information out of your XBRL documents rather than having to worry about the details of schema management. That, after all, is what Gepsio is all about.

Another benefit of this work is that I can delete lots of code from the Gepsio source code base. Any developer can tell you that deleting code is a good thing, because that means that there is less code to manage and maintain and test. Currently, there are entire classes in Gepsio focused on XBRL schema support. There is, for example, an entire set of classes that map to the various simple and complex data types that can be found in an XBRL schema. There is currently a base class called AnyType, from which many other simple and complex type classes derive. Since all of that support will now be coming from the .NET Framework itself, I can happily delete all of those type management classes.

I hope to have this task wrapped up and available in the next CTP, which I plan to release in February 2011. The vastly improved schema and type support alone should be reason enough to grab it.

Monday, January 3, 2011

XBRL Schemas Are Valid XML Schemas?

Way back when, when I started work on Gepsio, my initial XBRL research led me to believe that XBRL schemas were not W3C valid XML schemas. This led me to an implementation approach in which I built my own XBRL schema parser for use with Gepsio.

Now that I am older and wiser (OK, maybe just older), I am going to re-evaluate that position. I see that my as-yet-incomplete implementation of XBRL schema support within Gepsio is causing some bugs to be filed, which will lead to more code that must be written. If my initial assumption was incorrect, and all XBRL schemas are indeed W3C valid XML schemas, then all of this schema parsing and management code is a waste. It would make more sense to use the XmlSchema class in the .NET Framework and let it do all of my schema parsing and type management for Gepsio.

I’ll be taking some time this month to try and prove out the assumption that all XBRL schemas are valid W3C XML schemas. Here’s the approach I will be taking:

  • Retain the existing XbrlSchema class and related parsing and type management code.
  • Add a private field to the XbrlSchema class of type XmlSchema.
  • Along with all of the current work that Gepsio is doing with XbrlSchema, attempt to open up the same schema as a W3C XML Schema with the new XmlSchema member.
  • Run all of the Gepsio unit tests. If the XmlSchema-based parsing fails with a valid XBRL schema on any of the unit tests, then I will know that not all XBRL schemas are W3C XML schemas.

If, as I suspect, all XBRL schemas are indeed valid W3C XML schemas, then I can work to rip out the internal implementation of XbrlSchema and replace it with a new implementation that leverages XmlSchema rather than doing all of the parsing work myself. This will eliminate lots of unneeded schema management code and will keep things cleaner. That would be fine with me; deleting dead code is always worthwhile. In any case, the XbrlSchema class and its public interface will remain as is. Only its internal and private implementations will change so that it leverages the functionality in the .NET XmlSchema class.