Thursday, November 17, 2011

Easy XBRL Scripting with Gepsio and PowerShell

Microsoft’s PowerShell scripting environment provides full support for using .NET objects, which makes it an ideal scripting language for use with Gepsio. PowerShell and Gepsio can work together to make working with XBRL data very easy.

Here is a simple PowerShell script that pulls an XBRL document from the SEC site and displays some simple information about the document:

Add-Type -Path "C:\GepsioPath\JeffFerguson.Gepsio.dll"

$XbrlDoc = New-Object -TypeName JeffFerguson.Gepsio.XbrlDocument

$DocumentLocation = "http://www.sec.gov/Archives/edgar/data/21344/000104746911006790/ko-20110701.xml"
Write-Host "Loading and validating" $DocumentLocation"..."
$XbrlDoc.Load($DocumentLocation)

foreach($CurrentFragment in $XbrlDoc.XbrlFragments)
{
    Write-Host $CurrentFragment.Facts.Count "facts in fragment."
    Write-Host $CurrentFragment.Units.Count "units in fragment."
    Write-Host $CurrentFragment.Contexts.Count "contexts in fragment."
}



This script gives the following output:

Loading and validating http://www.sec.gov/Archives/edgar/data/21344/000104746911006790/ko-20110701.xml...
1081 facts in fragment.
4 units in fragment.
281 contexts in fragment.



Let’s take a look at the script in more detail. It begins by adding the Gepsio runtime into the PowerShell environment with a call to the Add-Type command:

Add-Type -Path "C:\GepsioPath\JeffFerguson.Gepsio.dll"



Every PowerShell script that uses Gepsio will need to use this command so that the Gepsio runtime can be used by the PowerShell script.


Once Gepsio is loaded, a new Gepsio XbrlDocument object is created and saved in a variable:

$XbrlDoc = New-Object -TypeName JeffFerguson.Gepsio.XbrlDocument



This statement creates a new .NET object of type JeffFerguson.Gepsio.XbrlDocument, which is the Gepsio type that represents an XBRL document, and makes it available in a PowerShell script variable called $XbrlDoc.


Once the new XbrlDocument object is created, another PowerShell variable is created to hold the address of the document to be loaded:

$DocumentLocation = "http://www.sec.gov/Archives/edgar/data/21344/000104746911006790/ko-20110701.xml"



This statement creates a new PowerShell script variable called $DocumentLocation and assigns it a string of “http://www.sec.gov/Archives/edgar/data/21344/000104746911006790/ko-20110701.xml”. This is the address of the XBRL document that the script should load.


Once the address of the XBRL document is made available, it is loaded into the Gepsio XbrlDocument object through a call to the XbrlDocument.Load() method:

$XbrlDoc.Load($DocumentLocation)



Gepsio treats an XbrlDocument as a collection of XbrlFragment objects. Each XBRL fragment is XML data having the <xbrl> tag as its root. Generally, an XBRL document will have only one XbrlFragment, although Gepsio supports documents that may have more than one fragment. The script iterates through each fragment in the document and examines the XbrlFragment object’s three main collections: Facts, Units and Contexts. Each collection has a property called Count, and each of these counts is displayed as script output:

foreach($CurrentFragment in $XbrlDoc.XbrlFragments)
{
    Write-Host $CurrentFragment.Facts.Count "facts in fragment."
    Write-Host $CurrentFragment.Units.Count "units in fragment."
    Write-Host $CurrentFragment.Contexts.Count "contexts in fragment."
}



Future blog posts will explore PowerShell’s access to the Gepsio runtime in more detail.

Monday, November 7, 2011

Gepsio Available as a NuGet Package

Beginning with the recently-released Nov 2011 CTP, Gepsio is now available via NuGet as well as from the project Web site. The name of the NuGet package is JeffFerguson.Gepsio, so, from within the Package Manager Console in Visual Studio 2010, you can execute the following command:

PM> Install-Package JeffFerguson.Gepsio

You will also find it under the “Manage NuGet Packages” submenu item of the “Library Package Manager” menu item on the Visual Studio 2010 “Tools” menu. Just search for Gepsio.

Tuesday, November 1, 2011

Nov 2011 CTP Released

I am pleased to announce the Nov 2011 CTP release of Gepsio. This release passes 191 of the tests in the XBRL-CONF-CR3-2007-03-05 conformance suite (which is 32 more than the 159 tests that passed in the last CTP release). This release builds on the previous CTP release in the following ways:

  • First Class Support For Items and Tuples. Previous releases of Gepsio used a single class called Fact to represent a fact in an XBRL fragment. Starting with this release, the Fact class is a base class for two derived classes: one called Item and the other called Tuple. The Facts collection available from an XbrlFragment object is a collection of Fact objects, and each Fact in the collection could, in fact, be either an Item object or a Tuple object. This work brings the XBRL concept of tuples in as a first-class citizen of the Gepsio object model.
  • Complete Support For Essence Aliases. This release of Gepsio passes all of the conformance suite tests relating to the concept of essence aliases.
  • Better Support For Calculation Arc Validation. Gepsio now has a better validation engine for calculation arcs. This release now includes better handling of context-equals and unit-equals items, items with a nil value, and better support for contributing concept items found in tuples.

To download this latest version (numbered 2.1.0.5), go to http://gepsio.codeplex.com and click the “Downloads” tab.

Thursday, October 20, 2011

Support Added for Calculation Arcs with Destination Labels Referencing Multiple Locators

I have just checked in a change to Gepsio that correctly validates calculation arcs that use destination labels that reference more than one locator.

Take a look at the following calculation link:

<calculationLink xlink:type="extended" xlink:role="http://www.xbrl.org/2003/role/link">

<loc xlink:type="locator" xlink:href="397-ABC.xsd#A" xlink:label="summationItem" />

<loc xlink:type="locator" xlink:href="397-ABC.xsd#B" xlink:label="contributingItem" />

<loc xlink:type="locator" xlink:href="397-ABC.xsd#C" xlink:label="contributingItem" />

<calculationArc
xlink:type="arc"
xlink:arcrole="http://www.xbrl.org/2003/arcrole/summation-item"
xlink:from="summationItem"
xlink:to="contributingItem" weight="1"
/>

</calculationLink>



This example, taken from the 397.00 conformance test in the XBRL-CONF-CR3-2007-03-05 conformance suite, defines a calculation arc that arcs from “summationItem” to “contributingItem”. The issue here is that there are two locators that use the “contributingItem” label:



  • 397-ABC.xsd#B

  • 397-ABC.xsd#C

Both of these locators must participate in the calculation arc. In the current CTP of Gepsio, only the fact referenced by the first locator is used to validate the calculation arc. With the new code, both facts referenced by both locators are used to validate the calculation arc.
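
The label resolution described above can be sketched in a few lines. This is a Python illustration only, not Gepsio's C# implementation; the function name locators_for_label is hypothetical, and the markup is the calculation link from the 397.00 test with namespace declarations added so that it parses on its own.

```python
import xml.etree.ElementTree as ET

LINK_NS = "http://www.xbrl.org/2003/linkbase"
XLINK_NS = "http://www.w3.org/1999/xlink"

# The calculation link from the 397.00 test, with namespace declarations
# added (an assumption) so the fragment is well-formed by itself.
CALCULATION_LINK = """
<calculationLink xmlns="http://www.xbrl.org/2003/linkbase"
                 xmlns:xlink="http://www.w3.org/1999/xlink"
                 xlink:type="extended"
                 xlink:role="http://www.xbrl.org/2003/role/link">
  <loc xlink:type="locator" xlink:href="397-ABC.xsd#A" xlink:label="summationItem" />
  <loc xlink:type="locator" xlink:href="397-ABC.xsd#B" xlink:label="contributingItem" />
  <loc xlink:type="locator" xlink:href="397-ABC.xsd#C" xlink:label="contributingItem" />
  <calculationArc xlink:type="arc"
                  xlink:arcrole="http://www.xbrl.org/2003/arcrole/summation-item"
                  xlink:from="summationItem"
                  xlink:to="contributingItem" weight="1" />
</calculationLink>
"""


def locators_for_label(link, label):
    """Return the href of every locator that carries the given xlink:label."""
    return [
        loc.get(f"{{{XLINK_NS}}}href")
        for loc in link.iter(f"{{{LINK_NS}}}loc")
        if loc.get(f"{{{XLINK_NS}}}label") == label
    ]


link = ET.fromstring(CALCULATION_LINK)
arc = link.find(f"{{{LINK_NS}}}calculationArc")

# Resolve the arc's destination label to all matching locators, not just the first.
contributors = locators_for_label(link, arc.get(f"{{{XLINK_NS}}}to"))
print(contributors)  # ['397-ABC.xsd#B', '397-ABC.xsd#C']
```

Returning a list rather than a single locator is the point of the sketch: a lookup that stops at the first match would silently drop 397-ABC.xsd#C.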


This improved functionality will be available in the next CTP to be released (which I am currently targeting for release in the next week or so).

Tuesday, October 11, 2011

Calculation Validation Location Bug Fixed

I have checked in code that fixes a bug in the validation of calculation arc values in Gepsio.

In the current CTP, the value of the xlink:href attribute of a locator element for a calculation arc is used to find the appropriate contributing concept. This is in error; the xlink:label attribute must be used instead. This bug hasn’t been caught until now because many of the XBRL documents in the XBRL-CONF-CR3-2007-03-05 conformance suite use the same value for the resource ID in the xlink:href attribute and for the xlink:label attribute. The 305.07 test, for example, references a calculation linkbase document that includes markup as follows:

<loc
xlink:type="locator"
xlink:href="305_07_decimals_test.xsd#decimals_Land"
xlink:label="decimals_Land"
/>



As you can see, the resource ID of the value of the xlink:href attribute (decimals_Land) matches the value of the xlink:label attribute (decimals_Land), so it didn’t matter which value Gepsio used to find the correct element.


The 395.01 test, however, uses different location markup:

<loc
xlink:type="locator"
xlink:href="SummationItem.xsd#CurrentAsset"
xlink:label="labelCurrentAsset"
xlink:title="CurrentAsset"
/>



In this markup, the resource ID of the value of the xlink:href attribute (CurrentAsset) does not match the value of the xlink:label attribute (labelCurrentAsset) and Gepsio’s use of the resource ID is in error. Gepsio must use the value of the xlink:label attribute to find the correct element.
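
The difference between the two lookups can be shown with the attribute values from the two locators above. This is a Python sketch rather than Gepsio code; the function names are hypothetical, and it assumes that the arc in the 395.01 test refers to the locator by its label, labelCurrentAsset.

```python
# Attribute values copied from the two locators shown in this post.
LOCATORS = [
    {"href": "305_07_decimals_test.xsd#decimals_Land", "label": "decimals_Land"},
    {"href": "SummationItem.xsd#CurrentAsset", "label": "labelCurrentAsset"},
]


def find_by_fragment(locators, arc_end):
    """The buggy lookup: compare the arc end with the resource ID in xlink:href."""
    for locator in locators:
        if locator["href"].split("#", 1)[1] == arc_end:
            return locator
    return None


def find_by_label(locators, arc_end):
    """The corrected lookup: compare the arc end with xlink:label."""
    for locator in locators:
        if locator["label"] == arc_end:
            return locator
    return None


# Works by accident, because the resource ID and the label are identical.
print(find_by_fragment(LOCATORS, "decimals_Land") is not None)      # True
# Fails as soon as the label differs from the resource ID.
print(find_by_fragment(LOCATORS, "labelCurrentAsset") is not None)  # False
print(find_by_label(LOCATORS, "labelCurrentAsset") is not None)     # True
```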


This bug has been fixed and will be available in the next CTP of Gepsio (which I am currently planning on releasing on Nov 01 2011).

Friday, October 7, 2011

Essence Alias Support Complete

I have just checked in code that completes Gepsio’s support of the XBRL essence alias concept. All of the essence alias conformance tests found in the XBRL-CONF-CR3-2007-03-05 conformance test suite (numbered with the 392 prefix) now behave as expected when parsed by Gepsio.

The next CTP of Gepsio will include all of this work and will correctly validate essence aliases found in XBRL documents.

Thursday, October 6, 2011

Dates Available For Instant Period Contexts

I have just checked in a change to the Context class that will appear in the next CTP. The Context class now includes a public property called InstantDate, which exposes a DateTime value. The value of the InstantDate property for a Context reflects the date given for an instant period. This value is valid only when the Context reflects an Instant period. You will be able to write code like the following:

if(CurrentContext.InstantPeriod == true)
{
    var ContextDate = CurrentContext.InstantDate;
}

Friday, September 30, 2011

Facts, Items and Tuples

The current CTP of Gepsio (which, as of this writing, is the Feb 2011 CTP) exposes a collection of Fact objects as a property in an XbrlFragment. With this arrangement, you can write code like this:

var MyXbrlDocument = new XbrlDocument();
MyXbrlDocument.Load("xbrldata.xml");
foreach(var CurrentFragment in MyXbrlDocument.XbrlFragments)
{
    foreach(var CurrentFact in CurrentFragment.Facts)
    {
        // Ooh! An XBRL fact!
    }
}



I have been working on adding support for tuples in the next CTP, and, in doing so, have come to realize that this scenario is incomplete.


The Fact class in the current CTP is a standalone class. It doesn’t inherit from any other class, nor is it a base class for a derived class. The issue with the current CTP is that the only facts that it reads and exposes are single-value facts. Any tuples in the XBRL document are ignored and not included as a Fact instance.


In actuality, the XBRL specification supports two types of facts:



  • single-value facts, called items

  • multiple-value facts, called tuples

The next CTP will provide support for tuples, and, as such, Gepsio’s understanding of a fact will be more complete. In the next CTP, the Fact class will remain, but it will serve as a base class for a new class called Item and another new class called Tuple:

public class Fact
{
// magic happens here
}

public class Item : Fact
{
// more magic happens here
}

public class Tuple : Fact
{
// even more magic happens here
}



Fact collections will remain in place, but Fact items can be examined to determine whether they are, in fact, items or tuples. This updated design will allow Gepsio to detect tuples in loaded XBRL documents and perform the appropriate work needed to expose tuples to Gepsio clients.
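
To illustrate how a client might tell items from tuples in a mixed collection, here is a small Python analogue of the planned hierarchy. It is only a sketch: the members name, value and facts are hypothetical and say nothing about what the real Gepsio classes will expose.

```python
class Fact:
    """Base class shared by items and tuples."""

    def __init__(self, name):
        self.name = name


class Item(Fact):
    """A single-value fact."""

    def __init__(self, name, value):
        super().__init__(name)
        self.value = value


class Tuple(Fact):
    """A multiple-value fact that groups other facts."""

    def __init__(self, name, facts):
        super().__init__(name)
        self.facts = facts


def count_items(facts):
    """Count the items in a fact collection, descending into tuples."""
    total = 0
    for fact in facts:
        if isinstance(fact, Tuple):
            total += count_items(fact.facts)
        elif isinstance(fact, Item):
            total += 1
    return total


facts = [
    Item("Cash", "100"),
    Tuple("Address", [Item("City", "Atlanta"), Item("State", "GA")]),
]
print(count_items(facts))  # 3
```

Because both classes derive from Fact, the collection keeps a single element type while still letting the caller branch on the concrete type of each entry.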

Thursday, September 29, 2011

Rounded Values of Essence Aliased Facts Now Checked

I have just checked in code that will be released with the next CTP of Gepsio. The code validates the rounded values (that is, fact values with any applicable rounding and truncation applied) of essence aliased facts and throws an exception if those values do not match.

For example, consider the following XBRL document:

<xbrl xmlns="http://www.xbrl.org/2003/instance" xmlns:link="http://www.xbrl.org/2003/linkbase" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:example="http://example.com/xbrl/taxonomy/EssenceAlias" xmlns:iso4217="http://www.xbrl.org/2003/iso4217" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://example.com/xbrl/taxonomy/EssenceAlias EssenceAlias.xsd">
<link:schemaRef xlink:href="EssenceAlias.xsd" xlink:type="simple"/>
<unit id="u1">
<measure>iso4217:USD</measure>
</unit>
<context id="c1">
<entity>
<identifier scheme="www.example.com">example</identifier>
</entity>
<period>
<instant>2003-03-31</instant>
</period>
</context>
<example:CurrentDeferredIncomeTaxExpense contextRef="c1" unitRef="u1" precision="4">100.0</example:CurrentDeferredIncomeTaxExpense>
<example:ForeignDomesticIncomeTaxExpense contextRef="c1" unitRef="u1" precision="3">100</example:ForeignDomesticIncomeTaxExpense>
<example:TaxExpense contextRef="c1" unitRef="u1" precision="3">200</example:TaxExpense>
</xbrl>



Now suppose that the CurrentDeferredIncomeTaxExpense and TaxExpense facts are paired in an essence alias relationship, as shown in the following linkbase document:

<?xml version="1.0" encoding="UTF-8"?>
<linkbase xmlns="http://www.xbrl.org/2003/linkbase" xmlns:xbrll="http://www.xbrl.org/2003/linkbase" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.xbrl.org/2003/linkbase ../lib/xbrl-linkbase-2003-12-31.xsd">
<definitionLink xlink:type="extended" xlink:role="http://www.xbrl.org/2003/role/link">
<loc xlink:type="locator" xlink:href="EssenceAlias.xsd#TaxExpense" xlink:label="labelTaxExpense" xlink:title="TaxExpense"/>
<loc xlink:type="locator" xlink:href="EssenceAlias.xsd#ForeignDomesticIncomeTaxExpense" xlink:label="labelForeignDomesticIncomeTaxExpense" xlink:title="ForeignDomesticIncomeTaxExpense"/>
<loc xlink:type="locator" xlink:href="EssenceAlias.xsd#CurrentDeferredIncomeTaxExpense" xlink:label="labelCurrentDeferredIncomeTaxExpense" xlink:title="CurrentDeferredIncomeTaxExpense"/>
<definitionArc xlink:type="arc" xlink:arcrole="http://www.xbrl.org/2003/arcrole/essence-alias" xlink:from="labelTaxExpense" xlink:to="labelCurrentDeferredIncomeTaxExpense" priority="0"/>
<definitionArc xlink:type="arc" xlink:arcrole="http://www.xbrl.org/2003/arcrole/essence-alias" xlink:from="labelTaxExpense" xlink:to="labelForeignDomesticIncomeTaxExpense" priority="0"/>
</definitionLink>
</linkbase>



The XBRL document is invalid, because the two facts are in an invalid essence alias relationship. The relationship is invalid because the rounded values of the facts differ. In the current CTP, this issue is not reported by Gepsio. Starting with the next CTP, this issue will be reported during validation through an XbrlException thrown back to the caller. The exception will contain a message describing the mismatched values. For the document above, Gepsio will provide a message that reads as follows:


Facts named TaxExpense are defined as being in an essence alias relationship with facts named CurrentDeferredIncomeTaxExpense. However, the fact with ID  has a rounded value of 200, which differs from the fact with ID , which has a rounded value of 100. These two facts are therefore not in a valid essence alias relationship.


(Gepsio will include the fact IDs in the message. The document shown above has no IDs for the facts, so that part of the message is empty.)
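
The comparison can be sketched as follows. This Python example assumes that the precision attribute gives the number of significant figures to keep, and it uses half-up rounding as a placeholder; the exact rounding rules that Gepsio applies are not described here, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_precision(value, precision):
    """Round a decimal string to the given number of significant figures."""
    number = Decimal(value)
    if number == 0:
        return number
    # adjusted() is the exponent of the most significant digit,
    # e.g. 2 for 100.0 and 5 for 123456.
    exponent = number.adjusted() - precision + 1
    quantum = Decimal(1).scaleb(exponent)
    return number.quantize(quantum, rounding=ROUND_HALF_UP)


def rounded_values_match(value_a, precision_a, value_b, precision_b):
    """Compare two fact values after each is rounded to its own precision."""
    return round_to_precision(value_a, precision_a) == round_to_precision(
        value_b, precision_b
    )


# TaxExpense (200, precision 3) against CurrentDeferredIncomeTaxExpense (100.0, precision 4)
print(rounded_values_match("200", 3, "100.0", 4))  # False
```

With the values from the sample document, the rounded values are 200 and 100.0, so the check fails and the essence alias relationship is reported as invalid.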

Tuesday, May 17, 2011

Calculation Arc Validation Errors Are Common?

I ran across this tweet from a gentleman named Tom Gleeson about Gepsio's support for calculation arcs. Gepsio was failing to validate one of Oracle's XBRL documents posted through EDGAR. Because the Feb 2011 Gepsio CTP passes the Section 320 Calculation Binding tests in the XBRL-CONF-CR3-2007-03-05 conformance suite, I was initially confident in Gepsio's calculation arc validation code, though I wouldn't be surprised to find an issue that the tests don't cover. I'm certainly not smug enough to say "Gepsio must be right and the document must be wrong".

In this follow-up tweet, Tom alludes to the fact that Gepsio failed to validate one of the document's calculation arcs but also mentions that this is an "unfortunately very common" problem. In another follow-up tweet, he mentions that this same data is viewable in Arelle without any problems.

I was intrigued by Tom's comment that calculation arc validation errors are a common problem. Since I am still learning about XBRL and the community in which it lives, I wanted to open some questions for discussion:
  1. Are calculation arc validation errors a common problem within documents generated by the XBRL community, or is this simply an issue that will resolve itself as Gepsio matures?
  2. Would it be beneficial, as Tom suggests in a post, for Gepsio to support various "levels of correctness" whereby, in a more lenient validation mode, validation errors are overlooked?
Your comments are welcome and encouraged! Simply add a comment to this blog post.

I will mention that, in the general case, Gepsio validates the document after most (if not all) properties have been populated. This implies that, if you place a try/catch block around the code that loads an XBRL document into Gepsio, and the catch block catches an XbrlException instance that references a problem with the validation, your code may still be able to carry on with at least some of its work, since many of the XbrlFragment properties will be populated and ready for use. The XbrlDocument instance should, in the general case, be in a known, stable state that will allow for its continued use in code. (This suggests that Gepsio should distinguish exceptions that leave the XbrlDocument instance in a stable state from exceptions that have destabilized the loaded object model - I'll consider that idea for a future version of Gepsio.)
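
The populate-first, validate-last ordering can be illustrated with a small Python sketch. Everything in it is a stand-in: Document, ValidationError and the total check are hypothetical and are not Gepsio's API. It only shows why data loaded before a validation failure can remain usable after the exception is caught.

```python
class ValidationError(Exception):
    """Stand-in for a validation failure raised after loading."""


class Document:
    def __init__(self):
        self.facts = {}

    def load(self, raw_facts):
        # Populate the object model first ...
        self.facts = dict(raw_facts)
        # ... and validate only afterwards.
        self._validate()

    def _validate(self):
        parts = sum(value for name, value in self.facts.items() if name != "Total")
        total = self.facts.get("Total")
        if parts != total:
            raise ValidationError(f"Total is {total} but the parts sum to {parts}")


doc = Document()
try:
    doc.load({"A": 100, "B": 100, "Total": 300})
except ValidationError as error:
    print("validation failed:", error)

# The facts were populated before validation ran, so they are still available.
print(len(doc.facts), "facts still available")
```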

Special thanks to Tom for trying Gepsio and for alerting me to the issue!

Wednesday, February 2, 2011

Next Up: Essence-Alias Arcs

The first plan for the next CTP is to support essence-alias arcs as defined in section 5.2.6.2.2 of the XBRL spec. The XBRL-CONF-CR3-2007-03-05 conformance suite used in the largest of the Gepsio unit tests contains 17 examples of valid and invalid XBRL documents using the essence-alias arc, and the current Gepsio CTP does not pass all of the 17 tests (in fact, it passes the first five of the 17 essence-alias tests, but that’s pure coincidence, since those documents are marked as valid and Gepsio is currently not looking at essence-alias constructs).

This task will also lay some groundwork for removing more hardcoded parsing of XBRL schema elements. You may remember that the Feb 2011 CTP moved Gepsio from the hardcoded interpretation of schema elements over to support provided by the XmlSchema and XmlSchemaSet classes in the .NET Framework. While this work was successful and led to a drastic improvement in Gepsio’s honoring of complex XBRL taxonomies, the definition arcs are still currently interpreted without the help of the .NET Framework. Any of the definition arcs found in the <appinfo> element of a schema are manually discovered and interpreted. This includes the footnote and calculation arcs that Gepsio already supports.

Gepsio’s support for the essence-alias arcs will be implemented based on their discovery within a compiled XmlSchemaSet (once I figure out where schema appinfo is buried inside the XmlSchemaSet object graph). Once Gepsio can find and interpret essence-alias appinfo within a compiled XmlSchemaSet, the design pattern will exist to allow the eventual refactoring of the existing appinfo support code to use similar constructs.

While I am on the subject of refactoring, I must tip my virtual hat to the XBRL-CONF-CR3-2007-03-05 conformance suite. It has been an invaluable tool in ensuring that Gepsio is behaving as the XBRL spec mandates. I don’t think that I would have taken on the work of moving from a manually interpreted XBRL schema to a schema interpreted by .NET for the Feb 2011 CTP if I weren’t confident that the conformance suite would catch errors that I introduced during that refactoring process. If I ever were to implement another specification like XBRL, I would certainly ensure that a conformance suite is available to validate my work. The importance of unit testing through a conformance suite published by the specification committee cannot be overstated.

Monday, January 31, 2011

Gepsio Feb 2011 CTP Released

I have just released the Feb 2011 CTP of Gepsio to Codeplex. You can get it here. I added release notes to the page, and I encourage you to check those out as well.

With this release, I have moved away from PDF documentation and into a Wiki-style format which Codeplex supports nicely. You can find the table of contents here. As of this writing, the documentation is slightly out of date, but I will work on it moving forward. The wiki format allows you to see changes as I make them, rather than downloading a separate document.

Enjoy this latest release!

Sunday, January 30, 2011

Schema Refactoring Check In

The XBRL schema support task that I have mentioned in my last two posts is complete! I am very happy to report that, with this change, Gepsio will provide much better support for complex XBRL schemas, including industry standard schemas such as the US GAAP schema. This support will make it into the Feb 2011 CTP, which will be released shortly.

If you have used a previous Gepsio CTP, but were running across errors or thrown XbrlException objects with messages relating to missing types or other schema-related errors, I encourage you to try the Feb 2011 CTP and load your XBRL documents with it. I am hopeful that you will see a marked difference in the level of support Gepsio offers for ensuring that your schemas are parsed with greater accuracy. Some of the Gepsio unit tests parse XBRL documents that use the UK-GAAP-2008-01-15 taxonomy, and those documents load into Gepsio with greater parsing accuracy than ever before.

There is still schema work to be done. The classes that implement the now-obsolete homegrown type system are still in the code base (though unused), and linkbases and arcs are still parsed manually. These items will be addressed in a future release.

Tuesday, January 18, 2011

BUG: Facts From Non-Target Namespaces Not Available from Loaded XBRL Document Object Model in Jan 2011 CTP

XBRL documents that use facts originating from multiple namespaces can be loaded into Gepsio, but not all of the facts will appear in the object model in the Jan 2011 CTP.

Consider, for example, an XBRL document whose schema declares the following namespaces:

Sample XBRL Schema
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:ci-com="http://xbrl.us/stm/ci/com/2008-03-31" xmlns:ci-scf-dir="http://xbrl.us/stm/ci/scf-dir/2008-03-31" xmlns:ci-scf-indir="http://xbrl.us/stm/ci/scf-indir/2008-03-31" xmlns:ci-sfp-cls="http://xbrl.us/stm/ci/sfp-cls/2008-03-31" xmlns:ci-sheci="http://xbrl.us/stm/ci/sheci/2008-03-31" xmlns:ci-soc="http://xbrl.us/stm/ci/soc/2008-03-31" xmlns:ci-soi="http://xbrl.us/stm/ci/soi/2008-03-31" xmlns:ci-spc="http://xbrl.us/stm/ci/spc/2008-03-31" xmlns:cmi="http://www.fujitsu.com/xbrl/taxeditor/default" xmlns:dei="http://xbrl.us/dei/2008-03-31" xmlns:dei-std="http://xbrl.us/dei-std/2008-03-31" xmlns:link="http://www.xbrl.org/2003/linkbase" xmlns:ref="http://www.xbrl.org/2006/ref" xmlns:stm-all-ci="http://xbrl.us/ci/stm-all/2008-03-31" xmlns:stm-ci="http://xbrl.us/ci/stm/2008-03-31" xmlns:us-gaap="http://xbrl.us/us-gaap/2008-03-31" xmlns:us-gaap-all="http://xbrl.us/us-gaap-all/2008-03-31" xmlns:us-gaap-std="http://xbrl.us/us-gaap-std/2008-03-31" xmlns:us-roles="http://xbrl.us/us-roles/2008-03-31" xmlns:us-types="http://xbrl.us/us-types/2008-03-31" xmlns:xbrldt="http://xbrl.org/2005/xbrldt" xmlns:xbrli="http://www.xbrl.org/2003/instance" xmlns:xl="http://www.xbrl.org/2003/XLink" xmlns:xlink="http://www.w3.org/1999/xlink" elementFormDefault="qualified" targetNamespace="http://www.fujitsu.com/xbrl/taxeditor/default">

The target namespace in this example is http://www.fujitsu.com/xbrl/taxeditor/default, and the prefix for this namespace is cmi. Using the Gepsio Jan 2011 CTP, only facts in the cmi namespace will be loaded and available in the Facts collection. Facts in other namespaces will not appear in the collection.

Why is this happening?

When looking for facts in an XBRL document, Gepsio must determine which XML elements are indeed facts and which are elements defined in the XBRL specification (such as <context> and <period>). In the Jan 2011 CTP (and, indeed, all of the CTPs to this point), this determination is made by examining the current XML element’s namespace. If the namespace of the element matches the target namespace of the document’s schema, then the element is assumed to be a fact. Elements that are not a part of the target namespace are assumed to be something other than an XBRL fact, so they are not read in as a fact and not available in the Fact collection.

To continue the example using the sample XBRL schema shown above, only facts in the cmi namespace will be read in as an XBRL fact. Other elements in other namespaces, though they are facts, will not be read in as a fact and will not appear in the Fact collection. Facts using the us-gaap namespace, for example, will not appear in the Fact collection, even though they are indeed valid XBRL facts.

How will this be fixed in the future?

Gepsio’s fact loader will change in the future. Rather than checking whether an XML element’s namespace matches the schema’s target namespace, the loader will examine the domain in the element’s namespace URI. If the namespace of the element lists the www.xbrl.org or www.w3.org domain, then the element will be assumed to be something other than an XBRL fact. Elements using any other namespace domain will be assumed to be an XBRL fact. This rework should allow for a more complete population of an XbrlFragment object’s Facts collection.
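
The planned rule can be sketched in Python as a simple host check. This is an illustration of the rule as stated, not the eventual Gepsio code; the function name looks_like_fact is hypothetical.

```python
from urllib.parse import urlparse

# Domains whose namespaces are treated as XBRL or XML infrastructure, per the rule above.
SPEC_HOSTS = {"www.xbrl.org", "www.w3.org"}


def looks_like_fact(namespace_uri):
    """Return True when an element in this namespace should be treated as a fact."""
    return urlparse(namespace_uri).hostname not in SPEC_HOSTS


print(looks_like_fact("http://www.xbrl.org/2003/instance"))              # False
print(looks_like_fact("http://xbrl.us/us-gaap/2008-03-31"))              # True
print(looks_like_fact("http://www.fujitsu.com/xbrl/taxeditor/default"))  # True
```

A literal host comparison like this one would also classify http://xbrl.org/2005/xbrldt (no www prefix) as a fact namespace, so a real implementation would likely need to decide how to treat such variants.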

When will this be fixed?

I am working on this bug now and am hoping to have this issue resolved in the Feb 2011 CTP.

Tuesday, January 11, 2011

XBRL Schema Support Refactoring Success

In my last post, I mentioned that I was going to embark on a task to replace the homegrown, half baked implementation of XBRL schema support in Gepsio with an implementation that heavily leverages the XmlSchema and XmlSchemaSet classes already in the .NET Framework. I have spent the last week working on that task, and I am happy to report that things are looking very positive.

The XBRL schema support in the current Gepsio CTP is written from the ground up with no support from the .NET Framework other than XmlDocument. All of the schema parsing and type system management in the current CTP uses a homegrown implementation. Since Gepsio is in a CTP state, this support is incomplete. Because of the incomplete, homegrown schema support, some complex schemas used in XBRL documents, and many data types, are not supported and cause Gepsio to throw an exception when they are encountered. Because of these issues, some folks using early versions of the CTP were blocked from using it because the schemas used by the XBRL documents that they were trying to load wouldn’t parse. I’m fairly certain, for example, that the US-GAAP schema can’t be parsed by the current Gepsio CTP.

The exciting aspect of leveraging the XmlSchema and XmlSchemaSet classes available in the .NET Framework for XBRL schema parsing and type management is that the next Gepsio CTP will have complete support for the parsing and type management aspects of complex XBRL schemas. This will give you an even better chance to use Gepsio with XBRL documents that use complex XBRL schemas and taxonomies. Better yet, the new implementation is hidden inside Gepsio’s already-existing XbrlSchema class, so this won’t be a breaking change for anyone. All of this work is being done under the hood, hidden away from you, so you can concentrate on getting information out of your XBRL documents rather than having to worry about the details of schema management. That, after all, is what Gepsio is all about.

Another benefit of this work is that I can delete lots of code from the Gepsio source code base. Any developer can tell you that deleting code is a good thing, because that means that there is less code to manage and maintain and test. Currently, there are entire classes in Gepsio focused on XBRL schema support. There is, for example, an entire set of classes that map to the various simple and complex data types that can be found in an XBRL schema. There is currently a base class called AnyType, from which many other simple and complex type classes derive. Since all of that support will now be coming from the .NET Framework itself, I can happily delete all of those type management classes.

I hope to have this task all wrapped up and available in the next CTP. At this point, I hope to release the next CTP in February 2011. I hope to announce that you should grab the next CTP simply because of the vastly improved schema and type support that it will provide.

Monday, January 3, 2011

XBRL Schemas Are Valid XML Schemas?

Way back when, when I started work on Gepsio, my initial XBRL research led me to believe that XBRL schemas were not W3C valid XML schemas. This led me to an implementation approach in which I built my own XBRL schema parser for use with Gepsio.

Now that I am older and wiser (OK, maybe just older), I am going to re-evaluate that position. I see that my as-yet-incomplete implementation of XBRL schema support within Gepsio is causing some bugs to be filed, which will lead to more code that must be written. If my initial assumption was incorrect, and all XBRL schemas are indeed W3C valid XML schemas, then all of this schema parsing and management code is a waste. It would make more sense to use the XmlSchema class in the .NET Framework and let it do all of my schema parsing and type management for Gepsio.

I’ll be taking some time this month to try and prove out the assumption that all XBRL schemas are valid W3C XML schemas. Here’s the approach I will be taking:

  • Retain the existing XbrlSchema class and related parsing and type management code.
  • Add a private field to the XbrlSchema class of type XmlSchema.
  • Along with all of the current work that Gepsio is doing with XbrlSchema, attempt to open up the same schema as a W3C XML Schema with the new XmlSchema member.
  • Run all of the Gepsio unit tests. If the XmlSchema-based parsing fails with a valid XBRL schema on any of the unit tests, then I will know that not all XBRL schemas are W3C XML schemas.

If, as I suspect, all XBRL schemas are indeed valid W3C XML schemas, then I can work to rip out the internal implementation of XbrlSchema and replace it with a new implementation that leverages XmlSchema rather than doing all of the parsing work myself. This will eliminate lots of unneeded schema management code and will keep things cleaner. In any case, the XbrlSchema class and its public interface will remain as is. Only its internal and private implementations will change so that it leverages the functionality in the .NET XmlSchema class. That would be fine with me; deleting dead code is always worthwhile.