Thursday, June 6, 2013

New Document Validation Design Shipping In Next CTP

I have, after a long break, checked in a pretty significant changeset for Gepsio that will change how consumers check the validity of a loaded XBRL document. The idea was originally discussed in this post, and, after some work, this scheme has been put into place.

The XbrlException class is no longer supported with this change, and code currently making use of this exception class will need to be updated to remove it. Instead of catching an XbrlException instance when a document is loaded, do the following:

  • Load an XBRL document as usual.
  • Check the state of the XbrlDocument’s IsValid property. The IsValid property will be true if the document is valid according to XBRL conformance rules and false if the document is not valid.
  • If the document is not valid, the XbrlDocument’s ValidationErrors collection will contain more information on the validation errors. This collection is a .NET collection of ValidationError objects, each of which contains a property called Message which describes the error.

Using C#, XBRL document loading and validation code will look something like this:

var newXbrlDocument = new XbrlDocument();
newXbrlDocument.Load(instanceXmlSourceFullPath);
if (newXbrlDocument.IsValid == false)
{
    foreach (var currentValidationError in newXbrlDocument.ValidationErrors)
    {
        var validationMessage = currentValidationError.Message;
        // display message
    }
}

The ValidationError class is a base class, and, depending on the error, certain derived classes may contain more context to describe the error. For example, validation errors in calculation linkbases might reference the contributing facts and the summation fact involved in the failed calculation.
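Callers that want more context than the Message string can test for the more specific derived classes. A sketch of the idea, noting that the derived class name and property shown here are hypothetical illustrations rather than confirmed API:

```csharp
// Hypothetical sketch: "CalculationValidationError" and its "SummationFact"
// property are illustrative only; check the shipped object model for the
// real derived class names.
foreach (var validationError in newXbrlDocument.ValidationErrors)
{
    Console.WriteLine(validationError.Message);
    var calculationError = validationError as CalculationValidationError;
    if (calculationError != null)
    {
        // A calculation error might expose the facts behind the failure.
        Console.WriteLine("  summation fact: " + calculationError.SummationFact);
    }
}
```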

The original XbrlException scheme caused Gepsio to stop validating an XBRL document as soon as the first error was found. This new scheme allows Gepsio to completely validate an XBRL document and report on all of the errors in one pass.

This change will ship with the next CTP, which I hope to release soon. It’s been a while.

Sunday, September 9, 2012

Sep 2012 CTP Released

The Sep 2012 CTP release of Gepsio is now available! Download the latest binary from the project’s Download page.

Here’s a peek at the newest enhancements to the latest build of Gepsio:

Role Types

Role types found in schemas are now available from within Gepsio. The XbrlSchema class now contains a property called RoleTypes. This property is a collection of objects of a new class called RoleType. Objects of the RoleType class expose any role types defined in the schema.
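Once a schema is in hand, enumerating its role types is a simple loop. A minimal sketch, assuming an XbrlSchema instance named schema is already available; since the members of the RoleType class are not described here, only the object itself is printed:

```csharp
// Assumes "schema" is an XbrlSchema obtained from a loaded document.
foreach (var roleType in schema.RoleTypes)
{
    Console.WriteLine(roleType); // inspect RoleType members for details
}
```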

HTTP-Based Schema Linkbase References

Gepsio now supports HTTP-based schema linkbase references. Previous releases assumed that all schema linkbase references were based on local filesystem paths.

CalculationLink.Linkbase Property

The CalculationLink class contains a new property called Linkbase, which references the LinkbaseDocument object containing the calculation link.

LinkbaseDocument.Schema Property

The LinkbaseDocument class contains a new property called Schema, which references the XbrlSchema class referencing the linkbase document.

SummationConcept.Link Property

The SummationConcept class contains a new property called Link, which references the CalculationLink object containing the summation concept.
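Taken together, these three new properties let code walk from a summation concept back up to the schema that ultimately defines it. A quick sketch, assuming a SummationConcept instance named summationConcept is already in hand:

```csharp
// Walk from a summation concept up to its schema via the new properties.
// Assumes "summationConcept" was obtained elsewhere (e.g., during a
// calculation validation pass); only the property chain is shown here.
var calculationLink = summationConcept.Link;     // the containing CalculationLink
var linkbaseDocument = calculationLink.Linkbase; // the containing LinkbaseDocument
var schema = linkbaseDocument.Schema;            // the referencing XbrlSchema
```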

Resolved Issues

Work Item 9401: Valid XBRL Doc Failing To Be Loaded

The latest Amazon quarterly XBRL filing passes the other validation tests I used; however, attempting to Load the doc in Gepsio throws an error. Since it is an Object Ref Not Set to Instance error, it is not entirely clear what caused it, although the final line in the stack trace is JeffFerguson.Gepsio.QualifiedName.Equals(Object obj). I attach the filing for your ref. Any ideas on how to get around this?

A bug in the QualifiedName equality testing code failed to detect various null conditions. This has been fixed.
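The null-safe pattern now in place looks roughly like the following sketch; the field names (LocalName, NamespaceUri) are assumptions for illustration and may not match the actual QualifiedName implementation:

```csharp
public override bool Equals(object obj)
{
    var other = obj as QualifiedName;
    if (other == null)
        return false; // covers both null and non-QualifiedName arguments
    // Compare fields only after guarding against null on either side.
    if (this.LocalName == null || other.LocalName == null)
        return this.LocalName == other.LocalName;
    return this.LocalName == other.LocalName
        && this.NamespaceUri == other.NamespaceUri;
}
```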

Work Item 7843: The given path's format is not supported

Fixed bug that allowed paths of the form "file:///C:/blah/blah/http://blah/blah.org" to be created in GetFullLinkbasePath when filings that reference remote documents are stored locally. This caused a NotSupportedException to be thrown. Supplied as a patch from Codeplex user matthewschrager.
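The essence of the fix is to leave fully qualified HTTP references alone and only combine filesystem-relative references with the local document path. A sketch of that guard (method and parameter names here are illustrative, not the actual Gepsio source):

```csharp
private static string GetFullLinkbasePath(string linkbaseReference, string containingDocumentPath)
{
    // Remote references are already complete URIs; prepending a local path
    // would produce "file:///C:/...http://..." and a NotSupportedException.
    if (linkbaseReference.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
        linkbaseReference.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
    {
        return linkbaseReference;
    }
    return System.IO.Path.Combine(containingDocumentPath, linkbaseReference);
}
```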

Work Item 9465: WebException Objects Thrown During XbrlSchema Creation Are Not Wrapped in XbrlException Objects

The XbrlSchema constructor uses an XmlTextReader to read an XBRL schema. If the URI for the XBRL schema to be read is an HTTP-based URI, then the XmlTextReader will use the .NET Web stack to read the schema using HTTP. If something fails during that process, the .NET Web stack will throw a WebException. Thrown WebException objects were not wrapped in an XbrlException object and were consequently thrown back to the client as a WebException object.

To be clear that the issue is an XBRL issue caused by an HTTP failure, the XBRL schema creation code now creates an XbrlException object, stores the caught WebException as an inner exception to the XbrlException, and throws the XbrlException object back up to the client.
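In sketch form, the wrapping described above looks something like this (assuming XbrlException offers a message-and-inner-exception constructor):

```csharp
try
{
    var schemaReader = new XmlTextReader(schemaUri); // may read over HTTP
    // ... read and process the schema ...
}
catch (WebException webException)
{
    // Report the failure as an XBRL problem, but preserve the original
    // HTTP error as the inner exception for diagnosis.
    throw new XbrlException("Cannot read XBRL schema at " + schemaUri, webException);
}
```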

Work Item 9571: No Support for Taxonomy Role Types

Role types found in schemas are now available from within the object model. The XbrlSchema class now contains a property called RoleTypes. This property is a collection of objects of a new class called RoleType. Objects of the RoleType class expose any role types defined in the schema.

Thursday, September 6, 2012

Gepsio on Twitter

I’ve just opened a Twitter account for Gepsio announcements, so I can keep them separate from my main (unfocused) Twitter account. I’m planning on using Gepsio’s Twitter account in much the same way as I use Gepsio’s Facebook page (www.facebook.com/gepsio) in that I will be announcing blog postings, release notes, bug fixes, and answering your questions.
Follow Gepsio on Twitter at @GepsioXbrl … see you there!

Monday, September 3, 2012

Gepsio XBRL Validation Strategies: The Present and the Future

I’ve just received the following email:

I’ve had a chance by now to examine a fair number of SEC XBRL documents using Gepsio with my [code]. I’m consistently seeing around 9-10% of the documents in a given processing session that fail to load …

In my project … I am processing a fixed number of specific data points.

I only need a few data points with their durations, start and end dates, along with basic entity information and fiscal period focus info. But I need this for a large number of documents.

I am wondering if the careful and thorough validation done on document.Load, which is really necessary for many users of Gepsio, might be overkill for what I am doing.

Could some of this validation be preventing the load of documents that might be flawed in ways that would leave them still usable for my project?

I don’t know much about XML or XBRL, but I get the sense that some of the validation occurring is relevant mainly to the presentation of the data, not so much putting the data into a db.

Does it make any sense to think about an optional overloaded version of the document.Load method with parameters that could cause it to skip some validation of things that might be peripheral to processes like mine?

This is a very thought-provoking question, as it echoes a thought that I have had for some time. Let me begin by explaining Gepsio’s current approach to reporting XBRL validation issues and then follow up with a possible design change.

Gepsio’s Current XBRL Validation Strategy

Today, Gepsio validates a loaded document against the XBRL specification after the document is loaded. An exception is thrown when the first XBRL specification violation is discovered. At a high level, the algorithm is as follows:

  1. load document as XML document
  2. if XML is invalid throw XML exception
  3. read taxonomy schema references
  4. read contexts
  5. read units
  6. read facts
  7. read footnote links
  8. validate context references
  9. if validation violation found throw XBRL exception
  10. validate unit references
  11. if validation violation found throw XBRL exception
  12. validate context time spans against period types
  13. if validation violation found throw XBRL exception
  14. validate footnote locations
  15. if validation violation found throw XBRL exception
  16. validate footnote arcs
  17. if validation violation found throw XBRL exception
  18. validate items
  19. if validation violation found throw XBRL exception

Today’s pattern asks callers to write code like this:

var myDoc = new XbrlDocument();
try
{
    myDoc.Load("MyXbrlDocument.xml");
}
catch (XbrlException e)
{
    Console.WriteLine("Validation Failed!");
    Console.WriteLine(e.Message);
}



With today’s pattern, any exception thrown by the validation code forfeits the rest of the validation process. If a violation is found while validating unit references, for example, the context time spans, footnotes, and items are never run through validation at all. Once an XBRL document is deemed invalid, no further validation is attempted. The analogy is a source code compiler that stops the compilation process after the first error.


Gepsio’s Possible XBRL Validation Future


As an alternative, Gepsio could stop throwing exceptions on validation errors and instead build a list of validation errors that could be examined later. In the alternative design that I have considered (a design shamelessly “lifted” from the CSLA.NET business objects framework), an XBRL document loaded by Gepsio would maintain a Boolean property indicating its validity as well as a collection of validation errors. The caller could examine both after Gepsio loads a document and use that information to decide whether to proceed with the planned operation against the loaded document. This design could make document loading look something like this:

var myDoc = new XbrlDocument();
myDoc.Load("MyXbrlDocument.xml");
if (myDoc.IsValid == false)
{
    Console.WriteLine("Validation Failed!");
    foreach (var validationError in myDoc.ValidationErrors)
    {
        Console.WriteLine(validationError);
    }
}



This design takes advantage of two hypothetical additions to Gepsio’s XbrlDocument class:



  1. A read-only Boolean property called IsValid. Gepsio would set this value to true if all validation rules passed and false if at least one validation rule failed.
  2. A collection of validation error objects, with each object describing information about the error. In the simple design above, these are simple strings; however, in an actual design these may be full Gepsio-namespaced objects that describe a validation error with as much fidelity as possible to identify the root of the problem as well as the error category. Callers may want to distinguish between a unit reference error and a calculation summation error, and a simple string would not give that kind of fidelity.

In this design, no exceptions are thrown, allowing Gepsio to perform as much validation as possible.
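One possible shape for those error objects, sketched purely for discussion (none of these class names exist in Gepsio today):

```csharp
public class ValidationError
{
    public string Message { get; internal set; }
}

// Derived classes could carry category-specific context.
public class UnitReferenceValidationError : ValidationError
{
    public string UnitId { get; internal set; }
}

public class CalculationValidationError : ValidationError
{
    public Fact SummationFact { get; internal set; }
}
```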


Feedback Welcome


Your feedback is welcome. Do you like the current exception-based validation scheme? Do you favor the validity design shown above? Or do you have another idea? Add your feedback to a comment on this post or post a message to the Gepsio Facebook page at http://www.facebook.com/gepsio.

Monday, August 27, 2012

Importing XBRL Fact Values Into SQL Server Using PowerShell and Gepsio

A Gepsio user has just written in with a success story on importing XBRL data into a SQL Server database using SQL Server Integration Services (SSIS), PowerShell and Gepsio. The solution was too good not to share, and, with the user's permission, this blog post will describe that solution.

The user was tasked with importing some XBRL data into a SQL Server database instance. The overall architectural idea was to use an SSIS package to run a PowerShell script which would create a comma separated values (CSV) file from a raw XBRL document, and the values in the CSV would be used to import data into the database.

The first trick, which isn't really XBRL-specific but still worth mentioning, is to get SSIS to run a PowerShell script. Basically, PowerShell scripts are run from within an SSIS package by adding an "Execute Process Task" and using a command line in the following form:

C:\[PATHTOPOWERSHELL]\PowerShell.exe -ExecutionPolicy ByPass -command ". 'C:\SCRIPTPATH\MyScript.ps1' 'param1' 'param2'"



Read the forum discussion at http://social.msdn.microsoft.com/Forums/en-NZ/sqlintegrationservices/thread/216d2ee6-0f04-480f-808d-8241bc4a8d18 for more information about this process.


The next trick, of course, is creating the CSV file from the raw XBRL document. The user turned to PowerShell and Gepsio for this work, and life became a lot easier. Here is the example PowerShell script that can do this work:

param([string]$instFile = "C:\XBRLDOCPATH\XbrlDocument.xml")

# load Gepsio
Add-Type -Path "C:\GEPSIOPATH\JeffFerguson.Gepsio.dll"
$XbrlDoc = New-Object -TypeName JeffFerguson.Gepsio.XbrlDocument
$XbrlDoc.Load($instFile)
$instCSV = "C:\OUTPUTPATH\Allinfo.csv"

New-Item $instCSV -type "file" -force -value "EntityRegName,EntityFilerCat,FactName,Value`r`n"
$stream = New-Object System.IO.StreamWriter($instCSV, $true)

[string] $script:Entity = ""
try
{
    foreach ($CurrentFragment in $XbrlDoc.XbrlFragments)
    {
        GetEntityInfo $CurrentFragment
        try
        {
            WriteItemCSV $CurrentFragment "EarningsPerShareBasic"
        }
        catch
        {
            Write-Error("us-gaap_EarningsPerShareBasic: " + $_)
        }
        try
        {
            WriteItemCSV $CurrentFragment "NetIncomeLoss"
        }
        catch
        {
            Write-Error("us-gaap_NetIncomeLoss: " + $_)
        }
    }
}
catch
{
    Write-Error("main foreach write loop: " + $_)
}
finally
{
    $stream.Close()
    $stream.Dispose()
}

Function GetEntityInfo
{
    param($fragment)
    $script:Entity = ""
    $entr = $fragment.Facts | Where-Object { $_.Name -eq "EntityRegistrantName" }
    if (!$entr)
    {
        $entr = ""
    }
    $efc = $fragment.Facts | Where-Object { $_.Name -eq "EntityFilerCategory" }
    if (!$efc)
    {
        $efc = ""
    }
    $script:Entity = "`"" + $entr.Value + "`",`"" + $efc.Value + "`","
}

Function WriteItemCSV
{
    param($fragment, $ElId)
    $Ff = $fragment.Facts | Where-Object { $_.Name -eq $ElId }
    if ($Ff)
    {
        if ($Ff.GetType().FullName -eq "JeffFerguson.Gepsio.Item")
        {
            [string]$S = $script:Entity
            if ($Ff.Name)
            {
                $S = $S + "`"" + $Ff.Name + "`","
            }
            else
            {
                $S = $S + ","
            }
            if ($Ff.Value)
            {
                $S = $S + $Ff.Value + ","
            }
            else
            {
                $S = $S + ","
            }
            $stream.WriteLine($S)
        }
        if ($Ff.GetType().FullName -eq "System.Object[]")
        {
            foreach ($i in $Ff)
            {
                [string]$S = $script:Entity
                if ($i.Name)
                {
                    $S = $S + "`"" + $i.Name + "`","
                }
                else
                {
                    $S = $S + ","
                }
                if ($i.Value)
                {
                    $S = $S + $i.Value
                }
                $stream.WriteLine($S)
            }
        }
    }
}



Let's take a look at this PowerShell script in more detail.


The opening statements add the Gepsio types to the PowerShell session and load the XBRL document named by a command line parameter into a new Gepsio XbrlDocument instance:

param([string]$instFile = "C:\XBRLDOCPATH\XbrlDocument.xml")

#load Gepsio
Add-Type -Path "C:\GEPSIOPATH\JeffFerguson.Gepsio.dll"
$XbrlDoc = New-Object -TypeName JeffFerguson.Gepsio.XbrlDocument
$XbrlDoc.Load($instFile)



At this point, the PowerShell script maintains a variable called $XbrlDoc which contains all of Gepsio's knowledge about the loaded XBRL document. Loading and validating an XBRL document can't get much easier.


Once the XBRL document is loaded, then the output CSV file is created:

New-Item $instCSV -type "file" -force -value "EntityRegName,EntityFilerCat,FactName,Value`r`n"
$stream = New-Object System.IO.StreamWriter($instCSV, $true)



Here, a new file is created and the CSV column header row is written to the new file. The CSV is set up to capture the following values found in the XBRL document:



  • Entity Registrant Name
  • Entity Filer Category
  • Fact Name
  • Fact Value

A .NET StreamWriter object is created to reference the newly created CSV file and is available from within a PowerShell script variable called $stream.


Once the CSV is available, the PowerShell script iterates through each of the XBRL fragments found in the XBRL document loaded by Gepsio:

try
{
    foreach ($CurrentFragment in $XbrlDoc.XbrlFragments)
    {
        GetEntityInfo $CurrentFragment
        try
        {
            WriteItemCSV $CurrentFragment "EarningsPerShareBasic"
        }
        catch
        {
            Write-Error("us-gaap_EarningsPerShareBasic: " + $_)
        }
        try
        {
            WriteItemCSV $CurrentFragment "NetIncomeLoss"
        }
        catch
        {
            Write-Error("us-gaap_NetIncomeLoss: " + $_)
        }
    }
}
catch
{
    Write-Error("main foreach write loop: " + $_)
}
finally
{
    $stream.Close()
    $stream.Dispose()
}



For each XBRL fragment found in the Gepsio document instance, entity information is read from the fragment and the selected fact values are written to the CSV file. These operations are performed by two functions in the PowerShell script, GetEntityInfo and WriteItemCSV, respectively.


Let's take a look at the GetEntityInfo function. It is defined as follows:

Function GetEntityInfo
{
    param($fragment)
    $script:Entity = ""
    $entr = $fragment.Facts | Where-Object { $_.Name -eq "EntityRegistrantName" }
    if (!$entr)
    {
        $entr = ""
    }
    $efc = $fragment.Facts | Where-Object { $_.Name -eq "EntityFilerCategory" }
    if (!$efc)
    {
        $efc = ""
    }
    $script:Entity = "`"" + $entr.Value + "`",`"" + $efc.Value + "`","
}



This function, which accepts a Gepsio XbrlFragment object as a parameter, searches through each of the fragment's facts, looking for a fact whose name is "EntityRegistrantName" or "EntityFilerCategory". If they are found, the Gepsio Fact objects are stored in local script variables -- $entr and $efc, respectively. Once the search is complete, the Fact object's values are stored as a string in a global-level script variable called $script:Entity.


The other interesting function in the PowerShell script is the WriteItemCSV function:

Function WriteItemCSV
{
    param($fragment, $ElId)
    $Ff = $fragment.Facts | Where-Object { $_.Name -eq $ElId }
    if ($Ff)
    {
        if ($Ff.GetType().FullName -eq "JeffFerguson.Gepsio.Item")
        {
            [string]$S = $script:Entity
            if ($Ff.Name)
            {
                $S = $S + "`"" + $Ff.Name + "`","
            }
            else
            {
                $S = $S + ","
            }
            if ($Ff.Value)
            {
                $S = $S + $Ff.Value + ","
            }
            else
            {
                $S = $S + ","
            }
            $stream.WriteLine($S)
        }
        if ($Ff.GetType().FullName -eq "System.Object[]")
        {
            foreach ($i in $Ff)
            {
                [string]$S = $script:Entity
                if ($i.Name)
                {
                    $S = $S + "`"" + $i.Name + "`","
                }
                else
                {
                    $S = $S + ","
                }
                if ($i.Value)
                {
                    $S = $S + $i.Value
                }
                $stream.WriteLine($S)
            }
        }
    }
}



This function accepts both a Gepsio XbrlFragment object and a fact name as parameters. It begins by searching through the supplied fragment’s facts, looking for a fact whose name matches the supplied name. Any match is stored in a local variable called $Ff. If a fact was found, its name and value are appended to a line that begins with the entity prefix built up in $script:Entity, and the completed line is written out to the CSV stream.



[One quick note about the placement of the functions in the PowerShell script: When the script is executed from the PowerShell command line, the functions can be found anywhere in the script. SSIS, however, may need to have the functions defined before they are used in the script. If SSIS does not execute the script as written, the functions may need to be moved forward in the script so that they are defined before they are used.]


In the end, the PowerShell script will produce a CSV file that looks something like this:

EntityRegName,EntityFilerCat,FactName,Value
"COCA COLA CO","Large Accelerated Filer","EarningsPerShareBasic",1.22
"COCA COLA CO","Large Accelerated Filer","EarningsPerShareBasic",2.05
"COCA COLA CO","Large Accelerated Filer","EarningsPerShareBasic",2.14
"COCA COLA CO","Large Accelerated Filer","EarningsPerShareBasic",1.24
"COCA COLA CO","Large Accelerated Filer","NetIncomeLoss",4703000000
"COCA COLA CO","Large Accelerated Filer","NetIncomeLoss",2788000000
"COCA COLA CO","Large Accelerated Filer","NetIncomeLoss",4842000000
"COCA COLA CO","Large Accelerated Filer","NetIncomeLoss",2800000000



Once this script is in place, the SSIS package to actually import the data becomes an easy set of two simple tasks:



  1. execute PowerShell script to create CSV from XBRL
  2. import CSV data into a database table

For the sake of brevity, the PowerShell sample shown here loads only a few XBRL values into the CSV file. In practice, however, there is no limit to the amount of data that could be loaded into the CSV.


Once all of this was working, the user sent in a testimonial that read:



"Thank you for making Gepsio available. It saved me an awful lot of work (xpath, YIKES)!"


This is gratifying, as this is what Gepsio is all about: making access to XBRL data easier without resorting to complicated XPath queries or understanding a lot about the rules of XBRL validation or technical syntax. Gepsio provides access to XBRL data without needing to worry about XBRL itself, which frees developers to work on the more important problem of building added value on top of the data.


Many thanks go out to the Gepsio user who provided this solution.

Friday, June 22, 2012

WebException Objects and XBRL Schema Object Creation

The other day, I was doing some performance testing in Gepsio and noticed that HTTP errors found during the XBRL schema object creation process were not reported in the same way as other exceptions thrown by Gepsio.

The constructor of Gepsio’s XbrlSchema class uses an XmlTextReader to read an XBRL schema. If the URI for the XBRL schema to be read is an HTTP-based URI, then the XmlTextReader will use the .NET Web stack to read the schema using HTTP. If something fails during that process, the .NET Web stack will throw a WebException.

In the Nov 2011 CTP (and, in fact, all Gepsio CTPs to this point), thrown WebException objects were not wrapped in an XbrlException object and were consequently thrown back to the client as a WebException object.

To be clear that the issue is an XBRL issue caused by an HTTP failure, the XBRL schema creation code now creates an XbrlException object, stores the caught WebException as an inner exception to the XbrlException, and throws the XbrlException object back up to the client.

I have just checked in a fix for the issue. The fix will be available in the next release (currently planned for Jul 2012). If you want to grab the code ahead of time, feel free to grab the latest source code here.