Monday, August 27, 2012

Importing XBRL Fact Values Into SQL Server Using PowerShell and Gepsio

A Gepsio user has just written in with a success story about importing XBRL data into a SQL Server database using SQL Server Integration Services (SSIS), PowerShell and Gepsio. The solution was too good not to share and, with the user's permission, this blog post describes it.

The user was tasked with importing some XBRL data into a SQL Server database instance. The overall architectural idea was to use an SSIS package to run a PowerShell script which would create a comma separated values (CSV) file from a raw XBRL document, and the values in the CSV would be used to import data into the database.

The first trick, which isn't really XBRL-specific but is still worth mentioning, is getting SSIS to run a PowerShell script. PowerShell scripts are run from within an SSIS package by adding an "Execute Process Task" and using a command line of the following form:

C:\[PATHTOPOWERSHELL]\PowerShell.exe -ExecutionPolicy ByPass -command ". 'C:\SCRIPTPATH\MyScript.ps1' 'param1' 'param2'"



Read the forum discussion at http://social.msdn.microsoft.com/Forums/en-NZ/sqlintegrationservices/thread/216d2ee6-0f04-480f-808d-8241bc4a8d18 for more information about this process.
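To make this concrete, the Execute Process Task can be configured along these lines; the paths and script name shown here are placeholders, not values from the user's actual package:

```
Executable:  C:\Windows\System32\WindowsPowerShell\v1.0\PowerShell.exe
Arguments:   -ExecutionPolicy ByPass -command ". 'C:\SCRIPTPATH\MyScript.ps1' 'param1' 'param2'"
```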


The next trick, of course, is creating the CSV file from the raw XBRL document. The user turned to PowerShell and Gepsio for this work, and life became a lot easier. Here is the example PowerShell script that can do this work:

param([string]$instFile = "C:\XBRLDOCPATH\XbrlDocument.xml")

# load Gepsio
Add-Type -Path "C:\GEPSIOPATH\JeffFerguson.Gepsio.dll"
$XbrlDoc = New-Object -TypeName JeffFerguson.Gepsio.XbrlDocument
$XbrlDoc.Load($instFile)
$instCSV = "C:\OUTPUTPATH\Allinfo.csv"

# create the output CSV and write the column header row
New-Item $instCSV -type "file" -force -value "EntityRegName,EntityFilerCat,FactName,Value`r`n"
$stream = New-Object System.IO.StreamWriter($instCSV, $true)

[string]$script:Entity = ""
try
{
    foreach($CurrentFragment in $XbrlDoc.XbrlFragments)
    {
        GetEntityInfo $CurrentFragment
        try
        {
            WriteItemCSV $CurrentFragment "EarningsPerShareBasic"
        }
        catch
        {
            Write-Error("us-gaap_EarningsPerShareBasic: " + $_)
        }
        try
        {
            WriteItemCSV $CurrentFragment "NetIncomeLoss"
        }
        catch
        {
            Write-Error("us-gaap_NetIncomeLoss: " + $_)
        }
    }
}
catch
{
    Write-Error("main foreach write loop: " + $_)
}
finally
{
    $stream.Close()
    $stream.Dispose()
}

# reads the entity registrant name and filer category from a fragment
# and stores them, CSV-quoted, in $script:Entity
Function GetEntityInfo
{
    param($fragment)
    $script:Entity = ""
    $entr = $fragment.Facts | Where-Object { $_.Name -eq "EntityRegistrantName" }
    if(!$entr)
    {
        $entr = ""
    }
    $efc = $fragment.Facts | Where-Object { $_.Name -eq "EntityFilerCategory" }
    if(!$efc)
    {
        $efc = ""
    }
    $script:Entity = "`"" + $entr.Value + "`",`"" + $efc.Value + "`","
}

# writes one CSV line per fact in the fragment whose name matches $ElId
Function WriteItemCSV
{
    param($fragment, $ElId)
    $Ff = $fragment.Facts | Where-Object { $_.Name -eq $ElId }
    if($Ff)
    {
        # a single matching fact comes back as a Gepsio Item
        if($Ff.GetType().FullName -eq "JeffFerguson.Gepsio.Item")
        {
            [string]$S = $script:Entity
            if($Ff.Name)
            {
                $S = $S + "`"" + $Ff.Name + "`","
            }
            else
            {
                $S = $S + ","
            }
            if($Ff.Value)
            {
                $S = $S + $Ff.Value
            }
            $stream.WriteLine($S)
        }
        # multiple matching facts come back as an array of Items
        if($Ff.GetType().FullName -eq "System.Object[]")
        {
            foreach($i in $Ff)
            {
                [string]$S = $script:Entity
                if($i.Name)
                {
                    $S = $S + "`"" + $i.Name + "`","
                }
                else
                {
                    $S = $S + ","
                }
                if($i.Value)
                {
                    $S = $S + $i.Value
                }
                $stream.WriteLine($S)
            }
        }
    }
}



Let's take a look at this PowerShell script in more detail.


The opening statements add the Gepsio types to the PowerShell session and load the XBRL document named in a script command line parameter into a new Gepsio XbrlDocument instance:

param([string]$instFile = "C:\XBRLDOCPATH\XbrlDocument.xml")

#load Gepsio
Add-Type -Path "C:\GEPSIOPATH\JeffFerguson.Gepsio.dll"
$XbrlDoc = New-Object -TypeName JeffFerguson.Gepsio.XbrlDocument
$XbrlDoc.Load($instFile)



At this point, the PowerShell script maintains a variable called $XbrlDoc which contains all of Gepsio's knowledge about the loaded XBRL document. Loading and validating an XBRL document can't get much easier.


Once the XBRL document is loaded, then the output CSV file is created:

$instCSV = "C:\OUTPUTPATH\Allinfo.csv"
New-Item $instCSV -type "file" -force -value "EntityRegName,EntityFilerCat,FactName,Value`r`n"
$stream = New-Object System.IO.StreamWriter($instCSV, $true)



Here, a new file is created and the CSV column header row is written to it. The CSV is set up to capture the following values found in the XBRL document:



  • Entity Registrant Name
  • Entity Filer Category
  • Fact Name
  • Fact Value

A .NET StreamWriter object is created to reference the newly created CSV file and is available from within a PowerShell script variable called $stream.


Once the CSV is available, the PowerShell script iterates through each of the XBRL fragments found in the XBRL document loaded by Gepsio:

try
{
    foreach($CurrentFragment in $XbrlDoc.XbrlFragments)
    {
        GetEntityInfo $CurrentFragment
        try
        {
            WriteItemCSV $CurrentFragment "EarningsPerShareBasic"
        }
        catch
        {
            Write-Error("us-gaap_EarningsPerShareBasic: " + $_)
        }
        try
        {
            WriteItemCSV $CurrentFragment "NetIncomeLoss"
        }
        catch
        {
            Write-Error("us-gaap_NetIncomeLoss: " + $_)
        }
    }
}
catch
{
    Write-Error("main foreach write loop: " + $_)
}
finally
{
    $stream.Close()
    $stream.Dispose()
}



For each XBRL fragment found in the Gepsio document instance, entity information is read from the fragment and fact values are written to the CSV file. These operations are performed by the script's GetEntityInfo and WriteItemCSV functions, respectively.


Let's take a look at the GetEntityInfo function. It is defined as follows:

Function GetEntityInfo
{
    param($fragment)
    $script:Entity = ""
    $entr = $fragment.Facts | Where-Object { $_.Name -eq "EntityRegistrantName" }
    if(!$entr)
    {
        $entr = ""
    }
    $efc = $fragment.Facts | Where-Object { $_.Name -eq "EntityFilerCategory" }
    if(!$efc)
    {
        $efc = ""
    }
    $script:Entity = "`"" + $entr.Value + "`",`"" + $efc.Value + "`","
}



This function, which accepts a Gepsio XbrlFragment object as a parameter, searches the fragment's facts for facts named "EntityRegistrantName" and "EntityFilerCategory". When found, the Gepsio Fact objects are stored in local variables ($entr and $efc, respectively); when a fact is missing, an empty string is substituted so that the CSV columns stay aligned. Once the search is complete, the fact values are combined into a quoted, comma-separated prefix stored in the script-scoped variable $script:Entity.
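Adapting the script to pull other facts starts with knowing which fact names a document actually contains. The following is a hypothetical exploration snippet, not part of the user's solution; it assumes Gepsio has been loaded and the document opened as shown earlier:

```powershell
# dump the name and value of every fact in every fragment
foreach($fragment in $XbrlDoc.XbrlFragments)
{
    foreach($fact in $fragment.Facts)
    {
        Write-Host ($fact.Name + " = " + $fact.Value)
    }
}
```

Any name printed by this loop can be passed as the second parameter to WriteItemCSV.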


The other interesting function in the PowerShell script is the WriteItemCSV function:

Function WriteItemCSV
{
    param($fragment, $ElId)
    $Ff = $fragment.Facts | Where-Object { $_.Name -eq $ElId }
    if($Ff)
    {
        # a single matching fact comes back as a Gepsio Item
        if($Ff.GetType().FullName -eq "JeffFerguson.Gepsio.Item")
        {
            [string]$S = $script:Entity
            if($Ff.Name)
            {
                $S = $S + "`"" + $Ff.Name + "`","
            }
            else
            {
                $S = $S + ","
            }
            if($Ff.Value)
            {
                $S = $S + $Ff.Value
            }
            $stream.WriteLine($S)
        }
        # multiple matching facts come back as an array of Items
        if($Ff.GetType().FullName -eq "System.Object[]")
        {
            foreach($i in $Ff)
            {
                [string]$S = $script:Entity
                if($i.Name)
                {
                    $S = $S + "`"" + $i.Name + "`","
                }
                else
                {
                    $S = $S + ","
                }
                if($i.Value)
                {
                    $S = $S + $i.Value
                }
                $stream.WriteLine($S)
            }
        }
    }
}



This function accepts both a Gepsio XbrlFragment object and a fact name as parameters. It searches the supplied fragment's facts for a fact whose name matches the supplied name and stores the result in a local variable called $Ff. Because the search may match one fact or several, $Ff may hold either a single Gepsio Item or an array of Items; in either case, each matching fact's name and value are appended to a local copy of the entity information held in $script:Entity, and the completed line is written out to the CSV stream.



[One quick note about the placement of the functions in the PowerShell script: PowerShell executes a script from top to bottom, so a function must generally be defined before the code that calls it runs. If SSIS does not execute the script as written, move the function definitions toward the top of the script so that they appear before they are used.]


In the end, the PowerShell script will produce a CSV file that looks something like this:

EntityRegName,EntityFilerCat,FactName,Value
"COCA COLA CO","Large Accelerated Filer","EarningsPerShareBasic",1.22
"COCA COLA CO","Large Accelerated Filer","EarningsPerShareBasic",2.05
"COCA COLA CO","Large Accelerated Filer","EarningsPerShareBasic",2.14
"COCA COLA CO","Large Accelerated Filer","EarningsPerShareBasic",1.24
"COCA COLA CO","Large Accelerated Filer","NetIncomeLoss",4703000000
"COCA COLA CO","Large Accelerated Filer","NetIncomeLoss",2788000000
"COCA COLA CO","Large Accelerated Filer","NetIncomeLoss",4842000000
"COCA COLA CO","Large Accelerated Filer","NetIncomeLoss",2800000000



Once this script is in place, the SSIS package that actually imports the data reduces to two simple tasks:



  1. Execute the PowerShell script to create the CSV file from the XBRL document
  2. Import the CSV data into a database table
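The source doesn't show how the second task was built, but as one sketch of it, the CSV could be bulk-loaded into a staging table from PowerShell with Invoke-Sqlcmd. The server, database, and table names here are hypothetical, and the snippet assumes the SQL Server PowerShell module (which provides Invoke-Sqlcmd) plus a staging table whose four columns match the CSV header:

```powershell
# hypothetical import step: bulk-load the generated CSV into a staging table,
# skipping the header row that the script wrote with New-Item
Invoke-Sqlcmd -ServerInstance "localhost" -Database "XbrlData" -Query @"
BULK INSERT dbo.XbrlFacts
FROM 'C:\OUTPUTPATH\Allinfo.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\r\n', FIRSTROW = 2);
"@
```

Note that BULK INSERT does not strip the quotation marks around the text columns, so an SSIS Flat File Source with a text qualifier of `"` may be the more convenient choice for this particular CSV.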

For the sake of brevity, the PowerShell sample shown here loads only a few XBRL values into the CSV file. In practice, however, there is no limit to the amount of data that could be loaded into the CSV.


Once all of this was working, the user sent in a testimonial that read:



"Thank you for making Gepsio available.  It saved me an awful lot of work (xpath, YIKES)!".


This is gratifying, as this is what Gepsio is all about: making access to XBRL data easier without resorting to complicated XPath queries or understanding a lot about the rules of XBRL validation or technical syntax. Gepsio provides access to XBRL data without needing to worry about XBRL itself, which frees developers to work on the more important problem of building added value on top of the data.


Many thanks go out to the Gepsio user who provided this solution.
