Working with a Party Model

Party Model Overview

A Party Model implements relationships between parties (entities) such as organisations and people.  Party is the super-type; organisation and person are its sub-types.  People may play different roles, e.g. manager (responsible for) or employee (reports to), while the organisation plays a role such as employer (employs).  Roles declare meaning when read in a direction, viz. left-to-right (LTR) and right-to-left (RTL).

Party is the super-type entity that signifies the identity of an interest to the reference organisation, i.e. the one that implements the model.  Party is viewed relative to this vantage point.  Inter-actors follow a hierarchical model, forming a relationship (as Party) between a first inter-actor and a second, e.g. the organisation as employer to the person as employee.  Both are Parties and have a representation in the Party table with PartyId as the unique identifier.  These identity keys are used in a relationship table to create the relation: Acme Corp. (employer) — employs —> Engineer Joe (employee).

Party is a physical table in the database.  PartyId is the primary key of the table, an identifier carried as a foreign key in the tables of the parties being modelled, e.g. Organisation or Person.
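The structure described above can be sketched in T-SQL as follows.  This is a minimal illustration only; every table, column, and constraint name is an assumption for the example, not the article's actual schema.

```sql
-- Sketch: Party as super-type, Organisation and Individual as sub-types.
CREATE TABLE dbo.Party
(
	PartyId			INT IDENTITY(1,1)	NOT NULL	CONSTRAINT PK_Party PRIMARY KEY
,	PartyTypeCode	CHAR(3)				NOT NULL	-- sub-type discriminator, e.g. 'ORG' or 'PER'
);

CREATE TABLE dbo.Organisation
(
	OrganisationPartyId	INT				NOT NULL	CONSTRAINT PK_Organisation PRIMARY KEY
													CONSTRAINT FK_Organisation_Party
													REFERENCES dbo.Party (PartyId)
,	OrganisationName	NVARCHAR(100)	NOT NULL
);

CREATE TABLE dbo.Individual
(
	IndividualPartyId	INT				NOT NULL	CONSTRAINT PK_Individual PRIMARY KEY
													CONSTRAINT FK_Individual_Party
													REFERENCES dbo.Party (PartyId)
,	FullName			NVARCHAR(100)	NOT NULL
);

-- Relationship table: Acme Corp. (employer) -- employs --> Engineer Joe (employee).
CREATE TABLE dbo.PartyRelationship
(
	FirstPartyId	INT		NOT NULL	CONSTRAINT FK_PartyRelationship_First  REFERENCES dbo.Party (PartyId)
,	SecondPartyId	INT		NOT NULL	CONSTRAINT FK_PartyRelationship_Second REFERENCES dbo.Party (PartyId)
,	RoleTypeCode	CHAR(3)	NOT NULL	-- e.g. 'EMP': reads 'employs' LTR, 'is employed by' RTL
,	CONSTRAINT PK_PartyRelationship PRIMARY KEY (FirstPartyId, SecondPartyId, RoleTypeCode)
);
```

Because both inter-actors are rows in dbo.Party, one relationship table serves organisation-to-person, organisation-to-organisation, and person-to-person relations alike.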

Party Relationship

Figure 1 – Party Relationship

The Party table with its associated Person and Organisation tables contains the business data (see Figure 1 – Party Relationship). This Party structure is the basic relationship, or root, of the party model.

The Party Physical Data Model (Figure 2) implements the organisation (as Party) by relating Party.PartyId (primary key) to Organisation.OrganisationPartyId (foreign key/primary key).  Likewise, a person (Individual) is defined by relating PartyId (PK) to IndividualPartyId (FK/PK).  The primary key of an Organisation is, in turn, used in the foreign-key implementations of its reference data, e.g. OrganisationLevel.OrgPartyId (FK/PK).

Party Physical Data Model

Figure 2 – Party Physical Data Model

The model (Figure 2) has very little meaning other than indicating a very explicit entity, defined as a party; the power of the relationship lies in the association of people (individuals) to the organisation, i.e. in forming the bill-of-material (2016 – Jean-Marc Reynaud) of the organisation. An individual who is part of the company has a role as employee, with a designation of function, e.g. Engineer.  Such a person could also be a line-manager, all of which requires the model to accommodate an extensible implementation. The root (internal) Organisation also has relationships with various other (external) Organisations, with their respective classifications and reference data modelled on the OrganisationPartyId, which is a foreign key to PartyId but the primary key of the Organisation.

Two types of data are brought together, viz. the business data (relationship structures [taxonomies] for Party) and the metadata (reference/lookup data) for the Organisation, People and other entities, together with all other related reference data structures (objects).

According to Siebel's description of the Party Model, the model comprises a base table and extension tables, with external tables used to model the relationships between the Party tables.  “Typically, you associate each employee and partner user with one or more positions.  The employee or partner has only one active position at a time and is automatically associated with the division and organisation of the active position.”

The business data leverages metadata, e.g. Role Type and Party Type.  For example, a Party is defined (by its Party Type) as either a person or an organisation.  The Role Type describes the Person and/or the Organisation, e.g. Systems Analyst (role) for a Person, or Public Company for an Organisation.

Other entities (objects) in a business, e.g. Product, Events, Agreements, or Assets, are modelled as objects and exposed to Party in a junction table called Party<Entity>, e.g. PartyProduct, PartyEvents, PartyAgreements, or PartyAssets.  The Party consequently “has a role with” the reciprocal Party<Entity>; e.g. a Product is linked to a Party by PartyId and ProductId (with an associated PartyProductRole), with Product defined in an associated data model that uses ProductId as the unique product identifier.
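One such junction table could be sketched as below.  The names and codes are assumptions for illustration; the Product table is included only to make the fragment self-contained, and dbo.Party is assumed to exist as described earlier.

```sql
CREATE TABLE dbo.Product
(
	ProductId	INT				NOT NULL	CONSTRAINT PK_Product PRIMARY KEY
,	ProductName	NVARCHAR(100)	NOT NULL
);

-- Party<Entity> junction: a Party "has a role with" a Product.
CREATE TABLE dbo.PartyProduct
(
	PartyId					INT		NOT NULL	-- FK to dbo.Party (assumed to exist)
,	ProductId				INT		NOT NULL	CONSTRAINT FK_PartyProduct_Product REFERENCES dbo.Product (ProductId)
,	PartyProductRoleCode	CHAR(3)	NOT NULL	-- e.g. 'MFR' manufacturer, 'SUP' supplier
,	CONSTRAINT PK_PartyProduct PRIMARY KEY (PartyId, ProductId, PartyProductRoleCode)
);
```

Including the role code in the primary key allows the same Party and Product to be related more than once, each time in a different role.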

Building the Organisation and People Party Model


Party is a base entity type relating to the organisation that models its business.  A Party always represents a single person or a group, e.g. a company, that can be viewed in a business as a single entity (Ref: Configure Siebel Applications).  These entities (according to Oracle) could be:

  • Person/ Contact
  • User
  • Employee
  • Position
  • Partner User
  • Account
  • Division
  • Organisation
  • Household
  • User Group
  • Access List

These parties are defined by a Party Type field, a code that uniquely identifies the sub-type discriminator for a party, as described in the list above.  Extension tables are associated with the basic Party record to provide differentiation; Party joins its extension tables explicitly.  The business data must be modelled in a Role Type Relationship table to create role-pairs that model the first and second inter-actors.  This defines the types of relationship possible between Parties. For example, an Organisation implements many job roles, so a hierarchy of role-pairs creates the bill-of-materials of one organisation with its many job profiles.  The organisation also employs an employee to fill a job profile.  All these reference data entities must be created before relating them.
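The role-pair idea can be illustrated with a small reference table.  The directional reading columns are an assumption of how the LTR/RTL readings might be stored; all names and codes are invented for the example.

```sql
CREATE TABLE dbo.RoleTypeRelationship
(
	FirstRoleTypeCode	CHAR(3)			NOT NULL	-- first inter-actor,  e.g. 'EMR' employer
,	SecondRoleTypeCode	CHAR(3)			NOT NULL	-- second inter-actor, e.g. 'EME' employee
,	LtrReading			NVARCHAR(50)	NOT NULL	-- left-to-right reading
,	RtlReading			NVARCHAR(50)	NOT NULL	-- right-to-left reading
,	CONSTRAINT PK_RoleTypeRelationship PRIMARY KEY (FirstRoleTypeCode, SecondRoleTypeCode)
);

-- Example role-pairs: employment, and an organisation implementing a job profile.
INSERT INTO dbo.RoleTypeRelationship
VALUES	('EMR', 'EME', N'employs',			N'is employed by')
,		('ORG', 'JOB', N'implements role',	N'is a role of');
```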

Define Party Types

Any Party is defined by its sub-type discriminator, describing what the Party is, viz. Organisation, Person, Position, Capability, etc.  The PartyTypeCode (an integer can also be used) is the primary key referenced by the corresponding foreign key in the Party table.

Party Role Type

A party fulfils a role within a business context.  Each Party Type (person or organisation) plays a business-contextual role in relation to the extensions of a business, e.g. Product, Events, Agreements, or Assets (as described in the overview).  Consequently, in the extension's junction table, an explicit Role Type Code joins the Role Type to the extension, e.g. Asset, to indicate what role the connected Party plays in that context.  For example: a company Product is joined to a Party in the PartyProduct junction table by relating PartyId and ProductId with a PartyProductRoleCode and a set of effective-from and effective-to dates to version the relationship.  Another example: Acme Corp (an internal organisation) has an asset, an impact drilling machine, in the role of owner.
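The effective-dated junction row for the Acme example might look as follows.  This is a sketch under assumed names; the PartyId and AssetId values are invented, and an Asset table is presumed to exist elsewhere in the model.

```sql
CREATE TABLE dbo.PartyAsset
(
	PartyId				INT		NOT NULL
,	AssetId				INT		NOT NULL
,	PartyAssetRoleCode	CHAR(3)	NOT NULL	-- e.g. 'OWN' = owner
,	EffectiveFromDate	DATE	NOT NULL
,	EffectiveToDate		DATE	NULL		-- NULL = currently effective
,	CONSTRAINT PK_PartyAsset PRIMARY KEY (PartyId, AssetId, PartyAssetRoleCode, EffectiveFromDate)
);

-- Acme Corp (PartyId 1) owns the impact drilling machine (AssetId 10).
INSERT INTO dbo.PartyAsset VALUES (1, 10, 'OWN', '2016-01-01', NULL);

-- Only the currently effective relationships:
SELECT	PartyId, AssetId, PartyAssetRoleCode
FROM	dbo.PartyAsset
WHERE	EffectiveToDate IS NULL;
```

Closing a relationship is then an update of EffectiveToDate rather than a delete, which preserves the relationship history.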


Reference data for an Organisation can be carried at this level while more business modelling, to extend the configuration such as Address, Name History, SIC History, and Type, can be done in the extension tables.  The Organisation Identifier (company code or some legal registration number) can be incorporated to uniquely identify the entity.  Whatever information is presented at this level should only change or become defunct if the Organisation is terminated or becomes redundant.

Modelling an Organisation requires the Organisation and Division structures. In the Siebel CRM model an Organisation is a Division (2015 Gerhard Hermann) designated as an Organisation and, like Divisions, can be hierarchical.  A Division is an internal unit within the Organisation/Company, e.g. a regional operating division of the company.  The company is modelled directly on a Division and not on the Organisation.
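A hierarchical Division structure of this kind is commonly modelled with a self-referencing foreign key.  A minimal sketch, with assumed names:

```sql
CREATE TABLE dbo.Division
(
	DivisionPartyId			INT				NOT NULL	CONSTRAINT PK_Division PRIMARY KEY
,	ParentDivisionPartyId	INT				NULL		CONSTRAINT FK_Division_Parent
														REFERENCES dbo.Division (DivisionPartyId)
,	DivisionName			NVARCHAR(100)	NOT NULL
);

-- The root row (ParentDivisionPartyId IS NULL) represents the company itself;
-- every other Division hangs off its parent.
INSERT INTO dbo.Division VALUES (1, NULL, N'Acme Corp');
INSERT INTO dbo.Division VALUES (2, 1,    N'Acme Northern Region');
```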

Attributes of the Organisation, irrespective of the Organisation Type, can be added at this level (The Data Model Resource Book, Volume 1 – Len Silverston, 2001).  This reduces the number of tables (containing business data) modelling the structures of an Organisation.

Organisation Type

Before an Organisation can be created, its type should be known.  Basically, an organisation can be Internal (part of the reference organisation) or External, an organisation with whom the reference organisation has dealings and interactions.  This break-down can take the form of Legal and Informal Organisations (Data Structure Patterns Topic 4.3), or whatever form the reference Organisation has decreed the Organisational model should take.  The hierarchy must be sensible and usable to the organisation.  The delineation between internal and external organisations can also be achieved with a simple boolean flag to simplify the model and attach extension reference data at the Organisation level. According to Teradata: “A well-understood big picture of the organization needs to be captured and communicated in the form of a model.”
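Both delineation options can be sketched as below, under assumed names; the Organisation table is presumed to exist as described earlier.

```sql
-- Option 1: a simple boolean flag on the Organisation.
ALTER TABLE dbo.Organisation
	ADD IsInternal BIT NOT NULL CONSTRAINT DF_Organisation_IsInternal DEFAULT (0);

-- Option 2: a richer Organisation Type lookup, e.g. Internal/Legal/Informal.
CREATE TABLE dbo.OrganisationType
(
	OrganisationTypeCode	CHAR(3)			NOT NULL	CONSTRAINT PK_OrganisationType PRIMARY KEY
,	OrganisationTypeName	NVARCHAR(50)	NOT NULL	-- e.g. 'Internal', 'Legal', 'Informal'
);
```

The flag keeps the model simple; the lookup allows the classification to grow without schema change.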


Visual Studio 2013 Database Project: Item Templates with VSIX



Strictly speaking, VSIX is not a database project, or related in kind, but it is relevant by virtue of Item Templates.  With VSIX projects and the TemplateBuilder (NuGet) package, templates can be built from the contents of an ItemTemplates folder.  The latter (if not created by TemplateBuilder) should be created manually as an ItemTemplates folder within the VSIX project.

NOTE: SQL Server (database) templates are not supported in Visual Studio 2013 VSIX and require a manual effort to install on a developer’s IDE.  As indicated by the error item below, if database files added to a VSIX project are changed to have a .vstemplate file extension, the compiler (on build) issues an error saying that only CSharp and VBasic templates are supported. Similarly, if a Reference is added in the Assets tab of the manifest, the error “The target "TemplateProjectOutputGroup" does not exist in the project” is presented.

VS Database Template Support


VSIX Template

The VSIX project template is available from the Visual Studio 2013 SDK and found under the menu: [Visual C#] >> Extensibility >> VSIX Project.

Template Builder

The TemplateBuilder package is installed from the Package Manager Console in Visual Studio: Tools >> NuGet Package Manager >> Package Manager Console.  Once installed, templates are built to the folders: <Drive>:\Users\<User>\Documents\Visual Studio 2013\Templates\ItemTemplates and <Drive>:\Users\<User>\Documents\Visual Studio 2013\Templates\ProjectTemplates.

NuGet Package Manager

Download and install the NuGet Package Manager for Visual Studio 2013.  It is a VSIX extension and installs by double-click and run.


SideWaffle is a template pack that adds a selection of project and item templates to Visual Studio, including templates for creating new ones. A VSIX project generates an experimental instance of Visual Studio in which to test these templates so that the real instance is not affected.  Configuring templates requires a work-flow enabled by a separate application (that you create) to populate template and custom parameters when creating a new item.  This article mentions SideWaffle but does not use it to create templates.

Create a New VSIX Project for Template Deployment

In order to author customised solutions, projects, code-snippets, and templates, a VSIX project is required.  Depending on the user-environment set-up for Visual Studio, file paths may vary slightly.

Create a new project:

  1. Follow the path: File >> New Project >> [Custom Path] >> [Visual C#] >> Extensibility >> VSIX Project
  2. Provide a name for the project and check the solution details
  3. Fill out the VS VSIX Manifest with the appropriate details (reference)
  4. Install the TemplateBuilder package.
VS VSIX Manifest


Installing TemplateBuilder

This installation requires the Package Manager Console (NuGet):

  • Follow the path: Tools >> NuGet Package Manager >> Package Manager Console.
  • Type Install-Package TemplateBuilder -Pre at the PM> prompt and hit [Enter].  NOTE: Internet access is required.
VS Package Manager Console


The TemplateBuilder installation responds favourably on completion of the install, if it succeeds.  NOTE: The NuGet package is downloaded to: <Drive>:\Users\<user name>\AppData\Local\NuGet\Cache and, in the event of the default Internet package source not being available, Visual Studio looks here for whatever package is being installed.

VS NuGet Installation of TemplateBuilder


The installation process deposits an example template into: <Drive:>\Users\<User>\Documents\Visual Studio 2013\Templates\ProjectTemplates.  NOTE: there may not actually be an example template in this folder.

Create a Template Solution and Project

The Visual Studio Solution should contain the VSIX project as well as other projects, i.e. those used in which to create the templates.  For example, to create database templates a database project is added to the solution and the corresponding templates with parameters created in there.  This article uses the template created in a previous post (Visual Studio 2013 Database Project: Create a Stored Procedure Item Template) to illustrate how the VSIX project functions in this respect.

There are two methods:

  1. Exporting and importing a template (zip file)
  2. Referencing the template source project from the VSIX project where they are in the same solution

The latter (point 2) does not work for database templates.

Exporting the Template

The template (i.e. the stored procedure created in the templates project) is created by exporting the necessary files (the actual stored procedure and any referenced files, in the case of a project or C# type template) to the folder: <Drive:>\Users\<User>\Documents\Visual Studio 2013\My Exported Templates.  Follow the next steps to effect this:

  • Create, for example, a stored procedure template (reference the above link on how to do that), then select File >> Export Template >> Item Template and select the correct project, i.e. the database project containing the stored procedure created.
  • Select the correct stored procedure and complete the wizard.  The output template is placed in the folder: <Drive:>\Users\<User>\Documents\Visual Studio 2013\My Exported Templates in a file called <Template Name>.zip.

NOTE: Edit the MyTemplate.vstemplate file (contained inside the zip file) in Visual Studio and add a TemplateGroupID element between the TemplateData tags to help with the filtering of templates in the New Item dialog (reference).

VS Template Projects and Item Template



The Item Template (Stored Procedure) is added to the VSIX project.  Use the source.extension.vsixmanifest file and add an ItemTemplate item to the Assets tab. Add a:

  • New item,
  • Pick ItemTemplate and
  • Point the File Location to the folder where the copied template zip file has been placed.

NOTE: The path could be: <Drive>:\Users\<User>\Documents\Visual Studio 2013\Templates\ItemTemplates\...\Database\Programmability, but it is up to choice.  However, Visual Studio, on starting up, reads from this location for the user instance, i.e. the local machine.  The project gains a new folder, i.e. ItemTemplates, to which the zip file is added.

In the Properties of the newly added template (zip) file, set Copy to Output Directory to Copy Always.

NOTE: Inspecting the XML code of the project file (right-click >> Unload Project, then right-click >> Edit) reveals this setting in its XML structure, as depicted below.

    <None Include="packages.config" />
    <None Include="Properties\template-builder.props" />

NOTE: Turn off the Build and Deploy options in the Configuration Manager for the VSIX project.

Using a Reference to the Template Project in the same Solution

A reference to the template project (the one where the stored procedure is authored) is added to the References of the VSIX project by using the source.extension.vsixmanifest file’s Assets tab, as before.  But, in this case, the reference is to a project in the same solution.  The fields [Type], [Source], and [Project] are completed, with [Project] being the name of the said template project.  This adds a reference to the VSIX project.

Set the Properties of the reference (see how), especially the VSIX Sub Path that determines where the template is installed on the menu-tree of the New Item menu function.

NOTE: Visual Studio does not provide a project template for creating database objects, so it seems the export method is the only way to get the zip file into the VSIX for deployment.  This aspect may be resolvable by creating the zipped files in a database project and letting the project zip them (as the wizard does) and place them in the appropriate folder, but it is not yet clear how to do this.

Build the Project and Install the Templates

The VSIX project is built and copies the template zip file to the output folder: <Drive>:\Users\<User>\Documents\Visual Studio 2013\Projects\<Solution>\<Project>\bin\Debug as a .vsix (installable) file, along with a TemplateBuilder.dll and the folder structure (Visual Studio) that contains the actual template file.

NOTE: For developer projects, this installation file should be made available on a network shared drive and installed as updates are made, or from a Private Gallery (reference) that notifies users via RSS feeds of new extensions.  Alternatively, the same file can be uploaded to the Visual Studio Gallery for community consumption and addition.

Use the source.extension.vsixmanifest file in the VSIX project to set the [Assets] by creating a new reference to an ItemTemplate of a project (the one containing the stored procedure template); this also adds it to the References of the VSIX project. Here the Properties of the added assembly can be altered to achieve a variety of things. See the screen image (below) for a visual explanation of what is described.

VS VSIX Template Database Item Folder Location Reference


Executing the <Name>.vsix (installable) file invokes all the things that were set in the manifest to install the template(s).  These are available in Visual Studio and can be inspected via: Tools >> Extensions & Updates >> Templates, illustrated below.

VS Extensions and Updates Templates folder


Using the newly installed Templates

After installing the VSIX template package and verifying the extension is in Visual Studio, the template can be used to create Projects or Items.  During the installation the templates are deployed to a randomised folder under %LocalAppData%\Microsoft\VisualStudio\12.0\Extensions, and by virtue of %LocalAppData% this applies only to the current user.  Alternatively, by setting AllUsers=true in the manifest, the extension is installed under Common7\IDE\Extensions (where VS is installed) and appears to all users on the computer (reference).

Visual Studio 2013 Database Project: Working the Project LocalDbs



Visual Studio (VS) has evolved much in what it is capable of today for database development and deployment.  With the addition of a much better table-create facility, Visual Studio has become a strong choice for the database developer.  SQL Server Management Studio still has its place as a light, free tool, but for serious database development with Team Foundation integration, Visual Studio is the tool of choice.

Using Local Databases

Visual Studio creates local databases when a database project is created, viz.

  1. MSSQLLocalDb — SQL 2014 local db
  2. ProjectsV12 — created by SQL Server Data Tools (SSDT)
  3. V11.0 — SQL Server 2012 local db
VS SQL Server Object Explorer


These local database instances are located on the local drive of the computer that runs Visual Studio, at: C:\Users\hendriekusr\AppData\Local\Microsoft\Microsoft SQL Server Local DB\Instances.

The server name consequently is: (localdb)\ProjectsV12.  Details about these installed databases can be obtained by running SqlLocalDB.exe (located in C:\Program Files\Microsoft SQL Server\110\Tools\Binn\) with the name of the instance, e.g. SqlLocalDB.exe info ProjectsV12.

VS Local Database Information


Use the Instance pipe name (from above execution) to connect to a local database from SQL Server Management Studio (SSMS), as depicted below.

SSMS Local Db Connection Dialog


A connection (to view database objects) is possible from VS and SSMS with the latter using the pipe connection string.  The SSMS screen (below) illustrates how the connection is made to the localDB as well as a Stored Procedure created from a SSMS template that was added to the database.  The next article [Create a Stored Procedure Item Template] illustrates how this same SSMS Stored Procedure template is created for Visual Studio.

SSMS Connection to LocalDB with Stored Procedure from Template


Previous Article (2) [Create Database Project Folders]. . . Next Article (4) [Create a Stored Procedure Item Template]

Visual Studio 2013 Database Project: Create a Stored Procedure Item Template



Using Visual Studio for database development requires that database objects like stored procedures be created, amongst other items such as tables, views, triggers, etc.  It is advantageous if, when creating an item, the proposed script is a template that already conforms to the company’s standards, providing a boiler-plate script that speeds up the developer’s work.  Even better is if these templates are available to all developers as they check items in and out of Team Foundation Server.  Bear in mind that one organisation might have a single development office or location while others are geographically dispersed; it would be expedient if all of them were singing from the same hymnal.

This article explores the basics of Visual Studio Templates for database development.  Using this phrase as a search term, an Internet search engine divvies up a few articles in support.  One of these is An Overview of Templates for Database and Server Objects (Microsoft).  It suggests that existing templates can be customised to affect all templates used by the team.  Customisable project item stubs accelerate development by providing the developer with a head-start. Templates comprise project and item templates.  Every time a user opens a menu item in Visual Studio to create something, a template is invoked, e.g. New Project or Add New Item.

Templates exist in the user community (How to Add Items Using Online Item Templates), or in a VSIX project with SideWaffle (Code4Fun), but these aren’t discussed here.  To illustrate the concept, a template for a Stored Procedure is created and added as an Item Template on the local developer machine.  This is absorbed into Visual Studio and provides the developer with a new item in the Add New Item option.

Creating the Template

Creating a Stored Procedure (via Add New Item) in Visual Studio (VS), and for that matter in Microsoft SQL Server Management Studio (SSMS), is pretty uneventful.  Therefore, should a developer create quite a few items, the work required to customise each to the company standard absorbs unnecessary time and effort.  A template makes this much easier.

Either use VS or SSMS to create a stored procedure that conforms to all general company standards.  This one is to be used in Visual Studio, and the usage between SSMS and VS differs from a parameterisation perspective.

In SSMS the code-block (below) illustrates how parameters are coded and then, on invocation, provided via Ctrl+Shift+M.  In Visual Studio this is done through an XML configuration, and only what the respective dialog requests on New Item is user-provided, e.g. the project or object name.

CREATE PROCEDURE	[<Schema_Name, sysname, Schema_Name>].[<Procedure_Name, SYSNAME, ProcedureName>] 
					<@Param1, SYSNAME, @p1> <Datatype_For_Param1, , INTEGER> = <Default_Value_For_Param1, , 0>
				,	<@Param2, SYSNAME, @p2> <Datatype_For_Param2, , INTEGER> = <Default_Value_For_Param2, , 0>
				,	<@Param3, SYSNAME, @p3> <Datatype_For_Param3, , INTEGER OUTPUT>
WITH EXECUTE AS [CALLER	| SELF | 'user_name'] -- Default: CALLER				

The following code-block represents the Visual Studio equivalent of the above code for the same purpose.  The $fileinputname$ is the file name provided by the developer when creating a new stored procedure.  All the other parameters are configured in the Item Template’s XML file.  Therefore, the XML configuration should prescribe the purpose of the stored procedure, i.e. suit exactly what it is being created for.

CREATE PROCEDURE	[$SchemaName$].[$fileinputname$] 
					$ProcParam1$	$ProcParam1DataType$	=	$ProcParam1Value$
				,	$ProcParam2$	$ProcParam2DataType$	=	$ProcParam2Value$
				,	$ProcParam3$	$ProcParam3DataType$		$ProcParam3Type$
WITH EXECUTE AS [CALLER	| SELF | 'user_name'] -- Default: CALLER <-- make a selection in the executing principal context.				

A Visual Studio Item Template for a Custom Stored Procedure

The provided template fast-tracks a developer and provides a consistent standard for every stored procedure created based on this template.  This particular standard comprises a few sections, viz.

  • Create with execution principal context
  • Flower box containing the abstract and owner details
  • Initial proc-wide directives, e.g. NOCOUNT, XACT_ABORT, and TRANSACTION ISOLATION LEVEL
  • Error-handling variables (for RAISERROR)
  • Local variable declaration & initialisation
  • Main query execution block wrapped in TRY. . . CATCH and transaction
  • CATCH block with detailed error handling with XACT_STATE
  • Return value for application event handling
  • Directive switch to system defaults
  • Metadata section for object details when the object is built in the database
CREATE PROCEDURE	[$SchemaName$].[$fileinputname$] 
					$ProcParam1$	$ProcParam1DataType$	=	$ProcParam1Value$
				,	$ProcParam2$	$ProcParam2DataType$	=	$ProcParam2Value$
				,	$ProcParam3$	$ProcParam3DataType$		$ProcParam3Type$
WITH EXECUTE AS [CALLER	| SELF | 'user_name'] -- Default: CALLER <-- make a selection in the executing principal context.				
/* ==================================================================================================================================
	ABSTRACT:	This procedure executes for $projectname$ and. . . 

	OWNER:		$ItemOwner$

	<NOTE: to developer, complete the metadata at the end of this procedure script.>

											<< See bottom of item for details >>
=================================================================================================================================== */
	-- --------------------------------------------------------------------------------------------------
	--	Renders entire txn uncommittable if constraint violation occurs. 
	--	ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_6tsql/html/cbcaa433-58f2-4dc3-a077-27273bef65b5.htm
	-- --------------------------------------------------------------------------------------------------
	SET NOCOUNT ON;
	SET XACT_ABORT ON;
	SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
	-- ------------------------------------------------------------------------------------
	-- ------------------------------------------------------------------------------------
	-- None.
	-- -------------------------------------------------------------------------------------
	-- -------------------------------------------------------------------------------------
	-- Error Handling Local Variables
	-- Delete this for SQL 2014 unless required for custom error logging
	DECLARE			@vErrorNumber		INTEGER	=	0
				,	@vErrorMessage		NVARCHAR(2048) = N''
				,	@vErrorSeverity		INTEGER = 0
				,	@vErrorState		TINYINT = 0
				,	@vErrorLine			INTEGER = 0
				,	@vErrorProcedure	NVARCHAR(256) = NULL;
	-- Processing Local Variables
	DECLARE			@LocalVariable1Name	INTEGER
				, 	@LocalVariable2Name	DATETIME;
	-- -------------------------------------------------------------------------------------
	-- -------------------------------------------------------------------------------------
	-- Initialise processing variables
	SET			@LocalVariable1Name	=	0;
	SET			@LocalVariable2Name	=	'2011-01-01';

	/* =========================================================================================
		MAIN QUERY EXECUTION BLOCK
	============================================================================================ */
	BEGIN TRY
		BEGIN TRANSACTION;
		-- { sql_statement | statement_block }
		/* ------------------------------------------------------------------------------------------
			Severity 0-10 (information) are not caught by TRY. . . CATCH
			Severity 11-16 (user-correctable)
			Severity 17 (Insufficient Resources)
			Severity 18 (non-fatal error)
			Severity 19 (SQL resource error)
			Severity 20-25 (fatal system error) execution termination with SQL close

			Any sys.messages valid messages can be utilised but must be passed to RAISERROR or THROW.
		-------------------------------------------------------------------------------------------- */
		COMMIT TRANSACTION;
	END TRY
	BEGIN CATCH
		-- Grab error indicators.
		-- Delete this for SQL 2014 unless required for custom error logging
		SELECT			@vErrorNumber		=	ERROR_NUMBER() 
					,	@vErrorLine			=	ERROR_LINE()
					,	@vErrorSeverity		=	ERROR_SEVERITY()
					,	@vErrorState		=	ERROR_STATE()
					,	@vErrorProcedure	=	ERROR_PROCEDURE()
					,	@vErrorMessage		=	ERROR_MESSAGE()+ CHAR(10)
												+ N'Executing code item name or type ('+ISNULL(@vErrorProcedure,N'AdHoc SQL')
												+ N'), error number ('+ CONVERT(NVARCHAR(20), @vErrorNumber)
												+ N'), severity ('+CONVERT(NVARCHAR(20), @vErrorSeverity)
												+ N'), code error line ('+CONVERT(NVARCHAR(20),@vErrorLine)										
												+ N').'+CHAR(10) +CHAR(13);
		-- -------------------------------------------------------------------------
		--	XACT_STATE =  1 ~ Transaction is committable.
		--	XACT_STATE = -1 ~ Transaction is uncommittable - roll back 
		--	XACT_STATE = 0 ~ No transaction - commit or rollback  results in error.
		--	ms-help://MS.SQLCC.v10/MS.SQLSVR.v10.en/s10de_6tsql/html/248df62a-7334-4bca-8262-235a28f4b07f.htm
		--  Any custom error logging should be implemented here
		-- -------------------------------------------------------------------------
		IF (XACT_STATE() <> 0)
		BEGIN
			ROLLBACK TRANSACTION;
			SET @vErrorMessage	= @vErrorMessage + N'Transaction: ROLLED BACK';
		END

		-- SQL 2000+
		RAISERROR	(	@vErrorMessage		-- { msg_id | msg_str | @local_variable }
					,	@vErrorSeverity
					,	@vErrorState		-- [ ,argument [ ,...n ] ] )  [ WITH option [ ,...n ] ]
					);
		-- SQL Server 2014+
		-- THROW;
	END CATCH;

	-- Cleanup/ Housekeeping
	RETURN 0;
/* ==================================================================================================================================
	TEMPLATE AUTHOR: $TemplateAuthor$, COMPANY: $OwnerCompany$


	VERSION		DATE		NAME				DESCRIPTION
	---------	-----------	-----------------	---------------------------------------------------------------------------------------
	$ItemVersion$		YYYY-MM-DD	Name.Surname		Change details are.....

												<< See bottom of item for details >>
=================================================================================================================================== */
/* ---------------------------------------------------------------------------------------------
	NOTE:	This section has to be edited and run for all attributes after each edit to keep the
			database metadata up to date.
	 Add extended properties to the procedure (in database objects)
		1. Author name
		2. Procedure naming specifics
		3. Procedure version
		4. Editor name
		5. Edit date
		6. Template version
 ----------------------------------------------------------------------------------------------- */

-- Author details
EXECUTE sys.sp_addextendedproperty	@name = N'Author Name'
				,	@value = N'$ItemAuthor$' --<-- Change this only!
				,	@level0type = N'SCHEMA'
				,	@level0name = N'$SchemaName$'
				,	@level1type = N'PROCEDURE'
				,	@level1name = N'$fileinputname$'	
				,	@level2type = NULL
				,	@level2name = NULL;

-- Object description
EXECUTE sys.sp_addextendedproperty	@name = N'Functional Description'
				,	@value = N'This procedure executes for $projectname$ and. . .' --<-- Change this only!
				,	@level0type = N'SCHEMA'
				,	@level0name = N'$SchemaName$'
				,	@level1type = N'PROCEDURE'
				,	@level1name = N'$fileinputname$'	
				,	@level2type = NULL
				,	@level2name = NULL;

-- This stored procedure version
EXECUTE sys.sp_addextendedproperty	@name = N'Version'
				,	@value = N'$ItemVersion$' --<-- Change this only!
				,	@level0type = N'SCHEMA'
				,	@level0name = N'$SchemaName$'
				,	@level1type = N'PROCEDURE'
				,	@level1name = N'$fileinputname$'	
				,	@level2type = NULL
				,	@level2name = NULL;

-- Editor details
EXECUTE sys.sp_addextendedproperty		@name = N'Editor Name'
				,	@value = N'$EditorName$' --<-- Change this only
				,	@level0type = N'SCHEMA'
				,	@level0name = N'$SchemaName$'
				,	@level1type = N'PROCEDURE'
				,	@level1name = N'$fileinputname$'
				,	@level2type = NULL
				,	@level2name = NULL;
-- Edit date
EXECUTE sys.sp_addextendedproperty		@name = N'Edit Date'
				,	@value = N'$EditDate$' --<-- Change this only!
				,	@level0type = N'SCHEMA'
				,	@level0name = N'$SchemaName$'
				,	@level1type = N'PROCEDURE'
				,	@level1name = N'$fileinputname$'
				,	@level2type = NULL
				,	@level2name = NULL;	
-- Template version
EXECUTE sys.sp_addextendedproperty	@name = N'Template version'
				,	@value = N'$TemplateVersion$' --<-- Change this only!
				,	@level0type = N'SCHEMA'
				,	@level0name = N'$SchemaName$'
				,	@level1type = N'PROCEDURE'
				,	@level1name = N'$fileinputname$'
				,	@level2type = NULL
				,	@level2name = NULL;
-- End --------------------------------------------------------------------------------------

Configuring the Stored Procedure Template in Visual Studio

The template can be created in any SQL editor, viz. SSMS or VS.  Once it is created, open Visual Studio and add a Stored Procedure to an existing Project, using the default VS functionality.

VS Add new Item to Project

Follow these steps:

  1. Select Stored Procedure and create it with a name that suits the template Stored Procedure.
  2. Take the template SQL code for the Stored Procedure (listed in the code-block above) and paste it over the text of the newly created item.
  3. Save it and then open: File >> Export Template
  4. Select the Item Template radio-button in the [Export Template Wizard] and the project into which the template Stored Procedure has been saved
  5. Click the [Next] button
  6. Select the name of the Stored Procedure just created and saved in the project
  7. There should be no Item References to select, so click the [Next] button
  8. Provide a Template Name, Description, Icon, and Preview Image; deselect [Automatically import the template into Visual Studio] (reference) and keep [Display an explorer window on the output files folder] selected
  9. Click the [Finish] button, and the output files folder is presented

A zip-file is created that contains three files:

  1. TemplateIcon.ico
  2. MyTemplate.vstemplate
  3. <Stored Procedure>.sql (the template SQL)

The MyTemplate.vstemplate file is the one used to configure how the stored procedure is created, i.e. it provides the name, description, input parameters, and other functions.

<VSTemplate Version="3.0.0" xmlns="http://schemas.microsoft.com/developer/vstemplate/2005" Type="Item">
    <Description>Default CreateStoredProcedure for Acme Corporation</Description>
    <References />
    <ProjectItem ReplaceParameters="true" TargetFileName="$fileinputname$.sql">Z Create Stored Procedure.sql</ProjectItem>
    <CustomParameters>
        <CustomParameter Name="$projectname$" Value="API"/>
        <CustomParameter Name="$SchemaName$" Value="dbo"/>
        <CustomParameter Name="$ProcParam1$" Value="@InputParameter1"/>
        <CustomParameter Name="$ProcParam1DataType$" Value="INTEGER"/>
        <CustomParameter Name="$ProcParam1Value$" Value="0"/>
        <CustomParameter Name="$ProcParam2$" Value="@InputParameter2"/>
        <CustomParameter Name="$ProcParam2DataType$" Value="INTEGER"/>
        <CustomParameter Name="$ProcParam2Value$" Value="0"/>
        <CustomParameter Name="$ProcParam3$" Value="@OutputParameter"/>
        <CustomParameter Name="$ProcParam3DataType$" Value="INTEGER"/>
        <CustomParameter Name="$ProcParam3Type$" Value="OUTPUT"/>
        <CustomParameter Name="$ItemVersion$" Value="0.0.1"/>
        <CustomParameter Name="$TemplateVersion$" Value="0.0.1"/>
        <CustomParameter Name="$ItemOwner$" Value="c 2015 Acme (Pty) Ltd."/>
        <CustomParameter Name="$OwnerCompany$" Value="Acme (Pty) Ltd."/>
        <CustomParameter Name="$TemplateAuthor$" Value="Hendriekus J.Roux"/>
        <CustomParameter Name="$ItemAuthor$" Value="Hendriekus J.Roux"/>
        <CustomParameter Name="$EditorName$" Value="Hendriekus J.Roux"/>
        <CustomParameter Name="$EditDate$" Value="2015-03-09"/>
    </CustomParameters>
</VSTemplate>

The listing of the MyTemplate.vstemplate (above) illustrates two kinds of parameter, viz. system parameters and custom parameters.  These are placed in the template SQL code as place-holders.  System parameters are filled out in response to an action in Visual Studio, like New >> Project or New >> Item, where a name is provided by the developer.  Custom parameters are filled out with the values provided in the MyTemplate.vstemplate file.  The latter should be chosen to suit the functional intention of the Item, i.e. a Stored Procedure in this case.
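
The substitution mechanism itself is internal to Visual Studio, but its effect can be pictured with a small sketch (Python here, purely for illustration).  The parameter names mirror the listing above; the procedure name pr_GetParty and the simple string-replacement logic are assumptions for the example, not how VS actually implements it.

```python
# Sketch of how the template engine conceptually fills $parameter$
# place-holders.  The real engine is internal to Visual Studio; the
# procedure name and replacement logic here are illustrative only.

TEMPLATE_SQL = ("CREATE PROCEDURE [$SchemaName$].[$fileinputname$]\n"
                "\t$ProcParam1$ $ProcParam1DataType$ = $ProcParam1Value$")

# System parameter: supplied by VS from the name typed into Add New Item.
system_params = {"$fileinputname$": "pr_GetParty"}

# Custom parameters: the <CustomParameter> values in MyTemplate.vstemplate.
custom_params = {
    "$SchemaName$": "dbo",
    "$ProcParam1$": "@InputParameter1",
    "$ProcParam1DataType$": "INTEGER",
    "$ProcParam1Value$": "0",
}

def fill_template(text, *param_sets):
    # Longest names first, so $ProcParam1DataType$ is never clipped
    # by the shorter $ProcParam1$.
    for params in param_sets:
        for name in sorted(params, key=len, reverse=True):
            text = text.replace(name, params[name])
    return text

print(fill_template(TEMPLATE_SQL, system_params, custom_params))
```

The point of the sketch is the two-stage fill: one value arrives from the developer's action, the rest from the template file.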

Making the Template Available in Visual Studio

Visual Studio presents a dialog when a new Stored Procedure is created.

VS Create New Item Dialog Option List

The template should be presented here in a section appropriate to the company and project.  To create a new section, the template is added to a folder location on the developer’s machine.  The Visual Studio custom templates are located here: <drive>:\Users\<user name>\Documents\Visual Studio 2013\Templates\ItemTemplates.

VS Custom Templates Folder

By adding templates to this location, or by creating a new folder, say Acme Items, any new custom templates placed there become available from the New Item dialog.

In this example, create a folder here called ItemTemplates\Acme Corporation\Programmability, into which the zip-file (from <drive>:\Users\<user>\Documents\Visual Studio 2013\My Exported Templates) is copied.

To use the new template, restart Visual Studio and add a new Stored Procedure to the project using the new template.

VS Custom Template New Item Dialog

Unfortunately, at this point, none of the parameters (except the procedure name) are substituted in the actual SQL code.  These custom parameters must still be added to the MyTemplate.vstemplate file.

Adding Custom Parameters to a Template

Follow these steps to add these parameters and their values:

  1. Extract the template’s zip file from its location into the [My Exported Templates] location and open the MyTemplate.vstemplate file in Notepad
  2. Take the parameters between the <CustomParameters> tags from the code (above) and add them to the current template’s file in the appropriate location
  3. Save the file, drag the saved version from the explorer window over the zip file in the template location, and choose the overwrite option to update the template
  4. Restart Visual Studio and create a new Stored Procedure from the template to see all parameters being populated.
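
The extract-edit-overwrite steps can also be scripted.  As an illustrative alternative (not part of the original workflow), a short Python sketch using the standard zipfile module rewrites the archive with the edited MyTemplate.vstemplate; the paths and file names are assumptions based on the listing above.

```python
# Illustrative sketch: swap an edited MyTemplate.vstemplate into an
# exported template zip without manually extracting and re-zipping.
# File names follow the article's example but are assumptions.
import os
import zipfile

def update_vstemplate(zip_path, new_vstemplate_text):
    """Rewrite the archive, replacing only MyTemplate.vstemplate."""
    tmp_path = zip_path + ".tmp"
    with zipfile.ZipFile(zip_path) as src, \
         zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == "MyTemplate.vstemplate":
                # write the edited template in place of the old one
                dst.writestr(item.filename, new_vstemplate_text)
            else:
                # copy every other entry through unchanged
                dst.writestr(item, src.read(item.filename))
    os.replace(tmp_path, zip_path)   # overwrite the original archive
```

After running this against the zip in the ItemTemplates folder, restarting Visual Studio (step 4) picks up the new parameters as before.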

Previous Article (3) [Create a Database Project with Folder Structure]. . . Next Article (4) [Coming Soon]

Visual Studio 2013 and Database Projects: Create a Database Project with Folder Structure

Database projects in Visual Studio are a bit of an afterthought in application development.  It started with Gert Drapers (the data dude) and became SSDT (SQL Server Data Tools), which had to be installed (as an add-on) to Visual Studio.  Its functionality has varied over the years, but it was aimed at reducing the difficulty of database development in Visual Studio and keeping database changes in sync across a solution (matching database and application versions) and multiple server environments through the application life-cycle.

Creating a Database Project

This is quite easy using Visual Studio 2013.  The first landing page (it can be turned off) has a Start section with the option to create a new project; the traditional FILE menu is just as simple.

Visual Studio for Database Development First Screen

VS Create New Database Project Dialog

Be mindful, if a project is created and Team Foundation Server (TFS) is being used, of how the solution and the projects contained by the solution are demarcated.  It is not in the scope of this article to explain how to do this, but suffice it to say that where solutions are stored separately from projects, first create an empty solution and check it in.  Then add the projects back to the solution so that all are checked into TFS in their correct locations.

Further, take note of the Visual Studio boxes, viz. Solution Explorer, SQL Server Object Explorer, and Properties, for example.  These should be orientated (docked) in positions that make sense, i.e. for easy reference.  Below is an example of an instance that contains a Solution with three projects.

VS Database Project Screen

The two most important boxes are the Solution Explorer and the SQL Server Object Explorer.  The former depicts the Solution and its database projects.  Where no database exists yet (this is where you create it), SQL Server objects can be created and added to the project; where a database already exists, it can be imported in a few ways: using a DACPAC (SQL Server Data-tier application extract file), connecting directly to a SQL Server database, or using a script file containing all the object definitions.

Choosing a Folder Structure

Importing a database allows for the folder structure in the project to be changed.  The project folder structure should allow for ease of maintenance and readability.  The folder structure does not influence deployment.

VS Database Project Import Dialog Folder Structure

An import of an existing database where no folder structure is chosen (or defined) dumps all files in the root of the project, which is moderately messy, so the structure must be carefully considered.  A folder is not created where the imported database has no objects of that type, e.g. Stored Procedures.  If such objects are required later, the folder will first have to be created.

Organising the database advances readability, and consequently maintenance, because developers are able to find things quicker.  Any database objects without a discernible grouping are dumped in the root of the project on import.

NOTE: The way objects are displayed, and how access rights are differentiated in practice, bears on the configuration of the naming scheme.  Making use of database schemas (to group objects and assign login permissions) is a more effective data-architecture strategy than using prefixes on objects, like sp or pr for a Stored Procedure (source).  The folder structure in Visual Studio and SQL Server Management Studio already informs the object type, and an added schema denotes an organisational name space for such objects.  SQL Server uses a four-part naming scheme: Server.Database.DatabaseSchema.DatabaseObject.  Objects in a schema are owned by the security principal that owns the schema, and a schema can contain objects owned by different users.  Any predefined schema (dbo, sys, etc.) can be dropped from the model database if not required, consequently preventing it from being created in new databases.  A schema owner can be changed without breaking code, but if an object owner, say Tommy.Object, is to be removed, an ownership change must first be performed before Tommy’s account can be deleted.  However, any fully qualified references will break in the code, e.g. Server.Database.[tommy].DatabaseObject.  The default dbo schema is owned by the dbo user.  Users assigned to the dbo schema do not inherit the permissions of the dbo user account, but those of the database objects contained in the schema (source).  Using dbo as the owner of all objects can greatly simplify managing objects (source).

Previous article (1) [Installation]. . . Next Article (3) [Working the Project]

Visual Studio 2013 and Database Projects: Installation

This installation is for Visual Studio Premium.  Once the image is mounted, the folder contains the installation file, i.e. vs_premium.

VS Premium Folder

Execute the vs_premium file, agree to the License Terms and Privacy Policy, and confirm the location of the installation.  Two options are quite important:

  1. LightSwitch
  2. Microsoft SQL Server Data Tools

LightSwitch is a technology to shorten the development time of data-centric business applications and data services.  SQL Server Data Tools is the add-on that previously (VS 2012) had to be downloaded and installed separately, but is now part of the install.  It provides database schema modelling and support for querying live databases.

The other default option is Blend, a visual user-interface design tool using XAML (web) and WYSIWYG (desktop), previously part of the Microsoft Expression Studio toolset.  It is not a database-specific tool.

Once installed (it takes a while), the Microsoft Account Login dialog is presented, for which you are required to have credentials.  Either use an existing account or create one (reference), and then associate an MSDN subscription (licenses that your company pays for) with this identity.  Note: a licence key may also be used to activate the software if you don’t have an MSDN subscription (see how).

Microsoft Account Login

Once logged in, access your account details to find the MSDN section, where the Subscriber ID and company email are used to link the MSDN profile to the Microsoft user identity.

Once the subscription is activated the software is licensed.

MSDN Subscription Activation Dialog

Visual Studio starts with an environment set-up, which can also be changed at a later stage.  Here the Database environment is selected (as that is the purpose of this instance), along with the colour scheme of your choice.

VS Environment Set-up Dialog

That completes the installation process, and you have arrived at the first screen.

Visual Studio for Database Development First Screen

Next article in series. . . 2 [coming soon]

Navigate. . . Next Article (2)

Hadoop for a Data Driven Organisation

Using Big Data: Interoperability Issues

Deriving meaning quickly from huge quantities of data has been the objective of enterprises since the inception of databases.  Data visualisation makes large and complex data sets understandable and enables domain experts to spot underlying trends and patterns.  Companies want to learn of signs and patterns in their data to warn them, not only of hazards such as fraud or production delays, but to provide insight into opportunities such as the treatment of diseases or leveraging different logistics approaches.  Traditionally, business intelligence relied on static reports, created from an underlying data warehouse comprised of conspicuously formatted data in specialised data tables.  Now, with a proper visualisation tool on top of big data[1], the business analytics process may be shortcut; stakeholders use a visualisation tool to model their data requirements, to see relationships and spot patterns that analytics professionals might be missing (Linthicum, 2013).

Hadoop[2] has become the preferred platform for processing large quantities of data[3].  These large quantities may also refer to structured data that cannot be analysed in a reasonable length of time.  The better information is, and the more routinely it is put in the hands of those who need it, the higher the intelligence derived from it.  Despite this, Hadoop is a batch-oriented[4] system and cannot (as yet) answer simple queries with a good turnaround time (Hortonworks, 2013).  But Hadoop handles data that traditional relational databases, data warehouses, and other analytic platforms have been unable to effectively manage, including user-generated data from social media and machine-generated data from sensors, appliances, and applications.  That is, all forms of data.  Hadoop does this through a distributed file system, the Hadoop Distributed File System (HDFS)[5], providing portability, scalability (over commodity hardware[6]), and reliability, together with MapReduce processing for parallel distributed systems (Srivastava, 2011).  “Once data exceeds the capabilities of a single computer, we face the problem of distributing its storage and processing.” (Downey, 2011).

Big Data – Hortonworks

The Open Group suggests The Need for Boundaryless Information Flow (III-RM).  Boundaryless Information Flow is essentially about resolving the problem of getting information to the right people at the right time in a secure, reliable manner, in order to support the operations that are core to the extended enterprise.  The Boundaryless Organisation does not imply that there are no boundaries in the flow of information, only that boundaries should be made permeable or penetrable.  However, the information systems in place today do not allow information to flow in support of the boundaryless organisation; when data do, it will be a boundaryless information flow.  The Open Group published the Interoperable[7] Enterprise Business Scenario in 2001, which crystallises this need for boundaryless information flow, describing how this need drives information customers’ deployment of their information systems.  A key driver is to “create a worldwide market for interoperable information products supporting access to integrated information, in which all stakeholders’ needs are addressed.” (The Open Group, 2002).

The lack of the right information to the right person at the right time is equally a lack of interoperability, preventing organisations from achieving their business objectives.  Information as an asset, in principle, passes the point of departure “that information technology merely supports or enables the business – increasingly, information is the business.” (The Open Group, 2002).  To address the lack of interoperability between systems that provide access to data is to attend to the boundaryless information flow.  Technology infrastructure is used to facilitate interoperability, either by using common off-the-shelf (COTS) appliances or a collection of best-of-breed integrations[8], to reduce interoperability issues.

The Open Group (The Open Group, 2002) identifies high-level technical requirements for a boundaryless information flow architecture:

  1. Openness – standards-based open interfaces need to exist for critical interfaces to enable interoperability more easily and support integration of components, for areas:
    1. Protocols
    2. Web portals
    3. Collaboration
    4. Database
    5. Storage Connections
  2. Data Integrity – the same data stored in multiple systems must be conformed to mean the same thing throughout
  3. Availability – must be guaranteed, and where maintenance requires downtime a fall-back must be available
  4. Security – support different security levels and protection based on the sensitivity and use of the data, as well as the normal requirements for data integrity, confidentiality, logging, and tracking.  Must provide flexible policy management and strong authentication, providing users with single sign-on and user administration.
  5. Accessibility – solutions must provide global accessibility to the information without compromising security
  6. Manageability – solutions must be manageable
  7. Internationalisation and Localisation – accommodate multiple languages and adapt to culture where deployed internationally

Technology Architecture III-RM – The Open Group

Standardisation of the information technology of an organisation, to achieve a boundaryless information flow, must be measurable in terms of business value, improving business operations, effectiveness, and also the Information Technology capability in the organisation.  Example measures:

  • Drive revenue growth
  • Lower Information Technology expenditure
  • Increase % of procurements against standards
  • Reduce spend on customisations
  • Improve cycle time for rolling out upgrades

In the Business Intelligence sphere, a few key trends are identified with the emerging use of big data (Linthicum, 2013):

  • The ability to leverage both structured and unstructured data, and visualise that data
  • The ability to apply structure at the time of analysis, decoupling the analysis from the underlying structure of the data (be it structured or unstructured)
  • Leveraging current or near-real-time data for critical information
  • Allowing BI analysts to mash up data from outside the organisation (like the cloud) to enhance the analysis process
  • The ability to bind data analytics to business processes and applications, allowing issue resolution without human intervention, known as embedded analytics
Data and Hadoop Architecture

An organisation should seek to support the boundaryless information flow idea in any strategy it evaluates for information integration.  Hadoop, being a framework in support of Big Data, i.e. provisioning results/ data over large heterogeneous data sources to analytics tools or data stores, is a technology to be considered in support of the boundaryless information flow.

Business Intelligence and Big Data

Big data analytics has a potential that is at odds with the traditional Business Intelligence approach.  To force the existing paradigm on big data analytics is to miss the full potential of this technology, or to fail altogether (Linthicum, 2013).

In the previous two decades, calculating summaries, sums, and averages was sufficient, but large complex data require new techniques.  Recognising customer preferences requires the analysis of purchase history, browsing behaviour, products viewed, comments, suggestions, and complaints, to predict future behaviour (Cloudera, 2013).

Hadoop[9], able to work on Big Data, structured and/ or unstructured (heterogeneous sources), is ideally positioned for provisioning data for Business Intelligence.  Problems like advertisement targeting, recommendation engines (predicting customer preferences), customer churn (attrition) analysis, risk modelling, point-of-sale transaction analysis, predicting network data failures, threat analysis, and trade surveillance[10] can be solved with Hadoop (Cloudera, 2013).

The challenge with Big Data is transforming it into intelligent data.  Because Hadoop is primarily a batch-orientated processing system, it is very suitable for data analytics (Payandeh, 2013).  It is the crow-bar (no pun intended) that wrests open the capacious data lake to parse, cleanse, apply structure to, and transform the data, arranging it for consumption by analytics applications.

HADOOP Specifically

The most popular spearhead of Big Data, right now, appears to be Hadoop (Kernochan, 2011).  Hadoop is open-source software for reliable, scalable, distributed computing over large data sets across clusters of computers using simple programming models, designed to scale from a single machine to thousands (Welcome to Apache™ Hadoop®!, 2013).  Hadoop is a framework that supports data-intensive distributed applications running in parallel on large clusters of commodity hardware, providing both reliability and data motion to applications.  By using MapReduce[11][12] (dividing the workload into smaller fragments and parallelising the effort over multiple servers) and a distributed file system that stores data on the compute nodes, it provides huge aggregated bandwidth across the cluster, as well as automatic handling of failures by the framework.  “It enables applications to work with thousands of computationally independent computers and petabytes of data.” (WikiPedia, 2013).
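
The map/shuffle/reduce flow described above can be pictured with a miniature, single-process sketch in Python.  Hadoop distributes exactly these phases across a cluster, so this is an illustration of the programming model, not of Hadoop itself.

```python
# Miniature, single-process stand-in for the MapReduce flow: map emits
# key/value pairs, the shuffle groups them by key, and reduce folds
# each group into a result.  Hadoop runs these phases across a cluster.
from collections import defaultdict

def map_phase(lines):
    for line in lines:                      # each mapper sees a split of the input
        for word in line.lower().split():
            yield word, 1

def shuffle(pairs):
    groups = defaultdict(list)              # the framework's group-by-key step
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data needs big clusters", "hadoop stores big data"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])   # 3
```

Word count is the canonical MapReduce example; the same three-phase shape applies to far larger jobs, with the shuffle being the network-heavy step the framework handles for you.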

Hadoop is also more cost-effective than traditional analytics platforms such as data warehouses, and it moves analytics closer to where the data is.  It may commonly be used for optimising internal data management operations, but it is rapidly gaining ground as a strategic big data analytics platform (Big Data Analytics: Unleashing the Power of Hadoop, 2013).

The Apache Hadoop platform comprises:

  1. Common – common utilities that support the Hadoop modules
  2. MapReduce – YARN-based system for parallel processing of large data sets
  3. Distributed File System (HDFS) – distributed file system for high throughput access to application data
  4. YARN – framework for job scheduling and cluster resource management

Other related projects are:

  • Ambari – web-based tool for provisioning, managing, and monitoring Hadoop clusters, with a dashboard
  • Avro – data serialisation system
  • Cassandra – scalable multi-master database with no single points of failure
  • Chukwa – data collection system for managing distributed systems
  • HBase – scalable, distributed database that supports structured data storage for large tables
  • Hive – a data warehouse infrastructure that provides data summarisation and ad hoc querying
  • Pig – high level data-flow language and execution framework for parallel computation
  • Zookeeper – high-performance coordination service for distributed applications

The Hadoop Distributed File System (HDFS) can store many huge files, scaling across the commodity nodes and breaking down enterprise data silos[13]; this common data globule is referred to as a “data lake” (Gualtieri, 2013).  These “data lakes” are queried by MapReduce, a distributed data-processing model, in a highly parallel fashion.

Dealing only with files on disk essentially means the content lacks structure, so data maps are applied over the unstructured content to define its core metadata.  For example, Hive provides a mechanism to project structure onto the data, allowing it to be queried by a SQL-like language called HiveQL, or by Pig, a MapReduce mechanism.  A BI tool thus obtains file metadata from the cluster (serving both structured and unstructured sources) and retrieves the data from the data nodes into that structure, using MapReduce to get to the required data in parallel, returning the data to a BI visualisation tool, e.g. Tableau[14], or into other types of processing and specific data structures, e.g. a dimensional model for complex analytical processing (Linthicum, 2013).
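
The “project structure at query time” idea can be sketched: the raw lines stay as unstructured text on disk, and a data map (field names plus a parser) is applied only when a query runs, which is loosely what Hive does when HiveQL projects a table onto HDFS files.  The field layout and data below are hypothetical.

```python
# Schema-on-read sketch: raw lines stay as unstructured text, and a
# "data map" (field names + a parser) is applied only at query time --
# loosely what Hive does when a HiveQL table is projected onto files.
raw_lines = [
    "2013-06-01 checkout 19.99",
    "2013-06-01 browse 0.00",
    "2013-06-02 checkout 5.50",
]

data_map = ("event_date", "action", "amount")   # hypothetical layout

def project(lines, schema):
    # apply the schema to each raw line as it is read
    for line in lines:
        yield dict(zip(schema, line.split()))

# A HiveQL-style query: SELECT SUM(amount) WHERE action = 'checkout'
total = sum(float(row["amount"])
            for row in project(raw_lines, data_map)
            if row["action"] == "checkout")
print(round(total, 2))   # 25.49
```

Nothing about the files changes; a different data map over the same lines would yield a different “table”, which is the flexibility the article attributes to Hive.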

Big data is usually only processed periodically, e.g. daily or weekly.  During processing, the load is dispersed over multiple processing nodes to speed it up.  But after the query has executed, those nodes are no longer required until the next run.  This is a perfect expansion-contraction pattern.  To reduce cost and maximise hardware utilisation, it would be a boon to have the fabric dynamically provision the nodes in the grid to the jobs that are executing, and so provide an agile infrastructure.  Hadoop should be thought of as a distributed operating system, with many computers storing and processing data as if on a single machine (Ken Krugler[15]).  Adding the ability to host Hadoop in the cloud, to achieve an agile infrastructure, sees efficiency maximised and costs reduced for a number of use-cases (Downey, 2011).

Hadoop is essentially used for (Julio, 2009):

  • Searching
  • Log processing
  • Recommendation systems
  • Analytics
  • Video and Image analysis
  • Data Retention
General BI Scenario with Hadoop

Visualisation of Data and Hadoop

Hadoop is supported by data virtualisation technologies like Informatica, Composite Server, and Denodo.  Visualisation technologies like Tableau[16] can connect to multiple flavours of Hadoop to bring data in-memory and perform fast ad hoc visualisations, finding patterns and outliers in all the data in the Hadoop cluster.  Tableau is said to be a very elegant solution, obviating the need to move huge data sets into a relational store before analysing them (Tableau, 2013).  Tableau uses ODBC to connect to the Cloudera connector for Tableau, which uses HiveQL to connect to Hive and MapReduce over HDFS and HBase (Popescu, 2012).

A visualisation tool (in this example Tableau) is connected to Hadoop, from which visualisations are built just as when connected to a traditional database.  Tableau achieves “on-the-fly ETL” by connecting to Hive with Custom SQL, allowing the batch-orientated[17] nature of Hadoop to handle layers of analytical queries on top of complex Custom SQL with only an incremental increase in time.

Although Hadoop is essentially a batch-orientated system, there are a number of techniques for improving the performance of visualisations and dashboards built from data stored on a Hadoop cluster.  Custom SQL limits the data-set size, speeding up the exploration of a new data set and the building of initial visualisations.  Another technique is extracting data for Tableau[18]: although Tableau is capable of handling large sets of data, it is nowhere near as well positioned as Hadoop to handle really large, or Big Data, loads.  An extract can be pre-formatted for Tableau by using Hadoop to roll up dates, hide unused fields, and aggregate visible dimensions, creating a broader view of the data.  Hive can, for example, partition records on disk and allow Tableau to direct a query aligned to these partitions, greatly reducing the time spent getting to the right data (Hortonworks, 2013).
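
The “roll up dates and aggregate visible dimensions” preparation amounts to a plain aggregation step, sketched below; in practice Hadoop would perform this at scale before the much smaller extract is handed to the visualisation tool.  The field names and figures are illustrative.

```python
# Sketch of the "pre-formatted extract" idea: roll daily rows up to
# month level and aggregate the measure, so the visualisation tool
# receives a far smaller data set.  Fields and values are illustrative.
from collections import defaultdict

daily_rows = [
    ("2013-06-01", "EMEA", 120.0),
    ("2013-06-15", "EMEA", 80.0),
    ("2013-07-02", "EMEA", 50.0),
    ("2013-06-03", "APAC", 200.0),
]

def roll_up_to_month(rows):
    totals = defaultdict(float)
    for day, region, amount in rows:
        month = day[:7]                 # "2013-06-01" -> "2013-06"
        totals[(month, region)] += amount
    return dict(totals)

extract = roll_up_to_month(daily_rows)
print(extract[("2013-06", "EMEA")])    # 200.0
```

Four daily rows collapse to three monthly ones here; over billions of rows the same reduction is what makes the extract tractable for a desktop tool.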

HADOOP Use-case

Why is Hadoop most typically the technology underpinning “Big Data”?  Data comes from a set of sources, e.g. CRM, ERP, and custom applications.  This data has to be queried somehow: via an elaborate point-to-point integration, a virtual data store, a data warehouse, or even a Massively Parallel Processing (MPP) system, e.g. Microsoft’s Parallel Data Warehouse.  The application doing the querying is probably a high-end business intelligence tool, e.g. SAS.  As other unstructured machine-generated or social-origin data are added, the data integration becomes more complex.

Hadoop is not replacing traditional data systems, but is complementary to them (Hortonworks, Apache Hadoop Patterns of Use, 2013).

Enterprise Data Architecture

The traditional Enterprise Data Architecture is indicated inside the red-marked area, with the Hadoop ability added.  The specific use of Hadoop is directed by a few use-cases or patterns.

Observing analytic challenges at an infrastructure level, some clear patterns emerge, fitting into the following three (Hortonworks, Apache Hadoop Patterns of Use, 2013):

  1. Hadoop as a Data Refinery
  2. Data Exploration with Hadoop
  3. Application Enrichment

Hadoop as a Data Refinery

Refine data sources into their commonly used analytics applications, e.g. providing a central view of Customer from all the data about them in the ERP, CRM, or bespoke systems.  This pattern is especially powerful where customer data are incorporated from web sessions on the enterprise’s web site, to see what customers have accessed and what they are interested in.

This pattern uses Hadoop[19] to parse, cleanse, apply structure and transform such data and then push it into an analytics store for use with existing analytics tools.  This pattern uses Hadoop as a type of staging environment between the source and the analytics tools, with Hadoop being the transformation agent.
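As a toy sketch of the refinery idea (plain Python with hypothetical source layouts; the real pattern would run these steps as Hadoop jobs and push the result to an analytics store):

```python
def refine_customer_view(erp, crm, web_hits):
    """Parse and cleanse three hypothetical feeds into one row per
    customer id - the 'central view of Customer' the pattern describes."""
    def blank():
        return {"name": None, "segment": None, "pages_viewed": 0}
    view = {}
    for cid, name in erp:          # ERP feed: (customer id, raw name)
        view.setdefault(cid, blank())["name"] = name.strip().title()
    for cid, segment in crm:       # CRM feed: (customer id, segment)
        view.setdefault(cid, blank())["segment"] = segment
    for cid in web_hits:           # web log: one entry per page view
        view.setdefault(cid, blank())["pages_viewed"] += 1
    return view

view = refine_customer_view(
    erp=[(1, "  joe bloggs ")],
    crm=[(1, "gold")],
    web_hits=[1, 1, 2],
)
```

The cleansing (trimming and casing the name) and the merge by customer id stand in for the parse/cleanse/structure/transform work Hadoop would perform at scale.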

Data Exploration with Hadoop

In this pattern Hadoop is used directly on the data to explore it.  Hadoop still plays the roles of parser, cleanser, structurer and transformer, but it provisions the data to visualisation tools, e.g. Tableau (Tableau, 2013).  Financial institutions can use this pattern to discover identity fraud, i.e. to perform a kind of surveillance activity.

Application Enrichment

Data stored in Hadoop is used to influence an application’s behaviour, e.g. a web application using a user’s access patterns to tailor what it presents to that returning user.  Hadoop parses, cleanses, applies structure to and transforms the data, then passes it directly into the application.  This is an ability used by the large web companies of the world, like Yahoo! and Facebook: the right data is served, enriched, at the right time to the right customer.  Massive numbers of patterns and repeatable behaviours are identified in order to customise the user’s web-application experience, serving the right content to the right person at the right time and increasing conversion rates for purchases.
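A minimal sketch of the enrichment decision (the categories and content names are invented; a production system would derive them from Hadoop-scale access logs):

```python
from collections import Counter

# Hypothetical mapping from a content category to the item served for it.
catalogue = {"sport": "latest-scores", "news": "breaking-news", "home": "front-page"}

def pick_content(access_log, default="home"):
    """Serve the content item matching the user's most-viewed
    category, falling back to a default page for new users."""
    if not access_log:
        return catalogue[default]
    top_category, _ = Counter(access_log).most_common(1)[0]
    return catalogue.get(top_category, catalogue[default])

returning_user = pick_content(["sport", "news", "sport"])  # most-viewed category wins
new_user = pick_content([])                                # no history yet
```

The interesting work, identifying the repeatable behaviour behind `access_log`, is what Hadoop contributes; the serving decision itself is then cheap.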

End Notes

[1] Too big for Excel is not Big Data! (Chris, 2013).  The New York Stock Exchange generates about 1 TB of new trading data per day; Facebook hosts about 10 billion photos (1 petabyte); some 2.5 petabytes of genealogy data are stored; and the Internet Archive stores around 2 petabytes, growing at a rate of 20 TB per month (Seminar Report Downloads, 2012).

[2] It was first conceived as a web search engine for Yahoo! (Big Data Analytics: Unleashing the Power of Hadoop, 2013).

[3] A (subjective) figure of twenty terabytes is cited by Rama Ramasamy as a tipping point for considering Hadoop (Chris, 2013).

[4] There are projects that provide real-time capabilities to applications for results from Hadoop, but it is primarily a batch-orientated processing solution (HandsonERP & Mir, 2013).

[5] LZO or Snappy compression can be applied to HDFS files to reduce the bytes read or written, thus improving the efficiency of network bandwidth and disk space (Julio, 2009).

[6] Easy to obtain, available and affordable hardware that is not unique in any function.

[7] The ability of two or more entities or components to exchange information and to use the information that has been exchanged “to meet a defined mission or objective”; in this scenario the term specifically relates to access to information and the infrastructure that supports it (The Open Group, 2002).

[8] The process of combining components into an effective overall system. In this scenario the phrase “access to integrated information” is used repeatedly. The term “integrated information” means an overall system in which a single view of information is presented, and is achieved by combining various information sources (The Open Group, 2002).

[9] By 2015, 50% of Enterprise data will be processed by Hadoop (HandsonERP & Mir, 2013).

[10] A Hadoop cluster can run alongside a trading system, obtaining copies of the trading data; coupled with the reference data of the parties to a trade, the system can continually monitor trade activity to build relationships between people and organisations trading with each other, watching for patterns that reflect rogue trading (Cloudera, 2013).

[11] MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster.  It performs filtering and sorting on the distributed servers, runs various tasks in parallel, manages all communications and data transfers between the various parts of the system, and provides redundancy, fault tolerance and overall management of the whole process (Wikipedia, 2013).
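As a toy illustration of the model (pure Python rather than Hadoop’s Java API), a word count can be written as explicit map, shuffle and reduce phases:

```python
from collections import defaultdict
from itertools import chain

def map_phase(split):
    # mapper: emit a (word, 1) pair for every word in one input split
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    # shuffle/sort: group every emitted value by its key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # reducer: fold each key's values into a single count
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data", "Big clusters process big data"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(s) for s in splits)))
```

In Hadoop the mappers and reducers run on different nodes, with the framework handling the shuffle, scheduling and fault tolerance that this sketch omits.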

[12] The progress estimator for MapReduce in Hadoop is simplistic, and often produces inaccurate estimates.

[13] Collapsing data silos creates a pro-BI database on which to position BI and visualisation tools.

[14] To use Hortonworks’ online Hadoop data source for Tableau, use Tableau’s Hortonworks Hadoop Hive connection to connect to the Sandbox VM and start visualising data from Hadoop (Hortonworks, HOW TO: Connect Tableau to Hortonworks Sandbox, 2013).  Get the driver from here:


[16] Through a partnership with Cloudera, Tableau Software has built a Hadoop connector for doing big data visualisation (Williams, 2012).  Also see footnote 14 on how to use Hortonworks’ online Hadoop data source for Tableau.

[17] Hadoop is not yet capable of answering simple queries with very quick turnaround. (Hortonworks, 2013).

[18] Hadoop query cancellation is not straightforward.  Queries issued by Tableau can only be abandoned; the query continues on the Hadoop cluster, consuming resources (Hortonworks, 2013).

[19] Large quantities of data are distilled to something more manageable (Hortonworks, Apache Hadoop Patterns of Use, 2013).


Big Data Analytics: Unleashing the Power of Hadoop. (2013, August 21). Retrieved from Database Trends and Applications:

Chris, S. (2013, September 16). Don’t use Hadoop – your data isn’t that big. Retrieved from Chris Stucchio:

Cloudera. (2013). Ten Common. Chicago: Cloudera. Retrieved from

Downey, J. (2011, July 1). Big Data and Hadoop: A Cloud Use Case. Retrieved from Jim Downey:

Gualtieri, M. (2013, October 23). 5 Reasons Hadoop Is Kicking Can and Taking Names. Retrieved from Information Management:

HandsonERP, & Mir, H. (2013, April 5). Hadoop Tutorial 1 – What is Hadoop? Retrieved from YouTube:

Hortonworks. (2013). Apache Hadoop Patterns of Use. Chicago: Hortonworks. Retrieved from

Hortonworks. (2013, August 26). Best Practices for Hadoop Data Analysis with Tableau and Hortonworks Data Platform. Retrieved from SlideShare:

Hortonworks. (2013, November 07). HOW TO: Connect Tableau to Hortonworks Sandbox. Retrieved from Hortonworks KB:

Julio, P. (2009, October 19). What is Hadoop used for? Retrieved from SlideShare:

Kernochan, W. (2011, October 20). Big Data, MapReduce, Hadoop, NoSQL: The Relational Technology Behind the Curtain. Retrieved from enterpriseappstoday:

Linthicum, D. (2013). Making Sense of Big Data. Chicago: IBM.

Payandeh, F. (2013, September 8). Hadoop vs. NoSql vs. Sql vs. NewSql By Example. Retrieved from Big Data Studio:

Popescu, A. (2012, February 8). Visualizing Hadoop data with Tableau Software and Cloudera Connector for Tableau. Retrieved from myNoSQL:

Seminar Report Downloads. (2012, February 06). Hadoop. Retrieved from Seminar Report Downloads:

Srivastava, A. (2011, April 26). A Hadoop Use Case. Retrieved from

Tableau. (2013, October 25). Hadoop. Retrieved from Tableau Software:

The Open Group. (2002). Interoperable Enterprise Business Scenario. Chicago: The Open Group.

Welcome to Apache™ Hadoop®! (2013, October 24). Retrieved from

Wikipedia. (2013, October 23). Apache Hadoop. Retrieved from Wikipedia:

Wikipedia. (2013, October 19). MapReduce. Retrieved from Wikipedia:

Williams, A. (2012, November 9). Hadoop. Retrieved from Siliconangle:

Microsoft® SQL Server™ AlwaysOn™ for Performance Mitigation?

Executive Summary

Microsoft® SQL Server™ AlwaysOn™ is a database clustering technology intended for high availability (HA) and Disaster Recovery (DR) with the ability to accommodate other beneficial workloads, such as read-only reporting query load sharing and database backups.

This article investigates using AlwaysOn™ as a performance-optimisation (load-sharing) mechanism for read-only reporting purposes.  It concludes that, given the intended purpose of AlwaysOn™, this function is not a preferred performance-mitigation strategy for database access in an enterprise.  However, this does not imply that performance enhancements are not achievable; quite the contrary.

Firstly, AlwaysOn™ is not intrinsically designed for performance mitigation (between OLTP and reporting workloads) or for the mitigation of concurrency, although it does provide secondary benefits that alleviate such contentions.  SQL Server 2012 can be positioned as a less performant database system when compared to SQL Server 2014, simply because of technological advancements in the latter, specifically SQL Server 2014’s in-memory (Hekaton) ability and the ColumnStore indices for data warehousing, amongst many other new enhancements.

Hekaton, however, is an OLTP-beneficial enhancement; the significant performance gains it achieves may even remove the need for a secondary database over which to defray concurrency by sharing the load.  AlwaysOn™ for geographically dispersed databases, by contrast, is a superb solution for brokering regionalised access to data, discharging connections and load across the database system.  But within a single enterprise architecture, assuming that AlwaysOn™ is the perfect replacement for mitigating the performance of reporting-style read-only queries, whether by federation or a replication architecture, is erroneous.

Great performance gains are achievable by indexing secondary databases, via the primary, but this requires the indices to be the same on both, despite the different workloads.  Even with the additional benefit of the temporary query statistics in tempdb, and the compounded performance gains of SQL Server 2014, this does not herald AlwaysOn™ as the de facto load-balancing mechanism for offloading reporting workloads.  For this reason, spawning active secondary databases in lieu of existing replicated databases, just because we can, is not a preferable way to upgrade a replicated model.

It is more probable that, by using SQL Server 2014 with Hekaton and increasing the hardware specification, the replicated databases can be abolished altogether, because of the much-increased native ability of SQL Server 2014: with its in-memory abilities, any[1] amount of transactional or reporting queries should be tolerated by the database server.  Rationalising a proper enterprise data access strategy, so as to avoid the unnecessary implementation of business logic (modelling) on the database, is achievable through other technologies in augmentation of SQL Server 2014’s new performance abilities, e.g. Data Virtualisation, which is only as good as the underlying database performance.  AlwaysOn™ should not be seen as the panacea for these problems; the real gains come from the technology enhancements of the actual database server.

This article concludes that AlwaysOn™ should be used for its intended purpose, i.e. High Availability and Disaster Recovery, and only in a limited capacity to defray concurrency and performance, except in the case of geographically dispersed databases.  It should not be applied as the preferred method of mitigating performance issues between OLTP and reporting queries; rather, a more apt enterprise data access strategy should be architected to address enterprise database performance issues.

[1] Many factors more than with SQL Server 2012, so much so that we could almost (by comparison) say any amount of queries.


SQL Server® is a Microsoft™ database technology for enterprise relational database management systems.  It offers a feature called AlwaysOn™ that is essentially a High Availability facility, for planned and unplanned disaster-recovery purposes, akin to other high-availability options such as replication, log shipping, SQL clustering and database mirroring.  AlwaysOn™ addresses the functional needs acquired from the aforementioned architectural patterns, attempting to augment and simplify their respective limitations by providing the following abilities:

  • Database replicas, up to five: one primary and four[1] secondaries
  • High Availability and Disaster Recovery, as an enhanced mirroring with read-only access
  • Synchronous and asynchronous transactions between primary (R/W) and replicas (R), with replicas using snapshots to avoid blocking
  • Use replicas for reporting access
  • Automatic failover

The new HA/DR feature of AlwaysOn™, the Availability Group[2], combines the best of failover clustering and database mirroring.  An Availability Group is a group of databases that fails over together (as a unit) from one replica of SQL Server to another replica within the same availability group (Ali, 2012).  Each availability replica resides on a different node of a single Windows Server failover cluster (WSFC).

This article considers only the viability of using AlwaysOn™ for the purpose of offsetting reporting workloads[3].

To realise how to leverage AlwaysOn™ for performance optimisation, one must understand how the primary and replica databases can be used for this purpose.

Active Secondary Capabilities

Using active secondary[4] capabilities provides better resource utilisation (lowering cost) by offloading read-intent applications[5] and backup jobs to secondary replicas, reducing the load on the primary[6] database (AlwaysOn Availability Groups (SQL Server), 2012).  The readable secondary, although providing read-only access, is not set to read-only, but is dynamic.  On a typical secondary the data is near real-time, i.e. only a few seconds behind (Active Secondaries: Readable Secondary Replicas (AlwaysOn Availability Groups), 2012).

The deflection of workloads from the primary database is achieved by directing read-only connections to a readable secondary replica, though such connections must tolerate latency.  To remove blocking, read-only workloads use row versioning[7] and are automatically mapped to the snapshot isolation transaction level, regardless of the explicit setting and ignoring locking hints.  Read-only routing is achieved through an availability group listener, to which the client directs a connection declared with read-only intent.  There is no load balancing[8] as such, because the connection is routed to the first available readable secondary on the primary’s routing list.  A secondary database can influence performance on the primary because of the snapshot-isolation mode of operation: ghost-record clean-up on the primary can be blocked by transactions on a secondary.  For example (in an extreme case), if a secondary database has a long-running query, or it becomes disconnected, or its data movement is suspended, log truncation is prevented, and the secondary may have to be removed from the availability group to release the primary database from this holding pattern[9] (Active Secondaries: Readable Secondary Replicas (AlwaysOn Availability Groups), 2012).
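Read-only intent is declared in the client’s connection string; a sketch in Python follows, where the listener and database names are placeholders and SQL Server Native Client 11.0 is assumed as the ODBC driver:

```python
def readonly_connection_string(listener, database):
    """Build an ODBC connection string that declares read-only intent,
    so the availability group listener routes the session to a
    readable secondary rather than the primary."""
    return (
        "Driver={SQL Server Native Client 11.0};"
        f"Server=tcp:{listener},1433;"
        f"Database={database};"
        "ApplicationIntent=ReadOnly;"
        "Trusted_Connection=yes;"
    )

# A reporting client would hand this to its ODBC layer, e.g.
# pyodbc.connect(conn_str); the connect call is omitted here since
# it needs a live availability group listener.
conn_str = readonly_connection_string("ag-listener.corp.local", "SalesReporting")
```

The listener, not the client, resolves which readable secondary serves the session, per the primary’s read-only routing list.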

Indexing & Statistics

Deflecting workloads to a replica in the availability group for reporting purposes yields a query pattern significantly different from that of a transactional database.  Presenting a more optimal performance yield requires that the replica be able to tailor its behaviour to reporting-style queries.  Schema or data changes on a secondary database must come from the primary, because of the read-only attribute of the replica; creating the indices on the primary allows the redo process to transfer them to the replicas.

To optimise query plans on the read-only replicas differently from the primary (but in addition to the statistics on the primary), statistics on a secondary are created in tempdb[10].  These are temporary, read-only statistics that can only be created by SQL Server, but can be dropped by the user.

SQL Server 2014 a Better Option?

Almost always, the standard line from a Microsoft consultant is “upgrade to the next version” for every existing version’s weaknesses.  So what does SQL Server 2014 offer for performance mitigation between OLTP and reporting workloads, beyond what SQL Server 2012 achieves with AlwaysOn™?  SQL Server 2014’s in-memory database ability[11] (Hekaton) is integrated with AlwaysOn™, the tables also being readable in the secondary[12] (Diaconu, et al., 2013).  Hekaton, however, is an OLTP[13] optimisation: it uses in-memory direct pointers to address data, rather than indirect addresses such as page/row ids in disk-stored indices.  Hekaton is integrated into SQL Server 2014 (Larson, Zwilling, & Farlee, 2013).

But, does Hekaton benefit an active secondary for the purpose of offloading query pressure from the primary?  There are a few major Hekaton improvements:

  • In-memory tables[14]
  • Indices, hash and range with direct pointers to data, all in memory[15]
  • Stored Procedures compiled to native machine code
  • A high level of concurrency[16] that does not rely on partitioning, using latch-free data structures to avoid physical interference amongst threads and a new optimistic, multi-version concurrency control technique to reduce interference among transactions (Larson, Zwilling, & Farlee, 2013).
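The contrast between page/row-id indirection and Hekaton’s direct pointers can be caricatured in a few lines (a conceptual sketch only, not how the engine is actually implemented):

```python
# Disk-based style: the index stores a (page, slot) address, so every
# lookup goes through a page table before reaching the row.
pages = {0: ["row-A", "row-B"], 1: ["row-C"]}
disk_index = {"A": (0, 0), "B": (0, 1), "C": (1, 0)}

def disk_lookup(key):
    page_id, slot = disk_index[key]   # first hop: resolve the address
    return pages[page_id][slot]       # second hop: fetch from the page

# Hekaton style: the hash index holds a direct reference to the
# in-memory row, so a lookup is a single probe.
rows = {"A": "row-A", "B": "row-B", "C": "row-C"}
memory_index = {key: row for key, row in rows.items()}

def memory_lookup(key):
    return memory_index[key]          # one hop straight to the row
```

Both lookups return the same row; the point is that the in-memory index reaches it in a single probe, with no page table in between.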

From a published AlwaysOn™ features perspective, SQL Server 2014 provides the following enhancements over SQL Server 2012:

  • Enhanced Availability for read-only Replicas
  • Increased Number of Replicas
  • Integration with Windows Azure
  • Enhanced Diagnostics

On face value, these do not translate to any material benefit over SQL Server 2012, judging only by what’s new in AlwaysOn™.  But if the advances of Hekaton are considered, having all of those features available in the secondary databases provides a significant performance yield, albeit by virtue of the database engine and not of AlwaysOn™.

The ColumnStore index option[17] (data warehousing) is part of the in-memory ability of SQL Server 2014, providing greater data compression for more real-time analytics support.  The resource governor of SQL Server 2014 also provides the ability to perform IO management (Team, SQL Server 2014: A Closer Look, 2013), and buffer-pool extension support for solid-state drives enables faster paging (Foley, 2013).  All of these, yet again, are database engine optimisations and enhancements, not AlwaysOn™; any performance gains are a result of the database technology, not of the AlwaysOn™ feature, and any metrics thus achieved should be attributed to the database technology only.

[1] This is a SQL Server 2012 constraint that is doubled with SQL Server 2014 (Otey, 2013).

[2] Availability groups cannot be created across domains, nor against multiple clusters in the same domain; they cannot have two replicas on the same node as part of the same group, cannot replicate system databases, and cannot perform cross-database transactions within a group (Golla, 2012).

[3] Scaling read-only workloads to replicas is available only in the SQL Server Enterprise Edition and attracts a full new server licence for every active replica. “When the secondary server does any work at all, like taking requests or running backups off of it, or any of the readable features of AlwaysOn are enabled then it must be fully licensed” (Hughes, 2012).

[4] Change tracking and change data capture are not available on secondary databases that belong to a readable secondary replica.

[5] This is very advantageous, specifically for geographically distributed instances of a database.

[6] In SQL Server 2014, the relationship between the primary and the replica is more stable, in that if the primary goes down, the replica is still usable (Ozar, 2013).

[7] Row versioning increases data storage in the primary and secondary, although version data is not generated by the primary database.  A read-committed snapshot isolation (RCSI) level setting is turned on for the primary database.

[8] In SQL Server 2014, load balancing can be implemented using simple DNS round-robin or specialised load balancing solutions, either hard or software based. (Team, AlwaysOn in SQL Server 2014 CTP1, 2013).

[9] However, if ghost records (on the primary) are still needed by the secondary, a DBCC SHRINKFILE on the primary database log may fail.

[10] Any server restart flushes all read-only replica statistics customised for the read queries.

[11] In-memory performance is achievable across OLTP, DW and BI (Underwood, 2013).

[12] Failover time is not dependent on size of durable memory optimised tables (Underwood, 2013).

[13] OLTP-like, i.e. short lived transactions, high degree of concurrency, e.g. stock trading, order processing, or travel reservations.  Long running transactions should be avoided.

[14] Max size SCHEMA_AND_DATA = 512 GB (Underwood, 2013)

[15] In the event of a failure, all tables and indices are rebuilt entirely from the latest checkpoint and logs.

[16] Concurrency is obtained through multi-versioning per update per user.

[17] ColumnStore indices can be updated for real-time data without dropping and recreating them (Belnekar, 2013).  In-memory ColumnStores provide fast execution for data warehouse queries and eliminate the need for other indices.  Microsoft’s PDW makes use of the in-memory ColumnStore in its Delta Store, converting data into a columnar format once a segment (1M rows) is full; it is the preferred storage engine of the PDW (Underwood, 2013).

Works Cited

Active Secondaries: Readable Secondary Replicas (AlwaysOn Availability Groups). (2012). Retrieved from Microsoft Technet:

Ali, A. (2012, January 19). Using The AlwaysOn Feature of SQL Server 2012. Retrieved from SQL Server Performance:

AlwaysOn Availability Groups (SQL Server). (2012). Retrieved from Microsoft Technet:

Belnekar, P. (2013, June 18). Priti Belnekar’s Blogs. Retrieved from Blogspot:

Diaconu, C., Freedman, C., Ismert, E., Larson, P.-A., Mittal, P., Stonecipher, R., . . . Zwilling, M. (2013). Hekaton: SQL Server’s Memory-Optimized OLTP Engine. Redmond: Microsoft. Retrieved from

Foley, M. (2013, June 11). Microsoft’s SQL Server 2014: More than just in-memory OLTP. Retrieved from ZDNet:

Golla, K. (2012, September 17). SQL Server 2012: Always On FAQs. Retrieved from Blogs MSDN:

Hughes, J. (2012, October 17). Does Sql Server 2012 Always On support Active/Passive? Retrieved from Stack Exchange:

Larson, P.-A., Zwilling, M., & Farlee, K. (2013). The Hekaton Memory-Optimized OLTP Engine. Redmond: Microsoft. Retrieved from

Otey, M. (2013, September 20). Changes to SQL Server 2014 AlwaysOn Availability Groups. Retrieved from SQL Server Mag:

Ozar, B. (2013, June 03). (Almost) Everything You Need to Know About SQL Server 2014. Retrieved from Brent Ozar:

Team, S. (2013, June 27). AlwaysOn in SQL Server 2014 CTP1. Retrieved from SQL Server Blog:

Team, S. (2013, June 05). SQL Server 2014: A Closer Look. Retrieved from SQL Server Blog:

Underwood, J. (2013, September 22). SQL Server 2014 In-Memory, High Performance and Mission Critical Overview. Retrieved from Slideshare: