Accessing Data in .NET
In This Chapter
Far and away the single most important technique for corporate VB developers is data access. Most of the applications written in today's corporate environments involve displaying and manipulating operational data and line of business data. Historically, Microsoft has provided support for accessing data in a variety of relational and nonrelational data stores by shipping APIs that existing VB developers have become quite familiar with over the years including VBSQL, ODBC, DAO, RDO, OLE DB, and ADO. Although this continued evolution meant that VB developers had to learn new techniques along the way, the benefits of learning the models meant a unified relational data access model (ODBC), increased simplicity (DAO), increased performance (RDO), and increased reach to nonrelational sources (ADO).
To that list you can now add the data access classes of the Services Framework. These classes, collectively termed ADO.NET, serve to implement a managed interface to OLE DB and SQL Server, increase performance when using SQL Server, and allow data to be manipulated in a fashion commensurate with distributed application development utilizing the Internet and XML. Even though learning a new data access model may at first be daunting, it will also help you build modern, distributed applications.
In this chapter, we'll discuss the architecture of ADO.NET and how it can be used to build distributed applications. This and the three chapters that follow form a progression that illustrates the techniques useful in VB.NET to build distributed applications. The example code discussed in this and subsequent chapters illustrates a somewhat simplified implementation of Quilogy's online education system where students enroll in classes over the Web and Quilogy employees view and manipulate data through an Intranet site. The enrollment data is stored in a SQL Server 2000 database so where appropriate, specific features of SQL Server 2000 will be utilized.
Because data is the foundation for applications such as this one we'll begin with a discussion of accessing data in the .NET world.
As mentioned previously, ADO.NET is comprised of classes found in the System.Data namespace that encapsulate data access for distributed applications. However, rather than simply mapping the existing ADO object model to .NET to provide a managed interface to OLE DB and SQL Server, ADO.NET changes the way data is stored and marshaled within and between applications. The primary reason ADO.NET redefines this architecture is that most applications developed today can benefit from the scalability and flexibility of being able to distribute data across the Internet in a disconnected fashion.
Because the classic ADO model was developed primarily with continuously connected access in mind, creating distributed applications with it is somewhat limiting. A typical example is the need to move data through a Recordset object between tiers in a distributed application. To accomplish this in classic ADO you have to specifically create a disconnected Recordset using a combination of properties including cursor location, cursor type, and lock type. In addition, because the Recordset is represented in a proprietary binary format, you have to rely on COM marshalling code built into OLE DB to allow the Recordset to be passed by value (ByVal) to another component or client code. This architecture also runs into problems when attempting to pass recordsets through firewalls because these system level requests are often denied.
On the other hand, if you elected not to use disconnected recordsets, you had to devise your own scheme to represent the data using Variant arrays, delimited within in a string, or saved as tabular XML using the Save method (although the latter option is really only viable when using ADO 2.5 and higher). Obviously these approaches have their downside because they run into problems with performance and maintainability not to mention interoperability between platforms.
In addition, the classic ADO model doesn't handle hierarchical data particularly well. Although it is possible to create hierarchical recordsets using the Microsoft data shape provider, it is not simple and is therefore not often used. Typically JOIN clauses are used inside stored procedures or inline SQL to retrieve data from multiple tables. However, this does not allow you to assemble data from multiple data sources and easily determine from where the data comes. As a result, classic ADO provides a flat view of data that is not strongly typed.
To alleviate these problems, ADO.NET is built from the ground up for distributed applications used in today's disconnected scenarios. For example, the central class in ADO.NET is the DataSet, which can be thought of as an in-memory XML database that stores related tables, relationships, and constraints. As you'll see, the DataSet is the primary mechanism used in VB.NET applications to cache data and pass it between tiers in a distributed application thereby alleviating the need to rely on proprietary schemes or COM marshalling.
Using XML alleviates several of the burdens of classic ADO. For example, by storing the data as XML it can easily pass through firewalls without special configuration. In addition, by storing related tables and representing the relationships between those tables the DataSet can store data hierarchically allowing for the easy manipulation of parent/child relationships. The self-describing nature of XML combined with the object-oriented nature of VB.NET also allows for direct programmatic access to the data in a DataSet in a strongly typed fashion. In other words, the data need not be accessed using a tables, rows, and columns metaphor but can be accessed in terms of the definition of the data that can be type checked by the compiler.
Furthermore, this disconnected model combined with connection pooling schemes frees resources on the database server more quickly, allowing applications to scale by not holding on to expensive database connections and locks.
In the following section we'll dig deeper to take a look at how ADO.NET is designed with the goals of disconnected access, scalability, and interoperability in mind.
Relationship to OLE DB and ADO
Some readers may get the impression that ADO.NET replaces the existing ADO and OLE DB architecture. On the contrary, you should think of ADO.NET as an enhancement to this technology for distributed application development that, for the most part, relies on the underlying OLE DB and ADO architecture. For example, ADO.NET does not contain managed code to natively access any data source other than SQL Server. For all other data sources OLE DB providers are required.
In fact, because ADO.NET was designed for disconnected and distributed applications it might not be suitable for all types of applications you need to create with VB.NET, especially those that rely on server side cursors and are continuously connected. As a result, you may want to use classic ADO with VB.NET through the COM Interop layer as discussed in Chapter 8.