Designing APIs Related to LINQ Support
9.6 LINQ Support
Writing applications that interact with data sources, such as databases, XML documents, or Web Services, was made easier in the .NET Framework 3.5 with the addition of a set of features collectively referred to as LINQ (Language-Integrated Query). The following sections provide a very brief overview of LINQ and list guidelines for designing APIs related to LINQ support, including the so-called Query Pattern.
9.6.1 Overview of LINQ
Quite often, programming requires processing sets of values. Examples include extracting the list of the most recently added books from a database of products, finding the e-mail address of a person in a directory service such as Active Directory, transforming parts of an XML document to HTML to allow for Web publishing, or something as frequent as looking up a value in a hash table. LINQ provides a uniform, language-integrated programming model for querying datasets, independent of the technology used to store the data.
In terms of concrete language features and libraries, LINQ is embodied as:
- A specification of the notion of extension methods. These are described in detail in section 5.6.
- Lambda expressions, a language feature for defining anonymous delegates.
- New types representing generic delegates to functions and procedures: Func<...> and Action<...>.
- Representation of a delay-compiled delegate, the Expression<...> family of types.
- A definition of a new interface, System.Linq.IQueryable<T>.
- The Query Pattern, a specification of the set of methods a type must provide in order to be considered a LINQ provider. A reference implementation of the pattern can be found in the System.Linq.Enumerable class. Details of the pattern are discussed later in this chapter.
- Query Expressions, an extension to language syntax allowing for queries to be expressed in an alternative, SQL-like format.
// Using extension methods:
var names = set.Where(x => x.Age > 20).Select(x => x.Name);

// Using SQL-like syntax:
var names = from x in set where x.Age > 20 select x.Name;
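The difference between a generic delegate and its delay-compiled counterpart can be sketched as follows. This is only an illustration: the class and member names (LinqFeaturesSketch, AsDelegate, AsExpression, and so on) are hypothetical and do not come from the Framework.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical names throughout; a sketch, not Framework code.
public static class LinqFeaturesSketch
{
    // A lambda converted to a delegate: compiled code that can only be invoked.
    public static readonly Func<int, bool> AsDelegate = age => age > 20;

    // The same lambda converted to an expression tree: data that can be
    // inspected, translated to another language, or compiled later.
    public static readonly Expression<Func<int, bool>> AsExpression = age => age > 20;

    public static string DescribeBody()
    {
        // The body of the tree is a binary node of kind GreaterThan.
        return AsExpression.Body.NodeType.ToString();
    }

    public static bool InvokeCompiled(int age)
    {
        // Compile() turns the tree into a delegate on demand.
        return AsExpression.Compile()(age);
    }
}
```

Calling LinqFeaturesSketch.AsDelegate(21) runs the predicate directly, whereas AsExpression must first be compiled (or interpreted by a provider) before it can produce a result.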
9.6.2 Ways of Implementing LINQ Support
There are three ways by which a type can support LINQ queries:
- The type can implement IEnumerable<T> (or an interface derived from it).
- The type can implement IQueryable<T>.
- The type can implement the Query Pattern.
The following sections will help you choose the right method of supporting LINQ.
9.6.3 Supporting LINQ through IEnumerable<T>
DO implement IEnumerable<T> to enable basic LINQ support.
Such basic support should be sufficient for most in-memory datasets. The basic LINQ support will use the extension methods on IEnumerable<T> provided in the .NET Framework. For example, define a type as follows:
public class RangeOfInt32s : IEnumerable<int> {
    public IEnumerator<int> GetEnumerator() {...}
    IEnumerator IEnumerable.GetEnumerator() {...}
}
Doing so allows for the following code, despite the fact that RangeOfInt32s did not implement a Where method:
var a = new RangeOfInt32s();
var b = a.Where(x => x > 10);
CONSIDER implementing ICollection<T> to improve performance of query operators.
For example, the default implementation of the System.Linq.Enumerable.Count method simply iterates over the sequence. Collection types often offer an O(1) mechanism for finding their size, and Enumerable.Count uses the ICollection<T>.Count property instead of iterating when the source implements ICollection<T>.
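A minimal sketch of this guideline follows. Int32Range is a hypothetical read-only range type (it is not the RangeOfInt32s type shown earlier), and the EnumerationsStarted field exists only to make the effect observable.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical read-only range [start, start + count); illustration only.
public sealed class Int32Range : ICollection<int>
{
    private readonly int start;
    private readonly int count;

    // Demo instrumentation: incremented when an enumeration actually begins.
    public int EnumerationsStarted;

    public Int32Range(int start, int count)
    {
        if (count < 0) throw new ArgumentOutOfRangeException("count");
        this.start = start;
        this.count = count;
    }

    // O(1): Enumerable.Count can use this instead of walking the sequence.
    public int Count { get { return count; } }
    public bool IsReadOnly { get { return true; } }

    public bool Contains(int item)
    {
        return item >= start && (long)item - start < count;
    }

    public void CopyTo(int[] array, int arrayIndex)
    {
        for (int i = 0; i < count; i++) array[arrayIndex + i] = start + i;
    }

    // Mutation is not meaningful for a computed range.
    public void Add(int item) { throw new NotSupportedException(); }
    public void Clear() { throw new NotSupportedException(); }
    public bool Remove(int item) { throw new NotSupportedException(); }

    public IEnumerator<int> GetEnumerator()
    {
        EnumerationsStarted++;
        for (int i = 0; i < count; i++) yield return start + i;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

With this in place, System.Linq.Enumerable.Count(range) is expected to return the size without starting an enumeration, so EnumerationsStarted stays at zero.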
CONSIDER supporting selected methods of System.Linq.Enumerable or the Query Pattern (see section 9.6.5) directly on new types implementing IEnumerable<T> if it is desirable to override the default System.Linq.Enumerable implementation (e.g., for performance optimization reasons).
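As an illustration, the sketch below defines a hypothetical Squares type that implements IEnumerable<int> and supplies its own Sum and Last methods. Because instance methods take precedence over extension methods, calls written as squares.Sum() bind to the closed-form version instead of Enumerable.Sum. The type and its formulas are assumptions made for this example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical sequence of the squares 1, 4, 9, ..., n * n.
public sealed class Squares : IEnumerable<int>
{
    private readonly int n;

    public Squares(int n)
    {
        if (n < 0) throw new ArgumentOutOfRangeException("n");
        this.n = n;
    }

    public IEnumerator<int> GetEnumerator()
    {
        for (int i = 1; i <= n; i++) yield return i * i;
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    // Closed form n(n + 1)(2n + 1) / 6 instead of iterating; chosen over
    // Enumerable.Sum because instance methods win over extension methods.
    public int Sum()
    {
        return checked((int)((long)n * (n + 1) * (2L * n + 1) / 6));
    }

    // O(1) replacement for Enumerable.Last.
    public int Last()
    {
        if (n == 0) throw new InvalidOperationException("The sequence is empty.");
        return checked(n * n);
    }
}
```

Operators that the type does not define, such as Where, still fall back to the System.Linq.Enumerable implementations.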
9.6.4 Supporting LINQ through IQueryable<T>
CONSIDER implementing IQueryable<T> when access to the query expression, passed to members of IQueryable, is necessary.
When querying potentially large datasets generated by remote processes or machines, it might be beneficial to execute the query remotely. Examples of such datasets are a database, a directory service, or a Web service.
DO NOT implement IQueryable<T> without understanding the performance implications of doing so.
Building and interpreting expression trees is expensive, and many queries can actually get slower when IQueryable<T> is implemented.
The trade-off is acceptable in the LINQ to SQL case, since the alternative overhead of performing queries in memory would have been far greater than the transformation of the expression to an SQL statement and the delegation of the query processing to the database server.
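What "access to the query expression" means can be sketched with the Framework's own AsQueryable method, which wraps an in-memory sequence in an IQueryable<T>. The helper below (QueryableInspectionSketch, a hypothetical name) walks the expression tree that the Queryable operators build up; a real provider would translate such a tree rather than just list the operator names.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper; a sketch of inspecting a query, not a provider.
public static class QueryableInspectionSketch
{
    // Lists the query operators recorded in the tree, in application order.
    public static List<string> OperatorsInOrder(IQueryable query)
    {
        var names = new List<string>();
        // Each Queryable operator is a method call whose first argument
        // is the expression for the sequence it was applied to.
        var call = query.Expression as MethodCallExpression;
        while (call != null)
        {
            names.Insert(0, call.Method.Name);
            call = call.Arguments[0] as MethodCallExpression;
        }
        return names;
    }

    public static string Demo()
    {
        IQueryable<int> source = new[] { 5, 15, 25 }.AsQueryable();
        IQueryable<int> query = source.Where(x => x > 10).Select(x => x * 2);
        return string.Join(",", OperatorsInOrder(query).ToArray());
    }
}
```

Nothing is filtered or projected while the tree is being built or inspected; the work happens only when the query is enumerated.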
DO throw NotSupportedException from IQueryable<T> methods that cannot be logically supported by your data source.
For example, imagine representing a media stream (e.g., an Internet radio stream) as an IQueryable<byte>. The Count method is not logically supported: the stream can be considered infinite, so the Count method should throw NotSupportedException.
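A heavily simplified sketch of such a stream follows. RadioStream is a hypothetical type that acts as its own query provider; it generates placeholder bytes instead of reading from a network, and it omits query composition entirely so that the handling of Count stays visible.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical endless byte source; illustration only.
public sealed class RadioStream : IQueryable<byte>, IQueryProvider
{
    private readonly Expression root;

    public RadioStream() { root = Expression.Constant(this); }

    public Type ElementType { get { return typeof(byte); } }
    public IQueryProvider Provider { get { return this; } }
    Expression IQueryable.Expression { get { return root; } }

    // Queryable.Count and similar operators end up here.
    public TResult Execute<TResult>(Expression expression)
    {
        var call = expression as MethodCallExpression;
        if (call != null && (call.Method.Name == "Count" || call.Method.Name == "LongCount"))
            throw new NotSupportedException("An endless stream has no element count.");
        throw new NotImplementedException("Other operators are omitted from this sketch.");
    }

    public object Execute(Expression expression) { return Execute<object>(expression); }

    public IQueryable<TElement> CreateQuery<TElement>(Expression expression)
    {
        throw new NotImplementedException("Query composition is omitted from this sketch.");
    }

    public IQueryable CreateQuery(Expression expression)
    {
        throw new NotImplementedException("Query composition is omitted from this sketch.");
    }

    // Placeholder samples 0, 1, 2, ... wrapping around at 255, forever.
    public IEnumerator<byte> GetEnumerator()
    {
        byte sample = 0;
        while (true) yield return unchecked(sample++);
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

With this sketch, Queryable.Count(stream) fails fast with NotSupportedException instead of looping forever, while enumerating a bounded prefix of the stream still works.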
9.6.5 Supporting LINQ through the Query Pattern
The Query Pattern refers to defining the methods in Figure 9-1 without implementing IQueryable<T> (or any other LINQ interface).
Figure 9-1. Query Pattern Method Signatures
S<T> Where(this S<T>, Func<T,bool>)
S<T2> Select(this S<T1>, Func<T1,T2>)
S<T3> SelectMany(this S<T1>, Func<T1,S<T2>>, Func<T1,T2,T3>)
S<T2> SelectMany(this S<T1>, Func<T1,S<T2>>)
O<T> OrderBy(this S<T>, Func<T,K>), where K is IComparable
O<T> ThenBy(this O<T>, Func<T,K>), where K is IComparable
S<T> Union(this S<T>, S<T>)
S<T> Take(this S<T>, int)
S<T> Skip(this S<T>, int)
S<T> SkipWhile(this S<T>, Func<T,bool>)
S<T3> Join(this S<T1>, S<T2>, Func<T1,K1>, Func<T2,K2>, Func<T1,T2,T3>)
T ElementAt(this S<T>, int)
Please note that the notation is not meant to be valid code in any particular language but to simply present the type signature pattern.
The notation uses S to indicate a collection type (e.g., IEnumerable<T>, ICollection<T>), and T to indicate the type of elements in that collection. Additionally, we use O<T> to represent subtypes of S<T> that are ordered. For example, S<T> is a notation that could be substituted with IEnumerable<int>, ICollection<Foo>, or even MyCollection (as long as the type is an enumerable type).
The first parameter of all the methods in the pattern (marked with this) is the type of the object the method is applied to. The notation uses extension-method-like syntax, but the methods can be implemented as extension methods or as member methods; in the latter case the first parameter should be omitted, of course, and the this pointer should be used.
Also, anywhere Func<...> is being used, pattern implementations may substitute Expression<Func<...>> for it. You can find guidelines later that describe when that is preferable.
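To make the notation concrete, the sketch below substitutes a hypothetical Bag<T> type for S<T> and implements two of the pattern methods, Where and Select, as member methods. The compiler binds query expressions to these methods by name, so the query in QueryPatternSketch.Query can return Bag<int> instead of IEnumerable<int>. All names here are assumptions made for the example.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical collection type standing in for S<T> in Figure 9-1.
public sealed class Bag<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> items;

    public Bag(IEnumerable<T> items) { this.items = items; }

    // S<T> Where(this S<T>, Func<T,bool>), written as a member method.
    public Bag<T> Where(Func<T, bool> predicate)
    {
        // Wraps a deferred sequence, so nothing is evaluated yet.
        return new Bag<T>(Enumerable.Where(items, predicate));
    }

    // S<T2> Select(this S<T1>, Func<T1,T2>), written as a member method.
    public Bag<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new Bag<TResult>(Enumerable.Select(items, selector));
    }

    public IEnumerator<T> GetEnumerator() { return items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class QueryPatternSketch
{
    public static Bag<int> Query(Bag<int> source)
    {
        // Translated by the compiler to source.Where(...).Select(...),
        // which binds to the Bag<T> members above.
        return from x in source where x > 1 select x * 10;
    }
}
```

The declared return type of Query is the compile-time evidence that the member methods were chosen: had the query bound to System.Linq.Enumerable, the result would be IEnumerable<int> and the method would not compile.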
DO implement the Query Pattern as instance members on the new type, if the members make sense on the type even outside of the context of LINQ. Otherwise, implement them as extension methods.
For example, instead of the following:
public class MyDataSet<T> : IEnumerable<T> {...}
...
public static class MyDataSetExtensions {
    public static MyDataSet<T> Where<T>(this MyDataSet<T> data, Func<T,bool> query) {...}
}
Prefer the following, because it's completely natural for datasets to support Where methods:
public class MyDataSet<T> : IEnumerable<T> {
    public MyDataSet<T> Where(Func<T,bool> query) {...}
    ...
}
DO implement IEnumerable<T> on types implementing the Query Pattern.
CONSIDER designing the LINQ operators to return domain-specific enumerable types. Essentially, one is free to return anything from a Select query method; however, the expectation is that the query result type should be at least enumerable.
This allows the implementation to control which query methods get executed when they are chained. To see what happens otherwise, consider a user-defined type MyType that implements IEnumerable<T>. MyType has an optimized Count method defined, but the return type of its Where method is IEnumerable<T>. In the example here, the optimization is lost after the Where method is called; the method returns IEnumerable<T>, and so the built-in Enumerable.Count method is called instead of the optimized one defined on MyType.
var result = myInstance.Where(query).Count();
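The effect of the return type can be observed with a small sketch. TrackedSet<T>, CountLog, and the WhereAsEnumerable method are hypothetical names introduced for this example; the "optimized" Count is only a stand-in that records that it was called.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Demo instrumentation only.
public static class CountLog { public static int OptimizedCalls; }

// Hypothetical domain-specific enumerable type.
public sealed class TrackedSet<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> items;

    public TrackedSet(IEnumerable<T> items) { this.items = items; }

    // Returns the domain-specific type: chained calls stay on TrackedSet<T>.
    public TrackedSet<T> Where(Func<T, bool> predicate)
    {
        return new TrackedSet<T>(Enumerable.Where(items, predicate));
    }

    // Returns the general type, for contrast: later calls bind to Enumerable.
    public IEnumerable<T> WhereAsEnumerable(Func<T, bool> predicate)
    {
        return Enumerable.Where(items, predicate);
    }

    // Stand-in for an optimized Count; a real type would do something cheaper.
    public int Count()
    {
        CountLog.OptimizedCalls++;
        return Enumerable.Count(items);
    }

    public IEnumerator<T> GetEnumerator() { return items.GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class TrackedSetDemo
{
    // Returns { kept, calls after kept, lost, calls after lost }.
    public static int[] Run()
    {
        CountLog.OptimizedCalls = 0;
        var set = new TrackedSet<int>(new[] { 1, 2, 3, 4 });
        int kept = set.Where(x => x > 2).Count();             // TrackedSet<int>.Count
        int afterKept = CountLog.OptimizedCalls;
        int lost = set.WhereAsEnumerable(x => x > 2).Count(); // Enumerable.Count
        int afterLost = CountLog.OptimizedCalls;
        return new[] { kept, afterKept, lost, afterLost };
    }
}
```

Both chains produce the same count, but only the first one reaches the type's own Count method; in the second, the call counter does not move.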
AVOID implementing just a part of the Query Pattern if fallback to the basic IEnumerable<T> implementations is undesirable.
For example, consider a user-defined type MyType, which implements IEnumerable<T>. MyType has an optimized Count method defined but does not have Where. In the example here, the optimization is lost after the Where method is called; the method returns IEnumerable<T>, and so the built-in Enumerable.Count method is called, instead of the optimized one defined on MyType.
var result = myInstance.Where(query).Count();
DO represent ordered sequences as a separate type from their unordered counterparts. Such types should define a ThenBy method.
This follows the current pattern in the LINQ to Objects implementation and allows for early (compile-time) detection of errors such as applying ThenBy to an unordered sequence.
For example, the Framework provides the IOrderedEnumerable<T> type, which is returned by OrderBy. The ThenBy extension method is defined for this type, and not for IEnumerable<T>.
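A short sketch of how this looks to a caller, using only the Framework's LINQ to Objects operators (the OrderingSketch class and its method name are hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrderingSketch // hypothetical name
{
    public static IEnumerable<string> ByLengthThenAlphabetically(IEnumerable<string> words)
    {
        // OrderBy returns the ordered type, which is what makes ThenBy available.
        IOrderedEnumerable<string> byLength = words.OrderBy(w => w.Length);

        // words.ThenBy(w => w); // would not compile: IEnumerable<string> has no ThenBy

        return byLength.ThenBy(w => w, StringComparer.Ordinal);
    }
}
```

The commented-out line is the kind of mistake the separate ordered type turns into a compile-time error instead of a runtime surprise.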
DO defer execution of query operator implementations. The expected behavior of most of the Query Pattern members is that they simply construct a new object which, upon enumeration, produces the elements of the set that match the query.
The following methods are exceptions to this rule: All, Any, Average, Contains, Count, ElementAt, Empty, First, FirstOrDefault, Last, LastOrDefault, Max, Min, Single, Sum.
In the example here, the expectation is that the time necessary for evaluating the second line will be independent of the size or nature (e.g., in-memory or remote server) of set1. The general expectation is that this line simply prepares set2, delaying the determination of its composition to the time of its enumeration.
var set1 = ...;
var set2 = set1.Select(x => x.SomeInt32Property);
foreach (int number in set2) {...} // this is when the actual work happens
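Deferred execution can be made visible by counting how often the selector runs. The sketch below uses the Framework's Enumerable.Select on a List<int>; the class name and the counter are assumptions added for the demonstration.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionSketch // hypothetical name
{
    // Returns { selector calls right after Select, selector calls after foreach, sum }.
    public static int[] Run()
    {
        int selectorCalls = 0;
        var set1 = new List<int> { 1, 2, 3 };

        // Only describes the projection; the lambda has not run yet.
        var set2 = set1.Select(x => { selectorCalls++; return x * 10; });
        int callsAfterSelect = selectorCalls;

        // Still visible to the query, because nothing has been evaluated.
        set1.Add(4);

        int sum = 0;
        foreach (int number in set2) sum += number; // the actual work happens here

        return new[] { callsAfterSelect, selectorCalls, sum };
    }
}
```

The selector is not invoked until the foreach loop runs, and the element added after the Select call is included in the result.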
DO place query extension methods in a "Linq" subnamespace of the main namespace. For example, extension methods for System.Data features reside in the System.Data.Linq namespace.
DO use Expression<Func<...>> as a parameter instead of Func<...> when it is necessary to inspect the query. See section 9.6.4 for more details.
As discussed earlier, interacting with an SQL database is already done through IQueryable<T> (and therefore expressions) rather than IEnumerable<T>, since this gives an opportunity to translate lambda expressions to SQL expressions.
An alternative reason for using expressions is performing optimizations. For example, a sorted list can implement look-up (Where clauses) with binary search, which can be much more efficient than the standard IEnumerable<T> or IQueryable<T> implementations.
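The sorted-list idea can be sketched as follows. SortedInt32List is a hypothetical type whose Where method accepts an expression, recognizes only the simplest shape (a parameter compared for equality with a constant, such as x => x == 5), and answers it with Array.BinarySearch; every other predicate is compiled and evaluated by a linear scan. A production implementation would recognize more shapes, such as range comparisons and captured variables.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical sorted collection; illustration only.
public sealed class SortedInt32List : IEnumerable<int>
{
    private readonly int[] sorted;

    // Demo instrumentation: which strategy the last Where call chose.
    public bool LastQueryUsedBinarySearch { get; private set; }

    public SortedInt32List(IEnumerable<int> values)
    {
        sorted = values.OrderBy(v => v).ToArray();
    }

    public IEnumerable<int> Where(Expression<Func<int, bool>> predicate)
    {
        int key;
        LastQueryUsedBinarySearch = TryGetEqualityKey(predicate, out key);
        return LastQueryUsedBinarySearch
            ? EqualRange(key)                                // O(log n) lookup
            : Enumerable.Where(sorted, predicate.Compile()); // linear fallback
    }

    // Recognizes predicates of the exact form: parameter == int constant.
    private static bool TryGetEqualityKey(Expression<Func<int, bool>> predicate, out int key)
    {
        key = 0;
        var body = predicate.Body as BinaryExpression;
        if (body == null || body.NodeType != ExpressionType.Equal) return false;
        var constant = body.Right as ConstantExpression;
        if (body.Left != predicate.Parameters[0] || constant == null || !(constant.Value is int))
            return false;
        key = (int)constant.Value;
        return true;
    }

    // Deferred: the search runs when the result is enumerated.
    private IEnumerable<int> EqualRange(int key)
    {
        int hit = Array.BinarySearch(sorted, key);
        if (hit < 0) yield break;
        int first = hit;
        while (first > 0 && sorted[first - 1] == key) first--; // step back over duplicates
        for (int i = first; i < sorted.Length && sorted[i] == key; i++) yield return sorted[i];
    }

    public IEnumerator<int> GetEnumerator() { return ((IEnumerable<int>)sorted).GetEnumerator(); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}
```

A call such as list.Where(x => x == 5) takes the binary-search path, while list.Where(x => x > 5) falls back to the compiled predicate. For brevity the sketch returns IEnumerable<int> from Where; a fuller implementation would return a domain-specific type, as recommended earlier in this section.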