Accessing and Saving Data
Take a moment to download the Chapter6Wintellog project from the book website at:
http://www.informit.com/title/0321822161
This is a sample project that demonstrates several techniques for accessing and saving data. This version of the project is purposefully unfinished so you can focus on the key data-related code. Specific features such as lazy-loading data to improve startup time, better formatting of data, and contracts will be integrated in later chapters. The application should be very useful because it takes blog feeds from various Wintellect employees and caches them locally on your Windows 8 device. Each time you launch the application, it scans for new items and pulls those down. These blogs cover cutting-edge content ranging from the latest information about Windows 8 to topics like Azure, SQL Server, and more. You may recognize some of the bloggers, including Jeff Prosise, Jeffrey Richter, and John Robbins.
You learned in Chapter 5, “Application Lifecycle,” about the various storage locations and how you can use either settings or the file system itself. The application currently uses settings to track the first time it runs. That first run takes several minutes because the application reads a feed of blog entries and parses the web pages for display, so an extended splash screen is shown during the longer startup. The check for whether the application has been initialized is in the ExtendedSplashScreen_Loaded method:
if (ApplicationData.Current.LocalSettings.Values.ContainsKey("Initialized"))
Once the process is completed, the flag is set to true. This allows the application to display a warning about the startup time the first time it runs. Subsequent launches will load the majority of data from a local cache to improve the speed of the application.
ApplicationData.Current.LocalSettings.Values["Initialized"] = true;
There are several classes involved in loading and saving the data. Take a look at the StorageUtility class. This class simplifies the process of saving items to local storage and restoring them when the application is launched. In SaveItem you can see the process of creating a folder and a file and handling potential collisions, as described in Chapter 5, “Application Lifecycle”:
var folder = await ApplicationData.Current.LocalFolder
    .CreateFolderAsync(folderName, CreationCollisionOption.OpenIfExists);
var file = await folder.CreateFileAsync(item.Id.GetHashCode().ToString(),
    CreationCollisionOption.ReplaceExisting);
Notice that the method itself is marked with the async keyword and the file system operations are preceded by await. You will learn about these keywords in the next section. Unlike the example in Chapter that manually wrote the properties to storage, the StorageUtility class takes a generic type to make it easier to save any type that can be serialized. The code uses the same engine that handles complex types transmitted via web services (you will learn more about web services later in this chapter). This code uses the DataContractJsonSerializer to take a snapshot of the instance being saved:
using (var stream = await file.OpenAsync(FileAccessMode.ReadWrite))
using (var outStream = stream.GetOutputStreamAt(0))
{
    var serializer = new DataContractJsonSerializer(typeof(T));
    serializer.WriteObject(outStream.AsStreamForWrite(), item);
    await outStream.FlushAsync();
}
The file is created through the previous call and used to retrieve a stream. The instance of the DataContractJsonSerializer is passed the type of the class to be serialized. The serialized object is written to the stream attached to the file and then flushed to store this to disk. The entire operation is wrapped in a try ... catch block to handle any potential file system errors that may occur. This is common for cache code because if the local operation fails, the data can always be retrieved again from the cloud.
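The same save flow can be sketched with ordinary .NET file APIs, which makes it easy to experiment with outside a Windows Store project. This is a minimal sketch, assuming desktop .NET where a plain folder path stands in for ApplicationData.Current.LocalFolder; JsonCache and TrySaveItem are hypothetical names, not classes from the sample project. Directory.CreateDirectory behaves like CreationCollisionOption.OpenIfExists (it does nothing if the folder already exists), File.Create behaves like ReplaceExisting, and the exception filter returns false so that a failed cache write never takes the application down.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;

// Hypothetical helper mirroring the SaveItem flow with desktop .NET file APIs
// (an assumption: the sample itself uses the Windows Runtime storage API).
public static class JsonCache
{
    // Serializes item as JSON into folderPath/fileName.
    // Returns false instead of throwing, because a failed cache write can
    // always be repeated after the data is downloaded again.
    public static bool TrySaveItem<T>(string folderPath, string fileName, T item)
    {
        try
        {
            // Like OpenIfExists: creates the folder, or does nothing if it exists.
            Directory.CreateDirectory(folderPath);

            // Like ReplaceExisting: File.Create truncates any existing file.
            using (var stream = File.Create(Path.Combine(folderPath, fileName)))
            {
                var serializer = new DataContractJsonSerializer(typeof(T));
                serializer.WriteObject(stream, item);
                stream.Flush();
            }
            return true;
        }
        catch (Exception ex) when (ex is IOException
                                   || ex is UnauthorizedAccessException
                                   || ex is SerializationException)
        {
            return false;
        }
    }
}
```

Saving new[] { 1, 2, 3 } produces a file containing the compact JSON array [1,2,3]; saving again to the same name replaces the previous contents.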
To see how the serialization works and where the files are stored, run the application and allow it to initialize and pass you to the initial grouped item list. Navigate back to the desktop and browse to:
C:\Users\Jeremy\AppData
Replace the “C” with your system drive letter and “Jeremy” with your user name. If you are not sure what these values are, go to the Start menu, type Command, and then select Developer Command Prompt. At the prompt, type:
echo %userprofile%
This will give you the root path to your user profile. Append AppData to the end. This is where each application stores data related to your login. Notice the top-level folders, which include Local and Roaming. Open the Local folder and then navigate to Packages. Your path will now look something like this:
C:\Users\Jeremy\AppData\Local\Packages
This is where the application specific data for your login will be stored. You can either try to match the folder name to the package identifier, or type Groups into the search box to locate the folder used by the Wintellog application. When you open the folder you’ll see several folders with numbers for the name and a single folder called Groups, similar to what is shown in Figure 6.1.
Figure 6.1 The local cache for the Wintellog application
To simplify generating filenames, the application currently just uses the hash code of the unique identifier of the group or item as the filename. A hash code is a numeric value computed from an object that makes it quicker to compare and look up complex objects. You can read more about hash codes online at:
http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx
Hash codes are not guaranteed to be unique, but in the case of strings it is highly unlikely that the combination of a group and a post would cause a collision. The Groups folder contains one file for each group. Navigate to that folder and open one of the items in Notepad. You’ll see the JSON serialized value for a BlogGroup instance.
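One caution if you reuse this idea outside the sample: on .NET Core and later, string.GetHashCode is randomized per process, so a filename built from it in one run will generally not match the filename computed in the next, and the GetHashCode documentation advises against persisting hash codes. The sketch below swaps in a different technique, a SHA-256 digest of the identifier rendered as lowercase hex, which is deterministic across runs and machines. CacheFileNames.FromId is a hypothetical helper, not part of the sample project.

```csharp
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: derives a deterministic file name from an item's Id.
public static class CacheFileNames
{
    public static string FromId(string id)
    {
        using (var sha = SHA256.Create())
        {
            // Hash the UTF-8 bytes of the identifier (32 bytes of output).
            byte[] hash = sha.ComputeHash(Encoding.UTF8.GetBytes(id));

            // Render as 64 lowercase hex characters, safe for any file system.
            var builder = new StringBuilder(hash.Length * 2);
            foreach (byte b in hash)
            {
                builder.Append(b.ToString("x2"));
            }
            return builder.ToString();
        }
    }
}
```

The trade-off is a longer filename (64 characters) in exchange for a name that is stable between launches.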
The JSON is stored in a compact format on disk. The following example shows the JSON value for my blog, formatted to make it easier to read:
{
    "Id" : "http://www.wintellect.com/CS/blogs/jlikness/default.aspx",
    "PageUri" : "http://www.wintellect.com/CS/blogs/jlikness/default.aspx",
    "Title" : "Jeremy Likness' Blog",
    "RssUri" : "http://www.wintellect.com/CS/blogs/jlikness/rss.aspx"
}
The syntax is straightforward. The braces enclose the object being defined and contain a list of keys (the property names) and values (what each property is set to). If you inspect any of the serialized posts (they are contained in a folder named after the group’s hash code), you will notice that the ImageUriList property uses square brackets to specify an array:
"ImageUriList" : [
    "http://www.wintellect.com/.../Screen_thumb_42317207.png",
    "http://www.wintellect.com/.../someotherimage.png"
]
You may have already looked at the BlogGroup class and noticed that not all of the properties are being stored. This particular approach requires that you mark the class as a DataContract and then explicitly tag the properties you wish to serialize. The BlogGroup class is tagged like this:
[DataContract]
public class BlogGroup : BaseItem
Any properties to be serialized are tagged using the DataMember attribute:
[DataMember]
public Uri RssUri { get; set; }
If you have written web services using Windows Communication Foundation (WCF) in the past, you will be familiar with this format for tagging classes. You may not have realized it could be used for direct serialization without going through the web service stack. The default DataContractSerializer outputs XML, so remember to specify the DataContractJsonSerializer if you want to use JSON.
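The opt-in behavior of DataMember is easy to see in isolation. In this minimal sketch, SampleGroup is a hypothetical type loosely modeled on BlogGroup, and DataContractDemo.ToJson is a hypothetical helper that serializes to an in-memory stream instead of a file. Only the two properties tagged with DataMember appear in the resulting JSON; UnreadCount is silently skipped.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

// Hypothetical type: only members tagged [DataMember] are serialized.
[DataContract]
public class SampleGroup
{
    [DataMember]
    public string Title { get; set; }

    [DataMember]
    public string RssUri { get; set; }

    // Not tagged with [DataMember], so the serializer ignores it.
    public int UnreadCount { get; set; }
}

public static class DataContractDemo
{
    // Hypothetical helper: serializes any data contract type to a JSON string.
    public static string ToJson<T>(T item)
    {
        using (var stream = new MemoryStream())
        {
            new DataContractJsonSerializer(typeof(T)).WriteObject(stream, item);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```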
The process to restore is very similar. You still reference the file but this time open it for read access. The same serialization engine is used to create an instance of the type from the serialized data:
var folder = await ApplicationData.Current.LocalFolder.GetFolderAsync(folderName);
var file = await folder.GetFileAsync(hashCode);
T retVal;
using (var inStream = await file.OpenSequentialReadAsync())
{
    var serializer = new DataContractJsonSerializer(typeof(T));
    retVal = (T)serializer.ReadObject(inStream.AsStreamForRead());
}
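A matching restore step can be sketched the same way with desktop .NET file APIs. This assumes the cached file was written as JSON by DataContractJsonSerializer; CachedPost, JsonCacheReader, and TryRestoreItem are hypothetical names. Returning null for a missing or unreadable file lets the caller treat the cache as a miss and download the data again, in the spirit of the try ... catch approach described for saving.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;

// Hypothetical cached type.
[DataContract]
public class CachedPost
{
    [DataMember]
    public string Title { get; set; }
}

public static class JsonCacheReader
{
    // Hypothetical helper: returns null on a cache miss or unreadable file.
    public static T TryRestoreItem<T>(string folderPath, string fileName) where T : class
    {
        string path = Path.Combine(folderPath, fileName);
        if (!File.Exists(path))
        {
            return null; // nothing cached yet
        }
        try
        {
            using (var stream = File.OpenRead(path))
            {
                var serializer = new DataContractJsonSerializer(typeof(T));
                return (T)serializer.ReadObject(stream);
            }
        }
        catch (Exception ex) when (ex is IOException
                                   || ex is UnauthorizedAccessException
                                   || ex is SerializationException)
        {
            return null; // treat a damaged or locked file as a cache miss
        }
    }
}
```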
You can see when you start the application that loading web sites, saving the data, and restoring items from the cache all take time. In the Windows Runtime, any operation that might take more than 50 milliseconds is exposed as an asynchronous method. An asynchronous call behaves very differently from a synchronous one, and to understand the difference, it is important to be familiar with the concept of threading.
You will learn more about threads in Chapter 17, “Advanced Metro Techniques.” In a nutshell, threading provides a way to execute different processes at the same time (concurrently). One job of the processor in your device is to schedule these threads. If you only have one processor, multiple threads take turns to run. If you have multiple processors, threads can run on different processors at the same time.
When the user launches an application, the system creates a main application thread that is responsible for performing most of the work, including responding to user input and drawing graphics on the screen. The fact that it manages the user interface has led to the convention of calling this thread the “UI thread.” By default, your code executes on the UI thread unless you do something to spin off a separate thread.
The problem with making synchronous calls from the UI thread is that all processing must wait for your code to complete. If your code takes several seconds, this means the routines that check for touch events or update graphics will not run during that period. In other words, your application will freeze and become unresponsive.
The Windows Runtime team purposefully designed the framework to avoid this scenario by introducing asynchronous calls for any methods that might take longer than 50 milliseconds to execute. Instead of running synchronously, these methods spin off a separate thread to perform their work and leave the UI thread free. When their work is complete, they return their results to the UI thread. A common mistake is to try to update the display without returning to the UI thread; this generates an exception called a cross-thread access violation, because only the UI thread is allowed to manage those resources.
Managing asynchronous calls in traditional C# was not only difficult but also resulted in code that was hard to read and maintain. Listing 6.1 provides an example using a traditional event-based model. Breakfast, lunch, and dinner happen asynchronously, but one meal must be completed before the next can begin. In the event-based model, an event handler is registered with each meal so the meal can signal when it is done. A method is then called to kick off the process; by convention, its name ends with Async.
Listing 6.1: Asynchronous meals using the event model
public void EatMeals()
{
    var breakfast = new Breakfast();
    breakfast.MealCompleted += breakfast_MealCompleted;
    breakfast.BeginBreakfastAsync();
}

void breakfast_MealCompleted(object sender, EventArgs e)
{
    var lunch = new Lunch();
    lunch.MealCompleted += lunch_MealCompleted;
    lunch.BeginLunchAsync();
}

void lunch_MealCompleted(object sender, EventArgs e)
{
    var dinner = new Dinner();
    dinner.MealCompleted += dinner_MealCompleted;
    dinner.BeginDinnerAsync();
}

void dinner_MealCompleted(object sender, EventArgs e)
{
    // done
}
This example is already complex. Every step requires a proper registration (subscription) to the completion event and then passes control to an entirely separate method when the task is done. The fact that the process continues in a separate method means that access to any local method variables is lost and any information must be passed through the subsequent calls. This is how many applications become overly complex and difficult to maintain.
The Task Parallel Library (TPL) was introduced in .NET 4.0 to simplify the process of managing parallel, concurrent, and asynchronous code. Using the TPL you can create meals as individual tasks and execute them like this:
var breakfast = new Breakfast();
var lunch = new Lunch();
var dinner = new Dinner();
var t1 = Task.Run(() => breakfast.BeginBreakfast())
    .ContinueWith(breakfastResult => lunch.BeginLunch(breakfastResult))
    .ContinueWith(lunchResult => dinner.BeginDinner(lunchResult));
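Note that ContinueWith passes each continuation the antecedent Task, not its result, despite the breakfastResult and lunchResult parameter names in the snippet above. A runnable sketch, assuming hypothetical meal methods that simply return strings, reads the previous meal through the Result property, which does not block here because a continuation only runs after its antecedent has completed:

```csharp
using System.Threading.Tasks;

// Hypothetical stand-ins for the meal classes: each "meal" returns a description.
public static class MealChain
{
    static string EatBreakfast() => "breakfast";
    static string EatLunch(string previous) => previous + ", lunch";
    static string EatDinner(string previous) => previous + ", dinner";

    public static Task<string> EatMealsAsync()
    {
        // Each continuation receives the completed antecedent Task<string>,
        // so the previous meal's value is read through its Result property.
        return Task.Run(() => EatBreakfast())
            .ContinueWith(breakfastTask => EatLunch(breakfastTask.Result))
            .ContinueWith(lunchTask => EatDinner(lunchTask.Result));
    }
}
```

Awaiting EatMealsAsync yields the string "breakfast, lunch, dinner".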
This simplifies the process quite a bit, but the code is still not as easy to read, understand, or maintain as it could be. The Windows Runtime has a considerable number of APIs that use the asynchronous model. To make developing applications that call asynchronous methods even easier, Visual Studio 11 provides support for two new keywords: async and await.
Understanding async and await
The async and await keywords provide a simplified approach to asynchronous programming. A method that performs work asynchronously and should not block the calling thread is marked with the async keyword. Within that method, you can call other asynchronous methods to launch long-running tasks. Methods marked with the keyword can have one of three return types: void, Task, or Task<T>.
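For comparison with the event-based and ContinueWith versions, here is a sketch of the three meals written with async and await. EatAsync is a hypothetical method that uses Task.Delay to stand in for real asynchronous work. Each await completes one meal before the next begins, and the local eaten list stays in scope across all three awaits, which is exactly what the event-based model lost.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical meal methods; Task.Delay stands in for real asynchronous work.
public static class AsyncMeals
{
    static async Task<string> EatAsync(string meal)
    {
        await Task.Delay(10);
        return meal;
    }

    public static async Task<List<string>> EatMealsAsync()
    {
        // A local variable that survives every await in this method.
        var eaten = new List<string>();
        eaten.Add(await EatAsync("breakfast")); // lunch does not start until this completes
        eaten.Add(await EatAsync("lunch"));
        eaten.Add(await EatAsync("dinner"));
        return eaten;
    }
}
```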
All async operations in the Windows Runtime return one of four interfaces. The interface that is implemented depends on whether or not the operation returns a result to the caller and whether or not it supports tracking progress. Table 6.1 lists the available interfaces.
Table 6.1: Interfaces available for async operations
| | Reports Progress | Does Not Report Progress |
| Returns Results | IAsyncOperationWithProgress<TResult, TProgress> | IAsyncOperation<TResult> |
| Does Not Return Results | IAsyncActionWithProgress<TProgress> | IAsyncAction |
In C# there are several ways both to wrap calls to asynchronous methods and to define them. Methods that call asynchronous operations are tagged with the async keyword. Methods with the async keyword that return void are most often event handlers, because event handlers require a void return type. For example, when you want to run an asynchronous task from a button tap, the signature of the event handler starts out like this:
private void button1_Click(object sender, RoutedEventArgs e)
{
    // do stuff
}
In order to wait for asynchronous calls to finish without blocking the UI thread, you must add the async keyword so the signature looks like this:
private async void button1_Click(object sender, RoutedEventArgs e)
{
    // do stuff
    await DoSomethingAsynchronously();
}
Aside from the special case of event handlers, you might want to create a long running task that must complete before other code can run but does not return any values. For those methods, you return a Task. For example:
public async Task LongRunningNoReturnValue()
{
    await TakesALongTime();
    return;
}
Notice that the compiler does the work for you. In your method, you simply return without sending a value. The compiler recognizes the method as a long-running Task and creates the Task “behind the scenes” for you. When the method does return a value, the return type is the generic Task<T>, closed with the specific type of the result (for example, Task<long>). Listing 6.2 demonstrates how to take a simple method that computes a factorial and wrap it in an asynchronous call. The DoFactorialExample method asynchronously computes the factorial of 5 and then puts the result into the Result property as a string.
Listing 6.2: Creating an asynchronous method that returns a result
public long Factorial(int factor)
{
    long factorial = 1;
    for (int i = 1; i <= factor; i++)
    {
        factorial *= i;
    }
    return factorial;
}

public async Task<long> FactorialAsync(int factor)
{
    return await Task.Run(() => Factorial(factor));
}

public async void DoFactorialExample()
{
    var result = await FactorialAsync(5);
    Result = result.ToString();
}
Note how easy it is to take an existing synchronous method (Factorial), expose it as an asynchronous method (FactorialAsync), and then call it and get the result with the await keyword (DoFactorialExample). The Task.Run call is what moves the work off the UI thread and onto a thread from the thread pool. The flow between threads is illustrated in Figure 6.2. Note that the UI thread is left free to continue processing while the factorial computes; then the result is updated and can be displayed to the user.
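Table 6.1 distinguishes operations that report progress. In Task-based .NET code, the usual counterpart is to accept an IProgress<T> parameter. The following sketch extends the factorial example that way; ProgressFactorial and CallbackProgress are hypothetical names. CallbackProgress runs its callback synchronously on whichever thread calls Report, which keeps the example deterministic; in an application you would more likely use the built-in Progress<T>, which posts callbacks to the synchronization context captured when it is created (the UI thread, if it is created there).

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical IProgress<T> that invokes its callback on the reporting thread.
public sealed class CallbackProgress<T> : IProgress<T>
{
    private readonly Action<T> _handler;

    public CallbackProgress(Action<T> handler) { _handler = handler; }

    public void Report(T value) => _handler(value);
}

public static class ProgressFactorial
{
    // Hypothetical variant of FactorialAsync that reports each completed step.
    public static Task<long> FactorialAsync(int factor, IProgress<int> progress)
    {
        return Task.Run(() =>
        {
            long factorial = 1;
            for (int i = 1; i <= factor; i++)
            {
                factorial *= i;
                progress?.Report(i); // progress is optional; null means "don't report"
            }
            return factorial;
        });
    }
}
```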
Figure 6.2 Asynchronous flow between threads
The examples here use the TPL because it existed in previous versions of the .NET Framework. It is also possible to create asynchronous processes using Windows Runtime methods like ThreadPool.RunAsync. You can learn more about asynchronous programming in the Windows Runtime in the development center:
http://msdn.microsoft.com/en-us/library/windows/apps/hh464924.aspx
http://msdn.microsoft.com/en-us/library/windows/apps/hh452713.aspx
Lambda Expressions
The parameter that was passed to the Task.Run method is called a lambda expression. A lambda expression is simply an anonymous function. It starts with the parameter list of the function (if the function took parameters, they would be specified inside the parentheses) and ends with the body of the function. I like to refer to the special arrow => as the gosinta, for “goes into.” Take the expression:
()=>Factorial(factor)
This can be read as “nothing goes into a call to Factorial with parameter factor.” You can use lambda expressions to provide methods “on the fly.” In the previous examples showing lunch, breakfast, and dinner, special methods were defined to handle the completion events. A lambda expression could also be used like this:
breakfast.MealCompleted += (sender, eventArgs) =>
{
    // do something
};
In this case, “the sender and eventArgs go into a set of statements that do something.” The parameters passed by the event are available in the body of the lambda expression, as are local variables defined in the surrounding method. Lambda expressions are a shorthand convention for passing in delegates.
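A small sketch shows both points: the lambda receives the event’s sender and eventArgs, and it can update a local variable of the surrounding method. Meal and LambdaHandlerDemo are hypothetical types that mimic the MealCompleted event from Listing 6.1.

```csharp
using System;

// Hypothetical class exposing an event shaped like MealCompleted in Listing 6.1.
public class Meal
{
    public event EventHandler MealCompleted;

    public void Finish() => MealCompleted?.Invoke(this, EventArgs.Empty);
}

public static class LambdaHandlerDemo
{
    public static int CountCompletions(int times)
    {
        int completed = 0; // local variable captured by the lambda below
        var meal = new Meal();

        // The handler is defined inline; sender and eventArgs come from the event.
        meal.MealCompleted += (sender, eventArgs) => { completed++; };

        for (int i = 0; i < times; i++)
        {
            meal.Finish();
        }
        return completed;
    }
}
```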
There are a few caveats to be aware of when using lambda expressions. Unless you assign a lambda expression to a variable, there is no way to reference it again from code, so you cannot unregister an event handler that is defined inline with a lambda expression. Lambda expressions that refer to variables within the method capture those variables, so the variables can live longer than the method scope (the lambda expression may still be referenced after the method completes); be aware of the side effects this can cause. You can learn more about lambda expressions online at http://msdn.microsoft.com/en-us/library/bb397687(v=vs.110).aspx.
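Both caveats can be seen in a short sketch; Notifier and UnsubscribeDemo are hypothetical types. Storing the lambda in a variable gives you the delegate instance needed for -= (writing a second, identical-looking lambda after -= would create a new delegate and remove nothing). MakeCounter shows capture: each call creates a new count variable that the returned lambda keeps alive after MakeCounter has returned.

```csharp
using System;

// Hypothetical event source.
public class Notifier
{
    public event EventHandler Fired;

    public void Fire() => Fired?.Invoke(this, EventArgs.Empty);
}

public static class UnsubscribeDemo
{
    public static int FireTwiceUnsubscribingInBetween()
    {
        int calls = 0;
        var notifier = new Notifier();

        // Keeping the lambda in a variable is what makes -= possible later.
        EventHandler handler = (sender, e) => calls++;
        notifier.Fired += handler;
        notifier.Fire();           // handler runs: calls == 1
        notifier.Fired -= handler; // removes that same delegate instance
        notifier.Fire();           // no handlers remain: calls stays 1
        return calls;
    }

    // Each call creates a new 'count' variable; the returned lambda captures it,
    // so it lives on after MakeCounter returns.
    public static Func<int> MakeCounter()
    {
        int count = 0;
        return () => ++count;
    }
}
```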
IO Helpers
The PathIO and FileIO classes provide special helper methods for reading and writing storage files. The PathIO class allows you to perform file operations by passing the absolute path to the file. Creating a text file and writing data can be accomplished in a single line of code:
await PathIO.WriteTextAsync("ms-appdata:///local/tmp.txt", "Text.");
The ms-appdata prefix is a special URI that will point to local storage for the application. You can also access local resources that are embedded in your application using the ms-appx prefix. In the sample application, an initial list of blogs to load is stored in JSON format under Assets/Blogs.js. The code to access the list is in the BlogDataSource class – the file is accessed and loaded with a single line of code:
var content = await PathIO.ReadTextAsync("ms-appx:///Assets/Blogs.js");
Table 6.2 lists the methods you can use. The PathIO versions take an absolute file path, and the FileIO versions take an IStorageFile object (obtained using the storage API):
Table 6.2: File helper methods from the PathIO and FileIO classes
| Method Name | Description |
| AppendLinesAsync | Appends lines of text to the specified file |
| AppendTextAsync | Appends the text to the specified file |
| ReadBufferAsync | Reads the contents of the specified file into a buffer |
| ReadLinesAsync | Reads the contents of the specified file into lines of text |
| ReadTextAsync | Reads the contents of the specified file into a single string as text |
| WriteBufferAsync | Writes data from a buffer to the specified file |
| WriteBytesAsync | Writes the byte array to the specified file |
| WriteLinesAsync | Writes the text lines to the specified file |
| WriteTextAsync | Writes the text to the specified file |
Take advantage of these helpers where it makes sense. They will help simplify your code tremendously.
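PathIO and FileIO are Windows Runtime classes, so they are not available in ordinary desktop .NET code. For comparison only, the following sketch swaps in the static System.IO.File methods, which cover several of the same operations; it assumes .NET Core 2.0 or later, where the async File methods exist, and FileHelperAnalog is a hypothetical name.

```csharp
using System.IO;
using System.Threading.Tasks;

// Hypothetical desktop analog of a few Table 6.2 helpers (not the PathIO API).
public static class FileHelperAnalog
{
    public static async Task<string[]> WriteAppendReadAsync(string path)
    {
        await File.WriteAllTextAsync(path, "first\n");            // roughly WriteTextAsync
        await File.AppendAllLinesAsync(path, new[] { "second" }); // roughly AppendLinesAsync
        return await File.ReadAllLinesAsync(path);                // roughly ReadLinesAsync
    }
}
```

As with the PathIO helpers, each call opens, writes or reads, and closes the file in one step.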
Embedded Resources
There are several ways you can embed data within your application and read it back. A common reason to embed data is to provide seed values for a local database or cache, configuration items, and special files such as license agreements. You can embed any type of resource, including images and text files. The applications you have worked with already include image resources.
To specify how a resource is embedded, right-click the resource name in Solution Explorer and select Properties, or select the item and press ALT + ENTER. Figure 6.3 shows the Properties dialog for the file TestPage.html. Note the Build Action and Copy to Output Directory attributes.
Figure 6.3 Properties for a resource
When you set the build action to Content, the resource is copied into a folder that is relative to the package for your application. In addition to the storage containers you learned about in Chapter 5, “Application Lifecycle,” every package has an install location that contains the local assets for which you have specified the Content build action. This includes resources such as images.
You can find the location where the package is installed using the Package class:
var package = Windows.ApplicationModel.Package.Current;
var installedLocation = package.InstalledLocation;
var loc = String.Format("Installed Location: {0}", installedLocation.Path);
An easier way to access these files is to use the ms-appx prefix. Open the BlogUtilityTests.cs file in the Wintellect.Tests project. You will learn more about testing in Chapter 11, “Testing.” The tests included are simple tests to validate the algorithms for extracting data from the target web pages. You’ve already seen how to load a file that is included with the Content build action (the Blogs.js asset used to load the initial set of blogs to reference).
It is also possible to embed resources directly into the executable for your application. These resources are not visible in the file system but can still be accessed through code. To embed a resource, set the Build Action to Embedded Resource (an example in the test project for this is the TestBlogPage.html file). Accessing the resource is a little more complex.
To read the contents of an embedded resource, you must access the current assembly. An assembly is the compiled unit of deployment (a DLL or EXE) that contains your code along with any embedded resources. One way to get the assembly is to inspect the type information for a class you have defined:
var assembly = typeof(BlogUtilityTests).GetTypeInfo().Assembly;
The assembly is what the resource is embedded within. Once you have a reference to the assembly, you can grab a stream to the resource using the GetManifestResourceStream method. There is a trick to how you reference the resource, however: the resource name is prefixed with the default namespace of your project. Therefore, a resource at the root of a project with the default namespace Wintellect.Tests is given the name:
Wintellect.Tests.ResourceName
The reference to the TestBlogPage.html file is therefore Wintellect.Tests.TestBlogPage.html. Once you have retrieved the stream for the resource, you can use a stream reader to read it back. The test project contains a helper method to retrieve embedded resources. Once the assembly reference is obtained, it returns the contents like this:
using (var stream = assembly.GetManifestResourceStream(page))
using (var reader = new StreamReader(stream))
{
    var result = await reader.ReadToEndAsync();
    return result;
}
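Here is a sketch of the same lookup in plain .NET, with the naming rule made explicit. EmbeddedResourceHelper, BuildResourceName, and ReadEmbeddedTextAsync are hypothetical names. BuildResourceName covers only the simple case described above (default namespace, then any folder names, then the file name, joined by dots); because the build tools can alter unusual folder names, listing the real names with assembly.GetManifestResourceNames() is the reliable check. GetManifestResourceStream returns null for an unknown name, which the helper passes through instead of throwing.

```csharp
using System.IO;
using System.Reflection;
using System.Threading.Tasks;

// Hypothetical helpers for locating and reading embedded resources.
public static class EmbeddedResourceHelper
{
    // Simple case only: "Wintellect.Tests" + "Assets/Page.html" -> "Wintellect.Tests.Assets.Page.html".
    public static string BuildResourceName(string defaultNamespace, string relativePath)
    {
        return defaultNamespace + "." + relativePath.Replace('/', '.').Replace('\\', '.');
    }

    // Returns the resource text, or null when no resource has that name.
    public static async Task<string> ReadEmbeddedTextAsync(Assembly assembly, string resourceName)
    {
        using (Stream stream = assembly.GetManifestResourceStream(resourceName))
        {
            if (stream == null)
            {
                return null; // unknown resource name
            }
            using (var reader = new StreamReader(stream))
            {
                return await reader.ReadToEndAsync();
            }
        }
    }
}
```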
You will typically use embedded resources only when you wish to obfuscate the data by hiding it in the assembly. Note this will not completely hide the data because anyone with the right tools will be able to inspect the assembly to examine its contents, including embedded resources. Embedding assets using the Content build action not only makes it easier to inspect the assets from your application, but also has the added advantage of allowing you to enumerate the file system using the installed location of the current package when there are multiple assets to manage.