Home > Articles > Programming > Windows Programming

  • Print
  • + Share This
This chapter is from the book

Savings Mart

Savings Mart is a fictitious discount retailer operating in a single American state. It has been in business since 2001 and hopes to open new stores by achieving greater operational efficiencies. Since its inception, Savings Mart has relied on a system of adjusting product inventory thresholds to determine when shipments will be made to stores. Every time a product is purchased, the quantity for that product and store is updated. When the quantity dips below the minimum threshold allowed for that product and store, an order is automatically generated and a shipment is made three days later.

Although this process seemed like a good way to ensure that the stores were well stocked, it resulted in shipments being made to each store almost every day. This resulted in high overhead costs for each store. Management now wishes to replace the order/shipment strategy with a system designed to predict shipment dates rather than rely on adjustable thresholds.

A sample application presented in this chapter allows the reader to create a training dataset for Savings Mart based on randomly generated purchases. The reader can then step through the process of creating a mining model based on the dataset.

Loading the Savings Mart Database

To execute the sample code with SQL Server, you will need to create a database using a script file available on the book’s Web site. The installation steps are as follows:

  1. Open SQL Server’s Query Analyzer and connect to the server where you wish to install the database.

  2. Click File and Open...

  3. From the Open Query File dialog, browse to location of the InstallDB.sql file available on the book’s Web site. Once selected, click OK.

  4. Click Query and Execute or hit F5.

  5. Once the script has completed executing, a new database named SavingsMart will be visible in the drop-down box on the toolbar.

Figure 5.1 is a diagram of the SavingsMart database. The Products table contains a record for every product available in the stores. Each product is associated with a certain vendor and assigned to a product type, such as beverage or medicine. A field named DiscontinuedDate will contain either a null, meaning it is not discontinued, or the date that it should no longer be available for order. Every product should have a UnitQuantity of one or greater, which indicates the number of items packaged with that product. The products UnitPrice represents the retail price before any discounts are applied. The UnitType and UnitAmount fields may or may not contain values, depending on the product. For instance, a bottled water product will have a UnitType of "oz" and a Unit- Amount of 16.4, indicating that it weighs 16.4 fluid ounces. It is not necessary to record the weight of a mop, so for this product these values would be null.

Figure 5.1

Figure 5.1 The SavingsMart database. The sample database contains five hundred products and five stores. Stores are stocked with all products according to threshold values contained in the ProductThresholds table. Each time a product is ordered and a shipment completed, the quantity field in ProductQty is updated.

The Purchases table is written to every time a customer makes a single purchase. It records information common to all purchases, such as when the purchase took place, what store it was made in, and what employee rang up the purchase. Purchases are made for products available within a particular store. Availability is determined by examining the Quantity field in the table ProductQty. A purchase can include multiple products and more than one unit of each product. The ProductDetail table is a child of products, and it contains a record for each product associated with a single purchase. If the product purchased is on sale during the time of purchase, then the discount percentage, specified in the ProductDiscounts table is applied.

Once a product is sold beneath the minimum threshold allowed for the store, as indicated by the ProductThresholds table, an order is automatically placed. The quantity for the order is based on the maximum amount found in the ProductThresholds table. Each shipment is the direct result of an order and is typically completed three days after the order is placed.

Populating the Database

Once the SavingsMart database is created, the next step is to populate the database. Unlike the sample databases in Chapters 2 and 3, the SavingsMart database needs to be populated with a large quantity of data. To facilitate this process and provide a method for generating unique training datasets, a sample data-loading application is included on the book’s Web site at http://www.awprofessional.com/title/0321246268. The sample Windows application, named LoadSampleData, will allow you to simulate random purchases as well as to initiate orders and shipments needed to restock products.

Utilizing the LoadSampleData program ensures that you are dealing with a clean database. Very often, the most difficult and time-consuming part of successful data mining involves cleaning the historical database to remove or replace records holding invalid values. Refer to the next section, "Cleaning the Database," for more information about this.

The LoadSampleData program also gives you an opportunity to adjust factors affecting the mining model and therefore produce different results. For instance, the program allows you to select certain product types and vendor names that will be purchased more often.

The LoadSampleData application consists of one form, Form1.vb (see Figure 5.2). It utilizes the Microsoft Data Application block to handle data access and the Microsoft Exception Application block to handle writing exceptions to the event log. The Load Data button is used to populate tables with values from text files, available for download from the book’s Web site. The following is a list of these text files along with a brief description of what they contain:

  • Stores.txt—Data for a total of five stores.

  • Employees.txt—Assigns three employees to each store.

  • Vendors.txt—Data for a total of thirty-four vendors or product brands.

  • ProductTypes.txt—A total of fourteen product types, including such items as Beverages and Kitchen Supplies.

  • Products.txt—A total of five hundred products representing each of the product types and vendors.

Figure 5.2

Figure 5.2 Screenshot of the main form used in the LoadSampleData program. This program will be used to load initial data values into the SavingsMart database. It will also allow the reader to simulate random customer purchases in order to populate a large historical dataset.

The LoadData routine is also used to populate the ProductThresholds table with set values for minimum and maximum threshold amounts. The minimum field represents the minimum quantity of product that should be available in a certain store. The maximum field is the quantity that will be ordered for that store when the minimum threshold is broken. Initially, the minimum and maximum values will be set at ten and two hundred respectively. This will be the case for each product and each store, resulting in a total of twenty-five hundred records (500 products x 5 stores = 2500 records).

Finally, the LoadData routine will generate orders and shipments for each of the five stores. The initial orders will stock the stores with the maximum threshold for all five hundred products. The shipment date will occur three days after the order date to ensure that all stores are fully stocked on the first day of the year.

To begin loading data, execute the following steps:

  1. Copy the contents of the LoadSampleData directory, available for download from the book’s Web site, to a location accessible by your development machine. Note the location because it will be used to set the sPath variable in step 4.

  2. Open the LoadSampleData project file with Visual Studio .NET.

  3. From Solution Explorer, right-click Form1 and select View Code.

  4. The top of the form contains two variables that will be unique to your installation. Ensure that the sConn and sPath variables are set correctly. sConn is a string variable containing the connection string used to connect to the SavingsMart database on SQL Server . sPath is a string variable containing the file path to the text files. The text files reside in a subdirectory name TextFiles. This subdirectory is located inside the LoadSampleData directory (created in step 1).

  5. Execute the application by selecting Start from the Debug Menu. Figure 5.2 is a screenshot of form1.

  6. To begin, click the Load Data button and ensure that the message box "Initial Data Load is complete" appears. Note that the form contains several combo and textboxes that will determine what and how data is loaded.

'Maximum # of purchases per day
Dim nMaxPurchases As Int16 = txtMaxPurchases.Text
'Maximum # of products per purchase
Dim nMaxProducts As Int16 = txtMaxProducts.Text
'Maximum value of quantity per product
Dim nMaxQuantity As Int16 = txtMaxQuantity.Text
Dim sYear As String = cboYear.Text
Dim nStoreID As Int16 = cboStoreID.Text

'These are the Product Types in which there is an 
'increased chance of product selection
'The default of 1,2 represents snack foods and beverages
Dim sProductTypeAdj As String = txtProductTypeAdj.Text

'These are the Vendors in which there is an increased 
'chance of product selection 
'The default of 20,24,27 represents Kamp, Notch, and PNuts as Vendors
Dim sVendorAdj As String = txtVendorAdj.Text
ProgressBar1.Minimum = 1
ProgressBar1.Maximum = 366

  Dim params(2) As SqlParameter
  params(0) = New _
   SqlParameter("@ID", SqlDbType.Int)
  params(1) = New _
   SqlParameter("@ProdTypeAdj", SqlDbType.VarChar, 50)
  params(2) = New _
  SqlParameter("@VendorAdj", SqlDbType.VarChar, 50)
 params(0).Value = nStoreID
 params(1).Value = sProductTypeAdj
 params(2).Value = sVendorAdj
 Dim ds As DataSet = _
 SqlHelper.ExecuteDataset(sConn, _
  CommandType.StoredProcedure, "GetProductIDS", params)
 Dim i As Int16 = 1
 Dim dtDate As DateTime
 dtDate = Convert.ToDateTime("01/01/" + sYear)
 'Loop through everyday of the year 
 'We assume the store is open every day
 Do Until i = 366
   'First thing is check to see if orders needs to 
   'be fulfilled for this day and store
   'We assume that all orders are shipped 3 days 
   'after the orderdate in one shipment
   Dim params1(1) As SqlParameter
   params1(0) = New _
     SqlParameter("@StoreID", SqlDbType.Int)
   params1(1) = New _
     SqlParameter("@Date", SqlDbType.SmallDateTime)
   params1(0).Value = nStoreID
   'order was placed 3 days ago
   params1(1).Value = dtDate.AddDays(-3)
   SqlHelper.ExecuteReader(sConn, _
    CommandType.StoredProcedure, "InsertShipment", params1)
   Dim x As Int16 = 1
   'This will be the total number of purchases for this day
   Dim nPurchases As Int16
   nPurchases = CInt(Int((nMaxPurchases * Rnd()) + 1))
   Do Until x = nPurchases + 1
     Dim y As Int16 = 1
     Dim nEmployeePos As Int16
     nEmployeePos = CInt(Int((ds.Tables(1).Rows.Count * Rnd())))
     Dim nEmployeeID As Integer = _
     Dim params2(2) As SqlParameter
     params2(0) = New SqlParameter("@ID1", SqlDbType.Int)
     params2(1) = New _
       SqlParameter("@Date", SqlDbType.SmallDateTime)
     params2(2) = New SqlParameter("@ID2", SqlDbType.Int)
     params2(0).Value = nStoreID
     params2(1).Value = dtDate
     params2(2).Value = nEmployeeID
     Dim nPurchaseID As Integer = _
       SqlHelper.ExecuteScalar(sConn, _
      CommandType.StoredProcedure, "InsertPurchase", params2)
     'This is total number of products for this purchase
     Dim nProducts As Int16 = _
       CInt(Int((nMaxProducts * Rnd()) + 1))
     Do Until y = nProducts + 1
       'This is quantity for this purchase
       Dim nQty As Int16 = _
         CInt(Int((nMaxQuantity * Rnd()) + 1))
       'This is the product for this detail record
       Dim nProductPos As Int16 = _
         CInt(Int((ds.Tables(0).Rows.Count * Rnd())))
       Dim nProductID As Integer = _
       'Generate the detail record
       Dim params3(4) As SqlParameter
        params3(0) = New SqlParameter("@StoreID", SqlDbType.Int)
       params3(1) = New _
         SqlParameter("@ProductID", SqlDbType.Int)
       params3(2) = New _
         SqlParameter("@PurchaseID", SqlDbType.Int)
       params3(3) = New SqlParameter("@Qty", SqlDbType.Int)
       params3(4) = New _
         SqlParameter("@Date", SqlDbType.SmallDateTime)
       params3(0).Value = nStoreID
       params3(1).Value = nProductID
       params3(2).Value = nPurchaseID
       params3(3).Value = nQty
       params3(4).Value = dtDate
       SqlHelper.ExecuteScalar(sConn, _
         CommandType.StoredProcedure, _
        "InsertPurchaseDetail", params3)
       y += 1
     x += 1

    i += 1
    ProgressBar1.Value = i
    'Go to the next day
    dtDate = dtDate.AddDays(1)
   MessageBox.Show("Purchases for store " _
    + Convert.ToString(cboStoreID.Text) + _
    " were generated successfully")
Catch ex As Exception
End Try

The next step is to generate purchases for each of the five stores. Data mining is most effective in dealing with large datasets. Therefore, the GeneratePurchases routine, seen as follows, will insert approximately 100,000 records in the PurchaseDetail table for each store and calendar year. Purchases are generated for one store—one year at a time.

The amount of records is approximate because the routine utilizes a random number generator to determine the number of purchases per day along with the number of products per purchase. The number of records also varies depending on what input variables are chosen on form1.

The program utilizes default values specifying that purchases will be generated for Store 1 in the year 2001. The GeneratePurchases routine contains a main loop that will execute 365 times for each day of one calendar year. The variable max purchases defaults to 80 and is used to provide the maximum value for the random number generator when determining how many purchases will be generated for a single day.

The variable max products determines the number of distinct products that will be used for a single purchase. Max quantity is used to determine the quantity used in a single purchase detail record. By utilizing the random number generator and then adjusting these values for each store that is processed, we can simulate random purchase activity. In the section titled "Interpreting the Results," we will examine the results of one mining model. To ensure that your results are consistent with the explanations in this section, use the values in Table 5.2 when loading your database.

Table 5.2 Values to be used in the LoadSampleData application when generating purchases for all five stores.


Field Caption

Value (only use the number values and not the literal values in parentheses)


Processing Year



Max Purchases



Max Products



Max Quantity



Product Type Adjustment

1, 2 (Snack Foods and Beverages)


Vendor Adjustment

20, 24, 27 (Kamp, Notch, and Pnuts)


Processing Year



Max Purchases



Max Products



Max Quantity



Product Type Adjustment

2, 6 (Beverages and Baking Goods)


Vendor Adjustment

13, 18 (Gombers and Joe’s)


Processing Year



Max Purchases



Max Products



Max Quantity



Product Type Adjustment

2 (Beverages)


Vendor Adjustment

34 (Store Brand)


Processing Year



Max Purchases



Max Products



Max Quantity



Product Type Adjustment

(leave blank)


Vendor Adjustment

34, 18 (Store Brand and Joe’s)


Processing Year



Max Purchases



Max Products



Max Quantity



Product Type Adjustment

6 (Baking Goods)


Vendor Adjustment

-24, 27, 34 (Notch, Pnuts, and Store Brand)

Of course, since we are using a random number generator, the purchases generated will represent all products equally well over the long run. The Product Type Adjustment and Vendor Adjustment variables are introduced because equal distribution of product purchases is not realistic. These variables contain a comma-delimited list of ProductTypeID and VendorID values. The stored procedure GetProductIDs uses these values when returning the dataset of available ProductID’s. If a ProductTypeID is specified, then every product that relates to that product type will be included in the list of available product id’s more than once. This will increase the chances that the product will be selected for the PurchaseDetail record. The Vendor Adjustment works similarly in that for each VendorID specified, all products assigned to that vendor will appear in the available product list more than once.

If you do not alter the values on form1, the GeneratePurchases routine will take approximately twenty minutes to load data for each store and calendar year. A progress bar is used to indicate the status of the data load because the process is somewhat time-consuming.

Cleaning the Database

Cleaning the database is one of the most important tasks in successful data mining. Databases to be mined are often constructed from multiple data sources. These data sources often involve data that is prone to a variety of errors that can destroy any chance you have of making useful predictions. Most everyone has heard or used the phrase "Garbage in, Garbage out." This phrase applies more than ever to data mining.

Possible errors include records with impossible values or values that fall outside the expected range. Also, records that include null values where nulls will negatively impact the decision-making process. Most of these errors could be prevented with database restrictions or application code, but this does not always happen. Since our sample database was artificially created, we can be reasonably sure that these errors do not exist. The following is a list of the errors that could have occurred if our database existed in the real world:

  • Store sale in which the ending data occurs before the starting date.

  • A purchase handled by a store employee before their hire date or for a store they were not assigned to.

  • An order made for a product that was discontinued before the order date.

  • A shipment or order date that is invalid or outside the days of operation for the concerned store.

  • A negative quantity in either PurchaseDetail, OrderDetail, ShipmentDetail, or ProductQty.

  • A product not associated with a vendor or a purchase with no purchase date and quantity.

  • A maximum amount that is greater than the minimum.

The methods used to clean a database can vary. Often, values can be corrected with a few update queries made in Query Analyzer. The hardest part is determining what the correct values should be. More than likely, outside help will be needed from people intimately familiar with the data, such as a store manager.

Creating Views

In order to ease the process of building a mining model, a special view will be created. The view, vw_Shipments, combines fields from five different tables and will be used in the next section to create the mining model. The view utilizes the function fn_GetLastShipmentDate to calculate the number of days between shipments. The Transact-SQL code (viewable from the User Defined Function tab in Enterprise Manager) for this function is as follows:

CREATE FUNCTION fn_GetLastShipmentDate
 @ShipmentID int,
 @ProductID int
RETURNS datetime

 DECLARE @ShippedDate smalldatetime, @TShipmentID int, @ret 
 FOR select shippeddate, s.shipmentid from shipments s
    left join shipmentdetails sd on s.shipmentid = sd.shipmentid 
    where s.storeid IN (SELECT StoreID FROM Shipments
       WHERE shipmentid = @Shipmentid)
    AND sd.productid = @ProductID
    ORDER BY shippeddate
 OPEN cursor1
 FETCH NEXT FROM cursor1 INTO @ShippedDate, @TShipmentID

    IF @ShipmentID = @TShipmentID
       FETCH PRIOR FROM cursor1 INTO @ShippedDate, @TShipmentID
       SET @ret = @ShippedDate
       GOTO Close_Cursor

    FETCH NEXT FROM cursor1 INTO @ShippedDate, @TShipmentID

 CLOSE cursor1


The function accepts @ShipmentID and @ProductID as input variables. It then opens a scrollable cursor (similar to an ADO resultset) based on the following SQL statement:

SELECT shippeddate, s.shipmentid FROM shipments s
 LEFT JOIN shipmentdetails sd ON s.shipmentid = sd.shipmentid 
 WHERE s.storeid IN (SELECT StoreID FROM Shipments
    WHERE shipmentid = @Shipmentid)
 AND sd.productid = @ProductID
 ORDER BY shippeddate

The function loops through the cursor results until it locates the ShipmentID supplied as an input variable. Once located, it moves to the preceding record and returns that shipment date. This shipment date is used as the first variable for the built-in SQL function DATEDIFF. The resulting variable, DaysSinceLastShipped, will be an important column for the Analyze Shipments mining model.

  • + Share This
  • 🔖 Save To Your Account

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.


Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information

To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.


Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.


If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information

Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.


Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.


This site is not directed to children under the age of 13.


Pearson may send or direct marketing communications to users, provided that

  • Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
  • Such marketing is consistent with applicable law and Pearson's legal obligations.
  • Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
  • Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information

If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.


Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information

Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents

California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure

Pearson may disclose personal information, as follows:

  • As required by law.
  • With the consent of the individual (or their parent, if the individual is a minor)
  • In response to a subpoena, court order or legal process, to the extent permitted or required by law
  • To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
  • In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
  • To investigate or address actual or suspected fraud or other illegal activities
  • To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
  • To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
  • To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.


This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact

Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice

We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020