Home > Articles > Programming > Windows Programming

Windows Parallelism, Fast File Searching, and Speculative Processing

Johnson (John) M. Hart, author of Windows System Programming, shows how to speed up simple file comparison by a factor of 10 or more, using parallelism with speculative processing, combined with memory-mapped files. These techniques work just as well in Linux/UNIX, and they can be extended to parallelize and speed up other file and data stream computations.
Like this article? We recommend

Introduction

Sequential file processing (SFP) and data stream processing, while basic, is an essential computational task that's used for searching, transformation, archiving, encryption, and countless other tasks. Windows CMD and POSIX (Linux, UNIX, and Cygwin) both support commands to perform SFP operations.

Nonetheless, many SFP implementations (including the Windows and POSIX commands) are much slower than necessary, as they fail to exploit two basic techniques: parallelism and memory-mapped files. These techniques are enabled by three advances in commodity computer systems, including laptops:

  • Multi-core CPU chips and multi-chip systems enable parallelism; that is, independent tasks running concurrently on separate cores. The programming challenge consists of determining how to subdivide the problem into parallel tasks; it's easy in some cases and very difficult in others.
  • Large physical memories such as 8GB are common even on inexpensive desktops, and much larger memories are available on high-end systems.
  • 64-bit systems allow for mapping large files easily and efficiently, although memory-mapped files are effective for smaller files on 32-bit systems. In most cases, processing the mapped memory is faster than conventional file I/O.

Example programs in my book Windows System Programming, Fourth Edition (WSP4) [6] show how to speed up sequential file processing significantly in two specific SFP patterns:

  • File transformation, in which an input file is transformed (example: encryption) and written to an output file.
  • File processing, in which multiple input files are processed concurrently to obtain information such as counts of lines, words, and characters (specifically, the POSIX wc command).

In a recent paper, "Sequential File Processing: A Fast Path from Native Code to .NET with Visits to Parallelism and Memory Mapped Files," (SFP-Fast) [7], I've shown how the same techniques can be extended from Windows native code to C#/.NET managed code.

These two examples are cases of Toub's [8] "delightful parallelism," in which the file processing task can easily be divided into smaller "stripes" (collections of regularly spaced, fixed-size blocks) to be processed concurrently by separate tasks. Not all SFP problems are so delightfully parallel, however, and now I'll show how to solve a harder problem that requires "speculative processing" (Toub, p. 94). The performance results using native Windows code are excellent, relative to the Windows commands. The paper will show performance graphs, show the solution's limitations, and describe the solution strategy. Complete code is in the WSP4 Examples file, and spreadsheets with raw performance data are downloadable from the author's website.

A Brief Parallelism Review

Parallelism is a hot topic, widely discussed (see the "References" section). Nearly all writers, including this one, agree that parallelism is a difficult topic and that, in general, it's challenging to find and exploit the parallelism in many computational problems. It's even more difficult to convert legacy code to exploit parallelism. Parallel programmers face challenges including but not limited to shared data, locking, deadlocks, race conditions, false sharing, and identifying computational tasks that can run concurrently.

However, this complexity shouldn't stop us from exploiting parallelism when it's relatively easy to achieve in practical situations, such as the file processing and file transformation tasks mentioned earlier. Furthermore, the experience and confidence gained from solving simpler problems may help to solve some harder problems in the future.

A Harder Problem: File Comparison

The benchmark problem solved in this article involves speculative processing:

  • We search data streams for a pattern or specific condition.
  • We need to identify the first occurrence(s) of the condition.
  • Once an occurrence has been detected, other search tasks terminate as soon as possible in order to maximize performance.

More generally, Toub describes speculative processing this way:

Speculation is the pattern of doing something that may not be needed in case it actually is needed. This is increasingly relevant to parallel computing, where we can take advantage of multiple cores to do more things in advance of their actually being needed. Speculation trades off throughput for reduced latency, by utilizing resources to do more work in case that extra work could pay dividends.

This article's specific benchmark problem is to implement a fast version of the Windows COMP command (called compMP), which compares two files, byte by byte, and displays the mismatches in order of occurrence. compMP differs from the COMP command in several ways that are useful but challenging to implement:

  • The search terminates after identifying eight mismatches (an arbitrary limit).
  • Optional command-line parameters specify a block size (default: 4,096) and the number of concurrent threads (default: the number of processors).
  • If the number of threads is negative, the program uses conventional file processing with ReadFile Windows calls, reading block size blocks for each read operation. This allows for comparison with both the fast memory-mapped/parallel implementation and with the real Windows COMP command.

The implementation challenges occur because mismatches can be identified at any time in different tasks (threads) and may not be found in the order in which they occur in the two files. Therefore, the solution needs to keep track of the earliest mismatches and, when the maximum mismatch count (8) is reached, other search tasks should terminate if they're searching past the last mismatch location.

Some Performance Results

This section gives performance results to show that the effort is worthwhile. Following sections will discuss the key implementation considerations.

First, in order to become familiar with the command syntax, Figure 1 shows results on two short files (3,968 bytes) which differ in two locations. The program name is compMP (memory-mapped, parallel compare), and there is also a run of Windows COMP. Notice the command-line parameters to specify block size and thread count (which is negative in one case to force conventional file processing).

Figure 1: compMP and Windows COMP comparing two small files.

Figure 2 shows performance results using timep (a time utility in WSP4) to measure the elapsed (real), user, and system time on a four-core 64-bit system. Results, arranged from fastest to slowest, are shown for four threads, two threads, one thread, conventional file access (ReadFile), and Windows COMP.

The compared files are identical and 640,000,000 bytes long. This size was selected as follows:

  • The size is sufficiently large to consume measurable—but not excessive—time.
  • A large file can stress a system with less memory or a 32-bit CPU, revealing scalability issues.
  • The size (640,000,000 bytes) is not a power of two and hence not a multiple of the block sizes, which is important for thorough testing of some implementations.
  • Many applications, including emerging cloud applications, must process large data sets (this size is small by such standards), as described, for example, by Gannon and Reed [5].

Figure 2: compMP and Windows COMP comparing two identical large files.

Summarizing these results:

  • The fast implementation, using memory-mapping and four tasks, consumed 0.562 seconds real (elapsed) time. The user time is larger, as it's the sum of times on four processors. The default number of tasks (four on the test system) is nearly optimal (eight threads doesn't produce better results) and is significantly faster than the one-thread and two-thread cases.
  • The conventional file processing real-time is 2.353 seconds (more than three times the four-thread result and about double the one-thread case). The block size (65,536) is important; smaller block sizes are slower, as there are more ReadFile system calls. Examples: 512-byte blocks: 15.1s; 2,048-byte blocks: 5.7s; 8,192-byte blocks: 2.9s.
  • Windows COMP required more than 13 seconds (there was a small delay to respond to the prompt).

Figure 3 is a graph showing compMP and Windows COMP results for different file sizes, using both conventional file processing (65,536 byte blocks) and memory-mapped, parallel file processing. As Figure 3 shows, compMP is not efficient for very large files (discussed in the next section).

Figure 3: compMP performance as a function of file size and number of threads.

Overcoming Large File Limitations

compMP, with a thread for every processor (the red line), is significantly faster than other solutions for files of "moderate" size. Efficiency requires sufficient physical memory to hold each file, and the test system RAM size is 8GB, not enough for two 5GB files.

The four threads cause page thrashing, as the threads can be accessing the file at different locations; and the elapsed time, while highly variable, usually exceeds 500 seconds. Two threads provide slightly better performance (about 400 seconds), but one thread (the green line) is best.

The elapsed time is off the charts for the 5,120GB files for all solutions, but here are the numerical results for the three single-threaded solutions:

  • ReadFile (conventional file processing) requires about 60 seconds, and is the fastest of the three single-threaded solutions.
  • compMP with one thread (the green line) requires about 185 seconds.
  • Windows COMP (light blue line) requires about 185 seconds, and is the slowest solution in all cases.

These results suggest a simple adaptive strategy (not implemented) for compMP, which would appear to beat Windows COMP in all cases:

  • Test the file size. If two files don't fit conveniently in the available physical memory, use the ReadFile code. Experiment to find the Windows overhead to determine the file-size threshold.
  • If enough physical memory exists, use the memory-mapped file, multithread code.

32-Bit Results

Results on older 32-bit systems are similar for smaller files. The 320MB file test worked well, but the 640MB files couldn't be mapped due to virtual address space limitations. compMP should extend its adaptive strategy for 32-bit operation.

About the Solution

The source code is available at the WSP4 support site. A ReadMe.txt file is available, and you should open the compMP solution in the Projects2008_64 folder, which contains Visual Studio 2008 64-bit projects. The source file is CHAPTR14\compMP.c, and the project settings are consistent with other WSP4 examples, as described in ReadMe.txt.

Source-code comments explain the important code features, but here's a summary of important points:

  • The memory-mapping and thread management are standard, corresponding to patterns that WSP4 (Chapters 5 and 7–10) describes and uses extensively.
  • The file-comparison threads used "shared state," which maintains an array of up to eight mismatch entries, with each mismatch entry containing file position and byte values in the two files.
  • The shared state contains a lock; in this case, a "Slim Reader/Writer" (SRW) lock (not available with Windows XP and Server 2003). A conventional mutex or CRITICAL_SECTION would also work, but neither is as efficient as the SRW lock.
  • The mismatch table is maintained in order, sorted by the file location. See the LogMismatch function for the table update logic and the exclusive SRW locking. The implementation is simple and adequate given the small table size, and this logic is invoked only when there's a mismatch, which will never occur in the normal case of identical files.
  • The comparison is performed in eight-byte units as Microsoft C++ __int64 data items. If there's a mismatch, the program identifies the one or more bytes that mismatch. This is much more efficient (by a factor of three or more, in some cases) than comparing byte by byte. Other code optimizations are possible but haven't been attempted.
  • The thread shutdown logic in the ReadCompare function is passive; there's no need for any type of thread cancellation. Each thread periodically (at the start of each block) examines the thread state to determine whether the mismatch table is full, and whether the thread's file location is greater than the maximum location in the table. A thread might run a bit longer than necessary, but locking the shared state and test before each byte comparison would be inefficient. Furthermore, no cancellation technique assures immediate response.
  • Locking before all shared-state access erects memory barriers, ensuring that the latest values are available.

A Harder Search Problem

File comparison, which in this case searches a file pair for a limited number of mismatches, lends itself to speculative processing because there's no state as we scan the file, other than the shared mismatch data. Two bytes are either the same or different. If they're different, we log the mismatch. In either case, we can then compare the next byte pair. There's no problem when scanning an individual block, as there's no concern about the state when starting to scan the block.

Searching a file for a regular expression is an instance of a more complex problem, as there's no way to know the state when starting to process a file block. You need to process the file sequentially from beginning to end. If the search pattern is very simple, such as a constant string, a little ingenuity can overcome this problem, by allowing a thread to continue past the end of a block into the next block.

The general regular expression doesn't have an easy, apparent solution using speculative or any other form of parallel processing, but "whole file parallelism" works well. That is, if several files are specified on the command line, create a distinct thread to process each file. Use shared state to accumulate and order the pattern matches. WSP4 has a wc (word count) solution using whole-file parallelism.

Linux and UNIX Results

It's straightforward to convert the code to POSIX, although the Windows-specific OS calls and data types should be replaced carefully. I haven't performed this conversion, but earlier results with whole-file parallelism with the wc command indicate that we can expect similar results for compMP.

Future Work

A future article will extend the solution from native code to C#/.NET code and exploit new .NET 4.0 features, thereby extending the solution in SFP-Fast. This article will also explore other speculative processing patterns.

References

[1] Asanovic, K., et al. "A View of the Parallel Computing Landscape." Commun. ACM 52, 10 (Oct. 2009), 56–67.

[2] Butenhof, D. R. Programming with POSIX Threads. Addison-Wesley, 1997.

[3] Cormen, T. H., et al. Introduction to Algorithms, Third Edition. The MIT Press, 2009. (Chapter 27.)

[4] Duffy, J. Concurrent Programming on Windows. Addison-Wesley, 2008.

[5] Gannon, G. and D. Reed. "Parallelism and the Cloud." Dr. Dobb's Journal, October 16, 2009.

[6] Hart, J. M. [WSP4] Windows System Programming, Fourth Edition. Addison-Wesley, 2010. (Download the Examples code from the author's website.)

[7] Hart, J. M. [SFP-Fast] "Sequential File Processing: A Fast Path from Native Code to .NET with Visits to Parallelism and Memory Mapped Files." June 14, 2010. (To appear in Dr. Dobb's Journal.)

[8] Toub, S. "Patterns for Parallel Programming: Understanding and Applying Parallel Patterns with the .NET Framework 4." February 16, 2010.

[9] Wilson, G. V. "Deja Parallel All Over Again." Dr. Dobb's Journal, June 19, 2005.

Johnson (John) M. Hart is a consultant specializing in Microsoft Windows and .NET application development, open systems computing, and technical training and writing. He has over 25 years' experience as a software engineer, manager, engineering director, and architect at Cilk Arts, Inc. (now part of Intel); ArrAy, Inc. (now part of Sierra Atlantic, Inc.); HP; and Apollo Computer. He served as a computer science professor at the University of Kentucky for nine years, and authored all four editions of Windows System Programming. John still enjoys pushing lots of bits around as fast as possible.

InformIT Promotional Mailings & Special Offers

I would like to receive exclusive offers and hear about products from InformIT and its family of brands. I can unsubscribe at any time.

Overview


Pearson Education, Inc., 221 River Street, Hoboken, New Jersey 07030, (Pearson) presents this site to provide information about products and services that can be purchased through this site.

This privacy notice provides an overview of our commitment to privacy and describes how we collect, protect, use and share personal information collected through this site. Please note that other Pearson websites and online products and services have their own separate privacy policies.

Collection and Use of Information


To conduct business and deliver products and services, Pearson collects and uses personal information in several ways in connection with this site, including:

Questions and Inquiries

For inquiries and questions, we collect the inquiry or question, together with name, contact details (email address, phone number and mailing address) and any other additional information voluntarily submitted to us through a Contact Us form or an email. We use this information to address the inquiry and respond to the question.

Online Store

For orders and purchases placed through our online store on this site, we collect order details, name, institution name and address (if applicable), email address, phone number, shipping and billing addresses, credit/debit card information, shipping options and any instructions. We use this information to complete transactions, fulfill orders, communicate with individuals placing orders or visiting the online store, and for related purposes.

Surveys

Pearson may offer opportunities to provide feedback or participate in surveys, including surveys evaluating Pearson products, services or sites. Participation is voluntary. Pearson collects information requested in the survey questions and uses the information to evaluate, support, maintain and improve products, services or sites, develop new products and services, conduct educational research and for other purposes specified in the survey.

Contests and Drawings

Occasionally, we may sponsor a contest or drawing. Participation is optional. Pearson collects name, contact information and other information specified on the entry form for the contest or drawing to conduct the contest or drawing. Pearson may collect additional personal information from the winners of a contest or drawing in order to award the prize and for tax reporting purposes, as required by law.

Newsletters

If you have elected to receive email newsletters or promotional mailings and special offers but want to unsubscribe, simply email information@informit.com.

Service Announcements

On rare occasions it is necessary to send out a strictly service related announcement. For instance, if our service is temporarily suspended for maintenance we might send users an email. Generally, users may not opt-out of these communications, though they can deactivate their account information. However, these communications are not promotional in nature.

Customer Service

We communicate with users on a regular basis to provide requested services and in regard to issues relating to their account we reply via email or phone in accordance with the users' wishes when a user submits their information through our Contact Us form.

Other Collection and Use of Information


Application and System Logs

Pearson automatically collects log data to help ensure the delivery, availability and security of this site. Log data may include technical information about how a user or visitor connected to this site, such as browser type, type of computer/device, operating system, internet service provider and IP address. We use this information for support purposes and to monitor the health of the site, identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents and appropriately scale computing resources.

Web Analytics

Pearson may use third party web trend analytical services, including Google Analytics, to collect visitor information, such as IP addresses, browser types, referring pages, pages visited and time spent on a particular site. While these analytical services collect and report information on an anonymous basis, they may use cookies to gather web trend information. The information gathered may enable Pearson (but not the third party web trend services) to link information with application and system log data. Pearson uses this information for system administration and to identify problems, improve service, detect unauthorized access and fraudulent activity, prevent and respond to security incidents, appropriately scale computing resources and otherwise support and deliver this site and its services.

Cookies and Related Technologies

This site uses cookies and similar technologies to personalize content, measure traffic patterns, control security, track use and access of information on this site, and provide interest-based messages and advertising. Users can manage and block the use of cookies through their browser. Disabling or blocking certain cookies may limit the functionality of this site.

Do Not Track

This site currently does not respond to Do Not Track signals.

Security


Pearson uses appropriate physical, administrative and technical security measures to protect personal information from unauthorized access, use and disclosure.

Children


This site is not directed to children under the age of 13.

Marketing


Pearson may send or direct marketing communications to users, provided that

  • Pearson will not use personal information collected or processed as a K-12 school service provider for the purpose of directed or targeted advertising.
  • Such marketing is consistent with applicable law and Pearson's legal obligations.
  • Pearson will not knowingly direct or send marketing communications to an individual who has expressed a preference not to receive marketing.
  • Where required by applicable law, express or implied consent to marketing exists and has not been withdrawn.

Pearson may provide personal information to a third party service provider on a restricted basis to provide marketing solely on behalf of Pearson or an affiliate or customer for whom Pearson is a service provider. Marketing preferences may be changed at any time.

Correcting/Updating Personal Information


If a user's personally identifiable information changes (such as your postal address or email address), we provide a way to correct or update that user's personal data provided to us. This can be done on the Account page. If a user no longer desires our service and desires to delete his or her account, please contact us at customer-service@informit.com and we will process the deletion of a user's account.

Choice/Opt-out


Users can always make an informed choice as to whether they should proceed with certain services offered by InformIT. If you choose to remove yourself from our mailing list(s) simply visit the following page and uncheck any communication you no longer want to receive: www.informit.com/u.aspx.

Sale of Personal Information


Pearson does not rent or sell personal information in exchange for any payment of money.

While Pearson does not sell personal information, as defined in Nevada law, Nevada residents may email a request for no sale of their personal information to NevadaDesignatedRequest@pearson.com.

Supplemental Privacy Statement for California Residents


California residents should read our Supplemental privacy statement for California residents in conjunction with this Privacy Notice. The Supplemental privacy statement for California residents explains Pearson's commitment to comply with California law and applies to personal information of California residents collected in connection with this site and the Services.

Sharing and Disclosure


Pearson may disclose personal information, as follows:

  • As required by law.
  • With the consent of the individual (or their parent, if the individual is a minor)
  • In response to a subpoena, court order or legal process, to the extent permitted or required by law
  • To protect the security and safety of individuals, data, assets and systems, consistent with applicable law
  • In connection the sale, joint venture or other transfer of some or all of its company or assets, subject to the provisions of this Privacy Notice
  • To investigate or address actual or suspected fraud or other illegal activities
  • To exercise its legal rights, including enforcement of the Terms of Use for this site or another contract
  • To affiliated Pearson companies and other companies and organizations who perform work for Pearson and are obligated to protect the privacy of personal information consistent with this Privacy Notice
  • To a school, organization, company or government agency, where Pearson collects or processes the personal information in a school setting or on behalf of such organization, company or government agency.

Links


This web site contains links to other sites. Please be aware that we are not responsible for the privacy practices of such other sites. We encourage our users to be aware when they leave our site and to read the privacy statements of each and every web site that collects Personal Information. This privacy statement applies solely to information collected by this web site.

Requests and Contact


Please contact us about this Privacy Notice or if you have any requests or questions relating to the privacy of your personal information.

Changes to this Privacy Notice


We may revise this Privacy Notice through an updated posting. We will identify the effective date of the revision in the posting. Often, updates are made to provide greater clarity or to comply with changes in regulatory requirements. If the updates involve material changes to the collection, protection, use or disclosure of Personal Information, Pearson will provide notice of the change through a conspicuous notice on this site or other appropriate way. Continued use of the site after the effective date of a posted revision evidences acceptance. Please contact us if you have questions or concerns about the Privacy Notice or any objection to any revisions.

Last Update: November 17, 2020