Vampire Bots: Quick Overview

As mentioned earlier, a vampire bot is a program that "drains" content from other web sites. The term is figurative: nothing is removed from the source site, because the bot only copies the content to the destination computer. The program typically takes two inputs: the address of the web page where the bot should start draining content, and the folder where it should store that content.

Suppose you know of a web page with public domain images of planets, such as http://www.professorf.com/planets.html (see Figure 1).

Figure 1 The sample web page that our vampire bot will drain.

Now suppose that you want to store those images in a folder on your laptop named d:\botpics\ and that you have a vampire bot named vbot. Before vbot runs, the botpics folder is empty (see Figure 2).

Figure 2 Folder before executing the vampire bot.

To run the simple vampire bot, you open a DOS window, execute vbot, and enter the web page address and folder location (see Figure 3).

Figure 3 Executing the vampire bot.

After the vampire bot finishes executing, it fills the folder with the images found on the web page, as shown in Figure 4.

Figure 4 Folder after executing the vampire bot.

Cutting right to the chase, Listing 1 shows the code for the simple vampire bot.

Listing 1—Vampire Bot Archetype

using System;
using System.Net;
using System.IO;
using System.Collections;

class vampirebot
{
    string base_url, folder;

    // Store the destination folder and derive the base address
    // (everything up to and including the last slash) from the start URL.
    vampirebot(string url, string dir)
    {
        int slash_loc;

        slash_loc = url.LastIndexOf("/");
        base_url = url.Substring(0, slash_loc+1);
        folder = dir;
    }

    // Download the page at URL and return its contents as a single string.
    public string URLtoRawHTML(string URL)
    {
        WebRequest req;
        WebResponse res;
        Stream str;

        string RawHTML;
        int ch;

        req = WebRequest.Create(URL);
        res = req.GetResponse();
        str = res.GetResponseStream();

        // Read the response one byte at a time, treating each byte as a character.
        RawHTML = "";
        while ((ch=str.ReadByte())!=-1)
            RawHTML=RawHTML+Convert.ToChar(ch);

        str.Close();
        res.Close();

        return RawHTML;
    }

    // Return every double-quoted string in raw_html that contains ".gif".
    public ArrayList RawHTMLtoImageList(string raw_html)
    {
        string patt, spat, epat;
        int ploc, sloc, eloc;
        string file;
        ArrayList list;

        patt=".gif";   // pattern that marks an image reference
        spat="\"";     // delimiter that starts a file name
        epat="\"";     // delimiter that ends a file name

        list = new ArrayList();

        ploc=raw_html.IndexOf(patt, 0);
        while (ploc>=0) {
            sloc=raw_html.LastIndexOf(spat, ploc)+1;    // first character after the opening quote
            eloc=raw_html.IndexOf(epat, sloc)-1;        // last character before the closing quote
            file=raw_html.Substring(sloc, eloc-sloc+1);
            ploc=raw_html.IndexOf(patt, eloc);          // next ".gif" after this file name
            list.Add(file);
        }

        return list;
    }

    // Download each file in file_list (relative to base_url) into the destination folder.
    public void ImageListtoFiles(ArrayList file_list)
    {
        int i;
        string filename;
        FileStream fs;

        WebRequest req;
        WebResponse res;
        Stream str;

        int ch;

        for (i=0; i < file_list.Count; i++) {
            // Flatten any path in the name so every file lands directly in the folder.
            filename=Convert.ToString(file_list[i]);
            filename=filename.Replace("/", "_");
            filename=folder+"/"+filename;
            fs=new FileStream(filename, FileMode.Create);

            req = WebRequest.Create(base_url+file_list[i]);
            res = req.GetResponse();
            str = res.GetResponseStream();

            // Copy the response to the local file one byte at a time.
            while ((ch=str.ReadByte())!=-1)
                fs.WriteByte(Convert.ToByte(ch));

            str.Close();
            res.Close();
            fs.Close();
        }
    }

    // Prompt for the start URL and destination folder, then run the three steps.
    public static void Main()
    {
        string url, dir;
        vampirebot vbot;
        string rawHTML;
        ArrayList alist;

        Console.Write("Enter starting URL: ");
        url=Console.ReadLine();
        Console.Write("Destination folder? ");
        dir=Console.ReadLine();

        vbot = new vampirebot(url,dir);

        rawHTML = vbot.URLtoRawHTML(url);
        alist = vbot.RawHTMLtoImageList(rawHTML);
        vbot.ImageListtoFiles(alist);
    }
}

Remember that this code is just an archetype; you have to modify it for your specific task, which requires an understanding of how the code works. The next section examines the design of a vampire bot from a code-improvisation perspective. You'll find that the code contains many useful motifs—for string manipulations, file handling, and web streaming, to name a few—that you can use in a wide variety of other programs.
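As one illustration of reusing a motif, the string-scanning idea from RawHTMLtoImageList can be tried on its own, without any network access. The sketch below is only an assumption about how you might adapt it: the names MotifSketch, QuotedNames, and FlattenName are hypothetical and do not appear in Listing 1, the HTML fragment is invented for the demonstration, and a generic List<string> stands in for the listing's ArrayList. Unlike the listing, it stops scanning when no quote is found on either side of a match instead of letting Substring throw.

```csharp
using System;
using System.Collections.Generic;

static class MotifSketch
{
    // Collect every double-quoted token in html that contains ext,
    // scanning outward from each match as RawHTMLtoImageList does.
    public static List<string> QuotedNames(string html, string ext)
    {
        var names = new List<string>();
        int ploc = html.IndexOf(ext, StringComparison.Ordinal);
        while (ploc >= 0)
        {
            int open = html.LastIndexOf('"', ploc);   // nearest quote before the match
            int close = html.IndexOf('"', ploc);      // nearest quote after the match
            if (open < 0 || close < 0) break;         // a quote is missing on one side: stop
            names.Add(html.Substring(open + 1, close - open - 1));
            ploc = html.IndexOf(ext, close, StringComparison.Ordinal);
        }
        return names;
    }

    // Flatten a relative path into a single file name, as ImageListtoFiles does.
    public static string FlattenName(string relative)
    {
        return relative.Replace('/', '_');
    }

    public static void Main()
    {
        // Invented sample markup, not taken from the page in Figure 1.
        string html = "<img src=\"mars.gif\"><img src=\"pics/venus.gif\">";
        foreach (string name in QuotedNames(html, ".gif"))
            Console.WriteLine(name + " -> " + FlattenName(name));
    }
}
```

Run as written, the sketch prints mars.gif -> mars.gif followed by pics/venus.gif -> pics_venus.gif. Changing the ext argument (to ".jpg", say) is the kind of small modification the archetype is meant to invite.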
