Monday, July 28, 2014

Why is adding items to the List<List<string>> so slow, and why are the items added twice?

I wrote a class called ListsExtraction.


At the top of the class:



public static List<List<string>> responsers = new List<List<string>>();



In the class I have a method called Links:



public List<string> Links(string FileName)
{
    List<string> links = new List<string>();
    List<string> allLinks = new List<string>();
    Filters.FilteredLinks = new List<string>();
    HtmlDocument doc = new HtmlDocument();
    doc.Load(FileName);
    // Collect every href that starts with the expected prefix.
    foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//a[@href]"))
    {
        HtmlAttribute att = link.Attributes["href"];
        if (att.Value.StartsWith("http://ift.tt/1qmZBGw"))
        {
            links.Add(att.Value);
        }
    }

    // Cut each link at the '#' and process the resulting page address.
    for (int i = 0; i < links.Count; i++)
    {
        int f = links[i].IndexOf("#");
        string test = links[i].Substring(0, f);

        allLinks.Add(test);
        GetResponsersFN(test); // the returned list is not used here
    }
    return allLinks;
}

Inside Links I call a method named GetResponsersFN, which is defined at the bottom of this class:



public List<string> GetResponsersFN(string filename)
{
    string str = "";
    using (WebClient client = new WebClient())
    {
        client.Headers.Add(HttpRequestHeader.ContentType, "charset=windows-1255");
        str = client.DownloadString(filename);
    }
    return GetResponsers(str);
}

Then the method that parses the downloaded content:



public List<string> GetResponsers(string contents)
{
    string responser = "";
    List<string> threadList = new List<string>();
    int f = 0;
    int startPos = 0;
    while (true)
    {
        string firstTag = "<FONT CLASS='text16b'>";
        string lastTag = "&n";
        f = contents.IndexOf(firstTag, startPos);
        if (f == -1)
        {
            break;
        }
        int g = contents.IndexOf(lastTag, f);
        startPos = g + lastTag.Length;
        // 22 is the length of firstTag.
        responser = contents.Substring(f + 22, g - f - 22);
        threadList.Add(responser);
    }
    SortList(threadList);
    return threadList;
}

Finally, a method that sorts the list:



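public List<string> SortList(List<string> thread)
{
    // Order by the leading number of each entry; entries without one go last.
    thread = thread
        .OrderBy(str =>
        {
            var match = Regex.Match(str, @"^([-+]?\d+)");
            return match.Success ? int.Parse(match.Groups[1].Value) : int.MaxValue;
        })
        .ToList();
    // Append a copy of the sorted list to the static responsers list.
    responsers.Add(new List<string>(thread));
    return thread;
}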



There are at least two problems that I know about:




  1. Building the responsers list is very slow. When I run my program, it first calls the Links method, which then calls the other methods, and it takes almost 10 seconds until responsers is populated.




  2. I call the Links method from the Form1 timer tick event every 10 seconds, so maybe this is why responsers has 100 items instead of 50? Using a breakpoint, I see each item once at indexes 0 to 49, and then the same items again from index 50 onward. Somewhere, for some reason, the items are added to responsers twice.




Using a breakpoint to look into the slowness, I saw that when Links calls GetResponsersFN, it first runs all the iterations for the first link in test, and only then moves on to the next link and passes test to GetResponsersFN again.


The links variable in the Links method holds 50 links.


I guess it takes a long time because GetResponsersFN processes every single link, one after another.


As a test, I also tried removing this line:



GetResponsersFN(test);



Once I removed the line and ran the program, everything ran fast.


So is there any way to make the whole GetResponsersFN operation faster?
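If the page downloads turn out to be where the time goes, one option is to start them all at once instead of one after another. The following is only a minimal sketch under that assumption. The names ParallelFetch, DownloadAllAsync and DownloadPageAsync are hypothetical and not part of ListsExtraction, and the windows-1255 encoding setup is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading.Tasks;

public static class ParallelFetch
{
    // Starts the download function for every URL without waiting in between,
    // then waits for all of them. Results keep the order of the input URLs.
    public static async Task<List<string>> DownloadAllAsync(
        IEnumerable<string> urls, Func<string, Task<string>> download)
    {
        Task<string>[] pending = urls.Select(download).ToArray();
        string[] pages = await Task.WhenAll(pending);
        return pages.ToList();
    }

    // Hypothetical download function: one WebClient per request, because a
    // single WebClient instance cannot run several requests at the same time.
    public static async Task<string> DownloadPageAsync(string url)
    {
        using (var client = new WebClient())
        {
            return await client.DownloadStringTaskAsync(url);
        }
    }
}
```

A call such as `await ParallelFetch.DownloadAllAsync(allLinks, ParallelFetch.DownloadPageAsync)` would return the 50 pages in link order, ready to be passed to GetResponsers one by one. On the .NET Framework the number of simultaneous connections per host is limited by ServicePointManager.DefaultConnectionLimit, so the actual speed-up may be smaller than the number of links suggests.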


I also checked and saw that the method with the WebClient finishes quickly, and the sorting method is fast too.


The problem is with the GetResponsers method, which runs for every single link and takes a lot of time.


Maybe there is a better way to write the GetResponsers method?
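One possible variation of the extraction loop is sketched below, purely as an illustration. It assumes the same two markers as GetResponsers, searches with StringComparison.Ordinal (string.IndexOf with a string argument is culture-sensitive by default, which tends to be slower on large pages), and stops instead of throwing when a closing marker is missing. ResponserParser and Extract are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class ResponserParser
{
    // Returns every piece of text found between firstTag and the next lastTag.
    public static List<string> Extract(string contents, string firstTag, string lastTag)
    {
        var found = new List<string>();
        int pos = 0;
        while (true)
        {
            int open = contents.IndexOf(firstTag, pos, StringComparison.Ordinal);
            if (open == -1)
            {
                break; // no more entries
            }
            int start = open + firstTag.Length;
            int close = contents.IndexOf(lastTag, start, StringComparison.Ordinal);
            if (close == -1)
            {
                break; // unterminated entry: stop rather than call Substring with a negative length
            }
            found.Add(contents.Substring(start, close - start));
            pos = close + lastTag.Length;
        }
        return found;
    }
}
```

With the markers from the question this would be called as `ResponserParser.Extract(contents, "<FONT CLASS='text16b'>", "&n")`. Whether it is noticeably faster depends on the page size, so timing both versions with a Stopwatch would be the way to confirm it.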


As for why everything is added twice, I'm not sure.


Maybe it's added twice because in GetResponsers I add to threadList once, and then in SortList I call responsers.Add?
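threadList and responsers are separate lists, so adding to both would not by itself seem to duplicate anything. The timer is a more likely suspect: responsers is static and nothing in the code shown ever clears it, so each tick that runs Links would append another 50 lists. A small sketch of that effect follows, with hypothetical names (ResponsersStore, RunOnce) standing in for the real class and for one run of Links.

```csharp
using System;
using System.Collections.Generic;

public static class ResponsersStore
{
    // Stand-in for the static responsers field in ListsExtraction.
    public static List<List<string>> responsers = new List<List<string>>();

    // Hypothetical stand-in for one run of Links: appends one list per link.
    // When clearFirst is true, the results of earlier runs are dropped first.
    public static void RunOnce(int linkCount, bool clearFirst)
    {
        if (clearFirst)
        {
            responsers.Clear();
        }
        for (int i = 0; i < linkCount; i++)
        {
            responsers.Add(new List<string> { "thread " + i });
        }
    }
}
```

Two runs with clearFirst set to false leave 100 lists in responsers, while a run with clearFirst set to true leaves 50. If that matches what the breakpoint shows, clearing responsers at the start of Links (or building a new list per run) may be enough.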



