Sunday, June 6. 2010
Here's the latest paper I've published in the Journal of Parallel and Distributed Computing (JPDC). The Wellcome Trust has paid for it to be made open access.
Andrew J. Page, Thomas M. Keane and Thomas J. Naughton, Multi-heuristic dynamic task allocation using genetic algorithms in a heterogeneous distributed system, Journal of Parallel and Distributed Computing, Volume 70, Issue 7, July 2010, 758-766.
Abstract
We present a multi-heuristic evolutionary task allocation algorithm to dynamically map tasks to processors in a heterogeneous distributed system. It utilizes a genetic algorithm, combined with eight common heuristics, in an effort to minimize the total execution time. It operates on batches of unmapped tasks and can preemptively remap tasks to processors. The algorithm has been implemented on a Java distributed system and evaluated with a set of six problems from the areas of bioinformatics, biomedical engineering, computer science and cryptography. Experiments using up to 150 heterogeneous processors show that the algorithm achieves better efficiency than other state-of-the-art heuristic algorithms.
Article on Science Direct
PDF
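To make the general idea in the abstract concrete, here is a toy sketch of genetic-algorithm task allocation with a heuristically seeded population. It is not the algorithm from the paper. It assumes each task's cost and each processor's speed are already known (the paper estimates these dynamically), uses makespan as the fitness to minimise, and seeds the first generation with two illustrative heuristics that are hypothetical stand-ins for the paper's eight. All function names are hypothetical.

```python
import random


def makespan(assignment, task_costs, proc_speeds):
    """Time at which the most heavily loaded processor finishes its tasks."""
    loads = [0.0] * len(proc_speeds)
    for task, proc in enumerate(assignment):
        loads[proc] += task_costs[task] / proc_speeds[proc]
    return max(loads)


def earliest_finish(task_costs, proc_speeds):
    """Heuristic seed: give each task to the processor that would finish it soonest."""
    loads = [0.0] * len(proc_speeds)
    assignment = []
    for cost in task_costs:
        best = min(range(len(proc_speeds)),
                   key=lambda p: loads[p] + cost / proc_speeds[p])
        loads[best] += cost / proc_speeds[best]
        assignment.append(best)
    return assignment


def fastest_only(task_costs, proc_speeds):
    """Heuristic seed: send every task to the single fastest processor."""
    fastest = max(range(len(proc_speeds)), key=lambda p: proc_speeds[p])
    return [fastest] * len(task_costs)


def genetic_allocate(task_costs, proc_speeds, pop_size=30, generations=200,
                     mutation_rate=0.1, seed=0):
    """Evolve a task->processor mapping that minimises makespan."""
    rng = random.Random(seed)
    n_tasks, n_procs = len(task_costs), len(proc_speeds)

    def fitness(assignment):
        return makespan(assignment, task_costs, proc_speeds)

    # Heuristic solutions go into the first generation; the rest is random.
    population = [earliest_finish(task_costs, proc_speeds),
                  fastest_only(task_costs, proc_speeds)]
    while len(population) < pop_size:
        population.append([rng.randrange(n_procs) for _ in range(n_tasks)])

    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[:pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_tasks) if n_tasks > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(n_tasks):  # mutation: randomly reassign some tasks
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(n_procs)
            children.append(child)
        population = parents + children

    return min(population, key=fitness)
```

Because the best half always survives, the returned mapping is never worse than the best heuristic seed; the GA can only refine what the heuristics start with.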
Tuesday, July 7. 2009
My son Sean was born on the 22nd of August.
Monday, July 6. 2009
I'm retrieving search engine count estimates for queries from Bing, Google and Yahoo, using each engine's API and getting back JSON-formatted results. You'd think it's easy, but think again. The problem is that each search engine uses a different format, with different interpretations of the JSON spec. For example, the total number of hits (the search engine count estimate) for a query is called "Total" by Bing, "estimatedResultCount" by Google and "deephits" by Yahoo. It gets worse: Bing returns an integer value (correct), while Yahoo and Google return a string containing an integer (not so correct). Hopefully there will one day be a single standard format for interacting with a search engine, but that's a very long way off.
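A minimal sketch of normalising these counts. The field names are the ones listed above; where each field sits inside a response is not given here, so rather than hard-coding paths, this sketch assumes the field name is unique enough to find by searching the parsed JSON. Coercing the value with int() absorbs the string-versus-number difference. The function names are hypothetical.

```python
import json

# Field holding the estimated hit count for each engine, as named in the post.
COUNT_FIELDS = {"bing": "Total", "google": "estimatedResultCount", "yahoo": "deephits"}


def find_key(node, key):
    """Depth-first search for `key` anywhere in parsed JSON; None if absent."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def estimated_count(engine, raw_json):
    """Return the engine's hit-count estimate as an int, whatever its JSON type."""
    field = COUNT_FIELDS[engine]
    value = find_key(json.loads(raw_json), field)
    if value is None:
        raise KeyError(f"no {field!r} field in {engine} response")
    # Bing sends a JSON number; Google and Yahoo send the number as a string.
    return int(value)
```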
Monday, June 29. 2009
Some observations on using the search APIs for the 3 major search engines, Google, Bing and Yahoo. By far the best I've found is Bing, followed by Yahoo, and lastly Google.
To perform a search for Ireland:
For Bing and Yahoo you need to sign up for an API key; it only takes 2 minutes. Google doesn't need an API key. All three can return JSON-formatted search results (and XML), but each uses a proprietary format. All of the search engines have removed limits on the number of queries you can submit. Bing and Yahoo don't place limits on the number of results that can be returned from a single query, but Google limits you to 64 results for a general search (other search types are more limited). Google's 'rsz' parameter can be small (4 results per request) or large (8 results per request). To retrieve more results from any of them, you apply an offset, which is the last parameter in each query. Overall, the search APIs have improved massively over the past year. All that's really missing is a unified search syntax and result set.
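With Google's 64-result cap, paging works out to 8 requests at rsz=large or 16 at rsz=small. A sketch of the offset arithmetic; the 'q' and 'start' parameter names are assumptions here (check the API documentation), no endpoint URL is included, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Results per request for Google's 'rsz' parameter, as described in the post.
RSZ_PAGE_SIZE = {"small": 4, "large": 8}
GOOGLE_MAX_RESULTS = 64  # cap on a general web search


def google_page_params(query, rsz="large", max_results=GOOGLE_MAX_RESULTS):
    """One query string per page; 'q' and 'start' are assumed parameter names."""
    size = RSZ_PAGE_SIZE[rsz]
    return [urlencode({"q": query, "rsz": rsz, "start": offset})
            for offset in range(0, max_results, size)]
```

For example, google_page_params("Ireland") yields eight query strings, with offsets 0, 8, ..., 56.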
Tuesday, June 16. 2009
Using data I collected from Reddit for RedditTrends, I found some interesting spikes in the number of votes submitted articles receive.

This graph shows the sum total of UP votes received per day by submitted links.

This graph shows the sum total of DOWN votes received per day by submitted links.
As you can see, the graphs are normally quite flat, but have huge spikes every now and again that are massively out of kilter with the normal everyday average.
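One simple way to pick out spikes like these programmatically is to compare each day against the median day, since, unlike the mean, the median isn't dragged upwards by the spikes themselves. This is a sketch under assumptions: daily totals keyed by ISO date strings, a threshold factor of 3 chosen arbitrarily, and hypothetical function names. It also shows the per-day score as UP minus DOWN votes.

```python
from statistics import median


def daily_score(up_votes, down_votes):
    """Per-day score = UP votes minus DOWN votes (days missing from down count as 0)."""
    return {day: up - down_votes.get(day, 0) for day, up in up_votes.items()}


def find_spikes(daily_totals, factor=3.0):
    """Days whose total exceeds `factor` times the median day, in date order."""
    typical = median(daily_totals.values())
    return sorted(day for day, total in daily_totals.items() if total > factor * typical)
```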

This graph shows the total score achieved, which should equal UP votes minus DOWN votes. On a positive note, Reddit users are twice as likely to UP vote a link as to DOWN vote it.

When this is compared to the total number of comments, it's clear that the spike around January is a natural one. It seems to correspond with Obama's election and inauguration.
The huge spike on 24th of April was effectively a revolt on Reddit over the number of duplicate stories. Dozens of links got vast numbers of UP and DOWN votes, mostly with the same title.
Sunday, May 3. 2009
As an extension to PredictReddit I've created RedditTrends.com. It's kind of like Google Trends, but for Reddit submissions. It's just an early work in progress, but it looks interesting and produces pretty graphs.
Pandemic
Obama
Mexico
Graphs are produced using flot (a jQuery plotting plugin). The backend runs on the Zend Framework (PHP) and MySQL.
Saturday, May 2. 2009
A huge number of links are submitted to Reddit, but few ever get enough votes to make it to the front page of the site, where most Reddit users will see them. A submission's success depends greatly on its title, but you only have one shot to get it right.
This is where PredictReddit comes in. It lets you test out your proposed title and gives you an estimate of the number of votes it is likely to get, so you can fine-tune it before you submit it to Reddit. It uses past submissions to predict future votes. This, of course, assumes that the Reddit community is interested in similar recurring topics (and it seems to be).
There can be some confusion over the results it gives back. For example, you might type in a title that you know got a high number of votes and get a low number of votes back. You might assume that PredictReddit is broken, but in reality, when a story is very successful, you often find numerous other submissions trying to piggyback on its success (and failing to get many votes). These pull the predicted number of votes down. It's best to play around with it yourself and try it out. And remember, it's just for fun.
It works by using a k-Nearest Neighbours algorithm. It was written in PHP using the Zend Framework and uses MySQL for data storage. Data is pulled from Reddit using its JSON interface.
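A minimal k-Nearest Neighbours sketch of the idea, written in Python rather than PHP and not PredictReddit's actual code. The post doesn't say how titles are compared, so this assumes word-set (Jaccard) similarity and predicts the average votes of the k most similar past titles; the function names and k = 3 default are hypothetical.

```python
import re


def words(title):
    """Lower-cased set of words in a title."""
    return set(re.findall(r"[a-z0-9']+", title.lower()))


def jaccard(a, b):
    """Overlap between two word sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_votes(title, history, k=3):
    """history: list of (past_title, votes). Mean votes of the k most similar titles."""
    if not history or k < 1:
        raise ValueError("need a non-empty history and k >= 1")
    target = words(title)
    ranked = sorted(history, key=lambda item: jaccard(target, words(item[0])),
                    reverse=True)
    nearest = ranked[:k]
    return sum(votes for _, votes in nearest) / len(nearest)
```

This also reproduces the piggybacking effect described above: given one very popular title and two low-scoring near-copies, asking about the popular title averages all three neighbours, so the prediction comes out well below the original's vote count.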
Thursday, April 16. 2009
For those of you who are interested, here's my PhD thesis:
Adaptive Scheduling in Heterogeneous Distributed Computing Systems.
Abstract
The main focus of this research is in the area of adaptive scheduling for heterogeneous distributed systems. Given an unreliable, non-dedicated set of processing and communication resources, a scheduler is required to allocate tasks to processors. No information about the state of the system, which can vary over time, or the tasks to be processed, is known in advance and thus must be estimated dynamically. Current schedulers do not adequately address this dynamism. To address this, a property estimation method is presented, which utilizes a k-Nearest Neighbours algorithm, a smoothed average and an analytical benchmark. These estimated properties are then used by two different scheduling techniques, which make less restrictive assumptions than the current state-of-the-art methods. A multi-heuristic evolutionary method utilizes a genetic algorithm and eight simple heuristics to efficiently allocate tasks to processors. A deterministic method utilizes the error inherent in estimating the properties of the system and the execution time of tasks, to allocate tasks to processors. The algorithms have been implemented on a real-world heterogeneous distributed system with up to 150 processors. A set of real-world problems from the areas of cryptography, bioinformatics, and biomedical engineering were used as a test set to measure the effectiveness of the scheduling algorithms. Experiments have shown that both methods achieve better efficiency than other state-of-the-art heuristic algorithms. Finally, a low memory distributed reconstruction application for large digital holograms is presented, which has significantly increased the size of holograms that can be reconstructed, over the previous state-of-the-art.
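The abstract lists a smoothed average as one of the property estimators without giving its form. As an illustration only, this sketch assumes exponential smoothing, where each new measurement (for example, a task's observed run time on a processor) moves the estimate part of the way towards it. The class name and the alpha default are hypothetical.

```python
class SmoothedEstimate:
    """Exponentially smoothed running estimate of a measured system property."""

    def __init__(self, alpha=0.3):
        if not 0 < alpha <= 1:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha  # weight given to the newest observation
        self.value = None

    def update(self, observation):
        """Fold in one new measurement and return the updated estimate."""
        if self.value is None:
            self.value = observation  # the first sample seeds the estimate
        else:
            self.value = self.alpha * observation + (1 - self.alpha) * self.value
        return self.value
```

With alpha = 0.5, feeding in measurements of 10, 20 and 20 gives estimates of 10, 15 and 17.5: recent behaviour dominates, but a single outlier cannot swing the estimate completely.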
Thursday, October 30. 2008
Journal Publications
- Lukas Ahrenberg, Andrew J. Page, Bryan M. Hennelly, John B. McDonald, and Thomas J. Naughton, Using commodity graphics hardware for real-time digital hologram view-reconstruction, Journal of Display Technology, vol. 5, no. 1, 2009.
- Andrew J. Page, Thomas M. Keane and Thomas J. Naughton, Scheduling in a dynamic heterogeneous distributed system using estimation error, Journal of Parallel and Distributed Computing, Volume 68, Issue 11, November 2008, 1452-1462.
- Andrew J. Page, Lukas Ahrenberg, Thomas J. Naughton, Low memory distributed reconstruction of large digital holograms, Optics Express 16, 1990-1995 (2008).
- Thomas M. Keane, Andrew J. Page, Thomas J. Naughton, S.A.A. Travers, J.O. McInerney, Building large phylogenetic trees on coarse-grained parallel machines, Algorithmica, Springer, vol. 45, no. 3, pp. 285-300, July 2006.
- Andrew J. Page, Thomas J. Naughton, Framework for task scheduling in heterogeneous distributed computing using genetic algorithms, Artificial Intelligence Review, Volume 24, Numbers 3-4, November 2005, Pages 415-429, Springer.
Peer Reviewed Conference Papers
- Andrew J. Page, Shirley Coyle, Thomas M. Keane, Thomas J. Naughton, Charles Markham and Tomas Ward, Distributed Monte Carlo Simulation of Light Transportation in Tissue, 8th International Workshop on Java for Parallel and Distributed Computing, proceedings of the 20th International Parallel & Distributed Processing Symposium, Rhodes, Greece, April 2006. IEEE Computer Society.
- Thomas M. Keane, Andrew J. Page, James O. McInerney, Thomas J. Naughton, A high-throughput bioinformatics distributed computing platform, Bioinformatics and its Medical Applications Special Track, The 18th IEEE International Symposium on Computer-Based Medical Systems, pp. 377-382, Dublin, Ireland, June 2005.
- Andrew J. Page, Thomas J. Naughton, Dynamic task scheduling using genetic algorithms for heterogeneous distributed computing, 8th International Workshop on Nature Inspired Distributed Computing, proceedings of the 19th International Parallel & Distributed Processing Symposium, pp. 189a.1-189a.8, Denver, Colorado, USA, April 2005. IEEE Computer Society.
- Andrew J. Page, Thomas M. Keane, Thomas J. Naughton, Bioinformatics on a Heterogeneous Java Distributed System, 7th International Workshop on Java for Parallel and Distributed Computing, proceedings of the 19th International Parallel & Distributed Processing Symposium, pp. 184a.1-184a.4, Denver, Colorado, USA, April 2005. IEEE Computer Society.
- Andrew J. Page, Thomas J. Naughton, Framework for task scheduling in heterogeneous distributed computing using genetic algorithms, 15th Artificial Intelligence and Cognitive Science Conference, eds. Lorraine McGinty and Brian Crean, pp. 137-146, September 8-10, 2004, Castlebar, Ireland. ISBN 1-902277-89-9.
- Andrew J. Page, Thomas Keane, Thomas J. Naughton, Adaptive Scheduling Across a Distributed Computation Platform, Third International Symposium on Parallel and Distributed Computing, ed. John P. Morrison, pp. 141-149, July 2004, Cork, Ireland. ISBN 0-7695-2210-6, IEEE Computer Society.
- Andrew Page, Thomas Keane, Richard Allen, Thomas J. Naughton, John Waldron, Multi-tiered distributed computing platform, 2nd International Conference on the Principles and Practice of Programming in Java, pp. 191-194, Kilkenny City, Ireland, June 2003. ISBN 0-9544145-1-9.
- Thomas M. Keane, Andrew Page, Thomas J. Naughton, Simon A.A. Travers, James O. McInerney, Grace P. McCormack, Heterogeneous distributed computing, IFIP Working Group 8.6 Conference on IT Innovation for Adaptability and Competitiveness, Leixlip, Ireland, 30 May - 2 June 2004.
Wednesday, July 9. 2008
I've updated Baruch's LaTeX thesis template. Enjoy.
Tuesday, May 20. 2008
Opera Mini - Easy to use web browser. It should be the first thing you install.
Fring - Universal instant messenger and VoIP application. It works with MSN, Gtalk, Yahoo, Skype, Twitter, etc. It's very easy to use, and allows you to stay connected all the time. This alone is a killer app.
Gmail - If you use Gmail, get the mobile application. It's fast and neat. You can't send attachments, however.
Google Maps - Get satellite photos and maps on your phone. Can pinpoint your approximate location (to the nearest cell tower).
Train timetables - Stripped down interface to the Irish Rail website, without all the bloat.
Wednesday, November 14. 2007
The logical operator for "not equal" in MATLAB is ~= rather than !=.
So logical NOT is ~ in MATLAB, compared to ! in Java, C, Perl, etc.
Tuesday, November 13. 2007
Bigulo has performed more than 5 million searches since September 2006 (14 months). It's still going strong, with a consistently high level of traffic (and a very low bounce rate). On average, each visitor views 5 pages per visit.
Wednesday, October 24. 2007
I googled for something from Politics.ie today and came across a Chinese domain leecher. They had set up http://poitics-ie.hostsoft.us (now blocked) and were pointing its DNS at the Politics.ie server. They were thus leeching off our PageRank (and content), and succeeded in getting 1200 URLs into the Google index (with our content). They then either sell the domain (with a temporarily high PageRank) or replace the pages with ads and steal some of our referrals. Anyone visiting that domain now gets a 403 error.
Thursday, September 27. 2007
We went for a hike last weekend. It's great what can be done with GPS.