Possible enhancements:
Search engines have a linkto: facility,
so you can see who links to you,
but it takes forever to browse the list
so you don't bother.
This would be a
standalone program to
find all pages that link to the user or reference them,
download all these pages (will take a long time),
perhaps sort them by the topic or page referenced,
and present them all in a nice readable output,
with all the references highlighted (as in Google's cache).
How to highlight a phrase using
the font tag,
the bold tag,
or tables:
View Source on each example
to see how this is done (link works on Firefox).
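A minimal sketch of the bold-tag approach to highlighting, in Python. It assumes the downloaded page has already been reduced to plain text; the function name highlight is made up for illustration, and matching is case-insensitive by assumption.

```python
import html
import re


def highlight(text: str, phrase: str) -> str:
    """Return HTML-escaped text with each occurrence of phrase wrapped in <b>."""
    if not phrase:
        return html.escape(text)
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches so it cannot break the HTML.
        parts.append(html.escape(text[last:match.start()]))
        parts.append("<b>" + html.escape(match.group(0)) + "</b>")
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(highlight("This page links to Example Site and example site again.",
                    "example site"))
```

Highlighting inside real HTML (rather than plain text) would need an HTML parser so that tags and attributes are not altered by accident.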
Extract keywords from the page. (How? Needs some idea of dictionary frequency, i.e. how common each word is in ordinary text.)
Use search engines to find similar pages on Web.
Implement as CGI script so that I can automatically add a "What is like this?" link at the top of every URL.
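One possible reading of "dictionary frequency" is to score each word by how much more often it appears on the page than in ordinary text. The sketch below assumes this; the background table, its numbers, and the default rate are all invented for illustration and would need to come from a real corpus.

```python
import re
from collections import Counter

# Hypothetical background frequencies (occurrences per million words).
# These numbers are invented; a real table would come from a large corpus.
BACKGROUND_PER_MILLION = {
    "the": 60000.0,
    "of": 30000.0,
    "and": 28000.0,
    "family": 300.0,
    "tree": 80.0,
}
DEFAULT_PER_MILLION = 1.0  # assumed rate for words missing from the table


def keywords(text, top_n=5):
    """Return the top_n words that are most over-represented on the page."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return []
    counts = Counter(words)
    total = len(words)

    def score(word):
        page_rate = counts[word] / total * 1_000_000
        return page_rate / BACKGROUND_PER_MILLION.get(word, DEFAULT_PER_MILLION)

    # Highest score first; ties broken alphabetically for a stable result.
    return sorted(counts, key=lambda w: (-score(w), w))[:top_n]


if __name__ == "__main__":
    print(keywords("the gedcom of the tree and the gedcom", top_n=2))
```

The resulting keywords could then be sent as a query to a search engine to find similar pages.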
CGI script. Input is a web page.
Extracts all proper nouns
(perhaps words that are not in the UNIX dictionary
- see "man spell"
and /usr/share/lib/dict/).
For all these proper nouns, link them to a search engine
so you can just click on the word
to search for it on the web.
Output is this new web page with all these words linked.
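A rough sketch of the linking step, assuming plain-text input and a word list already loaded into memory (for example from the UNIX dictionary file, whose location varies between systems). SEARCH_URL is a placeholder, not a real search engine, and the function name is hypothetical.

```python
import html
import re
from urllib.parse import quote_plus

# Placeholder address; substitute the query URL of a real search engine.
SEARCH_URL = "https://search.example.com/?q="


def link_unknown_words(text, dictionary):
    """Wrap every word not found in dictionary in a search-engine link."""
    known = {word.lower() for word in dictionary}
    out = []
    last = 0
    for match in re.finditer(r"[A-Za-z]+", text):
        # Escape whatever sits between words (spaces, punctuation).
        out.append(html.escape(text[last:match.start()]))
        word = match.group(0)
        if word.lower() in known:
            out.append(word)
        else:
            out.append('<a href="%s%s">%s</a>'
                       % (SEARCH_URL, quote_plus(word), word))
        last = match.end()
    out.append(html.escape(text[last:]))
    return "".join(out)


if __name__ == "__main__":
    print(link_unknown_words("We visit Dublin today", {"we", "visit", "today"}))
```

Note that this treats any word missing from the dictionary as a candidate proper noun, so misspellings get linked too.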
Enhancement - Perhaps check first to see if you can find a
Yahoo category
for the word.
If so, link to it.
If not, link to a search engine.
Perhaps the CGI script just functions as a spell-checker.
The basic enhancements to this and the other CGI scripts below are that images should still display from the original page, and that when a link is followed, the new page should also be put through the CGI script.
CGI script. Input is web page.
Tries to extract all the locations on the page
and link them to some online map.
Output is this new web page with all these words linked.
Difficult bit is identifying the place names
(as opposed to just any proper nouns).
Perhaps can identify Irish and UK locations
by the presence of a trailing "Co. ---" (one of the 32 counties)
or "---shire" (one of the UK counties).
Enhancement might be to actually include the map on the page
(make sure no copyright infringement!).
Alternatively, link to a choice
of online maps.
Also see basic CGI enhancements above.
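The "Co." and "shire" heuristic could be sketched with two regular expressions, as below. The county set is a small assumed subset for illustration (the full list would hold all 32), and the function name is hypothetical.

```python
import re

# Assumed subset for illustration; the full set would list all 32 counties.
IRISH_COUNTIES = {"Cork", "Dublin", "Galway", "Kerry"}

COUNTY_RE = re.compile(r"\bCo\.\s*([A-Z][a-z]+)")
SHIRE_RE = re.compile(r"\b[A-Z][a-z]+shire\b")


def find_places(text):
    """Return 'Co. X' matches (known counties only), then '...shire' names."""
    places = []
    for match in COUNTY_RE.finditer(text):
        # Checking against the county list filters out e.g. "Co. Limited".
        if match.group(1) in IRISH_COUNTIES:
            places.append("Co. " + match.group(1))
    places.extend(SHIRE_RE.findall(text))
    return places


if __name__ == "__main__":
    print(find_places("Born in Tralee, Co. Kerry, and died in Yorkshire."))
```

This only finds the county itself; picking up the town name in front of it (and telling it apart from other proper nouns) remains the difficult bit.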
CGI script, as above. Identify all dates on the page,
in any format, and, using the "cal" program,
convert them all to a format showing the day of the week,
like: "Sun 1st Nov 1818".
Enhancement - Link each month to a second CGI script
which displays a calendar for that month.
Also see basic CGI enhancements above.
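A sketch of the date conversion. It uses Python's datetime module in place of the "cal" program, and assumes the date string has already been picked out of the page. Only a few input formats are tried, and the day-first reading of "01/11/1818" is an assumption. Note that datetime applies the Gregorian calendar to all years, so very old dates may differ from what "cal" shows.

```python
from datetime import datetime

# Input formats tried in order; this list is an assumption, not exhaustive.
FORMATS = ("%d %b %Y", "%d %B %Y", "%Y-%m-%d", "%d/%m/%Y")

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")
MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def ordinal(day):
    """Return the day number with its English suffix, e.g. 1 -> '1st'."""
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{day}{suffix}"


def parse_date(text):
    """Try each known format; return a date, or None if nothing matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def with_weekday(text):
    """Rewrite a date string as e.g. 'Sun 1st Nov 1818'; leave others alone."""
    d = parse_date(text)
    if d is None:
        return text
    return f"{DAYS[d.weekday()]} {ordinal(d.day)} {MONTHS[d.month - 1]} {d.year}"


if __name__ == "__main__":
    print(with_weekday("1 Nov 1818"))
```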
For all of these CGI scripts, I have many web pages with proper nouns, locations and dates, that could be used as test cases.
Takes family trees which are in a structured HTML format (e.g. GEDCOM 2 HTML), and tries to match up fragments of them with other family trees in structured HTML format on the Web, looking for overlaps.
Start with matching surname lists. Then look for overlaps around each individual.
Similar to "What is like this?" above.
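The first step, matching surname lists, might be sketched as a set comparison. The score used here (shared surnames divided by all distinct surnames, i.e. Jaccard similarity) is my assumption of a reasonable measure, and the surnames below are placeholders.

```python
def surname_overlap(surnames_a, surnames_b):
    """Return (score, shared) for two surname lists.

    score is the Jaccard similarity: shared surnames / all distinct surnames.
    Comparison ignores case and surrounding whitespace.
    """
    set_a = {s.strip().lower() for s in surnames_a if s.strip()}
    set_b = {s.strip().lower() for s in surnames_b if s.strip()}
    if not set_a or not set_b:
        return 0.0, set()
    shared = set_a & set_b
    return len(shared) / len(set_a | set_b), shared


if __name__ == "__main__":
    score, shared = surname_overlap(["Smith", "Jones", "Murphy"],
                                    ["murphy", "Kelly"])
    print(score, sorted(shared))
```

Trees whose surname lists score above some threshold would then be compared individual by individual.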
The standard format for computerised family trees
is the
GEDCOM format.
Historically, the standard format for paper family trees
has been the Burke's Peerage narrative format.
The aim of the project is to write a
converter between the two.
The converter would take as input a family tree
in GEDCOM format (there are many sample GEDCOM trees on the Web)
and output the information in the
Burke's narrative format in HTML (which is illustrated on my own Web pages).
One of the main challenges for the software
would be to automatically detect where to break
the narrative and start a new narrative, something Burke's
(and I) currently do by hand.
The result would be a more flexible output than the databases provide.
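To give a feel for the input side, here is a very reduced sketch that reads a few GEDCOM record types (INDI with NAME, FAM with HUSB, WIFE, CHIL) and prints an indented outline of descendants. The outline is only a stand-in, not the actual Burke's layout, and the function names and sample data are invented. It ignores dates, multiple marriages per line, and the question of where to break the narrative.

```python
SAMPLE = """0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
0 @I2@ INDI
1 NAME Mary /Jones/
0 @I3@ INDI
1 NAME Ann /Smith/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR"""


def parse_gedcom(text):
    """Return (names, families) from a GEDCOM string; other tags are ignored."""
    names = {}     # individual xref -> display name
    families = {}  # family xref -> {"HUSB": xref, "WIFE": xref, "CHIL": [xrefs]}
    current = None
    for line in text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            # Records look like "0 @I1@ INDI"; HEAD and TRLR have no xref.
            if len(parts) == 3 and parts[1].startswith("@"):
                current = (parts[1], parts[2])
                if parts[2] == "FAM":
                    families[parts[1]] = {"HUSB": None, "WIFE": None, "CHIL": []}
            else:
                current = None
        elif parts[0] == "1" and current:
            xref, kind = current
            tag = parts[1]
            value = parts[2] if len(parts) == 3 else ""
            if kind == "INDI" and tag == "NAME":
                names[xref] = value.replace("/", "").strip()
            elif kind == "FAM" and tag in ("HUSB", "WIFE"):
                families[xref][tag] = value
            elif kind == "FAM" and tag == "CHIL":
                families[xref]["CHIL"].append(value)
    return names, families


def narrative(person, names, families, depth=0):
    """Return outline lines for person and descendants, indented by generation."""
    lines = ["  " * depth + names.get(person, "?")]
    for fam in families.values():
        if person in (fam["HUSB"], fam["WIFE"]):
            spouse = fam["WIFE"] if fam["HUSB"] == person else fam["HUSB"]
            if spouse:
                lines[0] += ", mar " + names.get(spouse, "?")
            for child in fam["CHIL"]:
                lines.extend(narrative(child, names, families, depth + 1))
    return lines


if __name__ == "__main__":
    names, families = parse_gedcom(SAMPLE)
    print("\n".join(narrative("@I1@", names, families)))
```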
Perhaps to be done in cooperation with Tompsett at Hull. The converter would be debugged offline on separate data. The end product is a script that Tompsett could add to his site, so that all of his data can be viewed in condensed Hypertext Narrative format.