Contact Information
    Office:  231 Rush Building
    Tel(O): (215)895-6360
    Tel(C): (215)840-6193
 

Davis's Software Factory

For Academia

Semantic Profiling of Human Tags@CiteULike (2007/12)
Social bookmarking has become a popular service on the Internet. The co-occurrence of a human-assigned tag and the text it annotates gives us a way to explore the meaning of the tag. Such semantic knowledge is useful for understanding users' information needs in IR. Here is a demo based on the CiteULike dataset from Nov. 2004 to Feb. 2007.
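To make the co-occurrence idea concrete, here is a minimal Python sketch. It is not the code behind the demo: the function names, the whitespace tokenizer, and the (text, tags) input format are all assumptions made for illustration. Each tag is profiled by counting the words of the texts it annotates.

```python
from collections import Counter, defaultdict


def build_tag_profiles(bookmarks):
    """Map each tag to a Counter of the words in the texts it annotates.

    bookmarks: iterable of (text, tags) pairs (hypothetical input format).
    """
    profiles = defaultdict(Counter)
    for text, tags in bookmarks:
        words = text.lower().split()  # naive tokenizer, for illustration only
        for tag in tags:
            profiles[tag].update(words)
    return profiles


def top_words(profiles, tag, k=3):
    """Return the k words that co-occur most often with the given tag."""
    return [word for word, _ in profiles[tag].most_common(k)]
```

A real profile would likely remove stop words and weight words against their background frequency; raw counts are used here only to keep the sketch short.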
Web-based Factoid Question Answering (2007/10)
The current version can answer entity, number, and abbreviation-related factoid questions (description and definition questions are not supported yet). We developed a novel scoring algorithm that takes into account the context of the candidate answers and significantly improves accuracy. In general, a question that begins with a question word (e.g., how, when, who, where, what, or which) yields higher accuracy, but this is not required.
Collective Wisdom based Entity Classification (2007/09)
Assigning a semantic category to an entity (concept) is a challenging problem for machines. Traditional approaches extract features from either surface forms or local contexts (surrounding text) and then apply machine learning methods or human-coded rules for entity classification. Such classifiers usually require a large number of training examples, domain-specific tuning, and even human-created ontologies (dictionaries). Instead, this tool uses the wisdom of crowds for entity classification. It builds the semantic context for each entity through web search engines such as Google. The top-ranked documents returned by a search engine give a sense of what people think of this entity! The new approach is simple, robust, and powerful. No tuning, no external dictionaries, applicable to any domain, and most importantly, good accuracy! Click here to see a demo.
The Dragon Pinyin [龙拼] (2007/04)
A smart pinyin tool for Chinese text input. It features high accuracy on long Chinese text input and flexible, personalized startup training from one's personal documents such as emails, chat logs, blogs, and other written essays. Click here (or this address) to see the online demo.
The Dragon Toolkit (2006/07)
The Dragon Toolkit is a Java-based development package for academic use in language modeling (LM), information retrieval (IR), and text mining (TM, including text classification, text clustering, text summarization, and topic modeling). Language modeling has recently emerged as an attractive new framework for text information retrieval and text mining. However, most free Java-based search engines such as Lucene do not support LM very well. The Lemur toolkit is designed for LM and IR but is written in C and C++, which may be a hindrance to people who prefer Java programming. The Dragon Toolkit is tailored for researchers who work on large-scale LM, IR, and TM and prefer Java programming. Moreover, unlike Lucene and Lemur, it provides built-in support for semantic-based IR and TM. The Dragon Toolkit seamlessly integrates a set of NLP tools, which enable the toolkit to index text collections with various representation schemes including words, phrases, ontology-based concepts, and relationships. However, to minimize learning time, we intentionally keep the package small and simple. The toolkit lacks some features, such as distributed IR and cross-language IR, that are part of the Lemur toolkit.
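For readers new to the language modeling framework for retrieval, the following Python sketch shows one common formulation: query-likelihood ranking with Dirichlet-smoothed unigram models. It is only an illustration and not the Dragon Toolkit's API. The function names, the token-list document format, and the default value of the smoothing parameter mu are assumptions.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, coll_counts, coll_len, mu=2000.0):
    """Return log P(query | doc) under a Dirichlet-smoothed unigram model."""
    doc_counts = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = coll_counts.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # term never occurs in the collection: ignore it
        # Smoothed estimate: document count backed off to the collection model.
        p = (doc_counts[term] + mu * p_coll) / (len(doc) + mu)
        score += math.log(p)
    return score


def rank(query, docs, mu=2000.0):
    """Return document indices sorted from most to least likely."""
    coll_counts = Counter(term for doc in docs for term in doc)
    coll_len = sum(coll_counts.values())
    scored = [
        (query_log_likelihood(query, doc, coll_counts, coll_len, mu), i)
        for i, doc in enumerate(docs)
    ]
    return [i for _, i in sorted(scored, reverse=True)]
```

Each document is scored by how likely its smoothed language model is to generate the query; smoothing with the collection model keeps a document from scoring zero just because it lacks one query term.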

Ontology-based Biomedical Text Annotation (2006/04)

Dictionary-based biological concept extraction is still the state-of-the-art approach to large-scale biomedical literature annotation and indexing. Exact dictionary lookup is a very simple approach, but it usually achieves low extraction recall because a biological term often has many variants and no dictionary can collect all of them. We propose a generic extraction approach, referred to as approximate dictionary lookup, to cope with term variations and implement it as an extraction system called MaxMatcher. The basic idea of this approach is to capture the significant words of a particular concept rather than all of its words. The new approach dramatically improves extraction recall while maintaining precision.
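The idea of matching on significant words can be sketched in Python as follows. This is a simplified illustration and not MaxMatcher's actual scoring. The weighting (a word counts more when fewer concepts contain it), the threshold, the bag-of-words matching over the whole text, and all names are assumptions.

```python
from collections import defaultdict


def build_index(dictionary):
    """Map each word to the set of concept ids whose variants contain it.

    dictionary: concept id -> list of term variants (hypothetical format).
    """
    concepts_with_word = defaultdict(set)
    for cid, variants in dictionary.items():
        for variant in variants:
            for word in variant.lower().split():
                concepts_with_word[word].add(cid)
    return concepts_with_word


def word_significance(word, variant_words, concepts_with_word):
    """Weight of a word within one variant; words rare across concepts weigh more."""
    inv = {w: 1.0 / len(concepts_with_word[w]) for w in variant_words}
    return inv[word] / sum(inv.values())


def match_concepts(text, dictionary, threshold=0.7):
    """Return {concept id: score} for concepts whose significant words occur in text."""
    index = build_index(dictionary)
    text_words = set(text.lower().split())
    matches = {}
    for cid, variants in dictionary.items():
        best = 0.0
        for variant in variants:
            vwords = set(variant.lower().split())
            score = sum(
                word_significance(w, vwords, index)
                for w in vwords
                if w in text_words
            )
            best = max(best, score)
        if best >= threshold:
            matches[cid] = best
    return matches
```

With this weighting, a text that mentions a distinctive word of a term can still match the concept when a generic word such as "protein" is missing, whereas the generic word alone is not enough. A practical extractor would also restrict matching to a local window of the text rather than the whole passage.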
Queuing System Component (2001/11)
This component, written in Visual Basic, is middleware for developing queuing system simulation applications. It consists of five objects: event, customer, queue, distribution, and statistics. Because of its flexibility and simplicity, it has been chosen by a research team at Shanghai Maritime University to design software for simulating the traffic of Shanghai Container Harbor, and by another research team at BaoSteel, one of the largest steel companies in China. You can now download the middleware for free. The package includes the component, a help file written in Chinese, and sample projects. Download the package.
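To show how the five objects named above can fit together, here is a small event-driven simulation of a single-server FIFO queue in Python. It is not a port of the Visual Basic component; the class and function names and the single-server setup are assumptions chosen for illustration.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field


@dataclass(order=True)
class Event:
    time: float
    kind: str = field(compare=False)  # "arrival" or "departure"


@dataclass
class Customer:
    arrival_time: float


class Statistics:
    """Collects the waiting time of every customer that starts service."""

    def __init__(self):
        self.waits = []

    def record_wait(self, wait):
        self.waits.append(wait)

    def mean_wait(self):
        return sum(self.waits) / len(self.waits) if self.waits else 0.0


def exponential(rate, rng):
    """Distribution: return a sampler of exponential variates (rng is a random.Random)."""
    return lambda: rng.expovariate(rate)


def simulate(interarrival, service, n_customers):
    """Run a single-server FIFO queue until n_customers have been served."""
    events = [Event(interarrival(), "arrival")]  # event list kept as a heap
    queue = deque()
    stats = Statistics()
    busy = False
    arrived = 0
    while events:
        ev = heapq.heappop(events)
        if ev.kind == "arrival":
            arrived += 1
            queue.append(Customer(ev.time))
            if arrived < n_customers:
                heapq.heappush(events, Event(ev.time + interarrival(), "arrival"))
        else:
            busy = False  # the server has just finished a customer
        if not busy and queue:
            customer = queue.popleft()
            stats.record_wait(ev.time - customer.arrival_time)
            busy = True
            heapq.heappush(events, Event(ev.time + service(), "departure"))
    return stats
```

For example, `simulate(exponential(1.0, rng), exponential(1.5, rng), 1000).mean_wait()` with `rng = random.Random(42)` estimates the mean wait of a queue with arrival rate 1.0 and service rate 1.5.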

For Fun

Magic Conversion from Image to Word Doc (2004/08)
This is a fun tool that converts any image to a Microsoft Word document. Every pixel in the image appears as a specified character in the Word document. Adjust the document slightly, for example by changing the font type and size, and you will see the original image in the Word document. It seems marvellous! Download the installation package (including source code).
American Option Pricing Tool (2004/05)
This is a tool written in Visual Basic for American option pricing. It calculates the option value by simulation, using a simple least-squares approach as described in this paper. Download the source code.
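As an illustration of pricing by simulation with least squares, the Python sketch below prices an American put with a least-squares Monte Carlo scheme in the style of Longstaff and Schwartz. The page does not name the paper, so the choice of method is an assumption, as are the geometric Brownian motion model, the quadratic regression basis, and all function and parameter names. It is not a translation of the Visual Basic tool.

```python
import math
import random


def fit_quadratic(xs, ys):
    """Least-squares fit y ~ a + b*x + c*x^2 via the 3x3 normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]  # power sums of x
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def american_put_lsm(s0, strike, r, sigma, maturity, steps, paths, seed=0):
    """Estimate an American put price by least-squares Monte Carlo."""
    rng = random.Random(seed)
    dt = maturity / steps
    disc = math.exp(-r * dt)
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    # Simulate price paths; prices[p][t] is path p at time step t.
    prices = []
    for _ in range(paths):
        path = [s0]
        for _ in range(steps):
            path.append(path[-1] * math.exp(drift + vol * rng.gauss(0.0, 1.0)))
        prices.append(path)
    # Cash flow of each path if the option is held to maturity.
    cash = [max(strike - path[-1], 0.0) for path in prices]
    # Walk backwards; exercise early where the payoff beats the fitted continuation value.
    for t in range(steps - 1, 0, -1):
        cash = [v * disc for v in cash]  # discount cash flows to time t
        itm = [p for p in range(paths) if prices[p][t] < strike]
        if len(itm) < 3:
            continue  # too few in-the-money paths to fit a quadratic
        xs = [prices[p][t] / strike for p in itm]  # scaled for numerical stability
        a, b, c = fit_quadratic(xs, [cash[p] for p in itm])
        for p, x in zip(itm, xs):
            exercise = strike - prices[p][t]
            if exercise > a + b * x + c * x * x:
                cash[p] = exercise
    estimate = disc * sum(cash) / paths
    return max(estimate, max(strike - s0, 0.0))  # allow immediate exercise
```

The regression uses only in-the-money paths, since early exercise is only relevant there. The result is a Monte Carlo estimate, so it varies with the seed and the number of paths.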
Netlinez (2003/07)
This is a mini-game written in Visual Basic and Visual C++. The original idea is not mine; it came from a Russian developer. The original version supports only stand-alone mode. I wrote the code from scratch and extended it into a network game. Now two players can compete or cooperate with each other over the Internet. The new version seems more fun than the original one. Download installation package and source code.

©2006 Davis Zhou, All Rights Reserved. Last Update on December 8, 2007.