How will we do this?

Together with our colleagues from CNR in Pisa and ARC in Athens, in the work package devoted to Knowledge Extraction Services (WP10), we will improve and expand the Information Inference Service created in the OpenAIREplus project. We will improve the inference infrastructure by adding visual workflow management and strengthening quality assurance. We will extend the existing document content analysis functionality to extract information about the structure of the document, the affiliations of the authors, and the sentiment of the citations. We will also enhance our automatic document classification and introduce clustering of similar documents. In addition, we will search for new types of links to outside knowledge bases, i.e., third-party, domain-specific repositories describing genes, chemicals, organisms, etc. Some solutions will be built from scratch; others will be based on software developed by the partners, such as CERMINE and MadIS.

Code on GitHub:

Finally, we will work on better uptake of the project's deliverables by making our results even more discoverable and usable by the general public. To that end, we plan to migrate our code to GitHub (star our repository now: https://github.com/openaire/openaire-mining) and to publish our data sets on Zenodo. Both the source code and the data sets will be available under open licenses, of course! The first deliverables in our work package are scheduled for August 2015. We'll keep you up to date on our research on this blog, so stay tuned!
By Łukasz Bolikowski and Mateusz Kobos, ADA Lab, ICM, University of Warsaw.
This blog post has been simultaneously published on the official OpenAIRE blog and on the ADA Lab blog.