Code on GitHub
The code of EventKG can be found on GitHub.

Software License
The EventKG software code is licensed under the terms of the MIT license (see LICENSE.txt in the GitHub repository).

Configuration
Create a configuration file like the following to specify where to store your EventKG version and which languages and dumps to use for extraction:
languages en,de,ru,fr,pt,it
enwiki 20190101
dewiki 20190101
frwiki 20190101
ruwiki 20190101
ptwiki 20190101
itwiki 20190101
dbpedia 2016-10
wikidata 20181231
Currently, 15 languages are supported (en, fr, de, ru, pt, es, it, da, nl, ro, no, pl, hr, sl, bg). Timestamps of current Wikipedia dumps can be found at https://dumps.wikimedia.org/enwiki; there is one dump per language, and the dump dates are usually consistent across languages. The chosen dump needs to say "Dump complete" on its dump page. Wikidata dumps are listed at https://dumps.wikimedia.org/wikidatawiki/entities/. DBpedia is dumped for all languages at once; the newest dump is listed at the top of http://wiki.dbpedia.org/datasets.
Run the extraction
The EventKG extraction pipeline consists of several steps, described in the following. Note that some of these steps require substantial time and resources (e.g., for the data download, for processing the large Wikidata dump file, and for processing the Wikipedia XML files).
1. Export the Pipeline class (de.l3s.eventkg.pipeline.Pipeline) as an executable JAR (Pipeline.jar).
2. Start the data download using:
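The exact arguments accepted by Pipeline.jar are not shown on this page; a hypothetical invocation, assuming the jar takes the configuration file path and a step name as arguments (check the repository README for the actual interface), might look like:

```shell
# Hypothetical invocation: download all configured dumps
# (Wikipedia, Wikidata, DBpedia) as specified in the config file.
java -jar Pipeline.jar config.txt download
```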
3. Run the first steps of extraction
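Assuming the same hypothetical CLI as above, with a step name selecting the pre-Wikipedia extraction phase (the actual argument is defined in the repository, not here), this could look like:

```shell
# Hypothetical invocation: run the initial extraction steps
# (e.g., processing the Wikidata and DBpedia dumps) before the
# per-language Wikipedia extraction.
java -jar Pipeline.jar config.txt extract1
```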
4. Export the Dumper class (de.l3s.eventkg.wikipedia.mwdumper.Dumper) as Jar (Dumper.jar). Run the extraction from the Wikipedia dump files for each language by running the following command (here for Portuguese, replace pt with other languages if needed). GNU parallel is required.
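Assuming the Wikipedia dumps were downloaded as bzip2-compressed XML files into a per-language folder (the path and the Dumper.jar arguments below are assumptions, not taken from the repository), a sketch of such a run for Portuguese could be:

```shell
# Hypothetical invocation: stream each Portuguese Wikipedia dump file
# through Dumper.jar, with GNU parallel processing files concurrently.
# Replace "pt" with another language code as needed.
ls data/raw_data/pt/*.bz2 | parallel 'bzcat {} | java -jar Dumper.jar'
```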
5. Start the final steps of extraction:
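Again assuming the hypothetical Pipeline.jar interface sketched above (configuration file plus a step name; the real argument is documented in the repository), the final phase might be started with:

```shell
# Hypothetical invocation: run the remaining extraction steps and
# write the resulting .nq files to data/results/all.
java -jar Pipeline.jar config.txt extract2
```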
6. The resulting .nq files can be found in the folder data/results/all.