Code on GitHub
The code of the EventKG can be found on GitHub.
Software License
The EventKG software code is licensed under the terms of the MIT license (see LICENSE.txt in the GitHub repository).
Create a configuration file like the following to state where to store your EventKG version, and the languages and dumps to be used for extraction:
Five languages are currently supported: English (en), German (de), Russian (ru), French (fr), and Portuguese (pt). Timestamps of current Wikipedia dumps are listed at https://dumps.wikimedia.org/enwiki. Dump dates are usually consistent across languages. The chosen dump must be marked "Dump complete" on its page. Wikidata dumps are listed at https://dumps.wikimedia.org/wikidatawiki/entities/. There is one dump for each language. DBpedia is dumped for all languages at once; the newest dump is listed at the top of http://wiki.dbpedia.org/datasets.
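Picking a dump date means checking the dump page for each language. The sketch below is not part of EventKG; it only illustrates that check in Python. The helper names are hypothetical. The per-language URL pattern (`<lang>wiki/<date>/`) follows the layout of dumps.wikimedia.org, and the date `20240101` in the usage note is a placeholder, not a recommended dump.

```python
from urllib.request import urlopen

# Language codes supported by EventKG, as listed above.
SUPPORTED_LANGUAGES = ("en", "de", "ru", "fr", "pt")
DUMPS_ROOT = "https://dumps.wikimedia.org"


def wikipedia_dump_url(lang: str, date: str) -> str:
    """Build the dump page URL for one language and one date (YYYYMMDD)."""
    if lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    if len(date) != 8 or not (date.isascii() and date.isdigit()):
        raise ValueError(f"expected a date like 20240101, got {date!r}")
    return f"{DUMPS_ROOT}/{lang}wiki/{date}/"


def is_dump_complete(page_html: str) -> bool:
    """Return True if the dump page text contains the "Dump complete" marker."""
    return "Dump complete" in page_html


def check_dump(lang: str, date: str) -> bool:
    """Download the dump page and look for the marker (needs network access)."""
    with urlopen(wikipedia_dump_url(lang, date)) as response:
        return is_dump_complete(response.read().decode("utf-8", errors="replace"))
```

Calling `check_dump("pt", "20240101")` for each configured language before starting the download would catch dumps that are still in progress. The check is a plain substring match, so it should be treated as a convenience and not as a guarantee.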
Run the extraction
The EventKG extraction pipeline consists of several steps, described in the following. Note that some of these steps require considerable time and resources (e.g., for downloading the data, processing the large Wikidata dump file, and processing the Wikipedia XML files).
1. Export the Pipeline class (de.l3s.eventkg.pipeline.Pipeline) as an executable JAR (Pipeline.jar).
2. Start the data download using:
3. Run the first steps of extraction
4. Export the Dumper class (de.l3s.eventkg.wikipedia.mwdumper.Dumper) as a JAR (Dumper.jar). Run the extraction from the Wikipedia dump files for each language with the following command (shown here for Portuguese; replace pt with another language code as needed). GNU parallel is required.
5. Start the final steps of extraction:
6. The resulting .nq files can be found in the folder data/results/all.
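
As a quick sanity check after the final step, the result files can be listed and their statements counted. The following Python sketch is an illustration and not part of EventKG; the function names are hypothetical. It assumes the N-Quads convention of one statement per line, with lines starting with `#` treated as comments.

```python
from pathlib import Path


def count_quads(nq_file: Path) -> int:
    """Count statement lines in an N-Quads file, skipping blanks and comments."""
    count = 0
    with nq_file.open(encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                count += 1
    return count


def summarize_results(results_dir: str) -> dict:
    """Map each .nq file name in results_dir to its number of statements."""
    return {
        path.name: count_quads(path)
        for path in sorted(Path(results_dir).glob("*.nq"))
    }
```

For example, `summarize_results("data/results/all")` returns a dictionary such as `{file name: statement count}` for every `.nq` file in the output folder; an empty dictionary or a count of zero suggests that an earlier step did not finish.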