An American academic has teamed up with the Internet Archive and Flickr to create a massive, searchable trove of copyright-free images from books dating back to 1502.
Kalev Leetaru, the Yahoo! Fellow at Georgetown University, has developed software that recognizes images in the Internet Archive’s 600 million scanned pages, saves each as a separate .jpeg file, and copies its caption and the surrounding text. Leetaru then used Flickr (which is owned by Yahoo!) to upload the images and create searchable tags for them.
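To make the pipeline described above concrete, here is a minimal Python sketch of the caption-and-context step. It is not Leetaru's code, which the article does not show. It assumes a scanned page has already been split into text and image blocks with vertical positions, and it uses a simple heuristic: the first text block below an image is treated as its caption, and the nearest text block above plus the next one below as the surrounding text. The names `Block`, `ImageRecord`, and `extract_image_records`, and the filename pattern, are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """One region of a scanned page (hypothetical layout-analysis output)."""
    kind: str       # "text" or "image"
    top: int        # vertical position of the region's top edge, in pixels
    bottom: int     # vertical position of the region's bottom edge, in pixels
    text: str = ""  # recognized text; empty for image regions


@dataclass
class ImageRecord:
    """Metadata kept alongside each image saved from a page."""
    filename: str
    caption: str
    surrounding_text: str


def extract_image_records(book_id: str, page_no: int,
                          blocks: List[Block]) -> List[ImageRecord]:
    """Pair every image block on a page with a caption and nearby text."""
    # Read the page top to bottom.
    ordered = sorted(blocks, key=lambda b: b.top)
    records: List[ImageRecord] = []
    for i, block in enumerate(ordered):
        if block.kind != "image":
            continue
        # Nearest text block above the image, if any.
        before = next(
            (b for b in reversed(ordered[:i]) if b.kind == "text"), None
        )
        # All text blocks below the image, in reading order.
        after = [b for b in ordered[i + 1:] if b.kind == "text"]
        # Assumption: the first text block below the image is its caption.
        caption = after[0].text if after else ""
        context = []
        if before is not None:
            context.append(before.text)
        if len(after) > 1:
            context.append(after[1].text)
        records.append(ImageRecord(
            # Hypothetical naming scheme: book, page, image index on the page.
            filename=f"{book_id}_p{page_no:04d}_img{len(records) + 1}.jpeg",
            caption=caption,
            surrounding_text=" ".join(context),
        ))
    return records
```

A real system would also need to crop and write the image pixels and to handle captions printed above or beside a picture, which this sketch leaves out.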
Although libraries everywhere have undertaken digitization projects, Leetaru tells the BBC that such efforts have typically neglected the artwork in books. Even the Internet Archive’s digitization software discards pictures when converting each word into searchable text.
“For all these years all the libraries have been digitising their books, but they have been putting them up as PDFs or text searchable works,” [Leetaru] told the BBC.
“They have been focusing on the books as a collection of words. This inverts that.
“Stretching half a millennia, it’s amazing to see the total range of images and how the portrayals of things have changed over time.
“Most of the images that are in the books are not in any of the art galleries of the world – the original copies have long ago been lost.”
Leetaru plans to offer his code to libraries around the world, and hopes that Wikipedia might take advantage of the software to illustrate articles.