@internetarchive

Internet Archive

The Internet Archive is "the library of the Internet", and a big supporter of Free Software.

Pinned

  1. openlibrary Public

    One webpage for every book ever published!

    Python · 6.2k stars · 1.8k forks

  2. bookreader Public

    The Internet Archive BookReader

    JavaScript · 1.1k stars · 475 forks

  3. heritrix3 Public

    Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.

    Java · 3.2k stars · 782 forks

  4. cicd Public

    build & test using github registry; deploy to nomad clusters

    21 stars · 1 fork

Repositories

Showing 10 of 268 repositories
  • RevisionChest Public

    Transforms Wikipedia XML dumps into a more compact, stream-friendly format

    Rust · 0 stars · GPL-3.0 · 0 forks · 0 issues · 0 pull requests · Updated Feb 3, 2026
  • openlibrary Public

    One webpage for every book ever published!

    Python · 6,156 stars · AGPL-3.0 · 1,750 forks · 802 issues (16 need help) · 197 pull requests · Updated Feb 3, 2026
  • iari Public

    Import workflows for the Wikipedia Citations Database

    Python · 13 stars · GPL-3.0 · 8 forks · 56 issues · 0 pull requests · Updated Feb 3, 2026
  • iare Public

    An interactive IARI JSON viewer

    JavaScript · 5 stars · AGPL-3.0 · 5 forks · 32 issues · 0 pull requests · Updated Feb 3, 2026
  • iiif Public

    The official Internet Archive IIIF service

    JavaScript · 26 stars · GPL-3.0 · 7 forks · 21 issues · 7 pull requests · Updated Feb 3, 2026
  • wiki-references-db Public

    Data models and scripts to build a database of references (broadly defined) appearing on Wikipedia and other wikis

    Python · 7 stars · GPL-3.0 · 0 forks · 3 issues · 0 pull requests · Updated Feb 3, 2026
  • iaux-notification-toast Public

    displays notifications and automatically clears them

    TypeScript · 1 star · AGPL-3.0 · 0 forks · 1 issue · 12 pull requests · Updated Feb 3, 2026
  • gowarc Public

    Read and write WARC files in Go

    Go · 47 stars · CC0-1.0 · 10 forks · 15 issues (1 needs help) · 8 pull requests · Updated Feb 3, 2026
  • wiki-references-extractor Public

    Extracts references from Wikipedia articles

    Python · 6 stars · GPL-3.0 · 2 forks · 1 issue · 0 pull requests · Updated Feb 3, 2026
  • brozzler Public

    brozzler - distributed browser-based web crawler

    Python · 783 stars · Apache-2.0 · 111 forks · 36 issues · 19 pull requests · Updated Feb 3, 2026