Note: This is a guide for technical readers. If you are not a developer, read the non-technical introduction instead.

SayIt is a Django application designed to radically reduce the effort of putting transcripts online in an attractive, searchable, linkable, readable way. This site is an example of it deployed as a standalone site.

Whilst SayIt can be deployed as a standalone website, it has really been built to work as a feature within bigger, more complicated websites. Our goal is that people who are trying to track the activities of powerful people can do a better job with less time and energy by building SayIt into their tools, and using all the saved time for something else. We are prepared to put a lot of time and effort into making SayIt a tool that developers find an attractive system to integrate with their websites and apps, so if you want something, please do ask.

If you have questions, please join the Poplus mailing list. We’ll do our best to answer them, and other people interested in SayIt will benefit from your contributions.

Ways to get started with SayIt, as a developer

At this Alpha stage, there are a few different ways in which you can have a go at SayIt:

  1. Try using the example sites, and submit bug or UX tickets to our GitHub repository.
  2. More desirable – help someone without coding skills to get transcript data that they care about converted and loaded into an instance of SayIt.
  3. Also more desirable – pick some transcripts that you care about, then convert and upload them.
  4. Try installing SayIt and give us feedback on that process.
  5. Try using the nascent export API to do something fun or interesting with the data that’s already in the example sites. You could try text processing, or data-vis tools, for example.

Whichever of these you try, we’d be very grateful if you’d tell us what you’re doing by joining the list and letting us know. That way there’ll be much more visibility as to the questions being asked and the problems being solved.

Currently, installation of SayIt as an app within another Django project still needs some work before it’s ready for reuse, but we are actively doing this ourselves within our Pombola project.

How to be match-made with someone non-technical who wants transcripts uploading

If you want to help someone else to upload some transcripts (which would be a very nice thing to do), please get in touch and we’ll let you know what options we have for helping. We love our volunteers, and there’ll be a hoodie in it for you :)

How to convert data to the standard we use – Akoma Ntoso

Akoma Ntoso [1] is a comprehensive XML schema for several Parliamentary document types such as bills, acts, and debates. Various bodies around the world are starting to use or interoperate with Akoma Ntoso to model their data. Whilst it was designed for Parliamentary document types, the schema is general enough that it can be used for many different types of debate.

SayIt can import a subset of Akoma Ntoso, and below we describe which aspects of it we currently cover. You can see an Akoma Ntoso representation of any section on SayIt by adding .an to the end of any section URI, for example Shakespeare’s The Tempest: http://shakespeare.sayit.mysociety.org/the-tempest.an.
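As a minimal sketch of the .an convention described above, the Python below builds the Akoma Ntoso address for a section and, optionally, downloads it. The function names are hypothetical, and two things are assumptions rather than documented behaviour: that a trailing slash on the section address should be dropped before adding .an, and that the response is UTF-8.

```python
from urllib.request import urlopen


def akoma_ntoso_url(section_url: str) -> str:
    """Return the address of the Akoma Ntoso view of a SayIt section."""
    # Assumption: a trailing slash is dropped so that ".an" attaches to the
    # last path segment of the section address.
    return section_url.rstrip("/") + ".an"


def fetch_akoma_ntoso(section_url: str, timeout: float = 10.0) -> str:
    """Download the Akoma Ntoso XML for a section (needs network access)."""
    with urlopen(akoma_ntoso_url(section_url), timeout=timeout) as response:
        # Assumption: the server sends UTF-8 encoded XML.
        return response.read().decode("utf-8")
```

For example, akoma_ntoso_url("http://shakespeare.sayit.mysociety.org/the-tempest") gives the address quoted above for The Tempest.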

If you have some transcripts and can put them into the format below (hopefully via some form of automated process!), we should be able to import them. If you can’t put them into that format without a lot of manual work, get in touch and we can hopefully help.

If you use aspects of Akoma Ntoso that we don't yet cover, please also get in touch so we can discuss improving our import process.

Basic structure

Akoma Ntoso is XML, with some HTML for its low-level content, which for all practical purposes means it looks quite a bit like HTML. Here is a small example showing the basic structure:

<akomaNtoso>
  <debate name="play">
    <meta>
      <references source="#">
        <TLCPerson id="caliban" href="/ontology/person/shakespeare.caliban" showAs="Caliban"/>
        <TLCPerson id="trinculo" href="/ontology/person/shakespeare.trinculo" showAs="Trinculo"/>
        <TLCPerson id="stephano" href="/ontology/person/shakespeare.stephano" showAs="Stephano"/>
      </references>
    </meta>
    <preface>
      <docTitle>The Tempest</docTitle>
    </preface>
    <debateBody>
      …
      <debateSection name="act" id="act2">
        <heading id="act2-head">Act 2</heading>
        …
        <debateSection name="scene" id="act2-scene2">
          <heading id="act2-scene2-head">Scene 2</heading>
          <subheading>Another part of the island.</subheading>
          <narrative>Enter CALIBAN with a burden of wood. A noise of thunder heard</narrative>
          <speech by="#caliban">
            <from>CALIBAN</from>
            <p>All the infections that the sun sucks up…</p>
          </speech>
          <narrative>Enter TRINCULO</narrative>
          <speech by="#caliban">
            <from>CALIBAN</from>
            <p>Lo, now lo!…</p>
          </speech>
          <speech by="#trinculo">
            <from>TRINCULO</from>
            <p>Here's neither bush nor shrub, to bear off…</p>
          </speech>
          …
        </debateSection>
        …
      </debateSection>
      …
    </debateBody>
  </debate>
</akomaNtoso>
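If your transcripts are already in some structured form, a document shaped like the example above can be generated rather than written by hand. The following is only a sketch using Python's standard library: build_debate and its arguments are hypothetical, it puts everything into a single debateSection whose name ("section") and ids are made up for illustration, and it leaves out the XML namespace, as the example above does.

```python
import xml.etree.ElementTree as ET


def build_debate(title, people, speeches):
    """Build a minimal Akoma Ntoso debate as an ElementTree element.

    people:   mapping of id -> (href, display name)
    speeches: list of (speaker id, name as printed in the transcript, text)
    """
    root = ET.Element("akomaNtoso")
    debate = ET.SubElement(root, "debate", name="transcript")

    # <meta><references> lists every speaker that the speeches point at.
    meta = ET.SubElement(debate, "meta")
    references = ET.SubElement(meta, "references", source="#")
    for person_id, (href, show_as) in people.items():
        ET.SubElement(references, "TLCPerson", id=person_id, href=href, showAs=show_as)

    preface = ET.SubElement(debate, "preface")
    ET.SubElement(preface, "docTitle").text = title

    # Hypothetical layout: one section holding every speech in order.
    body = ET.SubElement(debate, "debateBody")
    section = ET.SubElement(body, "debateSection", name="section", id="section1")
    ET.SubElement(section, "heading", id="section1-head").text = title
    for speaker_id, printed_name, text in speeches:
        speech = ET.SubElement(section, "speech", by="#" + speaker_id)
        ET.SubElement(speech, "from").text = printed_name
        ET.SubElement(speech, "p").text = text
    return root


if __name__ == "__main__":
    doc = build_debate(
        "The Tempest",
        {"caliban": ("/ontology/person/shakespeare.caliban", "Caliban")},
        [("caliban", "CALIBAN", "Lo, now lo!")],
    )
    print(ET.tostring(doc, encoding="unicode"))
```

A real converter would split the transcript into nested sections; the flat layout here just keeps the sketch short.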

Generic attributes

All elements can have the following optional attributes (don’t worry about these too much; you might only need id):

  • id (must be unique within the document, start with letter or underscore, and can only contain letters, digits, underscores, periods, and hyphens)
  • class, style, and title (as in HTML)
  • refersTo (a URI to an entry within references)
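The id rule above is easy to check before you send us a file. This is a sketch rather than SayIt's own validation: the helper names are hypothetical, and it assumes "letters" means ASCII letters only.

```python
import re
import xml.etree.ElementTree as ET

# A letter or underscore first, then letters, digits, underscores, periods
# or hyphens. Assumption: "letters" is read as ASCII letters only.
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def is_valid_id(value: str) -> bool:
    """Check a single id value against the character rule."""
    return ID_PATTERN.fullmatch(value) is not None


def duplicate_ids(root: ET.Element) -> list:
    """Return every id that is used more than once within the document."""
    seen, duplicates = set(), []
    for element in root.iter():
        element_id = element.get("id")
        if element_id is None:
            continue
        if element_id in seen and element_id not in duplicates:
            duplicates.append(element_id)
        seen.add(element_id)
    return duplicates
```

Under these assumptions "act2-scene2" is accepted, while "2act" and an empty string are rejected.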

Container elements

akomaNtoso and debate

The akomaNtoso and debate elements wrap the entire document. The debate element has a name attribute to express the correct name for the document's type, for example, "hansard", "transcript", "play", or simply "debate".

preface / coverPage

The preface or coverPage element can contain block elements as children. Within the preface or coverPage, you may use various inline elements to signify things such as the title, type, number, purpose, or jurisdiction of the document, but SayIt currently only looks for the docDate and docTitle elements.
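To see what a file offers in its preface or coverPage, you can pull out the two elements mentioned above yourself. This is a sketch, not SayIt's import code: title_and_date is a hypothetical name, the elements are assumed to have no XML namespace, and docDate is assumed to carry its machine-readable value in a date attribute (as in the Akoma Ntoso schema), with its text used as a fallback.

```python
import xml.etree.ElementTree as ET


def title_and_date(root: ET.Element):
    """Return (title, date) from the first docTitle and docDate found."""
    title = date = None
    for container_name in ("preface", "coverPage"):
        for container in root.iter(container_name):
            # The inline elements may sit directly in the container or
            # inside a block element such as <p>, so search at any depth.
            for element in container.iter():
                text = "".join(element.itertext()).strip()
                if element.tag == "docTitle" and title is None:
                    title = text
                elif element.tag == "docDate" and date is None:
                    # Assumption: prefer the date attribute over the text.
                    date = element.get("date") or text
    return title, date
```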

debateBody

The debateBody element is an overall generic container for the main content of the document, containing the hierarchy of speech sections.

Speech sections

<questions id="…">
  <debateSection id="…">
    <heading id="…">…</heading>
    <question by="#…">…</question>
    <answer by="#…" as="#…">…</answer>
    …
  </debateSection>
  …
</questions>
<ministerialStatements>
  <heading id="…">…</heading>
  <debateSection id="…">
    <heading id="…">…</heading>
    <speech by="#…" as="#…">…</speech>
  </debateSection>
</ministerialStatements>

The following elements (which all require an id attribute) can be used to create a hierarchy of speech-like elements. The generic element is debateSection, which requires a name attribute to describe what type of section it is. Most of the specific elements are only useful in a Parliamentary-style debate context; do use them if applicable, but generally you may find debateSection is what you use. SayIt doesn't handle different types of section differently at present.

  • debateSection (additionally requires a name attribute to describe the type of section)
  • administrationOfOath, rollCall, prayers
  • oralStatements, writtenStatements, personalStatements, ministerialStatements
  • resolutions, nationalInterest
  • declarationOfVote
  • communication
  • petitions, papers, noticesOfMotion
  • questions
  • address
  • proceduralMotions
  • pointOfOrder
  • adjournment

Each of these elements contains zero or one num, heading, and subheading elements, followed by more speech section elements, or speech-like elements.

The num, heading and subheading elements can contain inline text, and heading must have an id attribute (though examples on the official Akoma Ntoso website do not). Whilst semantically you can use these to mark up different information, SayIt will munge these together into one string on import.
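To preview how a hierarchy of sections will read once num, heading, and subheading are combined, you can walk the tree yourself. The sketch below is not SayIt's import code: the names are hypothetical, and joining the parts with a single space is an assumption, since the exact way SayIt joins them is not described here.

```python
import xml.etree.ElementTree as ET

# The speech section elements listed above.
SECTION_TAGS = {
    "debateSection", "administrationOfOath", "rollCall", "prayers",
    "oralStatements", "writtenStatements", "personalStatements",
    "ministerialStatements", "resolutions", "nationalInterest",
    "declarationOfVote", "communication", "petitions", "papers",
    "noticesOfMotion", "questions", "address", "proceduralMotions",
    "pointOfOrder", "adjournment",
}
TITLE_PARTS = ("num", "heading", "subheading")


def section_title(section: ET.Element, separator: str = " ") -> str:
    """Join a section's own num, heading and subheading into one string."""
    parts = []
    for child in section:
        if child.tag in TITLE_PARTS:
            text = "".join(child.itertext()).strip()
            if text:
                parts.append(text)
    # Assumption: a single space between the parts.
    return separator.join(parts)


def outline(parent: ET.Element, depth: int = 0):
    """Yield (depth, title) for each section nested under parent."""
    for child in parent:
        if child.tag in SECTION_TAGS:
            yield depth, section_title(child)
            yield from outline(child, depth + 1)
```

Calling list(outline(body)) on a debateBody element gives one entry per section, in document order, with depth 0 for the outermost sections.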

Speech-like elements

<narrative>…</narrative>
<speech by="#caliban">
  <from>CALIBAN</from>
  <p>……</p>
</speech>
<narrative>Enter TRINCULO</narrative>
<speech by="#caliban">
  <from>CALIBAN</from>
  <p>Lo, now lo!…</p>
</speech>
<speech by="#trinculo">
  <from>TRINCULO</from>
  <p>……</p>
</speech>

There are seven elements for holding speech-like entries:

  • speech
  • question
  • answer
  • scene
  • narrative
  • summary
  • other

speech, question and answer require a by attribute – a URI to an entry in references (probably a TLCPerson). You may also optionally include as (a URI to a reference of the role this speech is made in), to (a URI to a reference of who this speech is addressed to), and startTime and endTime (in ISO format YYYY-MM-DDThh:mm:ss).

Each of these three elements contains optional num, heading, and subheading elements (as with speech section elements), an optional from element and then one or more block elements.

The from element should contain the text used in the transcript for this speaker (their identifier is handled by the attributes on the speech element itself).
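The by and startTime attributes described above can be resolved in a few lines. This is a sketch under stated assumptions: the function names are hypothetical, by is assumed to be a local reference of the form "#id" pointing at an entry in references, and times are assumed to follow the YYYY-MM-DDThh:mm:ss format exactly, with no time zone.

```python
from datetime import datetime
import xml.etree.ElementTree as ET


def reference_index(root: ET.Element) -> dict:
    """Map "#id" to each entry found inside a references element."""
    index = {}
    for references in root.iter("references"):
        for entry in references:
            entry_id = entry.get("id")
            if entry_id:
                index["#" + entry_id] = entry
    return index


def describe_speech(speech: ET.Element, index: dict) -> dict:
    """Resolve the speaker and start time of a speech, question or answer."""
    entry = index.get(speech.get("by", ""))
    start = speech.get("startTime")
    return {
        # showAs is the display name held on the reference entry.
        "speaker": entry.get("showAs") if entry is not None else None,
        # Assumption: the time has no zone and matches the format exactly.
        "start": datetime.strptime(start, "%Y-%m-%dT%H:%M:%S") if start else None,
    }
```

A speech whose by value has no matching entry comes back with a speaker of None, which is a handy way to spot missing references before importing.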

There are three elements for descriptive entries, which can contain inline elements and text:

  • scene (e.g. “applause”)
  • narrative (e.g. “Mr X takes the Chair”)
  • summary (e.g. “Question agreed to”)

Lastly, the other element is the container for parts of a debate that are neither speeches nor scene comments (e.g. lists of papers). It requires an id attribute, and contains block elements.

Block elements

<speech by="#…" as="#…">
  <from>Mr Block</from>
  <p>Here is a list:</p>
  <ul id="">
    <li>First item</li>
    <li>Second item</li>
  </ul>
  <p>And here is a table:</p>
  <table id="">
    <tr> <td>A</td> <td>B</td> </tr>
    <tr> <td>C</td> <td>D</td> </tr>
  </table>
</speech>

Block elements handled by SayIt are the HTML elements:

  • p
  • ul
  • ol
  • table

All of these except p require an id attribute. ul and ol contain li elements as in HTML (each of which can optionally have a value attribute), and an li can contain p, ul, ol, or inline text.

Other Akoma Ntoso block elements are ignored (though not their contents).
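A quick way to find block elements that still need an id before import is sketched below. The function name is hypothetical, and treating an empty id as missing is an assumption based on the id rule under Generic attributes.

```python
import xml.etree.ElementTree as ET

# The block elements that require an id attribute (p does not).
BLOCKS_NEEDING_ID = {"ul", "ol", "table"}


def blocks_missing_id(root: ET.Element) -> list:
    """Return the tag of each ul, ol or table with no usable id."""
    return [
        element.tag
        for element in root.iter()
        # Assumption: an empty id counts as missing.
        if element.tag in BLOCKS_NEEDING_ID and not element.get("id")
    ]
```

An empty result means every list and table in the document carries an id.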

Inline elements

Akoma Ntoso supports the following HTML inline elements, and so does SayIt:

  • span (generic inline)
  • b, i, u, sup, sub (presentational)
  • abbr (abbreviations)
  • a (link)
  • br (line break)

Akoma Ntoso has many inline elements for adding semantic information to inline text; the only one SayIt currently recognises on import is recordedTime, which it uses for updating speech times (if speeches don't have their own times).

Feel free to use other inline elements such as person or eop; they simply won't be output in the HTML.

References

Reference elements are empty elements that provide URIs for entities used in the document. The following are available:

  • TLCPerson
  • TLCOrganization
  • TLCConcept
  • TLCObject
  • TLCEvent
  • TLCLocation
  • TLCProcess
  • TLCRole
  • TLCTerm
  • TLCReference

Each reference element requires id, href, and showAs attributes; shortForm is optional. TLCReference additionally requires a name attribute to explain what type of reference it is. Please see the Akoma Ntoso website for more information on Akoma Ntoso metadata.
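The attribute requirements for reference elements can be checked mechanically. The sketch below is a hypothetical helper, not part of SayIt, and it assumes that an empty attribute value should be reported as missing.

```python
import xml.etree.ElementTree as ET

REFERENCE_TAGS = {
    "TLCPerson", "TLCOrganization", "TLCConcept", "TLCObject", "TLCEvent",
    "TLCLocation", "TLCProcess", "TLCRole", "TLCTerm", "TLCReference",
}
REQUIRED_ATTRIBUTES = ("id", "href", "showAs")


def reference_problems(root: ET.Element) -> list:
    """Return (tag, id, missing attribute names) for each incomplete entry."""
    problems = []
    for references in root.iter("references"):
        for entry in references:
            if entry.tag not in REFERENCE_TAGS:
                continue
            # Assumption: an empty value counts as missing.
            missing = [name for name in REQUIRED_ATTRIBUTES if not entry.get(name)]
            # TLCReference also needs a name saying what kind of reference it is.
            if entry.tag == "TLCReference" and not entry.get("name"):
                missing.append("name")
            if missing:
                problems.append((entry.tag, entry.get("id"), missing))
    return problems
```

An empty list means every entry in references has the attributes it needs.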

Questions/problems

Get in touch; we’re happy to help.

Footnotes

1 Technically an acronym for Architecture for Knowledge-Oriented Management of African Normative Texts using Open Standards and Ontologies, and probably a backronym as it means “linked hearts” in Akan :)