Note: This is a guide for technical readers; if you’re after a general overview, read the non-technical introduction.
SayIt is a Django application designed to radically reduce the effort of putting transcripts online in an attractive, searchable, linkable, readable way. This site is an example of it deployed as a standalone site.
Whilst SayIt can be deployed as a standalone website, it has really been built to work as a feature within bigger, more complicated websites. Our goal is that people who are trying to track the activities of powerful people can do a better job with less time and energy by building SayIt into their tools, and can spend the time saved on something else. We are prepared to put a lot of time and effort into making SayIt attractive for developers to integrate into their websites and apps, so if you want something, please do ask.
If you have questions, please join the Poplus mailing list. We’ll do our best to answer them, and other people interested in SayIt will benefit from your contributions.
At this stage, there are a few different ways in which you can have a go at SayIt:
Whichever of these you try, we’d be very grateful if you’d tell us what you’re doing by joining the list and letting us know. That way there’ll be much more visibility of the questions being asked and the problems being solved.
If you want to help someone else to upload some transcripts (which would be a very nice thing to do), please get in touch and we’ll let you know what options we have for helping. We love our volunteers, and there’ll be a hoodie in it for you :)
Akoma Ntoso¹ is a comprehensive XML schema for several Parliamentary document types such as bills, acts, and debates. Various bodies around the world are starting to use or interoperate with Akoma Ntoso to model their data. Whilst it was designed for Parliamentary document types, the schema is general enough that it can be used for many different types of debate.
SayIt can import a subset of Akoma Ntoso, and below we describe which aspects of it we currently cover. You can export an Akoma Ntoso representation of any section on SayIt by adding .an to the end of the section’s URI; for example, for Shakespeare’s The Tempest:
https://shakespeare.sayit.mysociety.org/the-tempest.an
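If you want to pull an export into your own tooling, a minimal Python sketch (Python because SayIt itself is a Django application) might look like the following. The function names are hypothetical, and stripping a trailing slash before appending .an is an assumption about how section URIs are written; only the standard library is used.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def akoma_ntoso_url(section_url: str) -> str:
    """Return the Akoma Ntoso export URL for a SayIt section URI.

    Appends ".an" to the path. Stripping a trailing slash first is an
    assumption about section URIs, not documented SayIt behaviour.
    """
    parts = urlsplit(section_url)
    path = parts.path.rstrip("/") + ".an"
    # Keep any query string or fragment after the new path.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def fetch_section(section_url: str) -> ET.Element:
    """Download a section's Akoma Ntoso export and return the parsed root element."""
    with urlopen(akoma_ntoso_url(section_url)) as response:
        return ET.parse(response).getroot()
```

Calling fetch_section("https://shakespeare.sayit.mysociety.org/the-tempest") should return the root of the exported document. The export may put its elements in an XML namespace, so check root.tag before relying on bare element names such as akomaNtoso.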
If you have some transcripts and can put them into the format below (hopefully via some form of automated process!), we should be able to import them. If you can’t put them into that format without a lot of manual work, get in touch and we can hopefully help.
If you use aspects of Akoma Ntoso that we don't yet cover, please also get in touch so we can discuss improving our import process.
Akoma Ntoso is XML, with some HTML for its low-level content, which for all practical purposes means it looks quite a bit like HTML. Here is a small example showing the basic structure:
<akomaNtoso>
  <debate name="play">
    <meta>
      <references source="#">
        <TLCPerson id="caliban" href="/ontology/person/shakespeare.caliban" showAs="Caliban"/>
        <TLCPerson id="trinculo" href="/ontology/person/shakespeare.trinculo" showAs="Trinculo"/>
        <TLCPerson id="stephano" href="/ontology/person/shakespeare.stephano" showAs="Stephano"/>
      </references>
    </meta>
    <preface>
      <docTitle>The Tempest</docTitle>
    </preface>
    <debateBody>
      …
      <debateSection name="act" id="act2">
        <heading id="act2-head">Act 2</heading>
        …
        <debateSection name="scene" id="act2-scene2">
          <heading id="act2-scene2-head">Scene 2</heading>
          <subheading>Another part of the island.</subheading>
          <narrative>Enter CALIBAN with a burden of wood. A noise of thunder heard</narrative>
          <speech by="#caliban">
            <from>CALIBAN</from>
            <p>All the infections that the sun sucks up…</p>
          </speech>
          <narrative>Enter TRINCULO</narrative>
          <speech by="#caliban">
            <from>CALIBAN</from>
            <p>Lo, now lo!…</p>
          </speech>
          <speech by="#trinculo">
            <from>TRINCULO</from>
            <p>Here's neither bush nor shrub, to bear off…</p>
          </speech>
          …
        </debateSection>
        …
      </debateSection>
      …
    </debateBody>
  </debate>
</akomaNtoso>
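To see how that hierarchy nests, the following sketch parses a trimmed copy of the example with Python's standard xml.etree.ElementTree and prints an outline of sections and speeches. It is illustrative only, not SayIt's importer: the outline function is hypothetical, it skips narrative and other descriptive entries, and it only descends into the section element names that appear in this guide.

```python
import xml.etree.ElementTree as ET

# Section and speech-like element names used in this guide's examples;
# Akoma Ntoso defines more section types than these.
SECTION_TAGS = {"debateSection", "questions", "ministerialStatements"}
SPEECH_TAGS = {"speech", "question", "answer"}

SAMPLE = """<akomaNtoso>
  <debate name="play">
    <meta>
      <references source="#">
        <TLCPerson id="caliban" href="/ontology/person/shakespeare.caliban" showAs="Caliban"/>
        <TLCPerson id="trinculo" href="/ontology/person/shakespeare.trinculo" showAs="Trinculo"/>
      </references>
    </meta>
    <debateBody>
      <debateSection name="act" id="act2">
        <heading id="act2-head">Act 2</heading>
        <debateSection name="scene" id="act2-scene2">
          <heading id="act2-scene2-head">Scene 2</heading>
          <narrative>Enter TRINCULO</narrative>
          <speech by="#caliban"><from>CALIBAN</from><p>Lo, now lo!</p></speech>
          <speech by="#trinculo"><from>TRINCULO</from><p>Here's neither bush nor shrub</p></speech>
        </debateSection>
      </debateSection>
    </debateBody>
  </debate>
</akomaNtoso>"""


def outline(root):
    """Yield (depth, line) pairs for sections and speeches in document order."""
    # Map reference ids to display names so "#caliban" becomes "Caliban".
    people = {p.get("id"): p.get("showAs") for p in root.iter("TLCPerson")}

    def walk(elem, depth):
        for child in elem:
            if child.tag in SECTION_TAGS:
                heading = child.findtext("heading", default="")
                yield depth, f"[{child.get('name', child.tag)}] {heading}"
                yield from walk(child, depth + 1)
            elif child.tag in SPEECH_TAGS:
                speaker = people.get(child.get("by", "").lstrip("#"), "?")
                text = " ".join("".join(p.itertext()) for p in child.findall("p"))
                yield depth, f"{speaker}: {text}"

    yield from walk(root.find("debate/debateBody"), 0)


for depth, line in outline(ET.fromstring(SAMPLE)):
    print("  " * depth + line)
```

Resolving by="#caliban" through the TLCPerson entries is what lets the outline show "Caliban" rather than the raw id.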
All elements can have the following optional attributes (don’t worry about these too much; you might only need id):
references)
The akomaNtoso and debate elements wrap the entire document. The debate element has a name attribute to express the correct name for the document's type, for example, "hansard", "transcript", "play", or simply "debate".
The preface or coverPage element can contain block element children. Within those, you may use various inline elements to signify things such as the title, type, number, purpose, or jurisdiction of the document; SayIt currently only recognises the docDate and docTitle elements.
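As a rough illustration of where those two elements sit, this hypothetical helper reads the debate's name attribute and the first docTitle and docDate inside either preface or coverPage. Treating docDate's date attribute as a machine-readable date is an assumption about your markup, not something this guide specifies.

```python
import xml.etree.ElementTree as ET


def document_info(root):
    """Collect the debate type plus docTitle/docDate from preface or coverPage.

    Only these two inline elements are read, matching what SayIt recognises;
    reading docDate's "date" attribute is an assumption about your markup.
    """
    debate = root.find("debate")
    info = {"type": debate.get("name")}
    for container in ("preface", "coverPage"):
        front = debate.find(container)
        if front is None:
            continue
        for tag in ("docTitle", "docDate"):
            # The inline element may be nested inside a block such as <p>.
            element = front.find(f".//{tag}")
            if element is None or tag in info:
                continue
            info[tag] = "".join(element.itertext()).strip()
            if tag == "docDate" and element.get("date"):
                info["date"] = element.get("date")
    return info


example = """<akomaNtoso><debate name="hansard">
  <preface><p><docTitle>Debates</docTitle> of <docDate date="2014-03-05">5 March 2014</docDate></p></preface>
  <debateBody/>
</debate></akomaNtoso>"""
print(document_info(ET.fromstring(example)))
```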
The debateBody element is the overall generic container for the main content of the document, containing the hierarchy of speech sections.
<questions id="…">
  <debateSection id="…">
    <heading id="…">…</heading>
    <question by="#…">…</question>
    <answer by="#…" as="#…">…</answer>
    …
  </debateSection>
  …
</questions>
<ministerialStatements>
  <heading id="…">…</heading>
  <debateSection id="…">
    <heading id="…">…</heading>
    <speech by="#…" as="#…">…</speech>
  </debateSection>
</ministerialStatements>
The following elements (which all require an id attribute) can be used to build a hierarchy of sections containing speech-like elements. The generic element is debateSection, which requires a name attribute to describe what type of section it is. Most of the specific elements are only useful in a Parliamentary-style debate context; do use them if applicable, but you will generally find that debateSection is all you need. SayIt doesn't currently treat different types of section differently.
name attribute to describe the type of section)
Each of these elements contains at most one num, one heading, and one subheading element, followed by further speech section elements or speech-like elements.
The num, heading and subheading elements can contain inline text, and heading must have an id attribute (though examples on the official Akoma Ntoso website do not).
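If you generate your files automatically, a quick check for these rules before sending them over can save a round trip. This is a hypothetical sketch, not part of SayIt; its set of section element names only covers the ones used in this guide's examples.

```python
import xml.etree.ElementTree as ET

# Section elements named in this guide's examples; the full Akoma Ntoso
# list is longer, so extend this set for your own documents.
SECTION_TAGS = {"debateSection", "questions", "ministerialStatements"}


def section_problems(root):
    """Report section elements that break the id/name/heading rules."""
    problems = []
    for section in root.iter():
        if section.tag not in SECTION_TAGS:
            continue
        label = section.get("id") or f"<{section.tag}>"
        if not section.get("id"):
            problems.append(f"{label}: missing id")
        if section.tag == "debateSection" and not section.get("name"):
            problems.append(f"{label}: debateSection needs a name")
        # num, heading and subheading may each appear at most once.
        for tag in ("num", "heading", "subheading"):
            if len(section.findall(tag)) > 1:
                problems.append(f"{label}: more than one {tag}")
        heading = section.find("heading")
        if heading is not None and not heading.get("id"):
            problems.append(f"{label}: heading missing id")
    return problems


doc = """<debateBody>
  <debateSection name="act" id="act1">
    <heading>Act 1</heading>
    <debateSection id="act1-scene1">
      <heading id="act1-scene1-head">Scene 1</heading>
    </debateSection>
  </debateSection>
</debateBody>"""
print(section_problems(ET.fromstring(doc)))
```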
<narrative>…</narrative>
<speech by="#caliban">
  <from>CALIBAN</from>
  <p>……</p>
</speech>
<narrative>Enter TRINCULO</narrative>
<speech by="#caliban">
  <from>CALIBAN</from>
  <p>Lo, now lo!…</p>
</speech>
<speech by="#trinculo">
  <from>TRINCULO</from>
  <p>……</p>
</speech>
There are seven elements for holding speech-like entries:
speech, question and answer require a by attribute: a URI to an entry in references (probably a TLCPerson). You may also optionally include as (a URI to a reference of the role this speech is made in), to (a URI to a reference of who this speech is addressed to), and startTime and endTime (in ISO format YYYY-MM-DDThh:mm:ss).
Each of these three elements contains optional num, heading, and subheading elements (as with speech section elements), an optional from element, and then one or more block elements.
The from element should contain the text used in the transcript for this speaker (their identifier is handled by the attributes on the speech element itself).
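Here is one way to generate a speech-like element from your own data, sketched with ElementTree. The make_speech helper and its parameter names are hypothetical; as_role stands in for the as attribute because as is a Python keyword, and the tag argument lets the same function produce question or answer elements.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def make_speech(by, from_text, paragraphs, *, tag="speech",
                as_role=None, to=None, start=None, end=None):
    """Build a speech-like element; by, as_role and to are ids in <references>."""
    speech = ET.Element(tag, {"by": f"#{by}"})
    if as_role:
        speech.set("as", f"#{as_role}")
    if to:
        speech.set("to", f"#{to}")
    # startTime/endTime use the YYYY-MM-DDThh:mm:ss form described above.
    for attr, value in (("startTime", start), ("endTime", end)):
        if value is not None:
            speech.set(attr, value.strftime("%Y-%m-%dT%H:%M:%S"))
    if from_text:  # <from> is optional
        ET.SubElement(speech, "from").text = from_text
    for text in paragraphs:  # one or more block elements, here all <p>
        ET.SubElement(speech, "p").text = text
    return speech


speech = make_speech("caliban", "CALIBAN", ["Lo, now lo!"], start=datetime(2014, 3, 5, 14, 30))
print(ET.tostring(speech, encoding="unicode"))
```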
There are three elements for descriptive entries, which can contain inline elements and text:
Lastly, the other element is the container for parts of a debate that are neither speeches nor scene comments (e.g. lists of papers). It requires an id attribute, and contains block elements.
<speech by="#…" as="#…">
  <from>Mr Block</from>
  <p>Here is a list:</p>
  <ul id="">
    <li>First item</li>
    <li>Second item</li>
  </ul>
  <p>And here is a table:</p>
  <table id="">
    <tr>
      <td>A</td>
      <td>B</td>
    </tr>
    <tr>
      <td>A</td>
      <td>D</td>
    </tr>
  </table>
</speech>
Block elements handled by SayIt are the HTML elements:
All of these besides p require an id attribute.
ul and ol contain li elements as in HTML (which can optionally have a value attribute), and li elements can contain p, ul, ol, or inline text.
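The nesting rules for lists can be easier to follow in code. The sketch below flattens block content into indented text lines and lists ul, ol, and table elements that have no id attribute at all (an empty id="" placeholder counts as present). Both functions are hypothetical; the set of id-bearing elements is limited to those named in this guide's examples, and the ids in the sample (mr-block, list-1, table-1) are invented.

```python
import xml.etree.ElementTree as ET

# Block elements from this guide's examples that need an id; p is the exception.
ID_REQUIRED = {"ul", "ol", "table"}


def blocks_without_id(container):
    """Return tags of ul/ol/table elements that have no id attribute at all."""
    return [el.tag for el in container.iter() if el.tag in ID_REQUIRED and "id" not in el.attrib]


def block_text(container, indent=0):
    """Flatten p, ul/ol/li and table children into indented plain-text lines.

    Only an li's leading text is kept; inline markup inside li is ignored.
    """
    lines = []
    pad = "  " * indent
    for child in container:
        if child.tag == "p":
            lines.append(pad + "".join(child.itertext()).strip())
        elif child.tag in ("ul", "ol"):
            number = 0
            for li in child.findall("li"):
                # As in HTML, a value attribute resets the count for later items.
                number = int(li.get("value", number + 1))
                marker = "-" if child.tag == "ul" else f"{number}."
                lines.append(f"{pad}{marker} {(li.text or '').strip()}".rstrip())
                lines.extend(block_text(li, indent + 1))  # nested p/ul/ol
        elif child.tag == "table":
            for row in child.iter("tr"):
                cells = ("".join(cell.itertext()).strip() for cell in row)
                lines.append(pad + " | ".join(cells))
    return lines


speech = ET.fromstring(
    '<speech by="#mr-block"><from>Mr Block</from><p>Here is a list:</p>'
    '<ul id="list-1"><li>First item</li><li>Second item</li></ul>'
    '<p>And here is a table:</p>'
    '<table id="table-1"><tr><td>A</td><td>B</td></tr><tr><td>A</td><td>D</td></tr></table>'
    '</speech>'
)
print(blocks_without_id(speech))
print("\n".join(block_text(speech)))
```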
Other Akoma Ntoso block elements are ignored (though not their contents).
Akoma Ntoso supports the following HTML inline elements, and so does SayIt:
Akoma Ntoso has many inline elements for adding semantic information to inline text; the only one SayIt currently recognises on import is recordedTime, which it uses for updating speech times (if speeches don't have their own times).
Feel free to use other inline elements such as person or eop; they simply won't be output in the HTML.
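One plausible way to use recordedTime markers like this is sketched below: a speech without a startTime takes the first recordedTime inside it, otherwise the most recent one earlier in the document. This is an assumption about the behaviour, not SayIt's actual code, and it further assumes the timestamp lives in a time attribute on recordedTime.

```python
import xml.etree.ElementTree as ET

SPEECH_TAGS = {"speech", "question", "answer"}


def fill_times_from_recorded(root):
    """Set startTime on speech-like elements that lack one, from recordedTime markers.

    Hypothetical approach: prefer the first recordedTime inside the speech,
    else the most recent one seen earlier. Assumes a "time" attribute.
    """
    last_time = None
    for el in root.iter():  # depth-first, parents before their children
        if el.tag == "recordedTime" and el.get("time"):
            last_time = el.get("time")
        elif el.tag in SPEECH_TAGS and not el.get("startTime"):
            inner = el.find(".//recordedTime")
            time = inner.get("time") if inner is not None else None
            if time or last_time:
                el.set("startTime", time or last_time)
    return root


doc = ET.fromstring(
    '<debateSection id="s1">'
    '<narrative>Sitting resumed at <recordedTime time="2014-03-05T14:00:00">2pm</recordedTime></narrative>'
    '<speech by="#a"><p>First</p></speech>'
    '<speech by="#b" startTime="2014-03-05T14:05:00"><p>Second</p></speech>'
    '<speech by="#c"><p><recordedTime time="2014-03-05T14:10:00">2.10pm</recordedTime> Third</p></speech>'
    '</debateSection>'
)
fill_times_from_recorded(doc)
print([s.get("startTime") for s in doc.iter("speech")])
```

Speeches that already carry a startTime are left alone, matching the rule that recordedTime only matters when speeches don't have their own times.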
Reference elements are empty elements that provide URIs for the entities used in the document:
The id, href, and showAs attributes are required; shortForm is optional.
name is required for TLCReference to explain what type of reference it is.
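If you keep speaker details elsewhere, the references block can be generated rather than written by hand. This hypothetical helper emits TLCPerson entries and, for the targets of as attributes, TLCRole entries (TLCRole being Akoma Ntoso's reference element for roles); the role id and href in the usage lines are invented for illustration.

```python
import xml.etree.ElementTree as ET


def make_references(people, roles=None):
    """Build a <references> block; people/roles map id -> (href, showAs)."""
    refs = ET.Element("references", {"source": "#"})
    for tag, entries in (("TLCPerson", people), ("TLCRole", roles or {})):
        for ref_id, (href, show_as) in entries.items():
            # id, href and showAs are the required attributes described above.
            ET.SubElement(refs, tag, {"id": ref_id, "href": href, "showAs": show_as})
    return refs


refs = make_references(
    {"caliban": ("/ontology/person/shakespeare.caliban", "Caliban")},
    roles={"servant": ("/ontology/role/servant", "Servant")},
)
print(ET.tostring(refs, encoding="unicode"))
```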
Please see the Akoma Ntoso website for more information on Akoma Ntoso metadata.
Get in touch; we’re happy to help.
1 Technically an acronym for Architecture for Knowledge-Oriented Management of African Normative Texts using Open Standards and Ontologies, and probably a backronym as it means “linked hearts” in Akan :)