Australian Digital Observatory blog

What do Reddit users discuss about Australia's Federal Election 2025? Updates on week 4

2025-04-28T00:00:00Z

Our report on the discussions around Australia's Federal Election 2025 is out now, covering last week's data.

Political engagement

What do Reddit users discuss about Australia's Federal Election 2025? Updates on week 3

2025-04-24T00:00:00Z

Our report on the discussions around Australia's Federal Election 2025 is out now, covering last week's data.

Political engagement

What do Reddit users discuss about Australia's Federal Election 2025? Updates on week 2

2025-04-15T00:00:00Z

Our latest report on the discussions around Australia's Federal Election 2025 is out now, covering last week's data.

Political engagement

We also investigated the presence of political engagement in these discussions by attempting to detect support for, opposition to, or neutrality towards a political party in Reddit comments. As shown in the chart below, levels of political expression varied across topics. In some topics, a substantial proportion of comments expressed sentiments towards political parties, while in others the discussions tended to stay general.
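
To make "levels of political expression" concrete, here is a minimal sketch of how per-topic proportions could be computed once each comment already carries a stance label. The topic names, the "none" label for comments with no party stance, and the sample rows are all hypothetical; this is an illustration, not the pipeline used for the report.

```python
from collections import Counter, defaultdict

# Hypothetical labelled comments: (topic, stance towards a party).
# "none" is an assumed label for comments that express no party stance.
labelled_comments = [
    ("housing", "support"),
    ("housing", "oppose"),
    ("housing", "none"),
    ("housing", "neutral"),
    ("energy", "none"),
    ("energy", "none"),
    ("energy", "oppose"),
    ("energy", "none"),
]

def political_expression_by_topic(rows):
    """Return, per topic, the share of comments carrying any party stance."""
    counts = defaultdict(Counter)
    for topic, stance in rows:
        counts[topic][stance] += 1
    shares = {}
    for topic, stance_counts in counts.items():
        total = sum(stance_counts.values())
        expressed = total - stance_counts["none"]
        shares[topic] = expressed / total
    return shares

shares = political_expression_by_topic(labelled_comments)
for topic, share in sorted(shares.items()):
    print(f"{topic}: {share:.0%} of comments express a party stance")
```

With this toy data, the "housing" topic shows a much higher share of party-directed comments than "energy", which is the kind of contrast the chart describes.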

What do Reddit users discuss about Australia's Federal Election 2025? We used GenAI to find out

2025-04-09T00:00:00Z

As Australia's Federal Election 2025 is approaching, we at the Digital Observatory are keen to explore public sentiments around key political issues that might have an impact on election outcomes.

Making use of our comprehensive Australian Reddit databank, we extracted and summarised thousands of Reddit submissions and comments from twenty politically active subreddits, using AI models such as Google's Gemini Flash and Gemini Pro. The analysis will be done on a regular basis until election day. With this, we aim to track public conversations around election issues and how they evolve as we get closer to election day.

The results of the first week of data are available as a public dashboard and a report.

Launching new Reddit databank!

2024-10-14T00:00:00Z

Consisting of all Reddit data from Australia-related subreddits, AusReddit is our latest databank. Our goal is to give researchers across Australia easy access to an important data source that can provide rich insights into the nation's societal issues.

What is included in the data?

At the time of writing, the databank contains 5,082,294 submissions and 103,321,441 comments from 585 Australia-related subreddits. For more details on how these subreddits were identified, refer to our factsheet or read it online here.

AusReddit has data from 2005 to the most recent whole month. We run data updates monthly as soon as the most recent month's data is available.
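
As a small illustration of what "most recent whole month" means, the sketch below computes the last fully elapsed calendar month for a given date. The function name is hypothetical, and the sketch ignores any delay between a month ending and its data becoming available.

```python
from datetime import date, timedelta

def most_recent_whole_month(today):
    """Return (year, month) of the last fully elapsed calendar month."""
    # Step back one day from the first of the current month.
    last_day_of_previous = today.replace(day=1) - timedelta(days=1)
    return last_day_of_previous.year, last_day_of_previous.month

print(most_recent_whole_month(date(2024, 10, 14)))
print(most_recent_whole_month(date(2025, 1, 3)))
```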

What do I need to get data?

Currently, AusReddit is available to academic researchers across Australia. Researchers with access can search, view, and download raw data from AusReddit for their projects. You do need to obtain ethics clearance before access is granted.

For more information, please refer to our factsheet or visit AusReddit at https://ausreddit.digitalobservatory.net.au/.

A recap of Post-API conference

2024-07-10T00:00:00Z

Last month, we had the opportunity to participate in the Post-API Conference 2024, organised by Prof. Dan Angus, Prof. Deen Freelon, Dr. Jo Lukito, and Dr. John Pasek.

First held in Philadelphia in 2023, the Post-API Conference provides a platform for the research community to learn and share their experience working with social media data. As the name suggests, the conference focuses on acquiring social media data for research in an era where data access is becoming increasingly difficult.

As a research infrastructure facility primarily dealing with human data on the web such as social media data, the Digital Observatory has been exploring sustainable non-API approaches to data collection. For that reason, we were invited to talk about our tools and resources in this area.

The talk described our three newly established tools:

youte+ for YouTube metadata collection,
NewsTalk, a databank of public comments on Australian news stories,
AusReddit, a databank of Reddit posts and comments from Australian subreddits (soon to be launched).

We also briefly mentioned TubeTalk, a future project that aims to transcribe YouTube videos with speaker diarisation.

The interest our talk generated was encouraging, and we had good conversations with researchers to better understand their research problems and needs.

It was also interesting to hear how other researchers approached data collection. As expected, we also discussed the ethical and legal implications of digital data acquisition.

A recap of ResBaz Queensland 2023

2023-11-27T00:00:00Z

Last week, the DO team had the wonderful opportunity to present at Research Bazaar (ResBaz) Queensland 2023. A popular conference in the research community, ResBaz holds an annual three-day event featuring keynote speeches, digital research training workshops, and roundtables, all aiming to upskill and connect researchers from diverse backgrounds.

As in previous ResBazzes, the DO team did a number of workshops and talks related to our areas of expertise. Of note this year were two workshops on AI for research, and one showcasing our new data sources.

Demystifying AI for research

Having done a lot of investigation and experimentation with Large Language Models (LLMs), Robert Fleet and Mat Bettinson did two talks on the capabilities and potential of these powerful tools in aiding research. Specifically, we discussed the characteristics and properties of LLMs, their strengths and weaknesses, and based on those, how to leverage the power of AI to augment research.

Recap of DO's talk at UNSW: Social Data on the Web and Using Large Language Models for Research

2023-09-14T00:00:00Z

Last week, Mat and Rob from the Digital Observatory gave a talk at the University of New South Wales about the current Web social data landscape and how advanced technologies such as AI and Large Language Models (LLMs) can be leveraged to augment research.

This blog post is a quick recap of the talk. If you'd like to find out more, please feel free to contact us.

Web social data for research: Recent developments and current landscape

First, we reiterated the value that social data on the web (social media, forums, chats, reviews, etc.) has contributed to the research community, easily demonstrated through the long list of journal articles and publications using such data.

We stressed the importance of Twitter and Reddit as two open data sources whose data researchers could previously obtain readily via their APIs. Then, with Elon Musk taking over Twitter, and data-gorging AI models such as ChatGPT turning Web data into the new oil, previously freely accessible APIs now charge researchers an exorbitant fee for a fraction of the volume.

Implications for research: Gone is the era of free API access - walled gardens such as Facebook and Instagram are now the new norm. It will be harder to get data from any single platform.

The path forward

We talked about how we as a research infrastructure facility navigated the path forward: developing tools to collect data from still-friendly platforms, and "diversifying" by harvesting data from multiple platforms.

As part of this, we introduced a curated databank of reader commentary on news sites and a tool to collect YouTube metadata and comments.

Large Language Models (LLMs)

The latter half of our talk revolved around LLMs and their potential to augment HASS (Humanities and Social Sciences) research quality and scale. We discussed the strengths and weaknesses of LLMs. Based on that, we suggested ways in which LLMs can be leveraged to assist researchers and gave some demos.

RIP Australian Twittersphere

2023-07-17T00:00:00Z

As Twitter began restricting free research access to their data, our collectors for the Australian Twittersphere were officially cut off on 13 July 2023.

In this eulogy for the Australian Twittersphere, we will trace back its evolution over the years, the contributions it has made to the research community, and what comes next for researchers and the Digital Observatory.

About the Australian Twittersphere

The Australian Twittersphere is a longitudinal collection of tweets from a periodically updated list of Twitter accounts that are identified as Australian (i.e., have a stated connection to Australia in the free text fields of the account profile).
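
The idea of a "stated connection to Australia in the free text fields" can be sketched as a simple keyword match over profile text. The keyword list, field names, and sample profiles below are hypothetical; the actual identification method is described in the factsheet and is likely more involved than this.

```python
import re

# Hypothetical keyword list, for illustration only.
AUSTRALIAN_TERMS = ["australia", "aussie", "sydney", "melbourne", "brisbane", "queensland"]

# Match a term at the start of a word, so "Australian" and "Aussies" also match.
PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, AUSTRALIAN_TERMS)) + r")",
    re.IGNORECASE,
)

def looks_australian(profile):
    """True if any free-text profile field mentions an Australian term."""
    fields = (profile.get("location", ""), profile.get("description", ""))
    return any(PATTERN.search(text or "") for text in fields)

profiles = [
    {"screen_name": "a", "location": "Brisbane, QLD", "description": "Coffee and code"},
    {"screen_name": "b", "location": "", "description": "Proud Aussie mum"},
    {"screen_name": "c", "location": "Austin, TX", "description": "Live music fan"},
]
australian = [p["screen_name"] for p in profiles if looks_australian(p)]
print(australian)
```

A keyword approach like this trades recall for simplicity: it misses accounts that never mention a place, and it can wrongly include accounts that merely talk about Australia.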

Since the Twittersphere's establishment six years ago, over 1.9 billion tweets have been collected from more than 1 million Twitter accounts considered to be Australian. The earliest tweet captured was posted on 9 January 2007.

Historical background

The Australian Twittersphere began as the TrISMA project, during which tweets were collected from roughly 530,000 Twitter accounts deemed "Australian". There was a gap in tweet collection between May 2017 and early 2018, which coincided with the TrISMA project ending and the QUT Digital Observatory being established.

It was at this time that we (the QUT Digital Observatory) took over the tweet collection and made improvements to secure it as a stable longitudinal collection. Subsequently, we built another collector and developed a method to identify more Australian accounts. The method allowed us to periodically update the list of accounts and collect more tweets from them. Updates were completed in late 2020 and 2021, resulting in twice the number of accounts compared to the original list.

More details on this can be found in the Australian Twittersphere factsheet.

Research impacts

The Australian Twittersphere was our first major resource as a research infrastructure facility. It provided historical, Australia-specific data to researchers at a time when it was not possible to geo-locate Twitter data in significant volumes (which is still the case: roughly 1% of tweets include precise geolocation information). At the time, so much of our work was related to the Australian Twittersphere that the Digital Observatory became synonymous with it.

The Australian Twittersphere has helped inform impactful research into education, health, business, law, and the humanities and social sciences. A non-exhaustive list of publications arising from this databank is provided below.

What's next for researchers

Twitter/X is only one of many cases where the gates to research-friendly, free-to-access APIs are gradually closing. Other platforms have also begun to gatekeep their data by charging for API access.

For example, researchers working with Twitter data are now having to bear the extra costs of the API. It's not cheap: a reasonable amount of 1 million tweets costs no less than $5,000 USD per month. These costs mean that further collection for the Australian Twittersphere is no longer feasible, but we will continue to make the existing data available to researchers. There are also new limitations on the Twitter/X platform that researchers should be aware of. This article gives a good overview of how to navigate the new Twitter/X.
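
The quoted price makes the scale problem easy to see. The sketch below is back-of-the-envelope arithmetic only: it assumes the figure above (US$5,000 per month for 1 million tweets) applies as a flat monthly rate, ignores pricing tiers and rate limits, and uses the 1.9 billion tweets of the existing collection as the target volume.

```python
# Assumption: a flat rate of US$5,000 per month buys 1 million tweets,
# taken from the lower bound quoted in the text. Real pricing tiers differ.
MONTHLY_COST_USD = 5_000
TWEETS_PER_MONTH = 1_000_000

cost_per_tweet = MONTHLY_COST_USD / TWEETS_PER_MONTH

def months_and_cost(total_tweets):
    """Months and dollars needed to collect total_tweets at the flat rate."""
    months = -(-total_tweets // TWEETS_PER_MONTH)  # ceiling division
    return months, months * MONTHLY_COST_USD

months, cost = months_and_cost(1_900_000_000)
print(f"Cost per tweet: ${cost_per_tweet:.3f}")
print(f"Collecting 1.9 billion tweets: {months} months, ${cost:,} USD")
```

Under these assumptions, a collection the size of the Australian Twittersphere would take 1,900 months of access and cost US$9.5 million.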

The good news is that not all doors are closed. Platforms such as YouTube and TikTok still provide free API access for researchers (albeit with some limitations). Indeed, the Digital Observatory has been working on different platforms and alternatives for researchers for several years. Our youte tool is a fantastic example of this work. Researchers can also leverage other maturing methods of collecting Web data, such as web archiving and data donation. The Digital Observatory is working on resources for web archiving, so watch this space!

The transition away from Twitter will be uncomfortable for some, but it is important to remember that other sources of data still exist on the Web.

Researchers can refer to our guide to the Australian Twittersphere for more information on the Australian Twittersphere and how to access it. For a more in-depth technical overview, please read the Australian Twittersphere technical fact sheet.

Publications using the Australian Twittersphere

Publications that have used the Australian Twittersphere include:

Balasubramaniam, T., Nayak, R., & Bashar, M. A. (2020). Understanding the Spatio-temporal Topic Dynamics of Covid-19 using Nonnegative Tensor Factorization: A Case Study. 2020 IEEE Symposium Series on Computational Intelligence (SSCI). Retrieved January 11, 2021, from https://ausdm20.ausdm.org/index.html
Balasubramaniam, T., Nayak, R., Luong, K., & Bashar, Md. A. (2021). Identifying Covid-19 misinformation tweets and learning their spatio-temporal topic dynamics using Nonnegative Coupled Matrix Tensor Factorization. Social Network Analysis and Mining, 11(1), 57. Retrieved June 22, 2021, from https://doi.org/10.1007/s13278-021-00767-7
Barnes, N. (2021b). The social life of literacy education: How the 2018 #phonicsdebate is reshaping the field. The Australian Educational Researcher. Retrieved April 20, 2021, from https://doi.org/10.1007/s13384-021-00451-x
Barnes, N. (2021a). Parents, carers, and policy labor: Policy networks and new media. New Media & Society, 1461444820979004. Retrieved February 22, 2021, from https://doi.org/10.1177/1461444820979004
Bashar, M. A., Nayak, R., & Balasubramaniam, T. (2020). Topic, Sentiment and Impact Analysis: COVID19 Information Seeking on Social Media. arXiv:2008.12435 [cs]. Retrieved January 11, 2021, from http://arxiv.org/abs/2008.12435
Bruns, A., Angus, D., & Graham, T. (2019, May). The 2019 election on Twitter: Watergate, mums, and well-organised independents. The Conversation. Retrieved April 22, 2022, from http://theconversation.com/the-2019-election-on-twitter-watergate-mums-and-well-organised-independents-117182
Cooper, K., Dedehayir, O., Riverola, C., Harrington, S., & Alpert, E. (2022). Exploring Consumer Perceptions of the Value Proposition Embedded in Vegan Food Products Using Text Analytics. Sustainability, 14(4), 2075. Retrieved February 14, 2022, from https://www.mdpi.com/2071-1050/14/4/2075
Layt, S. (2020, June). How social media was ahead of the curve when it came to COVID-19. The Age. Retrieved January 11, 2021, from https://www.theage.com.au/national/queensland/how-social-media-was-ahead-of-the-curve-when-it-came-to-covid-19-20200629-p557cp.html
Queensland University of Technology. (2020, November). Social media can guide public pandemic policy: QUT research. QUT News. Retrieved January 11, 2021, from https://www.qut.edu.au/news?id=170713
Queensland University of Technology. (2020, June). First 100 days of COVID-19 - Australian Twitter users’ concerns mapped. QUT News. Retrieved January 11, 2021, from https://www.qut.edu.au/news?id=164908
Schweinberger, M., Haugh, M., & Hames, S. (2021). Analysing discourse around COVID-19 in the Australian Twittersphere: A real-time corpus-based analysis. Big Data & Society, 8(1), 20539517211021437. Retrieved February 17, 2022, from https://doi.org/10.1177/20539517211021437
Tulloch, A. I. T., Miller, A., & Dean, A. J. (2021). Does scientific interest in the nature impacts of food align with consumer information-seeking behavior? Sustainability Science. Retrieved April 30, 2021, from https://doi.org/10.1007/s11625-021-00920-3
Vasconcelos Silva, C., Jayasinghe, D., & Janda, M. (2020). What Can Twitter Tell Us about Skin Cancer Communication and Prevention on Social Media? Dermatology, 236(2), 81–89. Retrieved January 11, 2021, from https://www.karger.com/Article/FullText/506458
Yigitcanlar, T., Hewa Heliyagoda Kankanamge, R. N. E., Regona, M., Maldonado, A., Rowan, B., Ryu, A., Desouza, K., et al. (2020). Artificial intelligence technologies and related urban planning and development concepts: How are they perceived and utilized in Australia? Journal of Open Innovation: Technology, Market, and Complexity, 6(4), Article number: 187 1–21. Retrieved January 11, 2021, from https://eprints.qut.edu.au/206976/
Yigitcanlar, T., Hewa Heliyagoda Kankanamge, R. N. E., & Vella, K. (2020). How are smart city concepts and technologies perceived and utilized? A systematic geo-Twitter analysis of smart cities in Australia. Journal of Urban Technology, 1–20. Retrieved January 11, 2021, from https://eprints.qut.edu.au/199301/
Yigitcanlar, T., Kankanamge, N., Preston, A., Gill, P. S., Rezayee, M., Ostadnia, M., Xia, B., et al. (2020). How can social media analytics assist authorities in pandemic-related policy decisions? Insights from Australian states and territories. Health Information Science and Systems, 8(1), 37. Retrieved January 11, 2021, from https://doi.org/10.1007/s13755-020-00121-9
Yigitcanlar, T., Regona, M., Kankanamge, N., Mehmood, R., D’Costa, J., Lindsay, S., Nelson, S., et al. (2022). Detecting Natural Hazard-Related Disaster Impacts with Social Media Analytics: The Case of Australian States and Territories. Sustainability, 14(2), 810. Retrieved April 18, 2022, from https://www.mdpi.com/2071-1050/14/2/810

ADO ecosystem

2023-06-08T00:00:00Z

Introduction

The Australian Digital Observatory (ADO) is curating an ecosystem of resources for researchers working with digital human data from the internet. This ecosystem is intended to be useful to a wide range of disciplines, particularly humanities and social sciences, data science, public health, business, law, and many others.

Research projects are, by definition, all unique: the very purpose of research is to make new contributions. Therefore, there is no one-size-fits-all system for obtaining and pre-processing research data, especially in the humanities and social sciences. However, there are many overlapping methods and skills that are part of the research data lifecycle including (but not limited to) collecting, tidying, analysing, and publishing data.

The ADO ecosystem aims to equip researchers with a set of modular, open source, interoperable methods and processes to assist with these tasks. Researchers can engage with and benefit from the ADO ecosystem by accessing tools, training, and project support services.

This modular ecosystem approach gives flexibility, allowing researchers to pick and choose resources that are relevant and suitable for them, without being locked in to a specific stack (e.g., proprietary formats/tools, specific cloud infrastructure). The modular approach also recognises different entry levels in terms of researchers’ skills and needs. Furthermore, smaller tools are easier to maintain, and focusing on modularity and interoperability allows making use of and supporting existing tools and resources already in the community rather than reinventing the wheel.

In designing and curating the ADO ecosystem, we take inspiration from the open source software community and the Unix philosophy of designing interoperable tools that each do one thing well, and from foundational tooling ecosystems such as the R Tidyverse, which has well demonstrated the suitability of such a structure to the academic research environment. We hope to learn from these communities, and also to contribute back to them.

What is it for?

The ADO ecosystem aims to support researchers to make use of dynamic digital human data on the internet. This includes social media data, review websites, blogs, shared knowledge bases such as Wikipedia, and forums such as Reddit. Data collected from these sources can include the written words, metadata, platform structures and affordances, user contributions/changes - basically any and all of the facets of data that make up our digital internet activities.

Working with dynamic digital human data on the internet poses a number of challenges, especially for researchers from disciplines that traditionally do not require computational methods. These data-centric challenges can be framed in terms of the research lifecycle activities: explore; collect; tidy and model; store and organise; analyse; and publish. The ADO ecosystem provides resources to help researchers address these challenges in the form of:

Datasets and data sources to assist with finding and collecting data;
Software and technical utilities/systems to assist with processing and analysing data;
Documentation to guide researchers on best practice, workflows, and technical methods;
Services for more specialised support (e.g., data-related consulting, bespoke software development, training).

Example workflows

Here are some examples of research workflows that benefit from our ecosystem approach.

Twitter conversation analysis

A researcher is interested in the Twitter conversation around Australia’s federal election. The workflow for answering this problem would include a preliminary feasibility check (to ensure there are sufficient data for analysis), followed by data collection, processing, and analysis. To implement this workflow, we use the following tools:

ADOReD: high-level social media analysis via an interactive dashboard. It can be used to derive a list of tweet IDs relevant to the research question.
twarc hydrate: open-source command-line tool for extracting data from the Twitter API. It can be used to hydrate full tweet content from tweet IDs.
tidy_tweet: in-development open-source Python library for processing raw responses from the Twitter API into an SQLite database.
tweet_exploR: open-source R package providing descriptive statistics and visualisation of Twitter data within an SQLite database produced by tidy_tweet.
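
The storage and exploration steps in the workflow above can be sketched with Python's standard library alone. This is a minimal illustration that assumes simplified, hypothetical tweet records and a hypothetical single-table layout; it does not reproduce the tidy_tweet schema or the tweet_exploR outputs.

```python
import json
import sqlite3

# Hypothetical, simplified tweet records standing in for hydrated API output.
raw_lines = [
    '{"id": "1", "author_id": "10", "created_at": "2022-05-01T09:00:00Z", "text": "Election day!"}',
    '{"id": "2", "author_id": "11", "created_at": "2022-05-01T10:30:00Z", "text": "Queue at the polling booth"}',
    '{"id": "3", "author_id": "10", "created_at": "2022-05-02T08:15:00Z", "text": "Counting continues"}',
]

conn = sqlite3.connect(":memory:")  # use a file path to persist the database
conn.execute(
    "CREATE TABLE tweet (id TEXT PRIMARY KEY, author_id TEXT, created_at TEXT, text TEXT)"
)

# Parse each JSON line and insert it; duplicate IDs are silently skipped.
rows = [json.loads(line) for line in raw_lines]
conn.executemany(
    "INSERT OR IGNORE INTO tweet VALUES (:id, :author_id, :created_at, :text)", rows
)
conn.commit()

# A simple descriptive statistic: number of tweets per day.
per_day = conn.execute(
    "SELECT substr(created_at, 1, 10) AS day, COUNT(*) FROM tweet GROUP BY day ORDER BY day"
).fetchall()
print(per_day)
```

Keeping the data in a single SQLite file means the same database can be opened from Python, R, or a database client without any conversion step.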

The chart below illustrates the research workflow, as well as the specific tools used in each phase.

Multi-platform content analysis

Many research projects require data from multiple sources. For example, a researcher is interested in the dynamics and development of content and networks related to Critical Race Theory in light of the Black Lives Matter movement. Analyses would require data from Twitter, YouTube, Reddit, and Wikipedia, as well as scholarly citation networks. A set of bespoke tools was developed and used to support this workflow:

twarc hydrate: open-source command-line tool for extracting data from the Twitter API. It can be used to hydrate full tweet content from tweet IDs.
tidy_tweet: in-development open-source Python library for processing raw responses from the Twitter API into an SQLite database.
youte: open-source tool developed by QUT Digital Observatory to assist with collecting and processing YouTube data.
reddit_collector: custom script developed by QUT Digital Observatory to assist with collecting and processing Reddit data via the open Pushshift data platform.
wikipedia_collector: custom script developed by QUT Digital Observatory to assist with collecting and processing Wikipedia data.
citation_collector: custom script developed by QUT Digital Observatory to assist with collecting and processing scholarly citation data from the OpenAlex database.

What other resources are there for researchers?

The ADO Ecosystem is a part of a broader community and ecosystem of open source or openly licensed resources and methods. We’d like to share a list of projects whose work we make use of in our own resources, and which we also frequently recommend to researchers to use either with our workflows or in their own right.

Data collection resources:

twarc: Command-line utility and Python library for collecting Twitter data
Wayback Machine: The Internet Archive and its historical collection of webpages
Webrecorder: Utility for creating and curating targeted datasets of webpage archives
Pushshift API: Reddit data archive
GLAM Workbench: A broad group of utilities and instructional examples for collecting and utilising data from galleries, libraries, archives, and museums

Data analysis resources:

Language Technology and Data Analysis Laboratory (LADAL): Collection of tutorials and examples for computational linguistics and text analytics, primarily written in R
Australian Text Analytics Platform (ATAP): Tools and training for analysing, processing, and exploring text

Data storage and organisation tools:

DBeaver: Multiplatform database client
SQLite: File-based database system
Datasette: Web-based interface for SQLite databases
ClickHouse: Fast analytical column-store database

Computational skills for research resources:

The Carpentries: Community training organisation for teaching researchers computational and data science skills
The Programming Historian: A broad collection of computational skills tutorials specifically targeted at humanities and social science researchers
ResBaz: A worldwide collection of gatherings centred around digital skills for research
Hacky Hours (Queensland): Peer support sessions for technical skills for research

Foundational open source projects and communities:

The R language, and the R Tidyverse (the ADO 💙s RLadies)
The Python language and community (the ADO 💙s PyConAU and PyLadies)
The Unix, Linux, and Ubuntu operating systems and communities
The broader open source and open data communities (the ADO 💙s linux.conf.au)

The future of the ADO ecosystem

The current state of the ADO ecosystem is just the start of our journey! We plan to continue creating more tutorials, tools, documented workflows, and more, as well as refining and developing the resources already published so that they stay relevant and useful to researchers’ needs. We always love to hear from researchers and the rest of the community so that we can share, discuss, and work together to solve problems and enable research. Reach out to us anytime, and watch this space!
