I say triggerfish, you say humuhumunukunukuapua’a: why we need an ontology of aquatic foods
CGIAR Initiative on Aquatic Foods
Different fish have different names. And fair enough: a tuna isn’t a hake, which, equally, isn’t a salmon.
But often there are multiple names for the same fish. These can vary between countries, regions, ethnic groups, language groups, and more. For example, Hawaii’s prized reef triggerfish is known locally as humuhumunukunukuapua’a. Similarly, a scup – an Atlantic schooling fish – is also a sheepshead, a porgy, a sea bream and a silver snapper all at the same time. In Bangladesh, a pakal or bamush is also a Bengal eel and a onegill eel. It just depends who you ask.
So, while variety might be the spice of life, having different names for the same fish has the potential to cause headaches. Let’s say you’re trying to study fish stocks in a given place. If I count reef triggerfish and you count humuhumunukunukuapua’a, we might come away with results that misrepresent the number of fish in the fishery. After all, we’ve potentially double-counted the same fish.
Fortunately, Carl Linnaeus solved all this in the 18th century. His hierarchical system for naming, describing and classifying plants, animals and microorganisms, which underpins modern taxonomy, means we can work out when we’re referring to the same fish, even when we have different names for it.
But there’s a “but”.
There’s a whole world of important fish-related language – let’s call it “fish stuff” – that isn’t standardised and organised in such a systematic way. For example, is that a gill net or a brail? Were those mussels harvested from a lake, pond or wetland? Were those humuhumunukunukuapua’a caught, landed, harvested, or do all those words mean exactly the same thing? Multiply all the different terms for “fish stuff” across countries, regions, ethnic groups, language groups, and more, and your data is going to get very messy. And this makes it difficult to compare and contrast data from different sources.
Data analysts recognise this as an issue of ontology.
An ontology describes the way in which semantic (‘meaning-based’) knowledge is organised. Imagine you have a pile of individual family photos with names on the back but no context about how these people are related. You might also discover that the same people have different names: see old Uncle Bob there? He’s also brother Robert, cousin Bobby, Grandpa and Dad; again, it depends who you ask. An ontology is like a detailed family tree: it gives each thing a unique identifier and uses that identifier to link things together by name and category. This serves as a bridge, allowing users to see which things are connected, even when they appear in completely different contexts.
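For readers who work with data, the core idea of a unique identifier can be sketched in a few lines of Python. This is a minimal illustration, not the Aquadata ontology: the lookup table, the `aq:` identifiers and the function names are hypothetical, and a real ontology must also cope with one local name referring to several different species, which this sketch ignores.

```python
from collections import Counter

# Hypothetical lookup table: each local name points to one concept ID.
# The "aq:" IDs are invented placeholders, not real FishBase, AGROVOC
# or Aquadata identifiers.
NAME_TO_ID = {
    "reef triggerfish": "aq:0001",
    "humuhumunukunukuapua'a": "aq:0001",
    "scup": "aq:0002",
    "porgy": "aq:0002",
    "pakal": "aq:0003",
    "bamush": "aq:0003",
    "bengal eel": "aq:0003",
}


def normalise(name: str) -> str:
    """Trim, lower-case and replace curly apostrophes so spellings match."""
    return name.strip().lower().replace("\u2019", "'")


def totals_by_concept(*surveys):
    """Add up counts per concept ID instead of per local name."""
    totals = Counter()
    for survey in surveys:
        for name, count in survey:
            # Raises KeyError for a name the lookup table does not know.
            totals[NAME_TO_ID[normalise(name)]] += count
    return totals


# Two hypothetical surveys recording the same fish under different names.
survey_a = [("Reef triggerfish", 4), ("Scup", 10)]
survey_b = [("Humuhumunukunukuapua\u2019a", 3), ("porgy", 2), ("pakal", 5)]

raw_names = {normalise(n) for s in (survey_a, survey_b) for n, _ in s}
print(len(raw_names))  # 5 apparent species when grouped by name
totals = totals_by_concept(survey_a, survey_b)
print(len(totals))     # 3 distinct concepts once names are resolved
print(dict(totals))    # {'aq:0001': 7, 'aq:0002': 12, 'aq:0003': 5}
```

Grouped by name, the two surveys appear to contain five kinds of fish; grouped by identifier, they contain three, and the triggerfish counts from both surveys end up in a single total.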
The challenge is that there is currently no ontology for small-scale fisheries and aquaculture research. In other words, there are no unique identifiers, and therefore no organised framework for representing and understanding key concepts, relationships and entities within the fisheries and aquaculture domain. This means fish scientists are not always talking about the same things in the same way, and that can cause problems: some data can’t be easily aggregated; analysis and modelling might be less accurate; and language-based artificial intelligence tools like ChatGPT might struggle to find and use the data meaningfully.
So yes, headaches.
Nevertheless, the Aquadata team of the CGIAR Initiative on Aquatic Foods is rising to the challenge. It’s developing a comprehensive ontology of small-scale fisheries and aquaculture to help clean up datasets and provide scientists and policymakers with a more accurate picture of fisheries and aquaculture.
But it’s no simple task: there are currently around 35,000 species of finfish and 70,000 species of aquatic invertebrates in the FishBase.org and SeaLifeBase.org databases, the most reliable information sources for global fish and sea-life species. And yes, that’s “currently”: these numbers continually change due to natural adaptation, hybridisation and scientific reclassification, and a new fish species is identified approximately every week. Then there are all the aquatic plants and algae to consider, plus a whole world of additional “fish stuff” language that is critical to understanding the aquatic food ecosystem. It’s a big job.
So far, the team has created a first version of the ontology. It builds on fisheries and aquaculture terminology from AGROVOC (the UN’s multilingual controlled vocabulary for food and agriculture); terminology used in WorldFish innovations such as the Peskas fisheries monitoring system and Lab-in-a-Backpack (a rapid diagnostic workflow for aquaculture pathogens); and the FishBase.org and SeaLifeBase.org databases. Where terms were not already defined, the team crafted definitions from credible scientific sources.
Next, the team will organise the terminology into meaningful, computer-readable “knowledge domains”. The aim is to supercharge the data: good labelling should make it findable, accessible, interoperable and reusable online (the FAIR principles), so that humans and artificial intelligence alike can interpret it reliably. That, in turn, would provide the basis for robust insights and sound policy recommendations.
Now, that really would be something to sing, shout, laugh, and dance about.