Usability testing for a bullet journal app prototype

Feb 1, 2020

A “usable” product is defined by its ability to let users carry out an intended task in an expected manner without doubt, hesitation or hindrance. Usability testing is essential to developing a successful digital product. Running usability tests to gather real-world data about how users use the product is vital to discovering potential usability problems.

Inductive vs Deductive Approach

While a deductive approach is aimed at testing a theory, an inductive approach is used to generate new theories from data. A deductive approach is best suited to situations where specific questions, based on existing data and expectations, need to be answered. In contrast, an inductive approach sets all expectations aside and gives participants control over the direction of the study. An inductive approach provides a deeper understanding of participant perspectives and was therefore deemed suitable for this usability test, whose purpose was to find out whether potential users were able to use the prototype and what problems, if any, they faced.

An inductive approach uses cause-and-effect reasoning in which the arguments support the hypotheses. In this case, the inductive hypothesis is that if the test participants are unable to use the product effectively, all potential users will have similar difficulties, and vice versa. A bottom-up approach was used to direct the usability strategy, and an in-depth observation of the situation was used to test the hypothesis.

UI Design for the prototype

To guide the UI design process for the prototype, a user research survey was conducted. A random sampling technique was used to identify participants for the survey. A filter question was used to exclude participants who did not use any type of planner or diary. The survey was completed by 493 valid participants, of whom over 90% mostly used traditional pen and paper planners. Among the 451 participants who used traditional methods, over 93% used bullet journals, and a large majority stated that the biggest drawback of bullet journals was that keeping one was time-consuming. This may have been the driving factor behind over 95% of the participants expressing interest in trying a digital bullet journal app.

Based on the results of the user research survey, the target audience was identified as women between the ages of 15 and 55 who are interested in keeping bullet journals. Over 83% of the target audience was unfamiliar with any digital bullet journal apps on the market. The unfamiliarity of users with digital planning app UIs may have an impact on all attributes of usability: usefulness (the ability of the product to help users achieve their goals), efficiency (the speed at which a task can be completed accurately), effectiveness and learnability (the degree to which the product behaves as users expect it to), and satisfaction, which depends on the other four attributes. Results of the user research survey were used to guide the UI/UX design of the digital prototype and mimic the user interaction journey of a traditional bullet journal user. The usability metrics and tasks outlined in the following sections were based on the bullet journal tasks that the majority of survey respondents said they undertook.

Test Metrics

Navigation

Ease of navigation is a critical factor that determines the usability of a digital product. Improper or confusing navigation leaves users lost and in many cases leads to frustration and high dropout rates. Furthermore, user perception of navigational controls may be biased by expectations that are influenced by past experiences, current context and future goals. Although the design implications of perceptual bias were considered during the UI design stage, the inductive hypothesis points to the need to test the navigation and verify that all users would interpret it in the same way.

The UI Design framework by Fresh Consultancy identified three criteria for navigation related usability: Placement, Clarity and Content.

Placement: The placement of action options has an impact on usability. Research has shown that users scan pages in an F-shaped pattern, suggesting that menus are best placed horizontally at the top or vertically on the left. However, the prototype uses a bottom horizontal menu. It was expected that this placement would not affect usability because users are familiar with the bottom horizontal navigation menus used by apps such as Pinterest and Instagram, which were popular among the target audience.

Clarity: An important part of navigation is making users aware that they are on the right path to completing their task. Information foraging theory has been used to guide clear navigation design. It was expected that the labelling of navigation items would assure users that they were on the right path and nudge them to continue towards task completion.

Content: Using content to guide navigation creates an intuitive and broad information architecture that provides a better usability experience, as it reduces the number of clicks and consequently the number of decisions to be made. The prototype includes three items in the navigation menu, with related content bundled under each item. It was expected that users would need fewer clicks to complete a task because of the shallow information architecture implemented within the UI.

Layout (Icons)

Icons are key visual elements of UI due to their various benefits ranging from easy targeting and touchability to their ability to save valuable screen space. However, icons can negatively impact usability, especially when functionality is hidden behind icons that are hard to recognise. There are four quality criteria for icons: Findability, Recognition, Information Scent and Visual Appeal.

Findability: This criterion defines the ease with which users are able to find the icon on a page. In order to test this criterion, icons must be shown in the context of the full UI.

Recognition: This criterion deals with the ability of users to understand what an icon represents. Although it is recommended that recognition testing of icons be done out of context, in the absence of labels and other UI elements, this usability strategy tests icon recognition in context. An icon recognition test conducted by Usability Hub illustrates the importance of context using the example of a share icon that is easily recognised by users of Apple products but less so by non-users. Correspondingly, context is considered important in the testing of icon recognition for the bullet journal app. For example, the heart icon used to represent moods/feelings would, taken out of context, represent favourites or likes, due to user familiarity with that icon's function in previous digital experiences.

Information Scent: This criterion evaluates the ability of users to correctly guess the result of their interaction with an icon. In this usability test, the information scent of icons is tested in context to identify if users can correctly infer icon functionality.

Visual Appeal: Icons contribute to brand personality through style and colour. The attractiveness of the icons used is tested by asking users to rate visual appeal on a scale of 1–7.

Test Tasks

In order to test navigation and layout, four scenarios resembling potential real life use of the app were designed.

Scenario 1

Metrics tested: Navigation, Layout

You have found an interesting image that you’d like to add to your “Doodles & Calligraphy” inspiration board. Starting from the home page, how would you do this?

Scenario 2

Metric tested: Navigation

You have just bought some apples and want to cross them off your shopping list. Starting from the home page, how would you do this?

Scenario 3

Metric tested: Layout

You would like to change the background colour of your shopping list to blue. Starting from the home page, how would you do this?

Scenario 4

Metrics tested: Navigation, Layout

You would like to view statistics for your moods on your mood tracker page. Starting from the shopping page, how would you do this?

Measuring success rate

Three factors are used to measure the success rate of the usability test: Time taken, Task completion rate and User satisfaction.

Time taken: The time taken to complete a task is an important measure in diagnosing usability issues, with long task times indicating problems in interaction with the UI.

Task completion rate: This is defined as the percentage of tasks completed correctly by the user. This measure is analysed in conjunction with eye tracking data to formulate conclusions regarding the reasons for failure or success.

User satisfaction: User satisfaction is the third independent measure of usability and is tested using a post-test questionnaire. A Likert scale questionnaire with a balance of positively and negatively phrased questions was used to avoid acquiescence bias.
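As a rough illustration of how these three measures could be computed from raw session records, here is a minimal Python sketch. The data, the function names and the 5-point scale are all assumptions made for the example (the article does not state the scale used in the questionnaire), and the numbers are not results from this study. Negatively phrased items are reverse-scored so that a higher value always means higher satisfaction.

```python
from statistics import mean

# Made-up records for one task: (seconds taken, completed correctly?).
# These are illustrative values only, not data from the study.
task_results = [(95.0, True), (60.5, True), (120.0, False), (48.0, True), (75.5, True)]


def completion_rate(results):
    """Percentage of attempts that were completed correctly."""
    completed = sum(1 for _, done in results if done)
    return 100.0 * completed / len(results)


def mean_time(results):
    """Average time on task in seconds, over all attempts."""
    return mean(seconds for seconds, _ in results)


def satisfaction_score(responses, negative_items, scale_max=5):
    """Mean Likert score after reverse-scoring negatively phrased items.

    scale_max=5 is an assumption; pass the real scale size if it differs.
    """
    adjusted = [
        (scale_max + 1 - score) if item in negative_items else score
        for item, score in responses.items()
    ]
    return mean(adjusted)


# Hypothetical questionnaire: q2 and q4 are the negatively phrased items.
responses = {"q1": 4, "q2": 2, "q3": 5, "q4": 1}

print(completion_rate(task_results))
print(mean_time(task_results))
print(satisfaction_score(responses, negative_items={"q2", "q4"}))
```

Reverse-scoring before averaging matters here: without it, agreement with a negative statement would raise the satisfaction score instead of lowering it.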

Pre-defined benchmarks

In order to analyse the quantitative data collected, a set of benchmarks was defined based on usability measures for app tasks similar to the test tasks. Mobile apps such as Pinterest, Notepad and MyFitnessPal were chosen because tasks similar to the test tasks could be carried out on them. Three users were asked to carry out tasks on these apps that mimicked the test tasks, and the benchmarks were set based on the results.

Participants were encouraged to complete the benchmark tasks. This may have affected completion rates, as a few participants showed signs of frustration and readiness to quit certain tasks but were pushed to try to complete all of them. A second limitation was that some participants were familiar with certain apps used to set the benchmarks. These participants performed tasks quickly and with ease, which may have affected the average time taken to complete tasks.

Participant recruitment

Due to time constraints, a non-probability convenience sampling technique was used. However, the importance of recruitment criteria and selecting suitable participants was not overlooked. Results of the test may not be accurate if the test base does not reflect the target audience. Therefore, participant recruitment was based on the results of the user research survey. The survey showed that over 97% of potential users were female. Therefore, the selected participants were female users of planning diaries, journals and/or apps. Furthermore, the survey indicated that the potential target market for the app ranged in age from 15 to 55. Therefore, female participants across a wide age range were recruited. A majority of both traditional and digital planner users stated that they had never heard of or used any digital bullet journal apps. Therefore, the participants recruited for the study had no previous experience of using digital bullet journal apps.

It is recommended that at least 10–12 participants be used for a usability study to produce statistically significant results. However, studies have shown that almost 80% of the usability problems presented by a given set of tasks were revealed after testing the product with 4 users. Other studies showed that high-quality user interfaces and usability are achieved through competitive testing and parallel, iterative design. Therefore, a total of 5 participants were recruited, and this usability test was treated as the first of a set of tests to identify and improve product weaknesses.
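One common way to reason about small sample sizes is a simple binomial model, in which each participant independently encounters a given problem with probability p, so the expected share of problems seen at least once by n participants is 1 - (1 - p)^n. The Python sketch below is only an illustration of that model: the function name and the value p = 0.31 are assumptions chosen for the example, not figures measured in this study, and real detection rates vary by product and task.

```python
def share_of_problems_found(n_users, p_detect):
    """Expected share of problems seen at least once by n_users testers,
    assuming each tester independently hits a problem with probability p_detect."""
    if not 0.0 <= p_detect <= 1.0:
        raise ValueError("p_detect must be between 0 and 1")
    return 1.0 - (1.0 - p_detect) ** n_users


# ASSUMED per-user detection probability, for illustration only.
P_DETECT = 0.31

for n in (1, 4, 5, 10):
    print(n, round(share_of_problems_found(n, P_DETECT), 2))
```

Under this assumed value the curve flattens quickly after the first few participants, which is the reasoning behind running several small iterative tests rather than one large one.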

Recruited participants & their key characteristics

Because it was decided early on that the usability test would take place on the university campus, all participants were recruited on campus to ensure convenient participation and encourage a high show-up rate. No monetary or non-monetary incentives were provided.

Running the test

The test was run using an Open Retrospective Probing technique. Participant reactions were observed during the process and notes were made. These notes were used as the basis for retrospective probing after the session to better explain specific participant reactions to certain tasks. Furthermore, participants were encouraged to freely express their thoughts and opinions on the prototype during the retrospective probing session. The retrospective probing (RP) technique was selected over the CTA (Concurrent Think Aloud) technique so as not to interfere with usability metrics such as accuracy and time spent on task. The eye tracking usability test was run using Tobii eye tracking software in the SSU usability lab. Due to time constraints, a pilot test was not conducted. However, software and processes were checked beforehand to ensure smooth sessions. Details about the study and the processes used were explained to the participants, and all participants were given a chance to clarify any doubts they may have had prior to taking part in the study. Participants were made aware of the recording processes used and were asked to sign a consent form.

All sessions started with calibrating the eye tracker to each participant's eyes. Participants were then given the set of four tasks to complete. In addition to the eye tracking data collected, participants were asked to fill out a short survey to understand their needs and their level of satisfaction with the navigation and layout features of the tested prototype.

Data collection

Eye-tracking data was collected and visualised through gaze replays, gaze plots and heat maps.

Gaze replays were recorded as these are the most accurate method of analysing eye-tracking data. Due to the time-consuming nature of gaze replay analysis, gaze plots and heat maps were also used during data analysis.

As the metrics tested were navigation and layout, it was necessary to collect and visualise data that revealed the order, location and duration of fixations. Therefore, gaze plots were used to analyse usability.

In order to analyse the visual attention of the participants, heat map data was collected. The data was used to analyse how participants' gaze was distributed over the stimulus.
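To make the fixation-based measures more concrete, the Python sketch below shows one way that "time to notice" an element could be derived from a list of fixations and a rectangular area of interest (AOI), such as the bottom navigation menu. Everything in it is hypothetical: the `Fixation` and `AreaOfInterest` types, the screen coordinates and the sample values are invented for illustration and do not reflect the Tobii export format or the data recorded in this study.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_s: float     # seconds since the task started
    duration_s: float  # how long the gaze rested at this point
    x: float           # screen position in pixels
    y: float


@dataclass
class AreaOfInterest:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fix):
        """True if the fixation point lies inside this rectangle."""
        return self.left <= fix.x <= self.right and self.top <= fix.y <= self.bottom


def time_to_first_fixation(fixations, aoi):
    """Start time of the earliest fixation inside the AOI, or None if never fixated."""
    hits = [f.start_s for f in fixations if aoi.contains(f)]
    return min(hits) if hits else None


def total_dwell_time(fixations, aoi):
    """Summed duration of all fixations that fall inside the AOI."""
    return sum(f.duration_s for f in fixations if aoi.contains(f))


# Hypothetical AOI for a bottom menu on an assumed 375 x 812 px screen.
bottom_menu = AreaOfInterest(left=0, top=740, right=375, bottom=812)

# Invented fixations in the order they occurred (what a gaze plot numbers).
fixations = [
    Fixation(0.0, 0.40, 180, 120),
    Fixation(0.5, 0.30, 190, 400),
    Fixation(71.0, 0.60, 200, 780),
    Fixation(72.0, 0.25, 90, 770),
]

print(time_to_first_fixation(fixations, bottom_menu))
print(total_dwell_time(fixations, bottom_menu))
```

A gaze plot corresponds to the ordered fixation list, while a heat map aggregates fixation durations by location, which is roughly what the dwell-time sum does for a single region.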

The following table shows the average time taken by participants to complete the tasks. All tasks were successfully completed by all participants.

Key results of the user satisfaction survey:

Analysis

It was hypothesised that users would interact better with a bottom horizontal menu due to familiarity with similar menus on other creative apps such as Instagram. However, gaze replays revealed that participants took an average of 1 minute and 12 seconds to notice the bottom horizontal navigation menu. On being questioned retrospectively, participants mentioned that they expected all the navigation links to be collectively placed on the top.

In agreement with existing research, heat maps revealed that participants scanned pages in an F-shaped pattern, suggesting that the navigation menu would be more effective if placed horizontally on the top.

Text labels were used for most navigational elements in order to assure participants that they were on the right path to completing tasks and nudge them towards task completion. However, elements on the main horizontal navigation bar did not have any text labels, which participants pointed out as one of the reasons for their inability to quickly spot these navigation icons. Three of the five participants stated that text labels were extremely helpful in navigating through the app. Furthermore, all participants expressed the need for text labels on all elements for easy navigation.

It was hypothesised that implementing a shallow information architecture within the navigation menu would contribute to a positive user experience because fewer clicks would be required to complete a task. This could not be confirmed during this test because of the layout problems that were pointed out with the icons used in the navigation menu, as well as the problems caused by the lack of text labels.

Heat map data shows that participants’ visual attention was focused on bigger icons on the top and centre

On average, participants took 72.4 seconds to notice the icons on the bottom horizontal menu. Participants stated that their focus was on the bigger, labelled icons on the top and centre of the page, as confirmed by heat map data. Participants described difficulty in understanding what certain icons represented. Furthermore, the inability of participants to recognise certain icons may have affected their ability to correctly guess the result of their interaction with these icons. For example, all participants had difficulty understanding what the “index” icon represented and relied on other clues, such as the change in icon colour when active, to decipher its meaning. Furthermore, some participants found the use of icons such as the “heart” icon misleading, as they did not associate the heart icon with “moods” and would rather have associated “moods” with a face icon.

Although participants had problems with the findability and recognition of icons, they agreed that the icons were generally visually appealing and matched the overall look and feel of the app.

On comparing the average task completion time for all four tasks with the recorded benchmark completion times, it was observed that although participants took considerably longer to complete task 1 on the tested prototype, tasks 2–4 were completed within a fraction of the recorded benchmark times. Furthermore, participants who had previously used digital planning apps completed the first task with more speed and ease than participants who stated that they only used traditional planning methods. All participants attributed the faster completion times of tasks 2–4 to being more familiar with the UI after completing task 1. This suggests that although users were previously unfamiliar with digital bullet journal apps, the existing prototype UI and UX were effective and learnable once users had the time to initially explore and interact with the app.

Gaze plot of 21 yo participant vs gaze plot of 55 yo participant for task 1

While gaze plots revealed a contrast between the gaze of participants of different age groups, all participants completed tasks 2–4 in significantly less time than task 1. This pattern indicated that user unfamiliarity may not be an issue, as the UI was easily and quickly learned by participants of all age groups. This was also supported by the results of the survey, in which all participants agreed that the navigation of the app was simple and predictable.

Gaze plot of 21 yo participant vs gaze plot of 55 yo participant for task 3

Recommendations

Recommendations for changes in information architecture

Data analysis revealed that users were able to understand the information architecture of the prototype, navigate efficiently and use the app effectively once they had familiarised themselves with the user interface. Therefore, it may be useful to consider implementing a help menu to allow first-time users to familiarise themselves with app features.

Recommendations for changes in interaction design

The position and size of the main navigation menu, placed horizontally at the bottom, were ineffective, as users failed to notice the menu and focused on the bigger and bolder elements on the top and centre of the page. It is suggested that the size of the icons on the navigation menu be increased and that the menu be moved to the top of the page, where it would be noticed more quickly by users.

Most icons were found and recognised by users; however, participants unanimously stated that using text labels for all icons would have facilitated navigation and increased effectiveness. Therefore, in addition to using text labels, it is recommended that further research be carried out to design icons that the target audience would easily recognise.