
Measuring the Effectiveness of Privacy Policies for Voice Assistant Applications

Karuskar Devangkumar Dhansukbhai, Bimlesh Kumar Shah


IIT Bombay, Mumbai.

Abstract

Virtual assistants (VAs) like Amazon Alexa and Google Assistant are quickly and easily integrated into people's daily lives, and the VA market is among the fastest growing. The rising use of VA services poses privacy concerns, such as the potential for private conversations and sensitive information to be leaked. Privacy policies are meant to address users' privacy concerns and inform users about data collection, storage, and sharing practices. Third-party developers can create new voice apps and upload them to the app stores of the VA platforms (Amazon Alexa and Google Assistant), and developers of voice apps must offer privacy policies that disclose their apps' data practices.

We want to learn more about the quality and usefulness of the privacy policies offered by developers in today's app stores. We examined 1,000 skills on Amazon Alexa and 500 actions on Google Assistant. Surprisingly, Google and Amazon themselves have official voice apps that violate their own privacy-policy requirements.

1. Introduction:

Virtual assistants (VAs) like Amazon Alexa and Google Assistant have become a part of our everyday lives. By 2023, the number of VA devices in use is expected to reach 8.4 billion, which is larger than the present global population [1]. Ordering everyday items, managing bank accounts, operating smart home gadgets, and recommending clothing stores and new designs are just a few of the tasks that people ask VAs to handle. Despite the many useful benefits, there is growing worry over VA users' privacy threats [2, 3, 4, 5, 6, 7, 8, 9].

Every user wants their data kept safe and their privacy maintained. A French data protection regulator penalized Google €50 million for failing to comply with the EU General Data Protection Regulation (GDPR). This sanction was imposed because the company did not have a solid enough privacy policy and did not disclose adequate information to users [10]. The Federal Trade Commission (FTC) or other regulatory bodies may take public enforcement action as a result of such anomalies [11]. For example, the FTC penalized Path (a mobile app operator) $800,000 for failing to disclose all of its data practices in its privacy policy [12].

Third-party developers can use VA platforms to create new voice apps (called skills on Amazon's platform and actions on Google's platform, respectively) and distribute them through the app stores. Voice-app developers are expected to provide privacy policies and alert users to their applications' data practices in order to comply with privacy legislation (such as COPPA [13]) and safeguard consumers' privacy. A proper privacy policy is typically a document that answers at least three key questions [14]: 1) What kind of data is being gathered? 2) What is the purpose of this data? and 3) With whom is the information being shared? There are a large number of third-party skills and actions in the respective marketplaces, and developers must post privacy policies for their voice apps. These policies can be varied and poorly written, leading many users to disregard the privacy policy and choose not to read it. As a result, customers may use a privacy-sensitive service without fully understanding what data is acquired about them and what the developer intends to do with it. On the other hand, the ability to control VA devices such as Amazon Echo and Google Home with voice commands, without having to physically access them, is a feature that makes them appealing. Despite the convenience, it complicates the creation of effective privacy notifications that would allow users to make educated privacy decisions.
The following two research questions (RQs) are the focus of this study:

• RQ1: How good are the privacy policies supplied by voice-app developers across the various VA platforms? Do they have clear and comprehensive privacy policies, as required by the VA platforms?
• RQ2: Can we trust a supposedly well-written privacy policy to contain the critical information about the service given to users? Can we discover inconsistencies in voice-app privacy policies?

On both the Amazon Alexa and Google Assistant platforms, we conduct the first empirical study to assess the effectiveness of voice-app developers' privacy policies. Previously, no such endeavour had been documented.

2. Literature Review:

The privacy policy of rubetek.com states that they collect personal data to provide their services, and that if users fail to provide this personal data, they will not provide their products and services. They collect data related to user activities and functionalities, such as sleeping patterns, movement data, and smart-alarm-related information. When users share content with family or friends, they also collect personal information about those people, such as names, email addresses, telephone numbers, and mailing addresses. According to the policy, these data can be used for various purposes, such as tracking orders, communicating with customers, and conducting investigations regarding products and services. Users can withdraw their consent for the collection, use, and/or disclosure of their personal data.

Google has provided guidance to developers building on the Actions on Google platform. If a customer's action violates the policy, the developer will receive a notification with a specific reason for removal or rejection. Google may take action based on a number of factors including, but not limited to, a pattern of harmful behaviour or a high risk of abuse.

3. Data collection:

The first problem we faced during data collection was the lack of uniformity in privacy policy files. Given the privacy policy links, we observed five types of policy pages: i) normal HTML pages; ii) PDF pages; iii) Google Docs and Google Drive documents; iv) txt files; and v) other types of files (e.g., doc, docx, or rtf). For normal HTML pages, we used the web driver [12] tool to collect the webpage content when the pages are opened. For the other types of pages, we downloaded the files and then extracted the content from them. Finally, we converted all the privacy policies from their different formats to the .txt format.

3.1 Privacy Policy Dataset:

We collected 64,720 unique skills under 21 categories from Alexa's skills store, and 17,952 of these skills provide privacy policy links. Among the 2,201 Google actions we collected, 1,967 have privacy policy links.

We calculated the number of words in the document for each skill/action with a valid policy link, which gives the cumulative distribution function of privacy policy length. The average length is 2,336 words for Alexa skills and 1,479 words for Google actions. We also observed many privacy policies so concise that they are not informative.

Voice app name            Skill/action   Privacy policy
Story time                Skill          App is now unavailable
KidsBrushYourTeeth Song   Skill          Nil

3.2 Description Dataset:

Voice-app descriptions are used to provide general information and usage instructions to users. They may also describe data practices of the voice app (e.g., the data that must be collected to achieve its functionality). We collected voice-app descriptions and used them as baselines to detect potentially inconsistent privacy policies. In our dataset, all skills/actions come with descriptions.

4. Capturing data practices:

We develop a keyword-based approach using NLP. However, we want to emphasize that we do not claim to comprehensively resolve the challenges of extracting data practices (i.e., data collection, sharing, and storing) from natural language policies.
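The keyword idea can be sketched in a few lines. The snippet below is only a dependency-free illustration (the actual pipeline uses spaCy for tokenization and lemmatization); the two dictionaries are small excerpts of the 40-verb and 16-noun sets listed in Section 4, and the regex tokenizer is a crude stand-in for real NLP preprocessing:

```python
import re

# Small excerpts of the dictionaries in Section 4 (40 verbs, 16 nouns in full).
PRACTICE_VERBS = {"collect", "gather", "share", "store", "use", "disclose", "retain"}
DATA_NOUNS = {"name", "email", "address", "location", "phone", "data"}

def extract_practices(sentence: str) -> set:
    """Return (verb, noun) pairs that co-occur in one policy sentence.

    Lowercases the sentence, tokenizes on non-letter characters, and
    matches tokens against the verb and noun dictionaries.
    """
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    verbs = tokens & PRACTICE_VERBS
    nouns = tokens & DATA_NOUNS
    return {(v, n) for v in verbs for n in nouns}

policy_sentence = "We collect your email and name, and we may share your location."
print(sorted(extract_practices(policy_sentence)))
```

Note that this stand-in misses inflected forms such as "collecting", which is exactly the gap that spaCy's lemmatization closes in the real pipeline.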
Instead, we mainly focus on obtaining empirical evidence of problematic privacy policies using a simple and accurate (i.e., in terms of the true positive rate) approach.

We used spaCy, a free, open-source library for advanced Natural Language Processing (NLP) in Python. spaCy is explicitly designed for production use and helps you build applications that process and "understand" large volumes of text. It can be used to build information extraction and natural language understanding systems, or to pre-process text for deep learning.

4.1 Verb set related to data practices:

Collect: gathering or acquiring data from users. Use: indicates an app would use or process data. Retain: storing or remembering user data. Disclose: indicates an app would share or transfer data to another party.

4.2 Noun set related to data practices:

From Amazon's skill permission list [5] and the Amazon Developer Services Agreement [3], we manually collected a dictionary of 16 nouns related to data practices. Table 2 lists the dictionary of 40 verbs and 16 nouns that we used in our privacy policy analysis.

Verb set   Access, Ask, Assign, Collect, Create, Enter, Gather, Import, Obtain, Observe, Organize, Provide, Receive, Request, Share, Use, Include, Integrate, Monitor, Process, See, Utilize, Retain, Cache, Delete, Erase, Keep, Remove, Store, Transfer, Communicate, Disclose, Reveal, Sell, Send, Update, View, Need, Require, Save
Noun set   Address, Name, Email, Phone, Birthday, Age, Gender, Location, Data, Contact, Phonebook, SMS, Call, Profession, Income, Information

4.3 Phrase extraction:

We combined two basic phrases to generate a longer phrase if they share the same verb. The combined phrase follows the patterns "subject+verb+object" or "subject+is+passive verb". For example, for the sentence "Alexa skill will quickly tell you the name and time of the next meeting on your Outlook calendar", we obtained the phrase "Alexa tell name, meeting, calendar".

5. Inconsistency checking:

We collected voice-app descriptions and used them as baselines to detect potentially inconsistent privacy policies. In our dataset, all skills/actions come with descriptions.

For example, the description of the skill "Thought Leaders" mentions "Permission required: Customer's Full name, Customer's Email Address, Customer's Phone number", but none of these are mentioned in its privacy policy. We consider this an incomplete privacy policy.

6. Missing required privacy policies:

We have shown that some Google actions do not provide a privacy policy, which violates Google's own restriction: "Google require all actions to post a link to their privacy policy in the directory". To collect users' personal data for use within a voice app, developers can use the built-in feature of collecting personal information directly from the user's Amazon account after obtaining permission from the user. This permission is requested when the skill is first enabled. While this is appropriate and respects users' privacy, there is another channel that can be misused for collecting personal information: a developer can build a skill that asks for personal information from the user through the conversational interface. Both Amazon and Google prohibit the use of the conversational interface to collect personal data, but in the case of Amazon this is not strictly enforced in the vetting process. By collecting personal information in this manner, the developer can avoid adding a privacy policy URL to the skill's distribution requirements. This is possible because Amazon requires only skills that publicly declare that they collect personal information to mandatorily have a privacy policy.
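The description-versus-policy comparison from Section 5 can be sketched as follows. This is a simplified illustration built around the "Thought Leaders" example above; the noun dictionary is abbreviated and the tokenizer stands in for the paper's full NLP preprocessing:

```python
import re

# Abbreviated excerpt of the 16-noun dictionary from Section 4.2.
DATA_NOUNS = {"name", "email", "address", "phone", "location"}

def tokens(text: str) -> set:
    """Lowercase word tokens, a stand-in for full NLP preprocessing."""
    return set(re.findall(r"[a-z]+", text.lower()))

def missing_in_policy(description: str, policy: str) -> set:
    """Data nouns the description mentions but the privacy policy omits.

    A non-empty result flags the policy as potentially incomplete.
    """
    return (tokens(description) & DATA_NOUNS) - tokens(policy)

description = ("Permission required: Customer's Full name, "
               "Customer's Email Address, Customer's Phone number")
policy = "We respect your privacy and store nothing."
# Every noun the description declares is absent from this policy,
# so this policy would be flagged as incomplete.
print(sorted(missing_in_policy(description, policy)))
```
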
The developer can easily bypass this requirement by lying about not collecting personal information.

7. Conclusion:

A substantial number of problematic privacy policies exist on the Amazon Alexa and Google Assistant platforms, a worrisome reality for privacy policies on VA platforms. Google and Amazon even have official voice apps that violate their own requirements regarding privacy policies. As applications are updated every day, it is hard to track all the data. In the first stage of checking URLs, we found applications with broken privacy policy URLs, and many applications also collected unnecessary user data. Moreover, the same application can have different privacy policies on the Google and Amazon platforms (cross-platform inconsistency). We have gathered suspicious evidence revealing issues with privacy policies on VA platforms.

Future work:

We plan to use machine learning techniques to train a model to identify data practices from natural language documents. The dataset of Google actions that we collected and used for our study is not complete and does not contain all the voice apps available in the app store.

Limitation:

We are unable to examine the actual source code of voice apps. There is a lot of data to handle and train on due to specification constraints, so we cannot use all the data sets that we collected. Also, as applications get updated, there will be changes to their privacy policies, so it is tough to track all the data as it is frequently updated.

References:

[1] Number of digital voice assistants in use worldwide from 2019 to 2023. https://www.statista.com/statistics/973815/worldwide-digital-voiceassistant-in-use/.

[2] Tawfiq Ammari, Jofish Kaye, Janice Y. Tsai, and Frank Bentley. Music, search, and IoT: How people (really) use voice assistants. ACM Transactions on Computer-Human Interaction (TOCHI), 26(3):1–28, 2019.

[3] Alexander Benlian, Johannes Klumpe, and Oliver Hinz. Mitigating the intrusive effects of smart home assistants by using anthropomorphic design features: A multimethod investigation. Information Systems Journal, pages 1–33, 2019.

[4] H. Chung, M. Iorga, J. Voas, and S. Lee. "Alexa, can I trust you?". IEEE Computer, 50(9):100–104, 2017.

[5] Christine Geeng and Franziska Roesner. Who's in control?: Interactions in multi-user smart homes. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI '19), pages 1–13, 2019.

[6] Nathan Malkin, Joe Deatrick, Allen Tong, Primal Wijesekera, Serge Egelman, and David Wagner. Privacy attitudes of smart speaker users. In 19th Privacy Enhancing Technologies Symposium (PETS), 2019.

[7] Graeme McLean and Kofi Osei-Frimpong. Hey Alexa: examine the variables influencing the use of artificial intelligent in-home voice assistants. Computers in Human Behavior, 99:28–37, 2019.

[8] Faysal Shezan, Hang Hu, Jiamin Wang, Gang Wang, and Yuan Tian. Read between the lines: An empirical measurement of sensitive applications of voice personal assistant systems. In Proceedings of The Web Conference (WWW), 2020.

[9] Maurice E. Stucke and Ariel Ezrachi. How digital assistants can harm our economy, privacy, and democracy. Berkeley Technology Law Journal, 32(3):1240–1299, 2017.

[10] Google fined €50 million for GDPR violation in France. https://www.theverge.com/2019/1/21/18191591/google-gdpr-fine-50-millioneuros-data-consent-cnil/.

[11] Sebastian Zimmeck, Ziqi Wang, Lieyong Zou, Roger Iyengar, Bin Liu, Florian Schaub, Shomir Wilson, Norman Sadeh, Steven M. Bellovin, and Joel Reidenberg. Automated analysis of privacy requirements for mobile apps. In 24th Network & Distributed System Security Symposium (NDSS 2017), 2017.

[12] Snapchat Transmitted Users' Location and Collected Their Address Books Without Notice Or Consent. https://www.orrick.com/Insights/2013/02/FTCAssesses-800000-Fine-Against-Mobile-App-Operator-and-Issues-MobilePrivacy-and-Security-Guidance.

[13] Noah Apthorpe, Sarah Varghese, and Nick Feamster. Evaluating the contextual integrity of privacy regulation: Parents' IoT toy privacy norms versus COPPA. In 28th USENIX Security Symposium (USENIX Security), 2019.
[14] Google Privacy Policy Guidance. https://developers.google.com/assistant/console/policies/privacy-policy-guide.

Result figures:

Fig.: Missing privacy policies found during the research.
Fig.: Latest Rubetek application description.
Fig.: Previous Rubetek application description.
