App Classification Based on Privacy Policy Clauses and Machine Learning

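To improve the readability of privacy policies and to evaluate their quality, we propose a machine-learning-based method for automatically classifying the clauses of Chinese privacy policies. First, an indicator system for clause classification is established, and features are extracted from clauses of the different categories. Second, a hierarchical multi-label classification model based on machine learning algorithms is built and trained, and the performance of each algorithm is compared experimentally on a test set. Finally, the classification results are used to check whether a privacy policy is false and whether it is complete, and a privacy policy evaluation method is designed to score it. Experimental results show that the support vector machine model outperforms the other models in classification, reaching an accuracy of 86%, which confirms the feasibility of the method for automatically classifying privacy policy clauses. In addition, an examination of 1,500 privacy policies from the Huawei app market found that 38.5% of them are not privacy policies at all, that 92.5% of the remaining privacy policies are incomplete, and that most receive low scores.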
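The final step, checking completeness and scoring a policy from its classified clauses, can be illustrated with a minimal sketch. The category names in `REQUIRED_CATEGORIES` and the coverage-based score are assumptions made for illustration only; the abstract does not list the actual indicator system or the scoring formula, and the clause labels are taken as already predicted by some classifier.

```python
from typing import List, Set

# Hypothetical top-level clause categories (assumption: the real
# indicator system is not given in the abstract).
REQUIRED_CATEGORIES: Set[str] = {
    "data_collection",
    "data_use",
    "data_sharing",
    "data_storage",
    "user_rights",
    "contact",
}


def missing_categories(clause_labels: List[Set[str]]) -> Set[str]:
    """Return the required categories that no clause was labeled with."""
    covered: Set[str] = set()
    for labels in clause_labels:  # each clause may carry several labels
        covered |= labels
    return REQUIRED_CATEGORIES - covered


def is_complete(clause_labels: List[Set[str]]) -> bool:
    """A policy counts as complete here if every required category appears."""
    return not missing_categories(clause_labels)


def coverage_score(clause_labels: List[Set[str]]) -> float:
    """Illustrative score in [0, 100]: share of required categories covered."""
    missing = missing_categories(clause_labels)
    covered_count = len(REQUIRED_CATEGORIES) - len(missing)
    return 100.0 * covered_count / len(REQUIRED_CATEGORIES)


# Usage: labels predicted for the clauses of one (made-up) policy.
example_policy = [{"data_collection", "data_use"}, {"contact"}]
print(sorted(missing_categories(example_policy)))
print(is_complete(example_policy), coverage_score(example_policy))
```

A policy whose clauses cover none of the required categories would be a natural candidate for the "not a privacy policy" check, although the criterion actually used for that check is not stated here.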

Zhu Zhangying (朱璋颖), Lu Yitian (陆亦恬), Tang Zhushou (唐祝寿), Zhang Yan (张燕)

Privacy Policy Embedding for Chinese Words and Phrases

Features:

  1. 200-dimension vector representation.
  2. Over one million English sentences in total.

A script for JEB

Features:

  1. Finds cross-references to a package (e.g., a third-party library); you can use it to study how a package is used, for instance to find misuse of a library.
  2. Works on the latest JEB; tested on jeb-demo-3.14.0.202002252048-JEBDecompilerDemo-121820464987384330.
  3. Developed by zoudeneng@PanguTeam.
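To make the idea of a package cross-reference concrete, here is a minimal sketch in plain Python. It does not use the JEB API and is not the script itself: it assumes the call edges have already been exported as `(caller, callee)` pairs of Dalvik-style method signatures, and the helper names `package_prefix` and `find_package_xrefs` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def package_prefix(package: str) -> str:
    """Turn a dotted package name into a Dalvik type prefix, e.g. 'okhttp3' -> 'Lokhttp3/'."""
    return "L" + package.replace(".", "/") + "/"


def find_package_xrefs(
    call_edges: Iterable[Tuple[str, str]], package: str
) -> Dict[str, List[str]]:
    """Map each method of `package` to the outside methods that call it.

    Calls made from inside the package itself are skipped, since they say
    nothing about how the app uses the library.
    """
    prefix = package_prefix(package)
    xrefs: Dict[str, List[str]] = defaultdict(list)
    for caller, callee in call_edges:
        if callee.startswith(prefix) and not caller.startswith(prefix):
            xrefs[callee].append(caller)
    return dict(xrefs)


# Usage with made-up call edges.
edges = [
    ("Lcom/example/app/Net;->fetch()V", "Lokhttp3/OkHttpClient;-><init>()V"),
    ("Lokhttp3/OkHttpClient;-><init>()V", "Lokhttp3/Dispatcher;-><init>()V"),
    ("Lcom/example/app/Main;->onCreate()V", "Lcom/example/app/Net;->fetch()V"),
]
for callee, callers in find_package_xrefs(edges, "okhttp3").items():
    print(callee, "<-", callers)
```

The prefix match also covers sub-packages, which is usually what is wanted when auditing how a whole library is used.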

enjoy!

Research on the Automatic Representation of Privacy Policies Based on Natural Language Processing

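An examination of 1,500 Chinese privacy policies from the Huawei app market shows that 38.5% of them are false privacy policies and that, among the remaining legitimate ones, 92.5% fail to meet the completeness requirements of the “Self-Assessment Guide” (“自评估指南”). Building on the automatic representation of privacy policies, a scoring method for privacy policies is designed; the experiments show that most privacy policies score in the low range.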

Review Embedding Corpus for English Words and Phrases Released (2019.2.19)

A. Features

  1. 200-dimension vector representation.
  2. 213,118 English sentences in total.
  3. Available via this link; it will be continuously updated.

B. Case: Finding similar words
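A similar-word lookup over such an embedding typically ranks words by cosine similarity to the query vector. The sketch below shows the idea with the standard library only; the three-dimensional toy vectors and the names `cosine` and `most_similar` are made up for illustration (the released corpus uses 200-dimensional vectors, and its file format is not described here).

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(
    word: str, embeddings: Dict[str, Vector], topn: int = 3
) -> List[Tuple[str, float]]:
    """Return the `topn` words closest to `word`, best match first."""
    query = embeddings[word]
    scored = [
        (other, cosine(query, vec))
        for other, vec in embeddings.items()
        if other != word  # do not return the query word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


# Toy vectors, invented for this example only.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "crash": [0.9, 0.1, 0.0],
    "freeze": [0.8, 0.2, 0.1],
    "love": [0.0, 0.9, 0.3],
}
print(most_similar("crash", TOY_EMBEDDINGS, topn=2))
```

For a vocabulary of realistic size, a vectorized implementation or an approximate nearest-neighbor index would be the more practical choice than this linear scan.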

 

Janus Embedding Corpus for Chinese Words and Phrases Released (2019.2.15)

A. Features

  1. Phrases come from Janus.
  2. Coarse-grained segmentation.
  3. 200-dimension vector representation.
  4. 7,957 APKs and 232,274 sentences in total so far.
  5. Available via this link; it will be continuously updated.

B. Case: Finding similar words

 

enjoy!

GooglePlay Security (Monthly Recap, Jan. 2019)

1. Summary


This month, we evaluated apps on GooglePlay. We collected 10,029 apps from the China, America, Russia and Turkey regions. Among them, we found 22 apps in total that are malware or grayware (termed PHAs, potentially harmful applications, by Google). They are:

(GooglePlay has removed some of these apps, but all of them can be accessed via Janus)

2. Interesting Findings


1. Most of the PHAs are adware.

2. Tricky SMS fraud apps use a variety of techniques to bypass GooglePlay's vetting process; e.g., e9a2786a318968184fabdc21244dae7ef1058de9 sends SMS under the control of a C&C server, and dfb182f6d277acc54a63a629794e4e2cba42dabc sends SMS only if it is launched via an ad network.

3. “You are the winner, but you should pay for the delivery in advance”. This fraudulent story from the web is now migrating to apps; 2ea95471a4f490b12afa138ab1ffe228a528d112, which targets Russian users, is one instance.

4. End users are enticed to pay, and only afterwards do they find that they have been fooled. 5e7322607a7d0575d4bee48115aaec4c700a9274 is such a case.

 

3. About Us


In 2014, Pangu Team (@panguteam) founded PWNZEN InfoTech Co., LTD, a startup company in Shanghai, China, and expanded its research team into the Pangu Lab, with broader research interests ranging from iOS jailbreaking to IoT security, app security auditing, Android security, etc.